Re: [zfs-discuss] ZFS Pool, what happen when disk failure

2010-04-23 Thread Haudy Kazemi

aneip wrote:

I'm really new to zfs and also to raid.

I have 3 hard disks: 500GB, 1TB, 1.5TB.

On each HD I want to create a 150GB partition + the remaining space.

I want to create a raidz from the 3x150GB partitions. This is for my documents + photos.
  
You should be able to create 150 GB slices on each drive, and then 
create a RAIDZ1 out of those 3 slices.
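For example, something along these lines (pool and slice names below are made up; use the slices you actually create with format/fdisk):

# zpool create docpool raidz1 c0t0d0s0 c0t1d0s0 c0t2d0s0
# zpool status docpool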



As for the remaining space, I want to create my video library there. This one doesn't need any 
redundancy since I can simply back up my DVDs again.

The question would be: if I create a striped pool from the remaining space (350 + 
850 + 1350 GB), what happens if one of the HDs fails? Do I lose some files 
or do I lose the whole pool?
  
Your remaining space can be configured as slices.  These slices can be 
added directly to a second pool without any redundancy.  If any drive 
fails, that whole non-redundant pool will be lost.  Data recovery 
attempts will likely find that any recoverable video is like 
swiss cheese, with gaps in it.  This is because files are spread across 
striped devices as they're written, to increase read and write 
performance.  In a JBOD (concatenation) arrangement, however, some files might 
still be complete, but I don't believe ZFS supports JBOD-style non-redundant 
pools.  For most people that is not a big deal, as part of the point of 
ZFS is to focus on data integrity and performance, neither of which is 
offered by JBOD (it is still ruined by single device failures; it is 
just easier to carve files out of a JBOD than out of a broken RAID).
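A rough sketch of that second, non-redundant striped pool, again with made-up slice names:

# zpool create videopool c0t0d0s1 c0t1d0s1 c0t2d0s1

With no redundancy, a single failed drive takes the whole videopool with it, as described above.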


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Pool, what happen when disk failure

2010-04-23 Thread aneip
I'm really new to zfs and also to raid.

I have 3 hard disks: 500GB, 1TB, 1.5TB.

On each HD I want to create a 150GB partition + the remaining space.

I want to create a raidz from the 3x150GB partitions. This is for my documents + photos.

As for the remaining space, I want to create my video library there. This one doesn't need any 
redundancy since I can simply back up my DVDs again.

The question would be: if I create a striped pool from the remaining space (350 + 
850 + 1350 GB), what happens if one of the HDs fails? Do I lose some files 
or do I lose the whole pool?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making ZFS better: zfshistory

2010-04-23 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
> 
> Actually, I find this very surprising:
> Question posted:
> http://lopsa.org/pipermail/tech/2010-April/004356.html

As the thread unfolds, it appears that although netapp may sometimes have
problems with "mv"ed directories (evidence that now appears to be
weakening), sometimes they do precisely what you would want them to do.
Which is, to have the .snapshot directory available under every
subdirectory, with the contents being the best you could hope for at that
location.  Meaning, the parent directory's .snapshot contains an image of the
way the filesystem looked at the time of the snapshot, but the snapshots of
the children, even after the children were renamed, preserve the history of
where the children came from (before they were renamed via "mv").

Still waiting to know more; maybe the bad behavior was a bug that was fixed
in some release.  But to anyone who may have read that thread already:
there are conflicting results coming from different people now.  Meaning, 3
data points, with 2 good and 1 bad.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshots and Data Loss

2010-04-23 Thread Geoff Nordli
>-Original Message-
>From: Ross Walker [mailto:rswwal...@gmail.com]
>Sent: Friday, April 23, 2010 7:08 AM
>>
>> We are currently porting over our existing Learning Lab Infrastructure
>> platform from MS Virtual Server to VBox + ZFS.  When students
>> connect into
>> their lab environment it dynamically creates their VMs and load
>> balances
>> them across physical servers.
>
>You can also check out OpenSolaris' Xen implementation, which if you
>use Linux VMs will allow PV VMs as well as hardware assisted full
>virtualized Windows VMs. There are public domain Windows Xen drivers
>out there.
>
>The advantage of using Xen is its VM live migration and XMLRPC
>management API. As it runs as a bare metal hypervisor it also allows
>fine granularity of CPU scheduling between guests and the host VM, but
>unfortunately its remote display technology leaves something to be
>desired. For Windows VMs I use the built-in remote desktop, and for
>Linux VMs I use XDM and use something like 'thinstation' on the client
>side.
>
>-Ross

Hi Ross.

We decided to use a hosted hypervisor like VirtualBox because our customers
use a variety of different platforms and they don't run high end workloads.
We want a lot of flexibility on configuration and OS support (both host and
guest).

Remote control is a challenge.   In our scenario students are going to spin
up exact copies of a lab environment and we need to isolate their machines
in separate networks so you can't directly connect to the VMs.  We don't
know what Guest OS they are going to run so we can't rely on the guest OS
remote control tools.  We want students to be able to have "console" access
and they need to be able to share it out with an instructor.  We want
students to be able to connect from any type of device.  We don't want to
rely on other connection broker software to coordinate access.   VirtualBox
is great because it provides console level access via RDP.  RDP performs
well enough and is pretty much on everything. 

This is probably getting a bit off topic now :) 

Geoff 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle to no longer support ZFS on OpenSolaris?

2010-04-23 Thread Andriy Gapon
on 23/04/2010 04:22 BM said the following:
> On Tue, Apr 20, 2010 at 2:18 PM, Ken Gunderson  wrote:
>> Greetings All:
>>
>> Granted there has been much fear, uncertainty, and doubt following
>> Oracle's take over of Sun, but I ran across this on a FreeBSD mailing
>> list post dated 4/20/2010"
>>
>> "...Seems that Oracle won't offer support for ZFS on opensolaris"
>>
>> Link here to full post here:
>>
>> 
> 
> I am not surprised it comes from a FreeBSD mailing list. :)

Why this attitude about FreeBSD? Did we eat your lunch?

Have you actually bothered to follow the link?
First, it was pure speculation inside a question.
Second, look at what kind of mailing list that was (general FreeBSD-related 
questions from anyone).
Third, look at who it came from - just a random person asking a question.
[Paranoia mode: maybe it was even you.]

If, for example, I posted some nonsense about, say, Apple on an OpenSolaris
mailing list, what conclusions would you draw then?

> I am amazed at
> their BSD conferences when they present all this *BSD stuff using
> Apple Macs (they claim it is FreeBSD, just a very bad version of it),
> Ubuntu Linux (not yet BSD) or GNU/Microsoft Windows (oh, everybody
> commits that sin, right?) with PowerPoint running on it (sure, who
> wants ugly OpenOffice if there isn't brain enough to use LaTeX).

What you wrote says more about you than about FreeBSD and the FreeBSD community.

P.S.
I am surprised that on this useful, mostly technical mailing list such random
garbage from a random source gets posted at all.  And then even gets taken
seriously...

-- 
Andriy Gapon
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Benchmarking Methodologies

2010-04-23 Thread Scott Meilicke
My use case for opensolaris is as a storage server for a VM environment (we 
also use EqualLogic, and soon an EMC CX4-120). To that end, I use iometer 
within a VM, simulating my VM IO activity, with some balance given to easy 
benchmarking. We have about 110 VMs across eight ESX hosts. Here is what I do:

* Attach a 100G vmdk to one Windows 2003 R2 VM
* Create a 32G test file (my opensolaris box has 16G of RAM)
* Export/import the pool on the solaris box, and reboot my guest, to clear 
caches all around (see the sketch after this list)
* Run a disk queue depth of 32 outstanding IOs
* 60% read, 65% random, 8k block size
* Run for five minutes of spool-up, then run the test for five minutes
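A minimal sketch of that cache-clearing step, assuming the pool is named "tank" (the export drops the cached data for that pool, and the import brings it back cold):

# zpool export tank
# zpool import tank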

My actual workload is closer to 50% read, 16k block size, so I adjust my 
interpretation of the results accordingly. 

Probably I should run a lot more iometer daemons.

Performance will increase as the benchmark runs due to the l2arc filling up, so 
I found that starting the benchmark at 5 minutes into the workload was a happy 
medium. Things will get a bit faster the longer the benchmark runs, but this is 
good enough as far as benchmarking goes.

Only occasionally do I get wacko results, which I happily toss out the window.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for ISCSI ntfs backing store.

2010-04-23 Thread Scott Meilicke
At the time we had it set up as 3 x 5-disk raidz, plus a hot spare. These 16 
disks were in a SAS cabinet, and the slog was on the server itself. We are 
now running 2 x 7-disk raidz2 plus a hot spare and slog, all inside the cabinet. 
Since the disks are 1.5T, I was concerned about resilver times for a failed 
disk.

About the only thing I would consider at this point is getting an SSD for the 
l2arc for dedupe performance.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS mirrored boot disks

2010-04-23 Thread Alex
I was having this same problem with snv_134. I executed all the same commands 
as you did. The cloned disk booted up to the "Hostname:" line and then died. 
Booting with the "-kv" kernel option in GRUB, it died at a different point each 
time, most commonly after:

"srn0 is /pseudo/s...@0"

What's worse, my primary disk wouldn't boot either! I tried all manner of 
swapping disks in and out, unplugging & plugging certain disks, changing boot 
order in BIOS, etc. These are PATA disks and I tried changing master to slave, 
slave to master, booting with one drive but not the other, enabling/disabling 
DMA on the drives, etc.

But anyway, after my customary 8 hours of Googling, I found the fix:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6923585

Looks like I neglected to detach the mirror before removing it...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread tim Kries
Dedup is a key element for my purpose, because I am planning a central 
repository for about 150 Windows Server 2008 (R2) servers, which would take a lot 
less storage if they dedup well.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread Khyron
A few things come to mind...

1. A lot better than...what?  Setting the recordsize to 4K got you some
deduplication but maybe the pertinent question is what were you
expecting?

2. Dedup is fairly new.  I haven't seen any reports of experiments like
yours so...CONGRATULATIONS!!  You're probably the first.  Or at least
the first willing to discuss it with the world as a matter of public record?

Since dedup is new, you can't expect much in the way of previous
experience with it.  I also haven't seen coordinated experiments of various
configurations with dedup off then on, for comparison.

In the end, the question is going to be whether that level of dedup is going
to be enough for you.  Is dedup even important?  Is it just a "gravy" feature
or a key requirement?  You're in unexplored territory, it appears.

On Fri, Apr 23, 2010 at 11:41, tim Kries  wrote:

> Hi,
>
> I have been playing with opensolaris for a while now. Today I tried to deduplicate the
> backup VHD files Windows Server 2008 generates. I made a backup before and
> after installing the AD role and copied the files to the share on opensolaris
> (build 134). First I got a straight 1.00x; then I set recordsize to 4k (to
> be like NTFS), and it jumped up to 1.29x after that. But it should be a lot
> better, right?
>
> Is there something I missed?
>
> Regards
> Tim
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



-- 
"You can choose your friends, you can choose the deals." - Equity Private

"If Linux is faster, it's a Solaris bug." - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread tim Kries
It was active all the time.

Made a new zfs with -o dedup=on, copied the files with the default recordsize and 
got no dedup; deleted the files, set recordsize=4k, and got a dedup ratio of 1.29x.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread Richard Jahnel
You might note, dedup only dedupes data that is written after the flag is set. 
It does not retroactively dedupe already-written data.
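So existing data only picks up dedup once it is rewritten after the property is set. A rough, hypothetical sketch (dataset names are made up; a plain copy into a dedup-enabled dataset works just as well as send/recv):

# zfs set dedup=on tank                                 (children inherit the property)
# zfs snapshot tank/vhd@rewrite
# zfs send tank/vhd@rewrite | zfs recv tank/vhd-dedup
# zpool get dedupratio tank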
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle to no longer support ZFS on OpenSolaris?

2010-04-23 Thread Michael Sullivan
Bogdan,

Thanks for pointing this out and passing along the latest news from Oracle.

Stamp out FUD wherever possible.  At this point, unless it is said officially 
(and Oracle generally keeps pretty tight-lipped about products and directions), 
people should regard most things as hearsay.

Cheers,

Mike

---
Michael Sullivan   
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 23 Apr 2010, at 10:22 , BM wrote:

> On Tue, Apr 20, 2010 at 2:18 PM, Ken Gunderson  wrote:
>> Greetings All:
>> 
>> Granted there has been much fear, uncertainty, and doubt following
>> Oracle's take over of Sun, but I ran across this on a FreeBSD mailing
>> list post dated 4/20/2010"
>> 
>> "...Seems that Oracle won't offer support for ZFS on opensolaris"
>> 
>> Link here to full post here:
>> 
>> 
> 
> I am not surprised it comes from a FreeBSD mailing list. :) I am amazed at
> their BSD conferences when they present all this *BSD stuff using
> Apple Macs (they claim it is FreeBSD, just a very bad version of it),
> Ubuntu Linux (not yet BSD) or GNU/Microsoft Windows (oh, everybody
> commits that sin, right?) with PowerPoint running on it (sure, who
> wants ugly OpenOffice if there isn't brain enough to use LaTeX).
> 
> As for a starter, please somebody read this:
> http://developers.sun.ru/techdays2010/reports/OracleSolarisTrack/TD_STP_OracleSolarisFuture_Roberts.pdf
> ...and so I suggest that people refrain from broadcasting complete
> garbage from trash-dump places to spread this kind of FUD to the
> public, which just shakes the air with no meaning behind it.
> 
> Take care.
> 
> -- 
> Kind regards, BM
> 
> Things, that are stupid at the beginning, rarely ends up wisely.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread tim Kries
Hi,

I have been playing with opensolaris for a while now. Today I tried to deduplicate the 
backup VHD files Windows Server 2008 generates. I made a backup before and 
after installing the AD role and copied the files to the share on opensolaris 
(build 134). First I got a straight 1.00x; then I set recordsize to 4k (to be 
like NTFS), and it jumped up to 1.29x after that. But it should be a lot better, 
right?

Is there something I missed?

Regards
Tim
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can RAIDZ disks be slices ?

2010-04-23 Thread Haudy Kazemi

Sunil wrote:

If you like, you can later add a fifth drive relatively easily by 
replacing one of the slices with a whole drive.





How does this affect my available storage if I were to replace both of those 
sparse 500GB files with a real 1TB drive? Will it be the same? Or will I have 
expanded my storage? If I understand correctly, I would need to replace the other 3 
drives with 1TB as well to expand beyond 3x500GB.

So, in essence, I can go from 3x500GB to 3x1000GB in-place with this scheme in 
the future if I have the money to upgrade all the drives to 1TB, WITHOUT needing 
any movement of data to temp? Please say yes! :-)
  


It should work to replace devices the way you describe.  The only time 
you need some temp storage space is if you want to change the 
arrangement of devices that make up the pool, e.g. to go from 
striped-mirrors to RAIDZ2, or RAIDZ1 to RAIDZ2, or some other 
combination.  If you just want to replace devices with identical or 
larger sized devices you don't need to move the data anywhere.


The capacity will expand to match the smallest member device.  In some 
OpenSolaris builds I believe this happened automatically when all member 
devices had been upgraded.  At some point in later builds I think it was 
changed to require manual intervention, to prevent problems (like the 
pool suddenly growing to fill all the new big drives when the admin 
really wanted the unused space to stay unused...say for partition/slice 
based short stroking, or when smaller drives were being kept around as 
spares).  If ZFS had the ability to shrink and use smaller devices this 
would not have been as big of a problem.
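On builds where the growth is no longer automatic, it is controlled by a pool property; a quick sketch (pool name hypothetical, and only on builds that have the property):

# zpool set autoexpand=on tank
# zpool list tank          (SIZE should reflect the larger devices once all members are replaced)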


As I understand it from the documentation, replacement can happen two 
ways.  First, you can connect the replacement device to the system at 
the same time as the original device is working, and then issue the 
replace command.  I think this technique is safe, as the original device 
is still available during the replacement procedure and could be used to 
provide redundancy to the rest of the pool until the new device finishes 
resilvering.  (Does anyone know if this is really the case...i.e. if 
redundancy is preserved during the replacement operation when both 
original and new devices are connected simultaneously and both are 
functioning correctly?  One way to verify this might be to run zpool 
replace on a non-redundant pool while both devices are connected.)


The second way is to (physically) disconnect the original device and 
connect the new device in its place.  The pool will be degraded because 
a member device is missing...if you have RAIDZ1, you have no redundancy 
remaining; if you have RAIDZ2, you still have 1 level of redundancy 
intact.  The zpool replace command should be able to rebuild the missing 
data onto the replacement device.
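A quick sketch of both forms of the command (pool and device names hypothetical):

# zpool replace tank c0t2d0s0 c0t4d0      (old device still attached: replace the slice with a whole new drive)
# zpool replace tank c0t2d0s0             (old device already removed: resilver onto the new disk now at that path)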


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Data movement across filesystems within a pool

2010-04-23 Thread devsk
I would have thought that file movement from one FS to another within the 
same pool would be almost instantaneous. Why does it go to the platter for such a 
move?

# time cp /tmp/blockfile /pcshare/1gb-tempfile
real    0m5.758s

# time mv /pcshare/1gb-tempfile .
real    0m4.501s

Both FSs have compression=off. /tmp is RAM.

-devsk
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshots and Data Loss

2010-04-23 Thread Ross Walker

On Apr 22, 2010, at 11:03 AM, Geoff Nordli  wrote:


From: Ross Walker [mailto:rswwal...@gmail.com]
Sent: Thursday, April 22, 2010 6:34 AM

On Apr 20, 2010, at 4:44 PM, Geoff Nordli   
wrote:



If you combine the hypervisor and storage server and have students
connect to the VMs via RDP or VNC or XDM then you will have the
performance of local storage and even script VirtualBox to take a
snapshot right after a save state.

A lot less difficult to configure on the client side, and allows you
to deploy thin clients instead of full desktops where you can get away
with it.

It also allows you to abstract the hypervisor from the client.

Need a bigger storage server with lots of memory, CPU and storage
though.

Later, if need be, you can break out the disks to a storage appliance
with an 8GB FC or 10Gbe iSCSI interconnect.



Right, I am in the process now of trying to figure out what the load looks
like with a central storage box and how ZFS needs to be configured to
support that load.  So far what I am seeing is very exciting :)

We are currently porting over our existing Learning Lab Infrastructure
platform from MS Virtual Server to VBox + ZFS.  When students connect into
their lab environment it dynamically creates their VMs and load balances
them across physical servers.


You can also check out OpenSolaris' Xen implementation, which if you  
use Linux VMs will allow PV VMs as well as hardware assisted full  
virtualized Windows VMs. There are public domain Windows Xen drivers  
out there.


The advantage of using Xen is its VM live migration and XMLRPC
management API. As it runs as a bare metal hypervisor it also allows
fine granularity of CPU scheduling between guests and the host VM, but
unfortunately its remote display technology leaves something to be
desired. For Windows VMs I use the built-in remote desktop, and for  
Linux VMs I use XDM and use something like 'thinstation' on the client  
side.


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs & ec2]

2010-04-23 Thread Phillip Oldham
I can replicate this case; Start new instance > attach EBS volumes > reboot 
instance > data finally available.

Guessing that it's something to do with the way the volumes/devices are "seen" 
& then made available. 

I've tried running various operations (offline/online, scrub) to see whether it 
will force zfs to recognise the drive, but nothing seems to work other than a 
restart after attaching.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs & ec2]

2010-04-23 Thread Robert Milkowski

On 23/04/2010 13:38, Phillip Oldham wrote:

The instances are "ephemeral"; once terminated they cease to exist, as do all 
their settings. Rebooting an image keeps any EBS volumes attached, but this isn't the 
case I'm dealing with - it's when the instance terminates unexpectedly. For instance, if a 
reboot operation doesn't succeed or if there's an issue with the data-centre.

There isn't any way (yet, AFACT) to attach an EBS during the boot process, so 
they must be attached after boot.
   

Then perhaps you should do zpool import -R / pool *after* you attach EBS.
That way Solaris won't automatically try to import the pool and your 
scripts will do it once disks are available.
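A minimal sketch of what that startup sequence could look like (pool name as in your output; the EBS attach itself happens via the EC2 API/console outside the instance):

# ... attach the EBS volumes via ec2-attach-volume or the AWS console ...
# zpool import -R / foo        (import by name once the devices show up)
# zpool status foo

Importing with -R keeps the pool out of /etc/zfs/zpool.cache, so the next boot won't try to import it before the volumes are attached.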


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs & ec2]

2010-04-23 Thread Mark Musante

On 23 Apr, 2010, at 8.38, Phillip Oldham wrote:

> The instances are "ephemeral"; once terminated they cease to exist, as do all 
> their settings. Rebooting an image keeps any EBS volumes attached, but this 
> isn't the case I'm dealing with - its when the instance terminates 
> unexpectedly. For instance, if a reboot operation doesn't succeed or if 
> there's an issue with the data-centre.

OK, I think if this issue can be addressed, it would be by people familiar with 
how EC2 & EBS interact.  The steps I see are:

- start a new instance
- attach the EBS volumes to it
- log into the instance and "zpool online" the disks

I know the last step can be automated with a script inside the instance, but 
I'm not sure about the other two steps.


Regards,
markm

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD best practices

2010-04-23 Thread Darren J Moffat

On 23/04/2010 12:24, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of thomas

Someone on this list threw out the idea a year or so ago to just setup
2 ramdisk servers, export a ramdisk from each and create a mirror slog
from them.


Isn't the whole point of a ramdisk to be fast?
And now it's going to be at the other end of an Ethernet, with TCP and ...
some additional filesystem overhead?


iSCSI over 1G or even 10G Ethernet to something on the remote side can 
be very fast, faster than a 7200rpm drive and possibly faster than a 15k 
rpm drive.


Or maybe it isn't Ethernet but InfiniBand; then we are looking at something very fast.

The point of the ZFS L2ARC cache devices is to be faster than your main 
pool devices.  In particular the idea is to allow you to use cheaper 
7200 rpm (or maybe even slower) disks rather than expensive 15k rpm 
drives but to get equivalent or better performance for certain types of 
workload that have traditionally been dominated by 15k rpm drives.


If you are using this as a ZFS log device then you need to be more 
careful as the log device does need to persist, otherwise there is no 
point in having it.
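For completeness, this is roughly how such devices get attached to a pool (pool and device names hypothetical):

# zpool add tank cache c2t0d0      (L2ARC: read cache, losing it is harmless)
# zpool add tank log c3t0d0        (slog: must persist, since it holds the intent log)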


I remember many years ago on SPARCstation ELC (sun4c) systems with only 
8MB of RAM and local swap (IIRC local / but remote /usr too, so a 
dataless client) it was better to run some X applications remotely on 
another machine (that someone else was using) than to let them swap 
locally.  The idea being that you had to be unlucky for both machines to 
need to swap and both to swap out the same program at the same time. 
That was only over 10BaseT.


What I'm saying is that this isn't new, don't assume that the path 
to/from local storage is faster than networking.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs & ec2]

2010-04-23 Thread Phillip Oldham
One thing I've just noticed is that after a reboot of the new instance, which 
showed no data on the EBS volume, the files return. So:

1. Start new instance
2. Attach EBS vols
3. `ls /foo` shows no data
4. Reboot instance
5. Wait a few minutes
6. `ls /foo` shows data as expected

Not sure if this helps track down why, after the initial start + attach, the 
EBS vol shows no data.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs & ec2]

2010-04-23 Thread Phillip Oldham
The instances are "ephemeral"; once terminated they cease to exist, as do all 
their settings. Rebooting an image keeps any EBS volumes attached, but this 
isn't the case I'm dealing with - it's when the instance terminates 
unexpectedly. For instance, if a reboot operation doesn't succeed or if there's 
an issue with the data-centre.

There isn't any way (yet, AFACT) to attach an EBS during the boot process, so 
they must be attached after boot.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs & ec2]

2010-04-23 Thread Mark Musante

On 23 Apr, 2010, at 7.31, Phillip Oldham wrote:

> I'm not actually issuing any when starting up the new instance. None are 
> needed; the instance is booted from an image which has the zpool 
> configuration stored within, so simply starts and sees that the devices 
> aren't available, which become available after I've attached the EBS device.
> 

Forgive my ignorance with EC2/EBS, but why doesn't the instance remember that 
there were EBS volumes attached?  Why aren't they automatically attached prior 
to booting solaris within the instance?  The error output from "zpool status" 
that you're seeing matches what I would expect if we are attempting to import 
the pool at boot, and the disks aren't present.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making ZFS better: zfshistory

2010-04-23 Thread Edward Ned Harvey
> From: Richard Elling [mailto:richard.ell...@gmail.com]
> 
> One last try. If you change the "real" directory structure, how are
> those
> changes reflected in the "snapshot" directory structure?
> 
> Consider:
>   echo "whee" > /a/b/c/d.txt
>   [snapshot]
>   mv /a/b /a/B
> 
> What does /a/B/c/.snapshot point to?  If the answer is "nothing," then
> I see
> significantly less value in the feature.

Actually, I find this very surprising:
Question posted:
http://lopsa.org/pipermail/tech/2010-April/004356.html 
Answered:
http://lopsa.org/pipermail/tech/2010-April/004358.html
My comments about it:
http://lopsa.org/pipermail/tech/2010-April/004361.html
and
http://lopsa.org/pipermail/tech/2010-April/004362.html

So apparently, the netapp snapshot is more directory based, and not so much
filesystem based.  Very surprising, to me.

But this is good, because if implemented in ZFS, it leaves room for
improvement:  There's no reason why the snapshots under "a" couldn't be
identical to the snapshots under "a/e".  The only difficulty is for the
filesystem to recognize a "mv" command as such, and to link things
appropriately behind the scenes.  In this case, "mv" is different from "cp ;
rm", which was a long-time problem for svn, analogous to this one in the
filesystem.  But eventually, svn got it right, and now recognizes that "mv"
should preserve the history for items after they're renamed.

IMHO, the (ideal) desired behavior is *neither* what ZFS currently does *nor*
what netapp currently does.  IMHO, the ideal desired behavior is as
described in that last email above.  Namely:
* Let the snap of the parent directory remain the same as it was at the time
of the snapshot, 
And
* Let the snapshots of the renamed subdirectory go along with the new name
of the subdirectory.

Anyway, it's all pointless at this moment, because obviously that's all
nontrivial, with very diminishing returns.  So I maintain very near zero hope
that it'll ever happen.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs & ec2]

2010-04-23 Thread Phillip Oldham
I'm not actually issuing any when starting up the new instance. None are 
needed; the instance is booted from an image which has the zpool configuration 
stored within, so simply starts and sees that the devices aren't available, 
which become available after I've attached the EBS device.

Before the image was bundled the following zpool commands were issued with the 
EBS volumes attached at "10" (primary), "6" (log main) and "7" (log mirror):

# zpool create foo c7d16 log mirror c7d6 c7d7
# zpool status
  pool: mnt
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mnt         ONLINE       0     0     0
          c7d1p0    ONLINE       0     0     0
          c7d2p0    ONLINE       0     0     0

errors: No known data errors

  pool: foo
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        foo         ONLINE       0     0     0
          c7d16     ONLINE       0     0     0
        logs        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7d6    ONLINE       0     0     0
            c7d7    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c7d0s0    ONLINE       0     0     0

errors: No known data errors

After booting a new instance based on the image I see this:

# zpool status

  pool: foo
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        foo         UNAVAIL      0     0     0  insufficient replicas
          c7d16     UNAVAIL      0     0     0  cannot open
        logs        UNAVAIL      0     0     0  insufficient replicas
          mirror    UNAVAIL      0     0     0  insufficient replicas
            c7d6    UNAVAIL      0     0     0  cannot open
            c7d7    UNAVAIL      0     0     0  cannot open

Which changes to "ONLINE" (as previous) when the EBS volumes are attached.

After reading through the documentation a little more, could this be due to the 
zpool.cache file being stored on the image (& therefore refreshed after each 
boot) rather than somewhere more persistent?
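If it is the cache file, one way around it might be to keep the pool out of the cache on the image and import it explicitly once the volumes are attached; a sketch only, using the pool name from the output above:

# zpool set cachefile=none foo        (before bundling the image)

Then, on the new instance, after attaching the EBS volumes:

# zpool import foo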
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD best practices

2010-04-23 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of thomas
> 
> Someone on this list threw out the idea a year or so ago to just setup
> 2 ramdisk servers, export a ramdisk from each and create a mirror slog
> from them.

Isn't the whole point of a ramdisk to be fast?
And now it's going to be at the other end of an Ethernet, with TCP and ...
some additional filesystem overhead?

No thank you.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs & ec2]

2010-04-23 Thread Mark Musante

On 23 Apr, 2010, at 7.06, Phillip Oldham wrote:
> 
> I've created an OpenSolaris 2009.06 x86_64 image with the zpool structure 
> already defined. Starting an instance from this image, without attaching the 
> EBS volume, shows the pool structure exists and that the pool state is 
> "UNAVAIL" (as expected). Upon attaching the EBS volume to the instance the 
> status of the pool changes to "ONLINE", the mount-point/directory is 
> accessible and I can write data to the volume.
> 
> Now, if I terminate the instance, spin-up a new one, and connect the same 
> (now unattached) EBS volume to this new instance the data is no longer there 
> with the EBS volume showing as blank. 

Could you share with us the zpool commands you are using?


Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re-attaching zpools after machine termination [amazon ebs & ec2]

2010-04-23 Thread Phillip Oldham
I'm trying to provide some "disaster-proofing" on Amazon EC2 by using a 
ZFS-based EBS volume for primary data storage with Amazon S3-backed snapshots. 
My aim is to ensure that, should the instance terminate, a new instance can 
spin-up, attach the EBS volume and auto-/re-configure the zpool.

I've created an OpenSolaris 2009.06 x86_64 image with the zpool structure 
already defined. Starting an instance from this image, without attaching the 
EBS volume, shows the pool structure exists and that the pool state is 
"UNAVAIL" (as expected). Upon attaching the EBS volume to the instance the 
status of the pool changes to "ONLINE", the mount-point/directory is accessible 
and I can write data to the volume.

Now, if I terminate the instance, spin-up a new one, and connect the same (now 
unattached) EBS volume to this new instance the data is no longer there with 
the EBS volume showing as blank. 

EBS is supposed to ensure persistence of data after EC2 instance termination, 
so I'm assuming that when the newly attached drive is seen by ZFS for the first 
time it is wiping the data somehow? Or possibly that some ZFS logs or details 
on file location/allocation aren't being persisted? Assuming this, I created an 
additional EBS volume to persist the intent-logs across instances but I'm 
seeing the same problem.

I'm new to ZFS, and would really appreciate the community's help on this.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mac OS X clients with ZFS server

2010-04-23 Thread Alex Blewitt

On 22 Apr 2010, at 20:50, Rich Teer  wrote:


On Thu, 22 Apr 2010, Alex Blewitt wrote:

Hi Alex,


For your information, the ZFS project lives (well, limps really) on
at http://code.google.com/p/mac-zfs. You can get ZFS for Snow Leopard
from there and we're working on moving forwards from the ancient pool
support to something more recent. I've relatively recently merged in
the onnv-gate repository (at build 72) which should make things easier
to track in the future.


That's good to hear!  I thought Apple yanking ZFS support from Mac OS was
a really dumb idea.  Do you work for Apple?


No, the entire effort is community based. Please feel free to join up  
to the mailing list from the project page if you're interested in ZFS  
on Mac OSX.


Alex
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool lists 2 controllers the same, how do I replace one?

2010-04-23 Thread Markus Kovero

> If you're lucky, the device will be marked as not being present, and then 
> you can use the GUID.

> To find out, use the command "zdb -C" to dump out the configuration 
> information.  In the output, look for the offline disk (it should be under 
> a heading "children[3]").  If the "not_present" value is there, then you 
> can use the guid to do the replace.  The guid is the really long number 
> listed after the "id" value (which should also be 3 in your config).
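For reference, a replace by GUID would look roughly like this (pool name, guid and new device below are placeholders only):

# zdb -C tank                                     (note the guid of the failed child)
# zpool replace tank <guid-of-failed-disk> c6t40d0

That is, the numeric guid stands in for the ambiguous /dev/dsk path.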

What if not_present isn't there?

children[1]:
type: 'disk'
id: 1
guid: 9311942279929207354
path: '/dev/dsk/c6t33d0s0'
devid: 'id1,s...@n50014ee0010dd179/a'
phys_path: 
'/p...@0,0/pci8086,3...@7/pci8086,3...@0/pci1028,1...@8/s...@21,0:a'
whole_disk: 1
DTL: 584
create_txg: 40

children[14]:
type: 'disk'
id: 14
guid: 13886071452172028089
path: '/dev/dsk/c6t33d0s0'
devid: 'id1,s...@n50014ee101e8fc90/a'
phys_path: 
'/p...@0,0/pci8086,3...@7/pci8086,3...@0/pci1028,1...@8/s...@21,0:a'
whole_disk: 1
DTL: 449
create_txg: 64771

One is failed and the other is online in the pool, in different raidz2 sets.

Yours
Markus Kovero

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss