Re: [zfs-discuss] This mailing list EOL???
mail-archive.com is an independent third party. This is one of their FAQs (http://www.mail-archive.com/faq.html#duration):

    The Mail Archive has been running since 1998. Archiving services are planned to continue indefinitely. We do not plan on ever needing to remove archived material. Do not, however, misconstrue these intentions with a warranty of any kind. We reserve the right to discontinue service at any time.

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Deirdre Straughan
Sent: Wednesday, March 20, 2013 5:16 PM
To: Cindy Swearingen; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] This mailing list EOL???

Will the archives of all the lists be preserved? I don't think we've seen a clear answer on that (it's possible you haven't, either!).

On Wed, Mar 20, 2013 at 2:14 PM, Cindy Swearingen <cindy.swearin...@oracle.com> wrote:

> Hi Ned,
>
> This list is migrating to java.net and will not be available in its current form after March 24, 2013. The archive of this list is available here:
>
> http://www.mail-archive.com/zfs-discuss@opensolaris.org/
>
> I will provide an invitation to the new list shortly. Thanks for your patience.
>
> Cindy
>
> On 03/20/13 15:05, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:
>> I can't seem to find any factual indication that opensolaris.org mailing lists are going away, and I can't even find the reference to whoever said it was EOL in a few weeks ... a few weeks ago. So ... are these mailing lists going bye-bye?

--
best regards,
Deirdré Straughan
Community Architect, SmartOS
illumos Community Manager
cell 720 371 4107
[zfs-discuss] This mailing list EOL???
I can't seem to find any factual indication that opensolaris.org mailing lists are going away, and I can't even find the reference to whoever said it was EOL in a few weeks ... a few weeks ago. So ... are these mailing lists going bye-bye?
Re: [zfs-discuss] What would be the best tutorial cum reference doc for ZFS
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Hans J. Albertsson
>
> I'm looking for something that would make me afterwards understand what, say, commands like zpool import ... or zfs send ... actually do, and some idea as to why, so I can begin to understand ZFS in a way that allows me to make educated guesses on how to perform tasks I haven't tried before.

man zpool
man zfs

And the ZFS Best Practices Guide. And the ZFS Evil Tuning Guide (just search for "evil" and you'll find it). But almost everything is literally in the man pages.
Re: [zfs-discuss] partioned cache devices
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki
>
> muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
> Password:
> cannot open '/dev/dsk/c25t10d1p2': I/O error
> muslimwookie@Pyzee:~$
>
> I have two SSDs in the system, I've created an 8gb partition on each drive for use as a mirrored write cache. I also have the remainder of the drive partitioned for use as the read only cache. However, when attempting to add it I get the error above.

Sounds like you're probably running into confusion about how to partition the drive. If you create fdisk partitions, they will be accessible as p0, p1, p2, but I think p0 unconditionally refers to the whole drive, so the first partition is p1 and the second is p2. If you create one big solaris fdisk partition and then slice it via "partition", where s2 is typically the encompassing slice and people usually use s0, s1, and s6 for actual slices, then they will be accessible via s0, s1, s6.

Generally speaking, it's inadvisable to split the slog/cache devices anyway. Because: if you're splitting them, evidently you're focusing on the wasted space. Buying an expensive 128G device where you couldn't possibly ever use more than 4G or 8G in the slog. But that's not what you should be focusing on. You should be focusing on the speed (that's why you bought it in the first place). The slog is write-only, and the cache is a mixture of read/write, where it should hopefully be doing more reads than writes. But regardless of your actual success with the cache device, your cache device will be busy most of the time, and competing against the slog.

You have a mirror, you say. You should probably drop both the cache & log. Use one whole device for the cache, use one whole device for the log. The only risk you'll run is this: since a slog is write-only (except during mount, typically at boot), it's possible to have a failure mode where you think you're writing to the log, but the first time you go back and read, you discover an error, and discover the device has gone bad. In other words, without ever doing any reads, you might not notice when/if the device goes bad.

Fortunately, there's an easy workaround. You could periodically (say, once a month) script the removal of your log device, create a junk pool, write a bunch of data to it, scrub it (thus verifying it was written correctly), and in the absence of any scrub errors, destroy the junk pool and re-add the device as a slog to the main pool.

I've never heard of anyone actually being that paranoid, and I've never heard of anyone actually experiencing the aforementioned possible undetected device failure mode. So this is all mostly theoretical. Mirroring the slog device really isn't necessary in the modern age.
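For what it's worth, that monthly check might look roughly like this. An untested sketch; "tank" and c25t10d0 are placeholders for your pool and device:

  #!/bin/sh
  # Pull the slog out of the main pool (log device removal requires a
  # reasonably recent zpool version).
  zpool remove tank c25t10d0
  # Build a throwaway pool on the device and write some data to it.
  zpool create junk c25t10d0
  dd if=/dev/urandom of=/junk/testfile bs=1024k count=2048
  # Scrub to verify everything written reads back correctly. zpool scrub
  # returns immediately, so poll until it finishes.
  zpool scrub junk
  while zpool status junk | grep -q "scrub in progress"; do
      sleep 60
  done
  zpool status -x junk     # expect the pool to be reported healthy
  # Tear down and put the device back as a dedicated log.
  zpool destroy junk
  zpool add tank log c25t10d0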
Re: [zfs-discuss] maczfs / ZEVO
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> Tim, Simon, Volker, Chris, and Erik - How do you use it? I am making the informed guess, that you're using it primarily on non-laptops, which have second hard drives, and you're giving the entire disk to the zpool. Right?

Perhaps it works fine for whole disks, or even partitions, but with my file-backed pool, the performance was terrible. Everything else I could work around ... lack of zvol, inability to import during reboot ... but the performance problem was significant enough for me to scrap it and go back to normal. Oh well.
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: Tim Cook [mailto:t...@cook.ms]
>
> We can agree to disagree. I think you're still operating under the auspices of Oracle wanting to have an open discussion. This is patently false.

I'm just going to respond to this by saying thank you, Cindy, Casper, Neil, and others, for all the help over the years. I think we all agree it was cooler when opensolaris was open, but things are beyond our control, so be it. Moving forward, I don't expect Oracle to be any more open than MS or Apple or Google, which is to say, I understand there's stuff you can't talk about, and support you can't give freely or openly. But to the extent you're still able to discuss publicly known things, thank you.
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: Tim Cook [mailto:t...@cook.ms]
>
> Why would I spend all that time and energy participating in ANOTHER list controlled by Oracle, when they have shown they have no qualms about eliminating it with basically 0 warning, at their whim?

From an open source, community perspective, I understand and agree with this sentiment. If OSS projects behave this way, they die. The purpose of an oracle-hosted mailing list is not for the sake of being open in any way. It's for the sake of allowing public discussions about their product. While a certain amount of knowledge will exist with or without the list (people can still download solaris 11 for evaluation purposes and test it out on the honor system), there will be less oracle-specific knowledge in existence without the list. For anyone who's 100% dedicated to OSS and/or illumos and doesn't care about oracle-specific stuff, there's no reason to use that list. But for those of us who are sysadmins, developers using eval-licensed solaris, or in any way not completely closed to the possibility of using oracle zfs / solaris... for those of us, it makes sense.

Guess what, I formerly subscribed to netapp-toasters as well. Until zfs came along and I was able to happily put netapp in my past. Perhaps someday I'll leave zfs behind in favor of btrfs. But not yet.

Guess what also, there is a very active thriving Microsoft forum out there too. And they don't even let you download MS Office or Windows for evaluation purposes - they're even more closed than Oracle in this regard. They learned their lesson about piracy and the honor system. ;-)
Re: [zfs-discuss] maczfs / ZEVO
> From: Tim Cook [mailto:t...@cook.ms]
> Sent: Friday, February 15, 2013 11:14 AM
>
> I have a few coworkers using it. No horror stories and it's been in use about 6 months now. If there were any showstoppers I'm sure I'd have heard loud complaints by now :)

So, I have discovered a *couple* of unexpected problems.

At first, I thought it would be nice to split my HD into 2 partitions, use the 2nd partition for a zpool, and use a vmdk wrapper around a zvol raw device. So I started partitioning my HD. As it turns out, there's a bug in Disk Utility... As long as you partition your hard drive and *format* the second partition with hfs+, it works very smoothly. But then I couldn't find any way to dismount the second partition (there is no eject)... If I go back, I think maybe I'll figure it out, but I didn't try too hard... I resized back to normal, and then split again, selecting the "Free Space" option for the second partition. Bad idea. Disk Utility horked the partition tables, and I had to restore from Time Machine. I thought maybe it was just a fluke, so I repeated the whole process a second time... try to split the disk, try to make the second half "Free Space", and be forced to restore the system. Lesson learned: don't try to create an unused partition on the mac HD.

So then I just created one big honking file via "dd" and used it for the zpool store. Tried to create a zvol. Unfortunately ZEVO doesn't do zvols. Ok, no problem. Windows can run NTFS inside a vmdk file inside a zfs filesystem inside an hfs+ file inside the hfs+ filesystem. (Yuk.) But it works. Unfortunately, because it's a file in the backend, ZEVO doesn't find the pool on reboot. It doesn't seem to do the equivalent of a zpool.cache. I've asked a question in their support forum to see if there's some way to solve that problem, but I don't know yet.

Tim, Simon, Volker, Chris, and Erik - how do you use it? I am making the informed guess that you're using it primarily on non-laptops, which have second hard drives, and you're giving the entire disk to the zpool. Right?
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: cindy swearingen [mailto:cindy.swearin...@gmail.com]
>
> This was new news to us too and we were just talking over some options yesterday afternoon, so please give us a chance to regroup and provide some alternatives. This list will be shut down but we can start a new one on java.net.

Thanks Cindy - I, for one, am in favor of another list on java.net, because the development is basically split into oracle & illumos. While illumos users might have a small aversion to using another oracle list, I think oracle users will likely have a much larger aversion to using a non-oracle list. So I think there's room for both lists, as well as just cause for both lists.

If at all possible, I would advise preserving the history of these mailing lists. Extremely useful sometimes, when referencing past conversations and stuff, and searching for little tidbits via google.

I would also advise making some sort of announcement on any of the other opensolaris mailing lists that happen to be active.
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]
>
> Good for you. I am sure that Larry will be contacting you soon.

hehehehehe... he knows better. ;-)

> Previously Oracle announced and invited people to join their discussion forums, which are web-based and virtually dead.

Invited people with paid support contracts.
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: sriram...@gmail.com [mailto:sriram...@gmail.com] On Behalf Of Sriram Narayanan
>
> Or, given that this is a weekend, we assume that someone within Oracle would see this mail only on Monday morning Pacific Time, then send out some mails within, and be able to respond in public only by Wednesday evening Pacific Time at best.

I remembered to take that into account. Question was posted Friday morning, EST. And not every oracle employee subscribes here with their work email address. Nor does everyone limit themselves to conversing in the community during only business hours. Don't forget Monday's a holiday.
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: Tim Cook [mailto:t...@cook.ms]
>
> That would be the logical decision, yes. Not to poke fun, but did you really expect an official response after YEARS of nothing from Oracle? This is the same company that refused to release any Java patches until the DHS issued a national warning suggesting that everyone uninstall Java.

Well, yes. We do have oracle employees who contribute to this mailing list. It is not accurate or fair to stereotype the whole company. Oracle by itself is as large as some cities or countries.

I can understand a company policy of secrecy about development direction and stuff like that. I would think somebody would be able to officially confirm or deny that this mailing list is going to stop. At least one of their system administrators lurks here...
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
In the absence of any official response, I guess we just have to assume this list will be shut down, right? So I guess we just have to move to the illumos mailing list, as Deirdre suggests?

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
Sent: Friday, February 15, 2013 11:00 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL

So, I hear, in a couple weeks' time, opensolaris.org is shutting down. What does that mean for this mailing list? Should we all be moving over to something at illumos or something? I'm going to encourage somebody in an official capacity at opensolaris to respond... I'm going to discourage unofficial responses, like, illumos enthusiasts etc simply trying to get people to jump this list. Thanks for any info ...
[zfs-discuss] maczfs / ZEVO
Anybody using maczfs / ZEVO? Have good or bad things to say, in terms of reliability, performance, features?

My main reason for asking is this: I have a mac, I use Time Machine, and I have VM's inside. Time Machine, while great in general, has the limitation of being unable to intelligently identify changed bits inside a VM file. So you have to exclude the VM from Time Machine, and you have to run backup software inside the VM. I would greatly prefer, if it's reliable, to let the VM reside on ZFS and use zfs send to backup my guest VM's.

I am not looking to replace HFS+ as the primary filesystem of the mac; although that would be cool, there's often a reliability benefit to staying on the supported, beaten path, standard configuration. But if ZFS can be used to hold the guest VM storage reliably, I would benefit from that. Thanks...
[zfs-discuss] zfs-discuss mailing list & opensolaris EOL
So, I hear, in a couple weeks' time, opensolaris.org is shutting down. What does that mean for this mailing list? Should we all be moving over to something at illumos or something? I'm going to encourage somebody in an official capacity at opensolaris to respond... I'm going to discourage unofficial responses, like, illumos enthusiasts etc simply trying to get people to jump this list. Thanks for any info ...
Re: [zfs-discuss] how to know available disk space
> From: Pasi Kärkkäinen [mailto:pa...@iki.fi]
>
> What's the correct way of finding out what actually uses/reserves that 1023G of FREE in the zpool?

Maybe this isn't exactly what you need, but maybe:

  for fs in `zfs list -H -o name` ; do
      echo $fs
      zfs get reservation,refreservation,usedbyrefreservation $fs
  done

> At this point the filesystems are full, and it's not possible to write to them anymore.

You'll have to either reduce your reservations, or destroy old snapshots. Or add more disks.
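For what it's worth, "zfs get" also takes -r, so the same report comes out of one command. A sketch; "tank" is a placeholder pool name:

  # Recursively print the reservation-related properties for every
  # dataset in the pool, trimmed to the useful columns.
  zfs get -r -o name,property,value \
      reservation,refreservation,usedbyrefreservation tank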
Re: [zfs-discuss] how to know available disk space
> From: Gregg Wonderly [mailto:gregg...@gmail.com]
>
> This is one of the greatest annoyances of ZFS. I don't really understand how a zvol's space can not be accurately enumerated from top to bottom of the tree in 'df' output etc. Why does a "zvol" divorce the space used from the root of the volume?

The way I would say that is: intuitively, I think people expect reservations to count against Alloc and Used.
[zfs-discuss] how to know available disk space
I have a bunch of VM's, and some samba shares, etc, on a pool. I created the VM's using zvol's, specifically so they would have an appropriate refreservation and never run out of disk space, even with snapshots. Today, I ran out of disk space, and all the VM's died. So obviously it didn't work.

When I used "zpool list" after the system crashed, I saw this:

  NAME      SIZE  ALLOC   FREE  EXPANDSZ   CAP  DEDUP  HEALTH  ALTROOT
  storage   928G   568G   360G         -   61%  1.00x  ONLINE  -

I did some cleanup, so I could turn things back on ... freed up about 4G. Now, when I use "zpool list" I see this:

  NAME      SIZE  ALLOC   FREE  EXPANDSZ   CAP  DEDUP  HEALTH  ALTROOT
  storage   928G   564G   364G         -   60%  1.00x  ONLINE  -

When I use "zfs list storage" I see this:

  NAME      USED  AVAIL  REFER  MOUNTPOINT
  storage   909G  4.01G  32.5K  /storage

So I guess the lesson is (a) refreservation and zvol alone aren't enough to ensure your VM's will stay up, and (b) if you want to know how much room is *actually* available, as in "usable," as in "how much can I write before I run out of space," you should use "zfs list" and not "zpool list".
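To make lesson (b) concrete: the two commands disagree because "zpool list" reports raw pool allocation, while "zfs list" accounts for reservations. A sketch, with a placeholder pool name:

  zpool list tank   # ALLOC/FREE ignore unwritten reservations
  zfs list tank     # AVAIL subtracts reservations/refreservations; this
                    # is the real "how much can I still write" number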
Re: [zfs-discuss] Scrub performance
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> I can tell you I've had terrible everything rates when I used dedup.

So, the above comment isn't fair, really. The truth is here:
http://mail.opensolaris.org/pipermail/zfs-discuss/2011-July/049209.html
Re: [zfs-discuss] Scrub performance
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Koopmann, Jan-Peter
>
> all I can tell you is that I've had terrible scrub rates when I used dedup.

I can tell you I've had terrible everything rates when I used dedup.

> The DDT was a bit too big to fit in my memory (I assume according to some very basic debugging).

This is more or less irrelevant, because the system doesn't load it into memory anyway. It will cache a copy in ARC just like everything else in the pool. It gets evicted just as quickly as everything else.

> Only two of my datasets were deduped. On scrubs and resilvers I noticed that sometimes I had terrible rates with < 10MB/sec. Then later it rose up to < 70MB/sec. After upgrading some discs (same speeds observed) I got rid of the deduped datasets (zfs send/receive them) and guess what: all of a sudden scrub goes to 350MB/sec steady and only takes a fraction of the time.

Are you talking about scrub rates for the complete scrub? Because if you sit there and watch it, from minute to minute, it's normal for it to bounce really low for a long time, and then really high for a long time, etc. The only measurement that has any real meaning is time to completion.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: Robert Milkowski [mailto:rmilkow...@task.gda.pl]
>
> That is one thing that always bothered me... so it is ok for others, like Nexenta, to keep stuff closed and not in open, while if Oracle does it they are bad?

Oracle, like Nexenta, and my own company CleverTrove, and Microsoft, and Netapp, has every right to close source development, if they believe it's beneficial to their business. For all we know, Oracle might not even have a choice about it - it might have been in the terms of settlement with NetApp (because open source ZFS definitely hurt NetApp business.)

The real question is: in which situations is it beneficial to your business to be closed source, as opposed to open source? There's the whole redhat/centos dichotomy. At first blush, it would seem redhat gets screwed by centos (or oracle linux), but then you realize how many more redhat-derived systems are out there, compared to suse, etc. By allowing people to use it for free, it actually gains popularity, and then redhat actually has a successful support business model as compared to suse, which tanked.

But it's useless to argue about whether oracle's making the right business choice, whether open or closed source is better for their business. Cuz it's their choice, regardless who agrees. Arguing about it here isn't going to do any good. Those of us who gained something and no longer count on having that benefit moving forward have a tendency to say "You gave it to me for free before, now I'm pissed off because you're not giving it to me for free anymore," instead of "thanks for what you gave before." The world moves on. There's plenty of time to figure out which solution is best for you, the consumer, in the future product offerings: commercial closed source product offering, open source product offering, or something completely different such as btrfs.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: Gary Mills [mailto:gary_mi...@fastmail.fm]
>
> > In solaris, I've never seen it swap out idle processes; I've only
> > seen it use swap for the bad bad bad situation. I assume that's all
> > it can do with swap.
>
> You would be wrong. Solaris uses swap space for paging. Paging out unused portions of an executing process from real memory to the swap device is certainly beneficial. Swapping out complete processes is a desperation move, but paging out most of an idle process is a good thing.

You seem to be emphasizing the distinction between swapping and paging. My point, though, is that I've never seen the swap usage (which is being used for paging) on any solaris derivative go nonzero for the sake of keeping something in cache. It seems to me that solaris will always evict all cache memory before it swaps (pages) out even the most idle process memory.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Nico Williams
>
> As for swap... really, you don't want to swap. If you're swapping you have problems.

For clarification, the above is true in Solaris and derivatives, but it's not universally true for all OSes. I'll cite linux as the example, because I know it. If you provide swap to a linux kernel, it considers this a degree of freedom when choosing to evict data from the cache, versus swapping out idle processes (or zombie processes.) As long as you swap out idle process memory that is colder than some cache memory, swap actually improves performance. But of course, if you have any active process starved of ram and consequently thrashing swap actively, you're right. It's bad bad bad to use swap that way.

In solaris, I've never seen it swap out idle processes; I've only seen it use swap for the bad bad bad situation. I assume that's all it can do with swap.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: Darren J Moffat [mailto:darr...@opensolaris.org]
>
> Support for SCSI UNMAP - both issuing it and honoring it when it is the backing store of an iSCSI target.

When I search for scsi unmap, I come up with all sorts of documentation that ... is ... like reading a medical journal when all you want to know is the conversion from 98.6F to C. Would you mind momentarily describing what SCSI UNMAP is used for? If I were describing it to a customer (CEO, CFO), I'm not going to tell them about SCSI UNMAP; I'm going to say the new system has a new feature that enables ... or solves the ___ problem... The customer doesn't *necessarily* have to be as clueless as a CEO/CFO. Perhaps just another IT person, or whatever.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: Sašo Kiselkov [mailto:skiselkov...@gmail.com]
>
> as far as incompatibility among products, I've yet to come across it

I was talking about ... install solaris 11, and it's using a new version of zfs that's incompatible with anything else out there. And vice-versa. (Not sure if feature flags is the default, or zpool 28 is the default, in various illumos-based distributions. But my understanding is that once you upgrade to feature flags, you can't go back to 28. Which means, mutually, anything >28 is incompatible with each other.) You have to typically make a conscious decision and plan ahead, and intentionally go to zpool 28 and no higher, if you want compatibility between systems.

> Let us know at z...@lists.illumos.org how that goes, perhaps write a blog post about your observations. I'm sure the BTRFS folks came up with some neat ideas which we might learn from.

Actually - I've written about it before (but it'll be difficult to find, and nothing earth shattering, so not worth the search.) I don't think there's anything that zfs developers don't already know. Basic stuff like fsck, and the ability to shrink and remove devices; those are the things btrfs has and zfs doesn't. (But there's lots more stuff that zfs has and btrfs doesn't. Just making sure my previous comment isn't seen as a criticism of zfs, or a judgement in favor of btrfs.)

And even with a new evaluation, the conclusion can't be completely clear, nor immediate. The last evaluation started about 10 months ago, and we kept it in production for several weeks or a couple of months, because it appeared to be doing everything well. (Except for features that were known to be not-yet implemented, such as read-only snapshots (aka quotas) and the btrfs equivalent of "zfs send.") Problem was, the system was unstable, crashing about once a week. No clues why. We tried all sorts of things in kernel, hardware, drivers, with and without support, to diagnose and capture the cause of the crashes. Then one day, I took a blind stab in the dark (for the ninetieth time) and reformatted the storage volume ext4 instead of btrfs. After that, no more crashes. That was approx 8 months ago.

I think the only things I could learn upon a new evaluation are:

#1 I hear "btrfs send" is implemented now. I'd like to see it with my own eyes before I believe it.
#2 I hear quotas (read-only snapshots) are implemented now. Again, I'd like to see it before I believe it.
#3 Proven stability. Never seen it yet with btrfs. Want to see it with my own eyes and stand the test of time before it earns my trust.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: Richard Elling [mailto:richard.ell...@gmail.com]
>
> I disagree that ZFS is developmentally challenged.

As an IT consultant, 8 years ago, before I heard of ZFS, it was always easy to sell Ontap, as long as it fit into the budget. 5 years ago, whenever I told customers about ZFS, it was always a quick easy sell. Nowadays, anybody who's heard of it says they don't want it, because they believe it's a dying product, and they're putting their bets on linux instead. I try to convince them otherwise, but I'm trying to buck the word on the street. They don't listen, however much sense I make. I can only sell ZFS to customers nowadays who have still never heard of it.

"Developmentally challenged" doesn't mean there is no development taking place. It means the largest development effort is working closed-source, and not available for free (except for some purposes), so some consumers are going to follow that path, while others are going to follow the open source illumos branch, which means both disunity amongst developers and disunity amongst consumers, and incompatibility amongst products. So far, in the illumos branch, I've only seen bugfixes introduced since zpool 28, no significant introduction of new features. (Unlike the oracle branch, which is just as easy to sell as ontap.) Which presents a challenge. Hence the term "challenged."

Right now, ZFS is the leading product as far as I'm concerned. Better than MS VSS, better than Ontap, better than BTRFS. It is my personal opinion that one day BTRFS will eclipse ZFS, due to oracle's unsupportive strategy causing disparity and lowering consumer demand for zfs, but of course that's just a personal prediction for the future, which has yet to be seen. So far, every time I evaluate BTRFS, it fails spectacularly, but the last time I did was about a year ago. I'm due for a BTRFS re-evaluation now.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Nico Williams
>
> To decide if a block needs dedup one would first check the Bloom filter, then if the block is in it, use the dedup code path, else the non-dedup codepath and insert the block in the Bloom filter.

Sorry, I didn't know what a Bloom filter was before I replied before - now I've read the wikipedia article and am consequently an expert. *sic* ;-)

It sounds like, what you're describing... The first time some data gets written, it will not produce a hit in the Bloom filter, so it will get written to disk without dedup. But now it has an entry in the Bloom filter. So the second time the data block gets written (the first duplicate), it will produce a hit in the Bloom filter, and consequently get a dedup DDT entry. But since the system didn't dedup the first one, it means the second one still needs to be written to disk independently of the first one. So in effect, you'll always "miss" the first duplicated block write, but you'll successfully dedup n-1 duplicated blocks. Which is entirely reasonable, although not strictly optimal.

And sometimes you'll get a false positive out of the Bloom filter, so sometimes you'll be running the dedup code on blocks which are actually unique, but with some intelligently selected parameters such as Bloom table size, you can get this probability to be reasonably small, like less than 1%.

In the wikipedia article, they say you can't remove an entry from the Bloom filter table, which would over time cause a consistent increase of false positive probability (approaching 100% false positives) from the Bloom filter, and consequently a high probability of dedup'ing blocks that are actually unique; but with even a minimal amount of thinking about it, I'm quite sure that's a solvable implementation detail. Instead of storing a single bit for each entry in the table, store a counter. Every time you create a new entry in the table, increment the different locations; every time you remove an entry from the table, decrement. Obviously a counter requires more bits than a bit, but it's a linear increase of size, exponential increase of utility, and within the implementation limits of available hardware. But there may be a more intelligent way of accomplishing the same goal. (Like I said, I've only thought about this minimally.)

Meh, well. Thanks for the interesting thought. For whatever it's worth.
Re: [zfs-discuss] iSCSI access patterns and possible improvements?
> From: Richard Elling [mailto:richard.ell...@gmail.com]
> Sent: Saturday, January 19, 2013 5:39 PM
>
> the space allocation more closely resembles a variant of mirroring, like some vendors call "RAID-1E"

Awesome, thank you. :-)
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Nico Williams
>
> I've wanted a system where dedup applies only to blocks being written that have a good chance of being dups of others.
>
> I think one way to do this would be to keep a scalable Bloom filter (on disk) into which one inserts block hashes.
>
> To decide if a block needs dedup one would first check the Bloom filter, then if the block is in it, use the dedup code path,

How is this different or better than the existing dedup architecture? If you found that some block about to be written in fact matches the hash of an existing block on disk, then you've already determined it's a duplicate block, exactly as you would if you had dedup enabled. In that situation, gosh, it sure would be nice to have the extra information, like reference count and pointer to the duplicate block, which exists in the dedup table. In other words, exactly the way existing dedup is already architected.

> The nice thing about this is that Bloom filters can be sized to fit in main memory, and will be much smaller than the DDT.

If you're storing all the hashes of all the blocks, how is that going to be smaller than the DDT storing all the hashes of all the blocks?
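As an aside, for anyone who wants to see how big the DDT actually is on a real pool, zdb will report it. A sketch; the pool name is a placeholder:

  zdb -DD tank    # DDT statistics: entry counts, on-disk and in-core sizes
  zdb -S tank     # simulate dedup statistics on a pool without enabling dedup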
Re: [zfs-discuss] Resilver w/o errors vs. scrub with errors
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jim Klimov
>
> And regarding the "considerable activity" - AFAIK there is little way for ZFS to reliably read and test "TXGs newer than X"

My understanding is like this: when you make a snapshot, you're just creating a named copy of the present latest TXG. When you zfs send incrementally from one snapshot to another, you're creating the delta between two TXG's that happen to have names. So when you break a mirror and resilver, it's exactly the same operation as an incremental zfs send; it needs to calculate the delta between the latest (older) TXG on the previously UNAVAIL device, up to the latest TXG on the current pool. Yes, this involves examining the meta tree structure, and yes, the system will be very busy while that takes place. But the work load is very small relative to whatever else you're likely to do with your pool during normal operation, because that's the nature of the meta tree structure ... very small relative to the rest of your data.
Re: [zfs-discuss] Resilver w/o errors vs. scrub with errors
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Stephan Budach
>
> I am always experiencing chksum errors while scrubbing my zpool(s), but I never experienced chksum errors while resilvering. Does anybody know why that would be?

When you resilver, you're not reading all the data on all the drives. Only just enough to resilver, which doesn't include all the data that was previously in-sync (maybe a little of it, but mostly not). Even if you have a completely failed drive, replaced with a completely new empty drive, if you have a 3-way mirror, you only need to read one good copy of the data in order to write the resilver'd data onto the new drive. So you could still be failing to detect cksum errors on the *other* side of the mirror, which wasn't read during the resilver.

What's more, when you resilver, the system is just going to write the target disk. Not go back and verify every written block of the target disk. So, think of a scrub as a "complete, thorough resilver," whereas "resilver" is just a lightweight version, doing only the parts that are known to be out of sync, and without subsequent read verification.

> This happens on all of my servers, Sun Fire 4170M2, Dell PE 650 and on any FC storage that I have.

While you apparently have been able to keep the system in production for a while, consider yourself lucky. You have a real problem, and solving it probably won't be easy. Your problem is either hardware, firmware, or drivers. If you have a support contract on the Sun, I would recommend starting there. Because the Dell is definitely a configuration that you won't find official support for - just a lot of community contributors, who will likely not provide a super awesome answer for you super soon.
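Since scrub is the only operation that reads and verifies everything, it's worth running on a schedule. A sketch cron entry for root's crontab; the pool name is a placeholder:

  # Scrub the pool at 02:00 every Sunday.
  0 2 * * 0 /usr/sbin/zpool scrub tank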
Re: [zfs-discuss] iSCSI access patterns and possible improvements?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bob Friesenhahn
>
> If almost all of the I/Os are 4K, maybe your ZVOLs should use a volblocksize of 4K? This seems like the most obvious improvement.

Oh, I forgot to mention - the above logic only makes sense for mirrors and stripes. Not for raidz (or raid-5/6/dp in general).

If you have a pool of mirrors or stripes, the system isn't forced to subdivide a 4k block onto multiple disks, so it works very well. But if you have a pool block size of 4k and, let's say, a 5-disk raidz (capacity of 4 disks), then the 4k block gets divided into 1k on each data disk plus 1k of parity on the parity disk. Now, since the hardware only supports block sizes of 4k ... you can see there's a lot of wasted space, and if you do a bunch of it, you'll also have a lot of wasted time waiting for seeks/latency.
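For a pool of mirrors or stripes, matching the zvol to the 4K I/O pattern is one flag at creation time. A sketch; the names are hypothetical, and note volblocksize can't be changed after the zvol is created:

  # Create a zvol whose block size matches the initiator's 4K I/Os.
  zfs create -b 4k -V 100G tank/iscsivol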
Re: [zfs-discuss] poor CIFS and NFS performance
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Eugen Leitl
>
> I have a pool of 8x ST31000340AS on an LSI 8-port adapter as a raidz3 (no compression nor dedup) with reasonable bonnie++ 1.03 values, e.g. 145 MByte/s Seq-Write @ 48% CPU and 291 MByte/s Seq-Read @ 53% CPU.

For 8-disk raidz3 (effectively 5 disks) I would expect approx 640MB/s for both seq read and seq write. The first halving (from 640 down to 291) could maybe be explained by bottlenecking through a single HBA or something like that, so I wouldn't be too concerned about that. But the second halving, from 291 down to 145 ... a single disk should do 128MB/sec no problem, so the whole pool writing at only 145MB/sec sounds wrong to me. But as you said, this isn't the area of complaint... moving on; you can start a new discussion about this later if you want.

> My problem is pretty poor network throughput. An NFS mount on 12.04 64 bit Ubuntu (mtu 9000) or CIFS are read at about 23 MBytes/s. Windows 7 64 bit (also jumbo frames) reads at about 65 MBytes/s. The highest transfer speed on Windows just touches 90 MByte/s, before falling back to the usual 60-70 MBytes/s.
>
> Does anyone have any suggestions on how to debug/optimize throughput?

The first thing I would do is build another openindiana box and try NFS / CIFS to/from it. See how it behaves. Whenever I've seen this sort of problem before, it was version incompatibility requiring tweaks between the client and server. I don't know which version of samba / solaris cifs is being used ... but at some point in history (win7), windows transitioned from NTLMv1 to v2, and at that point all the older servers became 4x slower with the new clients, but if you built a new server with the new clients, then the old version was 4x slower than the new. Not to mention, I've had times when I couldn't even get linux & solaris to *talk* to each other over NFS, due to version differences, never mind tweaking all the little performance knobs.

So my advice is to first eliminate any question about version / implementation differences, and see where that takes you.
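One more cheap diagnostic, to separate the disks from the network. A hedged sketch; nc option syntax varies between netcat builds, so check your man page first:

  # Local sequential read straight off the pool (rules the disks in or out):
  dd if=/tank/bigfile of=/dev/null bs=1024k
  # Raw TCP throughput, no filesystem or NFS/CIFS involved:
  #   on the server:  nc -l 9999 > /dev/null
  #   on the client:  dd if=/dev/zero bs=1024k count=1024 | nc server 9999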
Re: [zfs-discuss] zfs receive options (was S11 vs illumos zfs compatiblity)
> From: Cindy Swearingen [mailto:cindy.swearin...@oracle.com]
>
> Which man page are you referring to? I see the zfs receive -o syntax in the S11 man page.

Oh ... it's the latest openindiana. So I suppose it must be a new feature post-rev-28 in the non-open branch... But it's no big deal. I found that if I "zfs create" and then "zfs set" a few times, and then "zfs receive", I get the desired behavior.
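Spelled out, that workaround looks roughly like this. The dataset names are the placeholders from earlier in the thread; -F lets the full stream overwrite the freshly created (empty) dataset, and locally-set properties take precedence over received ones:

  zfs create biz/baz
  zfs set compression=on biz/baz
  zfs set sync=disabled biz/baz
  zfs send foo/bar@42 | zfs receive -F biz/baz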
Re: [zfs-discuss] zfs receive options (was S11 vs illumos zfs compatiblity)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> zfs send foo/bar@42 | zfs receive -o compression=on,sync=disabled biz/baz
>
> I have not yet tried this syntax. Because you mentioned it, I looked for it in the man page, and because it's not there, I hesitate before using it.

Also, readonly=on ... and ... bummer. When I try zfs receive with -o, I get the message:

  invalid option 'o'
[zfs-discuss] zfs receive options (was S11 vs illumos zfs compatiblity)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of bob netherton
>
> You can, with recv, override any property in the sending stream that can be set from the command line (ie, a writable).
>
> # zfs send repo/support@cpu-0412 | zfs recv -o version=4 repo/test
> cannot receive: cannot override received version

Are you sure you can do this with other properties? It's not in the man page. I would like to set the compression & sync on the receiving end:

  zfs send foo/bar@42 | zfs receive -o compression=on,sync=disabled biz/baz

I have not yet tried this syntax. Because you mentioned it, I looked for it in the man page, and because it's not there, I hesitate before using it.
Re: [zfs-discuss] S11 vs illumos zfs compatiblity
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bob Netherton
>
> At this point, the only thing would be to use 11.1 to create a new pool at 151's version (-o version=) and top level dataset (-O version=). Recreate the file system hierarchy and do something like an rsync. I don't think there is anything more elegant, I'm afraid.

Is that right? You can't use zfs send | zfs receive to send from a newer version and receive on an older version?
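For reference, the pool-creation incantation quoted above would look something like this. The version numbers shown are the usual pre-feature-flags pairing (zpool version 28 goes with zfs filesystem version 5); the device name is hypothetical:

  # Create a pool, and its top-level dataset, at older on-disk versions
  # so an older system can still import it.
  zpool create -o version=28 -O version=5 tank c0t0d0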
Re: [zfs-discuss] The format command crashes on 3TB disk but zpool create ok
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of sol
>
> I added a 3TB Seagate disk (ST3000DM001) and ran the 'format' command but it crashed and dumped core.
>
> However the zpool 'create' command managed to create a pool on the whole disk (2.68 TB space).
>
> I hope that's only a problem with the format command and not with zfs or any other part of the kernel.

Suspicion and conjecture only: I think format uses an fdisk label, which has a 2T limit. Normally it's advised to give the whole disk directly to zpool anyway, so hopefully that's a good solution for you.
Re: [zfs-discuss] any more efficient way to transfer snapshot between two hosts than ssh tunnel?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Fred Liu
>
> BTW, anyone played NDMP in solaris? Or is it feasible to transfer snapshot via NDMP protocol?

I've heard you could, but I've never done it. Sorry I'm not much help, except as a cheer leader. You can do it! I think you can! Don't give up! heheheheh

Please post back whatever you find, or if you have to figure it out for yourself, then blog about it and post that.
Re: [zfs-discuss] Remove disk
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Freddie Cash
>
> On Thu, Dec 6, 2012 at 12:35 AM, Albert Shih wrote:
> > Le 01/12/2012 à 08:33:31-0700, Jan Owoc a écrit:
> > > 2) replace the disks with larger ones one-by-one, waiting for a
> > > resilver in between
> >
> > This is the point I don't see how to do it. I've 48 disk actually from
> > /dev/da0 -> /dev/da47 (I'm under FreeBSD 9.0) lets say 3To.

You have 12 x 2T disks in a raidz2, and you want to replace those disks with 4T each. Right?

Start with a scrub. Wait for it to complete. Ensure you have no errors.

  sudo format -e < /dev/null > before.txt

Then "zpool offline" one disk. Pull it out and stick a new 4T disk in its place. "devfsadm -Cv" to recognize the new disk.

  sudo format -e < /dev/null > after.txt
  diff before.txt after.txt

You should see one device disappeared, and a new one was created. Now "zpool replace" to replace the old disk with the new disk. "zpool status" should show the new drive resilvering. Wait for the resilver to finish.

Repeat 11 more times. Replace each disk, one at a time, with a resilver in between. When you're all done, it might expand to the new size automatically, or you might need to play with the "autoexpand" property to make use of the new storage space.

What percentage full is your pool? When you're done, please write back to tell us how much time this takes. I predict it will take a very long time, and I'm curious to know exactly how much. Before you start, I'm going to guess ... 80% full, and 7-10 days to resilver each drive. So the whole process will take you a few months to complete. (That's the disadvantage of a bunch of disks in a raidzN configuration.)
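Condensed, the per-disk cycle is below. A sketch only: device names are hypothetical, and the rescan step differs by OS (devfsadm -Cv on Solaris; camcontrol rescan on FreeBSD):

  zpool scrub tank            # verify pool health before starting
  zpool offline tank da0      # take the old disk out of service
  # ...physically swap the drive, rescan the bus...
  zpool replace tank da0      # resilver onto the new disk in the same slot
  zpool status tank           # wait here for the resilver to finish
  # repeat for every disk, then let the pool grow into the new capacity:
  zpool set autoexpand=on tank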
Re: [zfs-discuss] query re disk mirroring
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Enda o'Connor - Oracle Ireland
>
> Say I have an ldoms guest that is using zfs root pool that is mirrored, and the two sides of the mirror are coming from two separate vds servers, that is
>
>   mirror-0
>     c3d0s0
>     c4d0s0
>
> where c3d0s0 is served by one vds server, and c4d0s0 is served by another vds server. Now if for some reason, this physical rig loses power, then how do I know which side of the mirror to boot off, ie which side is most recent.

If one storage host goes down, it should be no big deal; one side of the mirror becomes degraded, and later, when it comes up again, it resilvers.

If one storage host goes down, and the OS continues running for a while and then *everything* goes down, later you bring up both sides of the storage and bring up the OS, and the OS will know which side is more current because of the higher TXG. So the OS will resilver the old side.

If one storage host goes down, and the OS continues running for a while and then *everything* goes down... later you bring up only one half of the storage, and bring up the OS. Then the pool will refuse to mount, because with missing devices, it doesn't know if maybe the other side is more current.

As long as one side of the mirror disappears and reappears while the OS is still running, no problem. As long as all the devices are present during boot, no problem. The only problem is when you try to boot from one side of a broken mirror. If you need to do this, you should mark the broken mirror as broken before shutting down - certainly detach would do the trick. Perhaps "offline" might also do the trick.

Does that answer it?
Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jim Klimov
>
> this is the part I am not certain about - it is roughly as cheap to READ the gzip-9 datasets as it is to read lzjb (in terms of CPU decompression).

Nope. I know LZJB is not LZO, but I'm starting from a point of saying that LZO is specifically designed to be super-fast, low-memory for decompression. (As claimed all over the LZO webpage, as well as wikipedia, and supported by my own personal experience using lzop.) So for comparison to LZJB, see here: http://denisy.dyndns.org/lzo_vs_lzjb/

LZJB is, at least according to these guys, even faster than LZO. So I'm confident concluding that lzjb (default) decompression is significantly faster than zlib (gzip) decompression.
Re: [zfs-discuss] Question about degraded drive
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Chris Dunbar - Earthside, LLC
>
> # zpool replace tank c11t4d0
> # zpool clear tank

I would expect this to work, or detach/attach. You should scrub periodically, and ensure no errors after the scrub. But the really good question is: why does the device go offline?
Re: [zfs-discuss] Question about degraded drive
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Freddie Cash
>
> And you can try 'zpool online' on the failed drive to see if it comes back online.

Be cautious here - I have an anecdote, which might represent a trend in best practice, or it might just be an anecdote. At least once, I had an iscsi device go offline, and then I "zpool online"d the device, and it seemed to work - resilvered successfully, zpool status showed clean, I'm able to zfs send and zfs receive. But for normal usage (go in and actually use the files in the pool) it was never usable again. I don't know the root cause right now. Maybe it's iscsi related.
Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jim Klimov
>
> I really hope someone better versed in compression - like Saso - would chime in to say whether gzip-9 vs. lzjb (or lz4) sucks in terms of read-speeds from the pools. My HDD-based assumption is in general that the less data you read (or write) on platters - the better, and the spare CPU cycles can usually take the hit.

Oh, I can definitely field that one. The lzjb compression (the default, as long as you just turn compression on without specifying any other detail) is very fast compression, similar to lzo. It generally has no noticeable CPU overhead, but it saves you a lot of time and space for highly repetitive things like text files (source code) and sparse zero-filled files and stuff like that. I personally always enable this: "compression=on".

zlib (gzip) is more powerful, but *way* slower. Even the fastest level, gzip-1, uses enough CPU cycles that you probably will be CPU limited rather than IO limited. There are very few situations where this option is better than the default lzjb.

Some data (anything that's already compressed: zip, gz, etc, video files, jpg's, encrypted files, etc) is totally uncompressible with these algorithms. If this is the type of data you store, you should not use compression.

Probably not worth mentioning, but what the heck: if you normally have uncompressible data and then one day you're going to do a lot of stuff that's compressible (or vice versa)... the compression flag is only used during writes. Once it's written to the pool, compressed or uncompressed, it stays that way, even if you change the flag later.
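In practice it's one property, and compressratio tells you what you actually gained. A sketch, with a placeholder dataset:

  zfs set compression=on tank/src            # default algorithm (lzjb here)
  zfs get compression,compressratio tank/src
  # Per the caveat above, only blocks written after the change get
  # compressed; existing data stays however it was originally written.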
Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Eugen Leitl
>
> can I make e.g. LSI SAS3442E directly do SSD caching (it says something about CacheCade, but I'm not sure it's an OS-side driver thing), as it is supposed to boost IOPS? Unlikely shot, but probably somebody here would know.

Depending on the type of work you will be doing, the best performance thing you could do is to disable the zil (zfs set sync=disabled) and use SSD's for cache. But don't go *crazy* adding SSD's for cache, because they still have some in-memory footprint. If you have 8G of ram and 80G SSD's, maybe just use one of them for cache, and let the other 3 do absolutely nothing. Better yet, put your OS on a pair of SSD's in a mirror, then use a pair of HDD's in a mirror for the storage pool, and one SSD for cache. Then you have one SSD unused, which you could optionally add as a dedicated log device to your storage pool.

There are specific situations where it's ok or not ok to disable the zil; look around and ask here if you have any confusion about it.

Don't do redundancy in hardware. Let ZFS handle it.
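That layout, as commands. A sketch only: device names are placeholders, and mind the caveats above about disabling the zil:

  zpool create tank mirror c0t2d0 c0t3d0   # HDD mirror for the storage pool
  zpool add tank cache c0t4d0              # one SSD as read cache (L2ARC)
  zpool add tank log c0t5d0                # optional: spare SSD as dedicated slog
  zfs set sync=disabled tank               # only if your workload tolerates it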
Re: [zfs-discuss] Directory is not accessible
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Sami Tuominen > > How can one remove a directory containing corrupt files or a corrupt file > itself? For me rm just gives input/output error. I was hoping to see somebody come up with an answer for this ... I would expect rm to work... Maybe you have to rm the parent of the thing you're trying to rm? But I kinda doubt it. Maybe you need to verify you're rm'ing the right thing? I believe, if you scrub the pool, it should tell you the name of the corrupt things. Or maybe you're not experiencing a simple cksum mismatch, maybe you're experiencing a legitimate IO error. The "rm" solution could only possibly work to clear up a cksum mismatch. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
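For what it's worth, a sketch of identifying the damaged paths before reaching for rm (pool name hypothetical):

    zpool scrub tank
    # once the scrub completes, permanent errors are listed with file paths where possible
    zpool status -v tank
    # if rm of a listed path succeeds, reset the error counters afterwards
    zpool clear tank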
Re: [zfs-discuss] ZFS Appliance as a general-purpose server question
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > I wonder if it would make weird sense to get the boxes, forfeit the > cool-looking Fishworks, and install Solaris/OI/Nexenta/whatever to > get the most flexibility and bang for a buck from the owned hardware... This is what we decided to do at work, and this is the reason why. But we didn't buy the appliance-branded boxes; we just bought normal servers running solaris. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Woeful performance from an iSCSI pool
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Ian Collins > > I look after a remote server that has two iSCSI pools. The volumes for > each pool are sparse volumes and a while back the target's storage > became full, causing weird and wonderful corruption issues until they > managed to free some space. > > Since then, one pool has been reasonably OK, but the other has terrible > performance receiving snapshots. Despite both iSCSI devices using the > same IP connection, iostat shows one with reasonable service times while > the other shows really high (up to 9 seconds) service times and 100% > busy. This kills performance for snapshots with many random file > removals and additions. > > I'm currently zero filling the bad pool to recover space on the target > storage to see if that improves matters. > > Has anyone else seen similar behaviour with previously degraded iSCSI > pools? This sounds exactly like the behavior I was seeing with my attempt at two machines zpool mirror'ing each other via iscsi. In my case, I had two machines that are both targets and initiators. I made the initiator service dependent on the target service, and I made the zpool mount dependent on the initiator service, and I made the virtualbox guest start dependent on the zpool mount. Everything seemed fine for a while, including some reboots. But then on one reboot, one of my systems stayed down too long, and when it finally came back up, both machines started choking. So far I haven't found any root cause, and so far the only solution I've found was to reinstall the OS. I tried everything I know in terms of removing, forgetting, recreating the targets, initiators, and pool, but somehow none of that was sufficient. I recently (yesterday) got budgetary approval to dig into this more, so hopefully maybe I'll have some insight before too long, but don't hold your breath. I could fail, and even if I don't, it's likely to be weeks or months. What I want to know from you is: Which machines are your solaris machines? Just the targets? Just the initiators? All of them? You say you're having problems just with snapshots. Are you sure you're not having trouble with all sorts of IO, and not just snapshots? What about import / export? In my case, I found I was able to zfs send, zfs receive, zpool status, all fine. But when I launched a guest VM, there would be a massive delay - you said up to 9 seconds - I was sometimes seeing over 30s - sometimes crashing the host system. And the guest OS was acting like it was getting IO errors, without actually displaying an error message indicating an IO error. I would attempt, and sometimes fail, to power off the guest vm (kill -KILL VirtualBox). After the failure began, zpool status still works (and reports no errors), but if I try to do things like export/import, they fail indefinitely, and I need to power cycle the host. While in the failure mode, I can zpool iostat, and I sometimes see 0 transactions with nonzero bandwidth. Which defies my understanding. Did you ever see the iscsi targets "offline" or "degraded" in any way? Did you do anything like "online" or "clear?" My systems are openindiana - the latest, I forget if that's 151a5 or a6 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
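For anyone comparing per-device service times the way described above, the numbers come from something like this on the Solaris side (5 is the sampling interval in seconds):

    # asvc_t = active service time in ms, %b = percent of time busy
    iostat -xn 5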
Re: [zfs-discuss] zvol wrapped in a vmdk by Virtual Box and double writes?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > As for ZIL - even if it is used with the in-pool variant, I don't > think your setup needs any extra steps to disable it (as Edward likes > to suggest), and most other setups don't need to disable it either. No, no - I know I often suggest disabling the zil, because so many people rule it out on principle (the evil tuning guide says "disable the zil (don't!)") But in this case, I was suggesting precisely the opposite of disabling it. I was suggesting making it more aggressive. But now that you mention it - if he's looking for maximum performance, perhaps disabling the zil would be best for him. ;-) Nathan, it will do you some good to understand when it's ok or not ok to disable the zil. (zfs set sync=disabled) If this is a guest VM on your laptop or something like that, then it's definitely safe. If the guest VM is a database server, with a bunch of external clients (on the LAN or network or whatever) then it's definitely *not* safe. Basically if anything external of the VM is monitoring or depending on the state of the VM, then it's not ok. But, if the VM were to crash and go back in time by a few seconds ... If there are no clients that would care about that ... then it's safe to disable ZIL. And that is the highest performance thing you can possibly do. > It also shouldn't add much to your writes - the in-pool ZIL blocks > are then referenced as userdata when the TXG commit happens (I think). I would like to get some confirmation of that - because it's the opposite of what I thought. I thought the ZIL was used like a circular buffer. The same blocks would be overwritten repeatedly. But if there's a sync write over a certain size, then it skips the ZIL and writes immediately to main zpool storage, so it doesn't have to get written twice. > I also think that with a VM in a raw partition you don't get any > snapshots - neither ZFS as underlying storage ('cause it's not), > not hypervisor snaps of the VM. So while faster, this is also some > trade-off :) Oh - But not faster than zvol. I am currently a fan of wrapping zvol inside vmdk, so I get maximum performance and also snapshots. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
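A sketch of scoping this per-dataset rather than pool-wide, so only the disposable guest images give up sync semantics (dataset names are hypothetical):

    zfs set sync=disabled tank/vms/scratch
    # everything else keeps normal ZIL behavior; verify with:
    zfs get -r sync tank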
Re: [zfs-discuss] zvol wrapped in a vmdk by Virtual Box and double writes?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Nathan Kroenert > > I chopped into a few slices - p0 (partition table), p1 128GB, p2 60gb. > > As part of my work, I have used it both as a RAW device (cxtxdxp1) and > wrapped partition 1 with a virtualbox created VMDK linkage, and it works > like a champ. :) Very happy with that. > > I then tried creating a new zpool using partition 2 of the disk (zpool > create c2d0p2) and then carved a zvol out of that (30GB), and wrapped > *that* in a vmdk. Why are you partitioning, then creating a zpool, and then creating a zvol? I think you should make the whole disk a zpool unto itself, and then carve out the 128G zvol and 60G zvol. For that matter, why are you carving out multiple zvol's? Does your Guest VM really want multiple virtual disks for some reason? Side note: Assuming you *really* just want a single guest to occupy the whole disk and run as fast as possible... If you want to snapshot your guest, you should make the whole disk one zpool, and then carve out a zvol which is significantly smaller than 50%, say perhaps 40% or 45% might do the trick. The zvol will immediately reserve all the space it needs, and if you don't have enough space leftover to completely replicate the zvol, you won't be able to create the snapshot. If your pool ever gets over 90% used, your performance will degrade, so a 40% zvol is what I would recommend. Back to the topic: Given that you're on the SSD, there is no faster nonvolatile storage you can use for a ZIL log device. So you should leave the default ZIL inside the pool... Don't try adding any separate slice or anything as a log device... But as you said, sync writes will hit the disk twice. I would have to guess it's a good idea for you to tune ZFS to immediately flush transactions whenever there's a sync write. I forget how this is done - there's some tunable that indicates any sync write over a certain size should be immediately flushed... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
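A sketch of the whole-disk alternative suggested above (device name, sizes, and paths are hypothetical):

    # one pool on the whole disk, no manual slicing
    zpool create ssdpool c2d0
    # a zvol at roughly 40-45% of capacity, so snapshots have room to diverge
    zfs create -V 80G ssdpool/guest1
    VBoxManage internalcommands createrawvmdk \
        -filename /home/someuser/guest1.vmdk -rawdisk /dev/zvol/rdsk/ssdpool/guest1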
Re: [zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Edward Ned Harvey > > An easier event to trigger is the starting of the virtualbox guest. Upon vbox > guest starting, check the service properties for that instance of vboxsvc, and > chmod if necessary. But vboxsvc runs as non-root user... > > I like the idea of using zfs properties, if someday the functionality is > going to > be built into ZFS, and we can simply scrap the SMF chown service. But these > days, ZFS isn't seeing a lot of public development. I just built this into simplesmf (http://code.google.com/p/simplesmf/): support to execute the zvol chown immediately prior to launching the guest VM. I know Jim is also building it into vboxsvc, but I haven't tried that yet. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) > > > Found quite a few posts on > > various > > forums of people complaining that RDP with external auth doesn't work (or > > not reliably), > > Actually, it does work, and it works reliably, but the setup is very much not > straightforward. I'm likely to follow up on this later today, because as > coincidence would have it, this is on my to-do for today. I just published "simplesmf" http://code.google.com/p/simplesmf/ which includes a lot of the work I've done in the last month. Relevant to this discussion, the step-by-step instructions to enable VBoxHeadless external authentication, and connect the RDP client to it. http://code.google.com/p/simplesmf/source/browse/trunk/samples/virtualbox-guest-control/headless-hints.txt ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > Well, as a simple stone-age solution (to simplify your SMF approach), > you can define custom attributes on dataset, zvols included. I think > a custom attr must include a colon ":" in the name, and values can be > multiline if needed. Simple example follows: > > # zfs set owner:user=jim pool/rsvd > > Then you can query the zvols for such attribute values and use them > in chmod, chown, ACL settings, etc. from your script. This way the > main goal is reached: the ownership config data stays within the pool. Given that zfs doesn't already have built-in support for these properties at mount time, given the necessity to poll for these values using an as-yet-unwritten SMF service, I'm not necessarily in agreement that zfs properties are a better solution than using a conf file to list these properties on a per-zvol basis. Either way, an SMF service manages it, and it's difficult or impossible to trigger an SMF service to run on every mount, and only on every mount. So the SMF service would have to be either a one-time shot at bootup or a manual refresh (and consequently miss anything mounted later), or it would have to continuously poll all the filesystems and volumes in the system. An easier event to trigger is the starting of the virtualbox guest. Upon vbox guest starting, check the service properties for that instance of vboxsvc, and chmod if necessary. But vboxsvc runs as non-root user... I like the idea of using zfs properties, if someday the functionality is going to be built into ZFS, and we can simply scrap the SMF chown service. But these days, ZFS isn't seeing a lot of public development. If we assume the SMF service is the thing that will actually be used, from now until someday when BTRFS eventually eclipses ZFS, then I would rather see a conf file or SMF service property, so the SMF service doesn't constantly scan all the filesystems and volumes for their zfs properties. It just checks the conf file and knows instantly which ones need to be chown'd. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
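For illustration, a sketch of the property-driven variant as a boot-time script; the owner:user property name and the loop are assumptions, not an existing tool:

    #!/bin/ksh
    # chown each zvol to the owner recorded in its owner:user property, if any
    for vol in $(zfs list -H -t volume -o name); do
        owner=$(zfs get -H -o value owner:user "$vol")
        [ "$owner" != "-" ] && chown "$owner" "/dev/zvol/rdsk/$vol"
    done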
Re: [zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Geoff Nordli > > Instead of using vdi, I use comstar targets and then use vbox built-in scsi > initiator. Based on my recent experiences, I am hesitant to use the iscsi ... I don't know if it was the iscsi initiator or target that was unstable, or the combination of both running on the same system, or some other characteristic... Plus when I think about the complexity of creating the zvol and configuring the target, with iscsi and IP overhead... As compared to just creating the zvol and using it directly... Maybe there is unavoidable complexity around the chown, but it seems like the chown should be easier and simpler than the iscsi solution... But in any event, thanks for the suggestion. It's nice to know there's at *least* one alternative option. ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot
When I google around for anyone else who cares and may have already solved the problem before I came along - it seems we're all doing the same thing for the same reason. If by any chance you are running VirtualBox on a solaris / opensolaris / openindiana / whatever ZFS host, you could of course use .vdi files for the VM virtual disks, but a lot of us are using zvol instead, for various reasons. To do the zvol, you first create the zvol (sudo zfs create -V) and then chown it to the user who runs VBox (sudo chown someuser /dev/zvol/rdsk/...) and then create a rawvmdk that references it (VBoxManage internalcommands createrawvmdk -filename /home/someuser/somedisk.vmdk -rawdisk /dev/zvol/rdsk/...) The problem is - during boot / reboot, or anytime the zpool or zfs filesystem is mounted or remounted, exported, imported... the zvol ownership reverts back to root:root. So you have to repeat your "sudo chown" before the guest VM can start. And the question is ... Obviously I can make an SMF service which will chown those devices automatically, but that's kind of a crappy solution. Is there any good way to assign the access rights, or persistently assign ownership of zvol's? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
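For anyone landing here from a search, the full sequence described above looks roughly like this (dataset, size, and paths are hypothetical):

    sudo zfs create -V 20G tank/disk0
    sudo chown someuser /dev/zvol/rdsk/tank/disk0
    VBoxManage internalcommands createrawvmdk \
        -filename /home/someuser/disk0.vmdk -rawdisk /dev/zvol/rdsk/tank/disk0
    # ...and after every reboot or remount, the chown has to be repeated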
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Dan Swartzendruber > > Well, I think I give up for now. I spent quite a few hours over the last > couple of days trying to get gnome desktop working on bare-metal OI, > followed by virtualbox. I would recommend installing OI desktop, not OI server, because I, too, tried to get gnome working in OI server, to no avail. But if you install OI desktop, it simply goes in, brainlessly simple. > Found quite a few posts on > various > forums of people complaining that RDP with external auth doesn't work (or > not reliably), Actually, it does work, and it works reliably, but the setup is very much not straightforward. I'm likely to follow up on this later today, because as coincidence would have it, this is on my to-do for today. Right now, I'll say this much: When you RDP from a windows machine to a windows machine, you get prompted for a password. Nice, right? Seems pretty obvious. ;-) But the VirtualBox RDP server doesn't have that capability. You need to enter the username & password into the RDP client, and save it, before attempting the connection. > The final straw was when I > rebooted the OI server as part of cleaning things up, and... It hung. Bummer. That might be some unsupported hardware for running OI. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Eugen Leitl > > On Thu, Nov 08, 2012 at 04:57:21AM +0000, Edward Ned Harvey > (opensolarisisdeadlongliveopensolaris) wrote: > > > Yes you can, with the help of Dell, install OMSA to get the web interface > > to manage the PERC. But it's a pain, and there is no equivalent option for > > most HBA's. Specifically, on my systems with 3ware, I simply installed the > > solaris 3ware utility to manage the HBA. Which would not be possible on > > ESXi. This is important because the systems are in a remote datacenter, > and > > it's the only way to check for red blinking lights on the hard drives. ;-) > > I thought most IPMI came with full KVM, and also SNMP, and some ssh built- > in. Depends. So, one possible scenario: You power up the machine for the first time, you enter the ILOM console, you create a username & password & static IP address. From now on, you're able to get the remote console, awesome, great. No need for ipmitool in the OS. Another scenario, that I encounter just as often: You inherit some system from the previous admin. They didn't set up IPMI or ILOM. They installed ESXi, and now the only thing you can do is power off the system to do it. But in the situation where I inherit a Linux / Solaris machine from a previous admin who didn't config ipmi... I don't need to power down. I can config the ipmi via ipmitool. Going a little further down these trails... If you have a basic IPMI device, then all it does is *true* ipmi, which is a standard protocol. You have to send it ipmi signals via the ipmitool command on your laptop (or another server). It doesn't use SSL; it uses either no encryption, or a preshared key. The preshared key is a random 20-character hex string. If you configure that at boot time (as in the first situation mentioned above) then you have to type in at the physical console at first boot: new username, new password, new static IP address etc, and the new encryption key. But if you're running a normal OS, you can skip all that, boot the new OS, and paste all that stuff in via ssh, using the local ipmitool to config the local ipmi device. If you have a newer, more powerful ILOM device, then you probably only need to assign an IP address to the ilom. Then you can browse to it via https and do whatever else you need to do. Make sense? Long story short, "Depends." ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
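A sketch of the in-OS configuration path described above; channel 1 and user id 2 are common defaults but vary by machine, so treat them as assumptions:

    # assign a static address to the BMC from the running OS
    ipmitool lan set 1 ipsrc static
    ipmitool lan set 1 ipaddr 192.168.10.50
    ipmitool lan set 1 netmask 255.255.255.0
    ipmitool lan set 1 defgw ipaddr 192.168.10.1
    # set the password for a user slot (id 2 is often the default admin)
    ipmitool user set password 2 'newpassword'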
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Karl Wagner [mailto:k...@mouse-hole.com] > > If I was doing this now, I would probably use the ZFS aware OS bare metal, > but I still think I would use iSCSI to export the ZVols (mainly due to the > ability > to use it across a real network, hence allowing guests to be migrated simply) Yes, if your VM host is some system other than your ZFS baremetal storage server, then exporting the zvol via iscsi is a good choice, or exporting your storage via NFS. Each one has their own pros/cons, and I would personally be biased in favor of iscsi. But if you're going to run the guest VM on the same machine that is the ZFS storage server, there's no need for the iscsi. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
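For completeness, a sketch of the iscsi-export path with COMSTAR (names and the GUID are hypothetical placeholders):

    # one-time setup: enable the COMSTAR framework and iSCSI target service
    svcadm enable stmf
    svcadm enable -r svc:/network/iscsi/target:default
    zfs create -V 100G tank/luns/vm0
    stmfadm create-lu /dev/zvol/rdsk/tank/luns/vm0   # prints the new LU's GUID
    stmfadm add-view 600144f0xxxxxxxxxxxxxxxxxxxxxxxx   # the GUID from the previous step
    itadm create-target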
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Dan Swartzendruber [mailto:dswa...@druber.com] > > I have to admit Ned's (what do I call you?)idea is interesting. I may give > it a try... Yup, officially Edward, most people call me Ned. I contributed to the OI VirtualBox instructions. See here: http://wiki.openindiana.org/oi/VirtualBox Jim's vboxsvc is super powerful - But at first I found it overwhelming, mostly due to unfamiliarity with SMF. One of these days I'm planning to contribute a "Quick Start" guide to vboxsvc, but for now, if you find it confusing in any way, just ask for help here. (Right Jim?) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Dan Swartzendruber [mailto:dswa...@druber.com] > > Now you have me totally confused. How does your setup get data from the > guest to the OI box? If thru a wire, if it's gig-e, it's going to be > 1/3-1/2 the speed of the other way. If you're saying you use 10gig or > some-such, we're talking about a whole different animal. Sorry - In the old setup, I had ESXi host, with solaris 10 guest, exporting NFS back to the host. So ESXi created the other guests inside the NFS storage pool. In this setup, the bottleneck is the virtual LAN that maxes out around 2-3 Gbit, plus TCP/IP and NFS overhead that degrades the usable performance a bit more. In the new setup, I have openindiana running directly on the hardware (OI is the host) and virtualization is managed by VirtualBox. I would use zones if I wanted solaris/OI guests, but it just so happens I want linux & windows guests. There is no bottleneck. My linux guest can read 6Gbit/sec and write 3Gbit/sec (I'm using 3 disks mirrored with another 3 disks, each can read/write 1 Gbit/sec). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Karl Wagner > > I am just wondering why you export the ZFS system through NFS? > I have had much better results (albeit spending more time setting up) using > iSCSI. I found that performance was much better, A couple years ago, I tested and benchmarked both configurations on the same system. I found that the performance was equal both ways (which surprised me because I expected NFS to be slower due to FS overhead.) I cannot say if CPU utilization was different - but the IO measurements were the same. At least, indistinguishably different. Based on those findings, I opted to use NFS for several weak reasons. If I wanted to, I could export NFS to more different systems. I know everything nowadays supports iscsi initiation, but it's not as easy to set up as an NFS client. If you want to expand the guest disk, in iscsi, ... I'm not completely sure you *can* expand a zvol, but if you can, you at least have to shut everything down, then expand and bring it all back up and then have the iscsi initiator expand to occupy the new space. But in NFS, the client can simply expand, no hassle. I like being able to look in a filesystem and see the guests listed there as files. Knowing I could, if I wanted to, copy those things out to any type of storage I wish. Someday, perhaps I'll want to move some guest VM's over to a BTRFS server instead of ZFS. But it would be more difficult with iscsi. For what it's worth, in more recent times, I've opted to use iscsi. And here are the reasons: When you create a guest file in a ZFS filesystem, it doesn't automatically get a refreservation. Which means, if you run out of disk space thanks to snapshots and stuff, the guest OS suddenly can't write to disk, and it's a hard guest crash/failure. Yes you can manually set the refreservation, if you're clever, but it's easy to get wrong. If you create a zvol, by default, it has an appropriately sized refreservation that guarantees the guest will always be able to write to disk. Although I got the same performance using iscsi or NFS with ESXi... I did NOT get the same result using VirtualBox. In Virtualbox, if I use a *.vdi file... The performance is *way* slower than using a *.vmdk wrapper for a physical device (zvol). (using VBoxManage internalcommands createrawvmdk) The only problem with the zvol / vmdk idea in virtualbox is that every reboot (or remount) the zvol becomes owned by root again. So I have to manually chown the zvol for each guest each time I restart the host. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
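The refreservation behavior mentioned above is easy to check directly; a sketch with hypothetical names:

    # a plain zvol gets a refreservation sized to guarantee future writes
    zfs create -V 20G tank/vm0
    zfs get refreservation tank/vm0
    # a sparse zvol (-s) gets no such guarantee - same failure mode as a file on a full pool
    zfs create -s -V 20G tank/vm1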
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > the VM running "a ZFS OS" enjoys PCI-pass-through, so it gets dedicated > hardware access to the HBA(s) and harddisks at raw speeds, with no > extra layers of lags in between. Ah. But even with PCI pass-thru, you're still limited by the virtual LAN switch that connects ESXi to the ZFS guest via NFS. When I connected ESXi and a guest this way, obviously the bandwidth between the host & guest is purely CPU and memory limited. Because you're not using a real network interface; you're just emulating the LAN internally. I streamed data as fast as I could between ESXi and a guest, and found only about 2-3 Gbit. That was over a year ago so I forget precisely how I measured it ... NFS read/write perhaps, or wget or something. I know I didn't use ssh or scp, because those tend to slow down network streams quite a bit. The virtual network is a bottleneck (unless you're only using 2 disks, in which case 2-3 Gbit is fine.) I think THIS is where we're disagreeing: I'm saying "Only 2-3 gbit" but I see Dan's email said " since the traffic never leaves the host (I get 3gb/sec or so usable thruput.)" and "No offense, but quite a few people are doing exactly what I describe and it works just fine..." It would seem we simply have different definitions of "fine" and "abysmal." ;-) > Also, VMWare does not (AFAIK) use ext3, but their own VMFS which is, > among other things, cluster-aware (same storage can be shared by > several VMware hosts). I didn't know vmfs3 had extensions - I think vmfs3 is based on ext3. At least, all the performance characteristics I've ever observed are on-par with ext3. But it makes sense they would extend it in some way. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) > > Stuff like that. I could go on, but it basically comes down to: With > openindiana, you can do a lot more than you can with ESXi. Because it's a > complete OS. You simply have more freedom, better performance, less > maintenance, less complexity. IMHO, it's better in every way. Oh - I just thought of an important one - make that two, three... On ESXi, you can't run ipmitool. Which means, if you're configuring ipmi, you have to do it at power-on, by hitting the BIOS key, and then you have to type in your encryption key by hand (20 hex chars). Whereas, with a real OS, you run ipmitool and paste at the ssh prompt. (Even if you enable the ssh prompt on ESXi, you won't get ipmitool running there.) I have two systems that have 3ware HBA's, and I have some systems with Dell PERC. Yes you can, with the help of Dell, install OMSA to get the web interface to manage the PERC. But it's a pain, and there is no equivalent option for most HBA's. Specifically, on my systems with 3ware, I simply installed the solaris 3ware utility to manage the HBA. Which would not be possible on ESXi. This is important because the systems are in a remote datacenter, and it's the only way to check for red blinking lights on the hard drives. ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Dan Swartzendruber [mailto:dswa...@druber.com] > > I'm curious here. Your experience is 180 degrees opposite from mine. I > run an all in one in production and I get native disk performance, and > ESXi virtual disk I/O is faster than with a physical SAN/NAS for the NFS > datastore, since the traffic never leaves the host (I get 3gb/sec or so > usable thruput.) What is all in one? I wonder if we crossed wires somehow... I thought Tiernan said he was running Nexenta inside of ESXi, where Nexenta exports NFS back to the ESXi machine, so ESXi will have the benefit of ZFS underneath its storage. That's what I used to do. When I said performance was abysmal, I meant, if you dig right down and pressure the system for throughput to disk, you've got a Linux or Windows VM inside of ESX, which is writing to a virtual disk, which ESX is then wrapping up inside NFS and TCP, talking on the virtual LAN to the ZFS server, which unwraps the TCP and NFS, pushes it all through the ZFS/Zpool layer, writing back to the virtual disk that ESX gave it, which is itself a layer on top of Ext3, before it finally hits disk. Based purely on CPU and memory throughput, my VM guests were seeing a max throughput of around 2-3 Gbit/sec. That's not *horrible* abysmal. But it's bad to be CPU/memory/bus limited if you can just eliminate all those extra layers, and do the virtualization directly inside a system that supports zfs. > > I have abandoned ESXi in favor of openindiana or solaris running as the > host, with virtualbox running the guests. I am SO much happier now. > It takes a higher level of expertise than running ESXi, but the results are > much better. > > > in what respect? due to the 'abysmal performance'? No - mostly just the fact that I am no longer constrained by ESXi. In ESXi, you have such limited capabilities of monitoring, storage, and how you interface it ... You need a windows client, you only have a few options in terms of guest autostart and so forth. If you manage all that in a shell script (or whatever) you can literally do anything you want. Start up one guest, then launch something that polls the first guest for the operational XMPP interface (or whatever service you happen to care about) before launching the second guest, etc. Obviously you can still do brain-dead timeouts or monitoring for the existence of late-boot-cycle services such as vmware-tools too, but that's no longer your only option. Of particular interest, I formerly had ESXi running a guest that was a DHCP and DNS server, and everything else had to wait for it. Now I run DHCP and DNS directly inside of the host openindiana. (So I eliminated one VM). I am now able to connect to guest consoles via VNC or RDP (ok on mac and linux), whereas with ESXi your only choice is to connect via VSphere from windows. In ESXi, you cannot use a removable USB drive to store your removable backup storage. I was using an eSATA drive, and I needed to reboot the whole system every time I rotated backups offsite. But with openindiana as the host, I can add/remove removable storage, perform my zpool imports / exports, etc, all without any rebooting. Stuff like that. I could go on, but it basically comes down to: With openindiana, you can do a lot more than you can with ESXi. Because it's a complete OS. You simply have more freedom, better performance, less maintenance, less complexity. IMHO, it's better in every way. I say "less complexity" but maybe not. It depends. 
I have greater complexity in the host OS, but I have less confusion and less VM dependencies, so to me that's less complexity. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Tiernan OToole > > I have a Dedicated server in a data center in Germany, and it has 2 3TB > drives, > but only software RAID. I have got them to install VMWare ESXi and so far > everything is going ok... I have the 2 drives as standard data stores... ESXi doesn't do software raid, so ... what are you talking about? > But i am paranoid... So, i installed Nexenta as a VM, gave it a small disk to > boot off and 2 1Tb disks on separate physical drives... I have created a > mirror > pool and shared it with VMWare over NFS and copied my ISOs to this share... I formerly did exactly the same thing. Of course performance is abysmal because you're booting a guest VM to share storage back to the host where the actual VM's run. Not to mention, there's the startup dependency, which is annoying to work around. But yes it works. > 1: If you where given the same hardware, what would you do? (RAID card is > an extra EUR30 or so a month, which i don't really want to spend, but could, > if > needs be...) I have abandoned ESXi in favor of openindiana or solaris running as the host, with virtualbox running the guests. I am SO much happier now. It takes a higher level of expertise than running ESXi, but the results are much better. > 2: should i mirror the boot drive for the VM? Whenever possible, you should always give more than one storage device to ZFS and let it do redundancy of some kind, be it mirror or raidz. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Strange mount -a problem in Solaris 11.1
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Ian Collins > > >> ioctl(3, ZFS_IOC_OBJECT_STATS, 0xF706BBB0) > >> > >> The system boots up fine in the original BE. The root (only) pool is a > >> single drive. > >> > >> Any ideas? > > devfsadm -Cv > > rm /etc/zfs/zpool.cache > > init 6 > > > > That was a big enough stick to fix it. Nasty bug none the less. I wonder what caused it? The ioctl error suggests inability to access some device. Hence the devfsadm and rm zpool.cache, to force the system to search for devices anew. What did you upgrade from? Perhaps in your old system, you had pools made of c0t0d0 and so forth, while in sol 11, the devices all became multipath? If so, I would expect the upgrader to be smart enough to do the devfsadm for you, and rebuild the zpool.cache. Anyway, glad you got out of the woods. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Strange mount -a problem in Solaris 11.1
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Ian Collins > > Have have a recently upgraded (to Solaris 11.1) test system that fails > to mount its filesystems on boot. > > Running zfs mount -a results in the odd error > > #zfs mount -a > internal error > Invalid argument > > truss shows the last call as > > ioctl(3, ZFS_IOC_OBJECT_STATS, 0xF706BBB0) > > The system boots up fine in the original BE. The root (only) pool is a > single drive. > > Any ideas? devfsadm -Cv rm /etc/zfs/zpool.cache init 6 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Scrub and checksum permutations
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > I tend to agree that parity calculations likely > are faster (even if not all parities are simple XORs - that would > be silly for double- or triple-parity sets which may use different > algos just to be sure). Even though parity calculation is faster than fletcher, which is faster than sha256, it's all irrelevant, except in the hugest of file servers. Go write to disk or read from disk as fast as you can, and see how much CPU you use. Even on moderate fileservers that I've done this on (a dozen disks in parallel) the cpu load is negligible. If you ever get up to a scale where the cpu load becomes significant, you solve it by adding more cpu's. There is a limit somewhere, but it's huge. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zpool LUN Sizes
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha > > So my > suggestion is actually just present one huge 25TB LUN to zfs and let > the SAN handle redundancy. Oh - no. Definitely let ZFS handle the redundancy. Because ZFS is doing the checksumming, if it finds a cksum error, it needs access to the redundant copy in order to correct it. If you let the SAN handle the redundancy, then when zfs finds a cksum error, your data is unrecoverable. (Just the file in question, not the whole pool or anything like that.) The answer to Morris's question, about size of LUNs and so forth... It really doesn't matter what size the LUNs are. Just choose based on your redundancy and performance requirements. Best would be to go JBOD, or if that's not possible, create a bunch of 1-disk volumes and let ZFS handle them as if they're JBOD. Performance is much better if you use mirrors instead of raid. (Sequential performance is just as good either way, but sequential IO is unusual for most use cases. Random IO is much better with mirrors, and that includes scrubs & resilvers.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
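A sketch of presenting the 1-disk volumes (or JBOD disks) to ZFS as mirror pairs, with hypothetical device names:

    zpool create tank \
        mirror c5t0d0 c5t1d0 \
        mirror c5t2d0 c5t3d0 \
        mirror c5t4d0 c5t5d0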
Re: [zfs-discuss] Zpool LUN Sizes
> From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) > > Performance is much better if you use mirrors instead of raid. (Sequential > performance is just as good either way, but sequential IO is unusual for most > use cases. Random IO is much better with mirrors, and that includes scrubs & > resilvers.) Even if you think you use sequential IO... If you use snapshots... Thanks to the nature of snapshot creation & deletion & the nature of COW, you probably don't have much sequential IO in your system, after a couple months of actual usage. Some people use raidzN, but I always use mirrors. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Scrub and checksum permutations
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > Logically, yes - I agree this is what we expect to be done. > However, at least with the normal ZFS reading pipeline, reads > of redundant copies and parities only kick in if the first > read variant of the block had errors (HW IO errors, checksum > mismatch). I haven't read or written the code myself personally, so I'm not authoritative. But I certainly know I've heard it said on this list before, that when you read a mirror, it only reads one side (as you said) unless there's an error; this allows a mirror to read 2x faster than a single disk (which I confirm by benchmarking.) However, a scrub reads both sides, all redundant copies of the data. I'm personally comfortably confident assuming this is true also for reading the redundant copies of raidzN data. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Scrub and checksum permutations
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Karl Wagner > > I can only speak anecdotally, but I believe it does. > > Watching zpool iostat it does read all data on both disks in a mirrored > pair. > > Logically, it would not make sense not to verify all redundant data. > The point of a scrub is to ensure all data is correct. Same for me. Think about it: When you write some block, it computes parity bits, and writes them to the redundant parity disks. When you later scrub the same data, it wouldn't make sense to do anything other than repeating this process, to verify all the disks including parity. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send to older version
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Karl Wagner > > The only thing I think Oracle should have done differently is to allow > either a downgrade or creating a send stream in a lower version > (reformatting the data where necessary, and disabling features which > weren't present). However, this would not be a simple addition, and it > is probably not worth it for Oracle's intended customers. So you have a backup server in production, that has storage and does a zfs send to removable media on a periodic basis. (I know I do.) So you buy a new server, and it comes with a new version of zfs. Now you can't back up your new server. Or maybe you upgrade some other machine, and now you can't back *it* up. The ability to either downgrade a pool, or send a stream that's compatible with an older version, seems like a pretty obvious missing feature. I will comment on the irony that, right now, there's another thread on this list seeing a lot of attention, regarding how to receive a 'zfs send' data stream on non-ZFS systems. But there is no discussion about receiving on older zfs systems. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send to older version
> From: Richard Elling [mailto:richard.ell...@gmail.com] > > At some point, people will bitterly regret some "zpool upgrade" with no way > back. > > uhm... and how is that different than anything else in the software world? > > No attempt at backward compatibility, and no downgrade path, not even by > going back to an older snapshot before the upgrade. > > ZFS has a stellar record of backwards compatibility. The only break with > backwards > compatibility I can recall was a bug fix in the send stream somewhere around > opensolaris b34. > > Perhaps you are confusing backwards compatibility with forwards > compatibility? Semantics. New version isn't compatible with old version, or old version isn't compatible with new version. Either way, same end result. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What is L2ARC write pattern?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > >One idea I have is that a laptop which only has a single HDD slot, > often has SD/MMC cardreader slots. If populated with a card for L2ARC, > can it be expected to boost the laptop's ZFS performance? You won't find that type of card with performance that's worth a damn. Worse yet, it will likely be extremely unreliable. In a SSD, all the performance and reliability come from intelligence in the controller, which emulates SATA HDD on one side, and manages Flash memory on the other side. Things like wear leveling, block mapping, garbage collection, etc, that's where all the performance comes from. You're not going to get it in a USB stick or a SD card. You're only going to get it in full size SSD's that consume power, and to some extent, the good stuff will cost more. (But of course, there's no way for the consumer to distinguish between paying for quality, and paying for marketing and margin, without trying it.) Even if you do try it, most likely you won't know the difference until a month later, having two identical systems with identical workload side-by-side. This is NOT to say the difference is insignificant; it's very significant, but without a point of reference, you don't have any comparison. All the published performance specs are fudged - but not lies - they represent optimal conditions, which are unrealistic. All the mfgrs are going to publish comparable specs, and none of them represent real life usage. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What happens when you rm zpool.cache?
> From: Jim Klimov [mailto:jimkli...@cos.ru] > Sent: Monday, October 22, 2012 7:26 AM > > Are you sure that the system with failed mounts came up NOT in a > read-only root moment, and that your removal of /etc/zfs/zpool.cache > did in fact happen (and that you did not then boot into an earlier > BE with the file still in it)? I'm going to take your confusion and disbelief in support of my confusion and disbelief. So it's not that I didn't understand what to expect ... it's that I somehow made a mistake, but I don't know what (and I don't care enough to try reproducing the same circumstance.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What happens when you rm zpool.cache?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Edward Ned Harvey > > If you rm /etc/zfs/zpool.cache and reboot... The system is smart enough (at > least in my case) to re-import rpool, and another pool, but it didn't figure out > to re-import some other pool. > > How does the system decide, in the absence of zpool.cache, which pools it's > going to import at boot? So, in this thread, I haven't yet got the answer that I expect or believe. Because, the behavior I observed was: I did a "zfs send" from one system to another, received onto /localpool/backups. Side note, the receiving system has three pools: rpool, localpool, and iscsipool. Unfortunately, I sent the zfs properties with it, including the mountpoint. Naturally, there was already something mounted on / and /exports and /exports/home, so the zfs receive failed to mount on the receiving system, but I didn't notice that. Later, I rebooted. During reboot, of course, rpool mounted correctly on /, but then the system found the localpool/backups filesystems, and mounted /exports, /exports/home and so forth. So when it tried to mount rpool/exports, it failed. Then, iscsipool was unavailable, so the system failed to boot up completely. I was able to log in to the console as myself, but I had no home directory, so I su'd to root. I tried to change the mountpoints of localpool/backups/exports and so forth - but it failed. Filesystem is in use, or filesystem busy or something like that. (Because I logged in, obviously.) I tried to export localpool, and again failed. So I wanted some way to prevent localpool from importing or mounting next time, although I couldn't make it unmount or change mountpoints this time. rm /etc/zfs/zpool.cache ; init 6 This time, the system came up, and iscsipool was not imported (as expected.) But I was surprised - localpool was imported. Fortunately, this time the system mounted filesystems in the right order - rpool/exports was mounted under /exports, and I was able to log in as myself, and export/import / change mountpoints of the localpool filesystems. One more reboot just to be sure, and voila, no problem. Point in question is - After I removed the zpool.cache file, I expected rpool to be the only pool imported upon reboot. That's not what I observed, and I was wondering how the system knew to import localpool? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
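One way to avoid the mountpoint collision that started all this is to receive without mounting; a sketch, with hypothetical snapshot and host names:

    # -u leaves received filesystems unmounted, so inherited mountpoints like
    # /exports can't shadow the receiving system's own filesystems
    zfs send -R rpool/export@backup | ssh backuphost zfs receive -u -d localpool/backups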
Re: [zfs-discuss] What happens when you rm zpool.cache?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Gary Mills > > On Sun, Oct 21, 2012 at 11:40:31AM +0200, Bogdan Ćulibrk wrote: > >Follow up question regarding this: is there any way to disable > >automatic import of any non-rpool on boot without any hacks of > removing > >zpool.cache? > > Certainly. Import it with an alternate cache file. You do this by > specifying the `cachefile' property on the command line. The `zpool' > man page describes how to do this. You can also specify cachefile=none ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
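Concretely (pool and device names hypothetical):

    # never record this pool in the default cache, so it won't auto-import at boot
    zpool create -o cachefile=none tank c3t0d0
    # or for an existing pool
    zpool set cachefile=none tank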
Re: [zfs-discuss] vm server storage mirror
> From: Timothy Coalson [mailto:tsc...@mst.edu] > Sent: Friday, October 19, 2012 9:43 PM > > A shot in the dark here, but perhaps one of the disks involved is taking a long > time to return from reads, but is returning eventually, so ZFS doesn't notice > the problem? Watching 'iostat -x' for busy time while a VM is hung might tell > you something. Oh yeah - this is also bizarre. I watched "zpool iostat" for a while. It was showing me: operations (read and write) consistently 0; bandwidth (read and write) consistently non-zero, but something small, like 1k-20k or so. Maybe that is normal to someone who uses zpool iostat more often than I do. But to me, zero operations resulting in non-zero bandwidth defies logic. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] What happens when you rm zpool.cache?
If you rm /etc/zfs/zpool.cache and reboot... The system is smart enough (at least in my case) to re-import rpool, and another pool, but it didn't figure out to re-import some other pool. How does the system decide, in the absence of zpool.cache, which pools it's going to import at boot? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send to older version
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Richard Elling > >> At some point, people will bitterly regret some "zpool upgrade" with no way >> back. > > uhm... and how is that different than anything else in the software world? No attempt at backward compatibility, and no downgrade path, not even by going back to an older snapshot before the upgrade. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] vm server storage mirror
Yikes, I'm back at it again, and so frustrated. For about 2-3 weeks now, I had the iscsi mirror configuration in production, as previously described. Two disks on system 1 mirror against two disks on system 2, everything done via iscsi, so you could zpool export on machine 1, and then zpool import on machine 2 for a manual failover. Created the dependency - initiator depends on target, and created a new smf service to mount the iscsi zpool after the initiator is up (and consequently export the zpool before the initiator shuts down.) Able to reboot, everything working perfectly. Until today. Today I rebooted one system for some maintenance, and it stayed down longer than expected, so those disks started throwing errors on the second machine. First system eventually came up again, second system resilvered, everything looked good. I zpool clear'd the pool on the second machine just to make the counters look pretty again. But it wasn't pretty at all. This is so bizarre - Throughout the day, the VM's on system 2 kept choking. I had to powercycle system 2 about half a dozen times due to unresponsiveness. Exactly the type of behavior you expect for IO error - but nothing whatsoever appears in the system log, and the zpool status still looks clean. Several times, I destroyed the pool and recreated it completely from backup. zfs send and zfs receive both work fine. But strangely - when I launch a VM, the IO grinds to a halt, and I'm forced to powercycle (usually) the host. You might try to conclude it's something wrong with virtualbox - but it's not. I literally copied & pasted the zfs send | zfs receive commands that restored the pool from backup, but this time restored it onto local storage. The only difference is local disk versus iscsi pool. And then it finally worked without any glitches. During the day, trying to get the iscsi pool up again - this is so bizarre - I did everything I could think of, to get back to a pristine state. I removed iscsi targets, I removed lun's (lu's), I removed the static discovery and re-added it, got new device names, I wiped the disks (zpool destroy & zpool create) re-created lu's, re-created static discovery, re-created targets, re-created zpools... The behavior was the same no matter what I did. I can create the pool, import it, zfs receive onto it no problem, but then when I launch the VM, the whole system grinds to a halt. VirtualBox will be in a "sleep" state, Virtualbox shows the green light on the hard drive indicating it's trying to read, meanwhile if I try to X it out, it won't die, and gnome gives me the "Force Quit" dialog, meanwhile I can sudo kill -KILL VirtualBox, and VirtualBox *still* won't die. Any "zpool" or "zfs" command I type in hangs indefinitely (even time-slider daemon or zfs auto snapshot are hung). I can poke around the system in other areas - on other pools and stuff - but the only way out of it is power cycle. It's so weird, that once the problem happens once, I have not yet found any way to recover from it except to reformat and reinstall the OS for the whole system. I cannot, for the life of me, think of *any*thing that could be storing state like this, preventing me from getting back into a usable iscsi mirror pool. One thing I haven't tried yet - It appears, I think, that when you make a disk, let's say c2t4d0 an iscsi target, let's say c6t7blahblahblahd0... It appears, I think, that c6t7blahblahblahd0 is actually c2t4d0s2. 
I could create a pool using c2t4d0, and/or zero the whole disk, completely obliterating any semblance of partition tables inside there, or old redundant copies of old uberblocks or anything like that. But seriously, I'm grasping at straws here, just trying to find *any* place where some bad state is stored that I haven't thought of yet. I shouldn't need to reformat the host. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
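If the suspicion about stale labels on the underlying slice is right, one blunt (and destructive - this erases the disk's contents) way to rule it out is to overwrite the label regions; the device name is hypothetical:

    # ZFS keeps two 256K labels at the front of the device and two at the back;
    # zeroing the first few MB kills the front pair - the back pair needs the
    # device size to compute an offset, or just zero the entire disk
    dd if=/dev/zero of=/dev/rdsk/c2t4d0s2 bs=1024k count=4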
Re: [zfs-discuss] Changing rpool device paths/drivers
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of James C. McPherson > > As far as I'm aware, having an rpool on multipathed devices > is fine. Even a year ago, a new system I bought from Oracle came with multipath devices for all devices by default. Granted, there weren't any multiple paths on that system... But it was using the multipath device names. I expect this is the new default for everything moving forward. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send to older version
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Ian Collins > > You have to create pools/filesystems with the older versions used by the > destination machine. Apparently you might want to do "zpool create -d -o version=28" on the new system... (I just wrote 28, but make sure it matches the latest version supported by your receiving system.) You might have to do something similar to use an older ZFS version too. And then you should be able to send from the new to the old system. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
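A sketch of that, with hypothetical names; 28 is an assumption - match whatever "zpool upgrade -v" reports on the receiving system:

    # -d plus an explicit version keeps the new pool old enough to send from
    zpool create -d -o version=28 tank mirror c0t0d0 c0t1d0
    # confirm what you got
    zpool get version tank
    zfs get version tank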
[zfs-discuss] openindiana-1 filesystem, time-slider, and snapshots
Can anyone explain to me what the openindiana-1 filesystem is all about? I thought it was the "backup" copy of the openindiana filesystem created when you apply OS updates, but that doesn't seem to be the case...

I have time-slider enabled for rpool/ROOT/openindiana. It has a daily snapshot (amongst others). But every day when the new daily snap is taken, the old daily snap rotates into the rpool/ROOT/openindiana-1 filesystem. This is messing up my cron-scheduled "zfs send" script - it detects that the rpool/ROOT/openindiana filesystem no longer has the old daily snapshot, and therefore has no snapshot in common with the receiving system, and therefore sends a new full backup every night.

To make matters more confusing, when I run "mount" and when I run "zfs get all | grep -i mount", I see / mounted on rpool/ROOT/openindiana-1. It would seem I shouldn't be backing up openindiana, but instead openindiana-1? I would have sworn that out-of-the-box there was no openindiana-1. Am I simply wrong?

My expectation is that rpool/ROOT/openindiana should have lots of snaps available... 3 frequent (one every 15 mins), 23 hourly (one every hour), 6 daily (one every day), 4 weekly (one every 7 days), etc. I checked to ensure the auto-snapshot service is enabled. I checked svccfg to ensure I understood the correct interval, keep, and period (as described above). I see the expected behavior (expected according to my expectations, as described) on rpool/export/home/eharvey... But the behavior is different on rpool/ROOT/openindiana, even though, as far as I can tell, I have the same settings for both. That is, simply, com.sun:auto-snapshot=true.

One more comment - I recall, when I first configured time-slider, there was a threshold, default 80% of the pool used, before it automatically bumps off old snapshots (or stops taking new snaps, I'm not sure which). I don't see that setting anywhere I look, using svccfg or zfs get. My pools are pretty much empty right now - nowhere near the 80% limit. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
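For anyone wanting to compare notes, here is roughly how I've been checking - service names are the usual OpenSolaris/OI auto-snapshot FMRIs, so adjust if yours differ:

# is auto-snapshot enabled on the dataset in question?
zfs get com.sun:auto-snapshot rpool/ROOT/openindiana

# which auto snapshots actually exist under rpool/ROOT?
zfs list -t snapshot -r rpool/ROOT | grep zfs-auto-snap

# interval / keep / period for the daily schedule
svccfg -s svc:/system/filesystem/zfs/auto-snapshot:daily listprop zfs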
Re: [zfs-discuss] Fixing device names after disk shuffle
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Paul van der Zwan
>
> What was c5t2 is now c7t1 and what was c4t1 is now c5t2.
> Everything seems to be working fine, it's just a bit confusing.

That ... doesn't make any sense. Did you reshuffle these while the system was powered on or something?

sudo devfsadm -Cv
sudo zpool export datapool
sudo zpool export homepool
sudo zpool import -a
sudo reboot -p

The normal behavior is: during the import, or during the reboot when the filesystem gets mounted, zfs searches the available devices in the system for components of a pool. I don't see any way the devices reported by "zpool status" wouldn't match the devices reported by "format." Unless, as you say, it's somehow overridden by the cache file. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
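If the cache file is the culprit, something like this ought to force a rediscovery (pool name from the post; the cachefile property needs a reasonably recent zpool version):

# stop trusting the stale cachefile and re-scan /dev/dsk for the pool's devices
sudo zpool set cachefile=none datapool
sudo zpool export datapool
sudo zpool import -d /dev/dsk datapool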
Re: [zfs-discuss] ZFS best practice for FreeBSD?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> A solid point. I don't.
>
> This doesn't mean you can't - it just means I don't.

That response was kind of long-winded, so here's a simpler version. Suppose 6 disks in a system, each 2T: c0t0d0 through c0t5d0.

rpool is a mirror:
mirror c0t0d0p1 c0t1d0p1
c0t0d0p2 = 1.9T, unused (Extended, unused)
c0t1d0p2 = 1.9T, unused (Extended, unused)

Now partition all the other disks the same, and create datapool:

zpool create datapool \
  mirror c0t0d0p2 c0t1d0p2 \
  mirror c0t2d0p1 c0t3d0p1 \
  mirror c0t2d0p2 c0t3d0p2 \
  mirror c0t4d0p1 c0t5d0p1 \
  mirror c0t4d0p2 c0t5d0p2

Add a spare? Take a seventh disk, c0t6d0, and partition it the same way. Then:

zpool add datapool spare c0t6d0p1 c0t6d0p2
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS best practice for FreeBSD?
> From: Ian Collins [mailto:i...@ianshome.com]
>
> On 10/13/12 02:12, Edward Ned Harvey
> (opensolarisisdeadlongliveopensolaris) wrote:
> > There are at least a couple of solid reasons *in favor* of partitioning.
> >
> > #1 It seems common, at least to me, that I'll build a server with let's say, 12
> disk slots, and we'll be using 2T disks or something like that. The OS itself
> only takes like 30G which means if I don't partition, I'm wasting 1.99T on each
> of the first two disks. As a result, when installing the OS, I always partition
> rpool down to ~80G or 100G, and I will always add the second partitions of
> the first disks to the main data pool.
>
> How do you provision a spare in that situation?

A solid point. I don't. This doesn't mean you can't - it just means I don't.

If I'm not mistaken, if you have a pool with multiple different sizes of devices in it, you only need to add a spare of the larger size. If a smaller device fails, I believe the pool will use the larger spare device rather than go without a spare. So, if I'm not mistaken, you can add a spare to your pool exactly the same way, regardless of whether you have partitions.

If I'm wrong - if the pool won't use a larger spare device in place of a smaller failed device (partition) - then you would likely need to add one spare for each different size of device used in your pool. In particular, this means:

Option 1: Given that you partition your first 2 disks, 80G for the OS and 1.99T for data, you would likely want to partition *all* your disks the same, including the disk designated as a spare. Then you could add your spare 80G partition as a spare device, and your spare 1.99T partition as a spare device.

Option 2: Suppose you partition your first disks and don't want the hassle on all the rest (this is my case). Or you have physically different sizes of devices - a pool that was originally made of 1T disks but has since been extended with a bunch of 2T disks, or something like that. It's conceivable you would want a spare of each different size, which could in some cases mean you use two spares (one partitioned and one not) in a pool where you might otherwise have only one spare. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
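Concretely, Option 1 might look something like this - device names hypothetical, matching the layout in my other message, and untested:

# spare for the 80G OS partitions
sudo zpool add rpool spare c0t6d0p1
# spare for the 1.99T data partitions
sudo zpool add datapool spare c0t6d0p2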
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
Jim, I'm trying to contact you off-list, but it doesn't seem to be working. Can you please contact me off-list? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS best practice for FreeBSD?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of andy thomas
>
> According to a Sun document called something like 'ZFS best practice' I
> read some time ago, best practice was to use the entire disk for ZFS and
> not to partition or slice it in any way. Does this advice hold good for
> FreeBSD as well?

I'm not going to address the FreeBSD question. I know others have made some comments on the "best practice" on solaris, but here goes: there are two reasons for the "best practice" of not partitioning, and I disagree with them both.

First, by default, the on-disk write cache is disabled, but if you use the whole disk in a zpool, then zfs enables the cache. If you partition a disk and use it only for zpools, then you might want to manually enable the cache yourself. This is a fairly straightforward scripting exercise. You may use this if you want (no warranty, etc - it will probably destroy your system if you don't read, understand, and rewrite it yourself before attempting to use it): https://dl.dropbox.com/u/543241/dedup%20tests/cachecontrol/cachecontrol.zip If you do that, you'll need to re-enable the cache once on each boot (or zfs mount).

The second reason is that when you "zpool import", it doesn't automatically check all the partitions of all the devices - it only scans whole devices. So if you are forced to move your disks to a new system and you try to import, you get an error message, panic, and destroy your disks. To overcome this problem, you just need to be good at remembering that the disks were partitioned - perhaps you should make a habit of partitioning *all* of your disks, so you'll *always* remember. On zpool import, you need to specify the partitions to scan for zpools. I believe this is the "zpool import -d" option.

And finally, there are at least a couple of solid reasons *in favor* of partitioning.

#1 It seems common, at least to me, that I'll build a server with, let's say, 12 disk slots, and we'll be using 2T disks or something like that. The OS itself only takes like 30G, which means if I don't partition, I'm wasting 1.99T on each of the first two disks. As a result, when installing the OS, I always partition rpool down to ~80G or 100G, and I always add the second partitions of the first disks to the main data pool.

#2 A long time ago, there was a bug where you couldn't attach a mirror unless the two devices had precisely the same geometry. That was addressed in a bugfix a couple of years ago. (I had a failed SSD mirror, and Sun shipped me a new SSD with a different firmware rev; the size of the replacement device was off by 1 block, so I couldn't replace the failed SSD.) After the bugfix, a mirror can be attached if there's a little bit of variation in the sizes of the two devices. But the tolerance isn't quite big enough. As recently as 2 weeks ago, I tried to mirror two devices that were nominally the same size, but couldn't, because one came out slightly smaller. One of them was a local device, and the other was an iscsi target. So I guess iscsi must require a little bit of space, and that was enough to make the devices un-mirror-able without partitioning. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
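For reference, the two workarounds look roughly like this - menu paths and names from memory, so verify against your own release:

# enable the on-disk write cache by hand, per disk (format expert mode, interactive):
sudo format -e
#   -> select the disk -> cache -> write_cache -> enable

# when importing a pool that lives in partitions, point import at the device nodes:
zpool import -d /dev/dsk
zpool import -d /dev/dsk datapool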
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
> From: Richard Elling [mailto:richard.ell...@gmail.com]
>
> Pedantically, a pool can be made in a file, so it works the same...

A pool can only be made in a file by a system that is able to create a pool. The point is, his receiving system runs linux and doesn't have any zfs. His receiving system is remote from his sending system, and it has been suggested that he might consider making an iscsi target available, so the sending system could "zpool create" and "zfs receive" directly into a file or device on the receiving system - but it doesn't seem as if that's going to be possible for him; he's expecting to transport the data over ssh. So he's looking for a way to do a "zfs receive" on a linux system, transported over ssh. Suggested answers so far include building a VM on the receiving side to run openindiana (or whatever), or using zfs-fuse-on-linux.

He is currently writing his "zfs send" datastream into a series of files on the receiving system, but this has a few disadvantages compared to doing "zfs receive" on the receiving side - namely, increased risk of data loss and less granularity for restores. For these reasons, it's been suggested he find a way to receive via "zfs receive", and he's exploring how to improve upon this situation. Namely, how to "zfs receive" on a remote linux system via ssh, instead of cat'ing or redirecting into a series of files.

There, I think I've recapped the whole thread now. ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
> From: Richard Elling [mailto:richard.ell...@gmail.com]
>
> Read it again, he asked, "On that note, is there a minimal user-mode zfs thing
> that would allow
> receiving a stream into an image file?" Something like:
> zfs send ... | ssh user@host "cat > file"

He didn't say he wanted to cat to a file - but it doesn't matter. It was only clear from context, responding to the advice of "zfs receive"ing into a zpool-in-a-file, that he was asking about doing a "zfs receive" into a file, not just cat. If you weren't paying close attention to the thread, it would be easy to misunderstand what he was asking for. When he asked for "minimal user-mode", he meant something less than a full-blown OS installation just for the purpose of zfs receive. He went on to say he was considering zfs-fuse-on-linux. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Directory is not accessible
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Sami Tuominen
>
> Unfortunately there aren't any snapshots.
> The version of zpool is 15. Is it safe to upgrade that?
> Is zpool clear -F supported or of any use here?

The only thing that will restore your data is a backup. To forget about the lost data and make the error message go away, simply rm the bad directory (and/or its parent).

You're probably wondering: you have redundancy and no faulted devices, so how could this happen? There are a few possible explanations, but they all have one thing in common: at some point, the data got corrupted before it was written, so both the primary and the redundant copy were written corrupted. It might be a CPU error, a parity error in non-ECC ram, a bus glitch, or bad firmware in the HBA, for example. The fact remains that something was written corrupted, and the redundant copy was also written corrupted. All you can do is restore from a snapshot, restore from a backup, or accept it for what it is and make the error go away. Sorry to hear it... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
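If you do decide to just delete and move on, the sequence is something like this - path and pool name hypothetical, and if rm itself errors out, remove the parent directory instead, as noted above:

rm -rf /tank/baddir    # remove the corrupt directory
zpool clear tank       # reset the error counters
zpool scrub tank       # re-verify everything else in the pool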
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Richard Elling
>
> >> If the recipient system doesn't support "zfs receive," [...]
> >
> > On that note, is there a minimal user-mode zfs thing that would allow
> > receiving a stream into an image file? No need for file/directory access
> > etc.
>
> cat :-)

He was asking if it's possible to do "zfs receive" on a system that doesn't natively support zfs. The answer is no, unless you want to consider fuse or similar. I can't speak about zfs on fuse - except that I personally wouldn't trust it. There are differences even between zfs on solaris versus freebsd versus whatever, all of which are fully supported - much better than zfs on fuse. But different people use and swear by all of these things, so maybe it would actually be a good solution for you. The direction I would personally go is an openindiana virtual machine to do the zfs receive.

> > I was thinking maybe the zfs-fuse-on-linux project may have suitable bits?
>
> I'm sure most Linux distros have cat

hehe. Anyway. Answered above. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How many disk in one pool
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Albert Shih
>
> I'm actually running ZFS under FreeBSD. I have a question about how many
> disks I can have in one pool.
>
> At this moment I'm running one server (FreeBSD 9.0) with 4 MD1200
> (Dell), meaning 48 disks. I've configured 4 raidz2 vdevs in the pool (one on
> each MD1200)
>
> From what I understand I can add more MD1200s. But if I lose one MD1200
> for any reason I lose the entire pool.
>
> In your experience, what's the limit? 100 disks?
>
> How does FreeBSD manage 100 disks? /dev/da100?

Correct - if you lose one storage tray, you lose the pool. Ideally you would span your redundancy across trays as well as across disks, but in your situation - 12 disks per raidz2 and 4 trays - it's just not realistic. You would have to significantly increase cost (not to mention rebuild the pool) in order to keep the same available disk space and gain that redundancy. Go ahead and add more trays. I've never heard of any limit on the number of disks you can have in ZFS. I'm sure there is one, but whatever it is, you're nowhere near it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
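For what it's worth, if rebuilding *were* an option, tray-spanning redundancy could look something like this: six 8-disk raidz2 vdevs, two disks from each tray per vdev, so losing a whole tray costs each vdev only 2 disks and the pool survives. Device names here are hypothetical (tray 1 = da0-da11, tray 2 = da12-da23, and so on):

zpool create tank \
  raidz2 da0 da1 da12 da13 da24 da25 da36 da37 \
  raidz2 da2 da3 da14 da15 da26 da27 da38 da39 \
  raidz2 da4 da5 da16 da17 da28 da29 da40 da41 \
  raidz2 da6 da7 da18 da19 da30 da31 da42 da43 \
  raidz2 da8 da9 da20 da21 da32 da33 da44 da45 \
  raidz2 da10 da11 da22 da23 da34 da35 da46 da47

Note the cost: 12 parity disks instead of 8, which is exactly the price increase mentioned above.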
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Frank Cusack
>
> On Fri, Oct 5, 2012 at 3:17 AM, Ian Collins wrote:
> I do have to suffer a slow, glitchy WAN to a remote server and rather than
> send stream files, I broke the data on the remote server into a more fine
> grained set of filesystems than I would do normally. In this case, I made the
> directories under what would have been the leaf filesystems, filesystems
> themselves.
>
> Meaning you also broke the data on the LOCAL server into the same set of
> more granular filesystems? Or is it now possible to zfs send a subdirectory of
> a filesystem?

"zfs create" instead of "mkdir". As Ian said - he didn't zfs send subdirs, he made filesystems where he otherwise would have used subdirs. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
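In other words, something like this at creation time (names hypothetical):

# instead of: mkdir -p /tank/projects/alpha
zfs create -p tank/projects/alpha
# now projects/alpha is its own filesystem, so it can be snapshotted
# and zfs-sent independently of its siblings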
Re: [zfs-discuss] vm server storage mirror
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> I must be missing something - I don't see anything above that indicates any
> required vs optional dependencies.

Ok, I see that now (thanks to the SMF FAQ). A dependency may have grouping optional_all, require_any, or require_all. Mine is require_all, and I figured out the problem: I had my automatic zpool import/export script dependent on the initiator... but it wasn't the initiator going down first. It was the target going down first. So the solution is like this:

sudo svccfg -s svc:/network/iscsi/initiator:default
svc:/network/iscsi/initiator:default> addpg iscsi-target dependency
svc:/network/iscsi/initiator:default> setprop iscsi-target/grouping = astring: "require_all"
svc:/network/iscsi/initiator:default> setprop iscsi-target/restart_on = astring: "none"
svc:/network/iscsi/initiator:default> setprop iscsi-target/type = astring: "service"
svc:/network/iscsi/initiator:default> setprop iscsi-target/entities = fmri: "svc:/network/iscsi/target:default"
svc:/network/iscsi/initiator:default> exit
sudo svcadm refresh svc:/network/iscsi/initiator:default

And additionally, create the SMF service dependent on the initiator, which will import/export the iscsi pools automatically. http://nedharvey.com/blog/?p=105 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
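To sanity-check the result, something like this should now show the target among the initiator's dependencies:

# show the service's full status, including its dependency list
svcs -l svc:/network/iscsi/initiator:default
# or list exactly the services this one depends on
svcs -d svc:/network/iscsi/initiator:default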
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Tiernan OToole
>
> I am in the process of planning a system which will have 2 ZFS servers, one on
> site, one off site. The on site server will be used by workstations and servers
> in house, and most of that will stay in house. There will, however, be data i
> want backed up somewhere else, which is where the offsite server comes
> in... This server will be sitting in a Data Center and will have some storage
> available to it (the whole server currently has 2 3Tb drives, though they are
> not dedicated to the ZFS box, they are on VMware ESXi). There is then some
> storage (currently 100Gb, but more can be requested) of SFTP enabled
> backup which i plan to use for some snapshots, but more on that later.
>
> Anyway, i want to confirm my plan and make sure i am not missing anything
> here...
>
> * build server in house with storage, pools, etc...
> * have a server in data center with enough storage for its reason, plus the
> extra for offsite backup
> * have one pool set as my "offsite" pool... anything in here should be backed
> up off site also...
> * possibly have another set as "very offsite" which will also be pushed to the
> SFTP server, but not sure...
> * give these pools out via SMB/NFS/iSCSI
> * every 6 or so hours take a snapshot of the 2 offsite pools.
> * do a ZFS send to the data center box
> * nightly, on the very offsite pool, do a ZFS send to the SFTP server
> * if anything goes wrong (my server dies, DC server dies, etc), Panic,
> download, pray... the usual... :)
>
> Anyway, I want to make sure i am doing this correctly... Is there anything on
> that list that sounds stupid or am i doing anything wrong? am i missing
> anything?
>
> Also, as a follow up question, but slightly unrelated, when it comes to the ZFS
> Send, i could use SSH to do the send, directly to the machine... Or i could
> upload the compressed, and possibly encrypted dump to the server... Which,
> for resume-ability and speed, would be suggested? And if i were to go with
> an upload option, any suggestions on what i should use?

It is recommended, whenever possible, that you pipe the "zfs send" directly into a "zfs receive" on the receiving system, for two solid reasons:

#1 If a single bit is corrupted, the whole stream checksum is wrong, and therefore the whole stream is rejected. If this occurs during send | receive, you detect it (in the form of one incremental failing) and correct it (in the form of the next incremental succeeding). Whereas if you store your streams as files on storage, the corruption goes undetected, and every stream after that point is broken.

#2 If you need to restore from a stream stored on storage, your only choice is to restore the whole stream. You cannot look inside and just get one file. But if you had been doing send | receive, you obviously can look inside the receiving filesystem and extract individual files.

If the recipient system doesn't support "zfs receive," you might consider exporting an iscsi device and letting the sender system deal with it directly. Or share a filesystem (such as NFS) with the sender system, and let the sender create a recipient filesystem inside a file container, so the sender can deal with it directly. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
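In practice the recommended pipe looks something like this (host and dataset names hypothetical):

# nightly incremental, piped straight into receive on the offsite box
zfs send -i tank/offsite@2012-10-08 tank/offsite@2012-10-09 | \
  ssh backup@offsite.example.com zfs receive -F backuppool/offsite

A resumable alternative over a glitchy link - stage the stream as a file, but still *apply* it with zfs receive on the far side rather than keeping the file around:

zfs send -i tank/offsite@2012-10-08 tank/offsite@2012-10-09 > /var/tmp/offsite.zstream
rsync --partial /var/tmp/offsite.zstream backup@offsite.example.com:/var/tmp/
ssh backup@offsite.example.com 'zfs receive -F backuppool/offsite < /var/tmp/offsite.zstream && rm /var/tmp/offsite.zstream'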
Re: [zfs-discuss] Making ZIL faster
> From: Neil Perrin [mailto:neil.per...@oracle.com] > > In general - yes, but it really depends. Multiple synchronous writes of any > size > across multiple file systems will fan out across the log devices. That is > because there is a separate independent log chain for each file system. > > Also large synchronous writes (eg 1MB) within a specific file system will be > spread out. > The ZIL code will try to allocate a block to hold all the records it needs to > commit up to the largest block size - which currently for you should be 128KB. > Anything larger will allocate a new block - on a different device if there are > multiple devices. > > However, lots of small synchronous writes to the same file system might not > use more than one 128K block and benefit from multiple slog devices. That is an awesome explanation. Thank you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
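So, to actually benefit from that fan-out, the log devices should be added as independent (non-mirrored) top-level log devices, something like this (device names hypothetical):

# two separate slog devices - log blocks can spread across both
zpool add tank log c4t0d0 c4t1d0
# contrast with "zpool add tank log mirror c4t0d0 c4t1d0",
# which gives a single mirrored slog instead
zpool status tank    # the devices show up under a "logs" section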