Re: [zfs-discuss] Proposal: multiple copies of user data
Torrey McMahon wrote: Matthew Ahrens wrote: The problem that this feature attempts to address is when you have some data that is more important (and thus needs a higher level of redundancy) than other data. Of course in some situations you can use multiple pools, but that is antithetical to ZFS's pooled storage model. (You have to divide up your storage, you'll end up with stranded storage and bandwidth, etc.)

Can you expand? I can think of some examples where using multiple pools - even on the same host - is quite useful given the current feature set of the product. Or are you only discussing the specific case where a host would want more reliability for a certain set of data than another? If that's the case I'm still confused as to what failure cases would still allow you to retrieve your data if there are more than one copy in the fs or pool... but I'll gladly take some enlightenment. :)

(My apologies for the length of this response, I'll try to address most of the issues brought up recently...)

When I wrote this proposal, I was only seriously thinking about the case where you want different amounts of redundancy for different data. Perhaps because I failed to make this clear, discussion has concentrated on laptop reliability issues. It is true that there would be some benefit to using multiple copies on a single-disk (eg. laptop) pool, but of course it would not protect against the most common failure mode (whole disk failure).

One case where this feature would be useful is if you have a pool with no redundancy (ie. no mirroring or raid-z), because most of the data in the pool is not very important. However, the pool may have a bunch of disks in it (say, four). The administrator/user may realize (perhaps later on) that some of their data really *is* important and they would like some protection against losing it if a disk fails. They may not have the option of adding more disks to mirror all of their data (cost or physical space constraints may apply here). Their problem is solved by creating a new filesystem with copies=2 and putting the important data there. Now, if a disk fails, then the data in the copies=2 filesystem will not be lost. Approximately 1/4 of the data in other filesystems will be lost. (There is a small chance that some tiny fraction of the data in the copies=2 filesystem will still be lost if we were forced to put both copies on the disk that failed.)

Another plausible use case would be where you have some level of redundancy. Say you have a Thumper (X4500) with its 48 disks configured into 9 5-wide single-parity raid-z groups (with 3 spares). If a single disk fails, there will be no data loss. However, if two disks within the same raid-z group fail, data will be lost. In this scenario, imagine that this data loss probability is acceptable for most of the data stored here, but there is some extremely important data for which this is unacceptable. Rather than reconfiguring the entire pool for higher redundancy (say, double-parity raid-z) and less usable storage, you can simply create a filesystem with copies=2 within the raid-z storage pool. Data within that filesystem will not be lost even if any three disks fail.

I believe that these use cases, while not being extremely common, do occur. The extremely low amount of engineering effort required to implement the feature (modulo the space accounting issues) seems justified.

The fact that this feature does not solve all problems (eg, it is not intended to be a replacement for mirroring) is not a downside; not all features need to be used in all situations :-)

The real problem with this proposal is the confusion surrounding disk space accounting with copies>1. While the same issues are present when using compression, people are understandably less upset when files take up less space than expected. Given the current lack of interest in this feature, the effort required to address the space accounting issue does not seem justified at this time.

--matt ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
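For concreteness, the administrative model described in the proposal would look roughly like the following, assuming the proposed 'copies' property lands as an ordinary per-filesystem ZFS property. This is a sketch only; the pool layout, device names, and filesystem name are hypothetical:

# zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0   (single-parity raid-z, as in the example above)
# zfs create tank/important
# zfs set copies=2 tank/important    (extra ditto copies only for the data that matters)
# zfs get copies tank/important      (verify; everything else in the pool stays at copies=1)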
Re: [zfs-discuss] Proposal: multiple copies of user data
David Dyer-Bennet wrote: On 9/12/06, eric kustarz [EMAIL PROTECTED] wrote: So it seems to me that having this feature per-file is really useful. Say i have a presentation to give in Pleasanton, and the presentation lives on my single-disk laptop - I want all the meta-data and the actual presentation to be replicated. We already use ditto blocks for the meta-data. Now we could have an extra copy of the actual data. When i get back from the presentation i can turn off the extra copies.

Yes, you could do that. *I* would make a copy on a CD, which I would carry in a separate case from the laptop.

Do you backup the presentation to CD every time you make an edit? I think my presentation is a lot safer than your presentation. I'm sure both of our presentations would be equally safe as we would know not to have the only copy(ies) on our personage.

Similarly for your digital images example; I don't consider it safe until I have two or more *independent* copies. Two copies on a single hard drive doesn't come even close to passing the test for me; as many people have pointed out, those tend to fail all at once. And I will also point out that laptops get stolen a lot. And of course all the accidents involving fumble-fingers, OS bugs, and driver bugs won't be helped by the data duplication either. (Those will mostly be helped by sensible use of snapshots, though, which is another argument for ZFS on *any* disk you work on a lot.)

Well of course you would have a separate, independent copy if it really mattered.

The more I look at it the more I think that a second copy on the same disk doesn't protect against very much real-world risk. Am I wrong here? Are partial (small) disk corruptions more common than I think? I don't have a good statistical view of disk failures.

Well let's see - my friend accompanied me on a trip and saved her photos daily onto her laptop. Near the end of the trip her hard drive started having problems. The hard drive was not dead, as it was bootable and you could access certain data. Upon returning home she was able to retrieve some of her photos but not all. She would have been much happier having ZFS + copies. And yes, you could backup to CD/DVD every night, but it's a pain and people don't do it (as much as they should). Side note: it would have cost hundreds of dollars for data recovery to have just the *possibility* to get the other photos.

eric ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposal: multiple copies of user data
Torrey McMahon wrote: eric kustarz wrote: Matthew Ahrens wrote: Matthew Ahrens wrote: Here is a proposal for a new 'copies' property which would allow different levels of replication for different filesystems. Thanks everyone for your input. The problem that this feature attempts to address is when you have some data that is more important (and thus needs a higher level of redundancy) than other data. Of course in some situations you can use multiple pools, but that is antithetical to ZFS's pooled storage model. (You have to divide up your storage, you'll end up with stranded storage and bandwidth, etc.) Given the overwhelming criticism of this feature, I'm going to shelve it for now.

So it seems to me that having this feature per-file is really useful. Say i have a presentation to give in Pleasanton, and the presentation lives on my single-disk laptop - I want all the meta-data and the actual presentation to be replicated. We already use ditto blocks for the meta-data. Now we could have an extra copy of the actual data. When i get back from the presentation i can turn off the extra copies.

Under what failure modes would your data still be accessible? What things can go wrong that still allow you to access the data because some event has removed one copy but left the others?

Silent data corruption of one of the copies.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: Re: Proposal: multiple copies of user data
On 13/09/06, Matthew Ahrens [EMAIL PROTECTED] wrote: Dick Davies wrote: But they raise a lot of administrative issues Sure, especially if you choose to change the copies property on an existing filesystem. However, if you only set it at filesystem creation time (which is the recommended way), then it's pretty easy to address your issues: You're right, that would prevent getting into some nasty messes (I see this as closer to encryption than compression in that respect). I still feel we'd be doing the same job in several places. But I'm sure anyone who cares has a pretty good idea of my opinion, so I'll shut up now :) Thanks for taking the time to feedback on the feedback. -- Rasputin :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS imported simultaneously on 2 systems...
I think this is user error: the man page explicitly says: -f Forces import, even if the pool appears to be potentially active. and that's exactly what you did. If the behaviour had been the same without the -f option, I guess this would be a bug. HTH

Mathias F wrote: Hi, we are testing ZFS atm as a possible replacement for Veritas VM. While testing, we encountered a serious problem, which corrupted the whole filesystem. First we created a standard Raid10 with 4 disks.

NODE2:../# zpool create -f swimmingpool mirror c0t3d0 c0t11d0 mirror c0t4d0 c0t12d0
NODE2:../# zpool list
NAME          SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
swimmingpool  33.5G  81K   33.5G  0%   ONLINE  -
NODE2:../# zpool status
  pool: swimmingpool
 state: ONLINE
 scrub: none requested
config:
        NAME          STATE  READ WRITE CKSUM
        swimmingpool  ONLINE    0     0     0
          mirror      ONLINE    0     0     0
            c0t3d0    ONLINE    0     0     0
            c0t11d0   ONLINE    0     0     0
          mirror      ONLINE    0     0     0
            c0t4d0    ONLINE    0     0     0
            c0t12d0   ONLINE    0     0     0
errors: No known data errors

After that we made a new ZFS and copied a testing file onto it.

NODE2:../# zfs create swimmingpool/babe
NODE2:../# zfs list
NAME               USED   AVAIL  REFER  MOUNTPOINT
swimmingpool       108K   33.0G  25.5K  /swimmingpool
swimmingpool/babe  24.5K  33.0G  24.5K  /swimmingpool/babe
NODE2:../# cp /etc/hosts /swimmingpool/babe/

Now we test the behaviour of importing the ZFS on another system while it is still imported on the first one. The expected behaviour would be that ZFS couldn't be imported due to possible corruption, but instead it is imported just fine! We now were able to write simultaneously from both systems on the same ZFS:

NODE1:../# zpool import -f swimmingpool
NODE1:../# man man > /swimmingpool/babe/man
NODE2:../# cat /dev/urandom > /swimmingpool/babe/testfile
NODE1:../# cat /dev/urandom > /swimmingpool/babe/testfile2
NODE1:../# ls -l /swimmingpool/babe/
-r--r--r--  1 root  root        2194 Sep  8 14:31 hosts
-rw-r--r--  1 root  root       17531 Sep  8 14:52 man
-rw-r--r--  1 root  root  3830447920 Sep  8 16:20 testfile2
NODE2:../# ls -l /swimmingpool/babe/
-r--r--r--  1 root  root        2194 Sep  8 14:31 hosts
-rw-r--r--  1 root  root  3534355760 Sep  8 16:19 testfile

This can't be supposed to be the normal behaviour. Did we encounter a bug or is this still under development?

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Michael Schuster +49 89 46008-2974 / x62974 visit the online support center: http://www.sun.com/osc/ Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Memory Usage
On 9/13/06, Thomas Burns [EMAIL PROTECTED] wrote: BTW -- did I guess right wrt where I need to set arc.c_max (/etc/system)?

I think you need to use mdb. As Mark and Johansen mentioned, only do this as your last resort.

# mdb -kw
> arc::print -a c_max
d3b0f874 c_max = 0x1d0fe800
> d3b0f874 /W 0x1000
arc+0x34: 0x1d0fe800 = 0x1000
> arc::print -a c_max
d3b0f874 c_max = 0x1000
> $q

-- Just me, Wire ... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Well, we are using the -f parameter to test failover functionality. If one system with mounted ZFS is down, we have to use the force to mount it on the failover system. But when the failed system comes online again, it remounts the ZFS without errors, so it is mounted simultaneously on both nodes. That's the real problem we have :[

Regards, Mathias

I think this is user error: the man page explicitly says: -f Forces import, even if the pool appears to be potentially active. and that's exactly what you did. If the behaviour had been the same without the -f option, I guess this would be a bug. HTH

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Mathias F wrote: Well, we are using the -f parameter to test failover functionality. If one system with mounted ZFS is down, we have to use the force to mount it on the failover system. But when the failed system comes online again, it remounts the ZFS without errors, so it is mounted simultaneously on both nodes.

ZFS currently doesn't support this, I'm sorry to say. *You* have to make sure that a zpool is not imported on more than one node at a time.

regards -- Michael Schuster +49 89 46008-2974 / x62974 visit the online support center: http://www.sun.com/osc/ Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On Wed, Sep 13, 2006 at 12:28:23PM +0200, Michael Schuster wrote: Mathias F wrote: Well, we are using the -f parameter to test failover functionality. If one system with mounted ZFS is down, we have to use the force to mount it on the failover system. But when the failed system comes online again, it remounts the ZFS without errors, so it is mounted simultaneously on both nodes.

This is used on a regular basis within cluster frameworks...

ZFS currently doesn't support this, I'm sorry to say. *You* have to make sure that a zpool is not imported on more than one node at a time.

Why not use real cluster software to be that *You*, taking care of using resources like a filesystem (ufs, zfs, others...) in a consistent way? I think ZFS does enough to make sure you don't accidentally use filesystems/pools from more than one host at a time. If you want more, please consider using a cluster framework with heartbeats and all that great stuff ...

Regards, Thomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] 'zfs mirror as backup' status?
Since we were just talking about resilience on laptops, I wondered if there had been any progress in sorting out some of the glitches that were involved in: http://www.opensolaris.org/jive/thread.jspa?messageID=25144#25144 ? -- Rasputin :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Without the -f option, the ZFS pool can't be imported while it is reserved for the other host, even if that host is down. As I said, we are testing ZFS as a replacement for VxVM, which we are using atm. So as a result our tests have failed and we have to keep on using Veritas. Thanks for all your answers. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposal: multiple copies of user data
On 9/13/06, Richard Elling [EMAIL PROTECTED] wrote: * Mirroring offers slightly better redundancy, because one disk from each mirror can fail without data loss.

Is this use of slightly based upon disk failure modes? That is, when disks fail do they tend to get isolated areas of badness compared to complete loss? I would suggest that complete loss should include someone tripping over the power cord to the external array that houses the disk.

The field data I have says that complete disk failures are the exception. I hate to leave this as a teaser, I'll expand my comments later. BTW, this feature will be very welcome on my laptop! I can't wait :-)

On servers and stationary desktops, I just don't care whether it is a whole disk failure or a few bad blocks. In that case I have the resources to mirror, RAID5, perform daily backups, etc. The laptop disk failures that I have seen have typically been limited to a few bad blocks. As Torrey McMahon mentioned, they tend to start out with some warning signs followed by a full failure. I would *really* like to have that window between warning signs and full failure as my opportunity to back up my data and replace my non-redundant hard drive with no data loss.

The only part of the proposal I don't like is space accounting. Double or triple charging for data will only confuse those apps and users that check for free space or block usage. If this is worked out, it would be a great feature for those times when mirroring just isn't an option.

Mike -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposal: multiple copies of user data
On 9/13/06, Mike Gerdts [EMAIL PROTECTED] wrote: The only part of the proposal I don't like is space accounting. Double or triple charging for data will only confuse those apps and users that check for free space or block usage. Why exactly isn't reporting the free space divided by the copies value on that particular file system an easy solution for this? Did I miss something? Tobias ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Mathias F wrote: Without -f option, the ZFS can't be imported while reserved for the other host, even if that host is down. As I said, we are testing ZFS as a replacement for VxVM, which we are using atm. So as a result our tests have failed and we have to keep on using Veritas. Thanks for all your answers.

I think I get the whole picture, let me summarise:
- you create a pool P and an FS on host A
- Host A crashes
- you import P on host B; this only works with -f, as zpool import otherwise refuses to do so.
- now P is imported on B
- host A comes back up and re-accesses P, thereby leading to (potential) corruption.
- your hope was that when host A comes back, there exists a mechanism for telling it you need to re-import.
- Vxvm, as you currently use it, has this functionality

Is that correct? regards -- Michael Schuster +49 89 46008-2974 / x62974 visit the online support center: http://www.sun.com/osc/ Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Mathias F wrote: Without -f option, the ZFS can't be imported while reserved for the other host, even if that host is down. This is the correct behaviour. What do you want to cause? data corruption? As I said, we are testing ZFS as a [b]replacement for VxVM[/b], which we are using atm. So as a result our tests have failed and we have to keep on using Veritas. As I understand things, SunCluster 3.2 is expected to have support for HA-ZFS and until that version is released you will not be running in a supported configuration and so any errors you encounter are *your fault alone*. Didn't we have the PMC (poor man's cluster) talk last week as well? James C. McPherson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS API (again!), need quotactl(7I)
On 13/09/2006, at 2:29 AM, Eric Schrock wrote: On Tue, Sep 12, 2006 at 07:23:00AM -0400, Jeff A. Earickson wrote: Modify the dovecot IMAP server so that it can get zfs quota information to be able to implement the QUOTA feature of the IMAP protocol (RFC 2087). In this case pull the zfs quota numbers for the quota'd home directory/zfs filesystem. Just like what quotactl() would do with UFS. I am really surprised that there is no zfslib API to query/set zfs filesystem properties. Doing a fork/exec just to execute a zfs get or zfs set is expensive and inelegant.

The libzfs API will be made public at some point. However, we need to finish implementing the bulk of our planned features before we can feel comfortable with the interfaces. It will take a non-trivial amount of work to clean up all the interfaces as well as document them. It will be done eventually, but I wouldn't expect it any time soon - there are simply too many important things to get done first. If you don't care about unstable interfaces, you're welcome to use them as-is. If you want a stable interface, you are correct that the only way is through invoking 'zfs get' and 'zfs set'.

I'm sure I'm missing something, but is there some reason that statvfs() is not good enough?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
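For the simple free/used numbers at least, statvfs() already reflects ZFS quotas: df run against a quota'd dataset reports the quota as the filesystem size. A quick illustration - the dataset name, quota value, and output figures below are made up:

# zfs set quota=1g tank/home/jeff
# df -k /tank/home/jeff
Filesystem            kbytes    used   avail capacity  Mounted on
tank/home/jeff       1048576  204800  843776      20%  /tank/home/jeff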
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Hi Mathias, Mathias F wrote: Without -f option, the ZFS can't be imported while reserved for the other host, even if that host is down. As I said, we are testing ZFS as a [b]replacement for VxVM[/b], which we are using atm. So as a result our tests have failed and we have to keep on using Veritas. Sun Cluster 3.2, which is in beta at the moment, will allow you to do this automatically. I don't think what you are trying to do here will be supportable unless it's managed by SC3.2. Let me know if you'd like to try out the SC3.2 beta. Thanks, Zoram -- Zoram Thanga::Sun Cluster Development::http://blogs.sun.com/zoram ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposal: multiple copies of user data
On Tue, 12 Sep 2006, Matthew Ahrens wrote: Torrey McMahon wrote: Matthew Ahrens wrote: The problem that this feature attempts to address is when you have some data that is more important (and thus needs a higher level of redundancy) than other data. Of course in some situations you can use multiple pools, but that is antithetical to ZFS's pooled storage model. (You have to divide up your storage, you'll end up with stranded storage and bandwidth, etc.)

Can you expand? I can think of some examples where using multiple pools - even on the same host - is quite useful given the current feature set of the product. Or are you only discussing the specific case where a host would want more reliability for a certain set of data than another? If that's the case I'm still confused as to what failure cases would still allow you to retrieve your data if there are more than one copy in the fs or pool... but I'll gladly take some enlightenment. :)

(My apologies for the length of this response, I'll try to address most of the issues brought up recently...) When I wrote this proposal, I was only seriously thinking about the case where you want different amounts of redundancy for different data. Perhaps because I failed to make this clear, discussion has concentrated on laptop reliability issues. It is true that there would be some benefit to using multiple copies on a single-disk (eg. laptop) pool, but of course it would not protect against the most common failure mode (whole disk failure).

... lots of Good Stuff elided

Soon Samsung will release a 100% flash memory based drive (32Gb) in a laptop form factor. But flash memory chips have a limited number of write cycles available, and when exceeded, this usually results in data corruption. Some people have already encountered this issue with USB thumb drives. It's especially annoying if you were using the thumb drive as what you thought was a 100% _reliable_ backup mechanism. This is a perfect application for ZFS copies=2. Also, consider that there is no time penalty for positioning the heads on a flash drive.

So now you would have 2 options in a laptop type application with a single flash based drive (a command sketch follows this message):
a) create a mirrored pool using 2 slices - expensive in terms of storage utilization
b) create a pool with no redundancy, then create a filesystem called importantPresentationData within that pool with copies=2 (or more).

Matthew - build it and they will come!

Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005 OpenSolaris Governing Board (OGB) Member - Feb 2006 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
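For option (a) in Al's list, a minimal sketch of a single-device mirror built from two slices follows. Device and slice names are hypothetical, and the usual caveat applies: this guards against localized corruption and worn-out flash blocks, not loss of the whole device.

# format                                   (carve the flash device into two equal slices, say s3 and s4)
# zpool create flashpool mirror c1t0d0s3 c1t0d0s4
# zfs create flashpool/importantPresentationData

Option (b) would instead build an unreplicated pool on the whole device and, if the proposal ships, set copies=2 only on the one filesystem that needs it.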
Re: [zfs-discuss] Re: Re: ZFS imported simultaneously on 2 systems...
Mathias F wrote: I think I get the whole picture, let me summarise:
- you create a pool P and an FS on host A
- Host A crashes
- you import P on host B; this only works with -f, as zpool import otherwise refuses to do so.
- now P is imported on B
- host A comes back up and re-accesses P, thereby leading to (potential) corruption.
- your hope was that when host A comes back, there exists a mechanism for telling it you need to re-import.
- Vxvm, as you currently use it, has this functionality
Is that correct?

Yes it is, you got it ;) VxVM just notices that its previously imported DiskGroup(s) (for ZFS this is the Pool) were failed over and doesn't try to re-acquire them. It waits for an admin action. The topic of clustering ZFS is not the problem atm, we just test the failover behaviour manually.

Well, I think nevertheless you'll have to wait for SunCluster 3.2 for this to work. As others have said, ZFS, as it currently is, is not made to work as you expect it to.

regards -- Michael Schuster +49 89 46008-2974 / x62974 visit the online support center: http://www.sun.com/osc/ Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: ZFS imported simultaneously on 2 systems...
Mathias F wrote: ... Yes it is, you got it ;) VxVM just notices that it's previously imported DiskGroup(s) (for ZFS this is the Pool) were failed over and doesn't try to re-acquire them. It waits for an admin action. The topic of clustering ZFS is not the problem atm, we just test the failover behaviour manually. Actually, this is the entirety of the problem: you are expecting a product which is *not* currently multi-host-aware to behave in the same safe manner as one which is. *AND* you're doing so knowing that you are outside of the protection of a clustering framework. WHY? What valid tests do you think you are going to be able to run? Wait for the SunCluster 3.2 release (or the beta). Don't faff around with a data-killing test suite in an unsupported configuration. James C. McPherson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
James C. McPherson wrote: As I understand things, SunCluster 3.2 is expected to have support for HA-ZFS and until that version is released you will not be running in a supported configuration and so any errors you encounter are *your fault alone*.

Still, after reading Mathias's description, it seems that the former node is doing an implicit forced import when it boots back up. This seems wrong to me. zpools should be imported only if the zpool itself says it's not already taken, which of course would be overridden by a manual -f import.

zpool: sorry, i already have a boyfriend, host b
host a: darn, ok, maybe next time

rather than the current scenario:

zpool: host a, I'm over you now. host b is now the man in my life!
host a: I don't care! you're coming with me anyways. you'll always be mine!
* host a stuffs zpool into the car and drives off

...and we know those situations never turn out particularly well.

/dale ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs receive kernel panics the machine
Hi, I'm running some experiments with zfs send and receive on Solaris 10u2 between two different machines. On server 1 I have the following data/zones/app1838M 26.5G 836M /zones/app1 data/zones/[EMAIL PROTECTED] 2.35M - 832M - I have a script that creates a new snapshot and sends the diff to the other machine. When I do a zfs receive on the other side the machine kernel panics (see below for the panic). I've done a zpool scrub to make sure the pool is ok (no errors found) and I now wonder what steps I can take to stop this from happening. cheers, Nickus panic[cpu0]/thread=30002033020: BAD TRAP: type=31 rp=2a101067030 addr=0 mmu_fsr=0 occurred in module SUNW,UltraSPARC-IIe due to a NULL pointer dereference zfs: trap type = 0x31 pid=615, pc=0x11efa24, sp=0x2a1010668d1, tstate=0x4480001602, context=0x4cd g1-g7: 7ba9a3a4, 0, 1864400, 0, , 10, 30002033020 02a101066d50 unix:die+78 (31, 2a101067030, 0, 0, 2a101066e10, 1075000) %l0-3: c080 0031 0100 2000 %l4-7: 0181a010 0181a000 004480001602 02a101066e30 unix:trap+8fc (2a101067030, 5, 1fff, 1c00, 0, 1) %l0-3: 030004664780 0031 %l4-7: e000 0200 0001 0005 02a101066f80 unix:ktl0+48 (7, 0, 18a4800, 30007998a00, 30007998a00, 180c000) %l0-3: 0003 1400 004480001602 01019840 %l4-7: 0300020f4200 0003 02a101067030 02a1010670d0 SUNW,UltraSPARC-IIe:bcopy+1554 (fcfff8667600, 30007998a00, 0, 140, 1, 72bb1) %l0-3: 0001 03000799c648 0008 0300020faab0 %l4-7: 0002 01f8 02a1010672d0 zfs:zfsctl_ops_root+b75c8d0 (30007996f40, 30003e82860, , 3000799c5d8, 3000799c590, 2) %l0-3: 03000799c538 434b 030001a25500 %l4-7: 0001 0020 0002 030007996ff0 02a101067380 zfs:dnode_reallocate+150 (10e, 13, 3000799c538, 10e, 0, 30003e82860) %l0-3: 7bada800 0011 03000799c590 0200 %l4-7: 0020 030007996f40 030007996f40 0013 02a101067430 zfs:dmu_object_reclaim+80 (0, 0, 13, 200, 11, 7bada400) %l0-3: 0008 0007 0001 1af0 %l4-7: 03072b00 1aef 030003e82860 02a1010674f0 zfs:restore_object+1b8 (2a101067710, 300038da6c8, 2a1010676c8, 11, 30003e82860, 200) %l0-3: 0002 010e 0010 %l4-7: 4a004000 0004 010e 02a1010675b0 zfs:dmu_recvbackup+608 (300036b7a00, 300036b7cd8, 300036b7b30, 300075159c0, 1, 0) %l0-3: 0040 02a101067710 0138 030004664780 %l4-7: 0002f5bacbac 0200 0001 02a101067770 zfs:zfs_ioc_recvbackup+38 (300036b7000, 0, 0, 0, 9, 0) %l0-3: 0004 0064 %l4-7: 0300036b700f 0031 02a101067820 zfs:zfsdev_ioctl+160 (70336c00, 5d, ffbfee40, 1f, 7c, e68) %l0-3: 0300036b7000 007c %l4-7: 7bacd668 703371e0 02e8 70336ef8 02a1010678d0 genunix:___const_seg_90212+1c60c (30006705600, 5a1f, ffbfee40, 13, 300046d9148, 11f86c8) %l0-3: 030004be2200 030004be2200 0004 030004664780 %l4-7: 0003 0001 018a5c00 02a101067990 genunix:ioctl+184 (4, 3000438c9a0, ffbfee40, ff38db68, 40350, 5a1f) %l0-3: 0004 14da %l4-7: 0001 syncing file systems... 2 1 done skipping system dump - no dump device configured rebooting... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Re: ZFS imported simultaneously on 2 systems...
[...] a product which is *not* currently multi-host-aware to behave in the same safe manner as one which is.

That's the point we figured out while testing it ;) I just wanted to have our thoughts reviewed by other ZFS users. Our next step, IF the failover had succeeded, would have been to create a little ZFS agent for a VCS testing cluster. We haven't used Sun Cluster and won't use it in the future.

regards Mathias This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
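To illustrate what such a VCS-style agent's entry points might boil down to, here is a minimal sketch. The pool name and script layout are hypothetical; the critical assumption is that the cluster framework, not ZFS, guarantees the pool is no longer active on the peer before "online" is ever called.

#!/bin/sh
# Hypothetical failover agent sketch for a ZFS pool resource.
POOL=swimmingpool
case "$1" in
  online)  zpool import -f "$POOL" ;;              # peer must already be fenced/down
  offline) zpool export "$POOL" ;;                 # hand the pool back cleanly
  monitor) zpool list "$POOL" > /dev/null 2>&1 ;;  # exit 0 if the pool is imported here
esac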
Re[2]: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Hello Frank, Tuesday, September 12, 2006, 9:41:05 PM, you wrote:

FC It would be interesting to have a zfs enabled HBA to offload the checksum
FC and parity calculations. How much of zfs would such an HBA have to
FC understand?

That won't be end-to-end checksumming anymore, right? That way you might as well disable ZFS checksumming entirely and rely only on HW RAID.

-- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re[2]: [zfs-discuss] Re: Re: ZFS forces system to paging to the point it is
Hello Philippe, It was recommended to lower ncsize and I did (to the default of ~128K). So far it has worked ok for the last few days, staying at about 1GB free ram (fluctuating between 900MB and 1.4GB). Do you think it's a long-term solution, or could the problem surface again with more load and more data even with the current ncsize value? -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Memory Usage
Hello Thomas, Tuesday, September 12, 2006, 7:40:25 PM, you wrote:

TB Hi,
TB We have been using zfs for a couple of months now, and, overall, really
TB like it. However, we have run into a major problem -- zfs's memory
TB requirements
TB crowd out our primary application. Ultimately, we have to reboot the
TB machine
TB so there is enough free memory to start the application.

What bad behavior exactly did you notice? In general, if an app needs memory, ZFS should free it - however, that doesn't always work well right now.

TB What I would like is:
TB 1) A way to limit the size of the cache (a gig or two would be fine
TB for us)

You can't.

TB 2) A way to clear the caches -- hopefully, something faster than
TB rebooting
TB the machine.

Export/import the pool. Alternatively, export the pool and unload the zfs module.

-- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
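Spelled out, the workaround Robert describes might look like the following. The pool name is hypothetical; unloading the module can only succeed when no pools or ZFS filesystems remain in use, and the cache gets repopulated once you reimport.

# zpool export tank
# modinfo | grep zfs       (note the module id in the first column)
# modunload -i <id>
# zpool import tank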
[zfs-discuss] Re: Re[2]: System hang caused by a bad snapshot
Hello Matthew, Tuesday, September 12, 2006, 7:57:45 PM, you wrote:

MA Ben Miller wrote: I had a strange ZFS problem this morning. The entire system would hang when mounting the ZFS filesystems. After trial and error I determined that the problem was with one of the 2500 ZFS filesystems. When mounting that user's home the system would hang and need to be rebooted. After I removed the snapshots (9 of them) for that filesystem everything was fine. I don't know how to reproduce this and didn't get a crash dump. I don't remember seeing anything about this before so I wanted to report it and see if anyone has any ideas.

MA Hmm, that sounds pretty bizarre, since I don't think that mounting a
MA filesystem really interacts with snapshots at all.

MA Unfortunately, I don't think we'll be able to diagnose this without a
MA crash dump or reproducibility. If it happens again, force a crash dump
MA while the system is hung and we can take a look at it.

Maybe it wasn't hung after all. I've seen similar behavior here sometimes. Were the disks used in the pool actually working?

There was lots of activity on the disks (iostat and status LEDs) until it got to this one filesystem and everything stopped. 'zpool iostat 5' stopped running, the shell wouldn't respond and activity on the disks stopped. This fs is relatively small (175M used of a 512M quota).

Sometimes it takes a lot of time (30-50 minutes) to mount a file system - it's rare, but it happens. And during this ZFS reads from those disks in the pool. I did report it here some time ago.

In my case the system crashed during the evening and it was left hung up when I came in during the morning, so it was hung for a good 9-10 hours.

Ben This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
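For reference, capturing that dump on a hung SPARC box generally means breaking to the OBP from the console and forcing a panic. A sketch, assuming console access and a dump device already configured:

# dumpadm                    (beforehand: confirm the dump device and savecore directory)
  ... send a break (Stop-A, or ~# on a serial console) when the hang occurs ...
ok sync                      (forces a panic, writes the crash dump, then reboots)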
[zfs-discuss] when zfs enabled java
My customer is running java on a ZFS file system. His platform is Solaris 10 x86 on an SF X4200. When he enabled ZFS, his memory of 18 gigs drops to 2 gigs rather quickly. I had him do a # ps -e -o pid,vsz,comm | sort -n +1 and it came back (the culprit application you see is java):

507 89464 /usr/bin/postmaster
515 89944 /usr/bin/postmaster
517 91136 /usr/bin/postmaster
508 96444 /usr/bin/postmaster
516 98088 /usr/bin/postmaster
503 3449580 /usr/jre1.5.0_07/bin/amd64/java
512 3732468 /usr/jre1.5.0_07/bin/amd64/java

Here is what the customer responded: Well, Java is a memory hog, but it's not the leak -- it's the application. Even after it fails due to lack of memory, the memory is not reclaimed and we can no longer restart it.

Is there a bug on zfs? I did not find one in sunsolve but then again I might have been searching for the wrong thing. We have done some sleuth work and are starting to think our problem might be ZFS -- the new file system Sun supports. The documentation for ZFS states that it tries to cache as much as it can, and it uses kernel memory for the cache. That would explain memory gradually disappearing. ZFS can give memory back, but it does not do so quickly. So, is there any way to check that? If it turns out to be the problem...

1) Is there a way to limit the size of ZFS's caches? If not, then
2) Is there a way to clear ZFS's cache? If not, then
3) Is there a way to force the Java VM to take a certain amount of memory on startup and never give it back? Xms does not appear to work.

Thanks, Jill === S U N M I C R O S Y S T E M S I N C. Jill Manfield - TSE-Alternate Platform Team email: [EMAIL PROTECTED] phone: (800)USA-4SUN (Reference your case #) address: 1617 Southwood Drive Nashua, NH 03063 mailstop: NSH-01-B287 Mgr: Dave O'Connor: [EMAIL PROTECTED] Submit, View and Update tickets at http://www.sun.com/service/online = ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Snapshots and backing store
Hi, There's something really bizarre in the ZFS snapshot specs: "Uses no separate backing store."

Hum... if I want to share one physical volume somewhere in my SAN as THE snapshot backing store... it becomes impossible to do! Really bad. Is there any chance of having a backing-store-file option in a future release?

In the same vein, it would be great to have some sort of property to add a disk/LUN/physical space to a pool, reserved only for the backing store. Right now, the only thing I can see to prevent users from using my backing-store space for their own usage is to set quotas.

Nico This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Snapshots and backing store
On Wed, Sep 13, 2006 at 07:38:22AM -0700, Nicolas Dorfsman wrote: There's something really bizarre in the ZFS snapshot specs: "Uses no separate backing store."

It's not at all bizarre once you understand how ZFS works. I'd suggest reading through some of the documentation available at http://www.opensolaris.org/os/community/zfs/docs/ , in particular the slides available there.

Hum... if I want to share one physical volume somewhere in my SAN as THE snapshot backing store... it becomes impossible to do! Really bad. Is there any chance of having a backing-store-file option in a future release?

Doing this would have a significant hit on performance if nothing else. Currently when you do a write to a volume which is snapshotted the system has to:
1) Write the new data
(Yes, that's it - one step. OK, so I'm ignoring metadata, but...)
If there was a dedicated backing store, this would change to:
1) Read the old data
2) Write the old data to the backing store
3) Write the new data
4) Free the old data (ok, so that's metadata only, but hey)

ZFS isn't copy-on-write in the same way that things like ufssnap are. ufssnap is copy-on-write in that when you write something, it copies out the old data and writes it somewhere else (the backing store). ZFS doesn't need to do this - it simply writes the new data to a new location, and leaves the old data where it is. If that old data is needed for a snapshot then it's left unchanged, if it's not then it's freed.

Scott ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
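A quick way to see that no separate backing store is involved: the snapshot appears instantly and accounts for no space of its own until live data diverges from it. The dataset names and figures below are illustrative only:

# zfs snapshot tank/home@before
# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
tank/home         1.00G  32.0G  1.00G  /tank/home
tank/home@before      0      -  1.00G  -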
Re: [zfs-discuss] Re: Re: ZFS forces system to paging to the point it is
Robert Milkowski wrote: Hello Philippe, It was recommended to lower ncsize and I did (to the default of ~128K). So far it works ok for the last few days, staying at about 1GB free ram (fluctuating between 900MB and 1.4GB). Do you think it's a long term solution or could the problem surface again with more load and more data even with the current ncsize value?

Robert, I don't think this should be impacted too much by load/data; as long as the DNLC is able to evict, you should be in good shape. We are still working on a fix for the root cause of this issue, however. -Mark ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Snapshots and backing store
Nicolas Dorfsman wrote: Hi, There's something really bizarre in ZFS snaphot specs : Uses no separate backing store. . Hum...if I want to mutualize one physical volume somewhere in my SAN as THE snaphots backing-store...it becomes impossible to do ! Really bad. Is there any chance to have a backing-store-file option in a future release ? In the same idea, it would be great to have some sort of propertie to add a disk/LUN/physical_space to a pool, only reserved to backing-store. At now, the only thing I see to disallow users to use my backing-store space for their usage is to put quota. If you want to copy your filesystems (or snapshots) to other disks, you can use 'zfs send' to send them to a different pool (which may even be on a different machine!). --matt ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
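A minimal sketch of that approach; the pool, dataset, snapshot, and host names are all hypothetical:

# zfs snapshot tank/data@backup1
# zfs send tank/data@backup1 | zfs receive backuppool/data                            (to another local pool)
# zfs send tank/data@backup1 | ssh otherhost zfs receive backuppool/data              (to another machine)
# zfs send -i backup1 tank/data@backup2 | ssh otherhost zfs receive backuppool/data   (later, incremental)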
[zfs-discuss] Re: Re: Bizarre problem with ZFS filesystem
I ran the DTrace script and the resulting output is rather large (1 million lines and 65MB), so I won't burden this forum with that much data. Here are the top 100 lines from the DTrace output. Let me know if you need the full output and I'll figure out a way for the group to get it. dtrace: description 'fbt:zfs::' matched 2404 probes CPU FUNCTION 520 - zfs_lookup 2929705866442880 520- zfs_zaccess 2929705866448160 520 - zfs_zaccess_common 2929705866451840 520- zfs_acl_node_read2929705866455040 520 - zfs_acl_node_read_internal 2929705866458400 520- zfs_acl_alloc2929705866461040 520- zfs_acl_alloc2929705866462880 520 - zfs_acl_node_read_internal 2929705866464080 520- zfs_acl_node_read2929705866465600 520- zfs_ace_access 2929705866467760 520- zfs_ace_access 2929705866468880 520- zfs_ace_access 2929705866469520 520- zfs_ace_access 2929705866470320 520- zfs_acl_free 2929705866471920 520- zfs_acl_free 2929705866472960 520 - zfs_zaccess_common 2929705866474720 520- zfs_zaccess 2929705866476320 520- zfs_dirlook 2929705866478320 520 - zfs_dirent_lock2929705866480880 520 - zfs_dirent_lock2929705866486560 520 - zfs_dirent_unlock 2929705866489840 520 - zfs_dirent_unlock 2929705866491600 520- zfs_dirlook 2929705866492560 520 - zfs_lookup 2929705866494080 520 - zfs_getattr2929705866499360 520- dmu_object_size_from_db 2929705866503520 520- dmu_object_size_from_db 2929705866507920 520 - zfs_getattr2929705866509280 520 - zfs_lookup 2929705866520400 520- zfs_zaccess 2929705866521200 520 - zfs_zaccess_common 2929705866521920 520- zfs_acl_node_read2929705866523280 520 - zfs_acl_node_read_internal 2929705866524800 520- zfs_acl_alloc2929705866526000 520- zfs_acl_alloc2929705866526800 520 - zfs_acl_node_read_internal 2929705866527280 520- zfs_acl_node_read2929705866528160 520- zfs_ace_access 2929705866528720 520- zfs_ace_access 2929705866529280 520- zfs_ace_access 2929705866529920 520- zfs_ace_access 2929705866530800 520- zfs_acl_free 2929705866531360 520- zfs_acl_free 2929705866531920 520 - zfs_zaccess_common 2929705866532560 520- zfs_zaccess 2929705866533440 520- zfs_dirlook 2929705866534000 520 - zfs_dirent_lock2929705866534640 520 - zfs_dirent_lock2929705866535600 520 - zfs_dirent_unlock 2929705866536480 520 - zfs_dirent_unlock 2929705866537120 520- zfs_dirlook 2929705866537760 520 - zfs_lookup 2929705866538400 520 - zfs_getsecattr 2929705866543600 520- zfs_getacl 2929705866546240 520 - zfs_zaccess2929705866546960 520- zfs_zaccess_common 2929705866547680 520 - zfs_acl_node_read 2929705866548720 520- zfs_acl_node_read_internal 2929705866549440 520 - zfs_acl_alloc 2929705866550080 520 - zfs_acl_alloc 2929705866550720 520- zfs_acl_node_read_internal 2929705866551600 520 - zfs_acl_node_read 2929705866552160 520 - zfs_ace_access 2929705866552720 520 - zfs_ace_access 2929705866553280 520 - zfs_ace_access 2929705866554160 520 - zfs_ace_access 2929705866554720 520 - zfs_ace_access 2929705866555600 520 - zfs_ace_access 2929705866556160 520 - zfs_ace_access 2929705866557040 520 - zfs_ace_access 2929705866557600 520 - zfs_ace_access 2929705866558160 520 - zfs_ace_access 2929705866558720 520 - zfs_ace_access 2929705866559760 520 - zfs_ace_access
Re: [zfs-discuss] Proposal: multiple copies of user data
On Wed, 2006-09-13 at 02:30, Richard Elling wrote: The field data I have says that complete disk failures are the exception. I hate to leave this as a teaser, I'll expand my comments later.

That matches my anecdotal experience with laptop drives; maybe I'm just lucky, or maybe I'm just paying more attention than most to the sounds they start to make when they're having a bad hair day, but so far they've always given *me* significant advance warning of impending doom, generally by failing to read a bunch of disk sectors.

That said, I think the best use case for the copies > 1 config would be in systems with exactly two disks -- which covers most of the 1U boxes out there.

One question for Matt: when ditto blocks are used with raidz1, how well does this handle the case where you encounter one or more single-sector read errors on other drive(s) while reconstructing a failed drive? For a concrete example:

A0 B0 C0 D0 P0
A1 B1 C1 D1 P1

(A0==A1, B0==B1, ...; A^B^C^D==P)

Does the current implementation of raidz + ditto blocks cope with the case where all of A, C0, and D1 are unavailable?

- Bill ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
It would be interesting to have a zfs enabled HBA to offload the checksum and parity calculations. How much of zfs would such an HBA have to understand? That's an interesting question. For parity, it's actually pretty easy. One can envision an HBA which took a group of related write commands and computed the parity on the fly, using it for a final write command. This would, however, probably limit the size of a block that could be written to whatever amount of memory was available for buffering on the HBA. (Of course, memory is relatively cheap these days, but it's still not free, so the HBA might have only a few megabytes.) The checksum is more difficult. If you're willing to delay writing an indirect block until all of its children have been written [*], then we can just compute the checksum for each block as it goes out, and that's easy [**] -- easier than the parity, in fact, since there's no buffering required beyond the checksum itself. ZFS in fact does delay this write at present. However, I've argued in the past that ZFS shouldn't delay it, but should write indirect blocks in parallel with the data blocks. It would be interesting to determine whether the performance improvement of doing checksums on the HBA would outweigh the potential benefit of writing indirect blocks in parallel. Maybe it would for larger writes. Anyone got an FPGA programmer and an open-source SATA implementation? :-) (Unfortunately storage protocols have a complex analog side, and except for 1394, I'm not aware of any implementations that separate the digital/analog, which makes prototyping a lot harder, at least without much more detailed documentation on the controllers than you're likely to find.) -- Anton [*] Actually, you don't need to delay until the writes have made it to disk, but since you want to compute the checksum as the data goes out to the disk rather than making a second pass over it, you'd need to wait until the data has at least been sent to the drive cache. [**] For SCSI and FC, there's added complexity in that the drives can request data out-of-order. You can disable this but at the cost of some performance on high-end drives. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: when zfs enabled java
Jill Manfield writes: My customer is running java on a ZFS file system. His platform is Soalris 10 x86 SF X4200. When he enabled ZFS his memory of 18 gigs drops to 2 gigs rather quickly. I had him do a # ps -e -o pid,vsz,comm | sort -n +1 and it came back: The culprit application you see is java: 507 89464 /usr/bin/postmaster 515 89944 /usr/bin/postmaster 517 91136 /usr/bin/postmaster 508 96444 /usr/bin/postmaster 516 98088 /usr/bin/postmaster 503 3449580 /usr/jre1.5.0_07/bin/amd64/java 512 3732468 /usr/jre1.5.0_07/bin/amd64/java Here is what the customer responded: Well, Java's is a memory hog, but it's not the leak -- it's the application. Even after it fails due to lack of memory, the memory is not reclaimed and we can no longer restart it. Is there a bug on zfs? I did not find one in sunsolve but then again I might have been searching the wrong thing. Assuming you run S10U2, you may be hit by this one: 4034947 anon_swap_adjust(), anon_resvmem() should call kmem_reap() if availrmem is low. Fixed in snv_42. It would show up as bad return code from either of the above function when java fails to startup. -r ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Snapshots and backing store
Well.

ZFS isn't copy-on-write in the same way that things like ufssnap are. ufssnap is copy-on-write in that when you write something, it copies out the old data and writes it somewhere else (the backing store). ZFS doesn't need to do this - it simply writes the new data to a new location, and leaves the old data where it is. If that old data is needed for a snapshot then it's left unchanged, if it's not then it's freed.

We need to think of ZFS as ZFS, and not just as a new filesystem! I mean, the whole concept is different. So, what could be the best architecture?

With UFS, I used to have separate metadevices/LUNs for each application. With ZFS, I thought it would be nice to use a separate pool for each application. But that means either multiplying the snapshot backing store, OR dynamically removing/adding that space/LUN to whichever pool we need to back up. Knowing that I can't serialize backups, my only option is to multiply the reservations for backing stores. Ugh!

Another option would be to create a single pool and put all applications in it... I don't think of this as a solution.

Any suggestion? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Snapshots and backing store
If you want to copy your filesystems (or snapshots) to other disks, you can use 'zfs send' to send them to a different pool (which may even be on a different machine!).

Oh no! That means copying the whole filesystem. The goal here is definitely to snapshot the filesystem and then back up the snapshot.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On September 13, 2006 6:09:50 AM -0700 Mathias F [EMAIL PROTECTED] wrote: [...] a product which is *not* currently multi-host-aware to behave in the same safe manner as one which is. That`s the point we figured out while testing it ;) I just wanted to have our thoughts reviewed by other ZFS users. Our next steps IF the failover would have succeeded would be to create a little ZFS-agent for a VCS testing cluster. We haven't used Sun Cluster and won't use it in future. /etc/zfs/zpool.cache is used at boot time to find what pools to import. Remove it when the system boots and after it goes down and comes back up it won't import any pools. Not quite the same as not importing if they are imported elsewhere, but perhaps close enough for you. On September 13, 2006 10:15:28 PM +1000 James C. McPherson [EMAIL PROTECTED] wrote: As I understand things, SunCluster 3.2 is expected to have support for HA-ZFS and until that version is released you will not be running in a supported configuration and so any errors you encounter are *your fault alone*. Didn't we have the PMC (poor man's cluster) talk last week as well? I understand the objection to mickey mouse configurations, but I don't understand the objection to (what I consider) simply improving safety. Why again shouldn't zfs have a hostid written into the pool, to prevent import if the hostid doesn't match? And why should failover be limited to SC? Why shouldn't VCS be able to play? Why should SC have secrets on how to do failover? After all, this is OPENsolaris. And anyway many homegrown solutions (the kind I'm familiar with anyway) are of high quality compared to commercial ones. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
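As a concrete (and decidedly unsupported) sketch of Frank's suggestion, an early boot script could move the cache file aside so the node does not auto-import pools it held before going down, leaving imports to the failover tooling. Whether this runs early enough depends on when the zfs module loads and the filesystems are mounted, so treat it purely as an illustration:

#!/bin/sh
# Hypothetical rc script: prevent automatic re-import of previously held pools.
CACHE=/etc/zfs/zpool.cache
if [ -f "$CACHE" ]; then
        mv "$CACHE" "$CACHE.boot.$$"    # keep a copy for inspection rather than deleting it
fi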
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
just measured quickly that a 1.2GHz SPARC can do [400-500]MB/sec of encoding (time spent in the misnamed function vdev_raidz_reconstruct) for a 3 disk raid-z group.

Strange, that seems very low. Ah, I see. The current code loops through each buffer, either copying or XORing it into the parity. This likely would perform quite a bit better if it were reworked to go through more than one buffer at a time, doing the XOR. (Reading the partial parity is expensive.)

Actually, this would be an instance where using assembly language or even processor-dependent code would be useful. Since the prefetch buffers on UltraSPARC are only applicable to floating-point loads, we should probably use prefetch and the VIS xor instructions. (Even calling bcopy instead of using the existing copy loop would help.)

FWIW, on large systems we ought to be aiming to sustain 8 GB/s or so of writes, and using 16 CPUs for just parity computation seems inordinately painful. :-)

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
With ZFS however the in-between cache is obsolete, as individual disk caches can be used directly. I also openly question whether even the dedicated RAID HW is faster than the newest CPUs in modern servers.

Individual disk caches are typically in the 8-16 MB range; for 15 disks, that gives you about 256 MB. A RAID with 15 drives behind it might have 2-4 GB of cache. That's a big improvement.

The dedicated RAID hardware may not be faster than the newest CPUs, but as a friend of mine has pointed out, even though delegating a job to somebody else often means it's done more slowly, it frees him up to do his other work. (It's also worth pondering the difference between latency and bandwidth. When parity is computed inline with the data path, as is often the case for hardware controllers, the bandwidth is relatively low since it's happening at the speed of data transfer to an individual disk, but the latency is effectively zero, since it's not adding any time to the transfer.)

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
On Wed, Sep 13, 2006 at 09:14:36AM -0700, Frank Cusack wrote: Why again shouldn't zfs have a hostid written into the pool, to prevent import if the hostid doesn't match? See: 6282725 hostname/hostid should be stored in the label Keep in mind that this is not a complete clustering solution - only a mechanism to prevent administrator misconfiguration. In particular, it's possible for one host to be doing a failover, and the other host open the pool before the hostid has been written to the disk. And why should failover be limited to SC? Why shouldn't VCS be able to play? Why should SC have secrets on how to do failover? After all, this is OPENsolaris. And anyway many homegrown solutions (the kind I'm familiar with anyway) are of high quality compared to commercial ones. I'm not sure I understand this. There is no built-in clustering support for UFS - simultaneously mounting the same UFS filesystem on different hosts will corrupt your data as well. You need some sort of higher level logic to correctly implement clustering. This is not a SC secret - it's how you manage non-clustered filesystems in a failover situation. Storing the hostid as a last-ditch check for administrative error is a reasonable RFE - just one that we haven't yet gotten around to. Claiming that it will solve the clustering problem oversimplifies the problem and will lead to people who think they have a 'safe' homegrown failover when in reality the right sequence of actions will irrevocably corrupt their data. - Eric -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposal: multiple copies of user data
eric kustarz wrote: I want per pool, per dataset, and per file - where all are done by the filesystem (ZFS), not the application. I was talking about a further enhancement to copies than what Matt is currently proposing - per-file copies, but it's more work (one thing being we don't have administrative control over files per se). Now if you could do that and make it something that can be set at install time it would get a lot more interesting. When you install Solaris to that single laptop drive you can select files or even directories that have more than one copy in case of a problem down the road.
Re: [zfs-discuss] Re: Re: Proposal: multiple copies of user data
On Sep 12, 2006, at 2:55 PM, Celso wrote: On 12/09/06, Celso [EMAIL PROTECTED] wrote: One of the great things about zfs is that it protects not just against mechanical failure, but against silent data corruption. Having this available to laptop owners seems to me to be important to making zfs even more attractive. I'm not arguing against that. I was just saying that *if* this was useful to you (and you were happy with the dubious resilience/performance benefits) you can already create mirrors/raidz on a single disk by using partitions as building blocks. There's no need to implement the proposal to gain that. It's not as granular though, is it? In the situation you describe: ...you split one disk in two. You then have effectively two partitions with which you can then create a new mirrored zpool. Then everything is mirrored. Correct? With ditto blocks, you can selectively add copies (seeing as how filesystems are so easy to create on zfs). If you are only concerned with copies of your important documents and email, why should /usr/bin be mirrored? That's my opinion anyway. I always enjoy choice, and I really believe this is a useful and flexible one. Celso One item missed in the discussion is the idea that individual ZFS filesystems can be created in a pool that will have the duplicate-block behavior, the idea being that only a small subset of your data may be critical. This allows additional flexibility in a single-disk configuration. Rather than sacrificing 1/2 of the pool storage, I can say that my critical documents will reside in a filesystem that keeps two copies on disk. I think it's a great idea. It may not be for everybody, but I think the ability to treat some of my files as critical is an excellent feature. -Gregory Shaw, IT Architect, ITCTO Group, Sun Microsystems Inc.
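To make Gregory's scenario concrete, here is roughly what the administration might look like if the proposed copies property is delivered as described (a sketch only; the pool layout, dataset names and the -o shorthand are assumptions, not confirmed syntax):

# zpool create tank c0t0d0
# zfs create -o copies=2 tank/docs
# zfs create tank/scratch

Only data written to tank/docs would get the extra ditto copy; tank/scratch and everything else in the pool would stay at the default of one copy.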
[zfs-discuss] Comments on a ZFS multiple use of a pool, RFE.
I filed this RFE earlier; since there is no way for non-Sun personnel to see this RFE for a while, I am posting it here and asking for feedback from the community. [Fwd: CR 6470231 Created P5 opensolaris/triage-queue Add an inuse check that is inforced even if import -f is used.] *Synopsis*: Add an inuse check that is inforced even if import -f is used. http://bt2ws.central.sun.com/CrPrint?id=6470231 *Change Request ID*: 6470231 Product: solaris Category: opensolaris Subcategory: triage-queue Type: RFE Status: 1-Dispatched Priority: 5-Very Low Responsible Manager: [EMAIL PROTECTED] Initial Evaluator: [EMAIL PROTECTED] Keywords: opensolaris === *Description* Category kernel Sub-Category zfs Description Currently many people have been trying to import ZFS pools on multiple systems at once. Currently this is unsupported, and causes massive data corruption to the pool. ZFS should refuse to import any pool that was used in the last 5 minutes and was not cleanly exported; this prevents the filesystem from being mounted on multiple systems at once. Frequency Always Regression No Steps to Reproduce Import the same storage pool on more than one machine or domain. Expected Result
# zpool import -f datapool1
Error: ZFS pool datapool1 is currently imported on another system and was accessed less than 5 minutes ago; ZFS does not currently support concurrent access. If this pool is no longer in use on the other system, please export it from the other system or try again in 5 minutes.
Actual Result
# zpool import -f datapool1
# a few minutes later the system crashes because of concurrent use.
Error Message(s)
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
On September 13, 2006 9:32:50 AM -0700 Eric Schrock [EMAIL PROTECTED] wrote: On Wed, Sep 13, 2006 at 09:14:36AM -0700, Frank Cusack wrote: Why again shouldn't zfs have a hostid written into the pool, to prevent import if the hostid doesn't match? See: 6282725 hostname/hostid should be stored in the label Keep in mind that this is not a complete clustering solution - only a mechanism to prevent administrator misconfiguration. In particular, it's possible for one host to be doing a failover, and the other host open the pool before the hostid has been written to the disk. And why should failover be limited to SC? Why shouldn't VCS be able to play? Why should SC have secrets on how to do failover? After all, this is OPENsolaris. And anyway many homegrown solutions (the kind I'm familiar with anyway) are of high quality compared to commercial ones. I'm not sure I understand this. There is no built-in clustering support for UFS - simultaneously mounting the same UFS filesystem on different hosts will corrupt your data as well. You need some sort of higher level logic to correctly implement clustering. This is not a SC secret - it's how you manage non-clustered filesystems in a failover situation. But UFS filesystems don't automatically get mounted (well, we know how to not automatically mount them in /etc/vfstab). The SC secret is in how importing of pools is prevented at boot time. Of course you need more than that, but my complaint was against the idea that you cannot build a reliable solution yourself, instead of just sharing info about zpool.cache albeit with a warning. Storing the hostid as a last-ditch check for administrative error is a reasonable RFE - just one that we haven't yet gotten around to. Claiming that it will solve the clustering problem oversimplifies the problem and will lead to people who think they have a 'safe' homegrown failover when in reality the right sequence of actions will irrevocably corrupt their data. Thanks for that clarification, very important info. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposal: multiple copies of user data
Torrey McMahon wrote: eric kustarz wrote: I want per pool, per dataset, and per file - where all are done by the filesystem (ZFS), not the application. I was talking about a further enhancement to copies than what Matt is currently proposing - per file copies, but its more work (one thing being we don't have administrative control over files per se). Now if you could do that and make it something that can be set at install time it would get a lot more interesting. When you install Solaris to that single laptop drive you can select files or even directories that have more then one copy in case of a problem down the road. Actually, this is a perfect use case for setting the copies=2 property after installation. The original binaries are quite replaceable; the customizations and personal files created later on are not. - Bart -- Bart Smaalders Solaris Kernel Performance [EMAIL PROTECTED] http://blogs.sun.com/barts ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
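If the copies property behaves like other ZFS dataset properties, Bart's after-the-fact approach would presumably be a one-liner per dataset (the dataset name here is hypothetical, and, much like compression, the setting would only affect blocks written after it is changed):

# zfs set copies=2 tank/export/home

Existing files would need to be rewritten or restored to pick up their second copy.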
Re: [zfs-discuss] Snapshots and backing store
Matthew Ahrens wrote: Nicolas Dorfsman wrote: Hi, There's something really bizarre in the ZFS snapshot specs: "Uses no separate backing store." Hum... if I want to share one physical volume somewhere in my SAN as THE snapshot backing store... it becomes impossible to do! Really bad. Is there any chance of a backing-store-file option in a future release? Along the same lines, it would be great to have some sort of property to add a disk/LUN/physical space to a pool, reserved only for backing store. For now, the only way I see to prevent users from using my backing-store space for their own data is to set quotas. If you want to copy your filesystems (or snapshots) to other disks, you can use 'zfs send' to send them to a different pool (which may even be on a different machine!). The confusion is probably around the word snapshot and all its various usages over the years. The one particular case where people will probably slam their head into a wall is exporting snapshots to other hosts. If you can get the customer or tech to think in terms of where they want the data and how, instead of snapshots, or lun copies, or whatever, it makes for an easier conversation.
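A minimal sketch of what Matt is suggesting, with hypothetical pool, dataset and host names: take a snapshot, then send it to a pool that lives wherever you want your "backing store" to be.

# zfs snapshot tank/proj@tuesday
# zfs send tank/proj@tuesday | ssh backuphost zfs receive sanpool/proj

The snapshot itself still consumes space in the original pool, but the copy in the other pool (possibly on another machine) can be sized and quota'd independently.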
Re: [zfs-discuss] Proposal: multiple copies of user data
Bart Smaalders wrote: Torrey McMahon wrote: eric kustarz wrote: I want per pool, per dataset, and per file - where all are done by the filesystem (ZFS), not the application. I was talking about a further enhancement to copies than what Matt is currently proposing - per file copies, but its more work (one thing being we don't have administrative control over files per se). Now if you could do that and make it something that can be set at install time it would get a lot more interesting. When you install Solaris to that single laptop drive you can select files or even directories that have more then one copy in case of a problem down the road. Actually, this is a perfect use case for setting the copies=2 property after installation. The original binaries are quite replaceable; the customizations and personal files created later on are not. We've been talking about user data but the chance of corrupting something on disk and then detecting a bad checksum on something in /kernel is also possible. (Disk drives do weird things from time to time.) If I was sufficiently paranoid I would want everything required to get into single-user mode, some other stuff, and then my user data, duplicated to avoid any issues. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
On Sep 13, 2006, at 12:32 PM, Eric Schrock wrote: Storing the hostid as a last-ditch check for administrative error is a reasonable RFE - just one that we haven't yet gotten around to. Claiming that it will solve the clustering problem oversimplifies the problem and will lead to people who think they have a 'safe' homegrown failover when in reality the right sequence of actions will irrevocably corrupt their data. HostID is handy, but it'll only tell you who MIGHT or MIGHT NOT have control of the pool. Such an RFE would be even more worthwhile if it included something such as a time stamp. This time stamp (or similar time-oriented signature) would be updated regularly (based on some internal ZFS event). If this stamp goes for an arbitrary length of time without being updated, another host in the cluster could force-import the pool on the assumption that the original host is no longer able to communicate with the zpool. This is a simple idea description, but perhaps worthwhile if you're already going to change the label structure to add the hostid. /dale
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
On September 13, 2006 1:28:47 PM -0400 Dale Ghent [EMAIL PROTECTED] wrote: On Sep 13, 2006, at 12:32 PM, Eric Schrock wrote: Storing the hostid as a last-ditch check for administrative error is a reasonable RFE - just one that we haven't yet gotten around to. Claiming that it will solve the clustering problem oversimplifies the problem and will lead to people who think they have a 'safe' homegrown failover when in reality the right sequence of actions will irrevocably corrupt their data. HostID is handy, but it'll only tell you who MIGHT or MIGHT NOT have control of the pool. Such an RFE would even more worthwhile if it included something such as a time stamp. This time stamp (or similar time-oriented signature) would be updated regularly (bases on some internal ZFS event). If this stamp goes for an arbitrary length of time without being updated, another host in the cluster could force import it on the assumption that the original host is no longer able to communicate to the zpool. This is a simple idea description, but perhaps worthwhile if you're already going to change the label structure for adding the hostid. Sounds cool! Better than depending on an out-of-band heartbeat. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool always thinks it's mounted on another system
Hi zfs-discuss, I was running Solaris 11, b42 on x86, and I tried upgrading to b44. I didn't have space on the root for live_upgrade, so I booted from disc to upgrade, but it failed on every attempt, so I ended up blowing away / and doing a clean b44 install. Now the zpool that was attached to that system won't stop thinking that it's mounted on another system, regardless of what I try. On boot, the system thinks the pool is mounted elsewhere, and won't mount it unless I log in and zpool import -f. I tried zpool export followed by import, and that required no -f, but on reboot, lo, the problem returned. I even tried destroying and reimporting the pool, which led to this hilarious sequence:
# zpool import
no pools available to import
# zpool import -D
  pool: moonside
    id: 8290331144559232496
 state: ONLINE (DESTROYED)
action: The pool can be imported using its name or numeric identifier. The pool was destroyed, but can be imported using the '-Df' flags.
config:
        moonside  ONLINE
          raidz1  ONLINE
            c2t0d0 ONLINE
            c2t1d0 ONLINE
            c2t2d0 ONLINE
            c2t3d0 ONLINE
            c2t4d0 ONLINE
            c2t5d0 ONLINE
            c2t6d0 ONLINE
# zpool import -D moonside
cannot import 'moonside': pool may be in use from other system
use '-f' to import anyway
#
This is either a bug or a missing feature (the ability to make a filesystem stop thinking it's mounted somewhere else) - anybody have any ideas? Thanks, - Rich
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
Frank Cusack wrote: Sounds cool! Better than depending on an out-of-band heartbeat. I disagree; it sounds really, really bad. If you want a high-availability cluster you really need a faster interconnect than spinning rust, which is probably the slowest interface we have now! -- Darren J Moffat
Re: [zfs-discuss] zpool always thinks it's mounted on another system
Can you send the output of 'zdb -l /dev/dsk/c2t0d0s0' ? So you do the 'zpool import -f' and all is well, but then when you reboot, it doesn't show up, and you must import it again? Can you send the output of 'zdb -C' both before and after you do the import? Thanks, - Eric On Wed, Sep 13, 2006 at 01:40:13PM -0400, Rich wrote: Hi zfs-discuss, I was running Solaris 11, b42 on x86, and I tried upgrading to b44. I didn't have space on the root for live_upgrade, so I booted from disc to upgrade, but it failed on every attempt, so I ended up blowing away / and doing a clean b44 install. Now the zpool that was attached to that system won't stop thinking that it's mounted on another system, regardless of what I try. On boot, the system thinks the pool is mounted elsewhere, and won't mount it unless I log in and zpool import -f. I tried zpool export followed by import, and that required no -f, but on reboot, lo, the problem returned. I even tried destroying and reimporting the pool, which led to this hilarious sequence: # zpool import no pools available to import # zpool import -D pool: moonside id: 8290331144559232496 state: ONLINE (DESTROYED) action: The pool can be imported using its name or numeric identifier. The pool was destroyed, but can be imported using the '-Df' flags. config: moonsideONLINE raidz1ONLINE c2t0d0 ONLINE c2t1d0 ONLINE c2t2d0 ONLINE c2t3d0 ONLINE c2t4d0 ONLINE c2t5d0 ONLINE c2t6d0 ONLINE # zpool import -D moonside cannot import 'moonside': pool may be in use from other system use '-f' to import anyway # This is either a bug or a missing feature (the ability to make a filesystem stop thinking it's mounted somewhere else) - anybody have any ideas? Thanks, - Rich ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: Re: [zfs-discuss] marvel cards.. as recommended
On 9/12/06, James C. McPherson [EMAIL PROTECTED] wrote: Joe Little wrote: So, people here recommended the Marvell cards, and one even provided a link to acquire them for SATA jbod support. Well, this is what the latest bits (B47) say: Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING: marvell88sx0: Could not attach, unsupported chip stepping or unable to get the chip stepping Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING: marvell88sx1: Could not attach, unsupported chip stepping or unable to get the chip stepping Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING: marvell88sx0: Could not attach, unsupported chip stepping or unable to get the chip stepping Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING: marvell88sx1: Could not attach, unsupported chip stepping or unable to get the chip stepping Any takers on how to get around this one? You could start by providing the output from prtpicl -v and prtconf -v as well as /usr/X11/bin/scanpci -v -V 1 so we know which device you're actually having a problem with. Is the pci vendor+deviceid for that card listed in your /etc/driver_aliases file against the marvell88sx driver? James I don't know if you really want all those large files, but /etc/driver_aliases lists: marvell88sx pci11ab,6081.9 [EMAIL PROTECTED]:~# lspci | grep Marv 03:01.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 07) 05:01.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 07) [EMAIL PROTECTED]:~# lspci -n | grep 11ab 03:01.0 0100: 11ab:6081 (rev 07) 05:01.0 0100: 11ab:6081 (rev 07) And it sees the module: 198 f571 9f10 62 1 marvell88sx (marvell88sx HBA Driver v1.8) Is this a support revision of the card? Is there something stupid like enabling the jumpers or some such that's required? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Loss of compression with send/receive
You want: 6421959 want zfs send to preserve properties ('zfs send -p') Which Matt is currently working on. - Eric On Thu, Sep 14, 2006 at 02:04:32AM +0800, Darren Reed wrote: Using Solaris 10, Update 2 (b9a) I've just used zfs send | zfs receive to move some filesystems from one disk to another (I'm sure this is the quickest move I've ever done!) but in doing so, I lost zfs set compression=on on those filesystems. If I create the filesystems first and enable compression, I can't receive to them (results in an error.) Is there some way around this? Patch? RFE? Darren ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
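Until 6421959 integrates, one hedged workaround (dataset names are placeholders) is to receive first and then turn the property back on by hand; note that only blocks written after the property is set will be compressed, while the received data stays as it was sent:

# zfs send tank/fs@move | zfs receive newpool/fs
# zfs set compression=on newpool/fs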
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
On Sep 13, 2006, at 1:37 PM, Darren J Moffat wrote: That might be acceptable in some environments but that is going to cause disks to spin up. That will be very unacceptable in a laptop and maybe even in some energy-conscious data centres. Introduce an option to 'zpool create'? Come to think of it, the ability to describe attributes for a pool seems to be lacking (unlike zfs volumes). What you are proposing sounds a lot like a cluster heartbeat which IMO really should not be implemented by writing to disks. That would be an extreme example of the use for this. While it *could* be used as a heartbeat mechanism, it would be useful administratively:
# zpool status foopool
Pool foopool is currently imported by host.blah.com
Import time: 4 April 2007 16:20:00
Last activity: 23 June 2007 18:42:53
...
/dale
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
On Wed, Sep 13, 2006 at 06:37:25PM +0100, Darren J Moffat wrote: Dale Ghent wrote: On Sep 13, 2006, at 12:32 PM, Eric Schrock wrote: Storing the hostid as a last-ditch check for administrative error is a reasonable RFE - just one that we haven't yet gotten around to. Claiming that it will solve the clustering problem oversimplifies the problem and will lead to people who think they have a 'safe' homegrown failover when in reality the right sequence of actions will irrevocably corrupt their data. HostID is handy, but it'll only tell you who MIGHT or MIGHT NOT have control of the pool. Such an RFE would even more worthwhile if it included something such as a time stamp. This time stamp (or similar time-oriented signature) would be updated regularly (bases on some internal ZFS event). If this stamp goes for an arbitrary length of time without being updated, another host in the cluster could force import it on the assumption that the original host is no longer able to communicate to the zpool. That might be acceptable in some environments but that is going to cause disks to spin up. That will be very unacceptable in a laptop and maybe even in some energy conscious data centres. What you are proposing sounds a lot like a cluster hear beat which IMO really should not be implemented by writing to disks. Wouldn't it be possible to implement this via SCSI reservations (where available) a la quorum devices? Ceri -- That must be wonderful! I don't understand it at all. -- Moliere pgpbrlHYCwiGr.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool always thinks it's mounted on another system
I do the 'zpool import -f moonside', and all is well until I reboot, at which point I must zpool import -f again.Below is zdb -l /dev/dsk/c2t0d0s0's output:LABEL 0 version=3 name='moonside' state=0 txg=1644418 pool_guid=8290331144559232496 top_guid=12835093579979239393 guid=7480231448190751824 vdev_tree type='raidz' id=0 guid=12835093579979239393 nparity=1 metaslab_array=13 metaslab_shift=30 ashift=9 asize=127371575296 children[0] type='disk' id=0 guid=7480231448190751824 path='/dev/dsk/c2t0d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=23 children[1] type='disk' id=1 guid=2626377814825345466 path='/dev/dsk/c2t1d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=22 children[2] type='disk' id=2 guid=16932309055791750053 path='/dev/dsk/c2t2d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=21 children[3] type='disk' id=3 guid=18145699204085538208 path='/dev/dsk/c2t3d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=20 children[4] type='disk' id=4 guid=2046828747707454119 path='/dev/dsk/c2t4d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=19 children[5] type='disk' id=5 guid=5851407888580937378 path='/dev/dsk/c2t5d0s0' devid='id1, [EMAIL PROTECTED]/a' whole_disk=1 DTL=18 children[6] type='disk' id=6 guid=10476478316210434659 path='/dev/dsk/c2t6d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=17LABEL 1 version=3 name='moonside' state=0 txg=1644418 pool_guid=8290331144559232496 top_guid=12835093579979239393 guid=7480231448190751824 vdev_tree type='raidz' id=0 guid=12835093579979239393 nparity=1 metaslab_array=13 metaslab_shift=30 ashift=9 asize=127371575296 children[0] type='disk' id=0 guid=7480231448190751824 path='/dev/dsk/c2t0d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=23 children[1] type='disk' id=1 guid=2626377814825345466 path='/dev/dsk/c2t1d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=22 children[2] type='disk' id=2 guid=16932309055791750053 path='/dev/dsk/c2t2d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=21 children[3] type='disk' id=3 guid=18145699204085538208 path='/dev/dsk/c2t3d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=20 children[4] type='disk' id=4 guid=2046828747707454119 path='/dev/dsk/c2t4d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=19 children[5] type='disk' id=5 guid=5851407888580937378 path='/dev/dsk/c2t5d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=18 children[6] type='disk' id=6 guid=10476478316210434659 path='/dev/dsk/c2t6d0s0' devid='id1, [EMAIL PROTECTED]/a' whole_disk=1 DTL=17LABEL 2 version=3 name='moonside' state=0 txg=1644418 pool_guid=8290331144559232496 top_guid=12835093579979239393 guid=7480231448190751824 vdev_tree type='raidz' id=0 guid=12835093579979239393 nparity=1 metaslab_array=13 metaslab_shift=30 ashift=9 asize=127371575296 children[0] type='disk' id=0 guid=7480231448190751824 path='/dev/dsk/c2t0d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=23 children[1] type='disk' id=1 guid=2626377814825345466 path='/dev/dsk/c2t1d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=22 children[2] type='disk' id=2 guid=16932309055791750053 path='/dev/dsk/c2t2d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=21 children[3] type='disk' id=3 guid=18145699204085538208 path='/dev/dsk/c2t3d0s0' devid='id1, [EMAIL PROTECTED]/a' whole_disk=1 DTL=20 children[4] type='disk' id=4 guid=2046828747707454119 path='/dev/dsk/c2t4d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=19 children[5] type='disk' id=5 guid=5851407888580937378 path='/dev/dsk/c2t5d0s0' devid='id1,[EMAIL 
PROTECTED]/a' whole_disk=1 DTL=18 children[6] type='disk' id=6 guid=10476478316210434659 path='/dev/dsk/c2t6d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=17 LABEL 3 version=3 name='moonside' state=0 txg=1644418 pool_guid=8290331144559232496 top_guid=12835093579979239393 guid=7480231448190751824 vdev_tree type='raidz' id=0 guid=12835093579979239393 nparity=1 metaslab_array=13 metaslab_shift=30 ashift=9 asize=127371575296 children[0] type='disk' id=0 guid=7480231448190751824 path='/dev/dsk/c2t0d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=23 children[1] type='disk' id=1 guid=2626377814825345466 path='/dev/dsk/c2t1d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=22 children[2] type='disk' id=2 guid=16932309055791750053 path='/dev/dsk/c2t2d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=21 children[3] type='disk' id=3 guid=18145699204085538208 path='/dev/dsk/c2t3d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=20 children[4] type='disk' id=4 guid=2046828747707454119 path='/dev/dsk/c2t4d0s0' devid='id1,[EMAIL
Re: [zfs-discuss] Comments on a ZFS multiple use of a pool, RFE.
On 9/13/06, Eric Schrock [EMAIL PROTECTED] wrote: There are several problems I can see: - This is what the original '-f' flag is for. I think a better approach is to expand the default message of 'zpool import' with more information, such as which was the last host to access the pool and when. The point of '-f' is that you have recognized that the pool is potentially in use, but as an administrator you've made a higher level determination that it is in fact safe to import. This would not be the first time that Solaris overrode an administrative command because it's just not safe or sane to carry out. For example: rm -rf /. - You are going to need a flag to override this behavior for clustering situations. Forcing the user to always wait 5 minutes is unacceptable. Wouldn't it be more likely for a clustering solution to use libzfs? Or we could add another method to zfs for failing over in clustering solutions; this method would then check whether the OS and the pool supported clustering at the time of import. Since clustering support is not yet released for ZFS, we have a clean slate on how ZFS deals with it. - By creating a new flag (let's say '-F'), you are just going to introduce more complexity, and customers will get equally used to issuing 'zpool import -fF', and now you're back to the same problem all over again. If 5 minutes is too long, perhaps it could be reduced to 2 minutes, with ZFS updating a value stored on the pool once a minute to record that it is in use. We could update the pool's in-use flag more often, but that seems excessive since this is only a corner case anyway. Another possible method to handle this case - more work, but with no impact on existing fast paths - would be for zpool, if the pool appears to have been accessed in the last X minutes and not exported, to watch the devices and see whether any other disk commands arrive from the old host; if any do, it's obvious that the administrator is putting the system into a state where it will crash, so isn't it better to fail the import than to crash? Any extra delay this check imposes would not break existing specifications, because importing a pool has no guaranteed fixed performance anyway - ZFS needs to find the devices and verify that all components of the pool are intact and not failed. James - A pool which is in use on another host but inactive for more than 5 minutes will fail this check (since no transactions will have been pushed), but could potentially write data after the pool has been imported. - This breaks existing behavior. The CLI utilities are documented as committed (a.k.a. stable), and breaking existing customer scripts isn't acceptable. The existing behaviour is broken; ZFS should return EBUSY if it can determine that the pool is in active use without extreme measures. This is the same behaviour that happens should a user try to mount a filesystem twice on one system. James This seems to take the wrong approach to the root problem. Depending on how you look at it, the real root problem is either: a) ZFS is not a clustered filesystem, and actively using the same pool on multiple systems (even opening said pool) will corrupt data. b) 'zpool import' doesn't present enough information for an administrator to reliably determine if a pool is actually in use on multiple systems. The former is obviously a ton of work and something we're thinking about but won't address any time soon. The latter can be addressed by presenting more useful information when 'zpool import' is run without the '-f' flag.
- Eric On Wed, Sep 13, 2006 at 12:14:06PM -0500, James Dickens wrote: I filed this RFE earlier, since there is no way for non sun personel to see this RFE for a while I am posting it here, and asking for feedback from the community. [Fwd: CR 6470231 Created P5 opensolaris/triage-queue Add an inuse check that is inforced even if import -f is used.] Inbox Assign a GTD Label to this Conversation: [Show] Statuses: Next Action, Action, Waiting On, SomeDay, Finished Contexts: Car, Desk, Email, Home, Office, Phone, Waiting References: ProjectHome, Reference Misc.: *Synopsis*: Add an inuse check that is inforced even if import -f is used. http://bt2ws.central.sun.com/CrPrint?id=6470231 *Change Request ID*: 6470231 *Synopsis*: Add an inuse check that is inforced even if import -f is used. Product: solaris Category: opensolaris Subcategory: triage-queue Type: RFE Subtype: Status: 1-Dispatched Substatus: Priority: 5-Very Low Introduced In Release: Introduced In Build: Responsible Manager: [EMAIL PROTECTED] Responsible Engineer: Initial Evaluator: [EMAIL PROTECTED] Keywords: opensolaris === *Description* Category kernel Sub-Category zfs Description Currently many people have been trying to import ZFS pools on
[zfs-discuss] Re: Comments on a ZFS multiple use of a pool, RFE.
I think there are at least two separate issues here. The first is that ZFS doesn't support multiple hosts accessing the same pool. That's simply a matter of telling people. UFS doesn't support multiple hosts, but it doesn't have any special features to prevent administrators from *trying* it. They'll just corrupt their filesystem. The second is that ZFS remembers pools and automatically imports them at boot time. This is a bigger problem, because it means that if you create a pool on host A, shut down host A, import the pool to host B, and then boot host A, your pool is automatically destroyed. The hostid solution that VxVM uses would catch this second problem, because when A came up after its reboot, it would find that -- even though it had created the pool -- it was not the last machine to access it, and could refuse to automatically mount it. If the administrator really wanted it mounted, they could force the issue. Relying on the administrator to know that they have to remove a file (the 'zpool cache') before they let the machine come up out of single-user mode seems the wrong approach to me. (By default, we'll shoot you in the foot, but we'll give you a way to unload the gun if you're fast enough and if you remember.) The hostid approach seems better to me than modifying the semantics of force. I honestly don't think the problem is administrators who don't know what they're doing; I think the problem is that our defaults are wrong in the case of shared storage. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: marvel cards.. as recommended
If I'm reading the source correctly, for the $60xx boards, the only supported revision is $09. Yours is $07, which presumably has some errata with no workaround, and which the Solaris driver refuses to support. Hope you can return it ... ? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: marvel cards.. as recommended
A quick peek at the Linux source shows a small workaround in place for the 07 revision...maybe if you file a bug against Solaris to support this revision it might be possible to get it added, at least if that's the only issue. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Proposal: multiple copies of user data
Is this true for single-sector, vs. single-ZFS-block, errors? (Yes, it's pathological and probably nobody really cares.) I didn't see anything in the code which falls back on single-sector reads. (It's slightly annoying that the interface to the block device drivers loses the SCSI error status, which tells you the first sector which was bad.) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Comments on a ZFS multiple use of a pool, RFE.
On Wed, Sep 13, 2006 at 02:29:55PM -0500, James Dickens wrote: This would not be the first time that Solaris overrode an administrative command because it's just not safe or sane to carry out. For example: rm -rf /. As I've repeated before, and will continue to repeat, it's not actually possible for ZFS to determine whether a pool is in active use (short of making ZFS a cluster-aware filesystem). Adding arbitrary delays doesn't change this fact; it only makes the failure less likely. I've given you examples of where this behavior is safe and sane and useful, so the above simplification (upon which most of the other arguments are based) isn't really valid. I'm curious why you didn't comment on my other suggestion (displaying the last accessed host and time as part of 'zpool import'), which seems to solve your problem by giving the administrator the data they need to make an appropriate decision. As Anton and others have mentioned in previous discussions, there seem to be several clear RFEs that everyone can agree with:
1. Store the hostid, hostname, and last time written as part of the label.
2. During auto-import (aka open), if the hostid is different from our own, fault the pool and generate an appropriate FMA event.
3. During manual import, display the last hostname and time accessed if the hostid is not our own and the pool is still marked ACTIVE.
This prevents administrators from shooting themselves in the foot, while still allowing explicit cluster failover to operate with more information than was available before. - Eric -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock
Re: [zfs-discuss] Comments on a ZFS multiple use of a pool, RFE.
On 9/13/06, Eric Schrock [EMAIL PROTECTED] wrote: On Wed, Sep 13, 2006 at 02:29:55PM -0500, James Dickens wrote: This would not be the first time that Solaris overrode an administrative command because it's just not safe or sane to carry out. For example: rm -rf /. As I've repeated before, and will continue to repeat, it's not actually possible for ZFS to determine whether a pool is in active use (short of making ZFS a cluster-aware filesystem). Adding arbitrary delays doesn't change this fact; it only makes the failure less likely. I've given you examples of where this behavior is safe and sane and useful, so the above simplification (upon which most of the other arguments are based) isn't really valid. I disagree with this. Isn't there a way to track when the last read was? Even with the last-write time you are already recommending, you could sleep for 30 seconds, read the value from disk again and compare; even a read of the pool will likely cause a write if atime tracking is enabled on the filesystem. If someone is accessing the pool we are importing underneath us, it is a dead giveaway that we are about to explode if we continue down this path. I'm curious why you didn't comment on my other suggestion (displaying the last accessed host and time as part of 'zpool import'), which seems to solve your problem by giving the administrator the data they need to make an appropriate decision. It's a good suggestion; it just doesn't go far enough in my opinion. As Anton and others have mentioned in previous discussions, there seem to be several clear RFEs that everyone can agree with:
1. Store the hostid, hostname, and last time written as part of the label.
2. During auto-import (aka open), if the hostid is different from our own, fault the pool and generate an appropriate FMA event.
3. During manual import, display the last hostname and time accessed if the hostid is not our own and the pool is still marked ACTIVE.
This prevents administrators from shooting themselves in the foot, while still allowing explicit cluster failover to operate with more information than was available before. If this is what the community decides, I can live with it. I may even provide a patch for OpenSolaris distros that does the more intensive check; it seems to be an easy fix once #1 is complete. James - Eric -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock
[zfs-discuss] Re: when zfs enabled java
Jill Manfield wrote: My customer is running java on a ZFS file system. His platform is Solaris 10 x86 on an SF X4200. When he enabled ZFS his free memory drops from 18 GB to 2 GB rather quickly. I had him do a # ps -e -o pid,vsz,comm | sort -n +1 and it came back: The culprit application you see is java:
507 89464 /usr/bin/postmaster
515 89944 /usr/bin/postmaster
517 91136 /usr/bin/postmaster
508 96444 /usr/bin/postmaster
516 98088 /usr/bin/postmaster
503 3449580 /usr/jre1.5.0_07/bin/amd64/java
512 3732468 /usr/jre1.5.0_07/bin/amd64/java
Here is what the customer responded: Well, Java is a memory hog, but it's not the leak -- it's the application. Even after it fails due to lack of memory, the memory is not reclaimed and we can no longer restart it. Is there a bug on zfs? I did not find one in sunsolve but then again I might have been searching for the wrong thing. We have done some sleuth work and are starting to think our problem might be ZFS -- the new file system Sun supports. The documentation for ZFS states that it tries to cache as much as it can, and it uses kernel memory for the cache. That would explain memory gradually disappearing. ZFS can give memory back, but it does not do so quickly. Yup, this is likely your problem. ZFS takes a little time to give back memory, and the app may fail with ENOMEM before this happens. So, is there any way to check that? If it turns out to be the problem... 1) Is there a way to limit the size of ZFS's caches? Well... sort of. You can set the size of arc.c_max and this will put an upper bound on the cache. But this is a bit of a hack. If not, then 2) Is there a way to clear ZFS's cache? Try unmounting/mounting the file system; if that does not work, try export/import of the pool. -Mark
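For reference, one hedged way to cap the ARC is via /etc/system; whether the zfs_arc_max tunable exists on this particular build is an assumption (on builds where it doesn't, the equivalent is poking arc.c_max with mdb -kw, as Mark mentions), so treat this as a sketch rather than a recipe:

set zfs:zfs_arc_max = 0x80000000

That would limit the cache to 2 GB after a reboot, leaving the rest of the 18 GB for the Java heap and everything else.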
Re: [zfs-discuss] Re: Snapshots and backing store
Nicolas Dorfsman wrote: We need to think ZFS as ZFS, and not as a new filesystem ! I mean, the whole concept is different. Agreed. So. What could be the best architecture ? What is the problem? With UFS, I used to have separate metadevices/LUNs for each application. With ZFS, I thought it would be nice to use a separate pool for each application. Ick. It would be much better to have one pool, and a separate filesystem for each application. But, it means multiply snapshot backing-store OR dynamically remove/add this space/LUN to pool where we need to do backups. I don't understand this statement. What problem are you trying to solve? If you want to do backups, simply take a snapshot, then point your backup program at it. If you want faster incremental backups, use 'zfs send -i' to generate the file to backup. --matt ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Bizzare problem with ZFS filesystem
One more piece of information. I was able to ascertain the slowdown happens only when ZFS is used heavily; meaning lots of inflight I/O. This morning when the system was quiet my writes to the /u099 filesystem was excellent and it has gone south like I reported earlier. I am currently awaiting the completion of a write to /u099, well over 60 seconds. At the same time I was able create/save files in /u001 without any problems. The only difference between the /u001 and /u099 is the size of the filesystem (256GB vs 768GB). Per your suggestion I ran a 'zfs set' command and it completed after a wait of around 20 seconds while my file save from vi against /u099 is still pending!!! This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Snapshots and backing store
Including performance considerations? For instance, if I have two Oracle databases with two I/O profiles (TP versus batch)... what would be best:
1) Two pools, each one on two LUNs. Each LUN distributed on n trays.
2) One pool on one LUN. This LUN distributed on 2 x n trays.
3) One pool striped on two LUNs. Each LUN distributed on n trays.
Good question. I'll bet there's no way to determine that without testing. It may be that the extra performance from having the additional lun(s) within a single pool outweighs any performance issues from having both workloads use the same storage. With one pool, no problem. With n pools, my problem is the space used by the snapshot. With the COW method of UFS snapshot I can put all backing-stores on one single volume. With ZFS snapshot, it's conceptually impossible. Yup. That's due to the differences in how those snapshots are implemented. In the future you may be able to add and remove storage from pools dynamically. In such a case, it could be possible to bring a disk into a pool, increase disk usage during a snapshot, delete the snapshot, then remove the disk. Disk removal would require copying data and be a performance hit. Then you go and do the same thing with the other pools. Today this isn't possible because you cannot migrate data off of a VDEV to reclaim the storage. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant, TAOS http://www.taos.com/ San Francisco, CA bay area
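To make the last point concrete: growing a pool is already possible today; it is only the reverse operation (evacuating a vdev to shrink the pool) that doesn't exist yet. Pool and device names below are placeholders:

# zpool add tank c3t0d0

Once added, the device cannot currently be removed from the pool short of destroying and recreating it, which is exactly the limitation Darren describes.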
Re: [zfs-discuss] Re: Snapshots and backing store
Matthew Ahrens wrote: Nicolas Dorfsman wrote: We need to think ZFS as ZFS, and not as a new filesystem ! I mean, the whole concept is different. Agreed. So. What could be the best architecture ? What is the problem? With UFS, I used to have separate metadevices/LUNs for each application. With ZFS, I thought it would be nice to use a separate pool for each application. Ick. It would be much better to have one pool, and a separate filesystem for each application. I agree but can you set performance boundaries based on the filesystem? The pool level seems to be the place to do such things. For example making sure an application has a set level of iops at its disposal. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Importing ZFS filesystems across architectures...
OK, this may seem like a stupid question (and we all know that there are such things...) I'm considering sharing a disk array (something like a 3510FC) between two different systems, a SPARC and an Opteron. Will ZFS transparently work to import/export pools between the two systems? That is, can I export a pool created on the SPARC box, then import that on the Opteron box and have all the data there (and the pool work normally)? Normally, I'd run into problems with Fdisk vs EFI vs VTOC labeling/partitioning, but I was hoping that ZFS would magically make my life simpler here... :-) -- Erik Trimble Java System Support Mailstop: usca14-102 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Importing ZFS filesystems across architectures...
If you're using EFI labels, yes (VTOC labels are not endian-neutral). ZFS will automatically convert endianness from the on-disk format, and new data will be written using the native endianness, so data will gradually be rewritten to avoid the byteswap overhead. - Eric On Wed, Sep 13, 2006 at 03:55:27PM -0700, Erik Trimble wrote: OK, this may seem like a stupid question (and we all know that there are such things...) I'm considering sharing a disk array (something like a 3510FC) between two different systems, a SPARC and an Opteron. Will ZFS transparently work to import/export pools between the two systems? That is, can I export a pool created on the SPARC box, then import that on the Opteron box and have all the data there (and the pool work normally)? Normally, I'd run into problems with Fdisk vs EFI vs VTOC labeling/partitioning, but I was hoping that ZFS would magically make my life simpler here... :-) -- Erik Trimble Java System Support Mailstop: usca14-102 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock
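In practice the move Eric describes is just an export on one box and an import on the other (the pool name is hypothetical; the pool should be built on whole disks so it carries EFI labels):

sparc# zpool export shared
x86# zpool import shared

Blocks written by the SPARC host are byteswapped on read by the Opteron host, and get rewritten in the new native byte order as they are modified.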
Re: [zfs-discuss] Importing ZFS filesystems across architectures...
Erik Trimble wrote: OK, this may seem like a stupid question (and we all know that there are such things...) I'm considering sharing a disk array (something like a 3510FC) between two different systems, a SPARC and an Opteron. Will ZFS transparently work to import/export pools between the two systems? That is, can I export a pool created on the SPARC box, then import that on the Opteron box and have all the data there (and the pool work normally)? Normally, I'd run into problems with Fdisk vs EFI vs VTOC labeling/partitioning, but I was hoping that ZFS would magically make my life simpler here... Use EFI and you should be fine. as long as you don't try to import any pools on both hosts at the same time. James ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Importing ZFS filesystems across architectures...
Erik Trimble wrote: OK, this may seem like a stupid question (and we all know that there are such things...) I'm considering sharing a disk array (something like a 3510FC) between two different systems, a SPARC and an Opteron. Will ZFS transparently work to import/export pools between the two systems? That is, can I export a pool created on the SPARC box, then import that on the Opteron box and have all the data there (and the pool work normally)? Normally, I'd run into problems with Fdisk vs EFI vs VTOC labeling/partitioning, but I was hoping that ZFS would magically make my life simpler here... As long as you don't try to mount the pool from both systems at the same time, do funky auto-takeover stuff, etc., you should be good to go. At least, you're supposed to be able to do this. I haven't tested it.
Re: [zfs-discuss] Importing ZFS filesystems across architectures...
On 9/13/06, Erik Trimble [EMAIL PROTECTED] wrote: OK, this may seem like a stupid question (and we all know that there are such things...) I'm considering sharing a disk array (something like a 3510FC) between two different systems, a SPARC and an Opteron. Will ZFS transparently work to import/export pools between the two systems? That is, can I export a pool created on the SPARC box, then import that on the Opteron box and have all the data there (and the pool work normally)? Yes, this is a design feature. When you first move the pool and import it, the system will read the blocks of data, see that they are of a different endianness than the current host, and convert as necessary. As data is written, it is written in the native format, so the pool's new host will suffer no endian penalty reading data it wrote itself, and only a small penalty accessing data of the old endianness. James Dickens uadmin.blogspot.com Normally, I'd run into problems with Fdisk vs EFI vs VTOC labeling/partitioning, but I was hoping that ZFS would magically make my life simpler here... :-) -- Erik Trimble Java System Support Mailstop: usca14-102 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
Frank Cusack wrote: ...[snip James McPherson's objections to PMC] I understand the objection to mickey mouse configurations, but I don't understand the objection to (what I consider) simply improving safety. ... And why should failover be limited to SC? Why shouldn't VCS be able to play? Why should SC have secrets on how to do failover? After all, this is OPENsolaris. And anyway many homegrown solutions (the kind I'm familiar with anyway) are of high quality compared to commercial ones. Frank, this isn't a SunCluster vs VCS argument. It's an argument about * doing cluster-y stuff with the protection that a cluster framework provides versus * doing cluster-y stuff without the protection that a cluster framework provides If you want to use VCS be my guest, and let us know how it goes. If you want to use a homegrown solution, then please let us know what you did to get it working, how well it copes and how you are addressing any data corruption that might occur. I tend to refer to SunCluster more than VCS simply because I've got more in depth experience with Sun's offering. James C. McPherson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Comments on a ZFS multiple use of a pool, RFE.
Anton B. Rang schrieb: The hostid solution that VxVM uses would catch this second problem, because when A came up after its reboot, it would find that -- even though it had created the pool -- it was not the last machine to access it, and could refuse to automatically mount it. If the administrator really wanted it mounted, they could force the issue. Relying on the administrator to know that they have to remove a file (the 'zpool cache') before they let the machine come up out of single-user mode seems the wrong approach to me. (By default, we'll shoot you in the foot, but we'll give you a way to unload the gun if you're fast enough and if you remember.) I haven't tried: Does ZFS try to -f (force) import the zpools in /etc/zfs/zpool.cache, or does it just do a normal import and fail if the disks seem to be in use elsewhere, e.g. after a reboot of a probably failed and later repaired machine? Just to clear some things up: the OP who started the whole discussion would have had the same problems with VxVM as he has now with ZFS. Forcing an import of a disk group on one host while it is still active on another host won't make the DG magically disappear on the other one. The corresponding flag to zpool import -f is vxdg import -C. If you issue this command you could also end up with the same DG imported on more than one host. Because in VxVM there is usually another level of indirection (volumes on top of the DG, which may contain filesystems you also have to mount manually), just importing a DG is normally harmless. But with VxVM, too, you can shoot yourself in the foot:
On host B:
B# vxdg -C import DG
B# vxvol -g DG startall
B# mount /dev/vx/dsk/DG/filesys /some/where
B# do_something on /some/where
while still on host A:
A# do_something on /some/where
Instead of a zpool.cache file, VxVM uses the hostid (not to be confused with the numeric host id; it is normally just the ordinary hostname, `uname -n`, of the machine) to know which DGs it should mount automatically. Additionally each DG (or more precisely: each disk) has an autoimport flag which also has to be turned on to make the DG auto-imported during bootup. So to mimic VxVM in ZFS the solution would simply be: add an autoimport flag to the zpool. Daniel
Re: [zfs-discuss] Re: Comments on a ZFS multiple use of a pool, RFE.
On September 14, 2006 1:25:01 AM +0200 Daniel Rock [EMAIL PROTECTED] wrote: Just to clear some things up: the OP who started the whole discussion would have had the same problems with VxVM as he has now with ZFS. Forcing an import of a disk group on one host while it is still active on another host won't make the DG magically disappear on the other one. The OP was just showing a test case. On a real system your HA software would exchange a heartbeat and not do a double import. The problem with ZFS is that after the original system fails and the second system imports the pool, the original system also tries to import on [re]boot, and the OP didn't know how to disable this. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
On September 13, 2006 4:33:31 PM -0700 Frank Cusack [EMAIL PROTECTED] wrote: You'd typically have a dedicated link for heartbeat; what if that cable gets yanked or that NIC port dies? The backup system could avoid mounting the pool if ZFS had its own heartbeat. What if the cluster software has a bug and tells the other system to take over? ZFS could protect itself. Hmm, actually probably not, considering heartbeat intervals and failover time vs. probable zpool update frequency. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: zfs and Oracle ASM
I did a non-scientific benchmark comparing ASM and ZFS. Just look for my posts and you'll see it. To summarize, it was a statistical tie for simple loads of around 2GB of data, and we've chosen to stick with ASM for a variety of reasons, not the least of which is its ability to rebalance when disks are added/removed. Better integration comes to mind too. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
Dale Ghent wrote: James C. McPherson wrote: As I understand things, SunCluster 3.2 is expected to have support for HA-ZFS, and until that version is released you will not be running in a supported configuration, and so any errors you encounter are *your fault alone*. Still, after reading Mathias's description, it seems that the former node is doing an implicit forced import when it boots back up. This seems wrong to me. Repeat the experiment with UFS, or most other file systems, on a raw device and you would get the same behaviour as ZFS: corruption. The question on the table is why doesn't ZFS behave like a cluster-aware volume manager, not why does ZFS behave like UFS when 2 nodes mount the same file system simultaneously? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...
On September 13, 2006 7:07:40 PM -0700 Richard Elling [EMAIL PROTECTED] wrote: Dale Ghent wrote: James C. McPherson wrote: As I understand things, SunCluster 3.2 is expected to have support for HA-ZFS, and until that version is released you will not be running in a supported configuration, and so any errors you encounter are *your fault alone*. Still, after reading Mathias's description, it seems that the former node is doing an implicit forced import when it boots back up. This seems wrong to me. Repeat the experiment with UFS, or most other file systems, on a raw device and you would get the same behaviour as ZFS: corruption. Again, the difference is that with UFS your filesystems won't auto-mount at boot. If you repeated the experiment with UFS, you wouldn't try to mount until you had decided you should own the disk. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
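To illustrate Frank's point (device and mount-point names here are made up): a shared UFS filesystem would normally be listed in /etc/vfstab with the mount-at-boot field set to "no", so nothing is mounted until the failover software decides this node owns the disk, whereas ZFS imports and mounts every pool recorded in /etc/zfs/zpool.cache during boot.
#device to mount    device to fsck       mount point  FS type  fsck pass  mount at boot  mount options
/dev/dsk/c2t0d0s6   /dev/rdsk/c2t0d0s6   /shared      ufs      2          no             -
With an entry like this, the cluster or homegrown failover script issues 'mount /shared' only after it has claimed ownership of the device.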
Re: [zfs-discuss] Re: zfs and Oracle ASM
Anantha N. Srirama wrote: I did a non-scientific benchmark comparing ASM and ZFS. Just look for my posts and you'll see it. To summarize, it was a statistical tie for simple loads of around 2GB of data, and we've chosen to stick with ASM for a variety of reasons, not the least of which is its ability to rebalance when disks are added/removed. Better integration comes to mind too. Yes. I think I commented on this last year, too. ASM is Oracle's solution to replace all other file systems for their database. You can expect that Oracle will ensure that its features are tightly coupled to the systems management interfaces available from Oracle. As such, there will always be better integration between Oracle Database and ASM than with any other generic file system. In other words, Oracle gains a lot by developing ASM to be consistent with their systems management infrastructure and running on heterogeneous, legacy systems -- a good thing. (I don't think ZFS is going to lose any revenue stream from ASM ;-) -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: Re: Proposal: multiple copies of user data
On 9/13/06, Matthew Ahrens [EMAIL PROTECTED] wrote: Sure, if you want *everything* in your pool to be mirrored, there is no real need for this feature (you could argue that setting up the pool would be easier if you didn't have to slice up the disk though). Not necessarily. Implementing this at the FS level will still allow the administrator to turn on copies for the entire pool, since the pool is technically also a FS and the property is inherited by child FSes. Of course, this also allows the admin to turn off copies for the FS containing junk. It could be recommended in some situations. If you want to protect against disk firmware errors, bit flips, part of the disk getting scrogged, then mirroring on a single disk (whether via a mirror vdev or copies=2) solves your problem. Admittedly, these problems are probably less common than whole-disk failure, which mirroring on a single disk does not address. I beg to differ: from experience, the above errors are more common than whole-disk failures. It's just that we do not notice the disks are developing problems, but panic when they finally fail completely. That's what happens to most of my disks anyway. Disks are much smarter nowadays about hiding bad sectors, but that doesn't mean there are none. If your precious data happens to sit on one, you'll be crying for copies. -- Just me, Wire ... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
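A sketch of how that inheritance would look once the proposed copies property exists (dataset names are made up; at the time of this thread the feature is still only a proposal):
# zfs set copies=2 tank              (set on the pool's root filesystem)
# zfs create tank/important          (child filesystems inherit copies=2)
# zfs create -o copies=1 tank/junk   (override for the filesystem holding junk)
# zfs get -r copies tank             (show which value each dataset ends up with)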
Re: [zfs-discuss] Snapshots and backing store
On Sep 13, 2006, at 10:52, Scott Howard wrote: It's not at all bizarre once you understand how ZFS works. I'd suggest reading through some of the documentation available at http://www.opensolaris.org/os/community/zfs/docs/ , in particular the slides available there. The presentation that 'goes' with those slides is available online: http://www.sun.com/software/solaris/zfs_learning_center.jsp ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: marvel cards.. as recommended
Yeah. I got the message from a few others, and we are hoping to return/buy the newer one. I'm sort of surprised by the limited set of SATA RAID or JBOD cards that one can actually use. Even the ones linked to on this list sometimes aren't supported :). I need to get up and running like yesterday, so we are just ordering the cards posthaste. On 9/13/06, Anton B. Rang [EMAIL PROTECTED] wrote: A quick peek at the Linux source shows a small workaround in place for the 07 revision... maybe if you file a bug against Solaris to support this revision it might be possible to get it added, at least if that's the only issue. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] any update on zfs root/boot ?
Hi folks, I'm in the annoying position of having to replace my rootdisk (since it's a [EMAIL PROTECTED]@$! Maxtor and dying). I'm currently running with zfsroot after following Tabriz' and TimF's procedure to enable that. However, I'd like to know whether there's a better way to get zfs root/boot happening. The mini-ufs partition kludge is getting a bit tired :) My plan for the moment (with build 45, the most recent ISO that I have) is to:
* install the new disk
* boot to single-user off the media
* create a swap slice and an everything-else slice on the new disk
* zpool create rootpool on the everything-else slice
* reboot and start the installer
* convince the installer that I don't need to partition anything
* install to the new rootpool.
The second-to-last step is where I imagine the most difficulty will arise. Is there anything that springs to mind which I could do to ensure it works? thanks in advance, James C. McPherson -- Solaris kernel software engineer, system admin and troubleshooter http://www.jmcp.homeunix.com/blog Find me on LinkedIn @ http://www.linkedin.com/pub/2/1ab/967 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
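Roughly what the middle steps of that plan might look like from the single-user shell -- a sketch only: the disk name c1t0d0 and the slice layout are assumptions, and persuading the installer to reuse the existing pool is the part that is hand-waved here:
# format                             (label the new disk; carve out s1 for swap, s0 for everything else)
# swap -a /dev/dsk/c1t0d0s1          (activate the new swap slice)
# zpool create rootpool c1t0d0s0     (the everything-else slice becomes the root pool)
# zpool status rootpool              (sanity check before rebooting into the installer)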