Re: [zfs-discuss] ZFS Mirror cloning

2009-07-25 Thread Andrew Gabriel

Richard Elling wrote:

That is because you had only one other choice: filesystem level copy.
With ZFS I believe you will find that snapshots will allow you to have
better control over this. The send/receive process is very, very similar
to a mirror resilver, so you are only carrying your previous process
forward into a brave new world. You'll find that send/receive is much
more flexible than broken mirrors can be.
 -- richard 


I always do my zpool backups by splitting mirrors, but I did just try using 
zfs send/receive to see if it's viable. Unfortunately, it craps out 
about 10% in with "cannot receive incremental stream: invalid backup 
stream", so it doesn't get me very far. Judging by the time it took to get 
as far as it did, I would guess that, had it worked, it would probably have 
been about 3 times slower than a resilver and split mirror. 
That's for a 500GB zpool with 8 filesystems and 3,500 snapshots.
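For reference, a minimal sketch of the two approaches being compared here
(pool, snapshot and device names are placeholders, not my actual setup):

  # split-mirror style: detach one side of the mirror and keep it as the copy,
  # then reattach it later and let it resilver
  zpool detach rpool c1t1d0s0
  zpool attach rpool c1t0d0s0 c1t1d0s0

  # send/receive style: replicate the pool (and all snapshots) to another pool,
  # then send incremental updates from the previous backup snapshot
  zfs snapshot -r rpool@backup-today
  zfs send -R rpool@backup-today | zfs receive -Fd backup
  zfs send -R -I rpool@backup-yesterday rpool@backup-today | zfs receive -Fd backup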


--
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-25 Thread Tristan Ball
The 128G Supertalent Ultradrive ME. It's the larger version of the drive 
mentioned in the original post. Sorry, I should have made that a little 
clearer. :-)


T

Kyle McDonald wrote:

Tristan Ball wrote:

It just so happens I have one of the 128G and two of the 32G versions in
my drawer, waiting to go into our DR disk array when it arrives.
  

Hi Tristan,

Just so I can be clear, what model/brand are the drives you were testing?

 -Kyle


I dropped the 128G into a spare Dell 745 (2GB RAM) and used an Ubuntu
live CD to run some simple iozone tests on it. I had some stability
issues with iozone crashing; however, I did get some results...

Attached is what I've got. I intended to do two sets of tests: the first
covering sequential reads, sequential writes, and a random IO mix; the second
running a streaming read or streaming write in parallel with the random IO
mix, as I understand many SSDs have trouble with those kinds of workloads.

As it turns out, so did my test PC. :-)
I've used 8K IO sizes for all the stage one tests - I know I might get
it to go faster with a larger size, but I like to know how well systems
will do when I treat them badly!

The Stage_1_Ops_thru_run is interesting. 2000+ ops/sec on random writes,
5000 on reads.
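The exact iozone command line didn't make it into this mail; something along
these lines is what I mean by the 8K tests (treat the flags as my assumption,
not necessarily the command I actually ran):

  # 8K record size, ops/sec output: sequential write/read plus random read/write
  # -O = report ops/sec, -r = record size, -s = file size, -i N = test number
  iozone -O -r 8k -s 2g -i 0 -i 1 -i 2 -f /ssd/testfile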


The streaming write load and random overwrites were started at the
same time, although I didn't see which one finished first, so it's
possible that the stream finished first and allowed the random run to
finish strong. Basically, take these numbers with several large grains of
salt!

Interestingly, the random IO mix doesn't slow down much, but the
streaming writes are hurt a lot.

Regards,
Tristan.



-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of thomas
Sent: Friday, 24 July 2009 5:23 AM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SSD's and ZFS...

 

I think it is a great idea, assuming the SSD has good write performance.

This one claims up to 230MB/s read and 180MB/s write and it's only $196:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393

Compared to this one (250MB/s read and 170MB/s write) which is $699.

Are those claims really trustworthy? They sound too good to be true!




MB/s numbers are not a good indication of performance. What you should
pay attention to is usually random write and read IOPS. The two tend to
correlate a bit, but those numbers on Newegg are probably just the best case
from the manufacturer.

In the world of consumer-grade SSDs, Intel has crushed everyone on IOPS
performance, but the other manufacturers are starting to catch up a
bit.
  







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-25 Thread Tristan Ball



Bob Friesenhahn wrote:

On Fri, 24 Jul 2009, Tristan Ball wrote:


I've used 8K IO sizes for all the stage one tests - I know I might get
it to go faster with a larger size, but I like to know how well systems
will do when I treat them badly!

The Stage_1_Ops_thru_run is interesting. 2000+ ops/sec on random writes,
5000 on reads.


This seems like rather low random write performance.  My 12-drive 
array of rotating rust obtains 3708.89 ops/sec.  In order to be 
effective, it seems that a synchronous write log should perform 
considerably better than the backing store.


That really depends on what you're trying to achieve. Even if this 
single drive is only showing equivalent performance to a twelve-drive 
array (and I suspect your 3700 ops/sec would slow down over a bigger 
data set, as seeks make more of an impact), that still means that if the 
SSD is used as a ZIL, those sync writes don't have to be written to the 
spinning disks immediately, giving the scheduler a better chance to order 
the I/Os and providing better overall latency response for the requests 
that are going to disk.


And while I didn't make it clear, I actually intend to use the 128G 
drive as an L2ARC. While its effectiveness will obviously depend on the 
access patterns, the cost of adding the drive to the array is basically 
trivial, and it significantly increases the total ops/sec the array is 
capable of during those times that the access patterns allow for it. 
For my use, it was a case of "might as well". :-)
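For anyone following along, hooking the SSDs up to an existing pool is a
one-liner each way (pool and device names below are placeholders):

  # use the 128G SSD as L2ARC (read cache)
  zpool add tank cache c3t0d0
  # use a smaller SSD as a separate ZIL (log) device
  zpool add tank log c3t1d0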


Tristan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-25 Thread roland
Running this kind of setup absolutely can give you NO guarantees at all.
Virtualisation, OSOL/ZFS on WinXP. It's nice to play with and see it
working, but would I TRUST precious data to it? No way!

Why not?
If I write some data through the virtualization layer and it goes straight through to 
the raw disk - what's the problem?
Do a snapshot and you can be sure you have a safe state. Or not?
You can check whether you are consistent by doing a scrub. Or not?
Taking buffers/caches into consideration, you could conceivably lose some 
seconds or minutes of work, but doesn't ZFS use a transactional design which ensures 
consistency?

So how can what's being reported here happen, if ZFS takes so much care 
over consistency?

When that happens, ZFS believes the data is safely written, but a power cut or 
crash can cause severe problems with the pool.

Didn't I read a million times that ZFS ensures an always-consistent state and 
is self-healing, too?

So, if new blocks are always written at new positions - why can't we just roll 
back to a point in time (for example, the last snapshot) which is known to be 
safe/consistent?

I couldn't care less about the last 5 minutes of work if I can recover my TB-sized 
pool instead.
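To be clear about what I mean by rolling back: at the dataset level this 
already exists; what seems to be missing (as far as I know) is a pool-wide 
equivalent that rewinds to an earlier, known-good transaction group:

  # dataset-level rollback to the most recent snapshot works today
  zfs rollback tank/data@yesterday
  # ...but there is no "zpool rollback" that does the same for the whole pool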
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The importance of ECC RAM for ZFS

2009-07-25 Thread Michael McCandless
Thanks for the numerous responses, everyone! Responding to some of the
answers:

 ZFS has to trust the storage to have committed the data it 
 claims to have committed in the same way it has to trust the integrity 
 of the RAM it uses for checksummed data.

I hope that's not true.

I.e., I can understand that if an I/O system lies in an fsync call
(returns before the bits are in fact on stable storage), ZFS might
lose the pool.  E.g., it seems like that may have been what happened on the
VB thread (though I agree that since it was only the guest that crashed,
the writes should in fact have made it to disk, so...).

But if a bit flips in RAM at a particularly unlucky moment, is there
any chance whatsoever that ZFS could lose the pool?  There seem to be
mixed opinions here so far... but if I were tallying votes, it looks
like more people say "no, it cannot" than "yes, it may".

  For example, if the wrong bit flips at the wrong time, could I lose my
  entire RAID-Z pool instead of, say, corrupting one file's contents or
  metadata? Is there such a possibility?
 
 Not likely, but I don't think anyone has done such low-level
 analysis to prove it.

So this is exactly what I'm driving at -- has there really been no
such low-level failure analysis?  I.e., if a bit error happens at point
XYZ in ZFS's code, what's the impact (for all interesting points
XYZ)?

E.g., say (pure speculation) ZFS has a global checksum that's written on
closing the pool, and later the pool cannot be imported because that
checksum is bad.  Since a bit error could corrupt that checksum, this
would in fact mean I could lose the pool due to an unluckily timed
bit error.

The decision (to use ECC or not) ought to be a basic cost/benefit
analysis, once one has the facts.  I'm trying to get to the facts
here... i.e., if you don't use ECC, just how bad is it when bit errors
inevitably happen?  If the effects are local (file/dir contents and
metadata get corrupted), that's one thing; if I could lose the pool,
that's very different.

[Eventually] armed with the facts, one should be free to decide on ECC
or not, just like one picks, say, the latest & greatest consumer hard
drive (higher risk of errors since they have no track record) or a
known good enterprise hard drive.

 You still have the processor to worry about though.

and

 NB many hard disk drives and controllers have only parity protected
 memory. So even if your main memory is ECC, it is unlikely that the
 entire data path is ECC protected.

These are good points -- even if you have ECC RAM, your CPU and PCI
bus and other parts of the data path could still flip bits.  So I'm
really hoping the answer is "no, you'll never lose the pool from
bit errors".

 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6667683
 
 Most of the issues that I've read on this list would have been 
 solved if there was a mechanism where the user / sysadmin could tell 
 ZFS to simply go back until it found a TXG that worked.

This one sounds important!  Any means of disaster recovery would be
very welcome...

BTW, is there some way for a user to vote or comment on bugs?  E.g., I think
I've hit this one:

  http://bugs.opensolaris.org/view_bug.do?bug_id=6807184

I would love to vote, share my config, situation, etc., but I can't
find any links that let me, there are no comments on the bug, etc.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-25 Thread Bob Friesenhahn

On Sat, 25 Jul 2009, roland wrote:


When that happens, ZFS believes the data is safely written, but a 
power cut or crash can cause severe problems with the pool.


Didn't I read a million times that ZFS ensures an always-consistent 
state and is self-healing, too?


So, if new blocks are always written at new positions - why can't we 
just roll back to a point in time (for example, the last snapshot) which 
is known to be safe/consistent?


As soon as you have more than one disk in the equation, it is 
vital that the disks commit their data when requested, since otherwise 
the data on disk will not be in a consistent state.  If the disks 
simply do whatever they want, then some disks will have written the 
data while other disks will still have it cached.  This blows the 
consistent state on disk even though ZFS wrote the data in order and 
did all the right things.  Any uncommitted data in the disk cache will be 
forgotten if the system loses power.


There is an additional problem if, when the disks finally get around to 
writing the cached data, they write it in a different order than 
requested while ignoring the commit request.  It is common for 
disks to write data in the most efficient order, but they absolutely must 
commit all of the data when requested so that the checkpoint is valid.
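For what it's worth, on Solaris you can at least inspect (and toggle) a 
drive's volatile write cache from format's expert mode; a sketch, and the 
menu entries may vary by drive and driver:

  format -e
  #   select the disk, then:
  #   format> cache
  #   cache> write_cache
  #   write_cache> display      (shows whether the volatile write cache is enabled)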


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-25 Thread roland
As soon as you have more than one disk in the equation, it is
vital that the disks commit their data when requested, since otherwise
the data on disk will not be in a consistent state.

OK, but doesn't that refer only to the most recent data?
Why can I lose a whole 10TB pool, including all the snapshots, given the 
logging/transactional nature of ZFS?

Isn't the data in the snapshots set to read-only, so that all blocks with snapshotted 
data don't change over time (and thus give a secure entry point to a consistent 
point in time)?

OK, these are probably some short-sighted questions, but I'm trying to 
understand how things could go wrong with ZFS and how issues like these happen.

On other filesystems, we have fsck as a last resort, or tools to 
recover data from unmountable filesystems. 
With ZFS I don't know of any of these, so it's the "will Solaris mount my ZFS 
after the next crash?" question which frightens me a little bit.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] sharenfs question

2009-07-25 Thread dick hoogendijk
# zfs create store/snaps
# zfs set sharenfs='rw=arwen,root=arwen' store/snaps
# share
-...@store/snaps   /store/snaps   sec=sys,rw=arwen,root=arwen 

arwen# zfs send -Rv rp...@0906 > /net/westmark/store/snaps/rpool.0906
zsh: permission denied: /net/westmark/store/snaps/rpool.0906

*** BOTH systems have NFSMAPID DOMAIN=nagual.nl set in the
*** file /etc/default/nfs

The NFS docs mention that the rw option can be a node (like arwen).
But as you can see, I get no access when I set rw=arwen.
And yet arwen is known!
This rule works:
# zfs set sharenfs='root=arwen' store/snaps
The snapshots are sent from arwen to the remote machine and get the
root:root privileges. So that's OK.
This rule does NOT work:
# zfs set sharenfs='rw=arwen,root=arwen' store/snaps
I get a permission denied. Apparently rw=arwen is not recognized.

Is something wrong with the syntax the way ZFS uses sharenfs?
Or have I misread the manual of share_nfs?
What can be wrong with the line zfs set sharenfs='rw=arwen,root=arwen'
store/snaps?
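For completeness, one variant I have not tried yet (whether it makes a
difference is just a guess on my part) is to spell out the fully qualified
client name in the access list, e.g.:

# zfs set sharenfs='rw=arwen.nagual.nl,root=arwen.nagual.nl' store/snaps
# share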

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | nevada / OpenSolaris 2010.02 B118
+ All that's really worth doing is what we do for others (Lewis Carrol)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-25 Thread David Magda

On Jul 25, 2009, at 12:24, roland wrote:

Why can I lose a whole 10TB pool, including all the snapshots, given  
the logging/transactional nature of ZFS?


Because ZFS does not (yet) have an (easy) way to go back to a previous  
state. That's what this bug is about:



need a way to rollback to an uberblock from a previous txg


http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6667683

While in most cases ZFS will cleanly recover after a non-clean  
shutdown, there are situations where the disks doing strange things  
(like lying) have caused the ZFS data structures to become wonky. The  
'broken' data structure will cause all branches underneath it to be  
lost--and if it's near the top of the tree, it could mean a good  
portion of the pool is inaccessible.


Fixing the above bug should hopefully allow users / sysadmins to tell  
ZFS to go 'back in time' and look up previous versions of the data  
structures.
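A sketch of what such a recovery interface might look like from the command 
line; the flag names here are an assumption on my part, not something that 
ships today:

  # hypothetical recovery-mode import: discard the last few transaction groups
  # and come back up at an earlier, consistent uberblock
  zpool import -F tank
  # and a dry run that only reports whether the rewind would succeed
  zpool import -Fn tank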


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-25 Thread roland
Thanks for the explanation!

One more question:

 there are situations where the disks doing strange things
(like lying) have caused the ZFS data structures to become wonky. The
'broken' data structure will cause all branches underneath it to be
lost--and if it's near the top of the tree, it could mean a good
portion of the pool is inaccessible.

Can snapshots also be affected by such an issue, or are they somewhat immune here?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-25 Thread David Magda


On Jul 25, 2009, at 14:17, roland wrote:


Thanks for the explanation!

One more question:


there are situations where the disks doing strange things
(like lying) have caused the ZFS data structures to become wonky. The
'broken' data structure will cause all branches underneath it to be
lost--and if it's near the top of the tree, it could mean a good
portion of the pool is inaccessible.


Can snapshots also be affected by such an issue, or are they somewhat  
immune here?


Yes, they can be affected. If a snapshot's data structure / record is  
underneath the corrupted data in the tree, then it won't be able to be  
reached.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-25 Thread Frank Middleton

On 07/25/09 02:50 PM, David Magda wrote:


Yes, they can be affected. If a snapshot's data structure / record is
underneath the corrupted data in the tree, then it won't be able to be
reached.


Can you comment on if/how mirroring or raidz mitigates this, or tree
corruption in general? I have yet to lose a pool even on a machine
with fairly pathological problems, but it is mirrored (and copies=2).

I was also wondering if you could explain why the ZIL can't
repair such damage.

Finally, a number of posters blamed VB for ignoring a flush, but
according to the evil tuning guide, without any application syncs,
ZFS may wait up to 5 seconds before issuing a sync, and there
must be all kinds of failure modes even on bare hardware where
it never gets a chance to do one at shutdown. This is interesting
if you do ZFS over iSCSI because of the possibility of someone
tripping over a patch cord or a router blowing a fuse. Doesn't
this mean /any/ hardware might have this problem, albeit with much
lower probability?

Thanks

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Beginner's question on output of zfs list

2009-07-25 Thread Axelle Apvrille
Hi,
I'm pretty sure this is a beginner's question. The output of my zfs list is 
shown below. I don't understand how two ZFS file systems (opensolaris and 
opensolaris-1) can be mounted on the same mountpoint. [Hint: they correspond to 
different BEs.]
And since I should probably erase one of them (the old one is opensolaris), how do I 
first check what's on it?
Thanks!
Thanks !

NAME   USED  AVAIL  REFER  MOUNTPOINT
rpool 41.9G  6.11G  1.42G  /a/rpool
rpool/ROOT10.4G  6.11G18K  legacy
rpool/ROOT/opensolaris7.18G  6.11G  6.75G  /
rpool/ROOT/opensolaris-1  3.26G  6.11G  7.37G  /
rpool/dump 895M  6.11G   895M  -
rpool/export  28.2G  6.11G19K  /export
rpool/export/home 28.2G  6.11G   654M  /export/home
rpool/export/home/axelle  27.6G  6.11G  27.4G  /export/home/axelle
rpool/swap 895M  6.90G  88.3M  -
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-25 Thread Carson Gaspar

Frank Middleton wrote:

Finally, a number of posters blamed VB for ignoring a flush, but
according to the evil tuning guide, without any application syncs,
ZFS may wait up to 5 seconds before issuing a synch, and there
must be all kinds of failure modes even on bare hardware where
it never gets a chance to do one at shutdown. This is interesting
if you do ZFS over iscsi because of the possibility of someone
tripping over a patch cord or a router blowing a fuse. Doesn't
this mean /any/ hardware might have this problem, albeit with much
lower probability?


No. You'll lose unwritten data, but won't corrupt the pool, because the on-disk 
state will be sane, as long as your iSCSI stack doesn't lie about data commits 
or ignore cache flush commands. Why is this so difficult for people to 
understand? Let me create a simple example for you.


Get yourself 4 small pieces of paper, and number them 1 through 4.

On piece 1, write "Four" (app write disk A)
On piece 2, write "Score" (app write disk B)
Place pieces 1 and 2 together on the side (metadata write, cache flush)
On piece 3, write "Every" (app overwrite disk A)
On piece 4, write "Good" (app overwrite disk B)
Place pieces 3 and 4 on top of pieces 1 and 2 (metadata write, cache flush)

IFF you obeyed the instructions, the only things you could ever have on the side 
are nothing, "Four Score", or "Every Good" (we assume that side placement is 
atomic). You could get killed after writing something on pieces 3 or 4, and lose 
them, but you could never have garbage.


Now, if you were too lazy to bother following the instructions properly, we could 
end up with bizarre things. This is what happens when storage lies and reorders 
writes across flush boundaries.


--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The importance of ECC RAM for ZFS

2009-07-25 Thread Ian Collins

Michael McCandless wrote:

Thanks for the numerous responses, everyone! Responding to some of the
answers:

  
ZFS has to trust the storage to have committed the data it 
claims to have committed in the same way it has to trust the integrity 
of the RAM it uses for checksummed data.



I hope that's not true.

I.e., I can understand that if an I/O system lies in an fsync call
(returns before the bits are in fact on stable storage), ZFS might
lose the pool.  E.g., it seems like that may have been what happened on the
VB thread (though I agree that since it was only the guest that crashed,
the writes should in fact have made it to disk, so...).

But if a bit flips in RAM at a particularly unlucky moment, is there
any chance whatsoever that ZFS could lose the pool?  There seem to be
mixed opinions here so far... but if I were tallying votes, it looks
like more people say "no, it cannot" than "yes, it may".

  
I've never seen reports of that happening.  What I have seen is 
corrupted files.  Without checksums, the files would have been silently 
corrupted.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Beginner's question on output of zfs list

2009-07-25 Thread Ian Collins

Axelle Apvrille wrote:

Hi,
I'm pretty sure this is a beginner's question. The output of my zfs list is 
shown below. I don't understand how two ZFS file systems (opensolaris and 
opensolaris-1) can be mounted on the same mountpoint. [Hint: they correspond to 
different BEs.]
  
They may have the same mountpoint as long as you don't attempt to mount 
them both at the same time.



And since I should probably erase one of them (the old one is opensolaris), how do I 
first check what's on it?
Thanks!
  

Don't do that, use the LU tools to manage BEs.
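On an OpenSolaris build, the BE tools would look something like this (a sketch, 
assuming beadm(1M) is the BE manager on your release):

  beadm list                      # see which BE is active
  beadm mount opensolaris /mnt    # mount the old BE to inspect what's on it
  beadm umount opensolaris
  beadm destroy opensolaris       # only once you're sure you don't need it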

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The importance of ECC RAM for ZFS

2009-07-25 Thread Marc Bevand
dick hoogendijk dick at nagual.nl writes:
 
 I live in Holland and it is not easy to find motherboards that (a)
 truly support ECC ram and (b) are (Open)Solaris compatible.

Virtually all motherboards for AMD processors support ECC RAM because the 
memory controller is in the CPU and all AMD CPUs support ECC RAM.

I have heard of a few BIOSes that refuse to POST if ECC RAM is detected, but 
this is often an attempt to segment markets, rather than a real lack of 
ability to support ECC RAM.

-mrb

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-25 Thread Toby Thain


On 25-Jul-09, at 3:32 PM, Frank Middleton wrote:


On 07/25/09 02:50 PM, David Magda wrote:


Yes, they can be affected. If a snapshot's data structure / record is
underneath the corrupted data in the tree, then it won't be able to be
reached.


Can you comment on if/how mirroring or raidz mitigates this, or tree
corruption in general? I have yet to lose a pool even on a machine
with fairly pathological problems, but it is mirrored (and copies=2).

I was also wondering if you could explain why the ZIL can't
repair such damage.

Finally, a number of posters blamed VB for ignoring a flush, but
according to the evil tuning guide, without any application syncs,
ZFS may wait up to 5 seconds before issuing a sync, and there
must be all kinds of failure modes even on bare hardware where
it never gets a chance to do one at shutdown. This is interesting
if you do ZFS over iSCSI because of the possibility of someone
tripping over a patch cord or a router blowing a fuse. Doesn't
this mean /any/ hardware might have this problem, albeit with much
lower probability?


The problem is the assumed *ordering*. In this respect, VB ignoring  
flushes and real hardware are not going to behave the same.


--Toby



Thanks



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool is lain to burnination (bwahahahah!)

2009-07-25 Thread Graeme Clark
Hi Folks,

I have a small problem. I've disappeared about 5.9TB of data. 

My host system was (well, still is) connected to this storage via iSCSI and 
MPXIO, doing round robin of a pair of GigE ports. I'd like to make a quick 
excuse before we begin here.

I was originally doing raidz2 (there are 15 disks involved); however, during 
heavy load (i.e. a scrub with ~50% disk usage) errors showed up all over the 
place and eventually faulted the pool. I assume I was either running into 
bandwidth problems or generally freaking out the array with the volume of 
IO.

So instead I'm exporting the whole deal as a single 5.9TB LUN (done as RAID5 
on the iSCSI appliance - a Promise M500i). Well, that was all well and good 
until I had a kernel panic earlier today and the system came back rather unhappily. 

My pool now looks like this:

  pool: store
 state: FAULTED
status: One or more devices could not be used because the label is missing 
or invalid.  There are insufficient replicas for the pool to continue
functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
storeFAULTED  0 0 0  corrupted data
  c0t22310001557D05D5d0  FAULTED  0 0 0  corrupted data

I still see the LUN:

AVAILABLE DISK SELECTIONS:
   0. c0t22310001557D05D5d0 Promise-VTrak M500i-021B-5.90TB
  /scsi_vhci/d...@g22310001557d05d5


I can run zdb against the device and I get some info (well, actually against s0 on the 
disk, which is weird because I think I built the pool without specifying a 
slice. Maybe relevant - I don't know...)

gecl...@ostore:~# zdb -l /dev/rdsk/c0t22310001557D05D5d0s0

LABEL 0

version=14
name='store'
state=0
txg=178224
pool_guid=13934602390719084200
hostid=8462299
hostname='store'
top_guid=14931103169794670927
guid=14931103169794670927
vdev_tree
type='disk'
id=0
guid=14931103169794670927
path='/dev/dsk/c0t22310001557D05D5d0s0'
devid='id1,s...@x22310001557d05d5/a'
phys_path='/scsi_vhci/d...@g22310001557d05d5:a'
whole_disk=1
metaslab_array=23
metaslab_shift=35
ashift=9
asize=6486985015296
is_log=0
DTL=44

LABEL 1

version=14
name='store'
state=0
txg=178224
pool_guid=13934602390719084200
hostid=8462299
hostname='store'
top_guid=14931103169794670927
guid=14931103169794670927
vdev_tree
type='disk'
id=0
guid=14931103169794670927
path='/dev/dsk/c0t22310001557D05D5d0s0'
devid='id1,s...@x22310001557d05d5/a'
phys_path='/scsi_vhci/d...@g22310001557d05d5:a'
whole_disk=1
metaslab_array=23
metaslab_shift=35
ashift=9
asize=6486985015296
is_log=0
DTL=44

LABEL 2

version=14
name='store'
state=0
txg=178224
pool_guid=13934602390719084200
hostid=8462299
hostname='store'
top_guid=14931103169794670927
guid=14931103169794670927
vdev_tree
type='disk'
id=0
guid=14931103169794670927
path='/dev/dsk/c0t22310001557D05D5d0s0'
devid='id1,s...@x22310001557d05d5/a'
phys_path='/scsi_vhci/d...@g22310001557d05d5:a'
whole_disk=1
metaslab_array=23
metaslab_shift=35
ashift=9
asize=6486985015296
is_log=0
DTL=44

LABEL 3

version=14
name='store'
state=0
txg=178224
pool_guid=13934602390719084200
hostid=8462299
hostname='store'
top_guid=14931103169794670927
guid=14931103169794670927
vdev_tree
type='disk'
id=0
guid=14931103169794670927
path='/dev/dsk/c0t22310001557D05D5d0s0'
devid='id1,s...@x22310001557d05d5/a'
phys_path='/scsi_vhci/d...@g22310001557d05d5:a'
whole_disk=1
metaslab_array=23
metaslab_shift=35
ashift=9
asize=6486985015296
is_log=0
DTL=44

I can force export and import the pool, but I can't seem to get it active 
again. I've been reading around, trying to get this figured out. Is this a 
scenario in which I should expect to have well and truly lost all of my data?

The data is not irreplaceable - I can rebuild/restore from backups - but it will 
take an awfully long time. I'm aware that this is a highly suboptimal setup, but 
feel free to beat me up a bit on it anyway.

In an ideal scenario I'd like to somehow 

[zfs-discuss] Fishworks iSCSI cache enabled...

2009-07-25 Thread Marcelo Leal
Hello all,
 Is anybody using the iSCSI cache-enable option on the 7000 series? I'm talking about 
OpenSolaris (ZFS) as an iSCSI initiator, because I don't know of another 
filesystem that handles disk caches.
 So, was that option created for ZFS? ;-)
 Any suggestions on this?

 Thanks

 Leal
[ http://www.eall.com.br/blog ]
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss