Re: [zfs-discuss] Help with setting up ZFS

2009-07-27 Thread Ross
 This is going to be used for my parents business (Im
 merely setting it up for them and then leaving it.)
 So basically what I want is reliability and
 redundancy.  I want there to be very little chance
 of data loss as the business they are in requires
 them to keep all documents.  

Ok, ZFS is good, but what you really need here is a proper backup strategy.  If 
need be, skimp on the server so that you can create a good backup system.  
Never, ever, keep all your eggs in one basket.

If their data is that important, you need to get a copy off-site, and you need 
some kind of automated process to do that - people don't realise how important 
backups are, and if you leave it to a manual system it won't get done or 
checked.

I'd be very tempted to use zfs send/receive to send the data to another 
machine, even if it's just a virtualbox server you run at home.
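Something along these lines would do it from cron (just a sketch - the dataset,
snapshot and host names are made up):

  zfs snapshot tank/docs@tue
  zfs send -i tank/docs@mon tank/docs@tue | ssh backuphost /usr/sbin/zfs receive tank/docs

i.e. take a snapshot each night and send only the increment since the previous one.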

PS.  You're also going to need some kind of remote monitoring of that server - 
sure, raidz2 will keep your data going when a disk fails, but unless you know 
that the disk needs replacing, what's going to happen?  What's going to happen 
to that server in a couple of years' time when you've forgotten all about it and 
suddenly get a call from your parents to say it's stopped working?  If I were 
you, I'd write a script to run zpool status -x, and email you if there are 
any errors.

PPS.  Yes, you can and should scrub regularly; running that once a week is as 
easy as adding a line to crontab.
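
For example (an untested sketch - the pool name, script path and address are
placeholders):

  #!/bin/sh
  # check-pool.sh - mail me only when ZFS reports a problem
  STATUS=`/usr/sbin/zpool status -x`
  if [ "$STATUS" != "all pools are healthy" ]; then
      echo "$STATUS" | mailx -s "ZFS problem on `hostname`" admin@example.com
  fi

and in root's crontab, the status check plus a weekly scrub:

  0 8 * * *  /root/check-pool.sh
  0 3 * * 0  /usr/sbin/zpool scrub tank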
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to check what's on a given ZFS file system ?

2009-07-27 Thread Axelle Apvrille
Hi,
I've already sent a few posts around this issue, but haven't quite got the 
answer - so I'll try to clarify my question :)

Since I have upgraded from 2008.11 to 2009.06 a new BE has been created. On 
ZFS, that corresponds to two file systems, both (strangely) mounted on /. The 
old BE corresponds to a file system of 7G, the new one to a fs of 3G. As I am 
running short of space and do not need the old 2008.11 any more, I would like to 
work out how to delete it. I know I should do beadm destroy <name>. But before I 
do that I would like to understand what I am erasing in those 7G (makes sense, 
doesn't it?).

In particular, I'd like to:
- understand what's in that old 7G ?
- understand why the new environment only takes 3G
- possibly mount the old 7G to check what's inside ? I'm surprised it is said 
to be mounted on /
- understand what beadm destroy actually destroys !

Thanks very much !
Axelle

$ zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
rpool 42.1G  5.90G  1.42G  /a/rpool
rpool/ROOT10.5G  5.90G18K  legacy
rpool/ROOT/opensolaris7.18G  5.90G  6.75G  /
rpool/ROOT/opensolaris-1  3.30G  5.90G  7.39G  /
rpool/dump 895M  5.90G   895M  -
rpool/export  28.4G  5.90G19K  /export
rpool/export/home 28.4G  5.90G   654M  /export/home
rpool/export/home/axelle  27.8G  5.90G  27.4G  /export/home/axelle
rpool/swap 895M  6.69G  88.3M  -

$ beadm list 
BEActive Mountpoint Space Policy Created  
---- -- - -- ---  
opensolaris   -  -  7.57G static 2009-01-03 13:18 
opensolaris-1 NR /  3.42G static 2009-07-20 22:38
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to check what's on a given ZFS file system ?

2009-07-27 Thread Fajar A. Nugraha
On Mon, Jul 27, 2009 at 2:51 PM, Axelle
Apvrilleno-re...@opensolaris.org wrote:
 Hi,
 I've already sent a few posts around this issue, but haven't quite got the 
 answer - so I'll try to clarify my question :)

 Since I have upgraded from 2008.11 to 2009.06 a new BE has been created. On 
 ZFS, that corresponds to two file systems, both (strangely) mounted on /. The 
 old BE corresponds to a file system of 7G, the new one to a fs of 3G.

Not quite. beadm list and the USED column of zfs list do not
always correspond to the sum of the sizes of all its
files/directories. Do something like this:
- mkdir /a (if it's not there already)
- beadm mount opensolaris /a
- df -h /
- df -h /a
- beadm umount opensolaris

Compare the output of df to the output of beadm list, and you'll see
what I mean. Note that the df output should be similar to that of
REFER from zfs list.

 As I fall short of space and do not need the old 2008.11 any more, I would 
 like to work out how to delete it. I know I should do beadm destroy name. 
 But before I do that I would like to understand what I am erasing in those 7G 
 (makes sense, doesn't it ?).

 In particular, I'd like to:
 - understand what's in that old 7G ?
 - understand why the new environment only takes 3G

Some of the space is probably shared (as in both refer to the same
blocks) between the old and new environment (/var comes to mind). If
you destroy the old BE (which is essentially a zfs destroy), you
might not be able to reclaim all of that 7G if the new environment
still refers to those blocks.

It's actually odd that your old BE uses more space. In my setup the new
BE always uses the most space. Try running this:

zfs get all | grep origin
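
or, to narrow it down a bit (assuming the usual rpool/ROOT layout from your
zfs list output):

zfs get -r origin rpool/ROOT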


 - possibly mount the old 7G to check what's inside ?

use beadm mount

 I'm surprised it is said to be mounted on /

No, they aren't. The mountpoint property merely records what the
mountpoint is or will be when the fs is mounted, not the actual
current mountpoint. In your case rpool/ROOT/opensolaris is not mounted
anywhere. Do this:

zfs get mounted,mountpoint,canmount rpool/ROOT/opensolaris
rpool/ROOT/opensolaris-1

 - understand what beadm destroy actually destroys !

This fs

 rpool/ROOT/opensolaris    7.18G  5.90G  6.75G  /

which should be a zfs clone of this fs, or the other way around

 rpool/ROOT/opensolaris-1  3.30G  5.90G  7.39G  /

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help with setting up ZFS

2009-07-27 Thread Brian
Yes, I've thought about some off-site strategy.  My parents are used to loading 
their data onto an external hard drive, however this always struck me as a bad 
strategy.  A tape backup system is unlikely due to the cost, however I could 
get them to continue also loading the data onto an external hard drive.  In the 
ideal case the following would happen:  They drag their Windows files onto the 
ZFS server to store them.  This then triggers an automatic copy of the files 
onto an external hard drive.  To implement this I guess I could have the 
external hard drive always plugged into the ZFS system and write a script to 
copy the files onto the external as well.  I'm not sure if it's possible to do 
this on the Windows side, i.e. writing some script to copy the data onto the 
external when it's plugged into the Windows machine.
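
On the ZFS side I'm imagining something run nightly from cron, roughly like
this (just a sketch - the dataset and external-drive paths are made-up names,
and it assumes rsync is installed):

#!/bin/sh
# copy the shared data to the external drive attached to the ZFS box,
# and mail me if the copy fails
/usr/bin/rsync -a --delete /tank/docs/ /external/docs/ ||
    echo "external copy failed" | mailx -s "backup failed on `hostname`" admin@example.com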

That said, I have read about software that will automatically back up data.  
Presumably I could configure the software to load the data onto the server and, 
at the same time, onto the external drive which would be attached to their 
Windows machine, and have it do this every night.  I haven't looked too hard 
into this yet, as I'm presuming there will be complications in having a Windows 
backup program deal with loading data onto an OpenSolaris server.  Though it 
might be straightforward, I just haven't looked at it yet.

The ZFS send/receive command can presumably only send the filesystem to another 
OpenSolaris OS, right?  Is there any way to send it to a normal Linux 
distribution (ext3)?


I have one more question about the hardware setup.  You said to get a breakout 
cable for the SAS plugs.  Is it advisable to put 4 hard drives onto one SAS 
plug?  I was thinking of just getting 4 SATA to SAS cables and hooking each 
drive up to a different SAS plug (as I believe there are 8.)  

I am also going to look into setting up some system for accessing the server 
over the internet.  FTP would be the easiest choice, however it might be fun to 
set up an Apache server.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help with setting up ZFS

2009-07-27 Thread Brian
Thank you, I'll definitely implement a script to scrub the system and have the 
system email me if there is a problem.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help with setting up ZFS

2009-07-27 Thread Erik Trimble

Brian wrote:
Yes, I've thought about some off-site strategy.  My parents are used to loading 
their data onto an external hard drive, however this always struck me as a bad 
strategy.  A tape backup system is unlikely due to the cost, however I could 
get them to continue also loading the data onto an external hard drive.  In the 
ideal case the following would happen:  They drag their Windows files onto the 
ZFS server to store them.  This then triggers an automatic copy of the files 
onto an external hard drive.  To implement this I guess I could have the 
external hard drive always plugged into the ZFS system and write a script to 
copy the files onto the external as well.  I'm not sure if it's possible to do 
this on the Windows side, i.e. writing some script to copy the data onto the 
external when it's plugged into the Windows machine.


That said I have read about software that will automatically backup data.  
Presumably I could configure the software to load the data onto the server, and 
at the same time load it onto the external which would be attached to their 
windows machine, and have it do this every night.  I haven't looked too hard 
into this yet as I'm presuming there will be complications in having a windows 
backup program deal with loading data onto an OpenSolaris server.  Though it 
might be straightforward, I just haven't looked at it yet.

The ZFS send/receive command can presumably only send the filesystem to another 
OpenSolaris OS, right?  Is there any way to send it to a normal Linux 
distribution (ext3)?
  

Send/receive is ZFS-only.  rsync is a common way of moving data these days.
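For example, something like this (a sketch - the host and paths are made up,
and it assumes rsync and ssh keys are set up on both ends):

rsync -az /tank/docs/ linuxbox:/backup/docs/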
I have one more question about the hardware setup.  You said to get a breakout 
cable for the SAS plugs.  Is it advisable to put 4 hard drives onto one SAS 
plug?  I was thinking of just getting 4 SATA to SAS cables and hooking each 
drive up to a different SAS plug (as I believe there are 8).

  
The two plugs that I indicated are multi-lane SAS ports, which /require/ 
using a breakout cable; don't worry - that's the design for them. 
Multi-lane means exactly that - several actual SAS connections in a 
single plug.  The other 6 ports next to them (in black) are SATA ports 
connected to the ICH9R.



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help with setting up ZFS

2009-07-27 Thread erik.ableson
The zfs send command generates a differential file between the two  
selected snapshots so you can send that to anything you'd like.  The  
catch of course is that then you have a collection of files on your  
Linux box that are pretty much useless since you can't mount them or  
read the contents in any meaningful way.  If you're running a Linux  
server as the destination the easiest solution is to create a virtual  
machine running the same revision of OpenSolaris as the server and use  
that as a destination.


It doesn't necessarily need a publicly exposed IP address - you can  
get the source to send the differential file to the Linux box and then  
have the VM import the file using a recv command to integrate the  
contents into a local ZFS filesystem. I think that VirtualBox lets you  
access shared folders so you could write a script to check for new  
files and then use the recv command to process them. The trick as  
always for this kind of thing is determining that the file is complete  
before attempting to import it.


There are some good examples in the ZFS Administration Guide (p187) for  
handling remote transfers.

zfs send tank/ci...@today | ssh newsys zfs recv sandbox/res...@today

For a staged approach you could pipe the output to a compressed file  
and send that over to the Linux box.
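
Roughly like this (a sketch - the dataset and file names are placeholders):

zfs send sandbox/data@today | gzip > /tank/outgoing/data-today.zfs.gz
scp /tank/outgoing/data-today.zfs.gz linuxbox:/backup/

and then, inside the OpenSolaris VM, receive the full stream into a new dataset:

gunzip -c /backup/data-today.zfs.gz | zfs receive sandbox/data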


Combined with a key exchange between the two systems you don't need to  
keep passwords in your scripts either.
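
The one-time setup is roughly this (standard OpenSSH; "newsys" is just the host
name from the guide example):

ssh-keygen -t rsa        # accept an empty passphrase for unattended use
cat ~/.ssh/id_rsa.pub | ssh newsys 'cat >> ~/.ssh/authorized_keys'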


Cheers,

Erik

On 27 juil. 09, at 11:15, Brian wrote:

The ZFS send/receive command can presumably only send the filesystem  
to another OpenSolaris OS, right?  Is there any way to send it to  
a normal Linux distribution (ext3)?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs destroy slow?

2009-07-27 Thread Markus Kovero
Hi, how come zfs destroy is so slow?  E.g. destroying a 6TB dataset renders zfs 
admin commands useless for the time being - in this case for hours.
(running osol 111b with latest patches.)

Yours
Markus Kovero
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs destroy slow?

2009-07-27 Thread Markus Kovero
Oh well, the whole system seems to be deadlocked.
Nice. A little too keen on keeping data safe :-P

Yours
Markus Kovero

From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Markus Kovero
Sent: 27. heinäkuuta 2009 13:39
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] zfs destroy slow?

Hi, how come zfs destroy is so slow?  E.g. destroying a 6TB dataset renders zfs 
admin commands useless for the time being - in this case for hours.
(running osol 111b with latest patches.)

Yours
Markus Kovero
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] sam-fs on zfs-pool

2009-07-27 Thread Tobias Exner

Hi list,

I've done some tests and run into a very strange situation...


I created a zvol using zfs create -V and initialized a sam-filesystem 
on this zvol.

After that I restored some testdata using a dump from another system.

So far so good.

After some big troubles I found out that releasing files in the 
sam-filesystem doesn't free space on the underlying zvol.
So staging and releasing files only work until zfs list shows me a 
zvol with 100% usage, although the sam-filesystem was only filled up to 20%.

I didn't create snapshots, and a scrub didn't show any errors.

When the zvol was filled up, even a sammkfs couldn't solve the problem. I 
had to destroy the zvol (not the zpool).

After that I was able to recreate a new zvol with sam-fs on top.


Is that a known behaviour? ... or did I run into a bug?


System:

SAM-FS 4.6.85
Solaris 10 U7 X86


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When writing to SLOG at full speed all disk IO is blocked

2009-07-27 Thread Marcelo Leal
Hello,
 Well, I'm trying to understand this workload, but what I'm doing to reproduce 
it is simply flooding the SSD with writes, and the disks show no activity. I'm 
testing with aggr (two links), and for one or two seconds there is no read 
activity (output from the server). 
 Right now I'm suspecting something with the network, because I did some ZFS 
tuning and it seems I'm not getting 100% utilization on the SSD, yet the 
behaviour is still happening. I need to confirm this and will share it with you.
 Thanks for your reply.

 Leal
[ http://www.eall.com.br/blog ]
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sharenfs question

2009-07-27 Thread Mark Shellenbaum

dick hoogendijk wrote:

# zfs create store/snaps
# zfs set sharenfs='rw=arwen,root=arwen' store/snaps
# share
-...@store/snaps   /store/snaps   sec=sys,rw=arwen,root=arwen 


arwen# zfs send -Rv rp...@0906 > /net/westmark/store/snaps/rpool.0906
zsh: permission denied: /net/westmark/store/snaps/rpool.0906

*** BOTH systems have NFSMAPID DOMAIN=nagual.nl set in the
*** file /etc/default/nfs

The NFS docs mention that the rw option can be a node (like arwen).
But as you can see I get no access when I set rw=arwen.
And yet arwen is known!
This rule works:
#zfs set sharenfs='root=arwen' store/snaps
The snapshots are sent from arwen to the remote machine and get the
root:root privileges. So that's OK.
This rule does NOT work:
# zfs set sharenfs='rw=arwen,root=arwen' store/snaps
I get a permission denied. Apparently rw=arwen is not recognized.

Is something wrong with the syntax the way ZFS uses sharenfs?
Or have I misread the manual of share_nfs?
What could be wrong with the line zfs set sharenfs='rw=arwen,root=arwen'
store/snaps?



I would suggest you open a bug on this.  You can use either bugzilla or 
bugster:


http://defect.opensolaris.org/bz/
http://bugs.opensolaris.org/

This issue does have some similarities with the following bug, but it's 
different enough to warrant its own bug:


http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6856710


  -Mark
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sam-fs on zfs-pool

2009-07-27 Thread Dean Roehrich
On Mon, Jul 27, 2009 at 02:14:24PM +0200, Tobias Exner wrote:
 Hi list,
 
 I've done some tests and run into a very strange situation...
 
 
 I created a zvol using zfs create -V and initialized a sam-filesystem 
 on this zvol.
 After that I restored some testdata using a dump from another system.
 
 So far so good.
 
 After some big troubles I found out that releasing files in the 
 sam-filesystem doesn't free space on the underlying zvol.
 So staging and releasing files only work until zfs list shows me a 
 zvol with 100% usage, although the sam-filesystem was only filled up to 20%.
 I didn't create snapshots, and a scrub didn't show any errors.

This is most likely QFS bug number 6837405.

Dean
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sharenfs question

2009-07-27 Thread dick hoogendijk
On Mon, 27 Jul 2009 08:26:06 -0600
Mark Shellenbaum mark.shellenb...@sun.com wrote:

 I would suggest you open a bug on this.
 http://defect.opensolaris.org/bz/

Done. Bugzilla – Bug 10294 Submitted

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS 10u7 05/09 | OpenSolaris 2010.02 B118
+ All that's really worth doing is what we do for others (Lewis Carrol)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Marcelo Leal
 That's only one element of it Bob.  ZFS also needs
 devices to fail quickly and in a predictable manner.
 
 A consumer grade hard disk could lock up your entire
 pool as it fails.  The kit Sun supply is more likely
 to fail in a manner ZFS can cope with.

 I agree 100%.
 Hardware, firmware, and drivers should be fully integrated for a mission-critical 
app. With the wrong firmware and consumer-grade HDs, disk failures stall the 
entire pool. I have experience with disks failing and taking two or three seconds 
for the system to cope with (not just ZFS, but the controller, etc.).

 Leal.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When writing to SLOG at full speed all disk IO is blocked

2009-07-27 Thread Bob Friesenhahn

On Mon, 27 Jul 2009, Marcelo Leal wrote:

Well, i'm trying to understand this workload, but what i'm seeing to 
reproduce this is just flood the SSD with writes, and the disks show 
no activity. I'm testing with aggr (two links), and for one or two 
seconds there is no read activity (output from server). Right now


In other situations we have noticed that writes take priority over 
reads in ZFS.  When ZFS writes a TXG, reads go away for a little 
while. In most server situations, we do want writes to take 
precedence.  If synchronous writes don't take precedence then writers 
of important updates may be blocked due to many readers trying to 
access those important updates.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Ross
Heh, I'd kill for failures to be handled in 2 or 3 seconds.  I saw the failure 
of a mirrored iSCSI disk lock the entire pool for 3 minutes.  That has been 
addressed now, but device hangs have the potential to be *very* disruptive.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help with setting up ZFS

2009-07-27 Thread Toby Thain


On 27-Jul-09, at 5:46 AM, erik.ableson wrote:

The zfs send command generates a differential file between the two  
selected snapshots so you can send that to anything you'd like.   
The catch of course is that then you have a collection of files on  
your Linux box that are pretty much useless since you can't mount  
them or read the contents in any meaningful way.  If you're running  
a Linux server as the destination the easiest solution is to create  
a virtual machine running the same revision of OpenSolaris as the  
server and use that as a destination.


It doesn't necessarily need a publicly exposed IP address - you can  
get the source to send the differential file to the Linux box and  
then have the VM import the file using a recv command to  
integrate the contents into a local ZFS filesystem. I think that  
VirtualBox lets you access shared folders so you could write a  
script to check for new files and then use the recv command to  
process them.


VirtualBox can forward a host port to a guest, so one can ssh from  
outside and process the stream directly. Also note Erik's public key  
idea below.


--Toby

The trick as always for this kind of thing is determining that the  
file is complete before attempting to import it.


There's some good examples in the ZFS Administration Guide (p187)  
for handling remote transfers.

zfs send tank/ci...@today | ssh newsys zfs recv sandbox/res...@today

For a staged approach you could pipe the output to a compressed  
file and send that over to the Linux box.


Combined with a key exchange between the two systems you don't need  
to keep passwords in your scripts either.


Cheers,

Erik

On 27 juil. 09, at 11:15, Brian wrote:

The ZFS send/receive command can presumably only send the  
filesystem to another OpenSolaris OS, right?  Is there any way  
to send it to a normal Linux distribution (ext3)?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Eric D. Mudama

On Sun, Jul 26 at  1:47, David Magda wrote:


On Jul 25, 2009, at 16:30, Carson Gaspar wrote:


Frank Middleton wrote:

Doesn't this mean /any/ hardware might have this problem, albeit 
with much lower probability?


No. You'll lose unwritten data, but won't corrupt the pool, because 
the on-disk state will be sane, as long as your iSCSI stack doesn't 
lie about data commits or ignore cache flush commands.


But this entire thread started because Virtual Box's virtual disk /
did/ lie about data commits.


Why is this so difficult for people to understand?


Because most people make the (not unreasonable) assumption that disks 
save data the way that they're supposed to: that the data that goes in is 
the data that comes out, and that when the OS tells them to empty the 
buffer they actually flush it.


It's only us storage geeks that generally know the ugly truth that 
this assumption is not always true. :)


Can *someone* please name a single drive+firmware or RAID
controller+firmware that ignores FLUSH CACHE / FLUSH CACHE EXT
commands? Or worse, responds ok when the flush hasn't occurred?

Everyone on this list seems to blame lying hardware for ignoring
commands, but disks are relatively mature and I can't believe that
major OEMs would qualify disks or other hardware that willingly ignore
commands.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Thomas Burgess
I was under the impression it was VirtualBox and its default setting that
ignored the command, not the hard drive.

On Mon, Jul 27, 2009 at 1:27 PM, Eric D. Mudama
edmud...@bounceswoosh.orgwrote:

 On Sun, Jul 26 at  1:47, David Magda wrote:


 On Jul 25, 2009, at 16:30, Carson Gaspar wrote:

  Frank Middleton wrote:

  Doesn't this mean /any/ hardware might have this problem, albeit with
 much lower probability?


 No. You'll lose unwritten data, but won't corrupt the pool, because the
 on-disk state will be sane, as long as your iSCSI stack doesn't lie about
 data commits or ignore cache flush commands.


 But this entire thread started because Virtual Box's virtual disk /
 did/ lie about data commits.

  Why is this so difficult for people to understand?


 Because most people make the (not unreasonable) assumption that disks save
  data the way that they're supposed to: that the data that goes in is the data
  that comes out, and that when the OS tells them to empty the buffer
  they actually flush it.

 It's only us storage geeks that generally know the ugly truth that this
 assumption is not always true. :)


 Can *someone* please name a single drive+firmware or RAID
 controller+firmware that ignores FLUSH CACHE / FLUSH CACHE EXT
 commands? Or worse, responds ok when the flush hasn't occurred?

 Everyone on this list seems to blame lying hardware for ignoring
 commands, but disks are relatively mature and I can't believe that
 major OEMs would qualify disks or other hardware that willingly ignore
 commands.

 --eric

 --
 Eric D. Mudama
 edmud...@mail.bounceswoosh.org


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Chris Ridd


On 27 Jul 2009, at 18:49, Thomas Burgess wrote:



I was under the impression it was VirtualBox and its default  
setting that ignored the command, not the hard drive.


Do other virtualization products (eg VMware, Parallels, Virtual PC)  
have the same default behaviour as VirtualBox?


I've a suspicion they all behave similarly dangerously, but actual  
data would be useful.


Cheers,

Chris
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Adam Sherman

On 27-Jul-09, at 13:54 , Chris Ridd wrote:
I was under the impression it was VirtualBox and its default  
setting that ignored the command, not the hard drive.


Do other virtualization products (eg VMware, Parallels, Virtual PC)  
have the same default behaviour as VirtualBox?


I've a suspicion they all behave similarly dangerously, but actual  
data would be useful.


Also, I think it may have already been posted, but I haven't found the  
option to disable VirtualBox' disk cache. Anyone have the incantation  
handy?


Thanks,

A

--
Adam Sherman
CTO, Versature Corp.
Tel: +1.877.498.3772 x113



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Mike Gerdts
On Mon, Jul 27, 2009 at 12:54 PM, Chris Riddchrisr...@mac.com wrote:

 On 27 Jul 2009, at 18:49, Thomas Burgess wrote:


 I was under the impression it was VirtualBox and its default setting that
 ignored the command, not the hard drive.

 Do other virtualization products (eg VMware, Parallels, Virtual PC) have the
 same default behaviour as VirtualBox?

I've lost a pool due to LDoms doing the same.  This bug seems to be related.

http://bugs.opensolaris.org/view_bug.do?bug_id=6684721

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The importance of ECC RAM for ZFS

2009-07-27 Thread Marc Bevand
dick hoogendijk dick at nagual.nl writes:
 
 Then why is it that most AMD MoBo's in the shops clearly state that ECC
 RAM is not supported on the MoBo?

To restate what Erik explained: *all* AMD CPUs support ECC RAM, however poorly 
written motherboard specs often make the mistake of confusing non-ECC vs. ECC
with unbuffered vs. registered (these are 2 completely unrelated technical
characteristics). So, don't blindly trust manuals saying ECC RAM is not
supported.

-mrb

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread David Magda
On Mon, July 27, 2009 13:59, Adam Sherman wrote:

 Also, I think it may have already been posted, but I haven't found the
 option to disable VirtualBox' disk cache. Anyone have the incantation
 handy?

http://forums.virtualbox.org/viewtopic.php?f=8&t=13661&start=0

It tells VB not to ignore the sync/flush command. Caching is still enabled
(it wasn't the problem).
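
If I remember the thread correctly, the incantation for an IDE-attached virtual
disk is something along these lines (the VM name and LUN number depend on your
setup - check the thread before relying on it):

VBoxManage setextradata "MyVM" "VBoxInternal/Devices/piix3ide/0/LUN#0/Config/IgnoreFlush" 0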

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Frank Middleton

On 07/27/09 01:27 PM, Eric D. Mudama wrote:


Everyone on this list seems to blame lying hardware for ignoring
commands, but disks are relatively mature and I can't believe that
major OEMs would qualify disks or other hardware that willingly ignore
commands.


You are absolutely correct, but if the cache flush command never makes
it to the disk, then it won't see it. The contention is that by not
relaying the cache flush to the disk, VirtualBox caused the OP to lose
his pool.

IMO this argument is bogus because AFAIK the OP didn't actually power
his system down, so the data would still have been in the cache, and
would presumably eventually have been written. The out-of-order writes
theory is also somewhat dubious, since he was able to write 10TB without
VB relaying the cache flushes. This is all highly hardware dependent,
and AFAIK no one ever asked the OP what hardware he had, instead
blasting him for running VB on MS Windows. Since IIRC he was using raw
disk access, it is questionable whether or not MS was to blame, but
in general it simply shouldn't be possible to lose a pool under
any conditions.

It does raise the question of what happens in general if a cache
flush doesn't happen if, for example, a system crashes in such a way
that it requires a power cycle to restart, and the cache never gets
flushed. Do disks with volatile caches attempt to flush the cache
by themselves if they detect power down? It seems that the ZFS team
recognizes this as a problem, hence the CR to address it.

It turns out (at least according to this almost 4-year-old blog:
http://blogs.sun.com/perrin/entry/the_lumberjack) that the ZILs
/are/ allocated recursively from the main pool.  Unless there is
a ZIL for the ZILs, ZFS really isn't fully journalled, and this
could be the real explanation for all lost pools and/or file
systems. It would be great to hear from the ZFS team that writing
a ZIL, presumably a transaction in its own right, is protected
somehow (by a ZIL for the ZILs?).

Of course the ZIL isn't a journal in the traditional sense, and
AFAIK it has no undo capability the way that a DBMS usually has,
but it needs to be structured so that bizarre things that happen
when something as robust as Solaris crashes don't cause data loss.
The nightmare scenario is when one disk of a mirror begins to
fail and the system comes to a grinding halt where even stop-a
doesn't respond, and a power cycle is the only way out. Who
knows what writes may or may not have been issued or what the
state of the disk cache might be at such a time.

-- Frank

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sam-fs on zfs-pool

2009-07-27 Thread Tobias Exner

Hi Dean,

may you provide more infos about that?

Are you able to send me a bug description for a better understanding?
Is there a patch available, or do I have to use a previous patch of sam-qfs?


Thanks in advance...


Tobias



Dean Roehrich schrieb:

On Mon, Jul 27, 2009 at 02:14:24PM +0200, Tobias Exner wrote:
  

Hi list,

I've done some tests and run into a very strange situation...


I created a zvol using zfs create -V and initialized a sam-filesystem 
on this zvol.

After that I restored some testdata using a dump from another system.

So far so good.

After some big troubles I found out that releasing files in the 
sam-filesystem doesn't free space on the underlying zvol.
So staging and releasing files only work until zfs list shows me a 
zvol with 100% usage, although the sam-filesystem was only filled up to 20%.

I didn't create snapshots, and a scrub didn't show any errors.



This is most likely QFS bug number 6837405.

Dean


  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to check what's on a given ZFS file system ?

2009-07-27 Thread Axelle Apvrille
Thanks - that answers my question ! :))
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Richard Elling


On Jul 27, 2009, at 10:27 AM, Eric D. Mudama wrote:


On Sun, Jul 26 at  1:47, David Magda wrote:


On Jul 25, 2009, at 16:30, Carson Gaspar wrote:


Frank Middleton wrote:

Doesn't this mean /any/ hardware might have this problem, albeit  
with much lower probability?


No. You'll lose unwritten data, but won't corrupt the pool,  
because the on-disk state will be sane, as long as your iSCSI  
stack doesn't lie about data commits or ignore cache flush commands.


But this entire thread started because Virtual Box's virtual disk /
did/ lie about data commits.


Why is this so difficult for people to understand?


Because most people make the (not unreasonable) assumption that  
disks save data the way that they're supposed to: that the data that  
goes in is the data that comes out, and that when the OS tells them  
to empty the buffer they actually flush it.


It's only us storage geeks that generally know the ugly truth that  
this assumption is not always true. :)


Can *someone* please name a single drive+firmware or RAID
controller+firmware that ignores FLUSH CACHE / FLUSH CACHE EXT
commands? Or worse, responds ok when the flush hasn't occurred?


two seconds with google shows
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=183771&NewLang=en&Hilite=cache+flush

Give it up. These things happen.  Not much you can do about it, other
than design around it.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Adam Sherman

On 27-Jul-09, at 15:14 , David Magda wrote:
Also, I think it may have already been posted, but I haven't found the
option to disable VirtualBox' disk cache. Anyone have the incantation
handy?


http://forums.virtualbox.org/viewtopic.php?f=8&t=13661&start=0

It tells VB not to ignore the sync/flush command. Caching is still enabled
(it wasn't the problem).


Thanks!

As Russell points out in the last post to that thread, it doesn't seem  
possible to do this with virtual SATA disks? Odd.


A.

--
Adam Sherman
CTO, Versature Corp.
Tel: +1.877.498.3772 x113



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Subscribing broken?

2009-07-27 Thread Tim Cook
So it is broken then... because I'm on week 4 now, no responses to this thread, 
and I'm still not getting any emails.

Anyone from Sun still alive that can actually do something?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] deduplication

2009-07-27 Thread Tim Cook
Bump?  I watched the stream for several hours and never heard a word about 
dedupe.  The blogs also all seem to be completely bare of mention.  What's the 
deal?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Subscribing broken?

2009-07-27 Thread Cindy . Swearingen

Tim,

I sent your subscription problem to the OpenSolaris help list.

We should hear back soon.

Cindy

On 07/27/09 16:15, Tim Cook wrote:

So it is broken then... because I'm on week 4 now, no responses to this thread, 
and I'm still not getting any emails.

Anyone from Sun still alive that can actually do something?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Subscribing broken?

2009-07-27 Thread Cindy . Swearingen

Tim,

If you could send me your email address privately, the
OpenSolaris list folks have a better chance of resolving
this problem.

I promise I won't sell it to anyone. :-)

Cindy

On 07/27/09 16:25, cindy.swearin...@sun.com wrote:

Tim,

I sent your subscription problem to the OpenSolaris help list.

We should hear back soon.

Cindy

On 07/27/09 16:15, Tim Cook wrote:

So it is broken then... because I'm on week 4 now, no responses to 
this thread, and I'm still not getting any emails.


Anyone from Sun still alive that can actually do something?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [indiana-discuss] zfs issues?

2009-07-27 Thread James Lever


On 28/07/2009, at 6:44 AM, dick hoogendijk wrote:


Are there any known issues with zfs in OpenSolaris B118?
I run my pools formatted like the original release 2009.06 (I want to
be able to go back to it ;-). I'm a bit scared after reading about
serious issues in B119 (will be skipped, I heard). But B118 is safe?


Well, actually, I have an issue with ZFS under b118 on osol.

Under b117, I attached a second disk to my root pool and confirmed  
everything worked fine.  Rebooted with the disks in reverse order to  
prove grub install worked and everything was fine.  Removed one of the  
spindles, did an upgrade to b118, rebooted and tested and then  
rebooted and added the removed volume; this was an explicit test of  
automated resilvering and it worked perfectly.  Did one or two  
explicit scrubs along the way and they were fine too.


So then I upgraded my zpool from version 14 to version 16, and now zpool  
scrub rpool hangs the ZFS subsystem.  The machine still runs, it's  
pingable etc., but anything that goes to disk (at least rpool) hangs  
indefinitely.  This happens whether I boot with the mirror intact or  
degraded with one spindle removed.


I had help trying to create a crash dump, but nothing we tried  
caused the system to panic.  0>eip;:c;:c and other weird magic I  
don't fully grok.


Has anybody else seen this weirdness?

cheers,
James

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can't offline a RAID-Z2 device: no valid replica

2009-07-27 Thread Cindy . Swearingen

Hi Laurent,

I was able to reproduce it on a Solaris 10 5/09 system.
The problem is fixed in the current Nevada bits and also in
the upcoming Solaris 10 release.

The bug fix that integrated this change might be this one:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6328632
zpool offline is a bit too conservative

I can understand that you would want to offline a faulty
disk. In the meantime, you might use fmdump to help isolate
the transient error problems.
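
For example (both are standard FMA commands):

# fmdump        summary of faults diagnosed by FMA
# fmdump -eV    verbose listing of the underlying error reports (ereports)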

Thanks,

Cindy

On 07/20/09 08:36, Laurent Blume wrote:
Thanks a lot, Cindy! 


Let me know how it goes or if I can provide more info.
Part of the bad luck I've had with that set, is that it reports such errors 
about once a month, then everything goes back to normal again. So I'm pretty 
sure that I'll be able to try to offline the disk someday.

Laurent

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Nigel Smith
David Magda wrote:
 This is also (theoretically) why a drive purchased from Sun is more  
 expensive than a drive purchased from your neighbourhood computer  
 shop: Sun (and presumably other manufacturers) takes the time and  
 effort to test things to make sure that when a drive says I've synced  
 the data, it actually has synced the data. This testing is what  
 you're presumably paying for.

So how do you test a hard drive to check it does actually sync the data?
How would you do it in theory?
And in practice?

Now say we are talking about a virtual hard drive,
rather than a physical hard drive.
How would that affect the answer to the above questions?

Thanks
Nigel
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [indiana-discuss] zfs issues?

2009-07-27 Thread Robert Thurlow

James Lever wrote:

I had help trying to create a crash dump, but nothing we tried caused 
the system to panic.  0>eip;:c;:c and other weird magic I don't 
fully grok.


I can't help with your ZFS issue, but to get a reasonable crash
dump in circumstances like these, you should be able to do
savecore -L on OpenSolaris.
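
Roughly (assuming a dump device is already configured):

# dumpadm        # confirm the dump device and the savecore directory
# savecore -L    # write a live dump of the running system into that directory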

Rob T
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help with setting up ZFS

2009-07-27 Thread Trevor Pretty




Brian

This is a chunk of a script I wrote. To make it go to another machine,
change the send/receive to something like the other example below.

It creates a copy of a ZFS filesystem and mounts it on the local machine
(the "do_command" just made my demo self-running).

Scrubbing is easy - just a cron entry!

Code Chunk 1

if [ -d $COPY_DIR ]; then
 echo "="
 echo "Make a copy of my current $HOME_DIR in $COPY_DIR"
 echo "="
 # Work out where $HOME_DIR and $COPY_DIR are located in ZFS
 #
 HOME_POOL=`/bin/df -h $HOME_DIR | grep $HOME_DIR | awk '{ print $1 }' | head -1`
 # This only works if /Backup is mounted and I now umount it so I can always mount /Backup/home.
 # I had problems when I used the top dir as a filesystem when rebooting after an LU.
 #COPY_POOL=`/bin/df -h $COPY_DIR | grep $COPY_DIR | awk '{ print $1 }' | head -1`
 COPY_POOL=`/usr/sbin/zfs list | grep $COPY_DIR | grep -v $HOME_DIR | awk '{ print $1 }' | head -1`
 # Use zfs send and receive
 #
 # /usr/sbin/zfs destroy -fR $COPY_POOL$HOME_DIR # It can exist!
 /usr/sbin/zfs destroy -fR $home_p...@now 1>/dev/null 2>&1  # Just in case we aborted for some reason last time
 /usr/sbin/umount -f $COPY_DIR/$HOME_DIR 1>/dev/null 2>&1   # Just in case somebody is cd'ed to it
 sync
 /usr/sbin/zfs snapshot $home_p...@now && \
 /usr/sbin/zfs send $home_p...@now | /usr/sbin/zfs receive -F $COPY_POOL$HOME_DIR && \
 /usr/sbin/zfs destroy $home_p...@now
 /usr/sbin/zfs destroy $copy_pool$home_...@now
 /usr/sbin/zfs umount $COPY_POOL 1>/dev/null 2>&1           # It should not be mounted
 /usr/sbin/zfs set mountpoint=none $COPY_POOL
 /usr/sbin/zfs set mountpoint=$COPY_DIR$HOME_DIR $COPY_POOL$HOME_DIR
 /usr/sbin/zfs mount $COPY_POOL$HOME_DIR
 /usr/sbin/zfs set readonly=on $COPY_POOL$HOME_DIR
 sync
 /bin/du -sk $COPY_DIR/$HOME > /tmp/email$$
fi


Code chunk 2

How I demoed send/receive

# http://blogs.sun.com/timc/entry/ssh_cheat_sheet
#
# [r...@norton:] ssh-keygen -t rsa
#   no pass phrase
# [r...@norton:] cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#
# Edit /etc/ssh/sshd_config, change line to
#  PermitRootLogin yes
# 
# [r...@norton:] svcadm restart ssh


## Let's send the snapshot to another pool ##

echo ""
echo ""
echo "Create a new pool and send the snaphot to it to back it up"
echo ""
echo "Note: The pool could be on a remote systems"
echo "I will simply use ssh to localhost"
echo ""
do_command zpool create backup_pool $DISK5
do_command zpool status backup_pool
press_return
# Note do_command does not work via the pipe so I will just use echo
# Need to setup ssh - see notes above
echo ""
echo ""
echo "-- zfs send sap_pool/PRD/sapda...@today | ssh localhost
zfs receive -F backup_pool/sapdata1"
echo ""
zfs send sap_pool/PRD/sapda...@today | ssh localhost zfs receive -F
backup_pool/sapdata1
do_command df -h /sapdata1
do_command df -h /backup_pool/sapdata1
echo ""
echo "Notice the backup is not compressed!"
echo ""
press_return
do_command ls -alR /backup_pool/sapdata1 | more

Brian wrote:

  Thank you, I'll definitely implement a script to scrub the system and have the system email me if there is a problem.
  


-- 
Trevor Pretty | Technical Account Manager | +64 9 639 0652 | +64 21 666 161
Eagle Technology Group Ltd.
Gate D, Alexandra Park, Greenlane West, Epsom
Private Bag 93211, Parnell, Auckland
www.eagle.co.nz

This email is confidential and may be legally privileged. If received in
error please destroy and immediately notify us.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-27 Thread Toby Thain


On 27-Jul-09, at 3:44 PM, Frank Middleton wrote:


On 07/27/09 01:27 PM, Eric D. Mudama wrote:


Everyone on this list seems to blame lying hardware for ignoring
commands, but disks are relatively mature and I can't believe that
major OEMs would qualify disks or other hardware that willingly  
ignore

commands.


You are absolutely correct, but if the cache flush command never makes
it to the disk, then it won't see it. The contention is that by not
relaying the cache flush to the disk,


No - by COMPLETELY ignoring the flush.


VirtualBox caused the OP to lose
his pool.

IMO this argument is bogus because AFAIK the OP didn't actually power
his system down, so the data would still have been in the cache, and
presumably have eventually have been written. The out-of-order writes
theory is also somewhat dubious, since he was able to write 10TB  
without

VB relaying the cache flushes.


Huh? Of course he could. The guest didn't crash while he was doing it!

The corruption occurred when the guest crashed (iirc). And the out  
of order theory need not be the *only possible* explanation, but it  
*is* sufficient.



This is all highly hardware dependant,


Not in the least. It's a logical problem.


and AFAIK no one ever asked the OP what hardware he had, instead,
blasting him for running VB on MSWindows.


Which is certainly not relevant to my hypothesis of what broke. I  
don't care what host he is running. The argument is the same for all.



Since IIRC he was using raw
disk access, it is questionable whether or not MS was to blame, but
in general it simply shouldn't be possible to lose a pool under
any conditions.


How about when flushes are ignored?



It does raise the question of what happens in general if a cache
flush doesn't happen if, for example, a system crashes in such a way
that it requires a power cycle to restart, and the cache never gets
flushed.


Previous explanations have not dented your misunderstanding one iota.

The problem is not that an attempted flush did not complete. It was  
that any and all flushes *prior to crash* were ignored. This is where  
the failure mode diverges from real hardware.


Again, look:

A B C FLUSH D E F FLUSH <CRASH>

Note that it does not matter *at all* whether the 2nd flush  
completed. What matters from an integrity point of view is that the  
*previous* flush was completed (and synchronously). Visualise this on  
the two scenarios:


1) real hardware: (barring actual defects) that A,B,C were written  
was guaranteed by the first flush (otherwise D would never have been  
issued). Integrity of system is intact regardless of whether the 2nd  
flush completed.


2) VirtualBox: flush never happened. Integrity of system is lost, or  
at best unknown, if it depends on A,B,C all completing before D.




...

Of course the ZIL isn't a journal in the traditional sense, and
AFAIK it has no undo capability the way that a DBMS usually has,
but it needs to be structured so that bizarre things that happen
when something as robust as Solaris crashes don't cause data loss.


A lot of engineering effort has been expended in UFS and ZFS to  
achieve just that. Which is why it's so nutty to undermine that by  
violating semantics in lower layers.



The nightmare scenario is when one disk of a mirror begins to
fail and the system comes to a grinding halt where even stop-a
doesn't respond, and a power cycle is the only way out. Who
knows what writes may or may not have been issued or what the
state of the disk cache might be at such a time.


Again, if the flush semantics are respected*, this is not a problem.

--Toby

* - When this operation completes, previous writes are verifiably on  
durable media**.


** - Durable media meaning physical media in a bare metal  
environment, and potentially virtual media in a virtualised  
environment.





-- Frank

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [indiana-discuss] zfs issues?

2009-07-27 Thread James Lever


On 28/07/2009, at 9:22 AM, Robert Thurlow wrote:


I can't help with your ZFS issue, but to get a reasonable crash
dump in circumstances like these, you should be able to do
savecore -L on OpenSolaris.


That would be well and good if I could get a login - due to the rpool  
being unresponsive, that was not possible.


So the only recourse we had was via kmdb :/  Is there a way to  
explicitly invoke savecore via kmdb?


James

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] deduplication

2009-07-27 Thread James C. McPherson
On Mon, 27 Jul 2009 15:17:52 -0700 (PDT)
Tim Cook no-re...@opensolaris.org wrote:

 buMP?  I watched the stream for several hours and never heard a word
 about dedupe.  The blogs also all seem to be completely bare of mention.
 What's the deal?

ZFS Deduplication was most definitely talked about in both
Bill and Jeff's keynote as well as their roundtable discussion. 

We had some problems with audio and video quality during the
conference, I think that's had an impact on what we've been
able to put up for viewing. I will find out more and post when
I have some facts on it.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
Kernel Conference Australia - http://au.sun.com/sunnews/events/2009/kernel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool export taking hours

2009-07-27 Thread fyleow
I have a raidz1 tank of 5x 640 GB hard drives on my newly installed OpenSolaris 
2009.06 system. I did a zpool export tank and the process has been running for 
3 hours now taking up 100% CPU usage.

When I do a zfs list tank it's still shown as mounted. What's going on here? 
Should it really be taking this long?

$ zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  1.10T  1.19T  36.7K  /tank

$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
c7t0d0  ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
c7t4d0  ONLINE   0 0 0

errors: No known data errors
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool export taking hours

2009-07-27 Thread George Wilson

fyleow wrote:

I have a raidz1 tank of 5x 640 GB hard drives on my newly installed OpenSolaris 
2009.06 system. I did a zpool export tank and the process has been running for 
3 hours now taking up 100% CPU usage.

When I do a zfs list tank it's still shown as mounted. What's going on here? 
Should it really be taking this long?

$ zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  1.10T  1.19T  36.7K  /tank

$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
c7t0d0  ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
c7t4d0  ONLINE   0 0 0

errors: No known data errors


Can you run the following command and post the output:

# echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k


Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss