Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-25 Thread Darren J Moffat
Tomas Ögren wrote:
 On 24 January, 2008 - Steve Hillman sent me these 1,9K bytes:
 
 I realize that this topic has been fairly well beaten to death on this 
 forum, but I've also read numerous comments from ZFS developers that they'd 
 like to hear about significantly different performance numbers of ZFS vs UFS 
 for NFS-exported filesystems, so here's one more.

 The server is an x4500 with 44 drives configured in a RAID10 zpool, and two 
 drives mirrored and formatted with UFS for the boot device. It's running 
 Solaris 10u4, patched with the Recommended Patch Set from late Dec/07. The 
 client (if it matters) is an older V20z w/ Solaris 10 3/05. No tuning has 
 been done on either box

 The test involved copying lots of small files (2-10k) from an NFS client to 
 a mounted NFS volume. A simple 'cp' was done, both with 1 thread and 4 
 parallel threads (to different directories) and then I monitored to see how 
 fast the files were accumulating on the server.

 ZFS:
 1 thread - 25 files/second; 4 threads - 25 files/second (~6 per thread)

 UFS: (same server, just exported /var from the boot volume)
 1 thread - 200 files/second; 4 threads - 520 files/second (~130/thread)
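 (For reference, a minimal sketch of this kind of client-side test - the
 directory names and the 4-way split are illustrative, not the exact
 commands that were used:

   # 1 thread
   ptime cp /var/tmp/smallfiles/* /mnt/nfs/t1/
   # 4 threads, each writing to its own target directory
   for i in 1 2 3 4; do
       cp /var/tmp/smallfiles-$i/* /mnt/nfs/t$i/ &
   done
   wait
   # meanwhile, on the server, watch how fast files accumulate
   while sleep 10; do ls /tank/t1 | wc -l; done
 )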
 
 To get similar (lower) consistency guarantees, try disabling the ZIL
 (google://zil_disable). This should up the speed, but might cause disk
 corruption if the server crashes while a client is writing data (just
 like with UFS).

Disabling the ZIL does NOT cause disk corruption.  It doesn't even cause 
ZFS to be inconsistent on disk.  What it does mean is that you no longer 
have guaranteed synchronous write semantics - ie on a crash an 
application might have done a synchronous write that never made it to 
stable storage.
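
(For the record, the usual ways zil_disable was flipped on builds of that
era - a global, test-only tunable that affects every pool on the host:

  # persistent, takes effect at the next boot: add to /etc/system
  set zfs:zil_disable = 1

  # or on a live system (only affects datasets mounted after the change)
  echo zil_disable/W0t1 | mdb -kw
)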

BTW there isn't really any such thing as disk corruption, there is 
data corruption :-)

-- 
Darren J Moffat


Re: [zfs-discuss] Synchronous scrub?

2008-01-25 Thread Robert Milkowski
Hello Eric,

Wednesday, January 23, 2008, 7:21:42 PM, you wrote:

ES Sorry, no such feature exists.  We do generate sysevents for when
ES resilvers are completed, but not scrubs.  Adding those sysevents would
ES be an easy change, but doing anything more complicated (such as baking
ES that functionality into zpool(1M)) would be annoying.

zpool --wait, so it doesn't exit until the requested scrub is completed?
Shouldn't be that hard to implement (it should wait in user space so
it can be killed).
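
(Until something like that exists, a user-space polling sketch - it keys
off the "scrub in progress" text that zpool status prints on these builds,
so treat the grep pattern as an assumption:

  zpool scrub tank
  while zpool status tank | grep "scrub in progress" > /dev/null; do
      sleep 60
  done
  echo "scrub of tank completed"
)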


Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-25 Thread Robert Milkowski
Hello Darren,



DJM BTW there isn't really any such thing as disk corruption, there is 
DJM data corruption :-)

Well, if you scratch it hard enough :)




-- 
Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



[zfs-discuss] Order of operations w/ checksum errors

2008-01-25 Thread Kam
zpool status shows a few checksum errors against 1 device in a raidz1 3-disk 
array and no read or write errors against that device. The pool is marked as 
degraded. Is there a difference if you clear the errors for the pool before you 
scrub versus scrubbing and then clearing the errors? I'm not sure whether 
clearing the errors prior to a scrub will replicate out any bad blocks that were 
previously identified as checksum errors but have since been cleared.
 
 


[zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions

2008-01-25 Thread Niksa Franceschi
Hi,

I have this setup:
2xSUN V440 servers with FC adapters, installed Solaris 10u4.

Both servers see one LUN on XP storage.
A ZFS filesystem is created on that LUN (on server1).

If I export that ZFS filesystem on server1, I can import it on server2, and 
vice-versa.

If I have imported ZFS on server1 and try to import it on server2, it will fail 
(which is correct behavior).

However, if I export the filesystem on server1, import it on server2 and reboot 
server1 - after the reboot, server1 will import the same ZFS filesystem that is 
at that point mounted on server2, and I get corruption since both systems have 
the same ZFS FS mounted at the same time!

Is there any way to avoid such behavior - as this issue only arises at server 
reboot?
 
 


Re: [zfs-discuss] Order of operations w/ checksum errors

2008-01-25 Thread Robert Milkowski
Hello Kam,

Friday, January 25, 2008, 9:11:24 AM, you wrote:

K zpool status shows a few checksum errors against 1 device in a
K raidz1 3 disk array and no read or write errors against that
K device. The pool marked as degraded. Is there a difference if you
K clear the errors for the pool before you scrub versus scrubing then
K clearing the errors? I'm not sure if the clearing errors prior to a
K scrub will replicate out any bad blocks that were identified as
K checksum errors previously that had since been cleared.
K  

It doesn't matter - a scrub won't replicate any errors.
Now, if these errors were correctable, they were corrected by ZFS at
the same time it discovered them.
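
(So either order works; a typical sequence - pool name illustrative:

  zpool scrub tank        # re-reads everything and repairs what redundancy allows
  zpool status -v tank    # check whether anything was unrecoverable
  zpool clear tank        # then reset the error counters
)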

Could you post the zpool status output?

-- 
Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions

2008-01-25 Thread Niksa Franceschi
Hi, 

The pool wasn't exported.
server1 was rebooted (with the ZFS pool on it).
During the reboot the ZFS pool was released, and I could import it on server2 
(which I have done).
However, when server1 was booting up it imported the pool and mounted the ZFS 
filesystems even though they were already imported and mounted on server2.

As I said, what is interesting is that if both servers are up, I cannot import 
the pool on one server while it is imported on the other.
However, when a server boots up it somehow skips the check of whether the same 
pool is already imported on the other server, which in the end leads to the same 
pool being imported on both servers and corruption.
 
 


Re: [zfs-discuss] sharenfs with over 10000 file systems

2008-01-25 Thread Richard L. Hamilton
 New, yes. Aware - probably not.
 
 Given cheap filesystems, users would create many
 filesystems was an easy guess, but I somehow don't
 think anybody envisioned that users would be creating
 tens of thousands of filesystems.
 
 ZFS - too good for its own good :-p

IMO (and given mails/posts I've seen typically by people using
or wanting to use zfs at large universities and the like, for home
directories) this is frequently driven by the need for per-user
quotas.  Since zfs doesn't have per-uid quotas, this means they
end up creating (at least one) filesystem per user.  That means a
share per user, and locally a mount per user, which will never
scale as well as (locally) a single share of /export/home, and a
single mount (although there would of course be automounts to /home
on demand, but they wouldn't slow down bootup).  sharemgr and the
like may be attempts to improve the situation, but they mitigate rather
than eliminate the consequences of exploding what used to be a single
large filesystem into a bunch of relatively small ones, simply based on
the need to have per-user quotas with zfs.
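
(For concreteness, the pattern being described - pool, mountpoints and
user names here are made up:

  # one filesystem, one NFS share and one quota per user
  for u in alice bob carol; do
      zfs create tank/home/$u
      zfs set quota=2g tank/home/$u
      zfs set sharenfs=on tank/home/$u
  done
)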

And there are still situations where a per-uid quota would be useful,
such as /var/mail (although I could see that corrupting mailboxes
in some cases) or other sorts of shared directories.

OTOH, the implementation could certainly vary a little.  The
equivalent of the quotas file should be automatically created
when quotas are enabled, and invisible; and unless quotas are not
only disabled but purged somehow, it should maintain per-uid use
statistics even for uids with no quotas, to eliminate the need for
quotacheck (initialization of quotas might well be restricted to filesystem
creation time, to eliminate the need for a cumbersome pass through
existing data, at least at first; but that would probably be wanted too,
since people don't always plan ahead).  But other quota-related
functionality could IMO be retained, although the implementations
might have to get smarter, and there ought to be some alternative
to the method presently used with ufs of simply reading the
quotas file to iterate through the available stats.
 
 


Re: [zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions

2008-01-25 Thread Robert Milkowski
Hello Niksa,

Friday, January 25, 2008, 9:27:17 AM, you wrote:

NF Hi,

NF I have this setup:
NF 2xSUN V440 servers with FC adapters, installed Solaris 10u4.

NF Both servers see one LUN on XP storage.
NF On that LUN is created ZFS filesystem (on server1).

NF If I export that ZFS filesystem on server1, I can import it on server2, and 
vice-versa.

NF If I have imported ZFS on server1 and try to import it on
NF server2, it will fail (which is correct behavior).

NF However, if I export filesystem on server1, import it on server2
NF and reboot server1 - after reboot, server1 will import same ZFS
NF filesystem that is at that point mounted on server2, and I get
NF corruptions since both systems have same ZFS FS mounted at same time!

NF Is there any way to avoid such behavior - as this issue only arrizes at 
server reboot?

If the pool was exported, it shouldn't have been imported automatically.
Are you sure you actually exported it and not just unmounted it?

Another feature that may be useful for you is 'zpool import -R ...'
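
(A sketch of that - the point being that, as far as I know, an altroot
import is not recorded in /etc/zfs/zpool.cache, so the pool is not
auto-imported on the next boot of that host; pool name illustrative:

  zpool export tank          # on the server giving the pool up
  zpool import -R / tank     # on the server taking it over
)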


-- 
Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions

2008-01-25 Thread Victor Latushkin
Niksa Franceschi wrote:
 Hi, 
 
 pool wasn't exported.
 server1 was rebooted (with ZFS on it).
 During reboot ZFS (pool) was released, and I could import it on server2 
 (which I have done).
 However, when server1 was booting up it imported pool and mounted ZFS 
 filesystems even thou they were already imported and mounted on server2.
 
 As I said, what is interesting - if both servers are up, I cannot import pool 
 to other server, if it is on another.
 However, if server is booted up it somehow avoids check if same pool is 
 already imported on other server, which in the end leads to same pool being 
 imported on both servers and corruption.

You need the fix for
   6282725 hostname/hostid should be stored in the label

It is available in the latest Nevada bits, but not yet in a Solaris 10 
update.

For more information please see the following link:

http://blogs.sun.com/erickustarz/entry/poor_man_s_cluster_end

Hth,
Victor


Re: [zfs-discuss] missing files on copy

2008-01-25 Thread Christopher Gorski
Christopher Gorski wrote:
 unsorted/photosbackup/laptopd600/[D]/cag2b/eujpg/103-0398_IMG.JPG is a
 file that is always missing in the new tree.

Oops, I meant:

unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0398_IMG.JPG
is always missing in the new tree.


Re: [zfs-discuss] missing files on copy

2008-01-25 Thread Christopher Gorski
Robert Milkowski wrote:
 Hello Christopher,
 
 Friday, January 25, 2008, 5:37:58 AM, you wrote:
 
 CG michael schuster wrote:
 I assume you've assured that there's enough space in /pond ...

 can you try

 (cd pond/photos; tar cf - *) | (cd /pond/copytestsame; tar xf -)
 
 CG I tried it, and it worked.  The new tree is an exact copy of the old one.
 
 could you run your cp as 'truss -t open -o /tmp/cp.truss cp * '
 
 and then see if you can see all files being open for reads and check
 if they were successfully opened for writes?
 

I ran:

#truss -t open -o /tmp/cp.truss cp -pr * /pond/copytestsame/

Same result as with cp.  The same files are missing in the new tree.

unsorted/photosbackup/laptopd600/[D]/cag2b/eujpg/103-0398_IMG.JPG is a
file that is always missing in the new tree.

# ls
/pond/photos/unsorted/drive-452a/\[E\]/drive/archives/seconddisk_20nov2002/eujpg/103*
/pond/photos/unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0398_IMG.JPG
/pond/photos/unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0399_IMG.JPG
/pond/photos/unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0400_IMG.JPG
# ls
/pond/copytestsame/unsorted/drive-452a/\[E\]/drive/archives/seconddisk_20nov2002/eujpg/103*
/pond/copytestsame/unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0399_IMG.JPG
/pond/copytestsame/unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0400_IMG.JPG
# grep eujpg /tmp/cp.truss | grep 103 | grep seconddisk
open64(unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0399_IMG.JPG,
O_RDONLY) = 0
open64(unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0399_IMG.JPG,
O_RDONLY) = 0
open64(/pond/copytestsame//unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0399_IMG.JPG,
O_RDONLY) = 6
open64(unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0400_IMG.JPG,
O_RDONLY) = 0
open64(unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0400_IMG.JPG,
O_RDONLY) = 0
open64(/pond/copytestsame//unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0400_IMG.JPG,
O_RDONLY) = 6

The missing file does not seem to be in the truss output.

-Chris



Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-25 Thread Joerg Schilling
Torrey McMahon [EMAIL PROTECTED] wrote:


 http://www.philohome.com/hammerhead/broken-disk.jpg :-)

Be careful, things like this can result in device corruption!

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] missing files on copy

2008-01-25 Thread cgorski
On Fri, 25 Jan 2008 15:18:36 -0500, Tiernan, Daniel [EMAIL PROTECTED] wrote:
 
 You may have hit a cp and/or shell bug due to the directory naming
 topology. Rather than depend on cp -r I prefer the cpio method:
 
 find * -print | cpio -pdumv dest_path
 
 I'd try the find by itself to see if it yields the correct file list
 before piping into cpio...
 

I will look into this and Jörg's suggestion when I return to the
machine on Monday.

-Chris


Re: [zfs-discuss] missing files on copy

2008-01-25 Thread Tiernan, Daniel
 
You may have hit a cp and/or shell bug due to the directory naming
topology. Rather than depend on cp -r I prefer the cpio method:

find * -print | cpio -pdumv dest_path

I'd try the find by itself to see if it yields the correct file list
before piping into cpio...


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christopher
Gorski
Sent: Friday, January 25, 2008 12:52 PM
To: Robert Milkowski
Cc: zfs-discuss@opensolaris.org; michael schuster
Subject: Re: [zfs-discuss] missing files on copy

Christopher Gorski wrote:
 unsorted/photosbackup/laptopd600/[D]/cag2b/eujpg/103-0398_IMG.JPG is

 a file that is always missing in the new tree.

Oops, I meant:

unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0398_IMG.JPG
is always missing in the new tree.


Re: [zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions

2008-01-25 Thread eric kustarz

On Jan 25, 2008, at 6:06 AM, Niksa Franceschi wrote:

 Yes, the link explains quite well the issue we have.
 Only difference is that server1 can be manually rebooted, and while  
 it's still down I can mount ZFS pool on server2 even without -f  
 option, and yet server1 when booted up still mounts at same time.

 Just one question though.
 Is there any ETA when this patch may be available as official  
 Solaris 10 patch?

The current ETA is an early build of s10u6.  We hope to have patches  
available before the full update 6.

If you have a support contract, feel free to escalate.

eric



Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?

2008-01-25 Thread Albert Chin
On Fri, Jan 25, 2008 at 12:59:18AM -0500, Kyle McDonald wrote:
 ... With the 256MB doing write caching, is there any further benefit
 to moving the ZIL to a flash or other fast NV storage?

Do some tests with and without the ZIL enabled. You should see a big
difference. With the ZIL on battery-backed RAM you should see something
equivalent to the performance of the ZIL disabled. I'd put the ZIL on a
battery-backed RAM card in a heartbeat if I could find one. I think
others would as well.

-- 
albert chin ([EMAIL PROTECTED])


Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?

2008-01-25 Thread Kyle McDonald
Albert Chin wrote:
 On Fri, Jan 25, 2008 at 12:59:18AM -0500, Kyle McDonald wrote:
   
 ... With the 256MB doing write caching, is there any further benefit
  to moving the ZIL to a flash or other fast NV storage?
 

 Do some tests with/without ZIL enabled. You should see a big
 difference. You should see something equivalent to the performance of
 ZIL disabled with ZIL/RAM. I'd do ZIL with a battery-backed RAM in a
 heartbeat if I could find a card. I think others would as well.

   
I agree, when your disks are slow to place the changes in 'safe' storage.

My question is, with the ZIL on the main disks of the zpool, *and* those 
same disks write-cached by the battery-backed RAM on the RAID 
controller, aren't the ZIL writes going to be (nearly?) just as fast as 
they would be to a dedicated NVRAM or flash device?

Granted, the 256MB on the RAID controller may not be enough, and it's a 
shame to have to share it among all the writes to the disks, not just the 
ZIL writes, but it should still be a huge improvement. My question is 
just how close does it come to a dedicated ZIL device? 90%? 50%?

For that matter, considering *all* the writes that ZFS will do (in my 
case) will be to battery-backed cache devices, is there still a risk in 
disabling the ZIL altogether?
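
(For the comparison itself, on bits new enough to have separate intent
log support, a dedicated slog device can be attached to an existing pool
for an A/B test - the device name is made up:

  zpool add tank log c3t0d0    # dedicate a fast NV device to the ZIL
  zpool status tank            # it shows up under a separate "logs" section
)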

 -Kyle







Re: [zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions

2008-01-25 Thread Niksa Franceschi
Yes, the link explains quite well the issue we have.
The only difference is that server1 can be manually rebooted, and while it's 
still down I can mount the ZFS pool on server2 even without the -f option, and 
yet server1, when booted up, still mounts it at the same time.

Just one question though.
Is there any ETA for when this fix may be available as an official Solaris 10 
patch?
 
 


Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-25 Thread Torrey McMahon
Robert Milkowski wrote:
 Hello Darren,



 DJM BTW there isn't really any such thing as disk corruption, there is 
 DJM data corruption :-)

 Well, if you scratch it hard enough :)
   

http://www.philohome.com/hammerhead/broken-disk.jpg :-)


Re: [zfs-discuss] missing files on copy

2008-01-25 Thread Joerg Schilling
Christopher Gorski [EMAIL PROTECTED] wrote:

  can you try
 
  (cd pond/photos; tar cf - *) | (cd /pond/copytestsame; tar xf -)
  
  CG I tried it, and it worked.  The new tree is an exact copy of the old 
  one.
  
  could you run your cp as 'truss -t open -o /tmp/cp.truss cp * '
  
  and then see if you can see all files being open for reads and check
  if they were successfully opened for writes?
  

 I ran:

 #truss -t open -o /tmp/cp.truss cp -pr * /pond/copytestsame/

 Same result as with cp.  The same files are missing in the new tree.

 unsorted/photosbackup/laptopd600/[D]/cag2b/eujpg/103-0398_IMG.JPG is a
 file that is always missing in the new tree.
...

 The missing file does not seem to be in the truss output.

Do not expect to see anything useful when tracing open.

But check getdents(2), i.e. what gets called from readdir(3).
I recently got a star bug report from a FreeBSD guy that turned
out to be the result of a missing .. entry in a zfs snapshot
root dir.

Check the source of the failing program also...

I recently spent a lot of time fixing nasty bugs in the SCCS
source and it turned out that there were places where the author 
believed that . and .. are always returned by readdir(3) and that 
they are always returned first.
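
(A sketch of that kind of check - tracing the directory reads instead of
the opens; whether truss decodes the dirent buffers with -v on this build
is an assumption, but the call counts and return values are useful either way:

  cd /pond/photos
  truss -t getdents,getdents64 -v getdents,getdents64 \
      -o /tmp/cp.getdents cp -pr * /pond/copytestsame/
  # and compare with what the directory itself returns, unsorted
  ls -f "unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg" | grep 103-0398
)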

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily