Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-02-23 Thread Frank Batschulat (Home)
update on this one: 

a workaround, or rather the more appropriate way to do this, is apparently
to use lofiadm(1M) to create a pseudo block device backed by the file hosted on NFS,
and to use the resulting lofi device (e.g. /dev/lofi/1) as the device for zpool create
and all subsequent I/O. this did not produce the strange CKSUM errors, e.g.:

osoldev.root./export/home/batschul.= mount -F nfs opteron:/pool/zones /nfszone
osoldev.root./export/home/batschul.= mount -v| grep nfs
opteron:/pool/zones on /nfszone type nfs 
remote/read/write/setuid/devices/xattr/dev=9080001 on Tue Feb  9 10:37:00 2010
osoldev.root./export/home/batschul.= nfsstat -m
/nfszone from opteron:/pool/zones
 Flags: 
vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
 Attr cache:acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

osoldev.root./export/home/batschul.=  mkfile -n 7G /nfszone/remote.file
osoldev.root./export/home/batschul.=  ls -la /nfszone
total 28243534
drwxrwxrwx   2 nobody   nobody 6 Feb  9 09:36 .
drwxr-xr-x  30 batschul other 32 Feb  8 22:24 ..
-rw---   1 nobody   nobody   7516192768 Feb  9 09:36 remote.file

osoldev.root./export/home/batschul.= lofiadm -a /nfszone/remote.file
/dev/lofi/1

osoldev.root./export/home/batschul.= lofiadm
Block Device File   Options
/dev/lofi/1  /nfszone/remote.file   -

osoldev.root./export/home/batschul.= zpool create -m /tank/zones/nfszone nfszone /dev/lofi/1

Feb  9 10:50:35 osoldev zfs: [ID 249136 kern.info] created version 22 pool nfszone using 22

osoldev.root./export/home/batschul.= zpool status -v nfszone
  pool: nfszone
 state: ONLINE
 scrub: none requested
config:

NAME   STATE READ WRITE CKSUM
nfszoneONLINE   0 0 0
  /dev/lofi/1  ONLINE   0 0 0
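
(a side note, not from the original test: tearing such a lofi-backed pool down
again, and re-importing it later, should be plain zpool(1M)/lofiadm(1M) usage
along these lines; a hedged sketch using the pool/device names from above:)

# zpool export nfszone
# lofiadm -d /dev/lofi/1
# umount /nfszone

and to bring it back, re-mount the NFS share, re-attach the file and point
zpool import at the lofi device directory:

# mount -F nfs opteron:/pool/zones /nfszone
# lofiadm -a /nfszone/remote.file
# zpool import -d /dev/lofi nfszone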

---
frankB


Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-09 Thread Frank Batschulat (Home)
On Fri, 08 Jan 2010 18:33:06 +0100, Mike Gerdts mger...@gmail.com wrote:

 I've written a dtrace script to get the checksums on Solaris 10.
 Here's what I see with NFSv3 on Solaris 10.

jfyi, I've reproduced it as well, using a Solaris 10 Update 8 SB2000 sparc client
and NFSv4.

much like you, I also get READ errors along with the CKSUM errors, which
is different from my observation on an ONNV client.

unfortunately your dtrace script did not work for me, i.e. it
did not produce any output :(
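
(fwiw, assuming the S10 client generates the usual ZFS checksum ereports carrying
the cksum_actual payload, as my ONNV box does, the same values should also be
obtainable without dtrace, straight from the FMA error log; a hedged sketch,
same pipeline as in the fuller report below:)

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail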

cheers
frankB



Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Frank Batschulat (Home)
On Wed, 23 Dec 2009 03:02:47 +0100, Mike Gerdts mger...@gmail.com wrote:

 I've been playing around with zones on NFS a bit and have run into
 what looks to be a pretty bad snag - ZFS keeps seeing read and/or
 checksum errors.  This exists with S10u8 and OpenSolaris dev build
 snv_129.  This is likely a blocker for anything thinking of
 implementing parts of Ed's Zones on Shared Storage:

 http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss

 The OpenSolaris example appears below.  The order of events is:

 1) Create a file on NFS, turn it into a zpool
 2) Configure a zone with the pool as zonepath
 3) Install the zone, verify that the pool is healthy
 4) Boot the zone, observe that the pool is sick
[...]
 r...@soltrain19# zoneadm -z osol boot

 r...@soltrain19# zpool status osol
   pool: osol
  state: DEGRADED
 status: One or more devices has experienced an unrecoverable error.  An
 attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
 using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: none requested
 config:

 NAME  STATE READ WRITE CKSUM
 osol  DEGRADED 0 0 0
   /mnt/osolzone/root  DEGRADED 0 0   117  too many errors

 errors: No known data errors

Hey Mike, you're not the only victim of these strange CKSUM errors. I hit
the same during my slightly different testing, where I NFS-mount a pre-existing
remote file that lives in a zpool on the NFS server and use that file to create
a zpool and install zones into it.
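
for reference, the bare-bones sequence is essentially Mike's steps 1-4 quoted
above; a hedged, condensed sketch (paths, pool and zone names are illustrative,
not the exact ones from my bug report):

# mount -F nfs opteron:/pool/zones /nfszone
# zpool create -m /tank/zones/nfszone nfszone /nfszone/remote.file
# zonecfg -z nfszone 'create; set zonepath=/tank/zones/nfszone/zone1; commit'
# zoneadm -z nfszone install
# zoneadm -z nfszone boot        --- the CKSUM errors show up right after this step
# zpool status -v nfszone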

I've filed today:

6915265 zpools on files (over NFS) accumulate CKSUM errors with no apparent 
reason

here's the relevant piece worth investigating (leaving out the actual setup etc.):
as in your case, creating the zpool and installing the zone into it still gives
a healthy zpool, but immediately after booting the zone, the zpool served over NFS
accumulates CKSUM errors.

of particular interest are the 'cksum_actual' values as reported by Mike for his
test case here:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html

when compared to the 'cksum_actual' values I got in the fmdump error output on
my test case/system.

note: the NFS server's zpool that is serving and sharing the file we use is healthy.

with the zone now halted on my test system, checking fmdump:

osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
    2  cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0
    2  cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21
    6  cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40
*A  6  cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*B  7  cksum_actual = 0x0 0x0 0x0 0x0
*C 11  cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*D 14  cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E 17  cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*F 20  cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G 25  cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

osoldev.root./export/home/batschul.= zpool status -v
  pool: nfszone
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
nfszone DEGRADED 0 0 0
  /nfszone  DEGRADED 0 0   462  too many errors

errors: No known data errors
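
(not part of the original report: to double check that the errors really do come
back, the error counters can be reset and the pool re-verified with standard
zpool(1M) commands; a hedged sketch:)

# zpool clear nfszone
# zpool scrub nfszone
# zpool status -v nfszone        --- watch whether the CKSUM column climbs again once the zone does I/O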

==

now compare this with Mike's error output as posted here:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail

    2  cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
*D  3  cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E  3  cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*B  4  cksum_actual = 0x0 0x0 0x0 0x0
    4  cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
    5  cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
*C  6  cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*A  6  cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*F 16  cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G 48  cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Frank Batschulat (Home)
On Fri, 08 Jan 2010 13:55:13 +0100, Darren J Moffat darr...@opensolaris.org 
wrote:

 Frank Batschulat (Home) wrote:
 This just can't be an accident; there must be a common cause, and thus a good
 chance that these CKSUM errors have a common source, either in ZFS or in NFS?

 What are you using for on the wire protection with NFS ?  Is it shared
 using krb5i or do you have IPsec configured ?  If not I'd recommend
 trying one of those and see if your symptoms change.

Hey Darren, doing krb5i is certainly a good idea for additional protection in
general; however, I doubt that NFS on-the-wire corruption would produce exactly
the same wrong checksums in two totally different setups and networks, as the
comparison of Mike's results and mine shows [see 1].
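
(to actually test Darren's suggestion, one would re-share and re-mount with
integrity protection; a hedged sketch, assuming Kerberos is already configured
on both ends, options as per share_nfs(1M) and mount_nfs(1M):)

on the server:
# share -F nfs -o sec=krb5i,rw /pool/zones

on the client:
# mount -F nfs -o sec=krb5i opteron:/pool/zones /nfszone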

cheers
frankB

[1]

osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
    2  cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0
    2  cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21
    6  cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40
*A  6  cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*B  7  cksum_actual = 0x0 0x0 0x0 0x0
*C 11  cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*D 14  cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E 17  cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*F 20  cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G 25  cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

==

now compare this with Mike's error output as posted here:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail

    2  cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
*D  3  cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E  3  cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*B  4  cksum_actual = 0x0 0x0 0x0 0x0
    4  cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
    5  cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
*C  6  cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*A  6  cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*F 16  cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G 48  cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

and observe that the 'cksum_actual' values that eventually cause our CKSUM pool
errors (because they mismatch what was expected) are the SAME for 2 totally
different client systems and 2 different NFS servers (mine vs. Mike's);
see the entries marked with *A to *G.


Re: [ufs-discuss] Re: [zfs-discuss] Differences between ZFS and UFS.

2007-01-02 Thread Frank Batschulat (Home)

On Sat, 30 Dec 2006 18:13:04 +0100, [EMAIL PROTECTED] wrote:




I think removing the ability to use link(2) or unlink(2) on directories
would hurt no-one and would make a few things easier.


I'd be rather careful here, see the standards implications drafted in
4917742.


The standard gives permission to disallow unlink() on directories:

The path argument shall not name a directory unless the process has
appropriate privileges and the implementation supports using unlink()
on directories.

The ZFS implementation disallows it.


I'd be more than happy to see this feature disappear from UFS as well.
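
(a quick way to see the difference from the shell is link(1M)/unlink(1M), which
invoke the bare system calls; a hedged sketch, paths made up, and the UFS half
should only be done on a scratch filesystem since it leaves an orphaned
directory behind for fsck to reconnect:)

# mkdir /tank/scratch/dir
# /usr/sbin/unlink /tank/scratch/dir      --- ZFS refuses this, even for root (EPERM)

# mkdir /ufsmnt/scratch/dir
# /usr/sbin/unlink /ufsmnt/scratch/dir    --- UFS lets root do it; the directory is now orphaned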

---
frankB


Re: [ufs-discuss] Re: [zfs-discuss] Differences between ZFS and UFS.

2006-12-30 Thread Frank Batschulat (Home)

On Sat, 30 Dec 2006 15:50:53 +0100, [EMAIL PROTECTED] wrote:




Link with the target being a directory and the source any file, or
only directories?  And only as superuser?


I'm sorry, I meant unlink(2) here.


Ah, so symmetrical with link(2) to directories.

unlink(2) doesn't always work and rmdir(2) will not remove empty directories
with a link count other than 2 (for . and ..)

You can't unlink . or ..; though in early days it was what was used
to create directories.  (mknod dir , link(dir, dir/.) , link(., dir/..))

I think removing the ability to use link(2) or unlink(2) on directories
would hurt no-one and would make a few things easier.


I'd be rather careful here, see the standards implications drafted in
4917742.


---
frankB


[zfs-discuss] Re: [ufs-discuss] Differences between ZFS and UFS.

2006-12-30 Thread Frank Batschulat (Home)
On Sat, 30 Dec 2006 02:28:49 +0100, Pawel Jakub Dawidek [EMAIL PROTECTED]  
wrote:



Hi.

Here are some things my file system test suite discovered on Solaris ZFS
and UFS.

Basically ZFS passes all my tests (about 3000). I see one problem with UFS
and two differences:

1. link(2) manual page states that privileged processes can make
   multiple links to a directory. This looks like a general comment, but
   it's only true for UFS.

2. link(2) in UFS allows to remove directories, but doesn't allow this
   in ZFS.


both are actually correct; the standard's wording permits either way:

snip link(2)/unlink(2) POSIX IEEE Std 1003.1, 2004 Edition
If path1 names a directory, link() shall fail unless the process has appropriate
privileges and the implementation supports using link() on directories.

The path argument shall not name a directory unless the process has appropriate
privileges and the implementation supports using unlink() on directories.
snip end

UFS does support link(2)/unlink(2) on directories with appropriate privileges,
while ZFS does not.

However, it gets interesting when SVID3 comes into play:

snip
  The link(BA_OS) and unlink(BA_OS) descriptions in SVID3 both specify that
  a process with appropriate privileges is allowed to operate on a directory.
  We have claimed to conform to SVID3 since Solaris 2.0 and have not announced
  that we ever plan to EOL SVID3 conformance.
snip end


3. Unsuccessful link(2) can update the file's ctime:

# fstest mkdir foo 0755
# fstest create foo/bar 0644
# fstest chown foo/bar 65534 -1
# ctime1=`fstest stat foo/bar ctime`
# sleep 1
# fstest -u 65534 link foo/bar foo/baz   --- this unsuccessful operation updates ctime
EACCES
# ctime2=`fstest stat ${n0} ctime`
# echo $ctime1 $ctime2
1167440797 1167440798



I'd call this a bug; although the standard does not precisely mention this case,
at least it says:

snip link(2) POSIX IEEE Std 1003.1, 2004 Edition
Upon successful completion, link() shall mark for update the st_ctime field of
the file. Also, the st_ctime and st_mtime fields of the directory that contains
the new entry shall be marked for update.

If link() fails, no link shall be created and the link count of the file shall
remain unchanged.
snip end

Since we fail the 'successful completion' part, and we must stay away from
modifying the link count in that case, I think that makes a good argument that
we should also stay away from updating the ctime.

--
frankB

(I'd rather be a forest than a street)


Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar

2006-10-09 Thread Frank Batschulat (Home)

On Tue, 10 Oct 2006 01:25:36 +0200, Roch [EMAIL PROTECTED] wrote:


You tell me ? We have 2 issues

can we make 'tar x' over direct attach, safe (fsync)
and posix compliant while staying close to current
performance characteristics ? In other words do we
have the posix leeway to extract files in parallel ?


why fsync(3C)? it is usually more heavyweight than opening the file with
O_SYNC, and both provide POSIX synchronized file integrity completion.

---
frankB