Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Will Murnane
On Jan 30, 2008 1:34 AM, Carson Gaspar [EMAIL PROTECTED] wrote:
 If this is Sun's cp, file a bug. It's failing to notice that it didn't
 provide a large enough buffer to getdents(), so it only got partial results.

 Of course, the getdents() API is rather unfortunate. It appears the only
 safe algorithm is:

 while ((r = getdents(...)) > 0) {
 /* process results */
 }
 if (r < 0) {
 /* handle error */
 }

 You _always_ have to call it at least twice to be sure you've gotten
 everything.
In OpenSolaris, cp uses (indirectly) readdir(), not raw getdents().
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libcmd/common/cp.c#487
which uses the build-a-linked-list code here:
http://src.opensolaris.org/source/xref/sfw/usr/src/cmd/coreutils/coreutils-6.7/lib/fts.c#913
That code appears to error out and return incomplete results if a) the
filename is too long or b) an integer overflows.  Christopher's
filenames are only 96 chars; could Unicode be involved somehow?  b)
seems unlikely in the extreme.  It still seems like a bug, but I don't
see where it is.  I am only an egg ;-)

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Casper . Dik

Christopher Gorski wrote:

 I noticed that the first calls in the cp and ls to getdents() return 
 similar file lists, with the same values.
 
 However, in the ls, it makes a second call to getdents():

If this is Sun's cp, file a bug. It's failing to notice that it didn't 
provide a large enough buffer to getdents(), so it only got partial results.

cp doesn't use getdents(); it uses readdir() instead, so the whole
buffer is hidden from it.

Of course, the getdents() API is rather unfortunate. It appears the only 
safe algorithm is:

while ((r = getdents(...)) > 0) {
   /* process results */
}
if (r < 0) {
   /* handle error */
}

You _always_ have to call it at least twice to be sure you've gotten 
everything.


That's why you never use getdents but rather readdir() which hides this 
for you.

It appears that the off_t of the directory entries in the particular
second read is > 2^32; so perhaps a cp which hasn't been compiled with
large file support is being used?
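One quick way to check which binary is being used and whether it is
largefile-aware (a sketch; the paths are only illustrative) is to trace just
the directory-reading syscalls and see whether the 64-bit variant shows up:

truss -t getdents,getdents64 -o /tmp/cp.dents.truss /bin/cp -pr /var/tmp/src /var/tmp/dst
grep -c getdents64 /tmp/cp.dents.truss

A largefile-aware cp should issue getdents64(); a binary built without large
file support would issue plain getdents() and could stumble on 64-bit
directory offsets.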

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Casper . Dik


That code appears to error out and return incomplete results if a) the
filename is too long or b) an integer overflows.  Christopher's
filenames are only 96 chars; could Unicode be involved somehow?  b)
seems unlikely in the extreme.  It still seems like a bug, but I don't
see where it is.  I am only an egg ;-)


And ls would fail in the same manner.


There's one piece of code in cp (see usr/src/cmd/mv/mv.c) which 
short-circuits a readdir-loop:

while ((dp = readdir(srcdirp)) != NULL) {
        int ret;

        if ((ret = traverse_attrfile(dp, source, target, 1)) == -1)
                continue;
        else if (ret > 0) {
                ++error;
                goto out;
        }


This is strange to me because all other failures result in cp moving
on to the next file.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Roch - PAE

Jonathan Loran writes:
  
  Is it true that Solaris 10 u4 does not have any of the nice ZIL controls 
  that exist in the various recent Open Solaris flavors?  I would like to 
  move my ZIL to solid state storage, but I fear I can't do it until I 
  have another update.  Heck, I would be happy to just be able to turn the 
  ZIL off to see how my NFS on ZFS performance is affected before spending 
  the $'s.  Anyone know when we will see this in Solaris 10?
  

You can certainly turn it off with any release (Jim's link).

It's true that S10u4 does not have the Separate Intent Log 
to allow using an SSD for ZIL blocks. I believe S10U5 will
have that feature.

As noted, disabling the ZIL won't lead to ZFS pool corruption, just DB
corruption (and that includes NFS clients). To protect against that, in the
event of a server crash with zil_disable=1 you'd need to reboot all NFS
clients of the server (to clear the clients' caches), and preferably do this
before the server comes back up (kind of a raw proposition here).
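For completeness, the usual way to do that (a sketch only; zil_disable is an
unsupported, system-wide tunable that affects every dataset on the machine) is
a line in /etc/system followed by a reboot:

* /etc/system -- disables the ZFS intent log for ALL pools; test use only
set zfs:zil_disable = 1

It can also be toggled on a live system with mdb -kw, but the scope is the
same: the whole box, not just the NFS-exported filesystems.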

-r


  Thanks,
  
  Jon
  
  -- 
  
  
  - _/ _/  /   - Jonathan Loran -   -
  -/  /   /IT Manager   -
  -  _  /   _  / / Space Sciences Laboratory, UC Berkeley
  -/  / /  (510) 643-5146 [EMAIL PROTECTED]
  - __/__/__/   AST:7731^29u18e3
   
  
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
Will Murnane [EMAIL PROTECTED] wrote:

 On Jan 30, 2008 1:34 AM, Carson Gaspar [EMAIL PROTECTED] wrote:
  If this is Sun's cp, file a bug. It's failing to notice that it didn't
  provide a large enough buffer to getdents(), so it only got partial results.
 
  Of course, the getdents() API is rather unfortunate. It appears the only
  safe algorithm is:
 
  while ((r = getdents(...)) > 0) {
  /* process results */
  }
  if (r < 0) {
  /* handle error */
  }
 
  You _always_ have to call it at least twice to be sure you've gotten
  everything.
 In OpenSolaris, cp uses (indirectly) readdir(), not raw getdents().
 http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libcmd/common/cp.c#487
 which uses the build-a-linked-list code here:
 http://src.opensolaris.org/source/xref/sfw/usr/src/cmd/coreutils/coreutils-6.7/lib/fts.c#913
 That code appears to error out and return incomplete results if a) the
 filename is too long or b) an integer overflows.  Christopher's
 filenames are only 96 chars; could Unicode be involved somehow?  b)
 seems unlikely in the extreme.  It still seems like a bug, but I don't
 see where it is.  I am only an egg ;-)

An interesting thought

We of course need to know whether the user used /bin/cp or a shadow 
implementation from ksh93.

I have never seen any problems with star(1), and star(1)/libfind(3) are heavy
readdir(3) users...

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
Christopher Gorski [EMAIL PROTECTED] wrote:

  Of course, the getdents() API is rather unfortunate. It appears the only 
  safe algorithm is:
  
  while ((r = getdents(...)) > 0) {
  /* process results */
  }
  if (r < 0) {
  /* handle error */
  }
  
  You _always_ have to call it at least twice to be sure you've gotten 
  everything.
  

 Yes, it is Sun's cp.  I'm trying, with some difficulty, to figure out 
 exactly how to reproduce this error in a way not specific to my data.  I 
 copied a set of randomly generated files with a deep directory structure 
 and cp seems to correctly call getdents() multiple times.

Note that cp (mv) does not call getdents() directly but readdir().

If there is a problem, it is most likely in readdir(), and it really looks 
strange that ls(1) (although it uses the same implementation) works for you.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
[EMAIL PROTECTED] wrote:

 And ls would fail in the same manner.


 There's one piece of code in cp (see usr/src/cmd/mv/mv.c) which 
 short-circuits a readdir-loop:

 while ((dp = readdir(srcdirp)) != NULL) {
 int ret;

 if ((ret = traverse_attrfile(dp, source, target, 1)) == -1)
 continue;
 else if (ret  0) {
 ++error;
 goto out;
 }


 This is strange to me because all other failures result in cp going
 over to the next file.

traverse_attrfile() returns -1 only for:

if ((dp->d_name[0] == '.' && dp->d_name[1] == '\0') ||
    (dp->d_name[0] == '.' && dp->d_name[1] == '.' &&
    dp->d_name[2] == '\0') ||
    (sysattr_type(dp->d_name) == _RO_SATTR) ||
    (sysattr_type(dp->d_name) == _RW_SATTR))
        return (-1);

So this primarily skips '.' and '..'.

The rest seems to check for DOS extensions in extended attributes.

 but this is only done to copy attributes and not files.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Robert Milkowski
Hello Christopher,

Wednesday, January 30, 2008, 7:27:01 AM, you wrote:

CG Carson Gaspar wrote:
 Christopher Gorski wrote:
 
 I noticed that the first calls in the cp and ls to getdents() return 
 similar file lists, with the same values.

 However, in the ls, it makes a second call to getdents():
 
 If this is Sun's cp, file a bug. It's failing to notice that it didn't 
 provide a large enough buffer to getdents(), so it only got partial results.
 
 Of course, the getdents() API is rather unfortunate. It appears the only 
 safe algorithm is:
 
 while ((r = getdents(...)) > 0) {
   /* process results */
 }
 if (r < 0) {
   /* handle error */
 }
 
 You _always_ have to call it at least twice to be sure you've gotten 
 everything.
 

CG Yes, it is Sun's cp.  I'm trying, with some difficulty, to figure out 
CG exactly how to reproduce this error in a way not specific to my data.  I
CG copied a set of randomly generated files with a deep directory structure
CG and cp seems to correctly call getdents() multiple times.

If you could re-create empty files - exactly the same directory
structure and file names - check whether you still get the problem.
If you do, please send a script here (mkdir -p's and touches)
so we can investigate.

Assuming your file names and directory structure can be made public.
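Something like this (a rough sketch; it assumes the names contain no newlines
or double quotes, and uses Christopher's /pond/photos path as an example)
would generate such a script straight from the original tree:

cd /pond/photos
find . -type d   | sed -e 's|^|mkdir -p "|' -e 's|$|"|'  > /tmp/replicate.sh
find . ! -type d | sed -e 's|^|touch "|'    -e 's|$|"|' >> /tmp/replicate.sh

Running /tmp/replicate.sh from an empty directory should reproduce the layout
with zero-length files.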

-- 
Best regards,
 Robert Milkowskimailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Christopher Gorski
Joerg Schilling wrote:
 Will Murnane [EMAIL PROTECTED] wrote:
 
 On Jan 30, 2008 1:34 AM, Carson Gaspar [EMAIL PROTECTED] wrote:
 If this is Sun's cp, file a bug. It's failing to notice that it didn't
 provide a large enough buffer to getdents(), so it only got partial results.

 Of course, the getdents() API is rather unfortunate. It appears the only
 safe algorithm is:

  while ((r = getdents(...)) > 0) {
  /* process results */
  }
  if (r < 0) {
  /* handle error */
  }

 You _always_ have to call it at least twice to be sure you've gotten
 everything.
 In OpenSolaris, cp uses (indirectly) readdir(), not raw getdents().
 http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libcmd/common/cp.c#487
 which uses the build-a-linked-list code here:
 http://src.opensolaris.org/source/xref/sfw/usr/src/cmd/coreutils/coreutils-6.7/lib/fts.c#913
 That code appears to error out and return incomplete results if a) the
 filename is too long or b) an integer overflows.  Christopher's
 filenames are only 96 chars; could Unicode be involved somehow?  b)
 seems unlikely in the extreme.  It still seems like a bug, but I don't
 see where it is.  I am only an egg ;-)
 
 An interesting thought
 
 We of course need to know whether the user used /bin/cp or a shadow 
 implementation from ksh93.
 
 I have never seen any problems with star(1), and star(1)/libfind(3) are heavy
 readdir(3) users...
 
 Jörg
 

I am able to replicate the problem in bash using:
#truss -tall -vall -o /tmp/getdents.bin.cp.truss /bin/cp -pr
/pond/photos/* /pond/copytestsame/

So I'm assuming that's using /bin/cp

Also, from my _very limited_ investigation this morning, it seems that
#grep Err /tmp/getdents.bin.cp.truss | grep -v ENOENT | grep getdents

returns entries such as:
getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
getdents64(0, 0xFEE34000, 8192) Err#9 EBADF
...(truncated)

whereas it seems like with a copy where everything is transferred
correctly, the same command returns no getdents64() calls with an EBADF
error, leading me to believe that getdents64() is being called on a
descriptor that has somehow been invalidated along the way.
Again... I am only gleaning that from a very limited test.
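To see what invalidates the descriptor, it might help to pull each EBADF hit
out of the existing trace together with the calls just before it, so the
open()/close()/fcntl() history of that fd is visible (a sketch against the
truss file named above; the sed range is a placeholder for whatever line
numbers the grep reports):

grep -n 'Err#9 EBADF' /tmp/getdents.bin.cp.truss
sed -n '1180,1220p' /tmp/getdents.bin.cp.truss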

-Chris

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
Christopher Gorski [EMAIL PROTECTED] wrote:

 I am able to replicate the problem in bash using:
 #truss -tall -vall -o /tmp/getdents.bin.cp.truss /bin/cp -pr
 /pond/photos/* /pond/copytestsame/

 So I'm assuming that's using /bin/cp

 Also, from my _very limited_ investigation this morning, it seems that
 #grep Err /tmp/getdents.bin.cp.truss | grep -v ENOENT | grep getdents

 returns entries such as:
 getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
 getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
 getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
 getdents64(0, 0xFEE34000, 8192) Err#9 EBADF
 ...(truncated)

If you get this, you may need to provide the full truss output
to allow us to understand what's happening.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
Robert Milkowski [EMAIL PROTECTED] wrote:

 If you could re-create empty files - exactly the same directory
 structure and file names, check if you still got a problem.
 If you do, then if you could send a script here (mkdir's -p and touch)
 so we can investigate.

If you'd like to replicate a long directory structure with empty files,
you can use star:

star -c -meta f=/tmp/x.tar -C dir .

and later:

star -xp -xmeta f=/tmp/x.tar 

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS configuration for a thumper

2008-01-30 Thread Albert Shih
Hi all

I have a Sun X4500 with 48 disks of 750 GB each.

The server comes with Solaris installed on two disks, which means I have 46
disks for ZFS.

When I look at the default configuration of the zpool:

zpool create -f zpool1 raidz c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0
zpool add -f zpool1 raidz c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0
zpool add -f zpool1 raidz c0t2d0 c1t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0
zpool add -f zpool1 raidz c0t3d0 c1t3d0 c4t3d0 c5t3d0 c6t3d0 c7t3d0
zpool add -f zpool1 raidz c0t4d0 c1t4d0 c4t4d0 c6t4d0 c7t4d0
zpool add -f zpool1 raidz c0t5d0 c1t5d0 c4t5d0 c5t5d0 c6t5d0 c7t5d0
zpool add -f zpool1 raidz c0t6d0 c1t6d0 c4t6d0 c5t6d0 c6t6d0 c7t6d0
zpool add -f zpool1 raidz c0t7d0 c1t7d0 c4t7d0 c5t7d0 c6t7d0 c7t7d0

That means there are vdevs with 5 disks and others with 6 disks.

When I want to do the same I get this message:

mismatched replication level: pool uses 5-way raidz and new vdev uses 6-way 
raidz

I can force this with the -f option.

But what does that mean (sorry if the question is stupid)?

What kind of pool do you use with 46 disks? (46 = 2*23 and 23 is a prime number,
which means I could make raidz with 6 or 7 or any number of disks.)

Regards.

--
Albert SHIH
Observatoire de Paris Meudon
SIO batiment 15
Heure local/Local time:
Mer 30 jan 2008 16:36:49 CET
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Casper . Dik


Also, from my _very limited_ investigation this morning, it seems that
#grep Err /tmp/getdents.bin.cp.truss | grep -v ENOENT | grep getdents

returns entries such as:
getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
getdents64(0, 0xFEE34000, 8192) Err#9 EBADF
...(truncated)

Ah, this looks like someone closed stdin and then something weird
happened.  Hm.



We need the full truss output, specifically of all calls which return or
release file descriptors.
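If the complete trace is too large to post, a run restricted to the
descriptor-related calls would already show the culprit (a sketch; source and
destination paths as in the earlier command):

truss -f -t open,close,dup,fcntl,fchdir,getdents64 -o /tmp/cp.fd.truss \
    /bin/cp -pr /pond/photos/* /pond/copytestsame/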

The plot thickens.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration for a thumper

2008-01-30 Thread Albert Shih
 On 30/01/2008 at 11:01:35 -0500, Kyle McDonald wrote:
 Albert Shih wrote:
 What kind of pool do you use with 46 disks? (46 = 2*23 and 23 is a prime number,
 which means I could make raidz with 6 or 7 or any number of disks.)
 
   
 Depending on needs for space vs. performance, I'd probably pick either 5*9 
 or 9*5, with 1 hot spare.

Thanks for the tips...

How can I check the speed? (I'm a total newbie on Solaris.)

I've used

mkfile 10g

for writes, and I get the same performance with 5*9 or 9*5.

Do you have any advice about tools like iozone?

Regards.

--
Albert SHIH
Observatoire de Paris Meudon
SIO batiment 15
Heure local/Local time:
Mer 30 jan 2008 17:10:55 CET
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration for a thumper

2008-01-30 Thread Tim
On 1/30/08, Albert Shih [EMAIL PROTECTED] wrote:


Thanks for the tips...

 How you can check the speed (I'm totally newbie on Solaris)

 I've use

 mkfile 10g

 for write and I've got same perf with 5*9 or 9*5.

 Have you some advice about tool like iozone ?

 Regards.

 --
 Albert SHIH
 Observatoire de Paris Meudon
 SIO batiment 15
 Heure local/Local time:
 Mer 30 jan 2008 17:10:55 CET
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss





I'd take a look at bonnie++

http://www.sunfreeware.com/programlistintel10.html#bonnie++
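Typical invocations look something like this (a sketch; point them at a
directory on the pool and size the test files to at least twice RAM so the
ARC doesn't hide the disks):

bonnie++ -d /zpool1/bench -s 32768 -u nobody
iozone -a -g 32g -f /zpool1/bench/iozone.tmp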


--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration for a thumper

2008-01-30 Thread Kyle McDonald
Albert Shih wrote:
 What's kind of pool you use with 46 disk ? (46=2*23 and 23 is prime number
 that's mean I can make raidz with 6 or 7 or any number of disk).

   
Depending on needs for space vs. performance, I'd probably pick either 
5*9 or 9*5,  with 1 hot spare.
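Spelled out, the 9*5 variant would look roughly like this (a sketch only; the
device names are illustrative rather than the real Thumper layout, and each
raidz vdev should be spread across controllers):

zpool create zpool1 raidz c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0
zpool add zpool1 raidz c7t1d0 c0t2d0 c1t2d0 c4t2d0 c5t2d0
(...seven more 5-disk raidz vdevs...)
zpool add zpool1 spare c7t7d0

That uses 45 of the 46 free disks in nine 5-wide raidz vdevs and keeps one as
a hot spare.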

   -Kyle

 Regards.

 --
 Albert SHIH
 Observatoire de Paris Meudon
 SIO batiment 15
 Heure local/Local time:
 Mer 30 jan 2008 16:36:49 CET
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Robert Milkowski
Hello Joerg,

Wednesday, January 30, 2008, 2:56:27 PM, you wrote:

JS Robert Milkowski [EMAIL PROTECTED] wrote:

 If you could re-create empty files - exactly the same directory
 structure and file names, check if you still got a problem.
 If you do, then if you could send a script here (mkdir's -p and touch)
 so we can investigate.

JS If you like to replicate a long directory structure with empty files,
JS you can use star:

JS star -c -meta f=/tmp/x.tar -C dir .

JS and later:

JS star -xp -xmeta f=/tmp/x.tar 


It really is a Swiss Army knife :)
That's a handy one (although it's the first time I've actually seen a
need for such functionality).


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing files on copy

2008-01-30 Thread Joerg Schilling
[EMAIL PROTECTED] wrote:



 Also, from my _very limited_ investigation this morning, it seems that
 #grep Err /tmp/getdents.bin.cp.truss | grep -v ENOENT | grep getdents
 
 returns entries such as:
 getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
 getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
 getdents64(0, 0xFEC92000, 8192) Err#9 EBADF
 getdents64(0, 0xFEE34000, 8192) Err#9 EBADF
 ...(truncated)

 Ah, this looks like someone closed stdin and then something weird
 happened.  Hm.

stdin is usually not a directory ;-)

This looks much weirder...

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Neil Perrin


Roch - PAE wrote:
 Jonathan Loran writes:
   
   Is it true that Solaris 10 u4 does not have any of the nice ZIL controls 
   that exist in the various recent Open Solaris flavors?  I would like to 
   move my ZIL to solid state storage, but I fear I can't do it until I 
   have another update.  Heck, I would be happy to just be able to turn the 
   ZIL off to see how my NFS on ZFS performance is effected before spending 
   the $'s.  Anyone know when will we see this in Solaris 10?
   
 
 You can certainly turn it off with any release (Jim's link).
 
 It's true that S10u4 does not have the Separate Intent Log 
 to allow using an SSD for ZIL blocks. I believe S10U5 will
 have that feature.

Unfortunately it will not. A lot of ZFS fixes and features
that had existed for a while will not be in U5 (for reasons I
can't go into here). They should be in S10U6...

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Jonathan Loran


Neil Perrin wrote:


 Roch - PAE wrote:
 Jonathan Loran writes:
 Is it true that Solaris 10 u4 does not have any of the nice ZIL 
 controls   that exist in the various recent Open Solaris flavors?  I 
 would like to   move my ZIL to solid state storage, but I fear I 
 can't do it until I   have another update.  Heck, I would be happy 
 to just be able to turn the   ZIL off to see how my NFS on ZFS 
 performance is effected before spending   the $'s.  Anyone know when 
 will we see this in Solaris 10?
  
 You can certainly turn it off with any release (Jim's link).

 It's true that S10u4 does not have the Separate Intent Log to allow 
 using an SSD for ZIL blocks. I believe S10U5 will
 have that feature.

Don't think we can live with this.  Thanks
 Unfortunately it will not. A lot of ZFS fixes and features
 that had existed for a while will not be in U5 (for reasons I
 can't go into here). They should be in S10U6...

 Neil.
I feel like we're being hung out to dry here.  I've got 70TB on 9 
various Solaris 10 u4 servers, with different data sets.  All of these 
are NFS servers.  Two servers have a ton of small files, with a lot of 
read and write updating, and NFS performance on these is abysmal.  ZFS 
is installed on SAN arrays (my first mistake).  I will test by 
disabling the ZIL, but if it turns out the ZIL needs to be on a separate 
device, we're hosed.

Before ranting any more, I'll do the test of disabling the ZIL.  We may 
have to build out these systems with OpenSolaris, but that will be hard 
as they are in production.  I would have to install the new OS on test 
systems and swap out the drives during scheduled downtime.  Ouch.

Jon

-- 


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3
 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Vincent Fox
Are you already running with zfs_nocacheflush=1?   We have SAN arrays with dual 
battery-backed controllers for the cache, so we definitely have this set on all 
our production systems.  It makes a big difference for us.

As I said before I don't see the catastrophe in disabling ZIL though.

We actually run our production Cyrus mail servers using failover servers, so our 
downtime is typically just the small interval to switch the active and idle nodes 
anyhow.  We did this mainly for patching purposes.

But we toyed with the idea of running OpenSolaris on them, then just upgrading 
the idle node to new OpenSolaris image every month using Jumpstart and 
switching to it.  Anything goes wrong switch back to the other node.

What we ended up doing, for political reasons, was putting the squeeze on our 
Sun reps and getting a 10u4 kernel spin patch with... what did they call it?  
Oh yeah, a big wad of ZFS fixes.  So this ends up being a huge PITA, because for 
the next 6 months to a year we are tied to getting any kernel patches through 
this other channel rather than the usual way.   But it does work for us, so 
there you are.

Given my choice I'd go with OpenSolaris, but that's a hard sell for datacenter 
management types.  I think it's no big deal in a production shop with good 
JumpStart and CFengine setups, where any host should be rebuildable from 
scratch in a matter of hours.  Good luck.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration for a thumper

2008-01-30 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 I'd take a look at bonnie++
 http://www.sunfreeware.com/programlistintel10.html#bonnie++ 

Also filebench:
  http://www.solarisinternals.com/wiki/index.php/FileBench

You'll see the most difference between 5x9 and 9x5 in small random reads:

http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance
http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl
http://lindsay.at/blog/archive/2007/04/15/zfs-performance-models-for-a-streaming-server.html

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Dale Ghent
On Jan 30, 2008, at 3:44 PM, Vincent Fox wrote:

 What we ended up doing, for political reasons, was putting the  
 squeeze on our Sun reps and getting a 10u4 kernel spin patch with...  
 what did they call it?  Oh yeah a big wad of ZFS fixes.  So this  
 ends up being a hug PITA because for the next 6 months to a year we  
 are tied to getting any kernel patches through this other channel  
 rather than the usual way.   But it does work for us, so there you  
 are.

Speaking of "a big wad of ZFS fixes", is it me or is anyone else here  
getting kind of displeased over the glacial speed of the backporting  
of ZFS stability fixes to s10? It seems that we have to wait around  
4-5 months for an oft-delayed s10 update for any fixes of substance to  
come out.

Not only that, but one day ZFS is its own patch, then it  
is part of the current KU, and now it's part of the NFS patch where  
ZFS isn't mentioned anywhere in the patch's synopsis.

/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 I feel like we're being hung out to dry here.  I've got 70TB on 9  various
 Solaris 10 u4 servers, with different data sets.  All of these  are NFS
 servers.  Two servers have a ton of small files, with a lot of  read and
 write updating, and NFS performance on these are abysmal.  ZFS  is installed
 on SAN array's (my first mistake).  I will test by  disabling the ZIL, but if
 it turns out the ZIL needs to be on a separate  device, we're hosed.  

If you're using SAN arrays, you should be in good shape.  I'll echo what
Vincent Fox said about using either zfs_nocacheflush=1 (which is in S10U4),
or setting the arrays to ignore the cache flush (SYNC_CACHE) requests.
We do the latter here, and it makes a huge difference for NFS clients,
basically putting the ZIL in NVRAM.

However, I'm also unhappy about having to wait for S10U6 for the separate
ZIL and/or cache features of ZFS.  The lack of NV ZIL on our new Thumper
makes it painfully slow over NFS for the large number of file create/delete
type of workload.

Here's a question:  Would having the client mount with -o nocto have
the same effect (for that particular client) as disabling the ZIL on the
server?  If so, it might be less drastic than losing the ZIL for everyone.
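For anyone who wants to experiment, the client-side mount would look something
like this (a sketch; the server name and paths are placeholders):

mount -F nfs -o vers=3,nocto nfsserver:/export/data /mnt/data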

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS under VMware

2008-01-30 Thread Lewis Thompson
Hello,

I'm planning to use VMware Server on Ubuntu to host multiple VMs, one
of which will be a Solaris instance for the purposes of ZFS.
I would give the ZFS VM two physical disks for my zpool, e.g. /dev/sda
and /dev/sdb, in addition to the VMware virtual disk for the Solaris
OS.

Now I know that Solaris/ZFS likes to have total control over the disks
to ensure writes are flushed as and when it is ready for them to
happen, so I wonder if anybody can comment on what implications using the
disks in this way (i.e. through Linux and then VMware) has on the
control Solaris has over these disks?  By using a VM will I be missing
out in terms of reliability?  If so, can anybody suggest any
improvements I could make while still allowing Solaris/ZFS to run in a
VM?

Many thanks, Lewis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS under VMware

2008-01-30 Thread Torrey McMahon
Lewis Thompson wrote:
 Hello,

 I'm planning to use VMware Server on Ubuntu to host multiple VMs, one
 of which will be a Solaris instance for the purposes of ZFS
 I would give the ZFS VM two physical disks for my zpool, e.g. /dev/sda
 and /dev/sdb, in addition to the VMware virtual disk for the Solaris
 OS

 Now I know that Solaris/ZFS likes to have total control over the disks
 to ensure writes are flushed as and when it is ready for them to
 happen, so I wonder if anybody comment on what implications using the
 disks in this way (i.e. through Linux and then VMware) has on the
 control Solaris has over these disks?  By using a VM will I be missing
 out in terms of reliability?  If so, can anybody suggest any
 improvements I could make while still allowing Solaris/ZFS to run in a
 VM?

I'm not sure what the perf aspects would be but it depends on what the 
VMware software passes through. Does it ignore cache sync commands in 
its i/o stack? Got me.

You won't be missing out on reliability but you will be introducing more 
layers in the stack where something could go wrong.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Jonathan Loran

Vincent Fox wrote:
 Are you already running with zfs_nocacheflush=1?   We have SAN arrays with 
 dual battery-backed controllers for the cache, so we definitely have this set 
 on all our production systems.  It makes a big difference for us.

   
No, we're not using zfs_nocacheflush=1, but our SAN arrays are set 
to cache all writebacks, so it shouldn't be needed.  I may test this, if 
I get the chance to reboot one of the servers, but I'll bet the storage 
arrays are working correctly.

 As I said before I don't see the catastrophe in disabling ZIL though.

   
No catastrophe, just a potential mess.

 We actually run our production Cyrus mail servers using failover servers so 
 our downtime is typically just the small interval to switch active  idle 
 nodes anyhow.  We did this mainly for patching purposes.
   
Wish we could afford such replication.  Poor EDU environment here, I'm 
afraid.
 But we toyed with the idea of running OpenSolaris on them, then just 
 upgrading the idle node to new OpenSolaris image every month using Jumpstart 
 and switching to it.  Anything goes wrong switch back to the other node.

 What we ended up doing, for political reasons, was putting the squeeze on our 
 Sun reps and getting a 10u4 kernel spin patch with... what did they call it?  
 Oh yeah a big wad of ZFS fixes.  So this ends up being a hug PITA because 
 for the next 6 months to a year we are tied to getting any kernel patches 
 through this other channel rather than the usual way.   But it does work for 
 us, so there you are.
   
Mmmm, for us, OpenSolaris may be easier.  I mainly was after stability, 
to be honest.  Our ongoing experience with bleeding-edge Linux is 
painful at times, and on our big iron I want them to just work.  But if 
they're so slow, they're not really working right, are they?  Sigh...
 Give my choice I'd go with OpenSolaris but that's a hard sell for datacenter 
 management types.  I think it's no big deal in a production shop with good 
 JumpStart and CFengine setups, where any host should be rebuildable from 
 scratch in a matter of hours.  Good luck.
  
   
True, I'll think about that going forward.  Thanks,

Jon
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

-- 


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3
 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Neil Perrin


Jonathan Loran wrote:
 Vincent Fox wrote:
 Are you already running with zfs_nocacheflush=1?   We have SAN arrays with 
 dual battery-backed controllers for the cache, so we definitely have this 
 set on all our production systems.  It makes a big difference for us.

   
 No, we're not using the zfs_nocacheflush=1, but our SAN array's are set 
 to cache all writebacks, so it shouldn't be needed.  I may test this, if 
 I get the chance to reboot one of the servers, but I'll bet the storage 
 arrays' are working correctly.

I think there's some confusion. ZFS and the ZIL issue controller commands
to force the disk cache to be flushed, to ensure data is on stable
storage. If the disk cache is battery-backed then the costly flush
is unnecessary. As Vincent said, setting zfs_nocacheflush=1 can make a
huge difference.

Note that this is a system-wide variable, so all controllers serving ZFS
devices should be non-volatile to enable it.

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Vincent Fox
 No, we're not using zfs_nocacheflush=1, but our SAN arrays are set
 to cache all writebacks, so it shouldn't be needed. I may test this, if
 I get the chance to reboot one of the servers, but I'll bet the storage
 arrays are working correctly.

Bzzzt, wrong.

Read up on a few threads about this variable.  The ZFS flush command used 
equates to "flush to rust" for most any array.  What this works out to is that your 
array is not using its NV cache for what it's supposed to.  You get a little data in 
the NV cache, but it's tagged with this command that requires the cache to finish its 
job and report back that the data is on disk before proceeding.   Hopefully at some 
point the array people and the ZFS people will have a meeting of the minds on 
this issue of having the array report to the OS "yes, I have battery-backed SAFE 
NV" and it will all just automagically work.  Until then, we set the variable 
in /etc/system.
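For reference, the line looks like this (a sketch; it is system-wide, so it is
only appropriate when every device backing ZFS sits behind battery-backed
cache), followed by a reboot:

* /etc/system
set zfs:zfs_nocacheflush = 1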
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] 30 second hang, ls command....

2008-01-30 Thread Neal Pollack
I'm running Nevada build 81 on x86 on an Ultra 40.
# uname -a
SunOS zbit 5.11 snv_81 i86pc i386 i86pc
Memory size: 8191 Megabytes

I started with this zfs pool many dozens of builds ago, approx a year ago.
I do live upgrade and zfs upgrade every few builds.

When I have not accessed the zfs file systems for a long time,
if I cd there and do an ls command, nothing happens for approx 30 seconds.

Any clues how I would find out what is wrong?

--

# zpool status -v
  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz2ONLINE   0 0 0
c2d0ONLINE   0 0 0
c3d0ONLINE   0 0 0
c4d0ONLINE   0 0 0
c5d0ONLINE   0 0 0
c6d0ONLINE   0 0 0
c7d0ONLINE   0 0 0
c8d0ONLINE   0 0 0

errors: No known data errors


# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank   172G  2.04T  52.3K  /tank
tank/arc   172G  2.04T   172G  /zfs/arc

# zpool list
NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
tank  3.16T   242G  2.92T 7%  ONLINE  -



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 30 second hang, ls command....

2008-01-30 Thread Nathan Kroenert
Any chance the disks are being powered down, and you are waiting for 
them to power back up?
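An easy way to check (a sketch) is to watch per-disk latency while triggering
the hang: run

iostat -xnz 1

in one window and the slow ls in another. If the first few I/Os to the raidz
disks show multi-second service times (or nothing shows up at all until ~30
seconds in), a spin-up or power-management delay is the likely culprit.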

Nathan. :)

Neal Pollack wrote:
 I'm running Nevada build 81 on x86 on an Ultra 40.
 # uname -a
 SunOS zbit 5.11 snv_81 i86pc i386 i86pc
 Memory size: 8191 Megabytes
 
 I started with this zfs pool many dozens of builds ago, approx a year ago.
 I do live upgrade and zfs upgrade every few builds.
 
 When I have not accessed the zfs file systems for a long time,
 if I cd there and do an ls command, nothing happens for approx 30 seconds.
 
 Any clues how I would find out what is wrong?
 
 --
 
 # zpool status -v
   pool: tank
  state: ONLINE
  scrub: none requested
 config:
 
 NAMESTATE READ WRITE CKSUM
 tankONLINE   0 0 0
   raidz2ONLINE   0 0 0
 c2d0ONLINE   0 0 0
 c3d0ONLINE   0 0 0
 c4d0ONLINE   0 0 0
 c5d0ONLINE   0 0 0
 c6d0ONLINE   0 0 0
 c7d0ONLINE   0 0 0
 c8d0ONLINE   0 0 0
 
 errors: No known data errors
 
 
 # zfs list
 NAME   USED  AVAIL  REFER  MOUNTPOINT
 tank   172G  2.04T  52.3K  /tank
 tank/arc   172G  2.04T   172G  /zfs/arc
 
 # zpool list
 NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
 tank  3.16T   242G  2.92T 7%  ONLINE  -
 
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] I.O error: zpool metadata corrupted after powercut

2008-01-30 Thread kristof
In the last 2 weeks we have had 2 zpools corrupted.

The pool was visible via zpool import, but could not be imported anymore. During 
the import attempt we got an I/O error.

After a first power cut we lost our jumpstart/nfsroot zpool (another pool was 
still OK). Luckily the jumpstart data was backed up and easily restored; the nfsroot 
filesystems were not, but those were just test machines.  We thought the 
metadata corruption was caused by the ZFS no-cache-flush setting we had 
configured in /etc/system (for performance reasons) in combination with a 
non-battery-backed NVRAM cache (Areca RAID controller).

The zpool was a raidz of 10 local SATA disks (JBOD mode).


2 days ago we had another power cut in our test lab :-(

And again one pool was lost. This system was not configured with the no-cache-flush 
setting. On the pool we had +/- 40 zvols used by running VMs (iSCSI 
boot/swap/data disks for Xen and VirtualBox guests).

The first failure was on a b68 system, the second on a b77 system.

The last zpool was using iSCSI disks:

setup:

pool
 mirror:
   iscsidisk1 san1
   iscsidisk1 san2
 mirror:
   iscsidisk2 san1
   iscsidisk2 san2

I thought ZFS was always consistent on disk, but apparently a power cut can 
cause unrecoverable damage.

I can accept the first failure (because of the dangerous setting), but losing 
that second pool was unacceptable for me.

Since no fsck-like utility is available for ZFS, I was wondering if there are 
any plans to create something like metadata repair tools?

Having used ZFS for almost 1 year now, I was a big fan; in that year I did not lose 
a single zpool until last week.

At this time I'm considering saying that ZFS is not yet production-ready.

Any comments welcome...

krdoor
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Can't offline second disk in a mirror

2008-01-30 Thread Boyd Adamson
Since I spend a lot of time going from machine to machine, I thought  
I'd carry a pool with me on a couple of USB keys. It all works fine  
but it's slow, so I thought I'd attach a file vdev to the pool and  
then offline the USB devices for speed, then undo that when I want to take  
the keys with me. Unfortunately, it seems that once I've offlined one  
device, the mirror is marked as degraded and then I'm not allowed to  
take the other USB key offline:

# zpool create usb mirror /dev/dsk/c5t0d0p0 /dev/dsk/c6t0d0p0
# mkfile 2g /file
# zpool attach usb c6t0d0p0 /file
# zpool status
pool: usb
  state: ONLINE
  scrub: resilver completed with 0 errors on Thu Jan 31 13:24:22 2008
config:

 NAME  STATE READ WRITE CKSUM
 usb   ONLINE   0 0 0
   mirror  ONLINE   0 0 0
 c5t0d0p0  ONLINE   0 0 0
 c6t0d0p0  ONLINE   0 0 0
 /file ONLINE   0 0 0

errors: No known data errors
# zpool offline usb c5t0d0p0
Bringing device c5t0d0p0 offline
# zpool status
   pool: usb
  state: DEGRADED
status: One or more devices has been taken offline by the administrator.
 Sufficient replicas exist for the pool to continue  
functioning in a
 degraded state.
action: Online the device using 'zpool online' or replace the device  
with
 'zpool replace'.
  scrub: resilver completed with 0 errors on Thu Jan 31 13:24:22 2008
config:

 NAME  STATE READ WRITE CKSUM
 usb   DEGRADED 0 0 0
   mirror  DEGRADED 0 0 0
 c5t0d0p0  OFFLINE  0 0 0
 c6t0d0p0  ONLINE   0 0 0
 /file ONLINE   0 0 0

errors: No known data errors
# zpool offline usb c6t0d0p0
cannot offline c6t0d0p0: no valid replicas
# cat /etc/release
 Solaris 10 8/07 s10x_u4wos_12b X86
Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
 Use is subject to license terms.
 Assembled 16 August 2007


I've experimented with other configurations (not just keys and files,  
but slices as well) and found the same thing - once one device in a  
mirror is offline I can't offline any others, even though there are  
other (sometimes multiple) copies left.

Of course, I can detach the device, but I was hoping to avoid a full  
resilver when I reattach.

Is this the expected behaviour? Am I missing something that would mean  
that what I'm trying to do is a bad idea?

Boyd
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I.O error: zpool metadata corrupted after powercut

2008-01-30 Thread Richard Elling
kristof wrote:
 Last 2 weeks we had 2 zpools corrupted.

 Pool was visible via zpool import, but could not be imported anymore. During 
 import attempt we got I/O error,
   

What exactly was the error message?
Also look at the fma messages, as they are often more precise.
 -- richard

 After a first powercut we lost our jumpstart/nfsroot zpool (another pool was 
 still OK). Luckaly jumpstart data was backed up and easely restored, nfsroot 
 Filesystems where not but those where just test machines.  We thought the 
 metadata corruption was caused because of the zfs no cache flush setting we 
 had configured in /etc/system (for perfomance reason) in combination with a 
 non battery backuppped NVRAM cache (areca raid controller).

 zpool was raidz with 10 local sata disks (JBOD mode)


 2 days ago we had another powercut in our test labo :-(

 And again one pool was lost. This system was not configured with zfs no cache 
 flush. On the pool we had +/- 40 zvols used by running vm's (iscsi 
 boot/swap/data disks for xen  virtual box guests)

 The first failure was on a b68 system, the second on a b77 system.

 Last zpool was using iscsi disks: 

 setup:

 pool
  mirror:
iscsidisk1 san1
iscsidisk1 san2
  mirror:
iscsidisk2 san1
iscsidisk2 san2

 I thought zfs was always persistent on disk, but apparently a power cut has 
 can cause unrecoverable damage.

 I can accept the first failure (because of the dangerous setting), but 
 loosing that second pool was unacceptable for me.

 Since no fsck alike utility is available for zfs I was wondering if there are 
 any plans to create something like meta data repair tools?

 Using ZFS now for almost 1 year I was a big Fan, In one year I lost not 1 
 zpool till last week.

 At this time I'm concidering to say ZFS is not yet production ready

 any comment welcome...

 krdoor
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Hardware RAID vs. ZFS RAID

2008-01-30 Thread Gregory Perry
Hello,

I have a Dell 2950 with a Perc 5/i, two 300GB 15K SAS drives in a RAID0 array.  
I am considering going to ZFS and I would like to get some feedback about which 
situation would yield the highest performance:  using the Perc 5/i to provide a 
hardware RAID0 that is presented as a single volume to OpenSolaris, or using 
the drives separately and creating the RAID0 with OpenSolaris and ZFS?  Or 
maybe just adding the hardware RAID0 to a ZFS pool?  Can anyone suggest some 
articles or FAQs on implementing ZFS RAID?

Which situation would provide the highest read and write throughput?
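For concreteness, the all-ZFS variant of the second option would be a plain
dynamic stripe across the two disks (a sketch; the device names are
placeholders, and it assumes the controller presents the drives as two
separate LUNs):

zpool create fastpool c1t0d0 c1t1d0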

Thanks in advance
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] seen on freebsd-stable: reproducible zfs panic

2008-01-30 Thread James C. McPherson
Hi everybody,
Greg pointed me to
http://lists.freebsd.org/pipermail/freebsd-stable/2008-January/040136.html
from a Daniel Eriksson:



If you import and export more than one zpool FreeBSD will panic during
shutdown. This bug is present in both RELENG_7 and RELENG_7_0 (I have
not tested CURRENT).

kgdb output:

Syncing disks, vnodes remaining...2 1 0 0 done
All buffers synced.
vput: negative ref count
0xc2ad1aa0: tag ufs, type VDIR
 usecount 0, writecount 0, refcount 2 mountedhere 0
 flags (VV_ROOT)
  VI_LOCKedv_object 0xc1030174 ref 0 pages 1
  lock type ufs: EXCL (count 1) by thread 0xc296 (pid 1)
 ino 2, on dev ad0s1a
panic: vput: negative ref cnt
KDB: stack backtrace:
db_trace_self_wrapper(c086ad8a,d3b19b68,c06265ba,c0868fd5,c08e9ca0,...)
at db_trace_self_wrapper+0x26
kdb_backtrace(c0868fd5,c08e9ca0,c086f57a,d3b19b74,d3b19b74,...) at
kdb_backtrace+0x29
panic(c086f57a,c085,c086f561,c2ad1aa0,d3b19b90,...) at panic+0xaa
vput(c2ad1aa0,2,d3b19bf0,c296,c086eedd,...) at vput+0xdb
dounmount(c2ba6d0c,8,c296,0,0,...) at dounmount+0x49f
vfs_unmountall(c0868ebb,0,c2967000,8,d3b19c50,...) at
vfs_unmountall+0x33
boot(c296,8,1,c295e000,c296,...) at boot+0x3e3
reboot(c296,d3b19cfc,4,c086b882,56,...) at reboot+0x66
syscall(d3b19d38) at syscall+0x33a
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (55, FreeBSD ELF32, reboot), eip = 0x8050903, esp =
0xbfbfe90c, ebp = 0xbfbfe9d8 ---
Uptime: 2m42s
Physical memory: 503 MB
Dumping 39 MB: 24 8


Run this script and then reboot the computer to trigger the panic:

dd if=/dev/zero of=/usr/_disk1 bs=1m count=80
dd if=/dev/zero of=/usr/_disk2 bs=1m count=80
mdconfig -f /usr/_disk1 -u 1
mdconfig -f /usr/_disk2 -u 2
/etc/rc.d/zfs forcestart
zpool create tank1 md1
zpool create tank2 md2
sleep 2
touch /tank1/testfile
touch /tank2/testfile
sleep 2
zpool export tank2
zpool export tank1
sleep 10
zpool import tank1
zpool import tank2
sleep 2
touch /tank1/testfile
touch /tank2/testfile
sleep 2
zpool export tank2
zpool export tank1
/etc/rc.d/zfs forcestop
sleep 2
mdconfig -d -u 1
mdconfig -d -u 2
rm /usr/_disk1
rm /usr/_disk2

/Daniel Eriksson




Anybody seen anything like this, on Solaris or freebsd?

I've got my doubts about whether Daniel's got a valid test.



thanks,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-30 Thread Steve Hillman
 
 However, I'm also unhappy about having to wait for S10U6 for the separate
 ZIL and/or cache features of ZFS.  The lack of NV ZIL on our new Thumper
 makes it painfully slow over NFS for the large number of file create/delete
 type of workload.

I did a bit of testing on this (because I'm in the same boat) and was able to 
work around it by breaking my filesystem up into lots of individual zfs 
filesystems. Although the performance of each one isn't great, as long as your 
load is threaded and distributed across filesystems, it should balance out.
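For example (a sketch; the dataset names are made up):

zfs create tank/mail
for n in 01 02 03 04; do zfs create tank/mail/spool$n; done

Each ZFS filesystem has its own intent log chain, which is presumably why
spreading the load across many of them keeps synchronous writers from
serializing behind a single log.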

Steve Hillman
Simon Fraser University
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss