[zfs-discuss] zfs snapshot stalled?

2008-10-20 Thread Carsten Aulbert
Hi all,

I've just seen something weird. On a zpool which looks a bit busy right
now (~100 read ops/s, 100 kB/s) I started a zfs snapshot about an hour
ago. Until now, taking a snapshot usually took a few seconds at most,
even for largish ~TByte file systems. I don't know if the read I/Os are
currently related to the snapshot itself or if another user is causing them,
since I have not looked prior to taking the snapshot.

My remaining questions after searching the web:

(1) Is it common that snapshots can take this long?
(2) Is there a way to stop it if one assumes something went wrong? I.e.
is there a special signal I could send it?
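
(For reference, here is roughly how one could check what the command is doing
while it runs -- just a sketch, the pgrep pattern would need adapting:)

pgrep -fl 'zfs snapshot'     # find the pid of the running zfs command
pstack <pid>                 # user-level stack: is it blocked in an ioctl?
truss -p <pid>               # attach and watch the system calls it is making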

Thanks for any hint

Carsten
-- 
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice recommendations for backing up to ZFS

2008-10-20 Thread Oleg Muravskiy
If you are using rsync already, I would run it on the server in daemon mode. And 
there are Windows clients that support the rsync protocol.
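
For example, a minimal sketch (module name and paths are made up):

# /etc/rsyncd.conf on the ZFS server
[backup]
    path = /tank/backup
    read only = no

# start the daemon on the server
rsync --daemon

# push from a client; Windows rsync clients use the same module syntax
rsync -av /data/ server::backup/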
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Success Stories

2008-10-20 Thread gm_sjo
Hi all,

I have built out an 8TB SAN at home using OpenSolaris + ZFS. I have
yet to put it into 'production' as a lot of the issues raised on this
mailing list are putting me off trusting my data to the platform
right now.

Over time, I have stored my personal data on NetWare and now NT,
and this solution has been 100% reliable for the last 12 years. Never
a single problem (nor have I had any issues with NTFS with the tens of
thousands of spindles I've worked with over the years).

I appreciate that 99% of the time people only comment if they have a
problem, which is why I think it'd be nice if some people who have
successfully implemented ZFS, including making use of various
features (recovery, replacing disks, etc.), could just reply to this
post with a sentence or paragraph detailing how great it is for them.
I'm not necessarily interested in very small implementations of one/two
disks that haven't changed config since the first day they were
installed, but more aimed towards setups that are 'organic' and have
changed/been administered over time (to show functionality of the
tools, resilience of the platform, etc.).

.. Of course though, I guess a lot of people who may have never had a
problem wouldn't even be signed up on this list! :-)


Thanks!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs snapshot stalled?

2008-10-20 Thread Carsten Aulbert
Hi again,

brief update:

the process ended successfully (at least a snapshot was created) after
close to 2 hrs. Since the load is still the same as before taking the
snapshot, I blame other users' processes reading from that array for the
long snapshot duration.

Carsten Aulbert wrote:

 My remaining questions after searching the web:
 
 (1) Is it common that snapshots can take this long?
 (2) Is there a way to stop it if one assumes something went wrong? I.e.
 is there a special signal I could send it?
 
 Thanks for any hint
 
 Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-20 Thread Victor Latushkin
Richard Elling writes:
 Keep in mind that this is for Solaris 10 not opensolaris.
 
 Keep in mind that any changes required for Solaris 10 will first
 be available in OpenSolaris, including any changes which may
 have already been implemented.

Indeed. For example, less than a week ago a fix for the following two CRs 
(along with some others) was put back into Solaris Nevada:

6333409 traversal code should be able to issue multiple reads in parallel
6418042 want traversal in depth-first pre-order for quicker 'zfs send'

This should have a positive impact on 'zfs send' performance.

Wbr,
victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpool import not working - I broke my pool...

2008-10-20 Thread Ross
If you have any backups of your boot volume, I found that the pool can be 
mounted on boot provided it's still listed in your /etc/zfs/zpool.cache file.  
I've moved to OpenSolaris now purely so I can take snapshots of my boot volume 
and back up that file.

The relevant bug that needs fixing is this one, but I've no idea how long it 
might take before that fix is done.
http://bugs.opensolaris.org/view_bug.do?bug_id=6733267
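
For reference, a minimal sketch of that approach (dataset names are examples only):

# snapshot the boot environment and keep a copy of the cache file
zfs snapshot rpool/ROOT/opensolaris@backup
cp -p /etc/zfs/zpool.cache /rpool/zpool.cache.backup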
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover after disk labels failure

2008-10-20 Thread Oleg Muravskiy
 Is there a way to recover from this problem? I'm
 pretty sure the data is still OK, it's just labels
 that get corrupted by controller or zfs. :(

And this is confirmed by zdb, after a long wait for the comparison of data and 
checksums: no data errors.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lost Disk Space

2008-10-20 Thread Ben Rockwood
No takers? :)

benr.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lost Disk Space

2008-10-20 Thread Marcelo Leal
Hello there...
 I did see that already, and talked with some guys without an answer too... ;-)
 Actually, this week I did not see a discrepancy between tools, but the pool 
information (space used) was wrong. Exporting/importing, scrubbing, etc. did 
not solve it. I know that zfs is async in its status reporting ;-), but only 
after a reboot was the status OK again.
ps.: b89

 Leal.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow zpool import with b98

2008-10-20 Thread kristof
I'm also seeing a very slow import on the 2008.11 build 98 prerelease.

I have the following setup:

a striped zpool of 2 mirrors; both mirrors have 1 local disk and 1 iSCSI disk.

I was testing a setup with iSCSI boot (Windows Vista) with gPXE boot; every 
client was booted from an iSCSI-exposed zvol.

I had a lot of snapshots: I enabled the automatic snapshot service last week, 
configured it to create a snapshot every 15 min, and kept all snapshots.

Today (after 4 days) my first server crashed. I'm right now importing the pool 
on my second node to test failover, but the import has already been running for 
more than 45 min :-(

If I had known it would take that much time, I would have configured it to 
remove old snapshots.

K
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow zpool import with b98

2008-10-20 Thread kristof
I'm also seeing a slow import on the 2008.11 build 98 prerelease.

But my situation is a little different:

I have the following setup:

a striped zpool of 2 mirrors; both mirrors have 1 local disk and 1 iSCSI disk.

I was testing iSCSI boot (Windows Vista) with gPXE boot; every client was booted 
from an iSCSI-exposed zvol and was connected via CIFS to 1 zfs filesystem.

I had a lot of snapshots: I enabled the automatic snapshot service last week, 
configured it to create a snapshot every 15 min for every zvol and zfs 
filesystem, and configured it to keep all snapshots.

Today (after running well for 4 days) my first server crashed. I'm right now 
importing the pool on my second node to test failover, but the import has 
already been running for more than 45 min :-(

So keeping all those snapshots is a dangerous situation, since it takes so 
much time to import the zpool on another node.
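
(If anyone wants to thin out such snapshots after the fact, something like the
following should work -- illustrative only, review the list before destroying
anything, and the snapshot name pattern is an assumption:)

# collect the auto snapshots into a file, review it, then destroy them
zfs list -H -t snapshot -o name | grep '@zfs-auto-snap' > /tmp/snaps
xargs -n1 zfs destroy < /tmp/snaps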

K
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] unloading zfs module

2008-10-20 Thread shelly
Is it possible to unload the zfs module without rebooting the computer?

I am making some changes in the zfs kernel code, then compiling it. I then want to 
reload the newly rebuilt module without rebooting.

I tried modunload, but it always gives "can't unload the module: Device busy".

What could be a possible solution?
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How can i make my zpool as faulted.

2008-10-20 Thread Sanjeev Bagewadi
Yuvraj,

I see that you are using files as disks.
You could write a few random bytes to one of the files and that would 
induce corruption.
To make a particular disk faulty you could mv the file to a new name.

Also, you can explore zinject from the zfs test suite. It probably has a 
way to inject faults.
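
A rough sketch of the first suggestion, using the file-backed vdevs from the
zpool status output below (overwrite a region well past the front labels, then
scrub -- sizes and offsets are arbitrary):

dd if=/dev/urandom of=/disk2 bs=512 seek=20000 count=2048 conv=notrunc
zpool scrub mypool1
zpool status -v mypool1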

Thanks and regards,
Sanjeev

yuvraj wrote:
 Hi Sanjeev,
 I am herewith giving all the details of my zpool by 
 firing the #zpool status command on the command line. Please go through the same and 
 help me out.

  Thanks in advance.

   
 Regards,
 Yuvraj Balkrishna Jadhav.

 ==

 # zpool status
   pool: mypool1
  state: ONLINE
  scrub: none requested
 config:

 NAME      STATE     READ WRITE CKSUM
 mypool1   ONLINE       0     0     0
   /disk1  ONLINE       0     0     0
   /disk2  ONLINE       0     0     0

 errors: No known data errors

   pool: zpool21
  state: ONLINE
  scrub: scrub completed with 0 errors on Sat Oct 18 13:01:52 2008
 config:

 NAME      STATE     READ WRITE CKSUM
 zpool21   ONLINE       0     0     0
   /disk3  ONLINE       0     0     0
   /disk4  ONLINE       0     0     0

 errors: No known data errors
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unloading zfs module

2008-10-20 Thread Darren J Moffat
shelly wrote:
 Is it possible to unload zfs module without rebooting the computer. 
 
 I am making some changes in zfs kernel code. then compiling it. i then want 
 to reload the newly rebuilt module without rebooting.
 
 I tried modunload. But its always givin can't unload the module: Device 
 busy .
 
 What can be the possible solution

If your root/boot filesystem is zfs then you won't be able to unload the
module.

If your root/boot filesystem is nfs or ufs then you can unload it like this:


# stop the services that keep the zfs module busy
svcadm disable -t fmd
svcadm disable -t sysevent

# export every imported pool so no datasets remain mounted
pools=`zpool list -H -o name`
for p in $pools ; do
        zpool export $p
done

sleep 5

# module id 0 asks modunload to unload every unloadable module
modunload -i 0
modinfo | grep zfs

If you do NOT see any output from modinfo then the unload succeeded;
if you DO see output from modinfo, it did not succeed.


-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-20 Thread Scott Williamson
On Mon, Oct 20, 2008 at 1:52 AM, Victor Latushkin
[EMAIL PROTECTED] wrote:

 Indeed. For example, less than a week ago fix for the following two CRs
 (along with some others) was put back into Solaris Nevada:

 6333409 traversal code should be able to issue multiple reads in parallel
 6418042 want traversal in depth-first pre-order for quicker 'zfs send'


That is helpful, Victor. Does anyone have a full list of CRs that I can
provide to Sun support? I have tried searching the bugs database, but I
didn't even find those two on my own.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] ZFS Success Stories

2008-10-20 Thread Will Murnane
 On Mon, Oct 20, 2008 at 03:10, gm_sjo [EMAIL PROTECTED] wrote:
 I appreciate 99% of the time people only comment if they have a
 problem, which is why I think it'd be nice for some people who have
 successfully implemented ZFS, including making various use of the
 features (recovery, replacing disks, etc), could just reply to this
 post with a sentence or paragraph detailing how great it is for them.
My initial test of zfs was with a few IDE disks which I had found
flaky on other platforms (md5 mismatches, that kind of thing).  I put
them all in a non-redundant pool, and loaded some data on the pool.
Then I let it sit in the corner serving NFS for a couple weeks,
scrubbed the pool every once in a while, and watched the error
counters.  It confirmed what I'd seen: these disks gave off errors
spontaneously.  This was a good start: the first time I'd seen a
storage stack that had the audacity to complain about problems with
its hardware.

So I upgraded, and put in known-good disks.  I started with a
mirrored pair of 750s, then added another pair, then added a pair of
log disks.  At each step, things moved smoothly, and speed increased.

I've also helped my brother set up a Solaris/ZFS setup, on a bit
larger scale but with a more static configuration.  He started with
Linux, md raid, and XFS, using raid 5 on 8 320GB disks and a
Supermicro AOC-SAT2-MV8 Marvell controller.  Unfortunately, he lost
basically the entire array due to corruption in some layer of the
stack.  So I suggested ZFS as an alternative.  This was around Build
67 of Nevada.  He put his 8 disks in a raidz pool.  About a year ago,
he bought six 500gb disks and another Marvell controller, made a new
raidz vdev (in a new pool) out of them, and added six of the 320gb
disks in another vdev.  A month or so ago, he bought six 1TB disks,
made a new pool out of them, and moved all his data over to it.

At each step of the way, he upgraded to solve a problem.  Moving from
Linux to Solaris was because it had better drivers for the
Marvell-based card.  Adding the 500GB disks was because he was out of
space, and the reason we didn't just add another vdev to the existing
pool is because his case only has room for 13 disks.  Finally, the 320
gig disks have started returning checksum errors, so he wanted to get
them out of the pool.  The system as a whole has been very reliable,
but due to some ZFS limitations (no vdev removal, no stripe width
changing) a new pool has been needed at each stage.

My experiences with ZFS at home have been very positive, but I also
use it at work.  I'm concerned about the speed of zfs send and with
being able to remove vdevs before I will recommend it unilaterally for
work purposes, but despite these issues I have a couple pools in
production: one serving mail, one serving user home directories, and
one serving data for research groups.  We have had no problems with
these pools, but I keep an eye on the backup logs for them.  I hope
that eventually such careful watching will not be necessary.

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice recommendations for backing up to ZFS Fileserver

2008-10-20 Thread Wade . Stuart

[EMAIL PROTECTED] wrote on 10/19/2008 01:59:29 AM:

 Ares Drake wrote:
  Greetings.

  I am currently looking into setting up a better backup solution for our
  family.

  I own a ZFS Fileserver with a 5x500GB raidz. I want to back up data (not
  the OS itself) from multiple PCs running Linux or Windows XP. The Linux
  boxes are connected via 1000Mbit, the Windows machines either via
  gigabit as well or 54Mbit WPA-encrypted WLAN. So far I've set up sharing
  via NFS on the Solaris box and it works well from both Linux and Windows
  (via SFU).

  Anyone have a similar setup, recommendations, or maybe something I could
  use as an idea?
 I store all the data worth keeping on my Windows boxes on either iSCSI
 volumes exported from a ZFS server, or SMB shares from the same server.
 That way, nothing has to be copied and they never run out of space.

I use rsync with the in-place update setting and a block size matching the
zfs store.  I call it from the zfs server and snap after it finishes.  This
seems to allow for a massive number of snaps, as only changed blocks are
updated.  I make sure to --exclude volatile and unimportant files such as
swap.  I then share the snapshot directory via Samba with the shadow_copy VFS
module enabled to allow restores from any of the snaps via XP's Shadow Copy
interface.
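
Roughly like this (a sketch with made-up host and dataset names; 131072 matches
the default 128K zfs recordsize):

# run from the ZFS server, then snapshot the result
rsync -a --inplace --block-size=131072 --exclude='pagefile.sys' \
    client:/data/ /tank/backups/client/
zfs snapshot tank/backups/client@`date +%Y%m%d-%H%M`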

3x raw storage on my zfs server allows me about 1 year of staggered
snapshots -- your mileage will vary with data volatility and snap frequency.

-Wade

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6

2008-10-20 Thread Pramod Batni


Paul B. Henson wrote:
 snip

 At about 5000 filesystems, it starts taking over 30 seconds to
 create/delete additional filesystems.

 At 7848, over a minute:

 # time zfs create export/user/test

 real1m22.950s
 user1m12.268s
 sys 0m10.184s

 I did a little experiment with truss:

 # truss -c zfs create export/user/test2

 syscall               seconds   calls  errors
 _exit                    .000       1
 read                     .004     892
 open                     .023      67       2
 close                    .001      80
 brk                      .006     653
 getpid                   .037    8598
 mount                    .006       1
 sysi86                   .000       1
 ioctl                 115.534 31303678    7920
 execve                   .000       1
 fcntl                    .000      18
 openat                   .000       2
 mkdir                    .000       1
 getppriv                 .000       1
 getprivimplinfo          .000       1
 issetugid                .000       4
 sigaction                .000       1
 sigfillset               .000       1
 getcontext               .000       1
 setustack                .000       1
 mmap                     .000      78
 munmap                   .000      28
 xstat                    .000      65      21
 lxstat                   .000       1       1
 getrlimit                .000       1
 memcntl                  .000      16
 sysconfig                .000       5
 lwp_sigmask              .000       2
 lwp_private              .000       1
 llseek                   .084   15819
 door_info                .000      13
 door_call                .103    8391
 schedctl                 .000       1
 resolvepath              .000      19
 getdents64               .000       4
 stat64                   .000       3
 fstat64                  .000      98
 zone_getattr             .000       1
 zone_lookup              .000       2
                        -------  ------    ----
 sys totals:           115.804 31338551    7944
 usr time:             107.174
 elapsed:              897.670


 and it seems the majority of time is spent in ioctl calls, specifically:

 ioctl(16, MNTIOC_GETMNTENT, 0x08045A60) = 0
   

Yes, the implementation of the above ioctl walks the list of mounted 
filesystems 'vfslist' [in this case it walks 5000 nodes of a linked list 
before the ioctl returns]. This in-kernel traversal of the filesystems 
is taking time.
 Interestingly, I tested creating 6 filesystems simultaneously, which took a
 total of only three minutes, rather than 9 minutes had they been created
 sequentially. I'm not sure how parallelizable I can make an identity
 management provisioning system though.
   
 Was I mistaken about the increased scalability that was going to be
 available? Is there anything I could configure differently to improve this
 performance? We are going to need about 30,000 filesystems to cover our
   

You could set 'zfs set mountpoint=none pool-name' and then create the 
filesystems under the pool-name. [In my experiments the number of ioctls 
went down drastically.] You could then set a mountpoint for the pool and 
then issue a 'zfs mount -a'.
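
A sketch of that workaround (pool and filesystem names are placeholders):

zfs set mountpoint=none export
# create the filesystems while nothing under the pool is mounted
for u in user1 user2 user3 ; do
        zfs create export/user/$u
done
zfs set mountpoint=/export export
zfs mount -a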

Pramod
 faculty, staff, students, and group project directories. We do have 5
 x4500's which will be allocated to the task, so about 6000 filesystems per.
 Depending on what time of the quarter it is, our identity management system
 can create hundreds up to thousands of accounts, and when we purge accounts
 quarterly we typically delete 10,000 or so. Currently those jobs only take
 2-6 hours, with this level of performance from ZFS they would take days if
 not over a week :(.

 Thanks for any suggestions. What is the internal recommendation on maximum
 number of file systems per server?


   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs cp hangs when the mirrors are removed ..

2008-10-20 Thread Karthik Krishnamoorthy
Hi Richard,
Richard Elling wrote:
 Karthik Krishnamoorthy wrote:
 We did try with this

 zpool set failmode=continue pool option

 and the wait option before pulling running the cp command and pulling 
 out the mirrors and in both cases there was a hang and I have a core 
 dump of the hang as well.
   

 You have to wait for the I/O drivers to declare that the device is
 dead.  This can be up to several minutes, depending on the driver.
Okay, the customer indicated they didn't see a hang when they ran the 
same test with UFS.

 Any pointers to the bug opening process ?
   

 http://bugs.opensolaris.org, or bugster if you have an account.
 Be sure to indicate which drivers you are using, as this is not likely
 a ZFS bug, per se.  Output from prtconf -D should be a minimum.
I have the core dump of the hang. Will make that available as well.

Thanks and regards,
Karthik
 -- richard

 Thanks
 Karthik

 On 10/15/08 22:27, Neil Perrin wrote:
  
 On 10/15/08 23:12, Karthik Krishnamoorthy wrote:

 Neil,

 Thanks for the quick suggestion, the hang seems to happen even with 
 the zpool set failmode=continue pool option.

 Any other way to recover from the hang ?
   
 You should set the property before you remove the devices.
 This should prevent the hang. It isn't used to recover from it.

 If you did do that then it seems like a bug somewhere in ZFS or the 
 IO stack
 below it. In which case you should file a bug.

 Neil.

 thanks and regards,
 Karthik

 On 10/15/08 22:03, Neil Perrin wrote:
  
 Karthik,

 The pool failmode property as implemented governs the behaviour 
 when all
 the devices needed are unavailable. The default behaviour is to wait
 (block) until the IO can continue - perhaps by re-enabling the 
 device(s).
 The behaviour you expected can be achieved by zpool set 
 failmode=continue pool,
 as shown in the link you indicated below.

 Neil.

 On 10/15/08 22:38, Karthik Krishnamoorthy wrote:

 Hello All,

   Summary:
   
   cp command for mirrored zfs hung when all the disks in the 
 mirrored
   pool were unavailable.
 Detailed description:
   ~
 The cp command (copy a 1GB file from nfs to zfs) hung when 
 all the disks
   in the mirrored pool (both c1t0d9 and c2t0d9) were removed 
 physically.
 NAME        STATE     READ WRITE CKSUM
 test        ONLINE       0     0     0
   mirror    ONLINE       0     0     0
     c1t0d9  ONLINE       0     0     0
     c2t0d9  ONLINE       0     0     0
 We think if all the disks in the pool are unavailable, cp 
 command should
   fail with error (not cause hang).
 Our request:
   
   Please investigate the root cause of this issue.
  
   How to reproduce:
   ~
   1. create a zfs mirrored pool
   2. execute cp command from somewhere to the zfs mirrored pool.
   3. remove the both of disks physically during cp command working
 =  hang happen (cp command never return and we can't kill cp 
 command)

 One engineer pointed me to this page  
 http://opensolaris.org/os/community/arc/caselog/2007/567/onepager/ 
 and indicated that if all the mirrors are removed zfs enters a 
 hang like state to prevent the kernel from going into a panic 
 mode and this type of feature would be an RFE.

 My questions are

 Are there any documentation of the mirror configuration of zfs 
 that explains what happens when the underlying
 drivers detect problems in one of the mirror devices?

 It seems that the traditional view of mirroring or RAID-1 would expect that the
 mirror would be able to proceed without interruption, and that does not seem
 to be the case in ZFS.
 What is the purpose of the mirror, in zfs?  Is it more like an 
 instant
 backup?  If so, what can the user do to recover, when there is an
 IO error on one of the devices?


 Appreciate any pointers and help,

 Thanks and regards,
 Karthik
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Windows XP nfs client poor performance

2008-10-20 Thread Bob Bencze
Greetings.
I have an X4500 with an 8TB RAIDZ datapool, currently 75% full. I have it carved 
up into several filesystems. I share out two of the filesystems, 
/datapool/data4 (approx 1.5TB) and /datapool/data5 (approx 3.5TB). The data is 
imagery, and the primary application on the PCs is Socetset.
The clients are Windows XP Pro, and I use services for unix (SFU) to mount the 
nfs shares from the thumper. When a client PC accesses files from data4, they 
come across quickly. When the same client accesses files from data5, the 
transfer rate comes to a crawl, and sometimes the application times out.
The only difference I can see is the size of the volume, the data is all of the 
same type.

I could find no references for any limitations on the volume size of nfs shares 
or mounts. It seems inconsistent and difficult to duplicate. I plan to begin a 
more in-depth troubleshooting of the problem with dtrace.  

Has anyone seen anything like this before?

Thanks.

-Bob Bencze
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] RAID-Z True Volume Size?

2008-10-20 Thread William Saadi
Hi all,

I have a little question.
With RAID-Z rules, what is the true usable disk space?
Is there a calculation like for other RAID levels (e.g. RAID5 = number of disks - 1 for parity)?

Thank you for your help; I searched everywhere on the web and did not find my 
answer...
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] My 500-gig ZFS is gone: insufficient replicas, corrupted data

2008-10-20 Thread Richard Elling
Eugene Gladchenko wrote:
 Hi,

 I'm running FreeBSD 7.1-PRERELEASE with a 500-gig ZFS drive. Recently I've 
 encountered a FreeBSD problem (PR kern/128083) and decided to update the 
 motherboard BIOS. It looked like the update went right, but after that I was 
 shocked to see my ZFS pool destroyed! Rolling the BIOS back did not help.

 Now it looks like that:

 # zpool status
   pool: tank
  state: UNAVAIL
 status: One or more devices could not be used because the label is missing
 or invalid.  There are insufficient replicas for the pool to continue
 functioning.
 action: Destroy and re-create the pool from a backup source.
see: http://www.sun.com/msg/ZFS-8000-5E
  scrub: none requested
 config:

 NAME        STATE     READ WRITE CKSUM
 tank        UNAVAIL      0     0     0  insufficient replicas
   ad4       UNAVAIL      0     0     0  corrupted data
 # zdb -l /dev/ad4
 
 LABEL 0
 
 version=6
 name='tank'
 state=0
 txg=4
 pool_guid=12069359268725642778
 hostid=2719189110
 hostname='home.gladchenko.ru'
 top_guid=5515037892630596686
 guid=5515037892630596686
 vdev_tree
 type='disk'
 id=0
 guid=5515037892630596686
 path='/dev/ad4'
 devid='ad:5QM0WF9G'
 whole_disk=0
 metaslab_array=14
 metaslab_shift=32
 ashift=9
 asize=500103118848
 
 LABEL 1
 
 version=6
 name='tank'
 state=0
 txg=4
 pool_guid=12069359268725642778
 hostid=2719189110
 hostname='home.gladchenko.ru'
 top_guid=5515037892630596686
 guid=5515037892630596686
 vdev_tree
 type='disk'
 id=0
 guid=5515037892630596686
 path='/dev/ad4'
 devid='ad:5QM0WF9G'
 whole_disk=0
 metaslab_array=14
 metaslab_shift=32
 ashift=9
 asize=500103118848
 
 LABEL 2
 
 failed to unpack label 2
 
 LABEL 3
 
 failed to unpack label 3
   

This would occur if the beginning of the partition was intact,
but the end is not.  Causes for the latter include:
1. partition table changes (or vtoc for SMI labels)
2. something overwrote data at the end

If the cause is #1, then restoring the partition should work.  If
the cause is #2, then the data may be gone.

Note: ZFS can import a pool with one working label, but if
more of the data is actually unavailable or overwritten, then
it may not be able to get to a consistent state.
 -- richard


 #

 I've tried to import the problem pool into OpenSolaris 2008.05 with no 
 success:

 # zpool import
   pool: tank
  id: 12069359268725642778
  state: UNAVAIL
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
 see: http://www.sun.com/msg/ZFS-8000-EY
 config:

 tank   UNAVAIL  0 0 0  insufficient replicas
   c3d0s2   UNAVAIL  0 0 0  corrupted data
 #

 Is there a way to recover my files from this broken pool? Maybe at least some 
 of them? The drive was 4/5 full. :(

 I would appreciate any help.

 p.s. I already bought another drive of the same size yesterday. My next ZFS 
 experience definitely will be a mirrored one.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAID-Z True Volume Size?

2008-10-20 Thread Richard Elling
William Saadi wrote:
 Hi all,

 I have a little question.
 With RAID-Z rules, what is the true usable disk space?
   

It depends on what data you write to it, how the writes are done, and
what compression or redundancy parameters are set.

 Is there a calculation like for other RAID levels (e.g. RAID5 = number of disks - 1 for parity)?
   

In general, by default, yes, this will work.  raidz approximates raid-5,
raidz2 approximates raid-6.
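
As a rough worked example (parity overhead only, ignoring metadata and
allocation overhead): a 5-disk raidz of 500 GB drives gives about
(5 - 1) x 500 GB = 2 TB usable, and a 6-disk raidz2 of the same drives
about (6 - 2) x 500 GB = 2 TB.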
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-20 Thread Gary Mills
On Thu, Oct 16, 2008 at 03:50:19PM +0800, Gray Carper wrote:
 
Sidenote: Today we made eight network/iSCSI related tweaks that, in
aggregate, have resulted in dramatic performance improvements (some I
just hadn't gotten around to yet, others suggested by Sun's Mertol
Ozyoney)...
- disabling the Nagle algorithm on the head node
- setting each iSCSI target block size to match the ZFS record size of
128K
- disabling thin provisioning on the iSCSI targets
- enabling jumbo frames everywhere (each switch and NIC)
- raising ddi_msix_alloc_limit to 8
- raising ip_soft_rings_cnt to 16
- raising tcp_deferred_acks_max to 16
- raising tcp_local_dacks_max to 16

Can you tell us which of those changes made the most dramatic
improvement?  I have a similar situation here, with a 2-TB ZFS pool on
a T2000 using iSCSI to a NetApp file server.  Is there any way to tell
in advance if any of those changes will make a difference?  Many of
them seem to be server resources.  How can I determine their current
usage?
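
(For reference, roughly how the current values could be inspected -- a sketch,
assuming these symbols exist on the build in question:)

ndd -get /dev/tcp tcp_deferred_acks_max
ndd -get /dev/tcp tcp_local_dacks_max
echo 'ip`ip_soft_rings_cnt/D' | mdb -k       # kernel variable in the ip module
echo 'ddi_msix_alloc_limit/D' | mdb -k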

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Windows XP nfs client poor performance

2008-10-20 Thread Ross
I've had serious problems trying to get Windows to run as a NFS server with 
SFU.  On a modern raid array I can't get it above 4MB/s transfer rates.  It's 
slow enough that virtual machines running off it time out almost every time I 
try to boot them.

Oddly it worked ok when I used an old IDE disk - I got about 20MB/s out of it 
then.  But it's not good when an IDE disk can outperform a raid array capable 
of around 500MB/s.

It's probably not directly related to the issues you're having, but my own 
experience of SFU would make me want to repeat your tests with a Linux client 
before assuming it's a Solaris or ZFS problem.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] ZFS Success Stories

2008-10-20 Thread Jonathan Loran

We have 135 TB capacity with about 75 TB in use on zfs based storage.  
zfs use started about 2 years ago, and has grown from there.  This spans 
9 SAN appliances, with 5 head nodes, and 2 more recent servers running 
zfs on JBOD with vdevs made up of raidz2. 

So far, the experience has been very positive.  Never lost a bit of 
data.  We scrub weekly, and I've started sleeping better at night.  I 
have also read the horror stories, but we aren't seeing them here. 

We did have some performance issues, especially involving the SAN 
storage on more heavily used systems, but enabling the cache on the SAN 
devices without pushing fsync through to disk basically fixed that.  
Your zfs layout can profoundly affect performance, which is a down 
side.  It's best to test your setup under an approximately realistic work 
load to balance capacity with performance before deploying.

BTW, most of our zfs deployment is on Solaris 10 {u4,u5}, but two large 
servers are on OpenSolaris snv_86.  The OpenSolaris servers seem to be 
considerably faster, and more feature-rich, without any reliability 
issues so far.

Jon

gm_sjo wrote:
 Hi all,

 I  have built out an 8TB SAN at home using OpenSolaris + ZFS. I have
 yet to put it into 'production' as a lot of the issues raised on this
 mailing list are putting me off trusting my data onto the platform
 right now.

 Throughout time, I have stored my personal data on NetWare and now NT
 and this solution has been 100% reliable for the last 12 years. Never
 a single problem (nor have I had any issues with NTFS with the tens of
 thousands of spindles i've worked with over the years).

 I appreciate 99% of the time people only comment if they have a
 problem, which is why I think it'd be nice for some people who have
 successfully implemented ZFS, including making various use of the
 features (recovery, replacing disks, etc), could just reply to this
 post with a sentence or paragraph detailing how great it is for them.
 Not necessarily interested in very small implementations of one/two
 disks that haven't changed config since the first day it was
 installed, but more aimed towards setups that are 'organic' and have
 changed/been_administered over time (to show functionality of the
 tools, resilience of the platform, etc.)..

 .. Of course though, I guess a lot of people who may have never had a
 problem wouldn't even be signed up on this list! :-)


 Thanks!
 ___
 storage-discuss mailing list
 [EMAIL PROTECTED]
 http://mail.opensolaris.org/mailman/listinfo/storage-discuss
   

-- 


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3
 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-20 Thread Jim Dunham
Gary,

   Sidenote: Today we made eight network/iSCSI related tweaks that, in
   aggregate, have resulted in dramatic performance improvements  
 (some I
   just hadn't gotten around to yet, others suggested by Sun's Mertol
   Ozyoney)...
   - disabling the Nagle algorithm on the head node
   - setting each iSCSI target block size to match the ZFS record  
 size of
   128K
   - disabling thin provisioning on the iSCSI targets
   - enabling jumbo frames everywhere (each switch and NIC)
   - raising ddi_msix_alloc_limit to 8
   - raising ip_soft_rings_cnt to 16
   - raising tcp_deferred_acks_max to 16
   - raising tcp_local_dacks_max to 16

 Can you tell us which of those changes made the most dramatic
 improvement?

   - disabling the Nagle algorithm on the head node

This will have a dramatic effect on most I/Os, except for large  
sequential writes.

 - setting each iSCSI target block size to match the ZFS record size  
 of 128K
  - enabling jumbo frames everywhere (each switch and NIC)


These will have a positive effect for large writes, both sequential  
and random

   - disabling thin provisioning on the iSCSI targets

This only has a benefit for file-based or dsk-based backing stores. If  
one uses rdsk backing stores of any type, this is not an issue.

Jim

 I have a similar situation here, with a 2-TB ZFS pool on
 a T2000 using Iscsi to a Netapp file server.  Is there any way to tell
 in advance if any of those changes will make a difference?  Many of
 them seem to be server resources.  How can I determine their current
 usage?

 -- 
 -Gary Mills--Unix Support--U of M Academic Computing and  
 Networking-
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Windows XP nfs client poor performance

2008-10-20 Thread Brent Jones
On Mon, Oct 20, 2008 at 9:29 AM, Bob Bencze [EMAIL PROTECTED] wrote:
 Greetings.
 I have a X4500 with an 8TB RAIDZ datapool, currently 75% full. I have it 
 carved up into several  filesystems. I share out two of the  filesystems 
 /datapool/data4 (approx 1.5TB) and /datapool/data5 (approx 3.5TB). THe data 
 is imagery, and the primary application on the PCs is Socetset.
 The clients are Windows XP Pro, and I use services for unix (SFU) to mount 
 the nfs shares from the thumper. When a client PC accesses files from data4, 
 they come across quickly. When the same client accesses files from data5, the 
 transfer rate comes to a crawl, and sometimes the application times out.
 The only difference I can see is the size of the volume, the data is all of 
 the same type.

 I could find no references for any limitations on the volume size of nfs 
 shares or mounts. It seems inconsistent and difficult to duplicate. I plan to 
 begin a more in-depth troubleshooting of the problem with dtrace.

 Has anyone seen anything like this before?

 Thanks.

 -Bob Bencze
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


SFU NFS is often slow but tunable; here is something you might find
handy to squeeze some speed out of it:
http://technet.microsoft.com/en-us/library/bb463205.aspx

HTH

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Setting per-file record size / querying fs/file record size?

2008-10-20 Thread Nicolas Williams
I've a report that the mismatch between SQLite3's default block size and
ZFS' causes some performance problems for Thunderbird users.

It'd be great if there was an API by which SQLite3 could set its block
size to match the hosting filesystem or where it could set the DB file's
record size to match the SQLite3/app default block size (1KB).

Is there such an API?  If not, is there an RFE I could add a call record
to?
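
(For context, the per-dataset knob and the SQLite-side knob look roughly like
this today -- dataset name and page size are illustrative, and neither is the
per-file API I'm asking about:)

# ZFS side: shrink the dataset's recordsize toward SQLite's default page size
zfs set recordsize=1K tank/home/thunderbird
# application side: st_blksize from fstat(2) reports the preferred I/O size,
# which SQLite could use to pick its page size, e.g.
sqlite3 places.sqlite 'PRAGMA page_size=8192; VACUUM;'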

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6

2008-10-20 Thread Paul B. Henson
On Mon, 20 Oct 2008, Pramod Batni wrote:

 Yes, the implementation of the above ioctl walks the list of mounted
 filesystems 'vfslist' [in this case it walks 5000 nodes of a linked list
 before the ioctl returns] This in-kernel traversal of the filesystems is
 taking time.

Hmm, O(n) :(... I guess that is the implementation of getmntent(3C)?

Why does creating a new ZFS filesystem require enumerating all existing
ones?

 You could set 'zfs set mountpoint=none pool-name' and then create the
 filesystems under the pool-name. [In my experiments the number of
 ioctls went down drastically.] You could then set a mountpoint for the
 pool and then issue a 'zfs mount -a'.

That would work for an initial mass creation, but we are going to need to
create and delete fairly large numbers of file systems over time, so this
workaround would not help for that.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-20 Thread Tano
A couple of updates:

Installed OpenSolaris on a PowerEdge 1850 with a single network card and the 
default iscsitarget configuration (no special tweaks or tpgt settings); VMotion 
was about 10 percent successful before I received write errors on disk.

That is 10 percent better than the PowerEdge 1900 iscsitarget.

The GUIDs are set by VMware when the iSCSI initiator connects to the 
OpenSolaris target. Therefore I have no control over what the GUIDs are, and 
from my observations it doesn't matter whether the GUIDs are identical, unless 
there is a bug in VMware and GUIDs. 

I have followed the instructions to delete the backing stores and the zfs 
partitions and start anew. I even went as far as rebooting the machine after I 
created a single LUN and connected to the VMware initiator. I then repeated the 
same steps when creating the second LUN. Overall, VMware determined the GUID # 
of the iSCSI target. I 

Right now I am applying a ton of VMWare patches that have iscsi connectivity 
repairs and other security updates. 

I will be reverting to a Linux iSCSI target model if the patches do not work, 
to check whether the physical machines or the networking have an abnormality 
that may be causing the problems.

I'll be submitting more updates as I continue testing! 

cliff notes: nothing has worked so far :(
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-20 Thread Gray Carper
Hey, Jim! Thanks so much for the excellent assist on this - much better than
I could have ever answered it!

I thought I'd add a little bit on the other four...

 - raising ddi_msix_alloc_limit to 8

For PCI cards that use up to 8 interrupts, which our 10GbE adapters do. The
previous value of 2 could cause some CPU interrupt bottlenecks. So far, this
has been more of a preventative measure - we haven't seen a case where this
really made any performance impact.

 - raising ip_soft_rings_cnt to 16

This increases the number of kernel threads associated with packet
processing and is specifically meant to reduce the latency in handling
10GbE. This showed a small performance improvement.

 - raising tcp_deferred_acks_max to 16

This reduces the number of ACK packets sent, thus reducing the overall TCP
overhead. This showed a small performance improvement.

 - raising tcp_local_dacks_max to 16

This also slows down ACK packets and showed a tiny performance improvement.

Overall, we have found these four settings to not make a whole lot of
difference, but every little bit helps. ;) The four that Jim went through
were much more impactful, particularly the enabling of jumbo frames and the
disabling of the Nagle algorithm.
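
For anyone wanting to try the same tweaks, this is roughly where such settings
go on Solaris/Nevada (a sketch, not an exact record of what we did;
tcp_naglim_def is the usual knob for Nagle):

# /etc/system (takes effect after a reboot)
set ddi_msix_alloc_limit=8
set ip:ip_soft_rings_cnt=16

# runtime TCP tuning with ndd (does not survive a reboot)
ndd -set /dev/tcp tcp_naglim_def 1           # effectively disables Nagle
ndd -set /dev/tcp tcp_deferred_acks_max 16
ndd -set /dev/tcp tcp_local_dacks_max 16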

-Gray

On Tue, Oct 21, 2008 at 4:21 AM, Jim Dunham [EMAIL PROTECTED] wrote:

 Gary,

   Sidenote: Today we made eight network/iSCSI related tweaks that, in
  aggregate, have resulted in dramatic performance improvements (some I
  just hadn't gotten around to yet, others suggested by Sun's Mertol
  Ozyoney)...
  - disabling the Nagle algorithm on the head node
  - setting each iSCSI target block size to match the ZFS record size of
  128K
  - disabling thin provisioning on the iSCSI targets
  - enabling jumbo frames everywhere (each switch and NIC)
  - raising ddi_msix_alloc_limit to 8
  - raising ip_soft_rings_cnt to 16
  - raising tcp_deferred_acks_max to 16
  - raising tcp_local_dacks_max to 16


 Can you tell us which of those changes made the most dramatic
 improvement?


   - disabling the Nagle algorithm on the head node


 This will have a dramatic effective on most I/Os, except for large
 sequential writes.

  - setting each iSCSI target block size to match the ZFS record size of
 128K
  - enabling jumbo frames everywhere (each switch and NIC)



 These will have a positive effect for large writes, both sequential and
 random

   - disabling thin provisioning on the iSCSI targets


 This only has a benefit for file-based or dsk based backing stores. If one
 use rdsk backing stores of any type, this is not an issue.

 Jim

  I have a similar situation here, with a 2-TB ZFS pool on
 a T2000 using Iscsi to a Netapp file server.  Is there any way to tell
 in advance if any of those changes will make a difference?  Many of
 them seem to be server resources.  How can I determine their current
 usage?

 --
 -Gary Mills--Unix Support--U of M Academic Computing and
 Networking-
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 Jim Dunham
 Storage Platform Software Group
 Sun Microsystems, Inc.




-- 
Gray Carper
MSIS Technical Services
University of Michigan Medical School
[EMAIL PROTECTED]  |  skype:  graycarper  |  734.418.8506
http://www.umms.med.umich.edu/msis/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss