Re: [zfs-discuss] ZFS flash issue
Thanks Cindy, Enda for the info. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs proerty aclmode gone in 147?
On Tue, Sep 28, 2010 at 12:18:49PM -0700, Paul B. Henson wrote: On Sat, 25 Sep 2010, Ralph Böhme wrote: Darwin's ACL model is nice and slick; the new NFSv4 one in 147 is just braindead. chmod resulting in ACLs being discarded is a bizarre design decision. Agreed. What's the point of ACLs that disappear? Sun didn't want to fix the acl/chmod interaction; maybe one of the new OpenSolaris forks will do the right thing... I've researched this enough (mainly by reading most of the ~240 or so relevant zfs-discuss posts and several bug reports) to conclude the following.

I've researched this by reading the specs for NFSv4, the withdrawn draft POSIX 1e, and Darwin ACLs, and have implemented mapping between them in a UNIX AFP fileserver.

- ACLs derived from POSIX mode_t and/or POSIX Draft ACLs that result in DENY ACEs are enormously confusing to users.
- ACLs derived from POSIX mode_t and/or POSIX Draft ACLs that result in DENY ACEs are susceptible to ACL re-ordering when modified from Windows clients - which insist on DENY ACEs first - leading to much confusion.

IMO the approach of intertwining the UNIX mode with ACEs was a bad idea in the first place, but it's in the spec, so of course the implementations that follow it must honour it. POSIX 1e does something similar, but my point here is that this is not necessarily the most clever, clean and safe spec. Note that Darwin (OS X) does _not_ do this mumbo-jumbo, so ...

- This all gets more confusing when hand-crafted ZFS inheritable ACEs are mixed with chmod(2)s under the old aclmode=groupmask setting.

The old aclmode=passthrough setting was dangerous and had to be removed, period. (Doing chmod(600) would not necessarily deny other users/groups access -- that's very, very broken.)

... in Darwin this will not remove any ACL from the object. The Darwin kernel evaluates permissions in a first-match paradigm, evaluating the ACL before the mode, and it does not intertwine ACL and mode.
It's a slick, clean, easy to understand and safe design. With this model I can stick an ACL on an object saying "deny unredeemed_hacker everything" and be sure that this ACL will stay there without being removed by any chmod. Fixing one NFSv4 spec ACL design issue (mapping mode and ACL) by just removing ACLs when the mapping must be done is spec-conforming, but IMO a bad idea. I haven't yet really studied the details of this implementation change in 147, so maybe I'm complaining too early. Regards, -r -- This message posted from opensolaris.org
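For the curious, the sticky deny-ACE behaviour can be demonstrated from an OS X shell (a sketch using Darwin's chmod ACL syntax; the user and file names are made up):

```shell
# OS X only: append a deny ACE, then change the mode bits.
touch secret.txt
chmod +a "unredeemed_hacker deny read,write,delete" secret.txt
chmod 600 secret.txt   # touches only the mode bits
ls -le secret.txt      # the deny ACE is still listed, unaffected by chmod
```

Because Darwin evaluates the ACL first and separately from the mode, no chmod(2) ever discards that entry.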
Re: [zfs-discuss] [osol-discuss] [illumos-Developer] zpool upgrade and zfs upgrade behavior on b145
Hi Cindy, I did see your first email pointing to that bug: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538600. Apologies for not addressing it earlier. It is my opinion that the behavior Mike and I (http://illumos.org/issues/217), or anyone else upgrading pools right now, are seeing is an entirely new and different bug. The bug you point to, originally submitted in 2007, says it manifests itself before a reboot. Also, you say exporting and importing clears the problem. After several reboots, zdb still shows the older pool version, which means that this is a new bug, or perhaps the bug you are referencing does not describe clearly and accurately what it should and is incomplete. Suppose an export and import can update the pool label config on a large storage pool, great. How would someone go about exporting the rpool the operating system is on? As far as I know, it's impossible to export the zpool the operating system is running on. I don't think it can be done, but I'm new, so maybe I'm missing something. One option I have not explored that might work: booting to a live CD that has the same or higher pool version present and then doing:

zpool import
zpool import -f rpool
zpool export rpool

and then rebooting into the operating system. Perhaps this might be an option that works to update the label config / zdb for rpool, but I think fixing the root problem would be much more beneficial for everyone in the long run. Given that zdb is a troubleshooting/debugging tool, I would think it's necessary for it to be aware of the proper pool version to work properly, and so admins know what's really going on with their pools. The bottom line here is that if zdb is going to be part of zfs, it needs to display what is currently on disk, including the label config. If I were an admin thinking about trusting hundreds of GBs of data to zfs, I would want the debugger to show me what's really on the disks.
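For anyone wanting to see the discrepancy being described, comparing what the on-disk vdev labels say against what the pool itself reports goes roughly like this (the device name is made up for illustration):

```shell
# Version according to the on-disk vdev labels (what zdb reads):
zdb -l /dev/rdsk/c0t0d0s0 | grep version

# Version according to the running pool:
zpool get version rpool
```

If the bug is present, the first command keeps showing the pre-upgrade version across reboots while the second shows the new one.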
Additionally, even though zpool and zfs get version display the true and updated versions, I'm not convinced that the problem is zdb, as the label config is almost certainly set by the zpool and/or zfs commands. Somewhere, something that is supposed to happen when initiating a zpool upgrade is not happening, but since I know virtually nothing of the internals of zfs, I do not know where. Sincerely, -Chris
Re: [zfs-discuss] [osol-discuss] [illumos-Developer] zpool upgrade and zfs upgrade behavior on b145
Additionally, even though zpool and zfs get version display the true and updated versions, I'm not convinced that the problem is zdb, as the label config is almost certainly set by the zpool and/or zfs commands. Somewhere, something is not happening that is supposed to when initiating a zpool upgrade, but since I know virtually nothing of the internals of zfs, I do [...]

The problem is likely in the boot block or in grub. The development version did not update the boot block; newer versions of beadm do fix boot blocks. For now, I'd recommend you upgrade the boot block on all halves of a bootable mirror before you upgrade the zpool version or the zfs version. export/import won't help. Casper
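A sketch of what updating the boot blocks by hand looks like (the disk names are made up; run it against each side of the root mirror):

```shell
# x86: reinstall the GRUB stage1/stage2 boot blocks on each root-mirror disk
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0

# SPARC equivalent:
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0
```

Do this before the zpool/zfs upgrade, so the boot blocks can always read the pool they sit in front of.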
Re: [zfs-discuss] zfs proerty aclmode gone in 147?
On 9/28/2010 2:13 PM, Nicolas Williams wrote: The version of samba bundled with Solaris 10 seems to insist on chmod'ing stuff. I've tried all of the various options that should disable mapping to mode bits, yet still, randomly, when people copy files in over CIFS, ACLs get destroyed by chmod interaction and access control is broken. I finally ended up having to preload a shared object that overrides chmod and turns it into a null-op. Oh my! After another re-read of man zfs in onnv135 and the spec for aclmode there, it seems they've really removed the only useful setting for aclmode. To circumvent this I'll probably also have to wrap all chmod(2) calls (in code though, not by preloading like you had to) in my app and turn them into null-ops when performed on a ZFS volume (which my app knows), alongside wrapping all filesystem object creation actions in umask(2) calls in order to get the desired mode. Bleah! -- This message posted from opensolaris.org
Re: [zfs-discuss] [osol-discuss] [illumos-Developer] zpool upgrade and zfs upgrade behavior on b145
Well, strangely enough, I just logged into an OS b145 machine. Its rpool is not mirrored, just a single disk. I know that zdb reported zpool version 22 after at least the first 3 reboots after the rpool upgrade, so I stopped checking. zdb now reports version 27. This machine has probably been rebooted about five or six times since the pool version upgrade. One should not have to reboot six times! More mystery to this pool upgrade behavior! -Chris
Re: [zfs-discuss] My filesystem turned from a directory into a special character device
Interesting thread. So how would you go about fixing this? I suspect you have to track down the vnode, the znode_t, and eventually modify one of the kernel buffers for the znode_phys_t. If you're left with the decision to completely rebuild, then repairing this might be the only choice some people have. Dave On 09/27/10 11:56, Victor Latushkin wrote: On Sep 27, 2010, at 8:30 PM, Scott Meilicke wrote: I am running Nexenta CE 3.0.3. I have a file system that at some point in the last week went from a directory per 'ls -l' to a special character device. This results in not being able to get into the file system. Here is my file system, scott2, along with a new file system I just created, as seen by ls -l:

drwxr-xr-x 4 root root    4 Sep 27 09:14 scott
crwxr-xr-x 9 root root 0, 0 Sep 20 11:51 scott2

Notice the 'c' vs. 'd' at the beginning of the permissions list. I had been fiddling with permissions last week, then had problems with a kernel panic. Are you still running with aok/zfs_recover being set? Have you seen this issue before the panic? Perhaps this is related? May be. Any ideas how to get access to my file system? This can be fixed, but it is a bit more complicated and error prone than setting a couple of variables. Regards, Victor
Re: [zfs-discuss] zfs proerty aclmode gone in 147?
On Wed, Sep 29, 2010 at 03:44:57AM -0700, Ralph Böhme wrote: On 9/28/2010 2:13 PM, Nicolas Williams wrote: The version of samba bundled with Solaris 10 seems to insist on chmod'ing stuff. I've tried all of the various Just in case it's not clear, I did not write the quoted text. (One can tell from the level of quotation that an attribution is missing and that none of my text was quoted.) Nico --
[zfs-discuss] rpool spare
Using ZFS v22, is it possible to add a hot spare to rpool? Thanks
Re: [zfs-discuss] rpool spare
Hi Tony, The current behavior is that you can add a spare to a root pool. If the spare kicks in automatically, you would need to apply the boot blocks manually before you could boot from the spared-in disk. A good alternative is to create a two-way or three-way mirrored root pool. We're tracking the root pool boot issues. If a bug isn't filed for this issue, I will file it. Thanks, Cindy On 09/29/10 08:31, Tony MacDoodle wrote: Using ZFS v22, is it possible to add a hot spare to rpool? Thanks
[zfs-discuss] Is there any way to stop a resilver?
Is there any way to stop a resilver? We gotta stop this thing - at minimum, completion time is 300,000 hours, and maximum is in the millions. Raidz2 array, so it has the redundancy, we just need to get data off.
Re: [zfs-discuss] Is there any way to stop a resilver?
Has it been running long? Initially the numbers are way off. After a while it settles down into something reasonable. How many disks, and what size, are in your raidz2? -Scott On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote: Is there any way to stop a resilver? We gotta stop this thing - at minimum, completion time is 300,000 hours, and maximum is in the millions. Raidz2 array, so it has the redundancy, we just need to get data off. We value your opinion! How may we serve you better? Please click the survey link to tell us how we are doing: http://www.craneae.com/ContactUs/VoiceofCustomer.aspx Your feedback is of the utmost importance to us. Thank you for your time. Crane Aerospace Electronics Confidentiality Statement: The information contained in this email message may be privileged and is confidential information intended only for the use of the recipient, or any employee or agent responsible to deliver it to the intended recipient. Any unauthorized use, distribution or copying of this information is strictly prohibited and may be unlawful. If you have received this communication in error, please notify the sender immediately and destroy the original message and all attachments from your electronic files.
Re: [zfs-discuss] Is there any way to stop a resilver?
It's always running less than an hour. It usually starts at around a 300,000h estimate (at 1m in), goes up to an estimate in the millions (about 30 mins in), and restarts. Never gets past 0.00% completion, and 0K resilvered, on any LUN. 64 LUNs, 32x5.44T, 32x10.88T, in 8 vdevs. On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke scott.meili...@craneaerospace.com wrote: Has it been running long? Initially the numbers are *way* off. After a while it settles down into something reasonable. How many disks, and what size, are in your raidz2? -Scott On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote: Is there any way to stop a resilver? We gotta stop this thing - at minimum, completion time is 300,000 hours, and maximum is in the millions. Raidz2 array, so it has the redundancy, we just need to get data off.
Re: [zfs-discuss] Is there any way to stop a resilver?
What version of OS? Are snapshots running (turn them off)? So are there eight disks? On 9/29/10 8:46 AM, LIC mesh licm...@gmail.com wrote: It's always running less than an hour. It usually starts at around a 300,000h estimate (at 1m in), goes up to an estimate in the millions (about 30 mins in) and restarts. Never gets past 0.00% completion, and 0K resilvered, on any LUN. 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs. On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke scott.meili...@craneaerospace.com wrote: Has it been running long? Initially the numbers are way off. After a while it settles down into something reasonable. How many disks, and what size, are in your raidz2? -Scott On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote: Is there any way to stop a resilver? We gotta stop this thing - at minimum, completion time is 300,000 hours, and maximum is in the millions. Raidz2 array, so it has the redundancy, we just need to get data off.
Re: [zfs-discuss] Is there any way to stop a resilver?
What caused the resilvering to kick off in the first place? Lin On Sep 29, 2010, at 8:46 AM, LIC mesh wrote: It's always running less than an hour. It usually starts at around a 300,000h estimate (at 1m in), goes up to an estimate in the millions (about 30 mins in) and restarts. Never gets past 0.00% completion, and 0K resilvered, on any LUN. 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs. On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke scott.meili...@craneaerospace.com wrote: Has it been running long? Initially the numbers are way off. After a while it settles down into something reasonable. How many disks, and what size, are in your raidz2? -Scott On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote: Is there any way to stop a resilver? We gotta stop this thing - at minimum, completion time is 300,000 hours, and maximum is in the millions. Raidz2 array, so it has the redundancy, we just need to get data off.
[zfs-discuss] Fwd: Is there any way to stop a resilver?
This is an iSCSI/COMSTAR array. The head was running 2009.06 stable with version 14 ZFS, but we updated that to build 134 (kept the old OS drives) - we did not, however, update the zpool - it's still version 14. The targets are all running 2009.06 stable, exporting 4 raidz1 LUNs each of 6 drives - 8 shelves have 1TB drives, the other 8 have 2TB drives. The head sees the filesystem as comprised of 8 vdevs of 8 iSCSI LUNs each, with SSD ZIL and SSD L2ARC. On Wed, Sep 29, 2010 at 11:49 AM, Scott Meilicke scott.meili...@craneaerospace.com wrote: What version of OS? Are snapshots running (turn them off)? So are there eight disks? On 9/29/10 8:46 AM, LIC mesh licm...@gmail.com wrote: It's always running less than an hour. It usually starts at around a 300,000h estimate (at 1m in), goes up to an estimate in the millions (about 30 mins in) and restarts. Never gets past 0.00% completion, and 0K resilvered, on any LUN. 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs. On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke scott.meili...@craneaerospace.com wrote: Has it been running long? Initially the numbers are *way* off. After a while it settles down into something reasonable. How many disks, and what size, are in your raidz2? -Scott On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote: Is there any way to stop a resilver? We gotta stop this thing - at minimum, completion time is 300,000 hours, and maximum is in the millions. Raidz2 array, so it has the redundancy, we just need to get data off.
Re: [zfs-discuss] Is there any way to stop a resilver?
Most likely an iSCSI timeout, but that was before my time here. Since then, there have been various individual drives lost along the way on the shelves, but never a whole LUN, so, theoretically, /except/ for iSCSI timeouts, there has been no great reason to resilver. On Wed, Sep 29, 2010 at 11:51 AM, Lin Ling lin.l...@oracle.com wrote: What caused the resilvering to kick off in the first place? Lin On Sep 29, 2010, at 8:46 AM, LIC mesh wrote: It's always running less than an hour. It usually starts at around a 300,000h estimate (at 1m in), goes up to an estimate in the millions (about 30 mins in) and restarts. Never gets past 0.00% completion, and 0K resilvered, on any LUN. 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs. On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke scott.meili...@craneaerospace.com wrote: Has it been running long? Initially the numbers are *way* off. After a while it settles down into something reasonable. How many disks, and what size, are in your raidz2? -Scott On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote: Is there any way to stop a resilver? We gotta stop this thing - at minimum, completion time is 300,000 hours, and maximum is in the millions. Raidz2 array, so it has the redundancy, we just need to get data off.
Re: [zfs-discuss] Mac OS X clients with ZFS server
Hi all, Thanks to some clues from people on this list, I have finally resolved this issue! To summarise, I was having problems with timeouts when applications on my MacBook Pro tried to create new files on an NFS file system that was mounted from my server running snv_130 (writes to existing files were fine). The solution was to assign a static IP address to the MBP and ensure that proper forward and reverse DNS entries were present (in all of my tests to date, the MBP was using DHCP to get its IP address, and I haven't bothered populating my DNS with DHCP-related entries). Once the MBP was using a static IP address as described above, creating new files on NFS-mounted file systems works as flawlessly as one would expect! Thanks again to everyone who chimed in with ideas. Now, if I could only stop Aqua from doing the brain-dead "click mouse for input focus" and "auto-raise the window that has input focus" things, I'd be really happy... -- Rich Teer, Publisher Vinylphile Magazine www.vinylphilemag.com
Re: [zfs-discuss] rpool spare
Tony, A brief follow-up: the issue of applying the boot blocks automatically to a spare for a root pool is covered by existing CR 6668666. See this URL for more details: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6668666 Thanks, Cindy On 09/29/10 08:38, Cindy Swearingen wrote: Hi Tony, The current behavior is that you can add a spare to a root pool. If the spare kicks in automatically, you would need to apply the boot blocks manually before you could boot from the spared-in disk. A good alternative is to create a two-way or three-way mirrored root pool. We're tracking the root pool boot issues. If a bug isn't filed for this issue, I will file it. Thanks, Cindy On 09/29/10 08:31, Tony MacDoodle wrote: Using ZFS v22, is it possible to add a hot spare to rpool? Thanks
Re: [zfs-discuss] Fwd: Is there any way to stop a resilver?
(I left the list off last time, sorry.) No, the resilver should only be happening if there was a spare available. Is the whole thing scrubbing? It looks like it. Can you stop it with a zpool scrub -s pool? So... word of warning, I am no expert at this stuff. Think about what I am suggesting before you do it :). Although stopping a scrub is pretty innocuous. -Scott On 9/29/10 9:22 AM, LIC mesh licm...@gmail.com wrote: You almost have it - each iSCSI target is made up of 4 of the raidz vdevs - 4 * 6 = 24 disks. 16 targets total. We have one LUN with a status of UNAVAIL, but didn't know if removing it outright would help - it's actually available and well as far as the target is concerned, so we thought it went UNAVAIL as a result of iSCSI timeouts - we've since fixed the switches' buffers, etc. See: http://pastebin.com/pan9DBBS On Wed, Sep 29, 2010 at 12:17 PM, Scott Meilicke scott.meili...@craneaerospace.com wrote: OK, let me see if I have this right:

8 shelves, 1T disks, 24 disks per shelf = 192 disks
8 shelves, 2T disks, 24 disks per shelf = 192 disks

Each raidz is six disks. 64 raidz vdevs. Each iSCSI target is made up of 8 of these raidz vdevs (8 x 6 disks = 48 disks). Then the head takes these eight targets, and makes a raidz2. So the raidz2 depends upon all 384 disks. So when a failure occurs, the resilver is accessing all 384 disks. If I have this right, which I seriously doubt :), then that will either take an enormous amount of time to complete, or never finish. It looks like never. Recovery: from the head, can you see which vdev has failed? If so, can you remove it to stop the resilver? On 9/29/10 8:57 AM, LIC mesh licm...@gmail.com wrote: This is an iSCSI/COMSTAR array. The head was running 2009.06 stable with version 14 ZFS, but we updated that to build 134 (kept the old OS drives) - did not, however, update the zpool - it's still version 14.
The targets are all running 2009.06 stable, exporting 4 raidz1 LUNs each of 6 drives - 8 shelves have 1TB drives, the other 8 have 2TB drives. The head sees the filesystem as comprised of 8 vdevs of 8 iSCSI LUNs each, with SSD ZIL and SSD L2ARC. On Wed, Sep 29, 2010 at 11:49 AM, Scott Meilicke scott.meili...@craneaerospace.com wrote: What version of OS? Are snapshots running (turn them off)? So are there eight disks? On 9/29/10 8:46 AM, LIC mesh licm...@gmail.com wrote: It's always running less than an hour. It usually starts at around a 300,000h estimate (at 1m in), goes up to an estimate in the millions (about 30 mins in) and restarts. Never gets past 0.00% completion, and 0K resilvered, on any LUN. 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs. On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke scott.meili...@craneaerospace.com wrote: Has it been running long? Initially the numbers are way off. After a while it settles down into something reasonable. How many disks, and what size, are in your raidz2? -Scott On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote: Is there any way to stop a resilver? We gotta stop this thing - at minimum, completion time is 300,000 hours, and maximum is in the millions. Raidz2 array, so it has the redundancy, we just need to get data off.
-- Scott Meilicke | Enterprise Systems Administrator | Crane Aerospace Electronics | +1 425-743-8153 | M: +1 206-406-2670
Re: [zfs-discuss] Resilver endlessly restarting at completion
The endless resilver problem still persists on OI b147. It restarts when it should complete. I see no other solution than to copy the data to safety and recreate the array. Any hints would be appreciated, as that takes days unless I can stop or pause the resilvering. On Mon, Sep 27, 2010 at 1:13 PM, Tuomas Leikola tuomas.leik...@gmail.com wrote: Hi! My home server had some disk outages due to flaky cabling and whatnot, and started resilvering to a spare disk. During this, another disk or two dropped, and were reinserted into the array. So no devices were actually lost, they just were intermittently away for a while each. The situation is currently as follows:

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 5h33m, 22.47% done, 19h10m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c11t1d0p0              ONLINE       0     0     0
            c11t2d0                ONLINE       0     0     5
            c11t6d0p0              ONLINE       0     0     0
            spare-3                ONLINE       0     0     0
              c11t3d0p0            ONLINE       0     0     0  106M resilvered
              c9d1                 ONLINE       0     0     0  104G resilvered
            c11t4d0p0              ONLINE       0     0     0
            c11t0d0p0              ONLINE       0     0     0
            c11t5d0p0              ONLINE       0     0     0
            c11t7d0p0              ONLINE       0     0     0  93.6G resilvered
          raidz1-2                 ONLINE       0     0     0
            c6t2d0                 ONLINE       0     0     0
            c6t3d0                 ONLINE       0     0     0
            c6t4d0                 ONLINE       0     0     0  2.50K resilvered
            c6t5d0                 ONLINE       0     0     0
            c6t6d0                 ONLINE       0     0     0
            c6t7d0                 ONLINE       0     0     0
            c6t1d0                 ONLINE       0     0     1
        logs
          /dev/zvol/dsk/rpool/log  ONLINE       0     0     0
        cache
          c6t0d0p0                 ONLINE       0     0     0
        spares
          c9d1                     INUSE     currently in use

errors: No known data errors

And this has been going on for a week now, always restarting when it should complete. The questions in my mind atm:

1. How can I determine the cause for each resilver? Is there a log?
2. Why does it resilver the same data over and over, and not just the changed bits?
3. Can I force remove c9d1, as it is no longer needed, but c11t3 can be resilvered instead?

I'm running OpenSolaris 134, but the event originally happened on 111b. I upgraded and tried quiescing snapshots and IO, none of which helped. I've already ordered some new hardware to recreate this entire array as raidz2, among other things, but there's about a week of time when I can run debuggers and traces if instructed to. - Tuomas
Re: [zfs-discuss] Resilver endlessly restarting at completion
Answers below... Tuomas Leikola wrote:

The endless resilver problem still persists on OI b147. Restarts when it should complete. I see no other solution than to copy the data to safety and recreate the array. Any hints would be appreciated as that takes days unless I can stop or pause the resilvering.

[full pool status and message quoted from the original post above; snipped]

1. How can I determine the cause for each resilver? Is there a log?
If you're running OI b147 then you should be able to do the following:

# echo ::zfs_dbgmsg | mdb -k > /var/tmp/dbg.out

Send me the output.

2. Why does it resilver the same data over and over, and not just the changed bits?

If you're having drives fail prior to the initial resilver finishing then it will restart and do all the work over again. Are drives still failing randomly for you?

3. Can I force remove c9d1 as it is no longer needed but c11t3 can be resilvered instead?

You can detach the spare and let the resilver work on only c11t3. Can you send me the output of 'zdb - tank 0'?

Thanks, George
Re: [zfs-discuss] Is there any way to stop a resilver?
Can you post the output of 'zpool status'? Thanks, George

LIC mesh wrote: Most likely an iSCSI timeout, but that was before my time here. Since then, there have been various individual drives lost along the way on the shelves, but never a whole LUN, so, theoretically, /except/ for iSCSI timeouts, there has been no great reason to resilver.

On Wed, Sep 29, 2010 at 11:51 AM, Lin Ling lin.l...@oracle.com wrote: What caused the resilvering to kick off in the first place? Lin

On Sep 29, 2010, at 8:46 AM, LIC mesh wrote: It's always running less than an hour. It usually starts at around a 300,000h estimate (at 1m in), goes up to an estimate in the millions (about 30 mins in) and restarts. Never gets past 0.00% completion, and 0K resilvered on any LUN. 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs.

On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke scott.meili...@craneaerospace.com wrote: Has it been running long? Initially the numbers are *way* off. After a while it settles down into something reasonable. How many disks, and what size, are in your raidz2? -Scott

On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote: Is there any way to stop a resilver? We gotta stop this thing - at minimum, completion time is 300,000 hours, and maximum is in the millions. Raidz2 array, so it has the redundancy, we just need to get data off.
Re: [zfs-discuss] Resilver endlessly restarting at completion
Thanks for taking an interest. Answers below. On Wed, Sep 29, 2010 at 9:01 PM, George Wilson george.r.wil...@oracle.com wrote:

On Mon, Sep 27, 2010 at 1:13 PM, Tuomas Leikola tuomas.leik...@gmail.com wrote: (continuous resilver loop) has been going on for a week now, always restarting when it should complete. The questions in my mind atm:

1. How can I determine the cause for each resilver? Is there a log?

If you're running OI b147 then you should be able to do the following: # echo ::zfs_dbgmsg | mdb -k > /var/tmp/dbg.out Send me the output.

Sending verbose output in a separate email. I'm not very familiar with this but it does show some restarting lines.

2. Why does it resilver the same data over and over, and not just the changed bits?

If you're having drives fail prior to the initial resilver finishing then it will restart and do all the work over again. Are drives still failing randomly for you?

Drives haven't been dropping since the initial incidents. It's run to completion a few times now without (visible) issues with the drives. Then again I think there is some magic to reinsert a device back into the array if there is some intermittent SATA disconnection.

3. Can I force remove c9d1 as it is no longer needed but c11t3 can be resilvered instead?

You can detach the spare and let the resilver work on only c11t3. Can you send me the output of 'zdb - tank 0'?

Detach commands complain there's not enough replicas. Of course I can physically remove the device, at which point a scrub would suffice (the disks must be relatively well up-to-date by now..) Sending zdb output in a separate mail as soon as it completes..
[zfs-discuss] Migrating to an aclmode-less world
Currently I'm still using OpenSolaris b134 and I had used the 'aclmode' property on my file systems. However, the aclmode property has now been dropped: http://arc.opensolaris.org/caselog/PSARC/2010/029/20100126_mark.shellenbaum

I'm wondering what will happen to the ACLs on these files and directories if I upgrade to a newer Solaris version (OpenIndiana b147 perhaps). I'm sharing the file systems using CIFS. I was using very simple ACLs like below for easy inheritance of ACLs, which worked OK for my needs.

# zfs set aclinherit=passthrough tank/home/fred/projects
# zfs set aclmode=passthrough tank/home/fred/projects
# chmod A=\
owner@:rwxpdDaARWcCos:fd-:allow,\
group@:rwxpdDaARWcCos:fd-:allow,\
everyone@:rwxpdDaARWcCos:fd-:deny \
/tank/home/fred/projects
# chown fred:fred /tank/home/fred/projects
# zfs set sharesmb=name=projects tank/home/fred/projects

Cheers, Simon
Re: [zfs-discuss] When Zpool has no space left and no snapshots
On Wed, September 22, 2010 21:25, Aleksandr Levchuk wrote: I ran out of space, and consequently could not rm or truncate files. (It makes sense because it's copy-on-write and any transaction needs to be written to disk. It worked out really well - all I had to do was destroy some snapshots.) If there are no snapshots to destroy, how do you prepare for a situation when a ZFS pool loses its last free byte?

Add some more space somewhere around 90% full, or earlier :-). If you do get stuck, you can add another vdev when full, too. Just remember that you're stuck with whatever you add forever, since there's no way to remove a vdev from a pool.

-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
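[Editor's note: one way to prepare in advance, sketched below with made-up dataset names and sizes — a commonly suggested trick, not a quote from this thread: reserve a slice of the pool up front, then release the reservation when you need emergency headroom for deletes.]

```shell
# Hypothetical names/sizes: park ~5% of the pool as emergency headroom.
zfs create -o reservation=10G tank/reserve   # space no other dataset can consume
# ... later, when the pool fills up and rm/truncate start failing:
zfs set reservation=none tank/reserve        # release the headroom so deletes can proceed
```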
[zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side (was: zfs property aclmode gone in 147?)
rb == Ralph Böhme ra...@rsrc.de writes:

rb The Darwin kernel evaluates permissions in a first
rb match paradigm, evaluating the ACL before the mode

well...I think it would be better to AND them together like AFS did. In that case it doesn't make any difference in which order you do it because AND is commutative. The Darwin method you describe means one might remove permissions with chmod but still have access granted under first-match by the ACL. I just tested, and Darwin does indeed work this way. :(

One way to get from NFSv4 to what I want is that you might add EVEN MORE complexity and have ``tagged ACL groups'':

* all the existing ACL tools and NFS/SMB clients targeting the #(null) tag,
* traditional 'chmod' unix permissions targeting the #(unix) tag.
* The evaluation within a tag-group is first-match like now,
* The result of each tag-group is ANDed together for the final evaluation.

When accommodating Darwin ACLs or Windows ACLs or Linux NFSv4 ACLs or translated POSIX ACLs, the result of the imperfect translation can be shoved into a tag-group if it's unclean. The way I would implement the userspace, tools would display all tag groups if given some new argument, but they would always be incapable of editing any tag group except #(null).
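[Editor's note: the AND-combination argument is easy to convince yourself of with a toy calculation. The masks below are made up (read=4, write=2, execute=1); this is arithmetic illustrating commutativity, not real ACL evaluation code.]

```shell
# Toy masks: the mode grants rw- (6), the first-matching ACE grants r-- (4).
mode_mask=6
acl_mask=4
effective=$(( mode_mask & acl_mask ))
echo $effective   # prints 4: only read survives. AND is commutative, so
                  # mode-then-ACL and ACL-then-mode give the same answer.
```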
Another chroot-like tool would swap a given tag-group for #(null) for all child processes:

car...@awabagal:~/bar$ ls -v\# foo
-rw-r--r-- 1 carton carton 0 Sep 29 18:31 foo
   0#(unix):owner@:execute:deny
   1#(unix):owner@:read_data/write_data/append_data/write_xattr/write_attributes/write_acl/write_owner:allow
   2#(unix):group@:write_data/append_data/execute:deny
   3#(unix):group@:read_data:allow
   4#(unix):everyone@:write_data/append_data/write_xattr/execute/write_attributes/write_acl/write_owner:deny
   5#(unix):everyone@:read_data/read_xattr/read_attributes/read_acl/synchronize:allow
car...@awabagal:~/bar$ chmod A+owner@:write_data:deny foo
car...@awabagal:~/bar$ ls -v\# foo
-rw-r--r-- 1 carton carton 0 Sep 29 18:31 foo
   0#(null):owner@:write_data:deny
   # 0#(unix):owner@:execute:deny
   1#(unix):owner@:read_data/write_data/append_data/write_xattr/write_attributes/write_acl/write_owner:allow
   2#(unix):group@:write_data/append_data/execute:deny
   3#(unix):group@:read_data:allow
   4#(unix):everyone@:write_data/append_data/write_xattr/execute/write_attributes/write_acl/write_owner:deny
   5#(unix):everyone@:read_data/read_xattr/read_attributes/read_acl/synchronize:allow
car...@awabagal:~/bar$ echo lala > foo
-bash: foo: Permission denied
car...@awabagal:~/bar$ chpacl baz ls -v\# foo
-rw-r--r-- 1 carton carton 0 Sep 29 18:31 foo
   # 0#root:owner@:write_data:deny      -- #root is what's mapped to #(null) at boot
   # 0#(unix):owner@:execute:deny
   1#(unix):owner@:read_data/write_data/append_data/write_xattr/write_attributes/write_acl/write_owner:allow
   2#(unix):group@:write_data/append_data/execute:deny
   3#(unix):group@:read_data:allow
   4#(unix):everyone@:write_data/append_data/write_xattr/execute/write_attributes/write_acl/write_owner:deny
   5#(unix):everyone@:read_data/read_xattr/read_attributes/read_acl/synchronize:allow
car...@awabagal:~/bar$ chpacl '(null)' true
chpacl: '(null)' is reserved.
car...@awabagal:~/bar$ chpacl baz chmod A+owner@:read_data:deny foo
car...@awabagal:~/bar$ chpacl baz ls -v\# foo
-rw-r--r-- 1 carton carton 0 Sep 29 18:31 foo
   0#(null):owner@:read_data:deny
   # 0#root:owner@:write_data:deny
   # 0#(unix):owner@:execute:deny
   1#(unix):owner@:read_data/write_data/append_data/write_xattr/write_attributes/write_acl/write_owner:allow
   2#(unix):group@:write_data/append_data/execute:deny
   3#(unix):group@:read_data:allow
   4#(unix):everyone@:write_data/append_data/write_xattr/execute/write_attributes/write_acl/write_owner:deny
   5#(unix):everyone@:read_data/read_xattr/read_attributes/read_acl/synchronize:allow
car...@awabagal:~/bar$ cat foo
-bash: foo: Permission denied
car...@awabagal:~/bar$ chpacl baz cat foo      -- current tagspace is irrelevant to ACL evaluation
-bash: foo: Permission denied
car...@awabagal:~/bar$ ls -v\# foo
-rw-r--r-- 1 carton carton 0 Sep 29 18:31 foo
   0#(null):owner@:write_data:deny
   # 0#baz:owner@:read_data:deny
   # 0#(unix):owner@:execute:deny
   1#(unix):owner@:read_data/write_data/append_data/write_xattr/write_attributes/write_acl/write_owner:allow
   2#(unix):group@:write_data/append_data/execute:deny
   3#(unix):group@:read_data:allow
   4#(unix):everyone@:write_data/append_data/write_xattr/execute/write_attributes/write_acl/write_owner:deny
   5#(unix):everyone@:read_data/read_xattr/read_attributes/read_acl/synchronize:allow
Re: [zfs-discuss] When Zpool has no space left and no snapshots
You can truncate a file:

# echo > bigfile

That will free up space without the 'rm'.

-Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of David Dyer-Bennet Sent: Wednesday, September 29, 2010 12:59 PM To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] When Zpool has no space left and no snapshots

[David's message quoted in full; snipped - see above]
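[Editor's note: the truncation trick relies on shell redirection over an existing file. A minimal, self-contained sketch with a throwaway file name (made up for illustration):]

```shell
# Create a dummy "bigfile", then truncate it in place -- no rm needed.
dd if=/dev/zero of=bigfile bs=1024 count=16 2>/dev/null
echo > bigfile            # the trick from the post; leaves a one-byte newline
: > bigfile               # POSIX idiom that leaves the file truly empty
wc -c < bigfile           # prints 0
rm -f bigfile             # cleanup of the demo file
```

Note David's follow-up caveat below: on ZFS the new (empty) version is written copy-on-write before the old blocks are freed, and blocks held by snapshots are not freed at all.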
Re: [zfs-discuss] When Zpool has no space left and no snapshots
On Wed, September 29, 2010 15:17, Matt Cowger wrote: You can truncate a file: echo > bigfile That will free up space without the 'rm'

Copy-on-write; the new version gets written to the disk before the old version is released, it doesn't just overwrite. AND, if it's in any snapshots, the old version doesn't get released.

-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side (was: zfs property aclmode gone in 147?)
Keep in mind that Windows lacks a mode_t. We need to interop with Windows. If a Windows user cannot completely change file perms because there's a mode_t completely out of their reach... they'll be frustrated. Thus an ACL-and-mode model where both are applied doesn't work. It'd be nice, but it won't work. The mode has to be entirely encoded by the ACL. But we can't resort to interesting encoding tricks as Windows users won't understand them. Nico --
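[Editor's note: a toy sketch of what "the mode has to be entirely encoded by the ACL" means. The digit-to-permission table below is deliberately simplified and hypothetical; the real Solaris translation also emits deny ACEs and many more permission bits.]

```shell
# Map each octal digit of a three-digit mode (e.g. 644) to an allow ACE.
mode=644
to_perms() {
  case $1 in
    7) echo read_data/write_data/execute ;;
    6) echo read_data/write_data ;;
    4) echo read_data ;;
    *) echo none ;;
  esac
}
echo "owner@:$(to_perms ${mode%??}):allow"                    # owner@:read_data/write_data:allow
echo "group@:$(to_perms $(d=${mode#?}; echo ${d%?})):allow"   # group@:read_data:allow
echo "everyone@:$(to_perms ${mode#??}):allow"                 # everyone@:read_data:allow
```

With an encoding like this, a Windows client sees (and can rewrite) everything that determines access, which is the interop property Nico is after.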
Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side (was: zfs property aclmode gone in 147?)
Keep in mind that Windows lacks a mode_t. We need to interop with Windows.

Oh my, I see. Another itch to scratch. Now at least Windows users are happy while me and maybe others are not. -r
[zfs-discuss] Resilver making the system unresponsive
This must be resilver day :)

I just had a drive failure. The hot spare kicked in, and access to the pool over NFS was effectively zero for about 45 minutes. Currently the pool is still resilvering, but for some reason I can access the file system now.

Resilver speed has been beaten to death I know, but is there a way to avoid this? For example, is more enterprisey hardware less susceptible to resilvers? This box is used for development VMs, but there is no way I would consider this for production with this kind of performance hit during a resilver.

My hardware:
Dell 2950
16G ram
16 disk SAS chassis
LSI 3801 (I think) SAS card (1068e chip)
Intel x25-e SLOG off of the internal PERC 5/i RAID controller
Seagate 750G disks (7200.11)

I am running Nexenta CE 3.0.3 (SunOS rawhide 5.11 NexentaOS_134f i86pc i386 i86pc Solaris)

  pool: data01
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Sep 29 14:03:52 2010
        1.12T scanned out of 5.00T at 311M/s, 3h37m to go
        82.0G resilvered, 22.42% done
config:

        NAME           STATE     READ WRITE CKSUM
        data01         DEGRADED     0     0     0
          raidz2-0     ONLINE       0     0     0
            c1t8d0     ONLINE       0     0     0
            c1t9d0     ONLINE       0     0     0
            c1t10d0    ONLINE       0     0     0
            c1t11d0    ONLINE       0     0     0
            c1t12d0    ONLINE       0     0     0
            c1t13d0    ONLINE       0     0     0
            c1t14d0    ONLINE       0     0     0
          raidz2-1     DEGRADED     0     0     0
            c1t22d0    ONLINE       0     0     0
            c1t15d0    ONLINE       0     0     0
            c1t16d0    ONLINE       0     0     0
            c1t17d0    ONLINE       0     0     0
            c1t23d0    ONLINE       0     0     0
            spare-5    REMOVED      0     0     0
              c1t20d0  REMOVED      0     0     0
              c8t18d0  ONLINE       0     0     0  (resilvering)
            c1t21d0    ONLINE       0     0     0
        logs
          c0t1d0       ONLINE       0     0     0
        spares
          c8t18d0      INUSE     currently in use

errors: No known data errors

Thanks for any insights.
-Scott
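[Editor's note: a back-of-the-envelope sanity check of zpool's ETA, using the figures from the status above and assuming binary units throughout.]

```shell
# 5.00T total, 1.12T already scanned, scanning at 311M/s:
remaining_mb=$(( (5120 - 1147) * 1024 ))   # ~3.88T left, in MiB (5120G - 1147G)
secs=$(( remaining_mb / 311 ))
echo "$(( secs / 3600 ))h$(( secs % 3600 / 60 ))m"   # prints 3h38m, matching zpool's "3h37m to go"
```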
Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side (was: zfs property aclmode gone in 147?)
On Wed, Sep 29, 2010 at 03:09:22PM -0700, Ralph Böhme wrote: Keep in mind that Windows lacks a mode_t. We need to interop with Windows. Oh my, I see. Another itch to scratch. Now at least Windows users are happy while me and maybe others are not.

Yes. Pardon me for forgetting to mention this earlier. There's so many wrinkles here... But this is one of the biggers; I should not have forgotten it. Nico --
Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side (was: zfs property aclmode gone in 147?)
On Wed, Sep 29, 2010 at 05:21:51PM -0500, Nicolas Williams wrote: Yes. Pardon me for forgetting to mention this earlier. There's so many wrinkles here... But this is one of the biggers; I should not have

s/biggers/biggest/

forgotten it. Nico --
Re: [zfs-discuss] Resilver making the system unresponsive
I should add I have 477 snapshots across all file systems. Most of them are hourly snaps (225 of them anyway).

On Sep 29, 2010, at 3:16 PM, Scott Meilicke wrote:
[original message quoted in full; snipped - see above]

Scott Meilicke
Re: [zfs-discuss] Resilver making the system unresponsive
Yeah, I'm having a combination of this and the resilver constantly restarting issue. And nothing to free up space.

It was recommended to me to replace any expanders I had between the HBA and the drives with extra HBAs, but my array doesn't have expanders. If yours does, you may want to try that. Otherwise, wait it out :(

On Wed, Sep 29, 2010 at 6:37 PM, Scott Meilicke sc...@kmclan.net wrote:
[Scott's message, and the original it quoted, snipped - see above]