Re: [zfs-discuss] Legality and the future of zfs...
On Thu, 8 Jul 2010, Edward Ned Harvey wrote:
> Yep. Provided it supported ZFS, a Mac Mini makes for a compelling SOHO server.

Warning: a Mac Mini does not have eSATA ports for external storage. It's dangerous to use USB for external storage since many (most? all?) USB-SATA bridge chips silently discard the cache-flush (SYNCHRONIZE CACHE) command instead of passing it to the drive - very bad for ZFS. Dell's Zino HD is a better choice - it has two eSATA ports (port-multiplier capable).
Re: [zfs-discuss] Legality and the future of zfs...
On 9 Jul 2010, at 08:55, James Van Artsdalen james-opensola...@jrv.org wrote:
> On Thu, 8 Jul 2010, Edward Ned Harvey wrote:
>> Yep. Provided it supported ZFS, a Mac Mini makes for a compelling SOHO server.
> Warning: a Mac Mini does not have eSATA ports for external storage. It's dangerous to use USB for external storage since many (most? all?) USB-SATA chips discard the cache flush instead of passing it to the drive - very bad for ZFS.

All Mac Minis have FireWire - the new ones have FW800. In any case, the server-class Mini has two internal hard drives, which makes it amenable to mirroring.

The Mac ZFS port limps on in any case - though I've not managed to spend much time on it recently, I have been making progress this week. The Google Code project is at http://code.google.com/p/maczfs/ and my GitHub is at http://github.com/alblue/ for those who are interested.

Alex
Re: [zfs-discuss] Consequences of resilvering failure
On Tue, Jul 6, 2010 at 10:05 AM, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:
> The pool will remain available, but you will have data corruption. The simple way to avoid this is to use a raidz2, where the chances of data corruption are far lower.

It's also possible to replace a drive while the failed / failing drive is still in the system. If some of the disk still has usable data, it can be used in the resilver. Of course, using raidz2 or raidz3 is probably easier.

-B

--
Brandon High : bh...@freaks.com
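For illustration, the in-place replacement looks like this (pool and device names are hypothetical); with the failing drive still attached, the resilver can read from it wherever it is still good:

    # zpool replace tank c1t3d0 c2t3d0
    # zpool status tank        (watch the resilver progress)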
Re: [zfs-discuss] dedup accounting anomaly / dedup experiments
On Thu, Jul 1, 2010 at 1:33 AM, Lutz Schumann presa...@storageconcepts.de wrote:
> Anyone know why the dedup factor is wrong? Any insights on what has actually been written (compressed metadata, deduped metadata, etc.) would be greatly appreciated.

Metadata and ditto blocks. Even with dedup, zfs will write multiple copies of blocks after reaching a certain threshold.

-B

--
Brandon High : bh...@freaks.com
Re: [zfs-discuss] SATA 6G controller for OSOL
On 07/ 9/10 09:58 AM, Brandon High wrote:
> On Fri, Jul 9, 2010 at 12:42 AM, James Van Artsdalen james-opensola...@jrv.org wrote:
>> If these 6 Gb/s controllers are based on the Marvell part I would test them thoroughly before deployment - those chips have been problematic.
> The Marvell 88SE9123 was the troublemaker, and it's not available anymore. The 88SE9120, 88SE9125 and 88SE9128 are the fixed versions.

Could you be more specific about the problems with the 88SE9123, especially with SATA? I am in the process of setting up a system with an AD2SA6GPX1 HBA based on this chipset (at least according to the product pages [*]).

v.

[*] http://www.addonics.com/products/host_controller/ad2sa6gpx1.asp
Re: [zfs-discuss] ZFS recovery tools
So it seems I used the right command. The output is very cryptic to me, and I cannot extract any information from it that helps. I uploaded the output to a file hoster: http://ifile.it/vzwn50s/Output.txt

I hope you can tell me what it means.

Regards,
ron
[zfs-discuss] spreading data after adding devices to pool
I use ZFS (on FreeBSD) for my home NAS. I started on 4 drives, then added 4, and have now added another 4, bringing the total up to 12 drives in 3 raidz vdevs in 1 pool. I was just wondering if there is any advantage or disadvantage to spreading the data across the 3 raidz vdevs, as two are currently full and one is completely empty. If it would improve performance to spread the data, is there any easy way to do it?
Re: [zfs-discuss] Should i enable Write-Cache ?
On Jul 8, 2010, at 4:37 PM, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Philippe Schwarz
>> 3Ware cards ... Any drawback (except that without a BBU, I've got a problem in case of power loss) in enabling the WC with ZFS?
> If you don't have a BBU, and you care about your data, don't enable WriteBack. If you enable writeback without a BBU, you might as well just disable ZIL instead. It's more effective, and just as dangerous.

Actually, disabling the ZIL is probably faster *and* safer than running WriteBack without a BBU. The ZIL and data loss are orthogonal: as long as the device correctly respects the (nonvolatile) cache flush requests, the data can be safe.

-- richard

--
Richard Elling rich...@nexenta.com +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
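For reference, the system-wide knob being discussed was, on builds of that era, an /etc/system tunable (a sketch; later builds replaced it with a per-dataset 'sync' property):

    * in /etc/system; takes effect at next boot and disables the ZIL for ALL pools
    set zfs:zil_disable = 1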
Re: [zfs-discuss] Legality and the future of zfs...
> From: Rich Teer [mailto:rich.t...@rite-group.com]
> Sent: Thursday, July 08, 2010 7:43 PM
> Yep. Provided it supported ZFS, a Mac Mini makes for a compelling SOHO server. The lack of ZFS is the main thing holding me back here...

I don't really want to go into much detail here (it's a zfs list, not an anti-apple list), but in my personal experience, OSX Server is simply not a stable OS. Even with all the patches installed and a light workload, my dumb Leopard server keeps doing really dumb things like failing to start the dhcp service, spontaneously losing its password database, or failing to release a Time Machine image when a client goes offline - thus necessitating a server reboot before the client can use Time Machine again. I generally reboot my Xserve once per week, whereas my linux, windows, and solaris servers only need reboots for hardware issues or certain system updates - roughly quarterly. After several iterations, we finally disabled all OSX services except Time Machine.

If you happen to like Mac Minis for their *hardware* cough instead of their software (OSX Server), you could always install osol, or freebsd, or linux or something on that machine instead. I like Mac laptops, but their server and enterprise offerings are beyond pathetic.
Re: [zfs-discuss] SATA 6G controller for OSOL
My advice would be to NOT use the AD2SA6GPX1 HBA for building an opensolaris storage box. Although the AHCI driver will load, the drives are not visible to the OS, and device configuration fails according to 'cfgadm -al'. I have a couple of them that are now residing in a linux box, as I was unable to get them to work with OSOL b134 or Nexenta 3.0rc1. I did post in the forums over there, but received no response. I also submitted a bug, but have not had any response at all. I haven't gone upstream yet as I have been distracted by other things. I have not tried them with Solaris 10.

Matt

On Fri, Jul 9, 2010 at 3:40 AM, Vladimir Kotal vladimir.ko...@sun.com wrote:
> Could you be more specific about the problems with 88SE9123, especially with SATA? I am in the process of setting up a system with AD2SA6GPX1 HBA based on this chipset (at least according to the product pages [*]).
> [*] http://www.addonics.com/products/host_controller/ad2sa6gpx1.asp

--
Matt Urbanowski
Graduate Student, Dept. of Cell Biology
5-51 Medical Sciences Building
University of Alberta, Edmonton, Alberta, Canada T6G 2H7
[zfs-discuss] Hash functions (was Re: Hashing files rapidly on ZFS)
On Thu, Jul 08, 2010 at 08:42:33PM -0700, Garrett D'Amore wrote:
> On Fri, 2010-07-09 at 10:23 +1000, Peter Jeremy wrote:
>> In theory, collisions happen. In practice, given a cryptographic hash, if you can find two different blocks or files that produce the same output, please publicise it widely as you have broken that hash function.
> Not necessarily. While you *should* publicize it widely, given all the possible text that we have, and all the other variants, a collision is theoretically possible, if staggeringly unlikely - like winning a lottery where everyone else has a million tickets but you only have one.

Such an occurrence -- if isolated -- would not, IMO, constitute a 'breaking' of the hash function. A hash function is broken when we know how to create colliding inputs. A random collision does not a break make, though it might, perhaps, help figure out how to break the hash function later.

Nico
--
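To put a number on "staggeringly unlikely": by the birthday bound, the probability of any random collision among n distinct 256-bit hashes is roughly

    P ~= n^2 / 2^257

so even a pool holding n = 2^35 unique 128K blocks (about 4 PiB of data) gives P ~= 2^70 / 2^257 = 2^-187, far below the odds of undetected hardware corruption. (A back-of-the-envelope estimate, assuming sha256 behaves as a random function.)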
Re: [zfs-discuss] spreading data after adding devices to pool
You could move the data elsewhere using zfs send and recv, destroy the original datasets, and then recreate them. This would stripe the data across the vdevs. Of course, when BP-rewrite becomes available it should be possible to simply redistribute blocks amongst the various vdevs without having to go through destroying/creating.

On Fri, Jul 9, 2010 at 6:57 AM, George Helyar ghel...@gmail.com wrote:
> I use ZFS (on FreeBSD) for my home NAS. I started on 4 drives then added 4 and have now added another 4, bringing the total up to 12 drives on 3 raidzs in 1 pool. I was just wondering if there was any advantage or disadvantage to spreading the data across the 3 raidz, as two are currently full and one is completely empty. If it would improve performance to spread the data, is there any easy way to do it?

--
Matt Urbanowski
Graduate Student, Dept. of Cell Biology
5-51 Medical Sciences Building
University of Alberta, Edmonton, Alberta, Canada T6G 2H7
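Concretely, the send/recv shuffle is something like the following sketch (dataset names hypothetical; assumes the pool can hold a second copy while you shuffle):

    # zfs snapshot -r tank/data@move
    # zfs send -R tank/data@move | zfs recv tank/data.new
    (verify the copy)
    # zfs destroy -r tank/data
    # zfs rename tank/data.new tank/data

The newly written blocks are striped across all vdevs, including the empty one.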
Re: [zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Hi Cindy,

Not sure exactly when the drives went into this state, but it is likely that it happened when I added a second pool, added the same spares to the second pool, then later destroyed the second pool. There have been no controller or any other hardware changes to this system - it is all original parts. The device names are valid; the issue is that they are listed twice - once for a spare which is AVAIL and another time for the spare which is FAULTED. I've tried zpool remove, zpool offline, zpool clear, zpool export/import, I've unconfigured the drives via cfgadm and tried a remove; nothing works to remove the FAULTED spares. I was just able to remove the AVAIL spares, but only since they were listed first in the spares list:

[IDGSUN02:/dev/dsk] root# zpool remove idgsun02 c0t6d0
[IDGSUN02:/dev/dsk] root# zpool remove idgsun02 c5t5d0
[IDGSUN02:/dev/dsk] root# zpool status
  pool: idgsun02
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        idgsun02    ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
        spares
          c0t6d0    FAULTED   corrupted data
          c5t5d0    FAULTED   corrupted data

errors: No known data errors

What's interesting is that running the zpool remove commands a second time has no effect (presumably because zpool is using the GUID internally). I may have, at one point, tried to re-add the drive again after seeing the state FAULTED and not being able to remove it, which is probably where the second set of entries came from. (Pretty much exactly what's described here: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFaultedSpares). What I really need is to be able to remove the two bogus faulted spares, and I think the only way I'll be able to do that is via the GUIDs, since the (valid) vdev path is shown as the same for each; I would guess zpool is attempting to remove the device by path. I've got a support case open, but no traction on that as of yet.

--
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 ▪ ryan.schwa...@ingramdigital.com

On Jul 8, 2010, at 5:25 PM, Cindy Swearingen wrote:
> Hi Ryan,
> What events led up to this situation? I've seen a similar problem when a system upgrade caused the controller numbers of the spares to change. In that case, the workaround was to export the pool, correct the spare device names, and import the pool. I'm not sure if this workaround applies to your case. Do you know if the spare device names changed? My hunch is that you could export this pool, reconnect the spare devices, and reimport the pool, but I'd rather test this on my own pool first and I can't reproduce this problem. I don't think you can remove the spares by their GUID. At least, I couldn't. You said you tried to remove the spares with zpool remove. Did you try this command:
>
> # zpool remove idgsun02 c0t6d0
>
> Or this command, which I don't think would work, but you would get a message like this:
>
> # zpool remove idgsun02 c0t6d0s0
> cannot remove c0t6d0s0: no such device in pool
>
> Thanks, Cindy
[zfs-discuss] BP rewrite? (Was Re: spreading data after adding devices to pool)
Does anyone know where in the pipeline BP rewrite is, or how long this pipeline is?

> You could move the data elsewhere using zfs send and recv, destroy the original datasets and then recreate them. This would stripe the data across the vdevs. Of course, when BP-rewrite becomes available it should be possible to simply redistribute blocks amongst the various vdevs without having to go through destroying/creating.

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Hi Ryan,

Which Solaris release is this?

Thanks, Cindy

On 07/09/10 10:38, Ryan Schwartz wrote:
> [snip]
Re: [zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Ok, so after removing the spares marked as AVAIL and re-adding them again, I put myself back in the "you're effed, dude" boat. What I should have done at that point is a zpool export/import, which would have resolved it. Instead, I recreated the steps that got me into the state where the AVAIL spares were listed first rather than the FAULTED ones (which allowed me to remove them, as demonstrated in my previous email). I created another pool sharing the same spares, removed the spares, then destroyed it, then exported and imported the main pool again. Once that operation completed, I was able to remove the spares again, export/import the pool once more, and the problem is now resolved:

zpool create cleanup c5t3d0 c4t3d0 spare c0t6d0 c5t5d0
zpool remove cleanup c0t6d0 c5t5d0
zpool destroy cleanup
zpool export idgsun02
zpool import idgsun02
zpool remove idgsun02 c0t6d0
zpool remove idgsun02 c5t5d0
zpool export idgsun02
zpool import idgsun02

And the resultant zpool status is this:

[IDGSUN02:/] root# zpool status
  pool: idgsun02
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        idgsun02    ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
        spares
          c0t6d0    AVAIL
          c5t5d0    AVAIL

errors: No known data errors

Hopefully this might help someone in the future if they get into this situation.

--
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 ▪ ryan.schwa...@ingramdigital.com

On Jul 9, 2010, at 11:38 AM, Ryan Schwartz wrote:
> [snip]
Re: [zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Cindy,

[IDGSUN02:/] root# cat /etc/release
                       Solaris 10 10/08 s10x_u6wos_07b X86
           Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 27 October 2008

But as noted in my recent email, I've resolved this with an export/import with only 2 of the 4 spares listed (they were listed as FAULTED, but the export/import fixed that right up).

--
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 ▪ ryan.schwa...@ingramdigital.com

On Jul 9, 2010, at 1:00 PM, Cindy Swearingen wrote:
> Hi Ryan,
> Which Solaris release is this?
> Thanks, Cindy
> [snip]
Re: [zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
I was going to suggest the export/import step next. :-)

I'm glad you were able to resolve it. We are working on making spare behavior more robust. In the meantime, my advice is to keep life simple and not share spares, logs, caches, or even disks between pools.

Thanks, Cindy

On 07/09/10 12:08, Ryan Schwartz wrote:
> [snip]
Re: [zfs-discuss] Legality and the future of zfs...
>>>>> "ab" == Alex Blewitt alex.blew...@gmail.com writes:

ab> All Mac Minis have FireWire - the new ones have FW800.

I tried attaching just two disks to a ZFS host using firewire, and it worked very badly for me. I found:

1. The solaris firewire stack isn't as good as the Mac OS one.

2. Solaris is very obnoxious about drives it regards as ``removeable''. There are ``hot-swappable'' drives that are not considered removeable but can be removed about as easily, and those are maybe handled less obnoxiously. Firewire is removeable while SAS/SATA are hot-swappable.

3. The quality of software inside the firewire cases varies wildly and is a big source of stability problems (even on mac). The companies behind the software are sketchy and weak, while only a few large cartels make SAS expanders, for example. Also, the price of these cases is ridiculously high compared to the SATA world. If you go there you may as well take your wad next door and get SAS.

4. The translation between firewire and SATA is not a simple one, and is not transparent to 'smartctl' commands, or other weird things like hard disk firmware upgraders - though I guess the same is true of the lsi controllers under solaris. This problem's rampant, unfortunately.

5. Firewire is slow - too slow to make 2x speed interesting - and the host chips are not that advanced, so they use a lot of CPU.

6. The DTL partial-mirror-resilver doesn't work. With b130 it still doesn't work. After half a mirror goes away and comes back, scrubs always reveal CKSUM errors on the half that went away. With b71 I found that if I meticulously 'zpool offline'd the disks before taking them away, the CKSUM errors didn't happen. With b130 that no longer helps, so scratchy unreliable connections are just unworkable.

Even iSCSI is not great, but firewire cases sprawled all over a desk with trippable scratchy cables is just not on. It's better to have larger cases that can be mounted in a rack, or if not that, at least cases that are heavier and fewer in number and fewer in cordage.

I suggest that you do not waste time with firewire. SATA, SAS, or fuckoff.

None of this is an insult to your blingy designer apple iShit. It applies equally well to any hardware involving lots of tiny firewire cases.
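For reference, the 'meticulous' dance in point 6 is just (pool and device names hypothetical):

    # zpool offline tank c9t0d0     (before unplugging the case)
    (move or reattach the enclosure)
    # zpool online tank c9t0d0
    # zpool scrub tank              (verify both halves afterwards)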
Re: [zfs-discuss] Legality and the future of zfs...
On Fri, 2010-07-09 at 15:02 -0400, Miles Nordin wrote:
> 1. The solaris firewire stack isn't as good as the Mac OS one.

Indeed. There has been some improvement here in the past year or two, but I still wouldn't deem it ready for serious production work.

> 2. Solaris is very obnoxious about drives it regards as ``removeable''. There are ``hot-swappable'' drives that are not considered removeable but can be removed about as easily, that are maybe handled less obnoxiously. Firewire's removeable while SAS/SATA are hot-swappable.

Actually, most of the removable and hotpluggable devices have the same handling. But SAS/SATA HBAs rarely identify their devices as hotpluggable, even though they are. There are other issues you hit as a result here. We're approaching the state where all media are hotpluggable, with the exception of legacy PATA and parallel SCSI - and those are becoming rarer and rarer. (Granted, many hardware chassis don't support hotplug of internal SATA drives, but that's an attribute of the chassis.)

> 3. The quality of software inside the firewire cases varies wildly and is a big source of stability problems. (even on mac) The companies behind the software are sketchy and weak, while only a few large cartels make SAS expanders for example. Also, the price of these cases is ridiculously high compared to SATA world. If you go there you may as well take your wad next door and get SAS.

I'd be highly concerned about whether 1394 adapters did cache flush correctly.

-- Garrett
Re: [zfs-discuss] snapshot out of space
On Fri, Jul 9, 2010 at 8:04 AM, Tony MacDoodle tpsdoo...@gmail.com wrote:
> datapool/pluto refreservation 70G local
> This means that every snapshot will require 70G of free space?

No. Could you provide the information requested?

-B

--
Brandon High : bh...@freaks.com
Re: [zfs-discuss] SATA 6G controller for OSOL
On Fri, Jul 9, 2010 at 2:40 AM, Vladimir Kotal vladimir.ko...@sun.com wrote:
> Could you be more specific about the problems with 88SE9123, especially with SATA? I am in the process of setting up a system with AD2SA6GPX1 HBA based on this chipset (at least according to the product pages [*]).

http://lmgtfy.com/?q=marvell+9123+problems

The problems seem to be mostly with the PATA controller that's built in. Regardless, Marvell no longer offers the 9123. Any vendor offering cards based on it is probably using chips bought as surplus or for recycling.

-B

--
Brandon High : bh...@freaks.com
Re: [zfs-discuss] SATA 6G controller for OSOL
Agreed! I'm not sure why Addonics is selling them, given the history of problems. At any rate, I'm glad that I didn't pay anything for the three that I have.

On Fri, Jul 9, 2010 at 1:56 PM, Brandon High bh...@freaks.com wrote:
> The problems seem to be mostly with the PATA controller that's built in. Regardless, Marvell no longer offers the 9123. Any vendor offering cards based on it is probably using chips bought as surplus or for recycling.

--
Matt Urbanowski
Graduate Student, Dept. of Cell Biology
5-51 Medical Sciences Building
University of Alberta, Edmonton, Alberta, Canada T6G 2H7
Re: [zfs-discuss] zfs snapshot revert
I'm not trying to fix anything in particular; I'm just curious, in case I roll back a filesystem and then realize I wanted a file from the original filesystem (before the rollback).

I read the section on clones here: http://docs.sun.com/app/docs/doc/819-5461/gavvx?a=view but I'm still not sure what clones are. They sound the same as snapshots. From what I read, it seems like it is possible to snapshot a filesystem, clone the snapshot, and then roll back the filesystem. At this point the clone will be the same size as the data changed during the rollback? So from the clone, files that were unique to the filesystem before the rollback can be restored?

Thanks.
[zfs-discuss] resilver of older root pool disk
This is a hypothetical question that could actually happen: Suppose a root pool is a mirror of c0t0d0s0 and c0t1d0s0 and for some reason c0t0d0s0 goes off line, but comes back on line after a shutdown. The primary boot disk would then be c0t0d0s0 which would have much older data than c0t1d0s0. Under normal circumstances ZFS would know that c0t0d0s0 needs to be resilvered. But in this case c0t0d0s0 is the boot disk. Would ZFS still be able to correctly resilver the correct disk under these circumstances? I suppose it might depend on which files, if any, had actually changed...

Thanks -- Frank
Re: [zfs-discuss] Legality and the future of zfs...
On 9 Jul 2010, at 20:38, Garrett D'Amore wrote:
>> 1. The solaris firewire stack isn't as good as the Mac OS one.
> Indeed. There has been some improvement here in the past year or two, but I still wouldn't deem it ready for serious production work.

That may be true for Solaris, but not so for Mac OS X. And after all, that's what I'm working to get ZFS on.

>> 3. The quality of software inside the firewire cases varies wildly and is a big source of stability problems. (even on mac)

It would be good if you could refrain from spreading FUD if you don't have experience with it. I have used FW400 and FW800 on Mac systems for the last 8 years; the only problem was with the Oxford 911 chipset in OSX 10.1 days. Since then, I've not experienced any issues to do with the bus itself. It may not suit everyone's needs, and it may not be supported well on OpenSolaris, but it works fine on a Mac.

Alex
Re: [zfs-discuss] BP rewrite? (Was Re: spreading data after adding devices to pool)
+1

I badly need this.

On 09/07/2010, at 19.40, Roy Sigurd Karlsbakk wrote:
> Anyone know where in the pipeline BP rewrite is, or how long this pipeline is?
>> You could move the data elsewhere using zfs send and recv, destroy the original datasets and then recreate them. This would stripe the data across the vdevs. Of course, when BP-rewrite becomes available it should be possible to simply redistribute blocks amongst the various vdevs without having to go through destroying/creating.
Re: [zfs-discuss] zfs snapshot revert
On 07/10/10 08:10 AM, zfsnoob4 wrote:
> I read the section on clones here: http://docs.sun.com/app/docs/doc/819-5461/gavvx?a=view but I'm still not sure what clones are.

In simple terms, writeable snapshots.

> They sound the same as snapshots. From what I read, it seems like it is possible to snapshot a filesystem, clone the snapshot and then rollback the file system.

You have to promote the clone before you can destroy its source snapshot.

> At this point the clone will be the same size as the data changed during the rollback? So from the clone, files that were unique to the filesystem before rollback can be restored?

The clone will be a clone of the filesystem at the time its source snapshot was created.

-- Ian.
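A sketch of the recover-after-rollback idiom the original poster is after (dataset names hypothetical): rather than rolling back, clone the old snapshot and promote the clone, so both the old and the current state stay accessible:

    # zfs clone tank/fs@old tank/fs.rolledback
    # zfs promote tank/fs.rolledback
    # zfs rename tank/fs tank/fs.saved
    # zfs rename tank/fs.rolledback tank/fs

After the promote, tank/fs.saved still holds everything written since @old, and individual files can be copied back out of it at leisure.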
[zfs-discuss] zfs send/recv hanging in 2009.06
I have a couple of systems running 2009.06 that hang on relatively large zfs send/recv jobs. With the -v option, I see the snapshots coming across, and at some point the process just pauses, IO and CPU usage go to zero, and it takes a hard reboot to get back to normal. The same script running against the same data doesn't hang on 2008.05. There are maybe 100 snapshots, 200GB of data total. Just trying to send to a blank external USB drive in one case, and in the other, I'm restoring from a USB drive to a local drive, but the behavior is the same.

I see that others have had a similar problem, but there don't seem to be any answers:
https://opensolaris.org/jive/thread.jspa?messageID=384540
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg34493.html
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg37158.html

I'd like to stick with a released version of OpenSolaris, so I'm hoping that the answer isn't to switch to the dev repository and pull down b134.
Re: [zfs-discuss] Legality and the future of zfs...
Folks,

I would appreciate it if you could create a separate thread for the Mac Mini. Back to the original subject.

NetApp has deep pockets. A few companies have already backed out of zfs as they cannot afford to go through a lawsuit. I am in a stealth startup company and we rely on zfs for our application. The future of our company, and many other businesses, depends on what happens to zfs. If you are in a similar boat, what actions are you planning?

Regards,
Peter
[zfs-discuss] ZFS, IPS (IBM ServeRAID) driver, and a kernel panic...
Hi,

I have been trying out the latest NexentaCore and NexentaStor Community ed. builds (they have the driver I need built in) on the hardware I have with this controller. The only difference between the 2 machines is that the 'Core' machine has 16GB of RAM and the 'Stor' one has 12GB.

On both machines I did the following:

1) Created a zpool consisting of a single RaidZ from 5 300GB U320 10K drives.
2) Created 4 filesystems in the pool.
3) On the 4 filesystems, set the dedup and compression properties to cover all the combinations (off/off, off/on, on/off, and on/on).

On the 'Stor' machine I elected to disable the ZIL and cache flushes through the web GUI. I didn't do this on the 'Core' machine. On the 'Core' machine I mounted the 4 filesystems from the 'Stor' machine via NFSv4.

Now for a bit of history. I tried out the 'Stor' machine in this exact config (but with ZIL and cache flushes on) about a month ago with version 3.0.2. At that time I used a Linux NFS client to time untarring the GCC sources to each of the 4 filesystems. This test repeatedly failed on the first filesystem by bringing the machine to its knees, to the point that I had to power cycle it.

This time around I decided to use the 'Core' machine as the client so I could also time the same test on its local ZFS filesystems. At first I got my hopes up, because the test ran to completion (and rather quickly) locally on the Core machine. I then added running it over NFS to the 'Stor' machine to the testing. In the beginning I was untarring the sources once on each filesystem, and even over NFS this worked (though slower than I'd hoped, given that the ZIL and cache flushes were disabled). So I thought I'd push the dedup a little harder, and I expanded the test to untar the sources 4 times per filesystem. This ran fine until the 4th NFS filesystem, where the 'Stor' machine panicked. The client waited while it rebooted, and then resumed the test, causing it to panic a second time. For some reason it hung so badly the second time that it didn't even reboot - I'll have to power cycle it Monday when I get to work.
The 2 stack traces are identical:

panic[cpu3]/thread=ff001782fc60:
BAD TRAP: type=e (#pf Page fault) rp=ff001782f9c0 addr=18 occurred in module "unix" due to a NULL pointer dereference

sched: #pf Page fault
Bad kernel fault at addr=0x18
pid=0, pc=0xfb863374, sp=0xff001782fab8, eflags=0x10286
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 18  cr3: 500  cr8: c

        rdi: ff03dc84fcfc  rsi: ff03e1d03d98  rdx: 2
        rcx: 2             r8:  0             r9:  ff0017a51c60
        rax: ff001782fc60  rbx: 2             rbp: ff001782fb10
        r10: e10377c748    r11: ff00          r12: ff03dc84fcfc
        r13: ff00          r14: ff00          r15: 10
        fsb: 0             gsb: ff03e1d03ac0  ds:  4b
        es:  4b            fs:  0             gs:  1c3
        trp: e             err: 0             rip: fb863374
        cs:  30            rfl: 10286         rsp: ff001782fab8
        ss:  38

ff001782f8a0 unix:die+dd ()
ff001782f9b0 unix:trap+177b ()
ff001782f9c0 unix:cmntrap+e6 ()
ff001782fb10 unix:mutex_owner_running+14 ()
ff001782fb40 ips:ips_remove_busy_command+27 ()
ff001782fb80 ips:ips_finish_io_request+a8 ()
ff001782fbb0 ips:ips_intr+7b ()
ff001782fc00 unix:av_dispatch_autovect+7c ()
ff001782fc40 unix:dispatch_hardint+33 ()
ff0018517580 unix:switch_sp_and_call+13 ()
ff00185175d0 unix:do_interrupt+b8 ()
ff00185175e0 unix:_interrupt+b8 ()
ff00185176e0 genunix:kmem_free+34 ()
ff0018517710 zfs:zio_pop_transforms+86 ()
ff0018517780 zfs:zio_done+152 ()
ff00185177b0 zfs:zio_execute+8d ()
ff0018517810 zfs:zio_notify_parent+a6 ()
ff0018517880 zfs:zio_done+3e2 ()
ff00185178b0 zfs:zio_execute+8d ()
ff0018517910 zfs:zio_notify_parent+a6 ()
ff0018517980 zfs:zio_done+3e2 ()
ff00185179b0 zfs:zio_execute+8d ()
ff0018517a10 zfs:zio_notify_parent+a6 ()
ff0018517a80 zfs:zio_done+3e2 ()
ff0018517ab0 zfs:zio_execute+8d ()
ff0018517b50 genunix:taskq_thread+248 ()
ff0018517b60 unix:thread_start+8 ()

syncing file systems... done
dumping to /dev/zvol/dsk/syspool/dump, offset 65536, content: kernel + curproc
0% done: 0 pages dumped, dump failed: error 5
rebooting...

As I read this, it's probably a bug in the IPS driver, but I really don't know anything about kernel panics. This seems 100% reproducible, so I'm happy to run more tests in kmdb if it will help. As I've mentioned before, I'd be happy to try to work on the code myself if it were available.

Anyone have any ideas?

-Kyle

On 7/7/2010 3:12 PM, Kyle McDonald wrote:
> On 6/24/2010 6:31 PM, James C. McPherson wrote:
> [snip]
Re: [zfs-discuss] ZFS, IPS (IBM ServeRAID) driver, and a kernel panic...
First off, you need to test 3.0.3 if you're using dedup. Earlier versions had an unduly large number of issues when used with dedup. Hopefully with 3.0.3 we've got the bulk of the problems resolved. ;-)

Secondly, from your stack backtrace, yes, it appears ips is implicated. If I had source for ips, I might be better able to help you out.

- Garrett

On Fri, 2010-07-09 at 18:08 -0400, Kyle McDonald wrote:
> [snip]
Re: [zfs-discuss] SATA 6G controller for OSOL
This thread from Marc Bevand, and his blog linked therein, might have some useful alternative suggestions: http://opensolaris.org/jive/thread.jspa?messageID=480925

I've bookmarked it because it's quite a handy summary, and I hope he keeps updating it with new info.
Re: [zfs-discuss] zfs send/recv hanging in 2009.06
On 07/10/10 09:49 AM, BJ Quinn wrote:
> I have a couple of systems running 2009.06 that hang on relatively large zfs send/recv jobs. [snip]
> I'd like to stick with a released version of OpenSolaris, so I'm hoping that the answer isn't to switch to the dev repository and pull down b134.

It probably is. I had a number of these issues (in Solaris 10) and they are fixed in more recent builds.

-- Ian.
Re: [zfs-discuss] zfs send/recv hanging in 2009.06
On Fri, Jul 9, 2010 at 6:49 PM, BJ Quinn bjqu...@seidal.com wrote:
> I have a couple of systems running 2009.06 that hang on relatively large zfs send/recv jobs. With the -v option, I see the snapshots coming across, and at some point the process just pauses, IO and CPU usage go to zero, and it takes a hard reboot to get back to normal. The same script running against the same data doesn't hang on 2008.05.

There are issues running concurrent zfs receives in 2009.06. Try to run just one at a time. Switching to a development build (b134) is probably the answer until we have a new release.

--
Giovanni Tirloni
gtirl...@sysdroid.com
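If a cron job can kick off overlapping transfers, a crude portable lock is enough to serialize them (a sketch; dataset names and paths hypothetical):

    #!/bin/sh
    # run at most one send/recv at a time
    lockdir=/var/tmp/zfs-backup.lock
    if mkdir "$lockdir" 2>/dev/null; then
        zfs send -R tank/data@backup | zfs recv -dF backup
        rmdir "$lockdir"
    else
        echo "another send/recv is still running" >&2
        exit 1
    fi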
[zfs-discuss] Debunking the dedup memory myth
Whenever somebody asks the question, "How much memory do I need to dedup an X terabyte filesystem," the standard answer is "as much as you can afford to buy." This is true and correct, but I don't believe it's the best we can do, because "as much as you can buy" is a true assessment for memory in *any* situation. To improve knowledge in this area, I think the question just needs to be asked differently: how much *extra* memory is required for X terabytes, with dedup enabled versus disabled?

I hope somebody knows more about this than me. I expect the answer will be something like: The default ZFS block size is 128K. If you have a filesystem with 128G used, that means you are consuming 1,048,576 blocks, each of which must be checksummed. ZFS uses adler32 and sha256, which means 4 bytes and 32 bytes ... 36 bytes * 1M blocks = an extra 36 Mbytes and some fluff consumed by enabling dedup.

I suspect my numbers are off, because 36 Mbytes seems impossibly small. But I hope some sort of similar (and more correct) logic will apply. ;-)
Re: [zfs-discuss] Debunking the dedup memory myth
On Fri, Jul 9, 2010 at 5:00 PM, Edward Ned Harvey solar...@nedharvey.com wrote:
> The default ZFS block size is 128K. If you have a filesystem with 128G used, that means you are consuming 1,048,576 blocks, each of which must be checksummed. ZFS uses adler32 and sha256, which means 4 bytes and 32 bytes ... 36 bytes * 1M blocks = an extra 36 Mbytes and some fluff consumed by enabling dedup. I suspect my numbers are off, because 36 Mbytes seems impossibly small. But I hope some sort of similar (and more correct) logic will apply. ;-)

I think that DDT entries are a little bigger than what you're using. The size seems to range between 150 and 250 bytes depending on how it's calculated; call it 200 bytes each. Your 128G dataset would require closer to 200M (+/- 25%) for the DDT if your data was completely unique. 1TB of unique data would require 600M - 1000M for the DDT.

The numbers are fuzzy of course, and assume only 128k blocks. Lots of small files will increase the memory cost of dedup, and using it on a zvol that has the default block size (8k) would require 16 times the memory.

-B

--
Brandon High : bh...@freaks.com
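Rather than estimating, zdb will report what the DDT of a live pool actually costs (pool name hypothetical; output abridged). -D prints a dedup summary including per-entry on-disk and in-core sizes; -DD adds a histogram of reference counts:

    # zdb -DD tank
    DDT-sha256-zap-duplicate: ... entries, size ... on disk, ... in core
    DDT-sha256-zap-unique: ... entries, size ... on disk, ... in core
    ...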
[zfs-discuss] block align SSD for use as a l2arc cache
I have an Intel X25-M 80GB SSD. For optimum performance, I need to block-align the SSD device, but I am not sure exactly how I should do it.

If I run format -> fdisk, it allows me to partition based on a cylinder, but I don't think that is sufficient.

Can someone tell me how they block-aligned an SSD device for use as L2ARC.

Thanks,
Geoff
Re: [zfs-discuss] Legality and the future of zfs...
On 7/9/2010 2:55 PM, Peter Taps wrote:
> [snip]

Congratulations. You've tied your boat to a system which has legal issues. Welcome to the Valley.

Part of being a successful startup is having a flexible business plan, which includes a hard look at the possibility that core technologies you depend on may no longer be available to you, for whatever reason. Risk analysis is something that any good business *should* include as part of its strategic view (you do have periodic strategic reviews, right?).

If you're planning on developing some sort of storage appliance and depend on OpenSolaris or FreeBSD w/ ZFS, well, pick another filesystem. It's pretty much that simple. Painful, but simple - each of the other filesystems has well-known weaknesses and strengths, so it shouldn't be a big issue to pick the right one for you (even if it's not just like ZFS). Of course, the smart thing to do is get this strategy in place now, but wait to execute it until it becomes necessary (i.e. ZFS can't be used anymore).

If you're writing a ZFS-dependent application (backup?), well, then, you're up the creek. You have no alternative, since you've bet the farm on ZFS. The good news is that it's unlikely that NetApp will win, and if it does look like they'll win, I would bet huge chunks of money that Oracle cross-licenses the patents or pays for a license rather than kill ZFS (it simply makes too much money for Oracle to abandon).

IANAL, but I'd strongly advise against trying to get a license from NetApp, should they come calling for blood money. My personal feeling is that it's better to bet the startup's future on not needing the license than on forking over a substantial portion of your revenue for what most likely will be unnecessary. But it's up to your financial backers - in the end, it's a gamble. But so are all startups, and trading away significant revenue for dubious safety isn't a good sign that your startup will succeed in the long haul.

I'd strongly suggest trying to stay off of NetApp's radar for now, as they're in the mode of a shakedown bully while they still have leverage. If you do get a call from NetApp, go see an IP lawyer right away. They should give you strategies which you can use to stall the progress of any actual lawsuit until the NetApp/Oracle one is finished. And even now, that strategy is likely less costly than one involving forking over a portion of your revenue to NetApp for a considerable time. Do remember: Oracle has much deeper pockets than NetApp, and much less incentive to settle.

None of the preceding should imply that I speak for Oracle, Inc., nor do I have any special knowledge of the progress of the NetApp v. Oracle lawsuit.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Re: [zfs-discuss] Debunking the dedup memory myth
On 7/9/2010 5:18 PM, Brandon High wrote: On Fri, Jul 9, 2010 at 5:00 PM, Edward Ned Harvey solar...@nedharvey.com wrote: The default ZFS block size is 128K. If you have a filesystem with 128G used, that means you are consuming 1,048,576 blocks, each of which must be checksummed. ZFS uses adler32 and sha256, which means 4 bytes and 32 bytes ... 36 bytes * 1M blocks = an extra 36 Mbytes and some fluff consumed by enabling dedup. I suspect my numbers are off, because 36 Mbytes seems impossibly small. But I hope some sort of similar (and more correct) logic will apply. ;-)

I think that DDT entries are a little bigger than what you're using. The size seems to range between 150 and 250 bytes depending on how it's calculated; call it 200 bytes each. Your 128G dataset would require closer to 200M (+/- 25%) for the DDT if your data was completely unique. 1TB of unique data would require 600M - 1000M for the DDT. The numbers are fuzzy of course, and assume only 128K blocks. Lots of small files will increase the memory cost of dedup, and using it on a zvol that has the default block size (8K) would require 16 times the memory. -B

Go back and read the several threads from last month about ZFS/L2ARC memory usage for dedup. In particular, I've been quite specific about how to calculate estimated DDT size. Richard has also been quite good at giving size estimates (as well as explaining how to see current block size usage in a filesystem). The structure in question is this one: ddt_entry http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/sys/ddt.h#108 I'd have to fire up an IDE to track down all the sizes of the ddt_entry structure's members, but I feel comfortable using Richard's 270 bytes-per-entry estimate. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
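As a rough cross-check of those estimates (a back-of-the-envelope sketch only - it assumes completely unique data, 128K records throughout, and Richard's ~270 bytes-per-entry figure cited above), the arithmetic fits on one shell line:

# estimate core DDT size for 1 TB of unique data at 128K recordsize:
# (1 TB / 128K) = 8,388,608 entries, times 270 bytes, reported in MB
$ echo $(( 1024 * 1024 * 1024 / 128 * 270 / 1024 / 1024 ))
2160

So roughly 2 GB of DDT per TB of unique 128K blocks - and about sixteen times that if the same data sits on a zvol with the default 8K volblocksize, per Brandon's point above.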
Re: [zfs-discuss] block align SSD for use as a l2arc cache
On 7/9/2010 5:55 PM, Geoff Nordli wrote: I have an Intel X25-M 80GB SSD. For optimum performance, I need to block align the SSD device, but I am not sure exactly how I should do it. If I run format -> fdisk, it allows me to partition based on a cylinder, but I don't think that is sufficient. Can someone tell me how they block aligned an SSD device for use as L2ARC? Thanks, Geoff

(a) What makes you think you need to do block alignment for L2ARC usage (particularly if you give the entire device to ZFS)? (b) What makes you think that, even if (a) is needed, ZFS will respect 4K block boundaries? That is, why do you think that ZFS would put any effort into doing block alignment with its L2ARC writes? -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Debunking the dedup memory myth
On 07/09/10 19:40, Erik Trimble wrote: On 7/9/2010 5:18 PM, Brandon High wrote: On Fri, Jul 9, 2010 at 5:00 PM, Edward Ned Harvey solar...@nedharvey.com wrote: The default ZFS block size is 128K. If you have a filesystem with 128G used, that means you are consuming 1,048,576 blocks, each of which must be checksummed. ZFS uses adler32 and sha256, which means 4 bytes and 32 bytes ... 36 bytes * 1M blocks = an extra 36 Mbytes and some fluff consumed by enabling dedup. I suspect my numbers are off, because 36 Mbytes seems impossibly small. But I hope some sort of similar (and more correct) logic will apply. ;-)

I think that DDT entries are a little bigger than what you're using. The size seems to range between 150 and 250 bytes depending on how it's calculated; call it 200 bytes each. Your 128G dataset would require closer to 200M (+/- 25%) for the DDT if your data was completely unique. 1TB of unique data would require 600M - 1000M for the DDT. The numbers are fuzzy of course, and assume only 128K blocks. Lots of small files will increase the memory cost of dedup, and using it on a zvol that has the default block size (8K) would require 16 times the memory. -B

Go back and read the several threads from last month about ZFS/L2ARC memory usage for dedup. In particular, I've been quite specific about how to calculate estimated DDT size. Richard has also been quite good at giving size estimates (as well as explaining how to see current block size usage in a filesystem). The structure in question is this one: ddt_entry http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/sys/ddt.h#108 I'd have to fire up an IDE to track down all the sizes of the ddt_entry structure's members, but I feel comfortable using Richard's 270 bytes-per-entry estimate.

It must have grown a bit, because on 64-bit x86 a ddt_entry is currently 0x178 = 376 bytes:

# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci zfs sata sd ip hook neti sockfs arp usba fctl random cpc fcip nfs lofs ufs logindmux ptm sppp ipc ]
> ::sizeof struct ddt_entry
sizeof (struct ddt_entry) = 0x178

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
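For anyone who would rather measure than estimate: zdb can report the DDT's actual entry counts and sizes for a live pool. A sketch - 'tank' is a placeholder pool name, and the exact output format varies between builds:

# zdb -D tank     # DDT entry counts plus on-disk and in-core entry sizes
# zdb -DD tank    # more verbose; adds a breakdown of entries by reference count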
Re: [zfs-discuss] block align SSD for use as a l2arc cache
-Original Message- From: Erik Trimble Sent: Friday, July 09, 2010 6:45 PM Subject: Re: [zfs-discuss] block align SSD for use as a l2arc cache

On 7/9/2010 5:55 PM, Geoff Nordli wrote: I have an Intel X25-M 80GB SSD. For optimum performance, I need to block align the SSD device, but I am not sure exactly how I should do it. If I run format -> fdisk, it allows me to partition based on a cylinder, but I don't think that is sufficient. Can someone tell me how they block aligned an SSD device for use as L2ARC? Thanks, Geoff

(a) What makes you think you need to do block alignment for L2ARC usage (particularly if you give the entire device to ZFS)? (b) What makes you think that, even if (a) is needed, ZFS will respect 4K block boundaries? That is, why do you think that ZFS would put any effort into doing block alignment with its L2ARC writes?

Thanks, Erik. So what you are saying is that I don't need to worry about block alignment for an L2ARC cache device, because it will read and write at the device block level rather than doing larger writes like a file system. Have a great weekend! Geoff ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
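For the archives: handing the whole SSD to ZFS as a cache device needs no partitioning or alignment step at all. A minimal sketch, assuming a pool named tank and the SSD at c2t0d0 (both placeholders for your own pool and device names):

# zpool add tank cache c2t0d0
# zpool iostat -v tank 10    # per-vdev stats show the cache device filling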
[zfs-discuss] Scrub extremely slow?
Hello, I'm trying to figure out why I'm getting about 10MB/s scrubs on a pool where I can easily get 100MB/s. It's 4x 1TB SATA2 (nv_sata), raidz, on an Athlon64 with 8GB RAM. Here's the output while I cat an 8GB file to /dev/null:

r...@solaris:~# zpool iostat 20
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool        123G  26.0G      2      5   206K  38.8K
tera        3.04T   598G     19     43   813K   655K
----------  -----  -----  -----  -----  -----  -----
rpool        123G  26.0G      1      0   199K      0
tera        3.04T   598G    966      0   121M      0
----------  -----  -----  -----  -----  -----  -----
rpool        123G  26.0G      1      8   212K  60.7K
tera        3.04T   598G  1.53K      7   195M  20.9K
----------  -----  -----  -----  -----  -----  -----

and here's what happens when I'm scrubbing the pool:

----------  -----  -----  -----  -----  -----  -----
rpool        123G  26.0G      1      7   106K  78.2K
tera        3.04T   598G     87      8  10.5M  20.7K
----------  -----  -----  -----  -----  -----  -----
rpool        123G  26.0G      0      7  90.3K  81.8K
tera        3.04T   598G     87      7  10.3M  18.1K
----------  -----  -----  -----  -----  -----  -----
rpool        123G  26.0G      1      0   130K      0
tera        3.04T   598G     88      0  10.5M      0
----------  -----  -----  -----  -----  -----  -----

I'd be glad to provide any info you might need. Thanks. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
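A couple of things worth looking at while the scrub runs (a sketch - the pool name is taken from the post above, and the output will differ per system):

# zpool status tera    # shows scrub progress, percent done, and estimated time remaining
# iostat -xn 10        # per-disk service times; one slow or failing disk can throttle the whole raidz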