Re: [zfs-discuss] S11 vs illumos zfs compatibility
On 12/14/12 10:07 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: Is that right? You can't use zfs send | zfs receive to send from a newer version and receive on an older version?

That is my experience. If you do a zfs upgrade on the sending machine, the receiving machine requires a version >= the sending machine.

No. You can, with recv, override any property in the sending stream that can be set from the command line (i.e., a writable). Version is not one of those properties. It only gets changed, in an upward direction, when you do a zfs upgrade. i.e.:

# zfs get version repo/support
NAME          PROPERTY  VALUE  SOURCE
repo/support  version   5      -
# zfs send repo/support@cpu-0412 | zfs recv -o version=4 repo/test
cannot receive: cannot override received version

You can send a version 6 file system into a version 28 pool, but it will still be a version 6 file system. Bob

I am not disagreeing with this, but isn't this the opposite test from what Ned asked? You can send from an old version (6) to a new version (28), but I don't believe you can send the other way from the new version (28) to receive on the old version (6). Or am I reading this wrong? Chad

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
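To make the asymmetry concrete, a rough sketch (host, pool, and snapshot names are made up, and the exact error output is omitted since it varies by build):

# old -> new: works; the received file system keeps its old on-disk version
oldbox# zfs send tank6/fs@s1 | ssh newbox zfs recv tank28/fs
# new -> old: fails once the sender's fs/pool version is newer than what the
# receiver understands -- zfs upgrade only ever moves versions up
newbox# zfs send tank28/fs@s1 | ssh oldbox zfs recv tank6/fs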
Re: [zfs-discuss] ZFS snapshot used space question
Is there a way to get the total amount of data referenced by a snapshot that isn't referenced by a specified snapshot/filesystem? I think this is what is really desired in order to locate snapshots with offending space usage. The written and written@ attributes seem to only do the reverse. I think you can back-calculate it from the snapshot and filesystem referenced sizes and the written@snap property of the filesystem, but that isn't particularly convenient to do (looks like zfs get -Hp ... makes it possible to hack a script together for it, though).

This is what I was hoping to get as well, but I am not sure it's really possible. Even if you calculate the referenced space plus the displayed used space and compare that against the active filesystem, it doesn't really tell you much, because the data on the active filesystem might not be as static as you want. For example: if a snapshot references 10G and the active filesystem shows 10G used, you might expect that the snapshot isn't using any space. However, the 10G the snapshot referenced might have been deleted, and the 10G in the active filesystem might be new data, which means your snapshot could be pinning the full 10G. But if 9G of that was also on another snapshot, you would have something like this:

rootpool/export/home@snap.0  -  1G   -  -  -  -
rootpool/export/home@snap.1  -  27K  -  -  -  -
rootpool/export/home@snap.2  -  0    -  -  -  -

And the referenced would look something like:

rootpool/export/home@snap.0  0  -  10G  -
rootpool/export/home@snap.1  0  -  9G   -
rootpool/export/home@snap.2  0  -  10G  -

And the current filesystem would be:

rootpool/export/home  40G  20G  10G  10G  0  0

Then imagine that across more than three snapshots. I can't wrap my head around logic that would work there. I would love it if someone could figure out a good way, though... - Chad

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
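Following up on the written@ idea: a rough sketch of the back-calculation using zfs get -Hp (dataset and snapshot names are made up; it assumes a build with the written@ property, and it only estimates what a snapshot references that the live filesystem no longer does):

#!/bin/ksh
# estimate data referenced by a snapshot that the live fs no longer references
FS=rootpool/export/home
SNAP=snap.0
ref_snap=$(zfs get -Hpo value referenced $FS@$SNAP)
ref_fs=$(zfs get -Hpo value referenced $FS)
written=$(zfs get -Hpo value written@$SNAP $FS)
# written@ = bytes the live fs references that were written after the snapshot,
# so the space they still share is:
shared=$((ref_fs - written))
# and what the snapshot references that the live fs has dropped is roughly:
echo "$((ref_snap - shared)) bytes"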
Re: [zfs-discuss] ZFS snapshot used space question
On Wed, Aug 29, 2012 at 8:58 PM, Timothy Coalson tsc...@mst.edu wrote: As I understand it, the used space of a snapshot does not include anything that is in more than one snapshot.

True. It shows the amount that would be freed if you destroyed the snapshot right away. Data held onto by more than one snapshot cannot be removed when you destroy just one of them, obviously. The act of destroying a snapshot will likely change the USED value of the neighbouring snapshots, though.

Yup, this is the same thing I came up with as well. Though I am a bit disappointed in the results, at least things make sense again. Thank you all for your help!

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ok for single disk dev box?
Now that is interesting. But how do you do a receive before you reinstall? Live CD?

Just boot off the CD (or jumpstart server) to single-user mode. Format your new disk, create a zpool, zfs recv, installboot (or installgrub), reboot, and done.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
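A rough sketch of that sequence on x86 (device, pool, snapshot, and BE names are made up; on SPARC you'd use installboot against the slice instead of installgrub):

# booted single-user from the CD or jumpstart miniroot:
zpool create -f rpool c0t0d0s0
# restore the backup stream (assumes a recursive snapshot named @backup)
ssh backuphost zfs send -R rpool@backup | zfs recv -Fdu rpool
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0
zpool set bootfs=rpool/ROOT/mybe rpool   # point at the received boot environment
init 6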
[zfs-discuss] ZFS snapshot used space question
All, I apologize in advance for what appears to be a question asked quite often, but I am not sure I have ever seen an answer that explains it. This may also be a bit long-winded, so I apologize for that as well. I would like to know how much unique space each individual snapshot is using. I have a ZFS filesystem that shows:

$ zfs list -o space rootpool/export/home
NAME                  AVAIL  USED   USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
rootpool/export/home  5.81G  14.4G  8.81G     5.54G   0              0

So reading this I see that I have a total of 14.4G of space used by this dataset. Currently 5.54G is active data that is available on the normal filesystem, and 8.81G is used in snapshots. 8.81G + 5.54G = 14.4G (roughly). I 100% agree with these numbers and the world makes sense. This is also backed up by:

$ zfs get usedbysnapshots rootpool/export/home
NAME                  PROPERTY         VALUE  SOURCE
rootpool/export/home  usedbysnapshots  8.81G  -

Now if I wanted to see how much space any individual snapshot is currently using, I would like to think that this would show me:

$ zfs list -ro space rootpool/export/home
NAME                            AVAIL  USED   USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
rootpool/export/home            5.81G  14.4G  8.81G     5.54G   0              0
rootpool/export/home@week3      -      202M   -         -       -              -
rootpool/export/home@week2      -      104M   -         -       -              -
rootpool/export/home@7daysago   -      1.37M  -         -       -              -
rootpool/export/home@6daysago   -      1.20M  -         -       -              -
rootpool/export/home@5daysago   -      1020K  -         -       -              -
rootpool/export/home@4daysago   -      342K   -         -       -              -
rootpool/export/home@3daysago   -      1.28M  -         -       -              -
rootpool/export/home@week1      -      0      -         -       -              -
rootpool/export/home@2daysago   -      0      -         -       -              -
rootpool/export/home@yesterday  -      360K   -         -       -              -
rootpool/export/home@today      -      1.26M  -         -       -              -

So normal logic would tell me that if USEDSNAP is 8.81G and is composed of 11 snapshots, adding up the size of each of those snapshots should roughly equal 8.81G. So time to break out the calculator: 202M + 104M + 1.37M + 1.20M + 1020K + 342K + 1.28M + 0 + 0 + 360K + 1.26M equals... ~312M! That is nowhere near 8.81G. I would accept it even if it was within 15%, but it's not even close. That is definitely not metadata or ZFS overhead or anything. I understand that snapshots are just the delta between the time when the snapshot was taken and the current active filesystem, and are truly just references to blocks on disk rather than copies. I also understand how two (or more) snapshots can reference the same block on disk while there is still only that one block used. If I delete a recent snapshot I may not save as much space as advertised, because some of it may still be held by an adjacent snapshot. But that sharing does not create duplicate used space on disk, so it doesn't justify the huge difference in sizes. Even with this logic in place, there is currently 8.81G of blocks referred to by snapshots which are not currently on the active filesystem, and I don't believe anyone can argue with that. Can something show me how much space a single snapshot has reserved? I searched through some of the archives and found this thread (http://mail.opensolaris.org/pipermail/zfs-discuss/2012-August/052163.html) from early this month, and I feel as if I have the same problem as the OP, but hopefully I am attacking it with a little more background. I am not arguing about discrepancies between df/du and zfs output, and I have read the Oracle documentation about it, but I haven't found what I feel should be a simple answer. I currently have a ticket open with Oracle, but I am getting answers to all kinds of questions except the question I am asking, so I am hoping someone out there might be able to help me.
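For the record, the adding-up can be scripted instead of punched into a calculator. A quick sketch (same dataset as above; if your build lacks zfs list -p, looping zfs get -Hpo value used over each snapshot gives the same numbers):

$ zfs list -Hrpo used -t snapshot rootpool/export/home | \
    awk '{sum += $1} END {printf "%.0f MB total snapshot USED\n", sum/1048576}'
$ zfs get -Hpo value usedbysnapshots rootpool/export/home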
I am a little concerned I am going to find out that there is no real way to show it and that makes for one sad SysAdmin. Thanks, Chad ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ideas for ghetto file server data reliability?
On Nov 15, 2010, at 8:32 AM, Bryan Horstmann-Allen wrote:

On 2010-11-15 10:21:06, Edward Ned Harvey wrote: Backups. Even if you upgrade your hardware to better stuff... with ECC and so on... There is no substitute for backups. Period. If you care about your data, you will do backups. Period.

Backups are not going to save you from bad memory writing corrupted data to disk. If your RAM flips a bit and writes garbage to disk, and you back up that garbage, guess what: your backups are full of garbage. Invest in ECC RAM and hardware that is, at the least, less likely to screw you. Test your backups to ensure you can trust them.

The resources someone invests trying to fix this are probably worth more than the cost of some ECC RAM and a motherboard.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] couple of ZFS questions
On Nov 12, 2010, at 5:54 AM, Edward Ned Harvey wrote: Why are you sharing iscsi from nexenta to freebsd? Wouldn't it be better for nexenta to simply create zfs filesystems, and then share nfs? Much more flexible in a lot of ways. Unless your design requirements require limiting the flexibility intentionally... I can't think of any reason you'd want to do the iscsi thing from nexenta to freebsd.

Because for running jails (in very simple terms, a FreeBSD jail is a really fancy chroot, or a really simple approximation of a zone) NFS does not work very well (at least it did not in the past -- I have tried it, though not recently). Things like Apache don't want to run off an NFS-mounted file system, for example (the actual httpd daemon -- not the webroots, etc.).

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] couple of ZFS questions
I will be setting up a NexentaStor Community Edition based ZFS file server. I will be serving some zvols over iSCSI to some FreeBSD machines to host jails in.

1) The ZFS box offers a single iSCSI target that exposes all the zvols as individual disks. When the FreeBSD initiator finds it, it creates a separate disk for each zvol. I assume that if I have multiple FreeBSD machines connecting to this iSCSI target, as long as no individual zvol is mounted on more than one FreeBSD machine, the fact that a disk exists for each zvol on each FreeBSD machine is irrelevant and won't cause problems.

2) I am thinking about formatting the virtual disks served from the Nexenta iSCSI target as ZFS on the FreeBSD machine, even though the FreeBSD side has no redundancy. I see this as safe since the backing store on the Nexenta machine is a zvol on a redundant ZFS pool... Is this correct thinking?

Thanks Chad

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
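For what it's worth, a minimal sketch of the Nexenta-side plumbing via COMSTAR (zvol sizes and names are made up, and the unrestricted add-view is just the simplest case; NexentaStor's GUI wraps the same steps):

# create one zvol per FreeBSD consumer
zfs create -V 32G tank/jails/web1
# register it as a SCSI logical unit, then expose it to all initiators
sbdadm create-lu /dev/zvol/rdsk/tank/jails/web1
stmfadm add-view 600144f0...        # the GUID printed by create-lu
# make sure an iSCSI target exists (stmf and iscsi/target services enabled)
itadm create-target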
Re: [zfs-discuss] couple of ZFS questions
On Nov 11, 2010, at 7:18 PM, Xin LI wrote:

On 11/11/10 17:57, Chad Leigh -- Shire.Net LLC wrote: I will be setting up a NexentaStor Community Edition based ZFS file server. I will be serving some zvols over iSCSI to some FreeBSD machines to host jails in. 1) The ZFS box offers a single iSCSI target that exposes all the zvols as individual disks. When the FreeBSD initiator finds it, it creates a separate disk for each zvol. I assume that if I have multiple FreeBSD machines connecting to this iSCSI target, as long as no individual zvol is mounted on more than one FreeBSD machine, the fact that a disk exists for each zvol on each FreeBSD machine is irrelevant and won't cause problems.

This is correct.

A follow-on question. If the zvols (virtual disks) are mounted READ ONLY, is it possible to mount one on multiple FreeBSD systems at the same time and access it for reading only from all the systems? (With only one system having it R/W, and that only being used occasionally when new software needs to be installed for the jails.) What I want to do does not rely on this, but it could make things easier for me...

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
Hi Garrett, Since my problem did turn out to be a debug kernel in my compilations, I booted back into the Nexenta 3 RC2 CD and let a scrub run for about half an hour, to see if I just hadn't waited long enough the first time around. It never made it past 159 MB/s. I finally rebooted into my 145 non-debug kernel, and within a few seconds of reimporting the pool the scrub was up to ~400 MB/s, so it does indeed seem like the Nexenta CD kernel is either in debug mode, or something else is slowing it down. Chad

On Wed, Jul 21, 2010 at 09:12:35AM -0700, Garrett D'Amore wrote: On Wed, 2010-07-21 at 02:21 -0400, Richard Lowe wrote: I built in the normal fashion, with the CBE compilers (cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30), and 12u1 lint. I'm not subscribed to zfs-discuss, but have you established whether the problematic build is DEBUG? (the bits I uploaded were non-DEBUG).

That would make a *huge* difference. DEBUG bits have zero optimization, and also have a great number of sanity tests included that are absent from the non-DEBUG bits. If these are expensive checks on a hot code path, it can have a very nasty impact on performance. Now that said, I *hope* the bits that Nexenta delivered were *not* DEBUG. But I've seen at least one bug that makes me think we might be delivering DEBUG binaries. I'll check into it. -- Garrett

-- Rich

Haudy Kazemi wrote: Could it somehow not be compiling 64-bit support? -- Brent Jones

I thought about that, but it says when it boots up that it is 64-bit, and I'm able to run 64-bit binaries. I wonder if it's compiling for the wrong processor optimization though? Maybe if it is missing some of the newer SSEx instructions, the zpool checksum checking is slowed down significantly? I don't know how to check for this though, and it seems strange it would slow things down this significantly. I'd expect even a non-SSE enabled binary to be able to calculate a few hundred MB of checksums per second on a 2.5+ GHz processor. Chad

Would it be possible to do a closer comparison between Rich Lowe's fast 142 build and your slow 142 build? For example, run a diff on the source, build options, and build scripts. If the build settings are close enough, a comparison of the generated binaries might be a faster way to narrow things down (if the optimizations are different then a resultant binary comparison probably won't be useful). You said previously that: The procedure I followed was basically what is outlined here: http://insanum.com/blog/2010/06/08/how-to-build-opensolaris using the SunStudio 12 compilers for ON and 12u1 for lint. Are these the same compiler versions Rich Lowe used? Maybe there is a compiler optimization bug. Rich Lowe's build readme doesn't tell us which compiler he used. http://genunix.org/dist/richlowe/README.txt

I suppose the easiest way for me to confirm whether there is a regression or my compiling is flawed is to just try compiling snv_142 using the same procedure and see if it works as well as Rich Lowe's copy or if it's slow like my other compilations. Chad

Another older compilation guide: http://hub.opensolaris.org/bin/view/Community+Group+tools/building_opensolaris

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
Hi, My bits were originally debug because I didn't know any better. I thought I had then recompiled without debug to test again, but I didn't realize until just now that the packages end up in a different directory (nightly vs nightly-nd), so I believe after compiling non-debug I just reinstalled the debug bits. I'm about to test again with an actual non-debug 142, and after that a non-debug 145, which just came out. Thanks, Chad

On Wed, Jul 21, 2010 at 02:21:51AM -0400, Richard Lowe wrote: I built in the normal fashion, with the CBE compilers (cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30), and 12u1 lint. I'm not subscribed to zfs-discuss, but have you established whether the problematic build is DEBUG? (the bits I uploaded were non-DEBUG). -- Rich

Haudy Kazemi wrote: Could it somehow not be compiling 64-bit support? -- Brent Jones

I thought about that, but it says when it boots up that it is 64-bit, and I'm able to run 64-bit binaries. I wonder if it's compiling for the wrong processor optimization though? Maybe if it is missing some of the newer SSEx instructions, the zpool checksum checking is slowed down significantly? I don't know how to check for this though, and it seems strange it would slow things down this significantly. I'd expect even a non-SSE enabled binary to be able to calculate a few hundred MB of checksums per second on a 2.5+ GHz processor. Chad

Would it be possible to do a closer comparison between Rich Lowe's fast 142 build and your slow 142 build? For example, run a diff on the source, build options, and build scripts. If the build settings are close enough, a comparison of the generated binaries might be a faster way to narrow things down (if the optimizations are different then a resultant binary comparison probably won't be useful). You said previously that: The procedure I followed was basically what is outlined here: http://insanum.com/blog/2010/06/08/how-to-build-opensolaris using the SunStudio 12 compilers for ON and 12u1 for lint. Are these the same compiler versions Rich Lowe used? Maybe there is a compiler optimization bug. Rich Lowe's build readme doesn't tell us which compiler he used. http://genunix.org/dist/richlowe/README.txt

I suppose the easiest way for me to confirm whether there is a regression or my compiling is flawed is to just try compiling snv_142 using the same procedure and see if it works as well as Rich Lowe's copy or if it's slow like my other compilations. Chad

Another older compilation guide: http://hub.opensolaris.org/bin/view/Community+Group+tools/building_opensolaris

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
It does seem to be faster now that I really installed the non-debug bits. I let it resume a scrub after reboot, and while it's not as fast as it usually is (280 - 300 MB/s vs 500+), I assume it's just presently checking a part of the filesystem with smaller files, thus reducing the speed, since it's well past the prior limitation. I tested 142 non-debug briefly until the scrub reached at least 250 MB/s, and then booted into 145 non-debug, where I'm letting the scrub finish now. I'll test the Nexenta disc again to be sure it was slow, since I don't recall exactly how much time I gave it in my prior tests for the scrub to reach its normal speed, although I can't do that until this evening when I'm home again. Chad

On Wed, Jul 21, 2010 at 09:44:42AM -0700, Chad Cantwell wrote: Hi, My bits were originally debug because I didn't know any better. I thought I had then recompiled without debug to test again, but I didn't realize until just now that the packages end up in a different directory (nightly vs nightly-nd), so I believe after compiling non-debug I just reinstalled the debug bits. I'm about to test again with an actual non-debug 142, and after that a non-debug 145, which just came out. Thanks, Chad

On Wed, Jul 21, 2010 at 02:21:51AM -0400, Richard Lowe wrote: I built in the normal fashion, with the CBE compilers (cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30), and 12u1 lint. I'm not subscribed to zfs-discuss, but have you established whether the problematic build is DEBUG? (the bits I uploaded were non-DEBUG). -- Rich

Haudy Kazemi wrote: Could it somehow not be compiling 64-bit support? -- Brent Jones

I thought about that, but it says when it boots up that it is 64-bit, and I'm able to run 64-bit binaries. I wonder if it's compiling for the wrong processor optimization though? Maybe if it is missing some of the newer SSEx instructions, the zpool checksum checking is slowed down significantly? I don't know how to check for this though, and it seems strange it would slow things down this significantly. I'd expect even a non-SSE enabled binary to be able to calculate a few hundred MB of checksums per second on a 2.5+ GHz processor. Chad

Would it be possible to do a closer comparison between Rich Lowe's fast 142 build and your slow 142 build? For example, run a diff on the source, build options, and build scripts. If the build settings are close enough, a comparison of the generated binaries might be a faster way to narrow things down (if the optimizations are different then a resultant binary comparison probably won't be useful). You said previously that: The procedure I followed was basically what is outlined here: http://insanum.com/blog/2010/06/08/how-to-build-opensolaris using the SunStudio 12 compilers for ON and 12u1 for lint. Are these the same compiler versions Rich Lowe used? Maybe there is a compiler optimization bug. Rich Lowe's build readme doesn't tell us which compiler he used. http://genunix.org/dist/richlowe/README.txt

I suppose the easiest way for me to confirm whether there is a regression or my compiling is flawed is to just try compiling snv_142 using the same procedure and see if it works as well as Rich Lowe's copy or if it's slow like my other compilations. Chad

Another older compilation guide: http://hub.opensolaris.org/bin/view/Community+Group+tools/building_opensolaris

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
On Mon, Jul 19, 2010 at 07:01:54PM -0700, Chad Cantwell wrote: On Tue, Jul 20, 2010 at 10:54:44AM +1000, James C. McPherson wrote: On 20/07/10 10:40 AM, Chad Cantwell wrote: fyi, everyone, I have some more info here. in short, rich lowe's 142 works correctly (fast) on my hardware, while both my compilations (snv 143, snv 144) and also the Nexenta 3 RC2 kernel (134 with backports) are horribly slow. I finally got around to trying rich lowe's snv 142 compilation in place of my own compilation of 143 (and later 144, not mentioned below), and unlike my own two compilations, his works very fast again on my same zpool (scrubbing avg increased from low 100s to over 400 MB/s within a few minutes after booting into this copy of 142). I should note that since my original message, I also tried booting from a Nexenta Core 3.0 RC2 ISO after realizing it had zpool 26 support backported into 134, and it was in fact able to read my zpool despite the upgraded version. Running a scrub from the F2 shell on the Nexenta CD was also slow, just like the 143 and 144 that I compiled. So, there seem to be two possibilities. Either (and this seems unlikely) there is a problem introduced post-142 which slows things down, and it occurred in 143 and 144 and was brought back to 134 with Nexenta's backports, or else (more likely) there is something different or wrong with how I'm compiling the kernel that makes the hardware not perform up to its specifications with a zpool, and possibly the Nexenta 3 RC2 ISO has the same problem as my own compilations.

So - what's your env file contents, which closed bins are you using, which crypto bits are you using, and what changeset is your own workspace synced with? James C. McPherson -- Oracle http://www.jmcp.homeunix.com/blog

The procedure I followed was basically what is outlined here: http://insanum.com/blog/2010/06/08/how-to-build-opensolaris using the SunStudio 12 compilers for ON and 12u1 for lint. For each build (143, 144) I cloned the exact tag for that build, i.e.:

# hg clone ssh://a...@hg.opensolaris.org/hg/onnv/onnv-gate onnv-b144
# cd onnv-b144
# hg update onnv_144

Then I downloaded the corresponding closed and crypto bins from http://dlc.sun.com/osol/on/downloads/b143 or http://dlc.sun.com/osol/on/downloads/b144 The only environment variables I modified from the default opensolaris.sh file were the basic ones: GATE, CODEMGR_WS, STAFFER, and ON_CRYPTO_BINS, to point to my work directory for the build, my username, and the relevant crypto bin:

$ egrep -e "^GATE|^CODEMGR_WS|^STAFFER|^ON_CRYPTO_BINS" opensolaris.sh
GATE=onnv-b144; export GATE
CODEMGR_WS=/work/compiling/$GATE; export CODEMGR_WS
STAFFER=chad; export STAFFER
ON_CRYPTO_BINS=$CODEMGR_WS/on-crypto-latest.$MACH.tar.bz2

I suppose the easiest way for me to confirm whether there is a regression or my compiling is flawed is to just try compiling snv_142 using the same procedure and see if it works as well as Rich Lowe's copy or if it's slow like my other compilations. Chad

I've just compiled and booted into snv_142, and I experienced the same slow dd and scrubbing as I did with my 143 and 144 compilations and with the Nexenta 3 RC2 CD. So, this would seem to indicate a build environment/process flaw rather than a regression. Chad

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
Yes, I think this might have been it. I missed the NIGHTLY_OPTIONS variable in opensolaris.sh and I think it was compiling a debug build. I'm not sure what the ramifications of this are or how much slower a debug build should be, but I'm recompiling a release build now, so hopefully all will be well. Thanks, Chad

On Tue, Jul 20, 2010 at 08:39:42AM +0100, Robert Milkowski wrote: On 20/07/2010 07:59, Chad Cantwell wrote: I've just compiled and booted into snv_142, and I experienced the same slow dd and scrubbing as I did with my 143 and 144 compilations and with the Nexenta 3 RC2 CD. So, this would seem to indicate a build environment/process flaw rather than a regression.

Are you sure it is not a debug vs. non-debug issue? -- Robert Milkowski http://milek.blogspot.com

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
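For anyone else tripped up by the same thing: the knob is NIGHTLY_OPTIONS in the env file. Per nightly's option list, 'D' builds DEBUG objects and 'F' suppresses the non-DEBUG build; the exact default flag string below is illustrative rather than checked against a stock opensolaris.sh:

# DEBUG-only build; packages land in packages/$MACH/nightly
NIGHTLY_OPTIONS="-FnCDlmprt"; export NIGHTLY_OPTIONS
# drop D and F for a non-DEBUG build; packages land in packages/$MACH/nightly-nd
NIGHTLY_OPTIONS="-nClmprt"; export NIGHTLY_OPTIONS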
Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
No, this wasn't it. A non-debug build with the same NIGHTLY_OPTIONS as Rich Lowe's 142 build is still very slow...

On Tue, Jul 20, 2010 at 09:52:10AM -0700, Chad Cantwell wrote: Yes, I think this might have been it. I missed the NIGHTLY_OPTIONS variable in opensolaris.sh and I think it was compiling a debug build. I'm not sure what the ramifications of this are or how much slower a debug build should be, but I'm recompiling a release build now, so hopefully all will be well. Thanks, Chad

On Tue, Jul 20, 2010 at 08:39:42AM +0100, Robert Milkowski wrote: On 20/07/2010 07:59, Chad Cantwell wrote: I've just compiled and booted into snv_142, and I experienced the same slow dd and scrubbing as I did with my 143 and 144 compilations and with the Nexenta 3 RC2 CD. So, this would seem to indicate a build environment/process flaw rather than a regression.

Are you sure it is not a debug vs. non-debug issue? -- Robert Milkowski http://milek.blogspot.com

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
On Tue, Jul 20, 2010 at 10:45:58AM -0700, Brent Jones wrote: On Tue, Jul 20, 2010 at 10:29 AM, Chad Cantwell c...@iomail.org wrote: No, this wasn't it. A non-debug build with the same NIGHTLY_OPTIONS as Rich Lowe's 142 build is still very slow...

On Tue, Jul 20, 2010 at 09:52:10AM -0700, Chad Cantwell wrote: Yes, I think this might have been it. I missed the NIGHTLY_OPTIONS variable in opensolaris.sh and I think it was compiling a debug build. I'm not sure what the ramifications of this are or how much slower a debug build should be, but I'm recompiling a release build now, so hopefully all will be well. Thanks, Chad

On Tue, Jul 20, 2010 at 08:39:42AM +0100, Robert Milkowski wrote: On 20/07/2010 07:59, Chad Cantwell wrote: I've just compiled and booted into snv_142, and I experienced the same slow dd and scrubbing as I did with my 143 and 144 compilations and with the Nexenta 3 RC2 CD. So, this would seem to indicate a build environment/process flaw rather than a regression. Are you sure it is not a debug vs. non-debug issue? -- Robert Milkowski http://milek.blogspot.com

Could it somehow not be compiling 64-bit support? -- Brent Jones br...@servuhome.net

I thought about that, but it says when it boots up that it is 64-bit, and I'm able to run 64-bit binaries. I wonder if it's compiling for the wrong processor optimization though? Maybe if it is missing some of the newer SSEx instructions, the zpool checksum checking is slowed down significantly? I don't know how to check for this though, and it seems strange it would slow things down this significantly. I'd expect even a non-SSE enabled binary to be able to calculate a few hundred MB of checksums per second on a 2.5+ GHz processor. Chad

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
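Two quick checks for the 64-bit side of that question (standard Solaris commands; they won't settle the SSE-optimization question, but they do rule out a 32-bit kernel):

$ isainfo -kv        # e.g. "64-bit amd64 kernel modules"
$ isainfo -x         # instruction-set extensions the system reports (sse2, etc.)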
Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
fyi, everyone, I have some more info here. in short, rich lowe's 142 works correctly (fast) on my hardware, while both my compilations (snv 143, snv 144) and also the Nexenta 3 RC2 kernel (134 with backports) are horribly slow. I finally got around to trying rich lowe's snv 142 compilation in place of my own compilation of 143 (and later 144, not mentioned below), and unlike my own two compilations, his works very fast again on my same zpool (scrubbing avg increased from low 100s to over 400 MB/s within a few minutes after booting into this copy of 142). I should note that since my original message, I also tried booting from a Nexenta Core 3.0 RC2 ISO after realizing it had zpool 26 support backported into 134, and it was in fact able to read my zpool despite the upgraded version. Running a scrub from the F2 shell on the Nexenta CD was also slow, just like the 143 and 144 that I compiled. So, there seem to be two possibilities. Either (and this seems unlikely) there is a problem introduced post-142 which slows things down, and it occurred in 143 and 144 and was brought back to 134 with Nexenta's backports, or else (more likely) there is something different or wrong with how I'm compiling the kernel that makes the hardware not perform up to its specifications with a zpool, and possibly the Nexenta 3 RC2 ISO has the same problem as my own compilations. Chad

On Tue, Jul 06, 2010 at 03:08:50PM -0700, Chad Cantwell wrote: Hi all, I've noticed something strange in the throughput of my zpool between different snv builds, and I'm not sure if it's an inherent difference in the build or a kernel parameter that differs between the builds. I've set up two similar machines and this happens with both of them. Each system has 16 2TB Samsung HD203WI drives (total) directly connected to two LSI 3081E-R 1068e cards with IT firmware in one raidz3 vdev. In both computers, after a fresh installation of snv 134, the throughput is a maximum of about 300 MB/s during scrub or something like dd if=/dev/zero bs=1024k of=bigfile. If I bfu to snv 138, I then get throughput of about 700 MB/s with both scrub and a single-thread dd. I assumed at first this was some sort of bug or regression in 134 that made it slow. However, I've now also tested, from the fresh 134 installation, compiling the OS/Net build 143 from the mercurial repository and booting into it, after which the dd throughput is still only about 300 MB/s, just like snv 134. The scrub throughput in 143 is even slower, rarely surpassing 150 MB/s. I wonder if the scrubbing being extra slow here is related to the additional statistics displayed during the scrub that didn't used to be shown. Is there some kind of debug option that might be enabled in the 134 build and persist if I compile snv 143, which would be off if I installed a 138 through bfu? If not, it makes me think that the bfu to 138 is changing the configuration somewhere to make it faster, rather than fixing a bug or being a debug flag on or off. Does anyone have any idea what might be happening? One thing I haven't tried is bfu'ing to 138, and from this faster-working snv 138 installing the snv 143 build, which may possibly create a 143 that performs faster if it's simply a configuration parameter. I'm not sure offhand if installing source-compiled ON builds from a bfu'd rpool is supported, although I suppose it's simple enough to try.
Thanks, Chad Cantwell

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
On Tue, Jul 20, 2010 at 10:54:44AM +1000, James C. McPherson wrote: On 20/07/10 10:40 AM, Chad Cantwell wrote: fyi, everyone, I have some more info here. in short, rich lowe's 142 works correctly (fast) on my hardware, while both my compilations (snv 143, snv 144) and also the Nexenta 3 RC2 kernel (134 with backports) are horribly slow. I finally got around to trying rich lowe's snv 142 compilation in place of my own compilation of 143 (and later 144, not mentioned below), and unlike my own two compilations, his works very fast again on my same zpool (scrubbing avg increased from low 100s to over 400 MB/s within a few minutes after booting into this copy of 142). I should note that since my original message, I also tried booting from a Nexenta Core 3.0 RC2 ISO after realizing it had zpool 26 support backported into 134, and it was in fact able to read my zpool despite the upgraded version. Running a scrub from the F2 shell on the Nexenta CD was also slow, just like the 143 and 144 that I compiled. So, there seem to be two possibilities. Either (and this seems unlikely) there is a problem introduced post-142 which slows things down, and it occurred in 143 and 144 and was brought back to 134 with Nexenta's backports, or else (more likely) there is something different or wrong with how I'm compiling the kernel that makes the hardware not perform up to its specifications with a zpool, and possibly the Nexenta 3 RC2 ISO has the same problem as my own compilations.

So - what's your env file contents, which closed bins are you using, which crypto bits are you using, and what changeset is your own workspace synced with? James C. McPherson -- Oracle http://www.jmcp.homeunix.com/blog

The procedure I followed was basically what is outlined here: http://insanum.com/blog/2010/06/08/how-to-build-opensolaris using the SunStudio 12 compilers for ON and 12u1 for lint. For each build (143, 144) I cloned the exact tag for that build, i.e.:

# hg clone ssh://a...@hg.opensolaris.org/hg/onnv/onnv-gate onnv-b144
# cd onnv-b144
# hg update onnv_144

Then I downloaded the corresponding closed and crypto bins from http://dlc.sun.com/osol/on/downloads/b143 or http://dlc.sun.com/osol/on/downloads/b144 The only environment variables I modified from the default opensolaris.sh file were the basic ones: GATE, CODEMGR_WS, STAFFER, and ON_CRYPTO_BINS, to point to my work directory for the build, my username, and the relevant crypto bin:

$ egrep -e "^GATE|^CODEMGR_WS|^STAFFER|^ON_CRYPTO_BINS" opensolaris.sh
GATE=onnv-b144; export GATE
CODEMGR_WS=/work/compiling/$GATE; export CODEMGR_WS
STAFFER=chad; export STAFFER
ON_CRYPTO_BINS=$CODEMGR_WS/on-crypto-latest.$MACH.tar.bz2

I suppose the easiest way for me to confirm whether there is a regression or my compiling is flawed is to just try compiling snv_142 using the same procedure and see if it works as well as Rich Lowe's copy or if it's slow like my other compilations. Chad

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
On Mon, Jul 19, 2010 at 06:00:04PM -0700, Brent Jones wrote: On Mon, Jul 19, 2010 at 5:40 PM, Chad Cantwell c...@iomail.org wrote: fyi, everyone, I have some more info here. in short, rich lowe's 142 works correctly (fast) on my hardware, while both my compilations (snv 143, snv 144) and also the Nexenta 3 RC2 kernel (134 with backports) are horribly slow. I finally got around to trying rich lowe's snv 142 compilation in place of my own compilation of 143 (and later 144, not mentioned below), and unlike my own two compilations, his works very fast again on my same zpool (scrubbing avg increased from low 100s to over 400 MB/s within a few minutes after booting into this copy of 142). I should note that since my original message, I also tried booting from a Nexenta Core 3.0 RC2 ISO after realizing it had zpool 26 support backported into 134, and it was in fact able to read my zpool despite the upgraded version. Running a scrub from the F2 shell on the Nexenta CD was also slow, just like the 143 and 144 that I compiled. So, there seem to be two possibilities. Either (and this seems unlikely) there is a problem introduced post-142 which slows things down, and it occurred in 143 and 144 and was brought back to 134 with Nexenta's backports, or else (more likely) there is something different or wrong with how I'm compiling the kernel that makes the hardware not perform up to its specifications with a zpool, and possibly the Nexenta 3 RC2 ISO has the same problem as my own compilations. Chad

On Tue, Jul 06, 2010 at 03:08:50PM -0700, Chad Cantwell wrote: Hi all, I've noticed something strange in the throughput of my zpool between different snv builds, and I'm not sure if it's an inherent difference in the build or a kernel parameter that differs between the builds. I've set up two similar machines and this happens with both of them. Each system has 16 2TB Samsung HD203WI drives (total) directly connected to two LSI 3081E-R 1068e cards with IT firmware in one raidz3 vdev. In both computers, after a fresh installation of snv 134, the throughput is a maximum of about 300 MB/s during scrub or something like dd if=/dev/zero bs=1024k of=bigfile. If I bfu to snv 138, I then get throughput of about 700 MB/s with both scrub and a single-thread dd. I assumed at first this was some sort of bug or regression in 134 that made it slow. However, I've now also tested, from the fresh 134 installation, compiling the OS/Net build 143 from the mercurial repository and booting into it, after which the dd throughput is still only about 300 MB/s, just like snv 134. The scrub throughput in 143 is even slower, rarely surpassing 150 MB/s. I wonder if the scrubbing being extra slow here is related to the additional statistics displayed during the scrub that didn't used to be shown. Is there some kind of debug option that might be enabled in the 134 build and persist if I compile snv 143, which would be off if I installed a 138 through bfu? If not, it makes me think that the bfu to 138 is changing the configuration somewhere to make it faster, rather than fixing a bug or being a debug flag on or off. Does anyone have any idea what might be happening? One thing I haven't tried is bfu'ing to 138, and from this faster-working snv 138 installing the snv 143 build, which may possibly create a 143 that performs faster if it's simply a configuration parameter. I'm not sure offhand if installing source-compiled ON builds from a bfu'd rpool is supported, although I suppose it's simple enough to try.
Thanks, Chad Cantwell

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

I'm surprised you're even getting 400MB/s on the fast configurations, with only 16 drives in a Raidz3 configuration. To me, 16 drives in Raidz3 (single vdev) would do about 150MB/sec, as your slow speeds suggest. -- Brent Jones br...@servuhome.net

With which drives and controllers? For a single dd thread writing a large file to fill up a new zpool from /dev/zero, in this configuration I can sustain over 700 MB/s for the duration of the process, and can fill up the ~26T of usable space overnight. This is with two 8-port LSI 1068e controllers and no expanders. RAIDZ operates similarly to regular RAID, and you should get striped speeds for sequential access, minus any inefficiencies and processing time for the parity. 16 disks in raidz3 is 13 disks' worth of striping, so at ~700 MB/s I'm getting about 50% efficiency after the parity calculations etc., which is fine with me. I understand that some people need to have higher performance random I/O to many
[zfs-discuss] zpool throughput: snv 134 vs 138 vs 143
Hi all, I've noticed something strange in the throughput of my zpool between different snv builds, and I'm not sure if it's an inherent difference in the build or a kernel parameter that differs between the builds. I've set up two similar machines and this happens with both of them. Each system has 16 2TB Samsung HD203WI drives (total) directly connected to two LSI 3081E-R 1068e cards with IT firmware in one raidz3 vdev. In both computers, after a fresh installation of snv 134, the throughput is a maximum of about 300 MB/s during scrub or something like dd if=/dev/zero bs=1024k of=bigfile. If I bfu to snv 138, I then get throughput of about 700 MB/s with both scrub and a single-thread dd. I assumed at first this was some sort of bug or regression in 134 that made it slow. However, I've now also tested, from the fresh 134 installation, compiling the OS/Net build 143 from the mercurial repository and booting into it, after which the dd throughput is still only about 300 MB/s, just like snv 134. The scrub throughput in 143 is even slower, rarely surpassing 150 MB/s. I wonder if the scrubbing being extra slow here is related to the additional statistics displayed during the scrub that didn't used to be shown. Is there some kind of debug option that might be enabled in the 134 build and persist if I compile snv 143, which would be off if I installed a 138 through bfu? If not, it makes me think that the bfu to 138 is changing the configuration somewhere to make it faster, rather than fixing a bug or being a debug flag on or off. Does anyone have any idea what might be happening? One thing I haven't tried is bfu'ing to 138, and from this faster-working snv 138 installing the snv 143 build, which may possibly create a 143 that performs faster if it's simply a configuration parameter. I'm not sure offhand if installing source-compiled ON builds from a bfu'd rpool is supported, although I suppose it's simple enough to try. Thanks, Chad Cantwell

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
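For anyone wanting to reproduce the numbers, the tests described above boil down to the following (pool name and sizes are examples; note /dev/zero data is highly compressible, so leave compression off for this):

# sequential write test, ~16 GB
$ dd if=/dev/zero of=/tank/bigfile bs=1024k count=16384
# scrub throughput
$ zpool scrub tank
$ zpool status tank      # run repeatedly; the scrub line shows rate/progress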
[zfs-discuss] compressed root pool at installation time with flash archive predeployment script
I was trying to think of a way to set compression=on at the beginning of a jumpstart. The only idea I've come up with is to do so with a flash archive predeployment script. Has anyone else tried this approach? Thanks, Chad

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
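In case it helps anyone else, a minimal sketch of what such a predeployment script might look like (the pool name, and the assumption that the root pool already exists by the time predeployment scripts run, are both unverified here):

#!/bin/sh
# flash archive predeployment script: turn on compression before the
# archive payload is extracted, so the deployed files are written compressed
zfs set compression=on rpool
exit 0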
[zfs-discuss] ZFS replace - many to one
I'm looking to migrate a pool from using multiple smaller LUNs to one larger LUN. I don't see a way to do a zpool replace for multiple-to-one. Anybody know how to do this? It needs to be non-disruptive. -- This message posted from opensolaris.org

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
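For context, zpool replace is strictly one-to-one, so there is no direct route; a sketch of the two realistic options (pool and device names are made up):

# one-to-one replace is non-disruptive: swap each small LUN for a larger one,
# waiting for resilver between steps; the pool grows once every device in the
# vdev is bigger (autoexpand or an export/import, depending on build)
zpool replace tank c1t0d0 c5t0d0
# many-to-one really means a new pool plus send/receive, which is
# disruptive at cutover:
zpool create newtank c6t0d0
zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs recv -Fd newtank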
Re: [zfs-discuss] mpt errors on snv 127
fyi to everyone, the Asus P5W64 motherboard previously in my opensolaris machine was the culprit, and not the general mpt issues. At the time the motherboard was originally put in that machine, there was not enough zfs i/o load to trigger the problem, which led to the false impression the hardware was fine. I'm using a 5400-chipset Xeon board now (Asus DSEB-GH) and my LSI cards are working perfectly again; over 2 hours of heavy I/O and no errors or warnings with snv 127 (with the P5W64/LSI combo and build 127 it would never run more than 15 minutes without warnings). I chose this board partly because it has PCI-X slots and I thought those might be useful for AOC-SAT2-MV8 cards if I couldn't shake the mpt issues, but now that the mpt issues are gone I can continue with that controller if I want. Thanks everyone for your help, Chad

On Sun, Dec 06, 2009 at 11:12:50PM -0800, Chad Cantwell wrote: Thanks for the info on the yukon driver. I realize too many variables makes things impossible to determine, but I had made these hardware changes awhile back, and they seemed to work fine at the time. Since they aren't now, even in the older OpenSolaris (I've tried 2009.06 and 2008.11 now), the problem seems to be a hardware quirk, and the only way to narrow that down is to change hardware back until it works like it used to in at least the older snv builds. I've ruled out the ethernet controller. I'm leaning toward the current motherboard (Asus P5W64) not playing nicely with the LSI cards, but it will probably be several days until I get to the bottom of this since it takes awhile to test after making a change... Thanks, Chad

On Mon, Dec 07, 2009 at 11:09:39AM +1000, James C. McPherson wrote: Gday Chad, the more swaptronics you partake in, the more difficult it is going to be for us (collectively) to figure out what is going wrong on your system. Btw, since you're running a build past 124, you can use the yge driver instead of the yukonx (from Marvell) or myk (from Murayama-san) drivers. As another comment in this thread has mentioned, a full scrub can be a serious test of your hardware depending on how much data you've got to walk over. If you can keep the hardware variables to a minimum then clarity will be more achievable. thankyou, James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpt errors on snv 127
Thanks for the info on the yukon driver. I realize too many variables makes things impossible to determine, but I had made these hardware changes awhile back, and they seemed to work fine at the time. Since they aren't now, even in the older OpenSolaris (i've tried 2009.06 and 2008.11 now), the problem seems to be a hardware quirk, and the only way to narrow that down is to change hardware back until it works like it used to in at least the older snv builds. I've ruled out the ethernet controller. I'm leaning toward the current motherboard (Asus P5W64) not playing nicely with the LSI cards, but it will probably be several days until I get to the bottom of this since it takes awhile to test after making a change... Thanks, Chad On Mon, Dec 07, 2009 at 11:09:39AM +1000, James C. McPherson wrote: Gday Chad, the more swaptronics you partake in, the more difficult it is going to be for us (collectively) to figure out what is going wrong on your system. Btw, since you're running a build past 124, you can use the yge driver instead of the yukonx (from Marvell) or myk (from Murayama-san) drivers. As another comment in this thread has mentioned, a full scrub can be a serious test of your hardware depending on how much data you've got to walk over. If you can keep the hardware variables to a minimum then clarity will be more achievable. thankyou, James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpt errors on snv 127
Hi all, Unfortunately for me, there does seem to be a hardware component to my problem. Although my rsync copied almost 4TB of data with no iostat errors after going back to OpenSolaris 2009.06, I/O on one of my mpt cards did eventually hang, with 6 disk lights on and 2 off, until rebooting. There are a few hardware changes made since the last time I did a full backup, so it's possible that whatever problem was introduced didn't happen frequently enough under low i/o usage for me to detect it until now, when I was reinstalling and copying massive amounts of data back. The changes I had made since originally installing osol2009.06 several months ago are:

- stopped using the onboard Marvell Yukon2 ethernet (which used a 3rd party driver) in favor of an Intel 1000 PT dual port, which necessitated an extra pci-e slot, prompting the following item:
- swapped motherboards between 2 machines (they were similar though, with similar onboard hardware, and it shouldn't have been a major change). Originally an Asus P5Q Deluxe w/3 pci-e slots, now a slightly older Asus P5W64 w/4 pci-e slots.
- the Intel 1000 PT dual port card has been aggregated as aggr0 since it was installed (the older Yukon2 was a basic interface)

the above changes were made a while ago, before upgrading opensolaris to 127, and things seemed to be working fine for at least 2-3 months with rsync updating (it never hung, had a fatal zfs error, or lost access to data requiring a reboot).

new changes since troubleshooting the snv 127 mpt issues:

- upgraded LSI 3081 firmware from 1.28.2 (or was it .02) to 1.29, the latest. If this turns out to be an issue, I do have the previous IT firmware that I was using before, which I can flash back.

another, albeit unlikely, factor: when I originally copied all my data to my first opensolaris raidz2 pool, I didn't use rsync at all, I used netcat + tar, and only set up rsync later for updates. Perhaps the huge initial single rsync of the large tree does something strange that the original initial netcat + tar copy did not (I know, unlikely, but I'm grasping at straws here to determine what has happened).

I'll work on ruling out the potential sources of hardware problems before I report any more on the mpt issues, since my test case would probably confound things at this point. I am affected by the mpt bugs, since I would get the timeouts almost constantly in snv 127+, but since I'm also apparently affected by some other unknown hardware issue, my data on the mpt problems might lead people in the wrong direction at this point. I will first try going back to the non-aggregated Yukon ethernet and removing the Intel dual port pci-e network adapter, then if the problem persists try half of my drives on each LSI controller individually, to confirm whether one controller has a problem the other does not, or one drive in one set is causing a new problem for a particular controller. I hope to have some kind of answer at that point and not have to resort to motherboard swapping again. Chad

On Thu, Dec 03, 2009 at 10:44:53PM -0800, Chad Cantwell wrote: I eventually performed a few more tests, adjusting some zfs tuning options, which had no effect, and trying the itmpt driver, which someone had said would work, and regardless my system would always freeze quite rapidly in snv 127 and 128a. Just to double-check my hardware, I went back to the opensolaris 2009.06 release version, and everything is working fine. The system has been running a few hours and has copied a lot of data without any trouble, mpt syslog events, or iostat errors.
One thing I found interesting, and I don't know if it's significant or not, is that under the recent builds and under 2009.06, I had run echo '::interrupts' | mdb -k to check the interrupts used. (I don't have the printout handy for snv 127+, though.) I have a dual port gigabit Intel 1000 P PCI-e card, which shows up as e1000g0 and e1000g1. In snv 127+, each of my e1000g devices shares an IRQ with my mpt devices (mpt0, mpt1) in the IRQ listing, whereas in opensolaris 2009.06, all 4 devices are on different IRQs. I don't know if this is significant, but most of my testing when I encountered errors was data transfer via the network, so it could potentially have been interfering with the mpt drivers when it was on the same IRQ. The errors did seem to be less frequent when the server I was copying from was linked at 100 instead of 1000 (one of my tests), but that is as likely to be a result of the slower zpool throughput as it is to be related to the network traffic. I'll probably stay with 2009.06 for now since it works fine for me, but I can try a newer build again once some more progress is made in this area and people want to see if it's fixed (this machine is mainly to back up another array, so it's not too big a deal to test later, when the mpt drivers are looking better, and wipe again in the event of problems).
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
I was under the impression that the problem affecting most of us was introduced much later than b104, sometime between ~114 and ~118. When I first started using my LSI 3081 cards, they had the IR firmware on them, and it caused me all kinds of problems. The disks showed up but I couldn't write to them, I believe. Eventually I found that I needed the IT firmware for them to work properly, which is what I have used ever since, but maybe some builds do work with IR firmware? I remember that back when I was originally trying to set them up with the IR firmware, OpenSolaris saw my two cards as one device, whereas with the IT firmware they were always mpt0 and mpt1. It could also be that IR works with one card but not well when two cards are combined... Chad

On Sat, Dec 05, 2009 at 02:47:55PM -0800, Calvin Morrow wrote: I found this thread after fighting the same problem in Nexenta, which uses the OpenSolaris kernel from b104. Thankfully, I think I have (for the moment) solved my problem. Background: I have an LSI 3081E-R (1068E based) adapter which experiences the same disconnected command timeout error under relatively light load. This card connects to a Supermicro chassis using 2 MiniSAS cables to redundant expanders that are attached to 18 SAS drives. The card ran the latest IT firmware (1.29?). This server is a new install, and even installing from the CD to two disks in a mirrored ZFS root would randomly cause the disconnect error. The system remained unresponsive until after a reboot. I tried the workarounds mentioned in this thread, namely using set mpt:mpt_enable_msi = 0 and set xpv_psm:xen_support_msi = -1 in /etc/system. Once I added those lines, the system never really became unresponsive; however, there were partial read and partial write messages that littered dmesg. At one point there appeared to be a disconnect error (cannot confirm) that the system recovered from. Eventually, I became desperate and flashed the IR (Integrated RAID) firmware over the top of the IT firmware. Since then, I have had no errors in dmesg of any kind. I even removed the workarounds from /etc/system and still have had no issues. The mpt driver is exceptionally quiet now. I'm interested to know if anyone who has a 1068E based card is having these problems using the IR firmware, or if they all seem to be IT (initiator target) related. -- This message posted from opensolaris.org

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpt errors on snv 127
I eventually performed a few more tests, adjusting some zfs tuning options, which had no effect, and trying the itmpt driver, which someone had said would work, and regardless my system would always freeze quite rapidly in snv 127 and 128a. Just to double-check my hardware, I went back to the opensolaris 2009.06 release version, and everything is working fine. The system has been running a few hours and has copied a lot of data without any trouble, mpt syslog events, or iostat errors.

One thing I found interesting, and I don't know if it's significant or not, is that under the recent builds and under 2009.06, I had run echo '::interrupts' | mdb -k to check the interrupts used. (I don't have the printout handy for snv 127+, though.) I have a dual port gigabit Intel 1000 P PCI-e card, which shows up as e1000g0 and e1000g1. In snv 127+, each of my e1000g devices shares an IRQ with my mpt devices (mpt0, mpt1) in the IRQ listing, whereas in opensolaris 2009.06, all 4 devices are on different IRQs. I don't know if this is significant, but most of my testing when I encountered errors was data transfer via the network, so it could potentially have been interfering with the mpt drivers when it was on the same IRQ. The errors did seem to be less frequent when the server I was copying from was linked at 100 instead of 1000 (one of my tests), but that is as likely to be a result of the slower zpool throughput as it is to be related to the network traffic. I'll probably stay with 2009.06 for now since it works fine for me, but I can try a newer build again once some more progress is made in this area and people want to see if it's fixed (this machine is mainly to back up another array, so it's not too big a deal to test later, when the mpt drivers are looking better, and wipe again in the event of problems). Chad

On Tue, Dec 01, 2009 at 03:06:31PM -0800, Chad Cantwell wrote: To update everyone, I did a complete zfs scrub, and it generated no errors in iostat, and I have 4.8T of data on the filesystem, so it was a fairly lengthy test. The machine also has exhibited no evidence of instability. If I were to start copying a lot of data to the filesystem again, though, I'm sure it would generate errors and crash again. Chad

On Tue, Dec 01, 2009 at 12:29:16AM -0800, Chad Cantwell wrote: Well, ok, the msi=0 thing didn't help after all. A few minutes after my last message a few errors showed up in iostat, and then in a few minutes more the machine was locked up hard... Maybe I will try just doing a scrub instead of my rsync process and see how that does. Chad

On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote: I don't think the hardware has any problems, it only started having errors when I upgraded OpenSolaris. It's still working fine again now after a reboot. Actually, I reread one of your earlier messages, and I didn't realize at first when you said non-Sun JBOD that this didn't apply to me (in regards to the msi=0 fix) because I didn't realize JBOD was shorthand for an external expander device. Since I'm just using bare metal and passive backplanes, I think the msi=0 fix should apply to me based on what you wrote earlier; anyway, I've put set mpt:mpt_enable_msi = 0 in /etc/system now and rebooted, as was suggested earlier. I've resumed my rsync, and so far there have been no errors, but it's only been 20 minutes or so.
I should have a good idea by tomorrow if this definitely fixed the problem (since even when the machine was not crashing it was tallying up iostat errors fairly rapidly) Thanks again for your help. Sorry for wasting your time if the previously posted workaround fixes things. I'll let you know tomorrow either way. Chad On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote: Chad Cantwell wrote: After another crash I checked the syslog and there were some different errors than the ones I saw previously during operation: ... Nov 30 20:59:13 the-vault LSI PCI device (1000,) not supported. ... Nov 30 20:59:13 the-vault mpt_config_space_init failed ... Nov 30 20:59:15 the-vault mpt_restart_ioc failed Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009 Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: the-vault Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16 Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63 Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid request. Nov 30 21:33:02 the-vault Refer to http://sun.com/msg/PCIEX-8000-8R for more information. Nov 30
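For anyone who wants to repeat the interrupt check mentioned above, the one-liner is echo '::interrupts' | mdb -k (run as root). The thing to look for is whether an mpt instance and an e1000g instance report their ISRs on the same row; the sample line below is illustrative only, not captured output from this machine, and the exact columns vary by build:

# echo '::interrupts' | mdb -k
IRQ  Vect IPL Bus   Trg Type   CPU Share APIC/INT# ISR(s)
24   0x60 5   PCI   Lvl Fixed  1   2     0x0/0x18  mpt_intr, e1000g_intr

A Share count above 1 with both drivers on one row would confirm the sharing Chad describes.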
Re: [zfs-discuss] mpt errors on snv 127
I don't think the hardware has any problems, it only started having errors when I upgraded OpenSolaris. It's still working fine again now after a reboot. Actually, I reread one of your earlier messages, and I didn't realize at first when you said non-Sun JBOD that this didn't apply to me (in regards to the msi=0 fix) because I didn't realize JBOD was shorthand for an external expander device. Since I'm just using bare metal, and passive backplanes, I think the msi=0 fix should apply to me based on what you wrote earlier, anyway, I've put set mpt:mpt_enable_msi = 0 now in /etc/system and rebooted as it was suggested earlier. I've resumed my rsync, and so far there have been no errors, but it's only been 20 minutes or so. I should have a good idea by tomorrow if this definitely fixed the problem (since even when the machine was not crashing it was tallying up iostat errors fairly rapidly). Thanks again for your help. Sorry for wasting your time if the previously posted workaround fixes things. I'll let you know tomorrow either way. Chad On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote: Chad Cantwell wrote: After another crash I checked the syslog and there were some different errors than the ones I saw previously during operation: ... Nov 30 20:59:13 the-vault LSI PCI device (1000,) not supported. ... Nov 30 20:59:13 the-vault mpt_config_space_init failed ... Nov 30 20:59:15 the-vault mpt_restart_ioc failed Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009 Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: the-vault Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16 Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63 Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid request. Nov 30 21:33:02 the-vault Refer to http://sun.com/msg/PCIEX-8000-8R for more information. Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances may be disabled Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the device instances associated with this fault Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and patches are installed. Otherwise schedule a repair procedure to replace the affected device(s). Use fmadm faulty to identify the devices or contact Sun for support. Sorry to have to tell you, but that HBA is dead. Or at least dying horribly. If you can't init the config space (that's the PCI bus config space), then you've got about 1/2 the nails in the coffin hammered in. Then the failure to restart the IOC (IO controller unit) == the rest of the lid hammered down. best regards, James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpt errors on snv 127
Well, ok, the msi=0 thing didn't help after all. A few minutes after my last message a few errors showed up in iostat, and then in a few minutes more the machine was locked up hard... Maybe I will try just doing a scrub instead of my rsync process and see how that does. Chad On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote: I don't think the hardware has any problems, it only started having errors when I upgraded OpenSolaris. It's still working fine again now after a reboot. Actually, I reread one of your earlier messages, and I didn't realize at first when you said non-Sun JBOD that this didn't apply to me (in regards to the msi=0 fix) because I didn't realize JBOD was shorthand for an external expander device. Since I'm just using bare metal, and passive backplanes, I think the msi=0 fix should apply to me based on what you wrote earlier, anyway, I've put set mpt:mpt_enable_msi = 0 now in /etc/system and rebooted as it was suggested earlier. I've resumed my rsync, and so far there have been no errors, but it's only been 20 minutes or so. I should have a good idea by tomorrow if this definitely fixed the problem (since even when the machine was not crashing it was tallying up iostat errors fairly rapidly). Thanks again for your help. Sorry for wasting your time if the previously posted workaround fixes things. I'll let you know tomorrow either way. Chad On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote: Chad Cantwell wrote: After another crash I checked the syslog and there were some different errors than the ones I saw previously during operation: ... Nov 30 20:59:13 the-vault LSI PCI device (1000,) not supported. ... Nov 30 20:59:13 the-vault mpt_config_space_init failed ... Nov 30 20:59:15 the-vault mpt_restart_ioc failed Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009 Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: the-vault Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16 Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63 Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid request. Nov 30 21:33:02 the-vault Refer to http://sun.com/msg/PCIEX-8000-8R for more information. Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances may be disabled Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the device instances associated with this fault Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and patches are installed. Otherwise schedule a repair procedure to replace the affected device(s). Use fmadm faulty to identify the devices or contact Sun for support. Sorry to have to tell you, but that HBA is dead. Or at least dying horribly. If you can't init the config space (that's the PCI bus config space), then you've got about 1/2 the nails in the coffin hammered in. Then the failure to restart the IOC (IO controller unit) == the rest of the lid hammered down. best regards, James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpt errors on snv 127
First I tried just upgrading to b127, which had a few issues besides the mpt driver. After that I did a clean install of b127, but no, I don't have my osol2009.06 root still there. I wasn't sure how to install another copy and leave it there (I suspect it is possible, since I saw when doing upgrades it creates a second root environment, but my forte isn't Solaris, so I just reformatted the root device). On Tue, Dec 01, 2009 at 08:09:32AM -0500, Mark Johnson wrote: Chad Cantwell wrote: Hi, I was using for quite a while OpenSolaris 2009.06 with the opensolaris-provided mpt driver to operate a zfs raidz2 pool of about 20T and this worked perfectly fine (no issues or device errors logged for several months, no hanging). A few days ago I decided to reinstall with the latest OpenSolaris in order to take advantage of raidz3. Just to be clear... The same setup was working fine on osol2009.06, you upgraded to b127 and it started failing? Did you keep the osol2009.06 BE around so you can reboot back to it? If so, have you tried the osol2009.06 mpt driver in the BE with the latest bits (make sure you make a backup copy of the mpt driver)? MRJ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
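As an aside, on OpenSolaris the old root can be kept around as a boot environment instead of being reformatted; a rough sketch with the beadm tool that shipped with 2008.11 and later (the BE name here is made up):

# beadm list                       (show existing boot environments)
# beadm create osol-106-backup     (clone the running BE before upgrading)
# beadm activate osol-106-backup   (make it the default for the next boot)

pkg image-update creates such a clone automatically, which is the second root environment Chad noticed during upgrades.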
Re: [zfs-discuss] mpt errors on snv 127
To update everyone, I did a complete zfs scrub, and it generated no errors in iostat, and I have 4.8T of data on the filesystem so it was a fairly lengthy test. The machine has also exhibited no evidence of instability. If I were to start copying a lot of data to the filesystem again though, I'm sure it would generate errors and crash again. Chad On Tue, Dec 01, 2009 at 12:29:16AM -0800, Chad Cantwell wrote: Well, ok, the msi=0 thing didn't help after all. A few minutes after my last message a few errors showed up in iostat, and then in a few minutes more the machine was locked up hard... Maybe I will try just doing a scrub instead of my rsync process and see how that does. Chad On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote: I don't think the hardware has any problems, it only started having errors when I upgraded OpenSolaris. It's still working fine again now after a reboot. Actually, I reread one of your earlier messages, and I didn't realize at first when you said non-Sun JBOD that this didn't apply to me (in regards to the msi=0 fix) because I didn't realize JBOD was shorthand for an external expander device. Since I'm just using bare metal, and passive backplanes, I think the msi=0 fix should apply to me based on what you wrote earlier, anyway, I've put set mpt:mpt_enable_msi = 0 now in /etc/system and rebooted as it was suggested earlier. I've resumed my rsync, and so far there have been no errors, but it's only been 20 minutes or so. I should have a good idea by tomorrow if this definitely fixed the problem (since even when the machine was not crashing it was tallying up iostat errors fairly rapidly). Thanks again for your help. Sorry for wasting your time if the previously posted workaround fixes things. I'll let you know tomorrow either way. Chad On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote: Chad Cantwell wrote: After another crash I checked the syslog and there were some different errors than the ones I saw previously during operation: ... Nov 30 20:59:13 the-vault LSI PCI device (1000,) not supported. ... Nov 30 20:59:13 the-vault mpt_config_space_init failed ... Nov 30 20:59:15 the-vault mpt_restart_ioc failed Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009 Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: the-vault Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16 Nov 30 21:33:02 the-vault EVENT-ID: 7886cc0d-4760-60b2-e06a-8158c3334f63 Nov 30 21:33:02 the-vault DESC: The transmitting device sent an invalid request. Nov 30 21:33:02 the-vault Refer to http://sun.com/msg/PCIEX-8000-8R for more information. Nov 30 21:33:02 the-vault AUTO-RESPONSE: One or more device instances may be disabled Nov 30 21:33:02 the-vault IMPACT: Loss of services provided by the device instances associated with this fault Nov 30 21:33:02 the-vault REC-ACTION: Ensure that the latest drivers and patches are installed. Otherwise schedule a repair procedure to replace the affected device(s). Use fmadm faulty to identify the devices or contact Sun for support. Sorry to have to tell you, but that HBA is dead. Or at least dying horribly. If you can't init the config space (that's the PCI bus config space), then you've got about 1/2 the nails in the coffin hammered in. Then the failure to restart the IOC (IO controller unit) == the rest of the lid hammered down. best regards, James C. 
McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
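For anyone reproducing this kind of test, the scrub-as-load-generator recipe from the thread is just the following (pool name taken from the messages above):

# zpool scrub vault       (read and verify every allocated block in the pool)
# zpool status -v vault   (watch scrub progress and per-vdev error counts)
# iostat -en              (soft/hard/transport error totals per device)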
[zfs-discuss] mpt errors on snv 127
Hi, Sorry for not replying to one of the already open threads on this topic; I've just joined the list for the purposes of this discussion and have nothing in my client to reply to yet. I have an x86_64 OpenSolaris machine running on a Core 2 Quad Q9650 platform with two LSI SAS3081E-R PCI-E 8-port SAS controllers, with 8 drives each. The LSI cards are flashed with IT firmware from Feb 2009 (I think, I can double-check if it's important). The drives are Samsung HD154UI 1.5TB disks. I was using for quite a while OpenSolaris 2009.06 with the opensolaris-provided mpt driver to operate a zfs raidz2 pool of about 20T and this worked perfectly fine (no issues or device errors logged for several months, no hanging). A few days ago I decided to reinstall with the latest OpenSolaris in order to take advantage of raidz3. I hadn't known at the time about the current mpt issues, or I may have held off on upgrading. I installed Solaris Nevada build 127 from the DVD image. I then proceeded to set up a raidz3 pool with the same disks as before, of a slightly smaller size (obviously) than the former raidz2 pool. I started a moderately long-running and heavy-load rsync to copy my data back to the pool from another host. Several times during the day (sometimes a couple times an hour, or it could go up to a few hours with no errors), I get several syslog errors and warnings about mpt, similar but not identical to what I've seen reported here by others. Also, iostat -en shows several hw and trn errors of varying amounts for all the drives (in OpenSolaris 2009.06 I never had any iostat errors). After a while the machine will hang in a variety of ways. The first time it was pingable, and I could authenticate through ssh but it would never spawn a shell. The second time it crashed it was unpingable from the network, and the display was black, although the numlock key was still properly toggling the numlock light on the console. Here's a sample of my errors. I've included the complete series of errors from one timestamp, and a few lines from a subsequent series of errors a couple minutes later: (if there's any other info I can provide or more things to test just let me know. 
Thanks, --Chad )

Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 29 04:42:55 the-vault mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 29 04:42:55 the-vault mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 29 04:42:55 the-vault mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 29 04:42:55 the-vault mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 29 04:42:55 the-vault mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 29 04:42:55 the-vault mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 29 04:42:55 the-vault mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 29 04:42:55 the-vault mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 29 04:42:55 the-vault mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 29 04:42:55 the-vault mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1):
Nov 29 04:42:55 the-vault mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120200
Nov 29 04:42:55 the-vault scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@1/pci1000,3...@0 (mpt1
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Hi, I just posted a summary of a similar issue I'm having with non-Sun hardware. For the record, it's in a Chenbro RM41416 chassis with 4 Chenbro SAS backplanes but no expanders (each backplane is 4 disks connected by SFF-8087 cable). Each of my LSI-brand SAS3081E PCI-E cards is connected to two backplanes with 1m SFF-8087 (both ends) cables. For more details, if they are important, see my other post. I haven't tried the MSI workaround yet (although I'm not sure what MSI is) but from what I've read the workaround won't fix the issues in my case with non-Sun hardware. Thanks, Chad On Tue, Dec 01, 2009 at 12:36:33PM +1000, James C. McPherson wrote: Hi all, I believe it's an accurate summary of the emails on this thread over the last 18 hours to say that

(1) disabling MSI (Message Signaled Interrupt) support in xVM makes the problem go away
(2) disabling MSI support on bare metal when you only have disks internal to your host (no jbods) makes the problem go away (several reports of this)
(3) disabling MSI support on bare metal when you have a non-Sun jbod (and cables) does _not_ make the problem go away (several reports of this)
(4) the problem is not seen with a Sun-branded jbod and cables (only one report of this)
(5) the problem is seen with both mpt(7d) and itmpt(7d)
(6) mpt(7d) without MSI support is slow

For those who've been suffering this problem and who have non-Sun jbods, could you please let me know what model of jbod and cables (including length thereof) you have in your configuration. For those of you who have been running xVM without MSI support, could you please confirm whether the devices exhibiting the problem are internal to your host, or connected via jbod. And if via jbod, please confirm the model number and cables. Please note that Jianfei and I are not making assumptions about the root cause here, we're just trying to nail down specifics of what seems to be a likely cause. thankyou in advance, James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpt errors on snv 127
Hi, Replied to your previous general query already, but in summary, they are in the server chassis. It's a Chenbro 16 hotswap bay case. It has 4 mini backplanes that each connect via an SFF-8087 cable (1m) to my LSI cards (2 cables / 8 drives per card). Chad On Tue, Dec 01, 2009 at 01:02:34PM +1000, James C. McPherson wrote: Chad Cantwell wrote: Hi, Sorry for not replying to one of the already open threads on this topic; I've just joined the list for the purposes of this discussion and have nothing in my client to reply to yet. I have an x86_64 opensolaris machine running on a Core 2 Quad Q9650 platform with two LSI SAS3081E-R PCI-E 8 port SAS controllers, with 8 drives each. Are these disks internal to your server's chassis, or external in a jbod? If in a jbod, which one? Also, which cables are you using? thankyou, James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpt errors on snv 127
Hi, The Chenbro chassis contains everything - the motherboard/CPU, and the disks. As far as I know, the Chenbro backplanes are basically electrical jumpers that the LSI cards shouldn't be aware of. They pass the SATA signals straight through from the SFF-8087 cables to the disks. Thanks, Chad On Tue, Dec 01, 2009 at 01:43:06PM +1000, James C. McPherson wrote: Chad Cantwell wrote: Hi, Replied to your previous general query already, but in summary, they are in the server chassis. It's a Chenbro 16 hotswap bay case. It has 4 mini backplanes that each connect via an SFF-8087 cable (1m) to my LSI cards (2 cables / 8 drives per card). Hi Chad, thanks for the followup. Just to confirm - you've got this Chenbro chassis connected to the actual server chassis (where the cpu is), or do you have the cpu inside the Chenbro chassis? thankyou, James -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpt errors on snv 127
/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 30 22:38:21 the-vault mpt_config_space_init failed
Nov 30 22:38:22 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 30 22:38:22 the-vault LSI PCI device (1000,) not supported.
Nov 30 22:38:22 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 30 22:38:22 the-vault mpt_config_space_init failed
Nov 30 22:38:46 the-vault sshd[636]: [ID 800047 auth.crit] monitor fatal: protocol error during kex, no DH_GEX_REQUEST: 254
Nov 30 22:38:46 the-vault sshd[637]: [ID 800047 auth.crit] fatal: Protocol error in privilege separation; expected packet type 254, got 20
Nov 30 23:11:23 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 30 23:11:23 the-vault mpt_send_handshake_msg task 3 failed
Nov 30 23:11:23 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 30 23:11:23 the-vault LSI PCI device (1000,) not supported.
Nov 30 23:11:23 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 30 23:11:23 the-vault mpt_config_space_init failed
Nov 30 23:11:25 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 30 23:11:25 the-vault LSI PCI device (1000,) not supported.
Nov 30 23:11:25 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 30 23:11:25 the-vault mpt_config_space_init failed
Nov 30 23:11:25 the-vault scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@3/pci111d,8...@0/pci111d,8...@0/pci1000,3...@0 (mpt0):
Nov 30 23:11:25 the-vault mpt_restart_ioc failed

(and that's the last message before I hit the reset button. Host was unpingable, and just moving the mouse around on the screen was extremely delayed)

Nov 30 23:32:05 the-vault genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version snv_127 64-bit
Nov 30 23:32:05 the-vault genunix: [ID 943908 kern.notice] Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.

Also, it says it resilvered some data; this is the first time I've seen any notes next to a device. Still no zpool errors though.

# zpool status vault
  pool: vault
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Mon Nov 30 23:33:16 2009
config:

        NAME         STATE     READ WRITE CKSUM
        vault        ONLINE       0     0     0
          raidz3-0   ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c1t13d0  ONLINE       0     0     0
            c1t14d0  ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0  11.5K resilvered
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0

errors: No known data errors
#

On Mon, Nov 30, 2009 at 06:46:13PM -0800, Chad Cantwell wrote: Hi, Sorry for not replying to one of the already open threads on this topic; I've just joined the list for the purposes of this discussion and have nothing in my client to reply to yet. I have an x86_64 OpenSolaris machine running on a Core 2 Quad Q9650 platform with two LSI SAS3081E-R PCI-E 8-port SAS controllers, with 8 drives each. 
The LSI cards are flashed with IT firmware from Feb 2009 (I think, I can double-check if it's important). The drives are Samsung HD154UI 1.5TB disks. I was using for quite a while OpenSolaris 2009.06 with the opensolaris-provided mpt driver to operate a zfs raidz2 pool of about 20T and this worked perfectly fine (no issues or device errors logged for several months, no hanging). A few days ago I decided to reinstall with the latest OpenSolaris in order to take advantage of raidz3. I hadn't known at the time about the current mpt issues, or I may have held off on upgrading. I installed Solaris Nevada build 127 from the DVD image. I then proceeded to set up a raidz3 pool with the same disks as before, of a slightly smaller size (obviously) than the former raidz2 pool. I started a moderately long-running and heavy-load rsync to copy my data back to the pool from another
[zfs-discuss] doing HDS shadow copy of a zpool
I apologize if this has been answered already, but I've tried to RTFM and haven't found much. I'm trying to get HDS shadow copy to work for zpool replication. We do this with VXVM by modifying each target disk ID after it's been shadowed from the source LUN. This allows us to import each target disk into the target diskgroup and then have its volumes mounted for backup over the network. From what I can tell, each LUN in a zpool will have two 256K vdev labels at the front and two at the end. Is there a way to modify the vdev labels so that the target LUNs don't end up with the same zpool ID as the source LUNs? Better yet, is there a way to import and rename a zpool that has the exact same ID and name as an existing one? As it stands now, after shadow copy, format can tell that each target LUN is labeled to be part of the source zpool, but that is invisible to zpool import. Thanks, Chad ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
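For what it's worth, the rename half of the question has an answer when the pool IDs differ: zpool import accepts a pool's numeric ID and an optional new name, along the lines of the sketch below (the ID shown is made up). The catch in the shadow-copy case is exactly what Chad observes - the copied LUNs carry the source pool's GUID, so while the source pool is imported the copies never show up as an importable pool at all.

# zpool import                            (lists importable pools by name and numeric id)
# zpool import 6930279553036405 clonepool (import that pool under a new name)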
Re: [zfs-discuss] Can I trust ZFS?
On Jul 31, 2008, at 2:56 PM, Bob Netherton wrote: On Thu, 2008-07-31 at 13:25 -0700, Ross wrote: Hey folks, I guess this is an odd question to be asking here, but I could do with some feedback from anybody who's actually using ZFS in anger. ZFS in anger ? That's an interesting way of putting it :-) If you watch Phil Liggett and/or Paul Sherwen commentating on a cycling event, you're pretty much guaranteed to hear turning the pedals in anger at some point when a rider goes on the attack. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs, raidz, spare and jbod
On Jul 25, 2008, at 7:27 AM, Claus Guttesen wrote: I'm running the version that was supplied on the CD, this is 1.20.00.15 from 2007-04-04. The firmware is V1.45 from 2008-3-27. Check the version at the Areca website. They may have a more recent driver there. The dates are later for the 1.20.00.15 and there is a -71010 extension. Otherwise, file a bug with Areca. They are pretty good about responding. Chad --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] RFE: -t flag for 'zfs destroy'
http://www.opensolaris.org/bug/report.jspa You'll need an OpenSolaris.org account to file the RFE of course. On Jul 17, 2008, at 10:52 AM, Will Murnane wrote: I would like to request an additional flag for the command line zfs tools. Specifically, I'd like to have a -t flag for zfs destroy, as shown below. Suppose I have a pool home with child filesystem will, and a snapshot home/will@snap. Then I run the following commands:

# zfs destroy -t volume home/will@snap
zfs: not destroying home/will@snap, as it is not a volume.
# zfs destroy -t snapshot home/will@snap
(succeeds)
# zfs destroy -t snapshot home/will
zfs: not destroying home/will, as it is not a snapshot.
# zfs destroy -t volume home/will
zfs: not destroying home/will, as it is not a volume.
# zfs destroy -t filesystem home/will
(succeeds)

Now, to test the behavior of '-r', I recreate the same structure as before, and run some more commands:

zfs destroy -r -t snapshot home
(succeeds)
zfs list -Hro name home
home
home/will

One more time, to demonstrate -R:

zfs clone home/will@snap home/oldwill
zfs destroy -R -t snapshot home
(???)

The two ways I can think of at this point are to destroy the clone as well, or to promote it and then destroy the snapshots. Or, I suppose, make -R incompatible with -t for zfs destroy. I imagine this would be easy to implement, and for scripting use it would be a good sanity check; if you're trying to clean up snapshots you don't accidentally kill the filesystems by messing up some string operation and naming a valid filesystem by mistake. Especially with -r, this could prevent silly mistakes. Also, it might be a helpful thing to add to 'zfs get'; if one wants to see some property for all user home directories and not the snapshots of them, syntax like zfs get used -r -t filesystem home could list the used property of all the children of the home filesystem. This is a slightly different semantic from the proposed zfs destroy enhancement: it's a filter rather than a predicate. I think this is the Right Thing to do with this flag, and it will be intuitive for users. Any suggestions on better specifying the behavior? How can I formally propose this? I'd be glad to implement it if would help this get finished. Thanks! Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
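As a data point for the filter semantics, zfs list already accepts a -t filter today, so the proposed zfs get behaviour would line up with existing usage (dataset names from Will's example):

# zfs list -t snapshot -r home     (only snapshots under home)
# zfs list -t filesystem -r home   (only filesystems, no snapshots)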
[zfs-discuss] previously mentioned J4000 released
Here's the announcement for those new Sun JBOD devices mentioned the other day. http://www.sun.com/aboutsun/pr/2008-07/sunflash.20080709.1.xml ckl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [perf-discuss] [storage-discuss] zpool io to 6140 is really slow
On 11/20/07, Asif Iqbal [EMAIL PROTECTED] wrote: On Nov 20, 2007 7:01 AM, Chad Mynhier [EMAIL PROTECTED] wrote: On 11/20/07, Asif Iqbal [EMAIL PROTECTED] wrote: On Nov 19, 2007 1:43 AM, Louwtjie Burger [EMAIL PROTECTED] wrote: On Nov 17, 2007 9:40 PM, Asif Iqbal [EMAIL PROTECTED] wrote: (Including storage-discuss) I have 6 6140s with 96 disks, 64 of which are Seagate ST337FC (300GB 10K RPM FC-AL). Those disks are 2Gb disks, so the tray will operate at 2Gb. That is still 256MB/s. I am getting about 194MB/s. 2Gb fibre channel is going to max out at a data transmission rate around 200MB/s rather than the 256MB/s that you'd expect. Fibre channel uses an 8-bit/10-bit encoding, so it transmits 8 bits of data in 10 bits on the wire. So while 256MB/s is being transmitted on the connection itself, only 200MB/s of that is the data that you're transmitting. But I am running 4Gb fibre channel with 4GB of NVRAM on 6 trays of 300GB FC 10K RPM (2Gb/s) disks, so I should get a lot more than ~200MB/s. Shouldn't I? Here, I'm relying on what Louwtjie said above, that the tray itself is going to be limited to 2Gb/s because of the 2Gb/s FC disks. Chad Mynhier ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
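Putting rough numbers on the 8b/10b point (nominal figures, ignoring FC framing overhead): a 2Gb FC link actually signals at 2.125 Gbaud, and only 8 of every 10 bits on the wire are payload, so

    2.125 Gbit/s x 8/10 = 1.7 Gbit/s of data
    1.7 Gbit/s / 8 bits per byte = ~212 MB/s

which is why ~200MB/s per 2Gb tray is about the ceiling, no matter how fast the HBA and NVRAM upstream are.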
Re: [zfs-discuss] nv-69 install panics dell precision 670
Apparently known bug, fixed in snv_70. http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6577473 On Aug 14, 2007, at 8:28 AM, Bill Moloney wrote: using hyperterm, I captured the panic message as:

SunOS Release 5.11 Version snv_69 32-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
panic[cpu0]/thread=fec1ede0: Can't handle mwait size 0
fec37e70 unix:mach_alloc_mwait+72 (fec2006c)
fec37e8c unix:mach_init+b0 (c0ce80, fe800010, f)
fec37eb8 unix:psm_install+95 (fe84166e, 3, fec37e)
fec37ec8 unix:startup_end+93 (fec37ee4, fe91731e,)
fec37ed0 unix:startup+3a (fe800010, fec33c98,)
fec37ee4 genunix:main+1e ()
skipping system dump - no dump device configured
rebooting...

this behavior loops endlessly This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS Apple WWDC Keynote Absence
On Jun 12, 2007, at 9:37 AM, Andy Lubel wrote: Yeah this is pretty sad, we had such plans for actually using our Apple (PPC) hardware in our datacenter for something other than AFP and web serving. It also shows how limited Apple's vision seems to be. I think you are jumping to conclusions. For two CEOs not to be on the same page demonstrates that there is something else going on rather than just "we chose not to put a future-ready file system into our next OS". And how it's being dismissed by Apple is quite upsetting. I think you are jumping to conclusions. Jonathan jumped the gun on something. Chad I wonder when we will see Johnny-cat and Steve-o in the same room talking about it. On 6/12/07 8:23 AM, Sunstar Dude [EMAIL PROTECTED] wrote: Yeah, What is the deal with this? I am so bummed :( What the heck was Sun's CEO talking about the other day? And why the heck did Apple not include at least non-default ZFS support in Leopard? If no ZFS in Leopard, then what is all the Apple-induced hype about? A trapezoidal Dock table? A transparent menu bar? Can anyone explain the absence of ZFS in Leopard??? I signed up for this forum just to post this. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Andy Lubel -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mac OS X Leopard to use ZFS
On Jun 7, 2007, at 12:50 PM, Rick Mann wrote: From Macintouch (http://macintouch.com/#other.2007.06.07): --- On stage Wednesday in Washington D.C., Sun Microsystems Inc. CEO Jonathan Schwartz revealed that his company's open-source ZFS file system will replace Apple's long-used HFS+ in Mac OS X 10.5, a.k.a. Leopard, when the new operating system ships this fall. This week, you'll see that Apple is announcing at their Worldwide Developers Conference that ZFS has become the file system in Mac OS X, said Schwartz. ZFS (Zettabyte File System), designed by Sun for its Solaris OS but licensed as open-source, is a 128-bit file storage system that features, among other things, pooled storage, which means that users simply plug in additional drives to add space, without worrying about such traditional storage parameters as volumes or partitions. [ZFS] eliminates volume management, it has extremely high performance. It permits the failure of disk drives, crowed Schwartz during a presentation focused on Sun's new blade servers. --- We'll see next week what Steve announces at the WWDC keynote (which is not under NDA like the rest of the conference). I'll be there and try to remember to post what is said (though it will probably be in a billion other places as well) Chad ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ARC and patents
With US patent laws the way they are, no one but a patent lawyer could safely give you an answer. If by some chance a patent lawyer is lurking and decided to comment, none of the rest of us could safely read such comments. No one working on ZFS could even safely look at the patent you've referenced. On Jun 5, 2007, at 11:40 PM, Kasper Nielsen wrote: Hi there, I was looking at using something very similar to arc.c http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c for an open source project. However, I'm a bit worried about the patent IBM is holding on the ARC data structure: http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220040098541%22.PGNR.&OS=DN/20040098541&RS=DN/20040098541 I remember PostgreSQL dropping their ARC implementation for 2Q some time ago. But I was hoping that someone on this list might have some constructive input on this issue? cheers Kasper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS - Use h/w raid or not? Thoughts. Considerations.
On 5/17/07, Robert Milkowski [EMAIL PROTECTED] wrote: Hello Phillip, Thursday, May 17, 2007, 6:30:38 PM, you wrote: PF Given: A Solaris 10 u3 server with an externally attached PF disk array with RAID controller(s) PF Question: Is it better to create a zpool from a PF single external LUN on an external disk array, or is it PF better to use no RAID on the disk array and just present PF individual disks to the server and let ZFS take care of the RAID? The other thing - do you use SATA disks? How much of an issue is data loss or corruption for you? Doing software RAID in ZFS can detect AND correct such problems. HW RAID can too, but to a much lesser extent. I think this point needs to be emphasized. If reliability is a prime concern, you absolutely want to let ZFS handle redundancy in one way or another, either as mirroring or as raidz. You can think of redundancy in ZFS as much the same thing as packet retransmission in TCP. If the data comes through bad the first time, checksum verification will catch it, and you get a second chance to get the correct data. A single-LUN zpool is the moral equivalent of disabling retransmission in TCP. Chad Mynhier ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
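To make the two configurations concrete, the choice looks roughly like this at pool-creation time (device names are hypothetical): with a single array LUN ZFS can only detect corruption, while with ZFS-level redundancy it can also repair it from the other copy or from parity.

# zpool create tank c2t0d0                       (one big RAID LUN: detect only)
# zpool create tank mirror c2t0d0 c3t0d0         (ZFS mirror: detect and self-heal)
# zpool create tank raidz c2t0d0 c3t0d0 c4t0d0   (raidz: repair from parity)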
Re: [zfs-discuss] ZFS on the desktop
On Apr 17, 2007, at 7:47 AM, Toby Thain wrote: On 17-Apr-07, at 8:33 AM, Robert Milkowski wrote: Hello Rayson, Tuesday, April 17, 2007, 10:50:41 AM, you wrote: RH On 4/17/07, David R. Litwin [EMAIL PROTECTED] wrote: How about asking Microsoft to change Shared Source first?? Let's leave ms out of this, eh? :-) RH While ZFS is nice, I don't think it is a must for most desktop users. RH For servers and power users, yes. But most (over 90% of world RH population) people who just use the computers to browse the web, check RH emails, do word processing, etc... don't care. Even if they do care, I RH don't think those who do not back up their drive can really understand RH how to use ZFS. I believe that ZFS definitely belongs on a desktop, Apple (and I) assuredly agree with you. I would agree as well. With the proper UI (which I hope Apple has or will eventually have -- waiting to get Leopard! as I have not yet renewed my paid developer program at Apple) ZFS is a killer on the desktop, especially on OS X, where everything of importance has to be or likes to live on the boot device (I understand that OS X does not yet support booting on ZFS but someday it will), but on any consumer-class desktop it is killer because it removes the need for the end user to worry about disks. You need more space, buy a new disk or two and then just add them into the pool of storage. What's interesting about its integration in OS X - and OS X in general - is it diffuses hitherto server-grade technology (UNIX, inter alia) all the way down to everybody's grandmother's non-technical desktop/MacBook. Steve definitely proved his point (starting with NeXT, of course); Linux and Solaris will inevitably arrive there too. To M's detriment :-) Yep Chad --Toby mostly for its built-in reliability, free snapshots, built-in compression and cryptography (soon) and ease of use. ps. a few days ago I encountered my first checksum error on my desktop system on a submirror (two SATA drives in a zfs mirror). Thanks to zfs it won't be a problem and it's already repaired. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
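Robert's buy-a-disk-and-grow scenario really is a one-liner, for what it's worth (hypothetical device names; note that zpool add grows the pool permanently):

# zpool add tank mirror c4t0d0 c5t0d0   (new mirrored pair; the extra space appears immediately)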
Re: [zfs-discuss] ZFS on the desktop
On Apr 17, 2007, at 10:03 AM, Toby Thain wrote: On 17-Apr-07, at 12:15 PM, Chad Leigh -- Shire.Net LLC wrote: On Apr 17, 2007, at 7:47 AM, Toby Thain wrote: On 17-Apr-07, at 8:33 AM, Robert Milkowski wrote: ... I belive that ZFS definitely belongs on a desktop, Apple (and I) assuredly agree with you. I would agree as well. With the proper UI (which I hope Apple has or will eventually have -- waiting to get Leopard! Full disclosure: I don't think anyone outside Apple yet knows for SURE if it's going to be in Leopard (or even a future release). Found this sceptical article today - or is it out of date? http://arstechnica.com/staff/fatbits.ars/2006/8/15/4995 I don't have any insider or NDA knowledge (as I said, I have not yet re-upped my paid developer status and have not had any of the leopard seeds), but there have been screenshots from Leopard seeds posted that show ZFS volume creation options etc in dialog boxes. Again, who knows if it will actually ship with that feature. But it has been shipped in seeds as far as I know. Siracusa's column is old. Chad ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)
On Feb 1, 2007, at 10:51 AM, Richard Elling wrote: FYI, here is an interesting blog on using ZFS with a dozen USB drives from Constantin. http://blogs.sun.com/solarium/entry/solaris_zfs_auf_12_usb My German is somewhat rusty, but I see that Google Translate does a respectable job. Thanks Constantin! -- richard This is the best line: Hier ist die offizielle Dokumentation, echte Systemhelden jedoch kommen mit nur zwei man-Pages aus: zpool und zfs. Roughly, Here [link] is the official documentation; real system heroes need only the two manpages: zpool and zfs Chad --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs / nfs issue (not performance :-) with courier-imap
I am not sure if this is a zfs issue, an nfs issue, a combination of the two, or not an issue with them per se (caching or whatever), or a courier-imap issue, or even a mail client issue. However, the issue happens in at least two different unrelated mail clients, so I don't think it is client related, and I have spoken to someone who uses courier-imap on nfs mounted directories for maildir mailstore using FreeBSD 6.x to NetApp nfs servers without issue (my nfs client is FreeBSD 6.x while the server is Solaris 10 x86 serving ZFS-backed filesystems over nfs), so maybe it is something to do with ZFS and NFS interaction. Basically, I have a few maildir mailstores that are mounted on my FreeBSD imap server from a Solaris 10 server that serves them using NFSv3 from ZFS filesystems (each maildir has its own ZFS filesystem). Most of my maildirs are on a local disk and do not have a problem, and a few on the nfs/zfs do not have the problem, and a few have the problem that appeared right after they were migrated from the local disk to the zfs/nfs filesystem for testing (we would eventually like to move all mail over to this nfs/zfs setup). Basically, in the affected accounts (under Apple Mail.app and Windows Thunderbird), you can delete 1 or more messages (mark for delete), expunge, and then mail starting some place in the list after the deleted messages starts to show the wrong mail content for the given message as shown in the list view. Say I have messages A B C D E F G etc:

A B C D E F G

I delete C and expunge. Now it looks like this:

A B D E F G

but if I click, say, E, it has F's contents, F has G's contents, and no mail has D's contents that I can see. But the list in the mail client list view is correct. -- Some feedback from the courier mail list, from a guy who runs FreeBSD nfs clients to NetApp nfs servers with courier without issue, suggested it might be an nfs caching issue or something on the client or server. Since this is ZFS-backed nfs, I thought to ask here to see if there were any gotchas or anything that might be causing this. ATIME is off (but was on earlier and the problem still happened before I switched it). CHECKSUM, COMPRESS, DEVICES, EXEC, and SETUID are ON, and RDONLY and ZONED are OFF. ACLMODE is groupmask and ACLINHERIT is secure. I have not messed around with the ZIL business to improve performance. Thanks for any insight on how I might have set this up wrong. Thanks Chad --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
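If anyone wants to compare settings against a working maildir server, the relevant properties can be pulled in one go; a sketch with a made-up dataset name, plus the server-side check of how it is exported:

# zfs get atime,checksum,compression,aclmode,aclinherit tank/mail/user1
# share     (with no arguments, lists what is currently NFS-shared)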
Re: [zfs-discuss] Solaris-Supported cards with battery backup
On Jan 24, 2007, at 1:57 PM, Robert Milkowski wrote: Hello James, Wednesday, January 24, 2007, 3:20:14 PM, you wrote: JFH Since we're talking about various hardware configs, does anyone know JFH which controllers with battery backup are supported on Solaris? If JFH we build a big ZFS box I'd like to be able to turn on write caching JFH on the drives but have them battery-backed in the event of a power JFH loss. Are 3ware cards going to be supported any time soon? JFH I checked and there doesn't seem to be a battery backup option JFH for Thumper. Is that right? Does anyone know if there are plans for JFH that? ZFS itself makes sure a transaction is on disk by issuing a write-cache flush command to the disks, so you don't have to worry about it. Areca SATA cards are supported on Solaris x86 by Areca (drivers etc. from them, not from Sun) and they support battery backup. It is what I am using. Chad --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS entry in /etc/vfstab
On 1/10/07, Vahid Moghaddasi [EMAIL PROTECTED] wrote: Hi, Why would I ever need to specify ZFS mount(s) in /etc/vfstab at all? I see it in some documents that zfs can be defined in /etc/vfstab with fstype zfs. Thanks. I don't think it's a question of needing to be able to do so as much as it is a useful transitional mechanism. Some people might not be comfortable with how ZFS keeps track of filesystems and where they should be mounted, and vfstab is something they're used to dealing with. For example, at a previous job, we had a sanity-check script running out of cron to verify that every file system that should have been mounted actually was mounted and that every file system that actually was mounted should have been mounted (in other words, that the mapping of vfstab entries to (non-auto-)mounted filesystems was both one-to-one and onto.)[1] In the pre-ZFS world, knowing what should be mounted was simply a question of looking at vfstab. With ZFS, the filesystems that should be mounted are those filesystems that _are_ mounted. In this model, a sanity-check script like this is meaningless, because there's no longer an independent source of information to say what should be mounted. This is an example where this feature is convenient. There might be other examples where this feature is necessary. Chad Mynhier [1] Note that the purpose of the script was mostly to guard against operator error rather than system problems. With vfstab, it would take two independent actions to change what is mounted on a server and the concept of what should be mounted there. With ZFS, a single action can change both of those. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
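For completeness, the vfstab route goes through the legacy mountpoint; a minimal sketch with a hypothetical dataset:

# zfs set mountpoint=legacy tank/export/home

and then in /etc/vfstab (device-to-fsck and fsck-pass are '-' for zfs):

tank/export/home  -  /export/home  zfs  -  yes  -

After that, mount and the vfstab entry manage the filesystem like any other, and zfs mount -a leaves it alone.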
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 2, 2006, at 10:56 AM, Al Hopper wrote: On Sat, 2 Dec 2006, Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 6:01 AM, [EMAIL PROTECTED] wrote: While other file systems, when they become corrupt, allow you to salvage data :-) They allow you to salvage what you *think* is your data. But in reality, you have no clue what the disks are giving you. I stand by what I said. If you have a massive disk failure, yes. You are right. When you have subtle corruption, some of the data and metadata is bad but not all. In that case you can recover (and verify the data if you have the means to do so) the parts that did not get corrupted. My ZFS experience so far is that it basically said the whole 20GB pool was dead, and I seriously doubt all 20GB was corrupted. That was because you built a pool with no redundancy. In the case where ZFS does not have a redundant config from which to try to reconstruct the data (today) it simply says: sorry charlie - your pool is corrupt. Whereas a RAID system would still be salvageable. Chad --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 2, 2006, at 12:29 PM, Jeff Victor wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 10:56 AM, Al Hopper wrote: On Sat, 2 Dec 2006, Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 6:01 AM, [EMAIL PROTECTED] wrote: While other file systems, when they become corrupt, allow you to salvage data :-) They allow you to salvage what you *think* is your data. But in reality, you have no clue what the disks are giving you. I stand by what I said. If you have a massive disk failure, yes. You are right. When you have subtle corruption, some of the data and metadata is bad but not all. In that case you can recover (and verify the data if you have the means to do so) the parts that did not get corrupted. My ZFS experience so far is that it basically said the whole 20GB pool was dead, and I seriously doubt all 20GB was corrupted. That was because you built a pool with no redundancy. In the case where ZFS does not have a redundant config from which to try to reconstruct the data (today) it simply says: sorry charlie - your pool is corrupt. Whereas a RAID system would still be salvageable. That is a comparison of apples to oranges. The RAID system has Redundancy. If the ZFS pool had been configured with redundancy, it would have fared at least as well as the RAID system. Without redundancy, neither of them can magically reconstruct data. The RAID system would simply be an AID system. That is not the question. Assuming the error came OUT of the RAID system (which it did in this case, as there was a bug in the driver and the cache did not get flushed in a certain shutdown situation), another FS would have been salvageable, as the whole 20GB of the pool was not corrupt. Chad --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 1, 2006, at 9:50 AM, Al Hopper wrote: Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Chad --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 1, 2006, at 4:34 PM, Dana H. Myers wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 9:50 AM, Al Hopper wrote: Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Of course, but there isn't the expectation of data reliability with a JBOD that there is with some RAID configurations. There is not? People buy disk drives and expect them to corrupt their data? I expect the drives I buy to work fine (knowing that there could be bugs etc in them, the same as with my RAID systems). Chad --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 1, 2006, at 10:17 PM, Ian Collins wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 4:34 PM, Dana H. Myers wrote: Chad Leigh -- Shire.Net LLC wrote: And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Of course, but there isn't the expectation of data reliability with a JBOD that there is with some RAID configurations. There is not? People buy disk drives and expect them to corrupt their data? I expect the drives I buy to work fine (knowing that there could be bugs etc in them, the same as with my RAID systems). So you trust your important data to a single drive? I doubt it. But I bet you do trust your data to a hardware RAID array. Yes, but not because I expect a single drive to be more error-prone (versus total failure). Total drive failure on a single disk loses all your data. But we are not talking total failure, we are talking errors that corrupt data. I buy individual drives with the expectation that they are designed to be error-free and are error-free for the most part, and I do not expect a RAID array to be more robust in this regard (after all, the RAID is made up of a bunch of single drives). Some people on this list think that RAID arrays are more likely to corrupt your data than JBOD (both with ZFS on top, for example, a ZFS mirror of two RAID arrays versus a JBOD mirror or raidz). There is no proof of this, or even a reasonable hypothetical explanation for it, that I have seen presented. Chad Ian --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 1, 2006, at 10:42 PM, Toby Thain wrote: On 1-Dec-06, at 6:36 PM, Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 4:34 PM, Dana H. Myers wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 9:50 AM, Al Hopper wrote: Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Of course, but there isn't the expectation of data reliability with a JBOD that there is with some RAID configurations. There is not? People buy disk drives and expect them to corrupt their data? I expect the drives I buy to work fine (knowing that there could be bugs etc in them, the same as with my RAID systems). Yes, but in either case, ZFS will tell you. And then kill your whole pool :-) Other filesystems in general cannot. While other file systems, when they become corrupt, allow you to salvage data :-) Chad
Re: [zfs-discuss] poor NFS/ZFS performance
On Nov 22, 2006, at 4:11 PM, Al Hopper wrote: No problem there! ZFS rocks. NFS/ZFS is a bad combination. Has anyone tried sharing a ZFS fs using samba or afs or something else besides nfs? Do we have the same issues? Chad
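For anyone who wants to try the Samba route: nothing ZFS-specific is needed, you just point a share at the ZFS mountpoint. A minimal sketch for smb.conf (share name and path are made up):

[tank]
   path = /tank/export
   read only = no

Whether it dodges the performance problem is a separate question -- smbd does not, by default, fsync every write the way the NFS server must honour sync semantics, so I would expect rather different behaviour.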
Re: [zfs-discuss] poor NFS/ZFS performance
On Nov 21, 2006, at 1:36 PM, Joe Little wrote: On 11/21/06, Matthew B Sweeney - Sun Microsystems Inc. [EMAIL PROTECTED] wrote: Roch, Am I barking up the wrong tree? Or is ZFS over NFS not the right solution? I strongly believe it is.. We just are at odds as to some philosophy. Either we need NVRAM backed storage between NFS and ZFS, battery-backed memory that can survive other subsystem failure, or a change in the code path to allow some discretion here. Currently, the third option, 6280630, ZIL synchronicity, or as I reference it, sync_deferred functionality. A combination is best, but the sooner this arrives, the better for anyone who needs a general purpose file server / NAS that compares anywhere near to the competition. I had heard that some stuff in the latest OS and coming in Sol10 U3 should greatly help in NFS/ZFS performance. Something to do with ZFS not syncing the entire pool on every sync but just the stuff needed, or something like that. I heard it kind of 2nd or 3rd hand so cannot be too detailed in my description. Can someone here in the know confirm that this is so (or not)? Thanks Chad
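For completeness: the blunt workaround people use today, while waiting on 6280630, is to disable the ZIL outright. That discards synchronous write guarantees for NFS clients (a crash can lose acknowledged writes), so treat it strictly as a test knob, not a production setting. Assuming the tunable as it exists in current Solaris 10 / Nevada bits:

# echo 'set zfs:zil_disable = 1' >> /etc/system    (takes effect on reboot)
# echo zil_disable/D | mdb -k                      (check the live value)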
Re: [zfs-discuss] Re: ZFS Performance Question
On Oct 31, 2006, at 11:09 AM, Jay Grogan wrote: Thanks Robert, I was hoping something like that had turned up. A lot of what I will need to use ZFS for will be sequential writes at this time. I don't know what it is worth, but I was using iozone (http://www.iozone.org/) on my ZFS on top of Areca RAID volumes, as well as on UFS on a similar volume, and it showed, for many sorts of things, better performance under ZFS. I am not an expert on file systems and disk performance so I cannot say that there are not faults in its methodology, but it is interesting to run and look at. Chad
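If anyone wants to reproduce the comparison, iozone's automatic mode is an easy starting point. A sketch -- the pool path and sizes are placeholders; pick a file size well above RAM so the ARC can't cache the whole working set:

# iozone -a -e -s 4g -r 128k -f /tank/test/iozone.tmp

-a runs the full test matrix, -e includes fsync/fflush time in the timings, -s and -r pin the file and record sizes, and -f says where to put the scratch file.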
Re: [zfs-discuss] ZFS Performance Question
On Oct 30, 2006, at 10:45 PM, David Dyer-Bennet wrote: Also, stacking it on top of an existing RAID setup is kinda missing the entire point! Everyone keeps saying this, but I don't think it is missing the point at all. Checksumming and all the other goodies still work fine and you can run a ZFS mirror across 2 or more raid devices for the ultimate in reliability. My Dual RAID-6 with large ECC battery backed cache device mirrors will be much more reliable than your RAID-Z and probably perform better, and I still get the ZFS goodness. I can lose one whole RAID device (all the disks) and up to 2 of the disks on the second RAID device, all at the same time, and still be OK and fully recoverable and still operating. (ok, my second raid is not yet installed, so right now my ZFS'ed single RAID-6 is not as reliable as I would like, but the second half, ie, second RAID-6, will be installed before XMas) Chad
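For reference, the layout Chad describes is just a two-way ZFS mirror whose sides are hardware RAID-6 LUNs instead of bare disks. A sketch with invented device names, where each c?t0d0 is one RAID-6 LUN:

# zpool create tank mirror c2t0d0 c3t0d0
# zpool status tank

ZFS still checksums every block, so when one controller returns bad data it can read the good copy from the other LUN and repair the bad side.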
Re: [zfs-discuss] What is touching my filesystems?
On 10/17/06, Niclas Sodergard [EMAIL PROTECTED] wrote: Hi everyone, I have a very strange problem. I've written a simple script that uses zfs send/recv to send a filesystem between two hosts using ssh. Works like a charm - most of the time. As you know we need two snapshots when we do an incremental send. But the problem is something is touching my filesystems on the receiving side so they are no longer identical. Do you have atime updates on the recv side turned off? If you want to do incrementals, and you also want to be able to look at the data on the receive side, you'll need to do so. Chad Mynhier
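A sketch of that suggestion (dataset name invented). atime=off stops mere reads from dirtying the received filesystem; readonly=on goes further and blocks all local modification, so the incremental recv should then apply without needing a forced rollback:

# zfs set atime=off tank/received
# zfs set readonly=on tank/received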
Re: [zfs-discuss] A versioning FS
as it never once came up as an issue with VMS usability. Also, a big difference between Snapshots and FV tends to be who controls EOL-ing a version/Snapshot. Snapshots tend to be done by the Admin, and their aging strictly controlled and defined (e.g. we keep hourly snapshots for 1 week). File versioning is typically under the control of the End-User, as their utility is much more nebulously defined. Certainly, there is no ability to truncate based on number of versions (e.g. we only allow 100 versions to be kept), since the frequency of versioning a file varies widely. Aging on a version is possibly a better answer, but this runs into a problem of user education, where we have to retrain our users to stop making frequent copies of important documents (like they do now, in absence of FV), but _do_ remember to dig through the FV archive periodically to save a desirable old copy. Also, if managing FV is to be a User task, how are they to do it over NFS/SAMBA? And "log into the NFS server to do a cleanup" isn't an acceptable answer. Also, FV is only useful for apps which do a close() on a file (or at least, I'm assuming we wait for a file to signal that it is closed before taking a version - otherwise, we do what? take a version every X minutes while the file is still open? I shudder to think about the implementation of this, and its implications...). How many apps keep a file open for a long period of time? FV isn't useful to them, only an unlimited undo functionality INSIDE the app. Yes, any time you do a close() or equivalent. The idea is not to implement a universal undo stack. You can always find a scenario where FV doesn't help. So what. There are lots of scenarios where it does help. More positive scenarios than you can dream up negatives for. Lastly, consider the additional storage requirement of FV, and exactly how much utility you gain for sacrificing disk space. We have GB and TB of cheap space. A few extra versions lying around until people hit their quotas is the users' issue, not the sysadmin's. Look at this scenario: I'm editing a file, making 1MB of change per 5 minutes (a likely scenario when actively editing any Office-style document), of which only 50% do I actually make permanent (the rest being temp edits for ideas I decide to change or throw out). If I'm auto-saving every 5 minutes, that means I use 12MB of version space per hour. If I took an hourly snapshot, then I need only 6MB of storage. So. Your snapshot is much less useful, and 12MB is nothing in today's GBs of cheap space. Probably compressed too, so even less usage than you envision. The situation gets worse, for the primary usefulness of FV is for files which are frequently edited - meaning that they have rapid content change, and not in append-mode. Such a usage pattern means that FV will take up a much greater amount of space than periodic snapshots, as the longer interval in snapshots will allow the changes to settle. Not an issue. Cheap disk space. To me, FV is/was very useful in TOPS-20 and VMS, where you were looking at a system DESIGNED with the idea in mind, already have a user base trained to use and expect it, and virtually all usage was local (i.e. no network filesharing). None of this is true in the UNIX/POSIX world. And that does not affect its usefulness.
Chad
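Incidentally, the admin-controlled snapshot schedule Erik describes takes one crontab line. A rough sketch (dataset name invented; this keeps a rolling 24 hours by reusing one slot per hour, and the % signs must be escaped inside a crontab entry):

0 * * * * /usr/sbin/zfs destroy tank/home@hourly.`date +\%H` 2>/dev/null; /usr/sbin/zfs snapshot tank/home@hourly.`date +\%H`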
Re: [zfs-discuss] A versioning FS
On Oct 6, 2006, at 3:53 PM, Nicolas Williams wrote: On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote: On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote: OK. So, now we're on to FV. As Nico pointed out, FV is going to need a new API. Using the VMS convention of simply creating file names with a version string afterwards is unacceptable, as it creates enormous directory pollution, Assumption, not supported. Eye of the beholder. No, you really need an API, otherwise you have to guess when to snapshot versions of files. What does snapshot versions of files mean? My line "Assumption, not supported. Eye of the beholder" was in reference to "enormous directory pollution, not to mention user confusion". Assumption, not supported. Maybe Erik would find it confusing. I know I would find it _annoying_. Then leave it set to 1 version. So, FV has to be invisible to non-aware programs. yes Interesting that you agree with this when you disagree with Erik's other points! To me this statement implies FV APIs. It has to do with the implementation details. I don't know what sort of APIs you are saying are needed. Maybe they are needed and maybe they would be handy. I am not disputing that. The above should be simple to do however -- a program does an open of a file named foo.bar. ZFS / the file system routine would use the most recent version by default if no version info is given. Now we have a problem: how do we access FV for non-local (e.g. SAMBA/NFS) clients? Since the VAST majority of usefulness of FV is in the network file server arena, Assumption, and definitely not supported. It is very useful outside of the file sharing arena. I agree with you, and I agree with Erik. We, Sun engineers that is, need to look at the big picture, and network access is part of the big picture. Sure, unless we can use FV over the network, it is useless. Wrong. Yes, but we have to provide for it. I never said that file sharing is not useful (in this or any context). I just said that FV is not useless except in the over-the-network use. And if it did not support filesharing scenarios, at least in the beginning, it still has great use. The same way that apache not supporting lockfiles on nfs file systems does not make apache or nfs useless, FV that is not 100% in every nook and cranny does not make it useless. I would find it of tremendous use just in managing system and configuration files. You can't modify the SMB or NFS protocol (easily or quickly) to add FV functionality (look how hard it was to add ACLs to these protocols). About the only way I can think around this problem is to store versions in a special subdir of each directory (e.g. .zfs_version), which would then be browsable over the network, using tools not normally FV-aware. But this puts us back into the problem of a directory which potentially has hundreds or thousands of files. This directory way of doing it is not a good way. It fails the ease-of-use test for the end user. No, it doesn't: it doesn't preclude having FV-aware UIs that make it easier to access versions. All Erik's .zfs_version proposal is about is remote access, not a user interface. one UI is the command line shell The VMS way is far superior. The problem is that you have to make sure that apps that are not FV aware have no problems, which means you cannot just append something to the actual file name. It has to be some sort of meta data. I.e., APIs.
Well, file system level meta data that the file system uses may or may not need APIs to expose it -- depends on how the final implementation works. However, I never came out against APIs. The big question though is: how to snapshot file versions when they are touched/created by applications that are not aware of FV? Don't use the word snapshot as it may draw in unintended comparisons to snapshot features. Certainly not with every write(2). no At fsync(2), close(2), open(2) for write/append? probably What if an application deals in multiple files? so? Etc... Automatically capturing file versions isn't possible in the general case with applications that aren't aware of FV. In most cases it is possible. At worst you make a copy on open and work on the copy, making it the most recent version. While this may indeed mean that you have all of your changes around, figuring out which version has them can be massively time-consuming. Your assumption. (And much less hard than using snapshots). I agree that with ZFS snapshots it could be hard to find the file versions you want. I don't agree that the same isn't true with FV *except* where you have FV-aware applications. How so? The shell / desktop is enough of a UI to deal with it. Yes, any time you do a close() or equivalent. The idea is not to implement a universal undo stack. Or open(2) for write, fsync(2)s
Re: [zfs-discuss] A versioning FS
On Oct 6, 2006, at 7:33 PM, Erik Trimble wrote: This is what Nico and I are talking about: if you turn on file versioning automatically (even for just a directory, and not a whole filesystem), the number of files being created explodes geometrically. But it doesn't. Unless you are editing geometrically more files. Chad
Re: [zfs-discuss] A versioning FS
FV, your editor must issue periodic close() and open() commands on the same file, as you edit, all without your intervention. No, you get the benefits of FV, just across editing sessions and not internal to an editing session. Exactly how many editors do this? I have no idea. So, the only way to enable FV is to require the user to periodically push the Save button. Which is how much different from the current situation? I edit a file. I realize I screwed up. I can go back to the previous version (or 2 ago or whatever). I cannot do that in the current situation. Chad
Re: [zfs-discuss] A versioning FS
On Oct 6, 2006, at 10:18 PM, Richard Elling - PAE wrote: Erik Trimble wrote: The problem is we are comparing apples to oranges in user bases here. TOPS-20 systems had a couple of dozen users (or, at most, a few hundred). VMS only slightly more. UNIX/POSIX systems have 10s of thousands. IIRC, I had about a dozen files under VMS, not counting versions. You mean in your system? There was a lot more than that... Plus, the number of files being created under typical modern systems is at least two (and probably three or four) orders of magnitude greater. I've got 100,000 files under /usr in Solaris, and almost 1,000 under my home directory. wimp :-) I count 88,148 in my main home directory. I'll bet just running gnome and firefox will get you in the ballpark of 1,000 :-/ None (well, maybe 1 or 2) of which you edit and hence would not generate versions. Chad
Re: [zfs-discuss] A versioning FS
On Oct 5, 2006, at 7:47 PM, Chad Leigh -- Shire.Net LLC wrote: I find the unix conventions of storying a file and file~ or any of the other myriad billion ways of doing it that each app has invented to be much more unwieldy. sorry, storing a file, not storying
Re: [zfs-discuss] A versioning FS
On Oct 5, 2006, at 6:48 PM, Frank Cusack wrote: On October 5, 2006 5:25:17 PM -0700 David Dyer-Bennet [EMAIL PROTECTED] wrote: Well, unless you have a better VCS than CVS or SVN. I first met this as an obscure, buggy, expensive, short-lived SUN product, actually; I believe it was called NSE, the Network Software Engineering environment. And I used one commercial product (written by an NSE user after NSE was discontinued) that supported the feature needed. Both of these had what I might call a two-level VCS. Each developer had one or more private repositories (the way people have working directories now with SVN), but you had full VCS checkin/checkout (and compare and rollback and so forth) within that. Then, when your code was ready for the repository, you did a commit step that pushed it up from your private repository to the public repository. I wouldn't call that 2-level, it's simply branching, and all VCS/SCM systems have this, even rcs. Some expose all changes in the private branch to everyone (modulo protection mechanisms), some only expose changes that are put back (to use Sun teamware terminology). Both CVS and SVN have this. -frank David is describing a different behavior. Even a branch is still ultimately on the single, master server with CVS, SVN, and most other versioning systems. Teamware, and a few other versioning systems, let you have more arbitrary parent and child relationships. In Teamware, you can create a project gate, have a variety of people check code into this project gate, and do all of this without ever touching the parent gate. When the project is done, you then check in the changes to the project gate's parent. The gate parent may itself be a child of some other gate, making the above project gate a grand-child of some higher gate. You can also change a child's parent, so you could in fact skip the parent and go straight to the grandparent if you wish. For that matter, you can re-parent the parent to sync with the former child if you had some reason to do so. A Teamware putback really isn't a matter of exposure. Until you do a putback to the parent, the code is not physically (or even logically) present in the parent. Teamware's biggest drawbacks are a lack of change sets (like how Subversion tracks simultaneous, individual changes as a group) and that it only runs via file access (no network protocol; filesystem or NFS only.) Mercurial seems to be similar to Teamware in terms of parenting, but with network protocol support built in. Which is presumably why OpenSolaris will be using it. ckl
Re: [zfs-discuss] problem ZFS / NFS from FreeBSD nfsv3 client -- periodic NFS server not resp
On Sep 26, 2006, at 12:26 PM, Chad Leigh -- Shire.Net LLC wrote: On Sep 26, 2006, at 12:24 PM, Mike Kupfer wrote: Chad == Chad Leigh -- Shire.Net LLC [EMAIL PROTECTED] writes: Chad snoop does not show me the reply packets going back. What do I Chad need to do to go both ways? It's possible that performance issues are causing snoop to miss the replies. If your server has multiple network interfaces, it's more likely that the server is routing the replies back on a different interface. We've run into that problem many times with the NFS server that has my home directory on it. If that is what's going on, you need to fire up multiple instances of snoop, one per interface. OK, I will try that. I did run tcpdump on the BSD client as well so the responses should show up there as well, as it only has the 1 interface on that net while the Solaris box has 3. That got me thinking. Since I had 3 dedicated ports to use for nfs, I changed it so each is on its own network (192.168.2, .3, .4) so there is no port switcheroo on incoming and outgoing packets. I also upgraded the FreeBSD to catch any bge updates and patches (there were some I think, but I am not sure they had anything to do with my issue). Anyway, after doing both of these my issue seems to have gone away... I am still testing / watching but I have not seen or experienced the issue in a day. I am not sure which one fixed my problem but it seems to have gone away. Thanks Chad
Re: [zfs-discuss] problem ZFS / NFS from FreeBSD nfsv3 client -- periodic NFS server not resp
On Sep 26, 2006, at 12:24 PM, Mike Kupfer wrote: Chad == Chad Leigh -- Shire.Net LLC [EMAIL PROTECTED] writes: Chad snoop does not show me the reply packets going back. What do I Chad need to do to go both ways? It's possible that performance issues are causing snoop to miss the replies. If your server has multiple network interfaces, it's more likely that the server is routing the replies back on a different interface. We've run into that problem many times with the NFS server that has my home directory on it. If that is what's going on, you need to fire up multiple instances of snoop, one per interface. OK, I will try that. I did run tcpdump on the BSD client as well so the responses should show up there as well, as it only has the 1 interface on that net while the Solaris box has 3. Thanks Chad
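Spelled out, Mike's suggestion is one capture per interface, so the replies show up no matter which interface the server routes them out of. A sketch using the interface and host names from this thread:

# snoop -d e1000g0 -o /var/tmp/e1000g0.cap freebsd-internal &
# snoop -d bge0 -o /var/tmp/bge0.cap freebsd-internal &
# snoop -d bge1 -o /var/tmp/bge1.cap freebsd-internal &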
[zfs-discuss] problem ZFS / NFS from FreeBSD nfsv3 client -- periodic NFS server not resp
I have set up a Solaris 10 U2 06/06 system that has basic patches to the latest -19 kernel patch and latest zfs genesis etc as recommended. I have set up a basic pool (local) and a bunch of sub-pools (local/mail, local/mail/shire.net, local/mail/shire.net/o, local/jailextras/shire.net/irsfl, etc). I am exporting these with [EMAIL PROTECTED],[EMAIL PROTECTED] and then mounting a few of these pools on a FreeBSD system using nfsv3. The FreeBSD has about 4 of my 10 or so subpools mounted. 2 are email imap account tests, 1 is generic storage, and one is a FreeBSD jail root. FreeBSD mounts them using TCP: /sbin/mount_nfs -s -i -3 -T foo-i1:/local/mail/shire.net/o/obar /local/2/hobbiton/local/mail/shire.net/o/obar The systems are both directly connected to a gigabit switch using 1000btx-fdx and both have an MTU set at 9000. The Solaris side is an e1000g port (the system has 2 bge and 2 e1000g ports all configured) and the FreeBSD is a bge port. I have heard that there are some ZFS/NFS sync performance problems etc that will be fixed in U3 or are fixed in OpenSolaris. I do not think my issue is related to that. I have also seen some of that with sometimes having piss-poor performance on writing. I have experienced the following issue several times since I started experimenting with this a few days ago. I periodically will get NFS server not responding errors on the FreeBSD machine for one of the mounted pools, and it will last 4-8 minutes or so and then come alive again and be fine for many hours. When this happens, access to the other mounted pools still works fine and, logged directly in to the Solaris machine, I am able to access the file systems (pools) just fine. Example error messages:
Sep 24 03:09:44 freebsdclient kernel: nfs server solzfs-i1:/local/jailextras/shire.net/irsfl: not responding
Sep 24 03:10:15 freebsdclient kernel: nfs server solzfs-i1:/local/jailextras/shire.net/irsfl: not responding
Sep 24 03:12:19 freebsdclient last message repeated 4 times
Sep 24 03:14:54 freebsdclient last message repeated 5 times
I would be interested in getting feedback on what might be the problem and also ways to track this down etc. Is this a known issue? Have others seen the nfs server sharing ZFS time out (but not for all pools)? Etc. Is there any functional difference with setting up the ZFS pools as legacy mounts and using a traditional share command to share them over nfs? I am mostly a Solaris noob and am happy to learn and can try anything people want me to test. Thanks in advance for any comments or help. thanks Chad
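On the legacy-mount question: both paths end up at the same NFS server code; the difference is just who manages the share at boot. A sketch with one of the dataset names above:

# zfs set sharenfs=on local/mail     (the property value can also carry share_nfs options, e.g. 'rw=...,root=...')

versus the legacy route:

# zfs set mountpoint=legacy local/mail
# mount -F zfs local/mail /local/mail              (plus a vfstab entry to make it stick)
# share -F nfs -o rw /local/mail                   (plus a dfstab entry)

So I would not expect a functional difference in the NFS behaviour itself.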
Re: [zfs-discuss] problem ZFS / NFS from FreeBSD nfsv3 client -- periodic NFS server not resp
On Sep 25, 2006, at 12:18 PM, eric kustarz wrote: Chad Leigh wrote: I have set up a Solaris 10 U2 06/06 system that has basic patches to the latest -19 kernel patch and latest zfs genesis etc as recommended. I have set up a basic pool (local) and a bunch of sub-pools (local/mail, local/mail/shire.net, local/mail/shire.net/o, local/jailextras/shire.net/irsfl, etc). I am exporting these with [EMAIL PROTECTED],[EMAIL PROTECTED] and then mounting a few of these pools on a FreeBSD system using nfsv3. The FreeBSD has about 4 of my 10 or so subpools mounted. 2 are email imap account tests, 1 is generic storage, and one is a FreeBSD jail root. FreeBSD mounts them using TCP: /sbin/mount_nfs -s -i -3 -T foo-i1:/local/mail/shire.net/o/obar /local/2/hobbiton/local/mail/shire.net/o/obar The systems are both directly connected to a gigabit switch using 1000btx-fdx and both have an MTU set at 9000. The Solaris side is an e1000g port (the system has 2 bge and 2 e1000g ports all configured) and the FreeBSD is a bge port. I have heard that there are some ZFS/NFS sync performance problems etc that will be fixed in U3 or are fixed in OpenSolaris. I do not think my issue is related to that. I have also seen some of that with sometimes having piss-poor performance on writing. I have experienced the following issue several times since I started experimenting with this a few days ago. I periodically will get NFS server not responding errors on the FreeBSD machine for one of the mounted pools, and it will last 4-8 minutes or so and then come alive again and be fine for many hours. When this happens, access to the other mounted pools still works fine and, logged directly in to the Solaris machine, I am able to access the file systems (pools) just fine. Example error messages:
Sep 24 03:09:44 freebsdclient kernel: nfs server solzfs-i1:/local/jailextras/shire.net/irsfl: not responding
Sep 24 03:10:15 freebsdclient kernel: nfs server solzfs-i1:/local/jailextras/shire.net/irsfl: not responding
Sep 24 03:12:19 freebsdclient last message repeated 4 times
Sep 24 03:14:54 freebsdclient last message repeated 5 times
I would be interested in getting feedback on what might be the problem and also ways to track this down etc. Is this a known issue? Have others seen the nfs server sharing ZFS time out (but not for all pools)? Etc. Could be lots of things - network partition, bad hardware, overloaded server, bad routers, etc. What's the server's load like (vmstat, prstat)? If you're banging on the server too hard and using up the server's resources then nfsd may not be able to respond to your client's requests. The server is not doing anything except this ZFS / NFS serving and only 1 client is attached to it (the one with the problems). prstat shows a load of 0.00 continually and vmstat is typically like:
# vmstat
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re mf pi po fr de sr s1 s2 -- --   in  sy  cs us sy id
 0 0 0 10640580 691412 0 1  0  0  0  0  2  0 11  0  0  421  85 120  0  0 100
#
You can also grab a snoop trace to see what packets are not being responded to? If I can catch it happening. Most of the time I am not around and I just see it in the logs. Sometimes it happens when I do a df -h on the client, for example. What are clients and local apps doing to the machine? Almost nothing. No local apps are running on the server. It is basically just doing ZFS and NFS. The client has 4 mounts from ZFS, all of them very low usage. 2 email accounts storage (imap maildir) are mounted for testing. Each receives 10-100 messages a day.
1 extra storage space is mounted and once a day rsync copies 2 files to it in the middle of the night -- one around 70mb and one 7mb. The other is being used as the root for a FreeBSD jail which is not being used for anything. Just proof of concept. No processes are running in the jail that are doing much of anything to the NFS mounted file system -- occasional log writes. What is your server hardware (# processors, memory) - is it underprovisioned for what you're doing to it? Tyan 2892 MB with a single dual core Opteron at 2.0 GHZ. 2GB memory. Single Areca 1130 raid card with 1gb RAM cache. Works very well with ZFS without the NFS component. (Has a 9 disk RAID 6 array on it). I have done lots of testing with this card and Solaris with and without ZFS and it has held up very well without any sort of IO issues. (Except the fact that it does not get a flush when the system powers down with init 5). The ZFS pools are currently on this single disk (to be augmented later this year when more funding comes through to buy more stuff). A dual port e1000g intel server card over PCIe is the Solaris side of the network. How is the freeBSD NFS client code - robust? I have
Re: [zfs-discuss] problem ZFS / NFS from FreeBSD nfsv3 client -- periodic NFS server not resp
On Sep 25, 2006, at 1:15 PM, Mike Kupfer wrote: Chad == Chad Leigh -- Shire.Net LLC [EMAIL PROTECTED] writes: Chad On Sep 25, 2006, at 12:18 PM, eric kustarz wrote: You can also grab a snoop trace to see what packets are not being responded to? Chad If I can catch it happening. Most of the time I am not around and Chad I just see it in the logs. I've attached a hack script that runs snoop in the background and rotates the capture files. If you start it as (for example) bgsnoop client server it will save the last 6 hours of capture files between the two hosts. If you notice a problem in the logs, you can find the corresponding capture file and extract from it what you need. Hi Mike Thanks. I set this up like so ./bgsnoop.sh -d e1000g0 freebsd-internal since my nfs is not going out the default interface. Soon thereafter I caught the problem. In looking at the snoop.trace file I am not sure what to look for. There seems to be no packet headers or time stamps or anything -- just a lot of binary data. What am I looking for? Thanks Chad
Re: [zfs-discuss] problem ZFS / NFS from FreeBSD nfsv3 client -- periodic NFS server not resp
On Sep 25, 2006, at 2:49 PM, Mike Kupfer wrote: Chad == Chad Leigh -- Shire.Net LLC [EMAIL PROTECTED] writes: Chad There seems to be no packet headers or time stamps or anything -- Chad just a lot of binary data. What am I looking for? Use snoop -i capture_file to decode the capture file. OK, a little snoop help is required. I ran bgsnoop as follows:
# ./bgsnoop.sh -t a -r -d e1000g0
According to the snoop man page:
-t [ r | a | d ]  Time-stamp presentation. Time-stamps are accurate to within 4 microseconds. The default is for times to be presented in d (delta) format (the time since receiving the previous packet). Option a (absolute) gives wall-clock time. Option r (relative) gives time relative to the first packet displayed. This can be used with the -p option to display time relative to any selected packet.
so -t a should show wall clock time. But my feed looks like the following and I don't see any wall clock time stamps. I need to be able to get some sort of wall-time stamp on this so that I can know where to look in my snoop dump for offending issues...
1 0.0 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=50E5 (read,lookup,modify,extend,delete,execute)
2 0.00045 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=339B (read,lookup,modify,extend,delete,execute)
3 0.00019 freebsd-internal.shire.net - bagend-i1 NFS C LOOKUP3 FH=339B 1159219290.M400972P15189_courierlock.freebsd.shire.net
4 0.00019 freebsd-internal.shire.net - bagend-i1 NFS C LOOKUP3 FH=339B 1159219290.M400972P15189_courierlock.freebsd.shire.net
5 0.00026 freebsd-internal.shire.net - bagend-i1 NFS C CREATE3 FH=339B (UNCHECKED) 1159219290.M400972P15189_courierlock.freebsd.shire.net
6 0.00045 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=878C (read,lookup,modify,extend,delete,execute)
7 0.00013 freebsd-internal.shire.net - bagend-i1 NFS C LOOKUP3 FH=50E5 tmp
8 0.00013 freebsd-internal.shire.net - bagend-i1 NFS C LOOKUP3 FH=339B 1159219290.M400972P15189_courierlock.freebsd.shire.net
9 0.00019 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=878C (read,lookup,modify,extend,delete,execute)
10 0.00026 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=878C (read,lookup,modify,extend,delete,execute)
11 0.00019 freebsd-internal.shire.net - bagend-i1 NFS C WRITE3 FH=878C at 0 for 24 (ASYNC)
12 0.00026 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=878C (read,lookup,modify,extend,delete,execute)
13 0.00013 freebsd-internal.shire.net - bagend-i1 NFS C LOOKUP3 FH=339B courier.lock
14 0.00013 freebsd-internal.shire.net - bagend-i1 NFS C COMMIT3 FH=878C at 0 for 24
15 0.00032 freebsd-internal.shire.net - bagend-i1 NFS C LINK3 FH=878C to FH=339B courier.lock
16 0.00026 freebsd-internal.shire.net - bagend-i1 NFS C LOOKUP3 FH=339B 1159219290.M400972P15189_courierlock.freebsd.shire.net
17 0.00019 freebsd-internal.shire.net - bagend-i1 NFS C REMOVE3 FH=339B 1159219290.M400972P15189_courierlock.freebsd.shire.net
18 0.00032 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=339B (read,lookup,modify,extend,delete,execute)
19 0.00019 freebsd-internal.shire.net - bagend-i1 NFS C FSSTAT3 FH=50E5
20 0.00019 freebsd-internal.shire.net - bagend-i1 NFS C READDIR3 FH=339B Cookie=0 for 8192
21 0.00026 freebsd-internal.shire.net - bagend-i1 NFS C LOOKUP3 FH=339B courier.lock
22 0.00019 freebsd-internal.shire.net - bagend-i1 NFS C LOOKUP3 FH=339B 1159219290.M405999P15189_imapuid_164.freebsd.shire.net
23 0.00026 freebsd-internal.shire.net - bagend-i1 NFS C LOOKUP3 FH=339B 1159219290.M405999P15189_imapuid_164.freebsd.shire.net
24 0.00013 freebsd-internal.shire.net - bagend-i1 NFS C CREATE3 FH=339B (UNCHECKED) 1159219290.M405999P15189_imapuid_164.freebsd.shire.net
25 0.00032 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=868C (read,lookup,modify,extend,delete,execute)
26 0.00013 freebsd-internal.shire.net - bagend-i1 NFS C LOOKUP3 FH=339B 1159219290.M405999P15189_imapuid_164.freebsd.shire.net
27 0.00013 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=EE81 (read,lookup,modify,extend,delete,execute)
28 0.00013 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=EE81 (read,lookup,modify,extend,delete,execute)
29 0.05840 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH=868C (read,lookup,modify,extend,delete,execute)
30 0.00019 freebsd-internal.shire.net - bagend-i1 NFS C ACCESS3 FH
Re: [zfs-discuss] problem ZFS / NFS from FreeBSD nfsv3 client -- periodic NFS server not resp
On Sep 25, 2006, at 3:54 PM, Mike Kupfer wrote: Chad == Chad Leigh -- Shire.Net LLC [EMAIL PROTECTED] writes: Chad so -t a should show wall clock time The capture file always records absolute time. So you (just) need to use -t a when you decode the capture file. Sorry for not making that clear earlier. OK, thanks. Sorry for being such a noob with snoop. I guess it is kind of obvious now that you would put that on the snoop that reads the file and outputs the human readable one, and not the one that saves things away... This appears to be the only stuff having to do with the hanging server (lots of other stuff that is with other zfs pools that are served over nfs):
68 15:29:27.53298 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC
72 15:29:28.54294 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC (retransmit)
73 15:29:29.54312 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC (retransmit)
74 15:29:31.54356 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC (retransmit)
75 15:29:35.54443 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC (retransmit)
76 15:29:43.54610 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC (retransmit)
5890 15:29:59.55835 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC
5993 15:30:31.56506 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC (retransmit)
6124 15:31:35.58971 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC (retransmit)
6346 15:32:44.23048 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC
6347 15:32:44.23585 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC (retransmit)
6755 15:34:40.56138 freebsd-internal.shire.net - solaris-zfs-i1 NFS C FSSTAT3 FH=84EC
It comes alive again right about packet 6347 (15:32:44.23585) based on matching log entries and this snoop. snoop does not show me the reply packets going back. What do I need to do to go both ways? Thanks Chad
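For the archive: the answer to the "go both ways" question is the per-interface capture Mike suggested earlier in the thread, after which the reply direction can be pulled out of each file at decode time using snoop's from/to qualifiers. A sketch (capture file names assumed from the bgsnoop setup):

# snoop -i /var/tmp/e1000g0.cap -t a from solaris-zfs-i1     (server replies only)
# snoop -i /var/tmp/bge0.cap -t a solaris-zfs-i1             (both directions seen on that interface)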
Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature
On Sep 14, 2006, at 1:32 PM, Henk Langeveld wrote: Bady, Brant RBCM:EX wrote: Part of the archiving process is to generate checksums (I happen to use MD5), and store them with other metadata about the digital object in order to verify data integrity and demonstrate the authenticity of the digital object over time. Wouldn't it be helpful if there was a utility to access/read the checksum data created by ZFS, and use it for those same purposes? Doesn't ZFS use block-level checksums? Hoping to see something like that in a future release, or a command line utility that could do the same. It might be possible to add a user set property to a file with the md5sum and a timestamp when it was computed. But what would this protect against? If you need to avoid tampering, you need the checksums offline anyway - cf. tripwire. Cheers, Henk Better still would be the forthcoming cryptographic extensions in some kind of digital-signature mode. ckl
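For the offline half of this, Solaris already ships a small utility, so no ZFS support is needed to generate MD5 sums and store them with the archive metadata (the path here is invented):

# digest -v -a md5 /archive/objects/map0001.tif

digest -l lists the other available algorithms (sha1 etc.).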
Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data
On Sep 12, 2006, at 4:39 PM, Celso wrote: On 12/09/06, Celso [EMAIL PROTECTED] wrote: I think it has already been said that in many peoples experience, when a disk fails, it completely fails. Especially on laptops. Of course ditto blocks wouldn't help you in this situation either! Exactly. I still think that silent data corruption is a valid concern, one that ditto blocks would solve. Also, I am not thrilled about losing that much space for duplication of unneccessary data (caused by partitioning a disk in two). Well, you'd only be duplicating the data on the mirror. If you don't want to mirror the base OS, no one's saying you have to. Yikes! that sounds like even more partitioning! The redundancy you're talking about is what you'd get from 'cp /foo/bar.jpg /foo/bar.jpg.ok', except it's hidden from the user and causing headaches for anyone trying to comprehend, port or extend the codebase in the future. the proposed solution differs in one important aspect: it automatically detects data corruption. Detecting data corruption is a function of the ZFS checksumming feature. The proposed solution has _nothing_ to do with detecting corruption. The difference is in what happens when/if such bad data is detected. Without a duplicate copy, via some RAID level or the proposed ditto block copies, the file is corrupted.
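This proposal did land as a per-dataset property, so assuming that final form, usage is just (dataset name invented):

# zfs set copies=2 tank/home

New writes then carry two ditto copies spread across the device, so a latent bad sector can be repaired even on a single-disk laptop pool -- while whole-disk failure, as noted above, still loses everything.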
Re: [zfs-discuss] Enabling compression/encryption on a populated filesystem
On 7/18/06, Brian Hechinger [EMAIL PROTECTED] wrote: On Tue, Jul 18, 2006 at 09:46:44AM -0400, Chad Mynhier wrote: On 7/18/06, Brian Hechinger [EMAIL PROTECTED] wrote: Being able to remove devices from a pool would be a good thing. I can't personally think of any reason that I would ever do it, but a friend of mine keeps asking me why it can't do it and that it should be able to. -brian This situation is implicitly included in what Jeff said, but live data migration is a good example of where this would come in handy. Size upgrades you can do in place, and even migrating to a new shelf you can do in place as well (replace individual disk in old shelf with individual disk in new shelf). The only place that removing disks from the pool would be useful in this scenario would be if the new array had a fewer number of larger disks. There are conceivable situations in which you're not able to do a simple one-to-one device replacement. One case is the one you give, where you have an array with fewer, larger disks. But it's also feasible that the zpool structure you want to use on the new storage doesn't match what you're doing on your current storage. (Although I guess the fewer, larger disk scenario is just a special case of the situation in which the resultant zpool structure doesn't match the original.) Chad Mynhier
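The in-place migration mentioned above is just a series of replaces, one disk at a time (device names invented). Each replace resilvers onto the new disk and then detaches the old one:

# zpool replace tank c1t0d0 c2t0d0
# zpool status tank        (wait for the resilver to complete before doing the next disk)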
Re: [zfs-discuss] Enabling compression/encryption on a populated filesystem
On 7/13/06, Darren Reed [EMAIL PROTECTED] wrote: When ZFS compression is enabled, although the man page doesn't explicitly say this, my guess is that only new data that gets written out is compressed - in keeping with the COW policy. [ ... ] Hmmm, well, I suppose the same problem might apply to encrypting data too...so maybe what I need is a zfs command that will walk the filesystem's data tree, read in data and write it back out according to the current data policy. It seems this could be made a function of 'zfs scrub' -- instead of simply verifying the data, it could rewrite the data as it goes. This comes in handy in other situations. For example, with the current state of things, if you add disks to a pool that contains mostly static data, you don't get the benefit of the additional spindles when reading old data. Rewriting the data would gain you that benefit, plus it would avoid the new disks becoming the hot spot for all new writes (assuming the old disks were very full.) Theoretically this could also be useful in a live data migration situation, where you have both new and old storage connected to a server. But this assumes there would be some way to tell ZFS to treat a subset of disks as read-only. Chad Mynhier
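Until a rewriting scrub exists, the only user-level workaround is to rewrite the files yourself after flipping the property -- crude, not atomic, and it breaks hard links, but it demonstrates that only newly written blocks pick up the policy (dataset and file names invented):

# zfs set compression=on tank/data
# cp -p /tank/data/big.db /tank/data/big.db.new
# mv /tank/data/big.db.new /tank/data/big.db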
Re: [zfs-discuss] COW question
It uses extra space in the middle of the write, in order to hold the new data, but once the write is complete, the space occupied by the old version is now free for use. ckl On Jul 12, 2006, at 8:05 PM, Robert Chen wrote: I still could not understand why Copy on Write does not waste file system capacity. Robert Raymond Xiong wrote: Robert Chen wrote: My question is, ZFS uses COW (copy on write); does this mean it will double usage of capacity or waste the capacity? What does COW really do? Does a mirror also have COW? Please help me, thanks. Robert It doesn't. Page 11 of the following slides illustrates how COW works in ZFS: http://www.opensolaris.org/os/community/zfs/docs/zfs_last.pdf Blocks containing active data are never overwritten in place; instead, a new block is allocated, modified data is written to it, and then any metadata blocks referencing it are similarly read, reallocated, and written. To reduce the overhead of this process, multiple updates are grouped into transaction groups, and an intent log is used when synchronous write semantics are required. (from http://en.wikipedia.org/wiki/ZFS) In snapshot scenarios, COW consumes much less disk space and is much faster. Raymond
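A quick way to see this at the command line (dataset name invented): without a snapshot, the old blocks are freed as soon as the rewrite commits; only a snapshot keeps the old copy around, and that shows up in its USED column.

# zfs snapshot tank/fs@before
# cp /var/tmp/new.dat /tank/fs/data.dat            (overwrite an existing file)
# zfs list -o name,used,refer tank/fs tank/fs@before

Destroy the snapshot and the space held by the overwritten blocks is returned to the pool.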