Re: [zfs-discuss] SSD over 10gbe not any faster than 10K SAS over GigE
Thank you for your input, folks. The MTU 9000 idea worked like a charm. I have the Intel X25 also, but the capacity was not what I am after for a 6-device array. I have looked and looked at review after review, and that's why I started down the Intel path, albeit that firmware upgrade in May was a pain to pull off. I have seen glowing things about the Samsungs and Intels both. What tipped me over the edge is a YouTube video (surely paid for by Samsung). Check it out: http://www.youtube.com/watch?v=96dWOEa4Djs

Figuring out how to do jumbo frames on the ixgbe was fun given my newness to Sun's platform.

Thanks,
Derek
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
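For anyone finding this thread in the archives: the jumbo-frame change on (Open)Solaris boils down to something like the sketch below. The link name ixgbe0 is an assumption here; check `dladm show-link` for yours, and remember the switch and every client NIC need jumbo frames enabled end to end.

```shell
# Set the MTU on the 10GbE link (preferred method on recent builds;
# older builds may require editing the driver's .conf file instead).
dladm set-linkprop -p mtu=9000 ixgbe0

# Confirm the property took effect.
dladm show-linkprop -p mtu ixgbe0

# Make sure the plumbed IP interface matches the new link MTU.
ifconfig ixgbe0 mtu 9000
```

These commands need a reboot-free path only if the interface can be re-plumbed; otherwise set the property and bounce the interface.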
Re: [zfs-discuss] SSD over 10gbe not any faster than 10K SAS over GigE
On Fri, Oct 9, 2009 at 9:25 PM, Derek Anderson wrote:
> GigE wasn't giving me the performance I had hoped for, so I sprang for some 10GbE cards. So what am I doing wrong?
>
> My setup is a Dell 2950 without a RAID controller, just a SAS6 card. The setup is as such:
> mirror rpool (boot) SAS 10K
> raidz SSD 467 GB on 3 Samsung 256 MLC SSD (220MB/s each)
>
> To create the raidz I did a simple zpool create raidz SSD c1x c1xx c1x. I have a single 10GbE card with a single IP on it.
>
> I created an NFS filesystem for VMware by using: zfs create SSD/vmware. I had to set permissions for VMware anon=0, but that's it. Below is what zpool iostat reads:
>
> File copy 10GbE to SSD -> 40M max
> File copy 1GbE to SSD -> 5.4M max
> File copy SAS to SSD internal -> 90M
> File copy SSD to SAS internal -> 55M
>
> Top shows no matter what I always have 2.5G free, and every other test says the same thing. Can anyone tell me why this seems to be slow? Does 90M mean megabytes or megabits?
>
> Thanks,
> Derek

I think you made a bad choice with the Samsung disks. I'd recommend the Intel 160GB drives if it's not too late to return the Samsungs. The Intel drives currently offer the best compromise between different workloads. There are plenty of SSD reviews, and the Samsungs always come out poorly in comparison testing.

Regards,
-- Al Hopper Logical Approach Inc, Plano, TX a...@logical-approach.com Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
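For readers reproducing this setup, the steps Derek describes amount to roughly the following sketch. The device names and the pool layout are illustrative placeholders, not his exact devices:

```shell
# Three-disk raidz pool on the SSDs (device names are examples).
zpool create SSD raidz c1t2d0 c1t3d0 c1t4d0

# Filesystem for the VMware datastore.
zfs create SSD/vmware

# Share it over NFS; anon=0 maps unmapped (root) users to uid 0,
# which the ESX host needs in order to write to the datastore.
zfs set sharenfs=rw,anon=0 SSD/vmware
```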
Re: [zfs-discuss] deduplication
On Fri, Jul 17, 2009 at 2:42 PM, Brandon High wrote:
> The keynote was given on Wednesday. Any more willingness to discuss dedup on the list now?

The following video contains a de-duplication overview from Bill and Jeff: https://slx.sun.com/1179275620

Hope this helps,
- Ryan
-- http://prefetch.net
Re: [zfs-discuss] NFS sgid directory interoperability with Linux
On Mon, 12 Oct 2009, Mark Shellenbaum wrote:
> Does it only fail under NFS or does it only fail when inheriting an ACL?

It only fails over NFS from a Linux client; locally it works fine, and from a Solaris client it works fine. It also only seems to fail on directories; files receive the correct group ownership:

$ uname -a
Linux damien 2.6.27-gentoo-r8 #7 SMP Tue May 26 13:15:08 PDT 2009 x86_64 Dual Core AMD Opteron(tm) Processor 280 AuthenticAMD GNU/Linux
$ id
uid=1005(henson) gid=1012(csupomona)
$ mount | grep henson
kyle.unx.csupomona.edu:/export/user/henson on /user/henson type nfs4 (rw,sec=krb5p,clientaddr=134.71.247.8,sloppy,addr=134.71.247.14)
$ ls -ld .
drwx--s--x 3 henson iit 4 Oct 12 15:58 .
$ touch foo
$ mkdir bar
$ ls -l
total 1
drwxr-sr-x 2 henson csupomona 2 Oct 12 15:58 bar
-rw-r--r-- 1 henson iit 0 Oct 12 15:58 foo

New directory group ownership is wrong whether the containing directory has an inheritable ACL or not. I only have ZFS filesystems exported right now, but I assume it would behave the same for ufs.

The underlying issue seems to be that the Sun NFS server expects the NFS client to apply the sgid bit itself and create the new directory with the parent directory's group, while the Linux NFS client expects the server to enforce the sgid bit.

-- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] NFS sgid directory interoperability with Linux
Paul B. Henson wrote:

We're running Solaris 10 with ZFS to provide home and group directory file space over NFSv4. We've run into an interoperability issue between the Solaris NFS server and the Linux NFS client regarding the sgid bit on directories and assigning appropriate group ownership on newly created subdirectories.

If a directory exists with the sgid bit set owned by a group other than the user's primary group, new directories created in that directory are owned by the primary group rather than by the group of the parent directory. Evidently, the Solaris NFS server assumes the client will specify the correct owner of the directory, whereas the Linux NFS client assumes the server is in charge of implementing the sgid functionality and will assign the right group itself. As such, with a Solaris server and a Linux client the functionality is simply broken :(.

This poses a considerable security issue, as the GROUP@ inherited ACL now provides access to the primary group of the user rather than the intended group, which as you might imagine is somewhat problematic.

Ideally, it seems that the server should be responsible for this, rather than the client voluntarily enforcing it. Is this functionality strictly defined anywhere, or is it implementation dependent? You'd think something like this would have turned up in an interoperability bake-off at some point. Thanks for any information...

Does it only fail under NFS or does it only fail when inheriting an ACL? I just tried it locally and it appears to work.

# ls -ld test.dir
drwsr-sr-x 2 marks storage 4 Oct 12 16:45 test.dir

my primary group is "staff"

$ touch file
$ ls -l file
-rw-r--r-- 1 marks storage 0 Oct 12 16:49 file

-Mark
[zfs-discuss] NFS sgid directory interoperability with Linux
We're running Solaris 10 with ZFS to provide home and group directory file space over NFSv4. We've run into an interoperability issue between the Solaris NFS server and the Linux NFS client regarding the sgid bit on directories and assigning appropriate group ownership on newly created subdirectories.

If a directory exists with the sgid bit set owned by a group other than the user's primary group, new directories created in that directory are owned by the primary group rather than by the group of the parent directory. Evidently, the Solaris NFS server assumes the client will specify the correct owner of the directory, whereas the Linux NFS client assumes the server is in charge of implementing the sgid functionality and will assign the right group itself. As such, with a Solaris server and a Linux client the functionality is simply broken :(.

This poses a considerable security issue, as the GROUP@ inherited ACL now provides access to the primary group of the user rather than the intended group, which as you might imagine is somewhat problematic.

Ideally, it seems that the server should be responsible for this, rather than the client voluntarily enforcing it. Is this functionality strictly defined anywhere, or is it implementation dependent? You'd think something like this would have turned up in an interoperability bake-off at some point. Thanks for any information...

-- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
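As a minimal local illustration of the semantics in dispute (runnable on any POSIX system, not over NFS): with the sgid bit set on a directory, a new subdirectory should inherit the parent's group rather than the creator's primary group. This sketch only demonstrates the expected behavior; it does not reproduce the NFS bug.

```shell
#!/bin/sh
set -e
demo=/tmp/sgid-demo
rm -rf "$demo"
mkdir -p "$demo/parent"
chmod g+s "$demo/parent"        # set the sgid bit on the parent

mkdir "$demo/parent/child"      # create a new directory inside it

# With correct sgid semantics, child's group matches parent's group
# (and on most systems the sgid bit propagates to child as well).
ls -ldn "$demo/parent" "$demo/parent/child"
```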
Re: [zfs-discuss] Keep track of meta data on each zfs
On Sat, October 10, 2009 12:02, Harry Putnam wrote:
> What do real live administrators who administer important data do about meta info like that?

Same thing I do about directories -- I name them meaningfully. So I've got /home/ddb, which is the home directory for user ddb and is mounted from /zp1/ddb, and similarly for other users. And then I've got //fsfs/public/music and //fsfs/public/installers, which are probably mounted as one filesystem from /zp1/public, but I don't remember for sure. They hold music files and software installers.

I'm kind of wondering what you're doing, because the confusion sounds strange to me.

-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Re: [zfs-discuss] Sun Flash Accelerator F20
Hi Richard;

You are right, ZFS is not a shared FS, so it cannot be used for RAC unless you have a 7000-series disk system. In Exadata, ASM is used for storage management, where the F20 can perform as a cache.

Best regards
Mertol

Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email mertol.ozyo...@sun.com

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Richard Elling
Sent: Thursday, September 24, 2009 8:10 PM
To: James Andrewartha
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Sun Flash Accelerator F20

On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:
> I'm surprised no-one else has posted about this - part of the Sun Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml

At the Exadata-2 announcement, Larry kept saying that it wasn't a disk. But there was little else of a technical nature said, though John did have one to show. RAC doesn't work with ZFS directly, so the details of the configuration should prove interesting.
-- richard
Re: [zfs-discuss] Sun Flash Accelerator F20
Hi James;

The product will be launched in a very short time. You can learn pricing from Sun. Please keep in mind that Logzilla and the F20 are designed with slightly different tasks in mind. Logzilla is an extremely fast and reliable write device, while the F20 can be used for many different loads (read or write cache, or both at the same time).

Mertol

Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email mertol.ozyo...@sun.com

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of James Andrewartha
Sent: Thursday, September 24, 2009 10:21 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Sun Flash Accelerator F20

I'm surprised no-one else has posted about this - part of the Sun Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml

There's no pricing on the webpage though - does anyone know how it compares in price to a logzilla?

-- James Andrewartha
Re: [zfs-discuss] deduplication
Hi all;

I am not the right person to talk about the Solaris/ZFS roadmap; however, you can talk with your Sun account manager about the 7000-series roadmap if you sign an NDA, which can give you more information.

Best regards
Mertol

Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email mertol.ozyo...@sun.com

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Cyril Plisko
Sent: Thursday, September 17, 2009 9:20 AM
To: Brandon High
Cc: ZFS discuss
Subject: Re: [zfs-discuss] deduplication

2009/9/17 Brandon High :
> 2009/9/11 "C. Bergström" :
>> Can we make a FAQ on this somewhere?
>>
>> 1) There is some legal bla bla between Sun and green-bytes that's tying up the IP around dedup... (someone knock some sense into green-bytes please)
>> 2) there's an acquisition that's got all sorts of delays.. which may very well delay the thing with green-bytes as well..
>
> I know you're trying to help, but your opinion as to the delay is hardly authoritative.
>
> Could someone from Sun provide information on data deduplication in ZFS, even if just to say it's tied up in litigation at the moment?

I think it should be pretty obvious by now that no one from Sun is going to tell you a word until it is possible to tell things. At which point they will probably tell everything + source. My own opinion of course...

-- Regards, Cyril
Re: [zfs-discuss] Does ZFS work with SAN-attached devices?
> "sj" == Shawn Joy writes:

sj> Can you explain, in simple terms, how ZFS now reacts to this?

I can't. :) I think Victor's long message made a lot of sense. The failure modes with a SAN are not simple. At least there is the difference of whether the target's write buffer was lost after a transient failure or not, and the current storage stack assumes it's never lost.

IMHO, SANs are in general broken by design because their software stacks don't deal predictably with common network failure modes (like the target rebooting, but the initiator staying up). The standard that would qualify to me as ``deal predictably'' would be what NFS provides:

* writes are double-cached on client and server, so the client can replay them if the server crashes. To my limited knowledge, no SAN stack does this. Expensive SANs can limit the amount of data at risk with NVRAM, but it seems like there would always be a little bit of data in-flight. A cost-conscious Solaris iSCSI target will put a quite large amount of data at risk between sync-cache commands. This is okay, just as it's okay for NFS servers, but only if all the initiators reboot whenever the target reboots.

Doing the client-side part of the double-caching is a little tricky because I think you really want to do it pretty high in the storage stack, maybe in ZFS rather than in the initiator, or else you will be triple-caching a TXG (twice on the client, once on the server), which can be pretty big. This means introducing the idea that a sync-cache command can fail, and that when it does, none/some/all of the writes between the last sync-cache that succeeded and the current one that failed may have been silently lost, even if those write commands were ack'd successful when they were issued.

* the best current practice for NFS mount type is 'hard,intr', meaning: retry forever if there is a failure. If you want to stop retrying, whatever app was doing the writing gets killed.
This rule means any database file that got ``intr'd'' will be crash-consistent. The SAN equivalent of 'intr' would be force-unmounting the filesystem (and force-unmounting implies either killing processes with open files or giving persistent errors to any open filehandles). I'm pretty sure no SAN stack does this intentionally whenever it's needed---rather it just sort of happens sometimes depending on how errors percolate upwards through various nested cargo-cult timeouts.

I guess it would be easy to add to a first order---just make SAN targets stay down forever after they bounce until ZFS marks them offline. The tricky part is the complaints you get after: ``how do I add this target back without rebooting?'', ``do I really have to resilver? It's happening daily so I'm basically always resilvering.'', ``we are going down twice a day because of harmless SAN glitches that we never noticed before---is this really necessary?'' I think I remember some post that made it sound like people were afraid to touch any of the storage exception handling because no one knows what cases are really captured by the many stupid levels of timeouts and retries.

In short, to me it sounds like the retry state machines of SAN initiators are broken by design, across the board. They make the same assumption they did for local storage: the only time data in a target write buffer will get lost is during a crash-reboot. This is wrong not only for SANs but also for hot-pluggable drives, which can have power sags that get wrongly treated the same way as CRC errors on the data cable. It's possible to get it right, like NFS is right, but instead the popular fix with most people is to leave the storage stack broken but make ZFS more resilient to this type of corruption, like other filesystems are, because resilience is good, and people are always twitchy and frightened and not expecting strictly consistent behavior around their SANs anyway, so the problem is rare.
So far SAN targets have been proprietary, so vendors are free to conceal this problem with protocol tweaks, expensive NVRAMs, and giving undefended or fuzzed advice through their support channels to their paranoid, accepting sysadmins. Whatever free and open targets behaved differently were assumed to be ``immature.'' Hopefully now that SANs are opening up, this SAN write hole will finally get plugged somehow, ...maybe with one of the two * points above, and if we were to pick the second * then we'd probably need some notion of a ``target boot cookie'' so we only take the 'intr'-like force-unmount path in the cases where it's really needed.

sj> Do we all agree that creating a zpool out of one device in a SAN environment is not recommended.

This is still a good question. The stock response is ``ZFS needs to manage at least one layer of '', but this problem (SAN target reboots while initiator does not) isn't unexplai
Re: [zfs-discuss] How to use ZFS on x4270
Hi,

On 12.10.2009 at 13:29, Richard Elling wrote:
> I've not implemented qmail, but it appears to be just an MTA. These do store-and-forward, so it is unlikely that they need to use sync calls. It will create a lot of files, but that is usually done async.

Async I/O for mail servers is a big no-go. I worked for Canbox, a large unified messaging provider, during the dotcom boom. My experience: you can afford to lose an index, because you can reconstruct it, but you aren't allowed to lose a single mail. And that would be the consequence of using async for the spool.

Regards
Joerg

-- Joerg Moellenkamp  Tel: (+49 40) 25 15 23 - 460
Principal Field Technologist  Fax: (+49 40) 25 15 23 - 425
Sun Microsystems GmbH  Mobile: (+49 172) 83 18 433
Nagelsweg 55  mailto:joerg.moellenk...@sun.com
D-20097 Hamburg  Website: http://www.sun.de
Blog: http://www.c0t0d0s0.org
Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028
Geschäftsführer: Thomas Schröder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Häring
Re: [zfs-discuss] use zpool directly w/o create zfs
Hua,

The behavior below is described here: http://docs.sun.com/app/docs/doc/819-5461/setup-1?a=view

The top-level /tank file system cannot be removed, so it is less flexible than using descendent datasets. If you want to create a snapshot or clone and later promote the /tank clone, then it is best to create separate ZFS file systems rather than using /tank.

Cindy

On 10/10/09 17:00, Hua wrote:
> I understand that usually a zfs filesystem needs to be created inside a zpool to store files/data. However, a quick test shows that I actually can put files directly inside a mounted zpool without creating any zfs filesystem. After zpool create -f tank c0d1, I actually can copy/delete any files in /tank. I can also create directories inside /tank. I haven't seen any documentation talking about such usage. Just wonder whether it is allowed, or is there any problem if I use a zpool this way?
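A short sketch of the difference Cindy describes (the device name is a placeholder). The pool's root dataset is usable directly, but only a descendent dataset can be snapshotted, cloned, and promoted independently:

```shell
# The pool's root dataset is mounted at /tank and works as-is...
zpool create tank c0d1

# ...but a descendent filesystem keeps your options open:
zfs create tank/data
zfs snapshot tank/data@before
zfs clone tank/data@before tank/data-test
zfs promote tank/data-test   # not possible with the root dataset itself
```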
Re: [zfs-discuss] How to use ZFS on x4270
Richard Elling wrote:
> On Oct 12, 2009, at 2:12 AM, tak ar wrote:
>>> I'm not aware of email services using sync regularly. In my experience with large email services, the response time of the disks used for database and indexes is the critical factor (for > 600 messages/sec delivered, caches don't matter :-) Performance of the disks for the mail messages themselves is not as critical.
>> I'm not using a database. I'm using qmail only. Doesn't sync matter?
> I've not implemented qmail, but it appears to be just an MTA. These do store-and-forward, so it is unlikely that they need to use sync calls. It will create a lot of files, but that is usually done async.

I can't speak for qmail, which I've never used, but MTAs should sync data to disk before acknowledging receipt, to ensure that in the event of an unexpected outage, no messages are lost. (Some of the MTA testing standards do permit message duplication on unexpected MTA outage, but never any loss, or at least didn't 10 years ago when I was working in this area.) An MTA is basically a transactional database, and (if properly written) the requirements on the underlying storage will be quite similar.

-- Andrew
Re: [zfs-discuss] How to use ZFS on x4270
On Oct 12, 2009, at 2:12 AM, tak ar wrote:
>> I'm not aware of email services using sync regularly. In my experience with large email services, the response time of the disks used for database and indexes is the critical factor (for > 600 messages/sec delivered, caches don't matter :-) Performance of the disks for the mail messages themselves is not as critical.
> I'm not using a database. I'm using qmail only. Doesn't sync matter?

I've not implemented qmail, but it appears to be just an MTA. These do store-and-forward, so it is unlikely that they need to use sync calls. It will create a lot of files, but that is usually done async.
-- richard
Re: [zfs-discuss] How to use ZFS on x4270
> I'm not aware of email services using sync regularly. In my experience with large email services, the response time of the disks used for database and indexes is the critical factor (for > 600 messages/sec delivered, caches don't matter :-) Performance of the disks for the mail messages themselves is not as critical.

I'm not using a database. I'm using qmail only. Doesn't sync matter?
Re: [zfs-discuss] kernel panic on zpool import
I have re-run zdb -l /dev/dsk/c9t4d0s0, as I should have the first time (thanks Nicolas). Attached output:

# zdb -l /dev/dsk/c9t4d0s0
LABEL 0
    version=14
    name='tank'
    state=0
    txg=119170
    pool_guid=15136317365944618902
    hostid=290968
    hostname='lexx'
    top_guid=1561201926038510280
    guid=11292568128772689834
    vdev_tree
        type='raidz'
        id=0
        guid=1561201926038510280
        nparity=1
        metaslab_array=23
        metaslab_shift=35
        ashift=9
        asize=4000766230528
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=11292568128772689834
                path='/dev/dsk/c9t4d0s0'
                devid='id1,s...@n50014ee2588170a5/a'
                phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a'
                whole_disk=1
        children[1]
                type='disk'
                id=1
                guid=10678319508898151547
                path='/dev/dsk/c9t5d0s0'
                devid='id1,s...@n50014ee2032b9b04/a'
                phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a'
                whole_disk=1
        children[2]
                type='disk'
                id=2
                guid=16523383997370950474
                path='/dev/dsk/c9t6d0s0'
                devid='id1,s...@n50014ee2032b9b75/a'
                phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@6,0:a'
                whole_disk=1
        children[3]
                type='disk'
                id=3
                guid=1710422830365926220
                path='/dev/dsk/c9t7d0s0'
                devid='id1,s...@n50014ee2add68f2c/a'
                phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@7,0:a'
                whole_disk=1
LABEL 1
    version=14
    name='tank'
    state=0
    txg=119170
    pool_guid=15136317365944618902
    hostid=290968
    hostname='lexx'
    top_guid=1561201926038510280
    guid=11292568128772689834
    vdev_tree
        type='raidz'
        id=0
        guid=1561201926038510280
        nparity=1
        metaslab_array=23
        metaslab_shift=35
        ashift=9
        asize=4000766230528
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=11292568128772689834
                path='/dev/dsk/c9t4d0s0'
                devid='id1,s...@n50014ee2588170a5/a'
                phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a'
                whole_disk=1
        children[1]
                type='disk'
                id=1
                guid=10678319508898151547
                path='/dev/dsk/c9t5d0s0'
                devid='id1,s...@n50014ee2032b9b04/a'
                phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a'
                whole_disk=1
        children[2]
                type='disk'
                id=2
                guid=16523383997370950474
                path='/dev/dsk/c9t6d0s0'
                devid='id1,s...@n50014ee2032b9b75/a'
                phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@6,0:a'
                whole_disk=1
        children[3]
                type='disk'
                id=3
                guid=1710422830365926220
                path='/dev/dsk/c9t7d0s0'
                devid='id1,s...@n50014ee2add68f2c/a'
                phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@7,0:a'
                whole_disk=1
LABEL 2
    version=14
    name='tank'
    state=0
    txg=119170
    pool_guid=15136317365944618902
    hostid=290968
    hostname='lexx'
    top_guid=1561201926038510280
    guid=11292568128772689834
    vdev_tree
        type='raidz'
        id=0
        guid=1561201926038510280
        nparity=1
        metaslab_array=23
        metaslab_shift=35
        ashift=9
        asize=4000766230528
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=11292568128772689834
                path='/dev/dsk/c9t4d0s0'
                devid='id1,s...@n50014ee2588170a5/a'
                phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a'
                whole_disk=1
        children[1]
                type='disk'
                id=1
                guid=10678319508898151547
                path='/dev/dsk/c9t5d0s0'
                devid='id1,s...@n50014ee2032b9b04/a'
                phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a'
                whole_disk=1
        children[2]
                type='disk'
                id=2
                guid=16523383997370950474
                path='/dev/dsk/c9t6d0s0'
                devid='id
Re: [zfs-discuss] kernel panic on zpool import
Hi Victor, I have tried to re-attach the detail from /var/adm/messages:

Oct 11 17:16:55 opensolaris unix: [ID 836849 kern.notice]
Oct 11 17:16:55 opensolaris ^Mpanic[cpu0]/thread=ff000b6f7c60:
Oct 11 17:16:55 opensolaris genunix: [ID 361072 kern.notice] zfs: freeing free segment (offset=3540185931776 size=22528)
Oct 11 17:16:55 opensolaris unix: [ID 10 kern.notice]
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f75f0 genunix:vcmn_err+2c ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f76e0 zfs:zfs_panic_recover+ae ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7770 zfs:space_map_remove+13c ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7820 zfs:space_map_load+260 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7860 zfs:metaslab_activate+64 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7920 zfs:metaslab_group_alloc+2b7 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7a00 zfs:metaslab_alloc_dva+295 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7aa0 zfs:metaslab_alloc+9b ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7ad0 zfs:zio_dva_allocate+3e ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b00 zfs:zio_execute+a0 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b60 zfs:zio_notify_parent+a6 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b90 zfs:zio_ready+188 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7bc0 zfs:zio_execute+a0 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7c40 genunix:taskq_thread+193 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7c50 unix:thread_start+8 ()
Oct 11 17:16:55 opensolaris unix: [ID 10 kern.notice]
Oct 11 17:16:55 opensolaris genunix: [ID 672855 kern.notice] syncing file systems...
Oct 11 17:16:55 opensolaris genunix: [ID 904073 kern.notice] done
Oct 11 17:16:56 opensolaris genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Oct 11 17:17:09 opensolaris genunix: [ID 409368 kern.notice] ^M100% done: 168706 pages dumped, compression ratio 3.58,
Oct 11 17:17:09 opensolaris genunix: [ID 851671 kern.notice] dump succeeded
Re: [zfs-discuss] How to use ZFS on x4270
> > Use the BBWC to maintain high IOPS when the X25-E's write cache is disabled?
>
> It should certainly help. Note that in this case your relatively small battery-backed memory is accepting writes for both the X25-E and for the disk storage, so the BBWC memory becomes 1/2 as useful and you are wasting some of the RAID card write performance.
>
> Some people here advocate putting as much battery-backed memory on the RAID card as possible (and with multiple RAID cards if possible) rather than using a slower slog SSD. Battery-backed RAM is faster than FLASH SSDs. The only FLASH SSDs which can keep up include their own battery-backed (or capacitor-backed) RAM.
>
> Regardless, if you can decouple your slog I/O path from the main I/O path, you should see less latency and more performance. This suggests that you should use a different controller for your X25-Es if you can.

OK, I will disable the X25-E's write cache. But I can't prepare a different controller because there is no budget.

> > In some report I have seen, write cache is necessary for wear-leveling. Should I switch off the X25-E's write cache?
>
> I don't know the answer to that. Intel does not seem to provide much detail. If you want your slog to protect as much data as possible when the system loses power, then it seems that you should disable the X25-E write cache since it is not protected. Expect a 5X reduction in write IOPS performance (e.g. 5000 --> 1000).

I think the data is more important than the performance, so I will disable the X25-E's write cache.

> > The server has a RAID card, so I can use the hardware (Adaptec) RAID (the file system is ZFS). Should I use ZFS for the RAID?
>
> Unless the Adaptec firmware is broken so that you can't usefully export the disks as "JBOD" devices, then I would use ZFS for the RAID.

OK, I will use ZFS for the RAID (including the boot disk).

> > I think IOPS are important for a mail server, so the ZIL is useful. The server has 48GB RAM and two (ZFS or hardware mirrored) X25-Es (32GB) for the ZIL (slog). I understand the ZIL needs half of RAM.
>
> There is a difference between synchronous IOPS and async "IOPS" since synchronous writes require that data be written right away while async I/O can be written later. Postponed writes are much more efficient.
>
> If the mail software invokes fsync(2) to flush a mail file to disk, then a synchronous write is required. However, there is still a difference between opening a file with the O_DSYNC option (all writes are synchronous) and using the fsync(2) call when the file write is complete (only pending unwritten data is synchronous).
>
> A lot depends on how your mail software operates. Some mail systems create a file for each mail message while others concatenate all of the messages for one user into one file.
>
> You may want to defer installing your X25-Es and evaluate performance of the mail system with a DTrace tool called 'zilstat', which is written by Richard Elling. This tool will tell you how much and what type of synchronous write traffic you have.
>
> It is currently difficult to remove slog devices, so it is safer to add them if you determine they will help rather than reduce performance.

I'm using qmail for the mail server on Linux now, and I will replace it with Solaris. I think qmail invokes fsync whenever the server receives mail messages. And the mail server is used to relay mail received from application servers. I think a slog device is useful.
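To make the last two suggestions above concrete, a sketch (device names are placeholders; zilstat is Richard Elling's DTrace script and must be fetched separately):

```shell
# First measure how much synchronous (ZIL) write traffic the mail
# load actually generates, e.g. one report per second:
./zilstat 1

# If the numbers justify a slog, add the two X25-Es as a mirrored
# log device. Note: at this point in time slog devices could not be
# removed from a pool, so add with care.
zpool add tank log mirror c2t0d0 c2t1d0

# The drive write cache can be toggled from format's expert mode:
format -e    # select the disk, then: cache -> write_cache -> disable
```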
Re: [zfs-discuss] kernel panic on zpool import
On 11.10.09 12:59, Darren Taylor wrote:

I have searched the forums and Google far and wide, but cannot find a fix for the issue I'm currently experiencing. Long story short - I'm now at a point where I cannot even import my zpool (zpool import -f tank) without causing a kernel panic. I'm running OpenSolaris snv_111b and the zpool is version 14. This is the panic from /var/adm/messages (full output attached):

Where is the full stack back trace? I do not see any attachment.

victor

genunix: [ID 361072 kern.notice] zfs: freeing free segment (offset=3540185931776 size=22528)

This is the output I get from zpool import:

# zpool import
  pool: tank
    id: 15136317365944618902
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        ONLINE
          raidz1    ONLINE
            c9t4d0  ONLINE
            c9t5d0  ONLINE
            c9t6d0  ONLINE
            c9t7d0  ONLINE
          raidz1    ONLINE
            c9t0d0  ONLINE
            c9t1d0  ONLINE
            c9t2d0  ONLINE
            c9t3d0  ONLINE

I tried pulling back some info via this zdb command, but I'm not sure if I'm on the right track here (as zpool import seems to see the zpool without issue). This result is similar for all drives:

# zdb -l /dev/dsk/c9t4d0
LABEL 0
failed to unpack label 0
LABEL 1
failed to unpack label 1
LABEL 2
failed to unpack label 2
LABEL 3
failed to unpack label 3

I can also complete zdb -e tank without issues – it lists all my snapshots and various objects without problem (this is still running on the machine at the moment).

I have put the following into /etc/system:

set zfs:zfs_recover=1
set aok=1

I've also tried mounting the zpool read-only with zpool import -f -o ro tank, but no luck. I don't know where to go next – am I meant to try and recover using an older txg? I would be extremely grateful to anyone who can offer advice on how to resolve this issue, as the pool contains irreplaceable photos.
Unfortunately I have not done any backups for a while, as I thought raidz would be my saviour. :( Please help!