Re: [zfs-discuss] previously mentioned J4000 released
Yes, but pricing that's so obviously disconnected from cost leads customers to feel they're being ripped off.
Re: [zfs-discuss] x4500 panic report.
Today we had another panic; at least it was during work time :) Just a shame the 999GB UFS takes 80+ mins to fsck. (Yes, it is mounted 'logging').

panic[cpu3]/thread=ff001e70dc80: free: freeing free block, dev:0xb60024, block:13144, ino:1737885, fs:/export/saba1

ff001e70d500 genunix:vcmn_err+28 ()
ff001e70d550 ufs:real_panic_v+f7 ()
ff001e70d5b0 ufs:ufs_fault_v+1d0 ()
ff001e70d6a0 ufs:ufs_fault+a0 ()
ff001e70d770 ufs:free+38f ()
ff001e70d830 ufs:indirtrunc+260 ()
ff001e70dab0 ufs:ufs_itrunc+738 ()
ff001e70db60 ufs:ufs_trans_itrunc+128 ()
ff001e70dbf0 ufs:ufs_delete+3b0 ()
ff001e70dc60 ufs:ufs_thread_delete+da ()
ff001e70dc70 unix:thread_start+8 ()

syncing file systems...
panic[cpu3]/thread=ff001e70dc80: panic sync timeout
dumping to /dev/dsk/c6t0d0s1, offset 65536, content: kernel

$c
vpanic()
vcmn_err+0x28(3, f783a128, ff001e70d678)
real_panic_v+0xf7(0, f783a128, ff001e70d678)
ufs_fault_v+0x1d0(ff04facf65c0, f783a128, ff001e70d678)
ufs_fault+0xa0()
free+0x38f(ff001e70d8d0, a6a7358, 2000, 89)
indirtrunc+0x260(ff001e70d8d0, a6a42b8, , 0, 89)
ufs_itrunc+0x738(ff0550b9fde0, 0, 81, fffec0594db0)
ufs_trans_itrunc+0x128(ff0550b9fde0, 0, 81, fffec0594db0)
ufs_delete+0x3b0(fffed20e2a00, ff0550b9fde0, 1)
ufs_thread_delete+0xda(64704840)
thread_start+8()

::panicinfo
     cpu      3
  thread      ff001e70dc80
 message      free: freeing free block, dev:0xb60024, block:13144, ino:1737885, fs:/export/saba1
     rdi      f783a128
     rsi      ff001e70d678
     rdx      f783a128
     rcx      ff001e70d678
      r8      f783a128
      r9      0
     rax      3
     rbx      0
     rbp      ff001e70d4d0
     r10      fffec3d40580
     r11      ff001e70dc80
     r12      f783a128
     r13      ff001e70d678
     r14      3
     r15      f783a128
  fsbase      0
  gsbase      fffec3d40580
      ds      4b
      es      4b
      fs      0
      gs      1c3
  trapno      0
     err      0
     rip      fb83c860
      cs      30
  rflags      246
     rsp      ff001e70d488
      ss      38
  gdt_hi      0
  gdt_lo      81ef
  idt_hi      0
  idt_lo      7fff
     ldt      0
    task      70
     cr0      8005003b
     cr2      fed0e010
     cr3      2c0
     cr4      6f8

Jorgen Lundman wrote:
> On Saturday the X4500 system paniced, and rebooted. For some reason the
> /export/saba1 UFS partition was corrupt, and needed fsck. This is why it
> did not come back online. /export/saba1 is mounted logging,noatime, so
> fsck should never (-ish) be needed.
>
> SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc
>
> /export/saba1 on /dev/zvol/dsk/zpool1/saba1 read/write/setuid/devices/intr/largefiles/logging/quota/xattr/noatime/onerror=panic/dev=2d80024 on Sat Jul 5 08:48:54 2008
>
> One possible related bug:
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4884138
>
> What would be the best solution? Go back to latest Solaris 10 and pass it
> on to Sun support, or find a patch for this problem?
>
> Panic dump follows:
>
> -rw-r--r-- 1 root root  2529300     Jul 5 08:48 unix.2
> -rw-r--r-- 1 root root  10133225472 Jul 5 09:10 vmcore.2
>
> # mdb unix.2 vmcore.2
> Loading modules: [ unix genunix specfs dtrace cpu.AuthenticAMD.15 uppc
> pcplusmp scsi_vhci ufs md ip hook neti sctp arp usba uhci s1394 qlc fctl
> nca lofs zfs random cpc crypto fcip fcp logindmux nsctl sdbc ptm sv ii
> sppp rdc nfs ]
>
> $c
> vpanic()
> vcmn_err+0x28(3, f783ade0, ff001e737aa8)
> real_panic_v+0xf7(0, f783ade0, ff001e737aa8)
> ufs_fault_v+0x1d0(fffed0bfb980, f783ade0, ff001e737aa8)
> ufs_fault+0xa0()
> dqput+0xce(1db26ef0)
> dqrele+0x48(1db26ef0)
> ufs_trans_dqrele+0x6f(1db26ef0)
> ufs_idle_free+0x16d(ff04f17b1e00)
> ufs_idle_some+0x152(3f60)
> ufs_thread_idle+0x1a1()
> thread_start+8()
>
> ::cpuinfo
> ID ADDR         FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD       PROC
>  0 fbc2fc10     1b     0    0  60   no    no t-0    ff001e737c80 sched
>  1 fffec3a0a000 1f     1    0  -1   no    no t-0    ff001e971c80 (idle)
>  2 fffec3a02ac0 1f     0    0  -1   no    no t-1    ff001e9dbc80 (idle)
>  3 fffec3d60580 1f     0    0  -1   no
Re: [zfs-discuss] X4540
Well, I'm not holding out much hope of Sun working with these suppliers any time soon. I asked Vmetro why they don't work with Sun, considering how well ZFS seems to fit with their products, and this was the reply I got:

> Micro Memory has a long history of working with Sun, and I worked at Sun
> for almost 10 years developing Solaris x86. We have tried to get various
> Sun Product Managers responsible for these servers (Thumper) to work with
> us on this and they have said no. We have tried to get Sun's integration
> group to work with us (where they would integrate upon customer request,
> charging the customer for integration and support), and they have also
> said no. They don't feel there is an adequate business case to justify it
> as all of the opportunities are so small.

This is an incredibly frustrating response for all the Sun customers who could have really benefited from these cards. Why develop the ability to move the ZIL to NVRAM devices, benchmark the Thumper on one of them, and then refuse to work with the manufacturer to offer the card to customers? I appreciate Sun are working on their own flash memory solutions, but surely it's to their benefit and ours to take advantage of technology already on the market, with years of tried and tested use behind it?
Re: [zfs-discuss] ZFS Mirroring - Scenario
Essentially yes, the entire pool dies. Think of each mirror as an individual disk: you've striped them together, so the pool goes offline if any mirror fails, and each mirror can only guard against one half of itself failing. If you want to guard against any two trays failing, you need some kind of dual-parity protection: either three-way mirrors or raid-z2. Given that you only have 8 LUNs, raid-z2 would seem to be the best option. If you really need to use mirroring for performance, is there any way you can split those trays to present two LUNs each? That gives you 16 LUNs in total, enough for five three-way mirror sets (using 3 LUNs each), plus one acting as a hot spare; see the sketch below.
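For illustration, here is roughly what those two layouts would look like. The c#t#d# device names are hypothetical placeholders for the actual LUNs; this is a sketch, not the poster's exact configuration:

  # raid-z2 across all eight LUNs: survives any two LUN/tray failures
  zpool create somepool raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                               c2t0d0 c2t1d0 c2t2d0 c2t3d0

  # five three-way mirrors from sixteen half-tray LUNs, plus a hot spare
  zpool create somepool \
      mirror c1t0d0 c1t1d0 c2t0d0 \
      mirror c1t2d0 c1t3d0 c2t1d0 \
      mirror c1t4d0 c1t5d0 c2t2d0 \
      mirror c1t6d0 c1t7d0 c2t3d0 \
      mirror c2t4d0 c2t5d0 c2t6d0 \
      spare c2t7d0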
Re: [zfs-discuss] please help with raid / failure / rebuild calculations
Without checking your math, I believe you may be confusing the risk of *any* data corruption with the risk of a total drive failure, but I do agree that the calculation should just be for the data on the drive, not the whole array. My feeling on this, from the various analyses I've read on the web, is that you're reasonably likely to find some corruption on a drive during a rebuild, but raid-6 protects you from this nicely. From memory, I think the stats were something like a 5% chance of an error on a 500GB drive, which would mean something like a 10% chance with your 1TB drives. That would tie in with your figures if you took out the multiplier for the whole raid's data: instead of a guaranteed failure, you've calculated around 1 in 10 odds. So, during any rebuild you've around a 1 in 10 chance of the rebuild encountering *some* corruption, but that's very likely going to be just a few bits of data, which can be easily recovered using raid-6, and the rest of the rebuild can carry on as normal. Of course there's always a risk of a second drive failing, which is why we have backups, but I believe that risk is minuscule in comparison, and it is also offset by the ability to regularly scrub your data, which helps to ensure that any problems with drives are caught early on. Early replacement of failing drives means it's far less likely that you'll ever have two fail together.
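As a rough sanity check on that 1-in-10 figure (my own arithmetic, assuming the commonly quoted spec of one unrecoverable read error per 10^14 bits for consumer drives):

  1 TB read during a rebuild  ~= 8 x 10^12 bits
  spec'd error rate           ~= 1 error per 10^14 bits
  expected errors per rebuild ~= 8 x 10^12 / 10^14 = 0.08

That works out to roughly an 8% chance of hitting at least one unrecoverable read while rebuilding a full 1TB drive, which is in the same ballpark as the figures above.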
Re: [zfs-discuss] ZFS/Install related question
Get a cheap 5th SATA drive to act as your boot drive, install Solaris on that, and then let ZFS use the whole of the remaining 4 drives. That gives you performance benefits, and it means it's very easy to recover if your boot drive fails - just re-install Solaris and zpool import the raid array. The raid data is stored on the drives themselves, so you can even take those 4 drives and fit them to another machine if you need the data quickly. ZFS doesn't even care what order the drives are attached in. To install Solaris, just boot from the DVD and follow the prompts. I managed that as a Windows admin with no Linux or Solaris experience, so you should be fine :-)
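A minimal sketch of that setup, assuming the four data drives show up as c1t0d0 through c1t3d0 (adjust to whatever format reports on your box):

  # on the original machine, after installing Solaris on the 5th drive
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0

  # to move the drives later: export, reattach them to the new machine,
  # then import - ZFS finds the pool by the labels on the disks
  zpool export tank
  zpool import tank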
Re: [zfs-discuss] Case study/recommended ZFS setup for home file server
It was posted in the CIFS forum a couple of days ago: http://www.opensolaris.org/jive/forum.jspa?forumID=214

Thread: "HEADS-UP: Please skip snv_93 if you use CIFS server": http://www.opensolaris.org/jive/thread.jspa?threadID=65996&tstart=0
Re: [zfs-discuss] Case study/recommended ZFS setup for home file server
On Thu, Jul 10, 2008 at 1:15 AM, Fajar A. Nugraha [EMAIL PROTECTED] wrote:
>> Another alternative is to use an IDE to Compact Flash adapter, and boot
>> off of flash.
> Just curious, what will that flash contain? e.g. will it be similar to
> linux's /boot, or will it contain the full solaris root? How do you
> manage redundancy (e.g. mirror) for that boot device?

4GB is enough to hold a minimal system install. /var will go to a file system on the raidz pool. ZFS mirroring can be used on boot devices for redundancy.

-B
--
Brandon High [EMAIL PROTECTED]
"The good is the enemy of the best." - Nietzsche
Re: [zfs-discuss] ZFS send/receive questions
Will Murnane wrote:
> On Thu, Jul 10, 2008 at 12:43, Glaser, David [EMAIL PROTECTED] wrote:
>> I guess what I was wondering if there was a direct method rather than
>> the overhead of ssh.
> On receiving machine: nc -l 12345 | zfs recv mypool/[EMAIL PROTECTED]
> and on sending machine: zfs send sourcepool/[EMAIL PROTECTED] | nc othermachine.umich.edu 12345
> You'll need to build your own netcat, but this is fairly simple.

Why ?

Pathname: /usr/bin/nc
Type: regular file
Expected mode: 0555
Expected owner: root
Expected group: bin
Expected file size (bytes): 31428
Expected sum(1) of contents: 5207
Expected last modification: Jun 16 05:58:18 2008
Referenced by the following packages: SUNWnetcat
Current status: installed

--
Darren J Moffat
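The list archive has mangled the snapshot names in the quoted commands into [EMAIL PROTECTED]; spelled out with hypothetical pool, filesystem, and snapshot names, the recipe looks like this:

  # on the receiving machine (mypool/fs and @today are placeholders)
  nc -l 12345 | zfs recv mypool/fs@today

  # on the sending machine
  zfs send sourcepool/fs@today | nc othermachine.umich.edu 12345

Note there is no encryption or authentication here, which is the trade-off for skipping the ssh overhead; use it only on a trusted network.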
Re: [zfs-discuss] X4540
On Jul 10, 2008, at 12:42, Tim wrote:
> It's the same reason you don't see HDS or EMC rushing to adjust the price
> of the SYM or USP-V based on Sun releasing the thumpers. No one ever got
> fired for buying EMC/HDS/NTAP

I know my company has corporate standards for various aspects of IT, and if someone purchases something outside of that (which is frowned upon) then you're on your own. If you open a service / trouble ticket for it, they'll just close it, saying "not supported".
Re: [zfs-discuss] Recovering an array on Mac
So, does anybody have an approach to recovering this filesystem? Is there a way to relabel the drives so that ZFS will recognize them, without losing the data?

Thanks,
Lee

On Jul 5, 2008, at 1:24 PM, Lee Fyock wrote:
> Hi--
>
> Here's the scoop, in probably too much detail:
>
> I'm a sucker for new filesystems and new tech in general. For you
> old-time Mac people, I installed Sequoia when it was first seeded, and
> had to reformat my drive several times as it grew to the final release. I
> flipped the journaled flag before I even knew what it meant. I installed
> the pre-Leopard ZFS seed and have been using it for, what, a year?
>
> So, I started with two 500 GB drives in a single pool, not mirrored. I
> bought a 1 TB drive and added it to the pool. I bought another 1 TB
> drive, and finally had enough storage (~1.5 TB) to mirror my disks and be
> all set for the foreseeable future.
>
> In order to migrate my data from a single pool of 500 GB + 500 GB + 1 TB
> to a mirrored 500GB/500GB + 1TB/1TB pool, I was planning on doing this:
>
> 1) Copy everything to the new 1 TB drive (slopping what wouldn't fit onto
>    another spare drive)
> 2) Upgrade to the latest ZFS for Mac release (117)
> 3) Destroy the existing pool
> 4) Create a pool with the two 500 GB drives
> 5) Copy everything from the new drive to the 500 GB x 2 pool
> 6) Create a mirrored pool with the two 1 TB drives
> 7) Copy everything from the 500 GB x 2 pool to the mirrored 1 TB pool
> 8) Destroy the 500 GB x 2 pool, create it as a 500GB/500GB mirrored pair
>    and add it to the 1TB/1TB pool
>
> During step 7, while I was at work, the power failed at home, apparently
> long enough to drain my UPS. When I rebooted my machine, both pools
> refused to mount: the 500+500 pool and the 1TB/1TB mirrored pool. Just
> about all my data is lost. This was my media server containing my DVD
> rips, so everything is recoverable in that I can re-rip 1+TB, but I'd
> rather not.
>
> diskutil list says this:
>
> /dev/disk1
>    #:  TYPE                    NAME  SIZE       IDENTIFIER
>    0:  FDisk_partition_scheme        *465.8 Gi  disk1
>    1:                                 465.8 Gi  disk1s1
> /dev/disk2
>    #:  TYPE                    NAME  SIZE       IDENTIFIER
>    0:  FDisk_partition_scheme        *465.8 Gi  disk2
>    1:                                 465.8 Gi  disk2s1
> /dev/disk3
>    #:  TYPE                    NAME  SIZE       IDENTIFIER
>    0:  FDisk_partition_scheme        *931.5 Gi  disk3
>    1:                                 931.5 Gi  disk3s1
> /dev/disk4
>    #:  TYPE                    NAME  SIZE       IDENTIFIER
>    0:  FDisk_partition_scheme        *931.5 Gi  disk4
>    1:                                 931.5 Gi  disk4s1
>
> During step 2, I created the pools using
>    zpool create media mirror /dev/disk3 /dev/disk4
> then zpool upgrade, since I got warnings that the filesystem version was
> out of date. Note that I created zpools referring to the entire disk, not
> just a slice. I had labelled the disks using
>    diskutil partitiondisk /dev/disk2 GPTFormat ZFS %noformat% 100%
> but now the disks indicate that they're FDisk_partition_scheme. Googling
> for FDisk_partition_scheme yields
> http://lists.macosforge.org/pipermail/zfs-discuss/2008-March/000240.html
> among other things, but no hint of where to go from here.
>
> zpool import -D reports no pools available to import.
>
> All of this is on a Mac Mini running Mac OS X 10.5.3, BTW. I own
> Parallels if using an OpenSolaris build would be of use.
>
> So, is the data recoverable?
>
> Thanks!
> Lee
Re: [zfs-discuss] please help with raid / failure / rebuild calculations
Hello relling,

Thanks for your comments. FWIW, I am building an actual hardware array, so even though I _may_ put ZFS on top of the hardware array's 22TB drive that the OS sees (I may not), I am focusing purely on the controller rebuild.

So, setting aside ZFS for the moment, am I still correct in my intuition that there is no way a _controller_ needs to touch a disk more times than there are bits on the entire disk, and that this calculation people are doing is faulty?

I will check out that blog - thanks.
[zfs-discuss] raid or mirror
I'm still confused. What is a -SAFE- way with two drives if you prepare for hardware failure? That is: one drive fails and the system does not go down because the other drive takes over. Do I need raid or mirror?

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
++ http://nagual.nl/ + SunOS sxce snv91 ++
Re: [zfs-discuss] raid or mirror
Hi Dick

You want mirroring. A Sun system with mirrored disks can be configured not to go down due to one disk failing. For this to be valid, you also need to make sure that the device used for swap is mirrored - you won't believe how many times I've seen this mistake being made. A sketch follows this message.

To be even MORE safe, you want the two disks to be on separate controllers, so that you can survive a controller failure too.

Note: technically, mirroring is RAID; to be specific, it is RAID level 1.

  _Johan

On Fri, Jul 11, 2008 at 2:37 PM, dick hoogendijk [EMAIL PROTECTED] wrote:
> I'm still confused. What is a -SAFE- way with two drives if you prepare
> for hardware failure? That is: one drive fails and the system does not go
> down because the other drive takes over. Do I need raid or mirror?

--
Any sufficiently advanced technology is indistinguishable from magic.
   Arthur C. Clarke

Afrikaanse Stap Website: http://www.bloukous.co.za
My blog: http://initialprogramload.blogspot.com
ICQ = 193944626, YahooIM = johan_hartzenberg, GoogleTalk = [EMAIL PROTECTED], AIM = JohanHartzenberg
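A minimal sketch of "everything mirrored, including swap" on a ZFS-root system like Dick's SXCE box. The device names are hypothetical, and on a real ZFS-root install the installer normally creates the root pool for you; this just shows the shape of the end result:

  # mirrored root pool across two disks on separate controllers
  zpool create rpool mirror c0t0d0s0 c1t0d0s0

  # swap as a zvol on the mirrored pool, so it inherits the redundancy
  zfs create -V 4g rpool/swap
  swap -a /dev/zvol/dsk/rpool/swap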
Re: [zfs-discuss] ZFS Mirroring - Scenario
Sorry, but I'm stuck at "6540". There are so many options in how you could practically configure these that there is no way to give a sensible answer to your question. But the most basic questions are:

Do the racks have power from separate PDUs?
Are they in physically remote locations?
Do your fabric switches have redundant power from separate PDUs?

Do you want mirroring here purely for performance reasons? Because these systems have so much internal redundancy that I can not see why you would want to mirror across them. Striping would give you better performance.

On Thu, Jul 10, 2008 at 11:01 PM, Robb Snavely [EMAIL PROTECTED] wrote:
> I have a scenario (tray failure) where I am trying to predict how zfs
> will behave, and am looking for some input. Coming from the world of svm,
> ZFS is WAY different ;)
>
> If we have 2 racks, containing 4 trays each, and 2 6540s that present 8D
> RAID-5 LUNs to the OS/zfs, and through zfs we set up a mirror config such
> that (I'm oversimplifying here, using the supposed lun numbers for ease
> of explanation):
>
> Rack 1 - Tray 1 = lun 0    Rack 2 - Tray 1 = lun 4
> Rack 1 - Tray 2 = lun 1    Rack 2 - Tray 2 = lun 5
> Rack 1 - Tray 3 = lun 2    Rack 2 - Tray 3 = lun 6
> Rack 1 - Tray 4 = lun 3    Rack 2 - Tray 4 = lun 7
>
> so the zpool command would be:
>
> zpool create somepool mirror 0 4 mirror 1 5 mirror 2 6 mirror 3 7
>
> so a status output would look similar to:
>
> somepool
>   mirror
>     0
>     4
>   mirror
>     1
>     5
>   mirror
>     2
>     6
>   mirror
>     3
>     7
>
> Now in the VERY unlikely event that we lost the first tray in each rack,
> which contain 0 and 4 respectively...
>
> somepool
>   mirror   <--- Bye Bye
>     0
>     4
>   mirror
>     1
>     5
>   ...
>
> Would the entire somepool zpool die? Would it affect ALL users in this
> pool or a portion of the users? Is there a way in zfs to be able to tell
> what individual users are hosed (my group is a bunch of control freaks
> ;)? How would zfs react to something like this?
>
> Also any feedback on a better way to do this is more than welcome. Please
> keep in mind I am a ZFS noob, so detailed explanations would be awesome.
>
> Thanks in advance
>
> Robb

--
Any sufficiently advanced technology is indistinguishable from magic.
   Arthur C. Clarke

Afrikaanse Stap Website: http://www.bloukous.co.za
My blog: http://initialprogramload.blogspot.com
ICQ = 193944626, YahooIM = johan_hartzenberg, GoogleTalk = [EMAIL PROTECTED], AIM = JohanHartzenberg
Re: [zfs-discuss] X4540
Bob Friesenhahn wrote:
> I expect that Sun is realizing that it is already undercutting much of
> the rest of its product line. These minor updates would allow the X4540
> to compete against much more expensive StorageTek SAN hardware.

Assuming, of course, that the requirements for the more expensive SAN hardware don't include, for example, surviving a controller or motherboard failure (or gracefully surviving a RAM chip failure) without requiring extensive downtime for replacement, or other extended downtime because there's only one set of chips that can talk to those disks. Real SAN storage is dual-ported to dual controller nodes so that you can replace a motherboard without taking down access to the disks. Or install a new OS version without waiting for the system to POST.

> How can other products remain profitable when competing against such a
> star performer?

Features. RAS. Simplicity. Corporate inertia (having storage admins who don't know OpenSolaris). Executive outings with StorageTek-logo'd golf balls. The last 2 aren't something I'd build a business case around, but they're a reality.

--Joe
Re: [zfs-discuss] X4540
On Fri, Jul 11, 2008 at 9:25 AM, Moore, Joe [EMAIL PROTECTED] wrote:
> Features. RAS. Simplicity. Corporate inertia (having storage admins who
> don't know OpenSolaris). Executive outings with StorageTek-logo'd golf
> balls. The last 2 aren't something I'd build a business case around, but
> they're a reality.

Why not? There are several in the market today whom I suspect have done just that :D I won't name names, but for anyone in the industry I doubt I have to.

--Tim
Re: [zfs-discuss] ZFS send/receive questions
On Fri, Jul 11, 2008 at 05:23, Darren J Moffat [EMAIL PROTECTED] wrote:
> Why ?
> Referenced by the following packages: SUNWnetcat

Is this in 10u5? Weird, it's not on my media.

Will
Re: [zfs-discuss] ZFS problem mirror
Hi, thanks for your help. In the forum I got an answer as well, which I'm going to try, but your suggestion is also an angle I will investigate. Is there maybe some diagnostic tool in OpenSolaris I can use, or should I use the Solaris bootable CD that inspects whether my hardware is fully compatible? Thanks!
Re: [zfs-discuss] ZFS send/receive questions
Will Murnane wrote:
> On Fri, Jul 11, 2008 at 05:23, Darren J Moffat [EMAIL PROTECTED] wrote:
>> Why ?
>> Referenced by the following packages: SUNWnetcat
> Is this in 10u5? Weird, it's not on my media.

No, but this is an opensolaris.org alias, not a Solaris 10 support forum. So the assumption, unless people say otherwise, is that you are running a recent build of SX:CE or OpenSolaris 2008.05 (including updates).

--
Darren J Moffat
Re: [zfs-discuss] ZFS Mirroring - Scenario
On Fri, 11 Jul 2008, Ross wrote:
> If you want to guard against any two trays failing, you need some kind of
> dual-parity protection: either three-way mirrors or raid-z2. Given that
> you only have 8 LUNs, raid-z2 would seem to be the best option.

System reliability will be dominated by the reliability of the weakest VDEV. If all the VDEVs have the same reliability, then the reliability of the entire load-shared pool will be the reliability of one VDEV divided by the number of VDEVs. Given sufficient individual VDEV reliability, it can be seen that it takes quite a lot of VDEVs in the load-shared pool before the number of VDEVs becomes very significant in the pool reliability calculation.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
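To put rough numbers on that (my own illustration, not Bob's): the pool fails if any one vdev fails, so with independent, identical vdevs the pool's mean time between failures is one vdev's divided by N. For example:

  P(one vdev survives a year) = 0.999   (0.1% annual failure probability)
  P(pool of 8 survives)       = 0.999^8 ~= 0.992  (about 0.8% annual
                                                   failure probability)

So going from 1 vdev to 8 multiplies the annual failure probability by roughly 8 - noticeable, but still small as long as each vdev is individually very reliable.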
Re: [zfs-discuss] ZFS send/receive questions
On Fri, Jul 11, 2008 at 11:44, Darren J Moffat [EMAIL PROTECTED] wrote:
> No, but this is an opensolaris.org alias, not a Solaris 10 support forum.
> So the assumption, unless people say otherwise, is that you are running a
> recent build of SX:CE or OpenSolaris 2008.05 (including updates).

Luckily, the OP mentioned he's running 10u5 in his first post ;)

Will
[zfs-discuss] Recovering corrupted root pool
Yesterday evening, I tried Live Upgrade from SX:CE 90 to SX:CE 93 on a Sun Fire V60x with ZFS root (a mirrored root pool called "root"). The LU itself ran without problems, but before rebooting the machine, I wanted to add some space to the root pool that had previously been in use for a UFS BE. Both disks (c0t0d0 and c0t1d0) were partitioned as follows:

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       1 - 18810      25.91GB    (18810/0/0) 54342090
  1 unassigned    wm   18811 - 24618       8.00GB    (5808/0/0)  16779312
  2     backup    wm       0 - 24618      33.91GB    (24619/0/0) 71124291
  3 unassigned    wu       0               0         (0/0/0)            0
  4 unassigned    wu       0               0         (0/0/0)            0
  5 unassigned    wu       0               0         (0/0/0)            0
  6 unassigned    wu       0               0         (0/0/0)            0
  7 unassigned    wu       0               0         (0/0/0)            0
  8       boot    wu       0 - 0           1.41MB    (1/0/0)         2889
  9 unassigned    wu       0               0         (0/0/0)            0

Slice 0 is used by the root pool; slice 1 was used by the UFS BE. To achieve this, I ludeleted the now unused UFS BE and used

  # NOINUSE_CHECK=1 format

to extend slice 0 by the size of slice 1, deleting the latter afterwards. I'm pretty sure that I've done this successfully before, even on a live system, but this time something went wrong: I remember an FMA message about one side of the root pool mirror being broken (something about an inconsistent label; unfortunately I didn't write down the exact message). Nonetheless, I rebooted the machine after luactivate sol_nv_93 (the new ZFS BE), but the machine didn't come up:

SunOS Release 5.11 Version snv_93 32-bit
Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
NOTICE: spa_import_rootpool: error 22

panic[cpu0]/thread=fec1cfe0: cannot mount root path /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a
/[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a

fec351ac genunix:rootconf+10b (c0f040, 1, fec1c750)
fec351d0 genunix:vfs_mountroot+54 (fe800010, fec30fd8,)
fec351e4 genunix:main+b4 ()

panic: entering debugger (no dump device, continue to reboot)
skipping system dump - no dump device configured
rebooting...

I've managed a failsafe boot (from the same pool), and zpool import reveals:

  pool: root
    id: 14475053522795106129
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        root          UNAVAIL  insufficient replicas
          mirror      UNAVAIL  corrupted data
            c0t1d0s0  ONLINE
            c0t0d0s0  ONLINE

Even restoring slice 1 on both disks to its old size and shrinking slice 0 accordingly doesn't help. I'm sure I've done this correctly, since I could boot from the old sol_nv_b90_ufs BE, which was still on c0t0d0s1. I didn't have much success finding out what's going on here: I tried to remove either of the disks in case both sides of the mirror are inconsistent, but to no avail. I didn't have much luck with zdb either.
Here's the output of zdb -l /dev/rdsk/c0t0d0s0 and /dev/rdsk/c0t1d0s0:

c0t0d0s0:

LABEL 0

    version=10
    name='root'
    state=0
    txg=14643945
    pool_guid=14475053522795106129
    hostid=336880771
    hostname='erebus'
    top_guid=17627503873514720747
    guid=6121143629633742955
    vdev_tree
        type='mirror'
        id=0
        guid=17627503873514720747
        whole_disk=0
        metaslab_array=13
        metaslab_shift=28
        ashift=9
        asize=36409180160
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=1526746004928780410
            path='/dev/dsk/c0t1d0s0'
            devid='id1,[EMAIL PROTECTED]/a'
            phys_path='/[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a'
            whole_disk=0
            DTL=160
        children[1]
            type='disk'
            id=1
            guid=6121143629633742955
            path='/dev/dsk/c0t0d0s0'
            devid='id1,[EMAIL PROTECTED]/a'
            phys_path='/[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a'
            whole_disk=0
            DTL=272

LABEL 1

    version=10
    name='root'
    state=0
Re: [zfs-discuss] please help with raid / failure / rebuild calculations
User Name wrote:
> Hello relling,
> Thanks for your comments. FWIW, I am building an actual hardware array,
> so even though I _may_ put ZFS on top of the hardware array's 22TB drive
> that the OS sees (I may not), I am focusing purely on the controller
> rebuild. So, setting aside ZFS for the moment, am I still correct in my
> intuition that there is no way a _controller_ needs to touch a disk more
> times than there are bits on the entire disk, and that this calculation
> people are doing is faulty?

I think the calculation is correct, at least for the general case. At FAST this year there was an interesting paper which tried to measure this exposure in a large field sample by using checksum verifications. I like this paper and it validates what we see in the field -- the most common failure mode is the unrecoverable read.
http://www.usenix.org/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf

I should also point out that ZFS is already designed to offer some diversity which should help guard against spatially clustered media failures. hmmm... another blog topic in my queue...
-- richard
Re: [zfs-discuss] ZFS problem mirror
There's nothing I know of, I'm afraid; I'm too new to Solaris to have looked into things that deeply. If you have access to any spare parts, the easiest way to test is to swap things over and see if the problem is reproducible. It could even be something as simple as a struggling power supply. Running a compatibility check does sound like a good first step though.
[zfs-discuss] Largest (in number of files) ZFS instance tested
I need to find out what is the largest ZFS file system - in number of files, NOT capacity - that has been tested. Looking to scale to billions of files, and I would like to know if anyone has tested anything close and what the performance ramifications are.

Has anyone tested a ZFS file system with at least 100 million+ files? What were the performance characteristics?

Thanks!

Sean

--
http://www.sun.com
* Sean Cochrane * Global Storage Architect * Sun Microsystems, Inc. *
525 South 1100 East
Salt Lake City, UT 84102 US
Phone +1 877 255 5756
Mobile +1 801 949 4799
Fax +1 877 255 5756
Email [EMAIL PROTECTED]
Re: [zfs-discuss] Largest (in number of files) ZFS instance tested
On Fri, 11 Jul 2008, Sean Cochrane - Storage Architect wrote:
> I need to find out what is the largest ZFS file system - in number of
> files, NOT capacity - that has been tested. Looking to scale to billions
> of files, and I would like to know if anyone has tested anything close
> and what the performance ramifications are.

Wow. Just curious, what sort of application is this?

--
Rich Teer, SCSA, SCNA, SCSECA
CEO, My Online Home Inventory
URLs: http://www.rite-group.com/rich
      http://www.linkedin.com/in/richteer
      http://www.myonlinehomeinventory.com
Re: [zfs-discuss] previously mentioned J4000 released
"bf" == Bob Friesenhahn [EMAIL PROTECTED] writes:

    bf> since the dawn of time

Since the dawn of time Sun has been playing these games with hard drive ``sleds''. I still have sparc32 stuff on the shelf with missing/extra sleds.

    bf> POTS line
    bf> cell phone
    bf> You are free to select products from a different vendor.

what? So, this means he *shouldn't* feel like he's being ripped off if he buys from Sun? *blinks*

    bf> Sun's pricing likely reflects the high cost of product
    bf> development, warranty, service, and quality control.

You are talking about cost here, but the pricing reflects ``market forces''. The blog makes it sound like Sun engineers have come up with this sneaky plan to achieve a certain tier of reliability at a tier below in cost, but what they really mean by low cost is low cost _to Sun_, not to customers. The price you pay is determined by what other vendors charge for the same tier of reliability---knowing this, while reading the blog you would already be thinking, ``oh fantastic, a tiny ~$10 chip and a plastic carrier that's practically free, but has incredible market value. They've come up with a scheme for ripping me off. What smooth and adept capitalists they are! What merit, what admiration I have for their schemes! Too bad it helps them, not me.''

If you're a stockholder, get excited about the blogs, but for customers, without Sun's price list and their competitors' price lists in front of you, there's apparently not much point in discussing anything (except maybe whether we can swap drives out of the tray and have the thing still work, or whether there is some ``sled DRM'' in the closed-source LSI Logic SATA driver, and how much we save by not buying a support contract, which I assume is pointless after said swapping).
Re: [zfs-discuss] raid or mirror
"jh" == Johan Hartzenberg [EMAIL PROTECTED] writes:

    jh> To be even MORE safe, you want the two disks to be on separate
    jh> controllers, so that you can survive a controller failure too.

or a controller-driver failure. At least on Linux, when a disk goes bad, Linux starts resetting controllers and xATA busses and stuff, and often takes out any nearby drives. It's often hard to determine which drive is actually bad. Depending on how well-integrated your hardware is with Solaris and how the drive fails, I suspect this sort of thing could imaginably happen there, too.
Re: [zfs-discuss] please help with raid / failure / rebuild calculations
> Thanks for your comments. FWIW, I am building an actual hardware array,
> so even though I _may_ put ZFS on top of the hardware array's 22TB drive
> that the OS sees (I may not), I am focusing purely on the controller
> rebuild.

Not letting ZFS handle (at least one level of) redundancy is a bad idea. Don't do that!
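For example (my sketch, not the poster's actual setup): rather than handing ZFS one big 22TB LUN, export two LUNs from the hardware array and let ZFS mirror them, so that ZFS can self-heal checksum errors instead of merely detecting them:

  # two hardware-RAID LUNs (hypothetical device names), mirrored by ZFS
  zpool create tank mirror c2t0d0 c3t0d0

Even a single-LUN pool can at least repair some damage on its own if you store extra block copies:

  zfs set copies=2 tank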
Re: [zfs-discuss] ZFS problem mirror
Hi

I too strongly suspect that some HW component is failing. It is rare to see all drives (in your case both drives in the mirror, plus the boot drive) reporting errors at the same time. zfs clear just resets the error counters; you have still got errors in there.

Start with the following components (in this order):

1. Memory: use memtest86+ (on any live CD - it is very common)
2. Power supply - search the forums; it is a very common culprit
3. Your mobo/disk controller - (??? try another one maybe)

Have you also experienced any kernel panics or strange random software crashes on this box?
Re: [zfs-discuss] ZFS Mirroring - Scenario
Thank you for all the feedback! It's appreciated!

@hartz

> Do the racks have power from separate PDUs?
Yes

> Are they in physically remote locations?
No, the racks are side by side

> Do your fabric switches have redundant power from separate PDUs?
Yes

> Do you want mirroring here purely for performance reasons?
Goals would be data integrity and REDUNDANCY - while not throwing performance completely out of the window.

> Because these systems have so much internal redundancy that I can not see
> why you would want to mirror across them.
This is due to the fact that we have data that we just can't afford to lose. Our hardware/power setup is pretty good and we have good backups, but we want to make sure all of our customers' data is protected from every angle... and as I said in the initial post, I know this scenario is possible but not probable, especially with the redundant power etc. So in short, mirroring was put in there to account for one of the racks failing (again, unlikely in our setup... hope for the best, prepare for the worst).

> Striping would give you better performance.
So how would this be set up in ZFS?

zpool create somepool 0 1 2 3 4 5 6 7

So essentially a raid 0 on the zfs side, and leave all the redundancy on the hardware? *shakes nervously* Am I understanding that correctly?
Re: [zfs-discuss] Recovering an array on Mac
This shouldn't have happened. Do you have zdb on the Mac? If yes, you can try it. It is (intentionally?) undocumented, so you'll need to search for various scripts on blogs.sun.com and here. Something might just work. But do check what Apple is actually shipping. You may want to use dtrace to find out why it can't find any pools. I doubt it is due to a labelling mistake, as that should have been flushed long back if you were copying data when you lost power. ZFS's transactional property guarantees that.
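A starting point, assuming Apple's port ships zdb with the usual options: dump the vdev labels straight off each device and see whether the pool name and GUIDs are still there (the device name is hypothetical - use whatever diskutil list reports):

  # print the ZFS labels on a suspect disk
  zdb -l /dev/disk3s1

If the labels show up intact, the problem is more likely the partition-map type (FDisk vs. GPT) than lost ZFS data.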
Re: [zfs-discuss] ZFS problem mirror
Hi, I'm running all kinds of tools now, even a tool for my HD from WD, so we will see what the results are. I ordered another mobo this morning, and if that doesn't work then I will ask a fellow sysop to put my disks in his Solaris array. No, I didn't notice any kernel panics; the only thing I noticed was this line popping up when I did a shutdown:

gzip: kernel/misc/qlc/qlc_fw_2400: I/O error

The machine itself is just used as a storage array, nothing else is running on it, and I use CIFS to share, which works great. Keep you posted - thanks for everything already :)
Re: [zfs-discuss] ZFS problem mirror
Trying the disks in another machine is a great step; it will eliminate those quickly. Use your own cables too so you can eliminate them from suspicion. If this is hardware related, from my own experience I would say it's most likely to be (in order):

- Power supply
- Memory (especially if ever handled without anti-static precautions)
- Bad driver / disk controller
- Bad CPU / motherboard
- Other component

When you get your new board, just set it up for troubleshooting with the bare minimum components:

- Power supply
- Motherboard
- CPU
- Memory
- Disks
- Power button

Don't even connect the reset switch or the case LEDs. It's by far the quickest way to eliminate items from suspicion.
Re: [zfs-discuss] Largest (in number of files) ZFS instance tested
On Fri, 11 Jul 2008, Sean Cochrane - Storage Architect wrote:
> I need to find out what is the largest ZFS file system - in number of
> files, NOT capacity - that has been tested.

In response to an earlier such question (from you?) I created a directory with a million files. I forgot about it, so the million files have been sitting there without impacting anything for a month now. The same simple script (with a small enhancement; see the sketch below) could be used to create a million directories containing a million files each, but it might take a while to complete. It seems that a Storage Architect should be willing to test this for himself and see what happens.

> Looking to scale to billions of files, and I would like to know if anyone
> has tested anything close and what the performance ramifications are.

There are definitely issues with programs like 'ls' when listing a directory with a million files, since 'ls' sorts its output by default. My Windows system didn't like it at all when accessing it over CIFS with the file browser, since it wants to obtain all file information before doing anything else. System backup with hundreds of millions of files sounds like fun.

> Has anyone tested a ZFS file system with at least 100 million+ files?
> What were the performance characteristics?

I think that there are more issues with file fragmentation over a long period of time than with the sheer number of files.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
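The script itself isn't in the archive; a minimal sketch of the sort of thing Bob describes (one flat directory, a million empty files - slow, since it forks touch a million times, but good enough for a test) might look like:

  #!/bin/sh
  # create a million empty files in a single directory
  mkdir -p /tank/milliontest
  i=0
  while [ $i -lt 1000 ]; do
      j=0
      while [ $j -lt 1000 ]; do
          touch /tank/milliontest/f.$i.$j
          j=`expr $j + 1`
      done
      i=`expr $i + 1`
  done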
Re: [zfs-discuss] please help with raid / failure / rebuild calculations
On Fri, 11 Jul 2008, Akhilesh Mritunjai wrote:
> Not letting ZFS handle (at least one level of) redundancy is a bad idea.
> Don't do that!

Agreed. A further issue to consider is mean time to recover/restore. This has quite a lot to do with actual uptime. For example, if you decide to create two huge 22TB LUNs and mirror across them, and ZFS needs to resilver one of the LUNs, it will take a *long* time. A good design will try to keep any storage area which needs to be resilvered small enough that it may be restored quickly and the risk of secondary failure is minimized.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] previously mentioned J4000 released
Will Murnane wrote:
> If the prices on disks were lower on these, they would be interesting for
> low-end businesses or even high-end home users. The chassis is within
> reach of reasonable, but the disk prices look ludicrously high from where
> I sit. An empty one only costs $3k, sure, but fill it with twelve disks
> and it's up to $20k. Are there some extra electronics required for larger
> disks that help explain this steep slope of cost?

I can't think of any reasons off the top of my head (other than the understandable profit motive). I guess most large customers only compare storage costs against other storage vendors. Most shops I've worked with only buy fully populated shelves, and none of them pay list!

Ian
Re: [zfs-discuss] previously mentioned J4000 released
The admin user doesn't have any access to customer data; he could just kill off sessions, etc.

---
World-class email, DNS, web- and app-hosting services, www.concentric.com

On Jul 11, 2008, at 2:05 PM, Ian Collins wrote:
> I can't think of any reasons off the top of my head (other than the
> understandable profit motive). I guess most large customers only compare
> storage costs against other storage vendors. Most shops I've worked with
> only buy fully populated shelves, and none of them pay list!
Re: [zfs-discuss] X4540
Richard Elling wrote:
> The best news, for many folks, is that you can boot from an (externally
> pluggable) CF card, so that you don't have to burn two disks for the OS.

Can these be mirrored? I've been bitten by these cards failing (in a camera).

Ian
Re: [zfs-discuss] previously mentioned J4000 released
On Fri, 11 Jul 2008, Tim wrote:
> 20k list gets you into a decked out storevault with FCP/iSCSI/NFS... For
> being just a jbod this thing is ridiculously overpriced, sorry. I'm
> normally the first one to defend Sun when it comes to decisions made due
> to an enterprise customer base, but this will not be one of those
> situations.

You are not required to purchase a Sun product. Just purchase a similar IBM or Adaptec JBOD product; they will work fine with ZFS. If Sun's product is over-priced, they will find out soon enough and adjust their prices. It may be that Sun initially sets the prices very high so that after they start shipping they can reduce the price and advertise the new bargain.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] previously mentioned J4000 released
Tim wrote:
> On Fri, Jul 11, 2008 at 4:05 PM, Ian Collins [EMAIL PROTECTED] wrote:
>> I can't think of any reasons off the top of my head (other than the
>> understandable profit motive). I guess most large customers only compare
>> storage costs against other storage vendors. Most shops I've worked with
>> only buy fully populated shelves, and none of them pay list!
> 20k list gets you into a decked out storevault with FCP/iSCSI/NFS... For
> being just a jbod this thing is ridiculously overpriced, sorry. I'm
> normally the first one to defend Sun when it comes to decisions made due
> to an enterprise customer base, but this will not be one of those
> situations.

OK, one client of mine has just installed an IBM DS3200 shelf. Pop over to IBM's site (http://www-03.ibm.com/systems/storage/disk/ds3000/ds3200/browse.html) and compare prices with a J4200. For starters, the IBM-sourced 1TB drives are $249 more...

Ian
Re: [zfs-discuss] X4540
Ian Collins wrote:
> Richard Elling wrote:
>> The best news, for many folks, is that you can boot from an (externally
>> pluggable) CF card, so that you don't have to burn two disks for the OS.
> Can these be mirrored? I've been bitten by these cards failing (in a
> camera).

Yes, of course, but there is only one CF slot. If you are worried about data loss, zfs set copies=2. If you are worried about CF loss, mirror to something else.
-- richard
Re: [zfs-discuss] Largest (in number of files) ZFS instance tested
On Jul 11, 2008, at 4:59 PM, Bob Friesenhahn wrote:
>> Has anyone tested a ZFS file system with at least 100 million+ files?
>> What were the performance characteristics?
> I think that there are more issues with file fragmentation over a long
> period of time than with the sheer number of files.

actually it's a similar problem .. with a maximum blocksize of 128KB and the COW nature of the filesystem, you get indirect block pointers pretty quickly on a large ZFS filesystem as the size of your tree grows .. in this case a large, constantly modified file (eg: /u01/data/*.dbf) is going to behave over time like a lot of random access to files spread across the filesystem .. the only real difference is that you won't walk it every time someone does a getdirent() or an lstat64()

so ultimately the question could be framed as: what's the maximum manageable tree size you can get to with ZFS, keeping in mind that there's no real re-layout tool (by design) .. the number i'm working with until i hear otherwise is probably about 20M, but in the relativistic sense - it *really* does depend on how balanced your tree is and what your churn rate is .. we know on QFS we can go up to 100M, but i trust the tree layout a little better there, can separate the metadata out if i need to (and have planned on it), and know that we've got some tools to relayout the metadata or dump/restore for a tape-backed archive

jonathan

(oh and btw - i believe this question is a query for field data .. architect != crash test dummy .. but some days it does feel like it)
Re: [zfs-discuss] Largest (in number of files) ZFS instance tested
On Fri, Jul 11, 2008 at 5:33 PM, Sean Cochrane - Storage Architect [EMAIL PROTECTED] wrote:
> I need to find out what is the largest ZFS file system - in number of
> files, NOT capacity - that has been tested. Looking to scale to billions
> of files, and I would like to know if anyone has tested anything close
> and what the performance ramifications are. Has anyone tested a ZFS file
> system with at least 100 million+ files?

I've got a thumper with a pool that has over a hundred million files. I think the most in a single filesystem is currently just under 30 million (we've got plenty of those). It just works, although it's going to get a lot bigger before we're done.

> What were the performance characteristics?

Not brilliant... although I suspect raid-z isn't exactly the ideal choice. Still, performance generally is adequate for our needs, although backup performance isn't. (The backup problem is the real stumbling block, and backup is an area ripe for disruptive innovation.)

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Re: [zfs-discuss] Largest (in number of files) ZFS instance tested
Peter Tribble wrote:
> Not brilliant... although I suspect raid-z isn't exactly the ideal
> choice. Still, performance generally is adequate for our needs, although
> backup performance isn't. (The backup problem is the real stumbling
> block, and backup is an area ripe for disruptive innovation.)

Is it down to the volume of data, or to many small files? I'm looking into a problem with slow backup of a filesystem with many thousands of small files. We see high CPU load and miserable performance on restores, and I've been wondering if we can tune the filesystem, or should just zip the files. I guess working with many small files and tape is more of an issue for filesystem-aware backups than for block-device ones (ufsdump).

Ian.
Re: [zfs-discuss] Largest (in number of files) ZFS instance tested
On Fri, Jul 11, 2008 at 3:59 PM, Bob Friesenhahn [EMAIL PROTECTED] wrote:
> There are definitely issues with programs like 'ls' when listing a
> directory with a million files, since 'ls' sorts its output by default.
> My Windows system didn't like it at all when accessing it over CIFS with
> the file browser, since it wants to obtain all file information before
> doing anything else. System backup with hundreds of millions of files
> sounds like fun.

Millions of files in a directory has historically been the path to big performance problems. Even if zfs can handle millions, other tools (ls, backup programs, etc.) will choke. Create a hierarchy and you will be much happier; see the sketch below.

FWIW, I created 10+ million files and the necessary directories to make it so that no directory had more than 10 entries (dirs or files) in it. I found the creation time to be quite steady, at about 2500 file/directory creations per second over the entire exercise. I saw the kernel memory usage (kstat -p unix:0:system_pages:pp_kernel) slowly and steadily increase while arc_c slowly decreased. Out of curiosity I crashed the box, then ran ::findleaks to find that there was just over 32KB leaked. I've not dug in further to see where the rest of the memory was used.

In the past, when I was observing file creations on UFS, VxFS, and NFS with millions of files in a single directory, the file operation time was measured in seconds per operation rather than operations per second. This was with several (about 100) processes contending for reading directory contents, file creations, and file deletions. This is where I found the script that thought that touch $dir/test.$$ (followed by rm) was the right way to check whether a directory is writable.

--
Mike Gerdts
http://mgerdts.blogspot.com/
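A sketch of the fan-out approach Mike describes (my own, not his script; paths are hypothetical). Each digit of a zero-padded counter picks one level of a 10-way tree, so no directory ever holds more than 10 entries. The expr loop makes this very slow for 10 million files, but it shows the layout:

  #!/bin/sh
  # lay out files as <root>/d/d/d/d/d/d/fNNNNNNN: the first six digits
  # become directories, the last digit varies within each leaf directory
  root=/tank/manyfiles
  n=0
  while [ $n -lt 10000000 ]; do
      d=`printf '%07d' $n`
      dir=$root/`echo $d | sed -e 's,\(.\)\(.\)\(.\)\(.\)\(.\)\(.\).,\1/\2/\3/\4/\5/\6,'`
      mkdir -p $dir
      touch $dir/f$d
      n=`expr $n + 1`
  done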
Re: [zfs-discuss] scrub failing to initialise
If the cabling outage was transient, the disk driver would simply retry until the disks came back. If it's a hotplug-capable bus and the disks were flagged as missing, ZFS would by default wait until the disks came back (see zpool get failmode <pool>), and complete the I/O then. There would be no missing disk writes, hence nothing to resilver.

Jeff

On Mon, Jul 07, 2008 at 06:55:02PM +0200, Justin Vassallo wrote:
> Hi, I've got a zpool made up of 2 mirrored vdevs. For one moment i had a
> cabling problem and lost all disks... i reconnected and onlined the
> disks. No resilvering kicked in, so i tried to force a scrub, but
> nothing's happening. I issue the command and it's as if i never did. Any
> suggestions?
> Thanks
> justin