Re: [zfs-discuss] ZFS: unreliable for professional usage?
Bob Friesenhahn wrote:
> On Fri, 13 Feb 2009, Ross wrote:
>> Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a "common causes of this are..." message, or a link to an online help article, if you wanted people to be really impressed.
> I see a career in politics for you. Barring an operating system implementation bug, the type of problem you are talking about is due to improperly working hardware. Irreversibly reverting to a previous checkpoint may or may not obtain the correct data. Perhaps it will produce a bunch of checksum errors.

Actually, that's a lot like what FMA does when it sees a problem: it tells the person what happened and points them to a web page which can be updated with the newest information on the problem. That's a good spot for a message like:

    "This pool was not unmounted cleanly due to a hardware fault and data
    has been lost. The 'timestamp' line contains the date to which the pool
    can be recovered. Use the command
        # zfs reframbulocate this that -t timestamp
    to revert to timestamp."

--dave
--
David Collier-Brown       | Always do right. This will gratify
Sun Microsystems, Toronto | some people and astonish the rest
dav...@sun.com            | -- Mark Twain
cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does your device honor write barriers?
Peter Schuller wrote:
> It would actually be nice in general, I think, not just for ZFS, to have some standard "run this tool" that will give you a checklist of successes/failures that specifically target storage correctness. Though correctness cannot be proven, you can at least test for common cases of systematic incorrect behavior.

A tiny niggle: for an operation set of moderate size, you can generate an exhaustive set of tests. I've done so for APIs, but unless you have infinite spare time, you want to generate the test set with a tool (;-))

--dave (who hasn't even Copious Spare Time, much less Infinite) c-b
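For what it's worth, the exhaustive-generation idea is easy to sketch with a tool. Here is a minimal illustration (the operation set and flag names are hypothetical, not any real storage API):

```python
from itertools import product

# Hypothetical operation set for a tiny storage API. Because the set is
# small, pairing each operation with every combination of flag values
# yields a complete -- and still manageable -- test matrix.
operations = ["write", "flush", "read"]
sync_flags = [True, False]
barrier_flags = [True, False]

def generate_test_cases():
    """Enumerate every (operation, sync, barrier) combination."""
    return [
        {"op": op, "sync": s, "barrier": b}
        for op, s, b in product(operations, sync_flags, barrier_flags)
    ]

cases = generate_test_cases()
print(len(cases))  # 3 ops x 2 sync values x 2 barrier values = 12
```

The point is that the tool does the enumeration; a human lists only the operations and the axes of variation.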
Re: [zfs-discuss] ZFS core contributor nominations
+1 utterly!

Mark Shellenbaum wrote:
> Neelakanth Nadgir wrote:
>> +1. I would like to nominate roch.bourbonn...@sun.com for his work on improving the performance of ZFS over the last few years. thanks, -neel
> +1 on Roch being a core contributor.
>
> On Feb 2, 2009, at 4:02 PM, Neil Perrin wrote:
>> Looks reasonable. +1. Neil.
>
> On 02/02/09 08:55, Mark Shellenbaum wrote:
>> The time has come to review the current Contributor and Core Contributor grants for ZFS. Since all of the ZFS Core Contributor grants are set to expire on 02-24-2009, we need to renew the members that are still contributing at Core Contributor levels. We should also add some new members at both Contributor and Core Contributor levels.
>>
>> First, the current list of Core Contributors:
>>   Bill Moore (billm)
>>   Cindy Swearingen (cindys)
>>   Lori M. Alt (lalt)
>>   Mark Shellenbaum (marks)
>>   Mark Maybee (maybee)
>>   Matthew A. Ahrens (ahrens)
>>   Neil V. Perrin (perrin)
>>   Jeff Bonwick (bonwick)
>>   Eric Schrock (eschrock)
>>   Noel Dellofano (ndellofa)
>>   Eric Kustarz (goo)*
>>   Georgina A. Chua (chua)*
>>   Tabriz Holtz (tabriz)*
>>   Krister Johansen (johansen)*
>>
>> All of these should be renewed at Core Contributor level, except for those with a *. Those with a * are no longer involved with ZFS and we should let their grants expire.
>>
>> I am nominating the following to be new Core Contributors of ZFS:
>>   Jonathan W. Adams (jwadams)
>>   Chris Kirby
>>   Lin Ling
>>   Eric C. Taylor (taylor)
>>   Mark Musante
>>   Rich Morris
>>   George Wilson
>>   Tim Haley
>>   Brendan Gregg
>>   Adam Leventhal
>>   Pawel Jakub Dawidek
>>   Ricardo Correia
>>
>> For Contributor I am nominating the following:
>>   Darren Moffat
>>   Richard Elling
>>
>> I am voting +1 for all of these (including myself). Feel free to nominate others for Contributor or Core Contributor.
-Mark
Re: [zfs-discuss] Why is st_size of a zfs directory equal to the
Richard L. Hamilton rlha...@smart.net wrote:
> I did find the earlier discussion on the subject (someone e-mailed me that there had been such). It seemed to conclude that some apps are statically linked with old scandir() code that (incorrectly) assumed that the number of directory entries could be estimated as st_size/24; and worse, that some such apps might be seeing the small st_size that zfs offers via NFS, so they might not even be something that could be fixed on Solaris at all. But I didn't see anything in the discussion that suggested that this was going to be changed. Nor did I see a compelling argument for leaving it the way it is, either. In the face of "undefined", all arguments end up as pragmatism rather than principle, IMO.

Joerg Schilling wrote:
> This is a problem I had to fix for some customers in 1992, when people started to use NFS servers based on the Novell OS. Jörg

Oh bother, I should have noticed this back in 1999/2001 (;-))

Joking aside, we were looking at the Solaris ABI (Application Binary Interface) and working on ensuring binary stability. The size of a directory entry was supposed to be undefined and in principle *variable*, but Novell et al. seem to have assumed that the size they used was guaranteed to be the same for all time. And no machine needs more than 640 KB of memory, either...

Ah well, at least the ZFS folks found it for us, so I can add it to my database of porting problems. What OSs did you folks find it on?

--dave (an external consultant, these days) c-b
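As an aside, the portable fix for that old scandir()-style assumption is simply to count the entries rather than divide st_size by an assumed entry width. A small sketch (Python used purely for illustration; the point applies to any language):

```python
import os
import tempfile

def count_entries(path):
    """Portable way to count directory entries: enumerate them.
    Estimating the count as st_size / 24 bakes in one filesystem's
    on-disk entry size; ZFS (among others) reports a directory
    st_size that has no fixed relation to the number of entries."""
    with os.scandir(path) as it:
        return sum(1 for _ in it)

with tempfile.TemporaryDirectory() as d:
    for i in range(5):
        open(os.path.join(d, f"file{i}"), "w").close()
    naive = os.stat(d).st_size // 24   # filesystem-dependent guess
    print(count_entries(d))            # 5, on every filesystem
```

The `naive` estimate may happen to work on a filesystem with fixed 24-byte entries, which is exactly how the latent bug survived until ZFS exposed it.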
Re: [zfs-discuss] Tuning for a file server, disabling data cache (almost)
Marcelo Leal [EMAIL PROTECTED] wrote:
> Hello all, I think he's got a point here... maybe that would be an interesting feature for that kind of workload. Caching all the metadata would make the rsync task faster (for many files). Trying to cache the data is really a waste of time, because the data will not be read again, and it will just push out the good cached metadata. That is what I understood him to mean about the 96k being discarded soon. He wants to configure an area to copy the data through, and that's it: leave my metadata cache alone. ;-)

That's a common enough behavior pattern that Per Brinch Hansen defined a distinct filetype for it in, if memory serves, the RC 4000: as soon as it's read, it's gone. We saw this behavior on NFS servers in the Markham ACE lab, and absolutely with Samba almost everywhere. My Smarter Colleagues[tm] explained it as a normal pattern whenever you have front-end caching, as back-end caching is then rendered far less effective, and sometimes directly disadvantageous.

It sounded like, from the previous discussion, one could tune for it with the level 1 and 2 caches, although if I understood it properly, the particular machine also had to narrow a stripe for the particular load being discussed...

--dave
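A toy model makes the pattern concrete (a hypothetical LRU cache, not the actual ZFS ARC): a stream of read-once data blocks evicts exactly the metadata you wanted to keep warm.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: most recently used entries survive."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key):
        """Touch a key; return True on a cache hit, False on a miss."""
        hit = key in self.entries
        if hit:
            self.entries.move_to_end(key)
        else:
            self.entries[key] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return hit

cache = LRUCache(capacity=8)
metadata = [f"meta{i}" for i in range(4)]
for m in metadata:            # warm the cache with hot metadata
    cache.access(m)
for i in range(100):          # stream 100 read-once data blocks through it
    cache.access(f"data{i}")
print(sum(cache.access(m) for m in metadata))  # 0 -- all metadata evicted
```

With no way to tell the cache "this data won't be read again," the streaming workload wins every eviction decision, which is the behavior being complained about.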
[zfs-discuss] Sidebar re ABI stability (was Segmentation fault / core dump)
[EMAIL PROTECTED] wrote:
> Linux does not implement stable kernel interfaces. It may be that there is an intention to do so, but I've seen problems on Linux resulting from self-incompatibility on a regular basis.

To be precise, Linus tries hard to prevent ABI changes in the system call interfaces exported from the kernel, but the glibc team has defeated him in the past. For example, they accidentally started returning ENOTSUP from getgid when one had a library version mismatch (!). Sun stabilizes both library and system call interfaces: I used to work on that with David J. Brown's team, back when I was an employee.

--dave (who's a contractor) c-b
Re: [zfs-discuss] Sidebar to ZFS Availability discussion
Case 1: Fully redundant storage array with active/active controllers. A failed controller should cause the system to recover on the surviving controller. I have some lab test data for this sort of thing, and some popular arrays can take on the order of a minute to complete the failure detection and reconfiguration. You don't want to degrade the vdev when this happens; you just want to wait until the array is again ready for use (this works OK today). I would further argue that no disk failure prediction code would be useful for this case.

Case 2: Power-on test. I had a bruise (no scar :-) once from an integrated product we were designing, http://docs.sun.com/app/docs/coll/cluster280-3, which had a server (or two) and a RAID array (or two). If you build such a system from scratch, it will fail a power-on test. If you power on the rack containing these systems, the time required for the RAID array to boot was longer than the time required for the server to boot *and* time out its probes of the array. The result was that the volume manager would declare the disks bad, and system administration intervention was required to regain access to the data in the array. Since this was an integrated product, we solved it by introducing a delay loop in the server boot cycle to slow down the server. Was it the best possible solution? No, but it was the only solution which met our other design constraints.

In both of these cases, the solutions imply that multi-minute timeouts are required to maintain a stable system. For 101-level insight into this sort of problem, see the Sun BluePrint article (an oldie, but goodie): http://www.sun.com/blueprints/1101/clstrcomplex.pdf

--dave
[zfs-discuss] Sidebar to ZFS Availability discussion
Re Availability: ZFS needs to handle disk removal / driver failure better:

> A better option would be to not use this to perform FMA diagnosis, but instead work it into the mirror child selection code. This has already been alluded to before, but it would be cool to keep track of latency over time, and use this to both a) prefer one drive over another when selecting the child and b) proactively time out / ignore results from one child and select the other if it's taking longer than some historical standard deviation. This keeps away from diagnosing drives as faulty, but does allow ZFS to make better choices and maintain response times.

It shouldn't be hard to keep track of the average and/or standard deviation and use it for selection; proactively timing out the slow I/Os is much trickier. Interestingly, tracking latency has come under discussion in the Linux world too, as they start to develop resource management for disks as well as CPUs.

In fact, there are two cases where you can use a feedback loop to adjust disk behavior, and a third to detect problems. The first loop is the one you identified, for dealing with near/far and fast/slow mirrors. The second is for resource management, where one throttles disk-hog projects when one discovers latency growing without bound as the disk saturates. The third is for faults other than the above. For the latter to work well, I'd like to see the resource management and fast/slow mirror adaptation be something one turns on explicitly, because then, when FMA discovered that you in fact had a fast/slow mirror or a "Dr. Evil" program saturating the array, the fix could be to notify the sysadmin that they had a problem and suggest built-in tools to ameliorate it.

Ian Collins writes:
> One solution (again, to be used with a remote mirror) is the three-way mirror. If two devices are local and one remote, data is safe once the two local writes return. I guess the issue then changes from "is my data safe" to "how safe is my data". I would be reluctant to deploy a remote mirror device without local redundancy, so this probably won't be an uncommon setup. There would have to be an acceptable window of risk when local data isn't replicated.

And in this case too, I'd prefer that the sysadmin provide the information to ZFS about what she wants, and have the system adapt to it and report how big the risk window is. This would effectively change the FMA behavior, you understand, so as to have it report failures to complete the local writes in time t0 and the remote writes in time t1, much as the resource management or fast/slow cases would need to be visible to FMA.

--dave (at home) c-b
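To show how cheap the bookkeeping half really is, here is a rough sketch of latency tracking and child selection (hypothetical names; nothing like the actual ZFS vdev code, and the hard part, proactively timing out in-flight I/Os, is deliberately absent):

```python
import statistics

class ChildStats:
    """Running latency history for one mirror child (illustrative only)."""
    def __init__(self, name, window=100):
        self.name = name
        self.samples = []
        self.window = window

    def record(self, latency_ms):
        self.samples.append(latency_ms)
        if len(self.samples) > self.window:
            self.samples.pop(0)          # keep a sliding window

    def mean(self):
        return statistics.mean(self.samples) if self.samples else 0.0

    def is_outlier(self, latency_ms, k=3):
        """True if a latency exceeds mean + k standard deviations."""
        if len(self.samples) < 2:
            return False
        return latency_ms > self.mean() + k * statistics.stdev(self.samples)

def pick_child(children):
    """Prefer the mirror child with the lowest historical mean latency."""
    return min(children, key=lambda c: c.mean())

fast, slow = ChildStats("fast"), ChildStats("slow")
for ms in (1.0, 1.2, 0.9):
    fast.record(ms)
for ms in (8.0, 9.5, 8.7):
    slow.record(ms)
print(pick_child([fast, slow]).name)  # fast
```

The `is_outlier` test is the "historical standard deviation" trigger from the quoted proposal; acting on it mid-I/O is where the real complexity lives.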
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Ross wrote:
> where you would store this information, but wherever that is, zpool status should be reporting the error and directing the admin to the log file. I would probably say this could be safely stored on the system drive.
>
> Would it be possible to have a number of possible places to store this log? What I'm thinking is that if the system drive is unavailable, ZFS could try each pool in turn and attempt to store the log there. In fact, e-mail alerts or external error logging would be a great addition to ZFS. Surely it makes sense that filesystem errors would be better off being stored and handled externally?
>
> Ross
>
>> Date: Mon, 28 Jul 2008 12:28:34 -0700
>> From: [EMAIL PROTECTED]
>> Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
>> To: [EMAIL PROTECTED]
>>
>> I'm trying to reproduce and will let you know what I find. -- richard
Re: [zfs-discuss] [zfs-code] Peak every 4-5 second
And do you really have 4-sided RAID-1 mirrors, not 4-wide RAID-0 stripes? --dave

Robert Milkowski wrote:
> Hello Tharindu,
>
> Thursday, July 24, 2008, 6:02:31 AM, you wrote:
>> We do not use raidz*. Virtually no RAID or stripe through the OS. We have 4-disk RAID1 volumes. RAID1 was created from CAM on the 2540. The 2540 does not have RAID 1+0 or 0+1.
>
> Of course it does 1+0. Just add more drives to RAID-1.
>
> -- Best regards, Robert Milkowski mailto:[EMAIL PROTECTED] http://milek.blogspot.com
Re: [zfs-discuss] [zfs-code] Peak every 4-5 second
Hmmn, that *sounds* as if you are saying you have a very-high-redundancy RAID-1 mirror, 4 disks deep, on an "enterprise-class tier 2 storage" array that doesn't support RAID 1+0 or 0+1. That sounds weird: the 2540 supports RAID levels 0, 1, (1+0), 3 and 5, and deep mirrors are normally only used on really fast equipment in mission-critical tier 1 storage... Are you sure you don't mean you have RAID 0 (stripes) 4 disks wide, with each stripe presented as a LUN?

If you really have 4-deep RAID 1, you have a configuration that will perform somewhat slower than any single disk, as the array launches 4 writes to 4 drives in parallel and returns success only when they all complete. If you had 4-wide RAID 0, with mirroring done at the host, you would have a configuration that would (probabilistically) perform better than a single drive when writing to each side of the mirror, and the write would return success when the slowest side of the mirror completed.

--dave (puzzled!) c-b

Tharindu Rukshan Bamunuarachchi wrote:
> We do not use raidz*. Virtually no RAID or stripe through the OS. We have 4-disk RAID1 volumes. RAID1 was created from CAM on the 2540. The 2540 does not have RAID 1+0 or 0+1. cheers, tharindu
>
> Brandon High wrote:
>> On Tue, Jul 22, 2008 at 10:35 PM, Tharindu Rukshan Bamunuarachchi [EMAIL PROTECTED] wrote:
>>> Dear Mark/All, Our trading system is writing to a local and/or array volume at 10k messages per second. Each message is about 700 bytes in size. Before ZFS, we used UFS. Even with UFS, there was a peak every 5 seconds due to fsflush invocation. However, each peak is about ~5ms. Our application cannot recover from such high latency.
>> Is the pool using raidz, raidz2, or mirroring? How many drives are you using? -B
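A quick simulation illustrates the claim, under toy assumptions (Gaussian per-drive service times, which real disks don't have): a 4-deep mirror's write latency is the maximum of four draws, so its mean is strictly worse than a single drive's.

```python
import random

random.seed(42)  # deterministic toy run

def drive_latency():
    """Toy per-drive service time: ~5 ms typical, floor of 0.1 ms."""
    return max(0.1, random.gauss(5.0, 1.0))

def mirror_write_latency(depth):
    """A mirrored write completes when the slowest of `depth` parallel
    per-drive writes completes: latency is the max of the draws."""
    return max(drive_latency() for _ in range(depth))

trials = 10_000
single = sum(drive_latency() for _ in range(trials)) / trials
deep4 = sum(mirror_write_latency(4) for _ in range(trials)) / trials
print(f"single drive mean: {single:.2f} ms, 4-deep mirror mean: {deep4:.2f} ms")
```

The gap grows with the variance of the per-drive times, which is why the max-of-N effect bites hardest on arrays with occasional slow outliers.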
Re: [zfs-discuss] OT: Formatting Problem of ZFS Adm Guide (pdf)
One can carve furniture with an axe, especially if it's razor-sharp, but that doesn't make it a spokeshave, plane, and saw. I love StarOffice and use it every day, but my publisher uses Frame, so that's what I use for books. --dave

W. Wayne Liauh wrote:
>> I doubt so. Star/OpenOffice are word processors... and like Word they are not suitable for typesetting documents. SGML, FrameMaker, and TeX/LaTeX are the only ones capable of doing that.
>
> This was pretty much true about a year ago. However, after version 2.3, which added the kerning feature, OpenOffice.org can produce very professional-looking documents. All of the OOo User Guides, which are every bit as complex as our own user guides, if not more so, are now self-generated. Solveig Haugland, a highly respected OpenOffice.org consultant, published her book "OpenOffice.org 2 Guidebook" (a 527-page book complete with drawings, table of contents, multi-column index, etc.) entirely in OOo.
>
> Another key consideration, in addition to perhaps a desire to support our sister product, is that the documents so generated are guaranteed to be displayable on the OS they are intended to serve. This is a pretty important consideration, IMO. :-)
Re: [zfs-discuss] ZFS deduplication
Hmmn, you might want to look at Andrew Tridgell's thesis (yes, Andrew of Samba fame), as he had to solve this very question in order to select an algorithm to use inside rsync. --dave

Darren J Moffat wrote:
> [EMAIL PROTECTED] wrote:
>> [EMAIL PROTECTED] wrote on 07/08/2008 03:08:26 AM:
>>> Does anyone know a tool that can look over a dataset and give duplication statistics? I'm not looking for something incredibly efficient, but I'd like to know how much it would actually benefit our
>> Check out the following blog: http://blogs.sun.com/erickustarz/entry/how_dedupalicious_is_your_pool
>> Just want to add: while this is OK to give you a ballpark dedup number, fletcher2 is notoriously collision-prone on real data sets. It is meant to be fast at the expense of collisions. This issue can show much more dedup being possible than really exists on large datasets. Doing this with sha256 as the checksum algorithm would be much more interesting.
>
> I'm going to try that now and see how it compares with fletcher2 for a small contrived test.
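For illustration only, here is the kind of ballpark estimate being discussed, using SHA-256 over fixed-size blocks of an in-memory buffer (a toy, not a real pool scan; block size and the 128 KiB choice are assumptions):

```python
import hashlib

def dedup_ratio(data, block_size=128 * 1024):
    """Estimate dedupability by hashing fixed-size blocks with SHA-256
    and counting distinct digests. With a cryptographic hash, accidental
    collisions are negligible, so unlike fletcher2 this won't overstate
    the achievable dedup."""
    seen = set()
    total = 0
    for off in range(0, len(data), block_size):
        seen.add(hashlib.sha256(data[off:off + block_size]).digest())
        total += 1
    return total / len(seen) if seen else 1.0

# Four identical 128 KiB blocks plus one distinct block: 5 blocks,
# 2 unique digests, so the estimated dedup ratio is 5/2 = 2.5x.
data = (b"A" * 128 * 1024) * 4 + b"B" * 128 * 1024
print(dedup_ratio(data))  # 2.5
```

A weak checksum run over the same buffer could only report a ratio at least this high, never lower, which is the direction of error being warned about.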
[zfs-discuss] Some basic questions about getting the best performance for database usage
This is a bit of a sidebar to the discussion about getting the best performance for PostgreSQL from ZFS, but it may affect you if you're doing sequential scans through the 70GB table or its segments.

ZFS copy-on-write results in tables' contents being spread across the full width of their stripe, which is arguably a good thing for transaction-processing performance (or at least can be), but makes sequential table-scan speed degrade. If you're doing sequential scans over large amounts of data which isn't changing very rapidly, such as older segments, you may want to re-sequentialize that data.

I was talking to one of the Slony developers back when this first came up, and he suggested a process to do this in PostgreSQL: do a CLUSTER operation relative to a specific index, then drop and recreate the index. This rewrites the relation in the order the index sorts by, which should defragment/linearize it, and dropping and recreating the index rewrites the index sequentially too. Neither he nor I knows the cost if the relation has more than one index; we speculate the others should be dropped before the clustering and recreated last.

--dave
Re: [zfs-discuss] Some basic questions about getting the best performance for database usage
David Collier-Brown wrote:
> ZFS copy-on-write results in tables' contents being spread across the full width of their stripe, which is arguably a good thing for transaction processing performance (or at least can be), but makes sequential table-scan speed degrade. If you're doing sequential scans over large amounts of data which isn't changing very rapidly, such as older segments, you may want to re-sequentialize that data.

Richard Elling [EMAIL PROTECTED] wrote:
> There is a general feeling that COW, as used by ZFS, will cause all sorts of badness for database scans. Alas, there is a dearth of real-world data on any impacts (I'm anxiously awaiting...). There are cases where this won't be a problem at all, but it will depend on how you use the data.

I quite agree: at some point, the experts on Oracle, MySQL and PostgreSQL will get a clear understanding of how to get the best performance for random database I/O on ZFS. I'll be interested to see what the behavior is for large, high-performance systems. In the meantime...

> In this particular case, it would be cost-effective to just buy a bunch of RAM and not worry too much about disk I/O during scans. In the future, if you significantly outgrow the RAM, then there might be a case for a ZFS (L2ARC) cache LUN to smooth out the bumps. You can probably defer that call until later.

... it's a Really Nice Thing that large memories only cost small dollars (;-))

--dave
Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools
Darren J Moffat [EMAIL PROTECTED] wrote:
> Chris Siebenmann wrote:
>> | Still, I'm curious -- why lots of pools? Administration would be simpler with a single pool containing many filesystems.
>> The short answer is that it is politically and administratively easier to use (at least) one pool per storage-buying group in our environment.
>
> I think the root cause of the issue is that multiple groups are buying physical rather than virtual storage, yet it is all being attached to a single system. It will likely be a huge uphill battle, but: if all the physical storage could be purchased by one group, and a combination of ZFS reservations and quotas used on top-level datasets (e.g. one level down from the pool) to allocate the virtual storage, with appropriate amounts charged to the groups, you could technically use ZFS as it was intended, with much fewer (hopefully 1 or 2) pools.

The scenario Chris describes is one I see repeatedly at customers buying SAN storage (as late as last month!) and is considered a best practice on the business side. We may want to make this issue and its management visible, as people moving from SAN to ZFS are likely to trip over it. In particular, I'd like to see a blueprint, or at least a wiki discussion by someone from the SAN world, on how to map those kinds of purchases to ZFS pools, how few pools one wants to have, what happens when it goes wrong, and how to mitigate it (;-))

--dave

ps: as always, having asked for something, I'm also volunteering to help provide it: I'm not a storage or ZFS guy, but I am an author, and will happily help my Smarter Colleagues[tm] write it up.
Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools
Chris Siebenmann [EMAIL PROTECTED] wrote:
> | Speaking as a sysadmin (and a Sun customer), why on earth would I have to provision 8 GB+ of RAM on my NFS fileservers? I would much rather have that memory in the NFS client machines, where it can actually be put to work by user programs.
> |
> | (If I have decently provisioned NFS client machines, I don't expect much from the NFS fileserver's cache. Given that the clients have caches too, I believe that the server's cache will mostly be hit for things that the clients cannot cache because of NFS semantics, like NFS GETATTR requests for revalidation and the like.)

That's certainly true for the NFS part of the NFS fileserver, but to get the ZFS feature set, you trade off cycles and memory. If we investigate this a bit, we should be able to figure out a rule of thumb for how little memory we need for an NFS-home-directories workload without cutting into performance.

--dave
Re: [zfs-discuss] How many ZFS pools is it sensible to use on a single server?
We've discussed this in considerable detail, but the original question remains unanswered: if an organization *must* use multiple pools, is there an upper bound to avoid, or a rate of degradation to be considered?

--dave