Re: [zfs-discuss] ZFS on 32bit x86
AMD Geodes are 32-bit only. I haven't heard any mention that they will _ever_ be 64-bit. But, honestly, this and the Via chip aren't really ever going to be targets for Solaris. That is, they simply aren't (any substantial) part of the audience we're trying to reach with Solaris x86. I'm not sure that we want to limit the Solaris target in that way; we want laptops and small embedded systems as well as big iron. The more systems Solaris runs on, the bigger the ecosystem. If that means no high-throughput ZFS, then that is fine with me, and I certainly would not prioritize it. And we don't want 1U systems; we want mini-ITX, nearly silent systems which can fit in cars or which can be easily hidden away. The price is not the objection; it's the form factor. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS on 32bit x86
Erik Trimble wrote: Artem Kachitchkine wrote: AMD Geodes are 32-bit only. I haven't heard any mention that they will _ever_ be 64-bit. But, honestly, this and the Via chip aren't really ever going to be targets for Solaris. That is, they simply aren't (any substantial) part of the audience we're trying to reach with Solaris x86. Didn't know our audience was made up of CPUs :) I like to think (when in a good mood) that we are trying to reach creative people who can take OpenSolaris where Sun haven't imagined or been able to. -Artem. Then let those folks fix the problem. The issue here is what amount of effort _Sun_ can put into fixing 32-bit Solaris so as to enable ZFS to run comfortably on it. This is an @opensolaris.org alias; it is about working together as a community, identifying problems and discovering solutions. I don't think it is at all appropriate to bring up Sun business choices here. Where that is appropriate is when Sun employees need to justify to their manager what they are working on. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] recommended hardware for a zfs/nfs NAS?
I was wondering if anyone could recommend hardware for a ZFS-based NAS for home use. The 'zfs on 32-bit' thread has scared me off a mini-ITX fanless setup, so I'm looking at sparc or opteron. Ideally it would: a) run quiet (blade 100/150 is ok, x4100 ain't :) ) b) take advantage of cheap disks (ide/sata, unless scsi suddenly got affordable) c) come in around the 300-400 pounds mark. Don't need massive storage, it just needs to be reliable and reasonably fast - I was thinking of maybe a 2-way 100GB mirror set. Graphics are a complete non-issue. It only needs to saturate 100Mbit (I'm not planning to use it for anything else, so CPU isn't important). Do any used Sun systems fit the bill, or should I be thinking of rolling my own Opteron? Thanks! -- Rasputin :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS and Virtualization
Hello Nate, I have a few questions about ZFS and virtualization: [b]Virtualization and performance[/b] When filesystem traffic occurs on a zpool containing only spindles dedicated to this zpool, I/O can be distributed evenly. When the zpool is located on a LUN sliced from a RAID group shared by multiple systems, the capability of doing I/O from this zpool will be limited. Avoiding or limiting I/O to this LUN until the load from the other systems decreases would help overall performance for the local zpool. I heard some rumors recently about using SMI-S to de-virtualize the traffic and allow Solaris to peek through the virtualization layers, thus optimizing I/O target selection. Maybe someone has some rumors to add ;-) Virtualization with the 6920 has been briefly discussed at http://www.opensolaris.org/jive/thread.jspa?messageID=14984#14984 but without conclusion or recommendations. I don't know the answer, but: Wouldn't the overhead of using SMI-S, or some other method, to determine the load on the RAID group from the storage array negate any potential I/O benefits you could gain? Avoiding or limiting I/O to a heavily used LUN in your zpool would reduce the number of spindles in your zpool, thus reducing aggregate throughput anyway(?). Yes, you may be right on this. The current implementation, which limits outstanding I/O operations per LUN, now seems more appropriate to me too. Storage array layout best practices suggest, if at all possible, limiting the number of LUNs you create from a RAID group, exactly because of the I/O limitations that you mention. This is basically true; however, in a virtualized environment you cannot always ensure this because of the complexity. You have spindles distributed across a RAID group, LUNs sliced from the RAID group, virtualized e.g. with a 6920, and the virtualized LUNs possibly distributed to different hosts or zpools. Knowing which LUNs lead to which spindles might help to optimize vdev selection. I can understand building the smarts into ZFS to handle multipath LUNs (LUNs presented out of more than one controller on the array, active-active configurations, not simply dual-fabric multipathing) and load balancing that way. Does ZFS simply take advantage of MPxIO in Solaris for multipathing/load balancing, or are there plans to build support for it into the file system itself? This has already been discussed at http://www.opensolaris.org/jive/thread.jspa?messageID=44278#44159 and http://www.opensolaris.org/jive/thread.jspa?messageID=19248#19248 [b]Volume mobility[/b] One of the major advantages of ZFS is sharing of the zpool capacity between filesystems. I often run applications in small application containers located on separate LUNs which are zoned to several hosts so they can be run on different hosts. The idea behind this is failover, testing and load adjustment. Because only complete zpools can be migrated, capacity sharing between movable containers is currently impossible. Are there any plans to allow zpools to be concurrently shareable between hosts? Clarification: you're not asking for shared file system behaviour, are you? No. A shared filesystem is mountable on multiple servers at the same time. I was thinking of mounting [i]different[/i] filesystems from the same pool on different servers, so that each filesystem is mounted on at most one server at a time. Multiple systems zoned to see the same LUNs and simultaneously reading/writing to them? Yes, the LUNs must be visible to each host and simultaneous writing will occur.
but I assume that if you coordinated which server had ownership of a zpool, there would be nothing stopping you from creating a zpool on servera with a set of LUNs, creating your zfs file systems within the pool, zoning the same set of LUNs to one or more other servers, and then coordinating who has ownership of the zpool. This works out of the box with 'zpool export' and 'zpool import'. Ex: You're testing an application/data installed on a ZFS file system on a 32-bit (x86) server, then you want to test it on an Opteron. So you zone the LUNs to the Opteron, stop using the zpool on the 32-bit server, and use it on the Opteron. I may be completely incorrect about the above. This, too, already works out of the box. Other than that scenario, I think your questions fit more closely with the shared file system topic that I brought up originally. Do you mean http://www.opensolaris.org/jive/click.jspa?searchID=98699&messageID=16480 ? Still, if you had production data in a ZFS file system in your pool as well as test data in a separate ZFS file system also using the same pool (your application container), the disks making up your common pool would still have to be visible to multiple servers, and you probably would want to limit exposure to the other ZFS file systems within that pool on the other hosts.
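A minimal sketch of the export/import hand-off described above, assuming a pool named 'apppool' built on LUNs that both hosts can see (the pool name is hypothetical):

  # on servera: quiesce the applications, then release the pool
  zpool export apppool

  # on serverb: take ownership; the file systems and their mountpoints travel with the pool
  zpool import apppool
  zfs list -r apppool

  # if servera died without exporting, the import has to be forced
  zpool import -f apppool

As the thread notes, this hand-off is all-or-nothing: the whole pool moves between hosts, which is exactly why capacity sharing between independently movable containers inside one pool isn't possible today.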
[zfs-discuss] http://www.opensolaris.org/os/community/zfs/version/3
having just upgraded to nv42, zpool status tells me I need to upgrade the on-disk version. zpool upgrade points me at http://www.opensolaris.org/os/community/zfs/version/3 :

: sigma TS 6 $; zpool upgrade -v
This system is currently running ZFS version 3.
The following versions are supported:

VER  DESCRIPTION
---  ------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z

For more information on a particular version, including supported releases, see:
http://www.opensolaris.org/os/community/zfs/version/N
Where 'N' is the version number.

: sigma TS 7 $; wget http://www.opensolaris.org/os/community/zfs/version/3
--11:14:30-- http://www.opensolaris.org/os/community/zfs/version/3
  => `3'
Resolving www.opensolaris.org... 72.5.124.63
Connecting to www.opensolaris.org|72.5.124.63|:80... connected.
HTTP request sent, awaiting response... 404
11:14:30 ERROR 404: (no description).
: sigma TS 8 $;

When will this page appear? It would also be kind of neat if the page http://www.opensolaris.org/os/community/zfs/version/N existed, so that the link from the zpool command above would work when opened, perhaps giving an index of all the versions. --chris This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
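For anyone following along, the upgrade itself is a one-liner once you are happy that older bits will no longer need to import the pool (the pool name below is hypothetical):

  zpool upgrade            # list pools still using an older on-disk version
  zpool upgrade mypool     # upgrade one pool, or 'zpool upgrade -a' for all of them

Note that this is one-way: a pool upgraded to version 3 can no longer be imported by builds that only understand versions 1 or 2.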
Re: [zfs-discuss] Re: where has all my space gone? (with zfs mountroot + b38)
Mark Shellenbaum wrote: ... So we have a bunch of stuff in the in-core delete queue, but no threads to process them. The fact that we don't have the threads is related to the bug that Tabriz is working on. Hi Mark, after installing your fixes from three days ago and (cough!) ensuring that my boot archive contained them, I then spent the next 7 or so hours waiting for the delete queue to be flushed. In that time my root disk (a Maxtor) decided it didn't like me much (I was asking it to do too much I/O), so ZFS panicked... then a few single-user boots later (where each time the boot process was stuck in the fs-usr service, flushing the queue) and I'm finally back to having the disk space that I think I should have. My one remaining concern is that I'm not sure that I've got all my ZFS bits totally sync'd with my kernel, so I'll be BFUing again tomorrow just to make sure. Thanks for your help with this, I really, really appreciate it. best regards, James C. McPherson -- Solaris Datapath Engineering Data Management Group Sun Microsystems ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
On 6/23/06, Roch [EMAIL PROTECTED] wrote: Joe Little writes: On 6/22/06, Bill Moore [EMAIL PROTECTED] wrote: Hey Joe. We're working on some ZFS changes in this area, and if you could run an experiment for us, that would be great. Just do this: echo 'zil_disable/W1' | mdb -kw We're working on some fixes to the ZIL so it won't be a bottleneck when fsyncs come around. The above command will let us know what kind of improvement is on the table. After our fixes you could get from 30-80% of that improvement, but this would be a good data point. This change makes ZFS ignore the iSCSI/NFS fsync requests, but we still push out a txg every 5 seconds. So at most, your disk will be 5 seconds out of date compared to what it should be. It's a pretty small window, but it all depends on your appetite for such windows. :) After running the above command, you'll need to unmount/mount the filesystem in order for the change to take effect. If you don't have time, no big deal. --Bill On Thu, Jun 22, 2006 at 04:22:22PM -0700, Joe Little wrote: On 6/22/06, Jeff Bonwick [EMAIL PROTECTED] wrote: a test against the same iscsi targets using linux and XFS and the NFS server implementation there gave me 1.25MB/sec writes. I was about to throw in the towel and deem ZFS/NFS as unusable until B41 came along and at least gave me 1.25MB/sec. That's still super slow -- is this over a 10Mb link or something? Jeff I think the performance is in line with expectations for a small-file, single-threaded, open/write/close NFS workload (NFS must commit on close). Therefore I expect: (avg file size) / (I/O latency); for example, an 8 KB file committed every ~6 ms comes to roughly 1.3 MB/s. Joe, does this formula approach the 1.25 MB/s? To this day, I still don't know how to calculate the I/O latency. Average file size is always expected to be close to kernel page size for NASes -- 4-8k. Always tune for that. Nope, gig-e link (single e1000g, or aggregate, doesn't matter) to the iscsi target, and single gig-e link (nge) to the NFS clients, who are gig-e. Sun Ultra 20 or AMD quad Opteron, again with no difference. Again, the issue is the multiple fsyncs that NFS requires, and likely the serialization of those iscsi requests. Apparently, there is a basic latency in iscsi that one could improve upon with FC, but we are definitely in the all-ethernet/iscsi camp for multi-building storage pool growth and don't have interest in an FC-based SAN. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Well, following Bill's advice and the previous note on disabling the zil, I ran my test on a B38 Opteron initiator, and if you do a time on the copy from the client, 6250 8k files transfer at 6MB/sec now. If you watch the entire commit on the backend using zpool iostat 1, I see that it takes a few more seconds, and the actual rate there is 4MB/sec. Beats my best of 1.25MB/sec, and this is not B41. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Joe, you know this, but for the benefit of others I have to highlight that running any NFS server this way may cause silent data corruption from the client's point of view. Whenever a server keeps data in RAM this way and does not commit it to stable storage upon request from clients, that opens a time window for corruption. So a client writes to a page, then reads the same page, and if the server suffered a crash in between, the data may not match. So this is performance at the expense of data integrity. -r Yes.. ZFS in its normal mode has better data integrity.
However, this may be a more ideal tradeoff if you have specific read/write patterns. In my case, I'm going to use ZFS initially for my tier2 storage, with nightly write periods (needs to be short duration rsync from tier1) and mostly read periods throughout the rest of the day. I'd love to use ZFS as a tier1 service as well, but then you'd have to perform as a NetApp does. Same tricks, same NVRAM or initial write to local stable storage before writing to backend storage. 6MB/sec is closer to expected behavior for first tier at the expense of reliability. I don't know what the answer is for Sun to make ZFS 1st Tier quality with their NFS implementation and its sync happiness. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
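For anyone who wants to reproduce Bill's experiment: the mdb write above only patches the running kernel, and only file systems (re)mounted afterwards pick it up. A sketch of both forms, assuming the zil_disable tunable present in the ZFS module of builds from that era (and keeping Roch's warning in mind, since this trades NFS data integrity for speed):

  # patch the live kernel, then unmount/mount the file system so it takes effect
  echo 'zil_disable/W1' | mdb -kw

  # or make it persistent across reboots via /etc/system
  echo 'set zfs:zil_disable = 1' >> /etc/system

Setting it back to 0 (or removing the /etc/system line and rebooting) restores the default behaviour.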
[zfs-discuss] Re: recommended hardware for a zfs/nfs NAS?
I was wondering if anyone could recommend hardware for a ZFS-based NAS for home use. The 'zfs on 32-bit' thread has scared me off a mini-ITX fanless setup, so I'm looking at sparc or opteron. Ideally it would: a) run quiet (blade 100/150 is ok, x4100 ain't :) ) Not much space in a Blade 100/150 for multiple disks, but it is quiet and cheap. For NAS use, I'd think RAID would be ideal... RAID in a Blade 100/150 isn't ideal. b) take advantage of cheap disks (ide/sata, unless scsi suddenly got affordable) c) come in around the 300-400 pounds mark Ummm... Don't need massive storage, it just needs to be reliable and reasonably fast - I was thinking of maybe a 2-way 100GB mirror set. Reliable = redundant. Redundant = multiple disks. Fast = $ (typically). Speed costs money. How fast do you want to go? 8) Graphics are a complete non-issue. It only needs to saturate 100mbit (I'm not planning to use it for anything else, so CPU isn't important). Saturating 100Mbit with a 64-bit CPU and redundant disks for $300-400 Pounds may be tough. True, but ZFS compression=on works very well, and potential on-disk encryption support [down the road] with compression may change your mind. Any used sun systems fit the bill, or should I be thinking of rolling my own opteron? Thanks! Roll your own with Opteron, new disks, etc.? ...not feasible in that price range, IMHO. Depending on what your current workstation is, I might just suggest adding disks there and beefing it up. Nice NAS: a new Ultra 20 is inexpensive, quiet, and makes a good workstation. I'd suggest beefing up your existing workstation or replacing it with a U20 with two fast SATA disks, ZFS mirrored or striped. Cheap NAS: the last option I'd suggest is a Blade 100/150 with only one IDE drive - you could carefully zip-tie another in the box, I suppose - but this isn't the most reliable place to save your personal data. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
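A quick sketch of the compression setting mentioned above (the pool and file system names are hypothetical; the property is inherited by child file systems, so it can be set once near the top):

  zfs set compression=on tank/export             # enable the default (lzjb) compression
  zfs get compression,compressratio tank/export

Compression only applies to blocks written after it is turned on; data that is already on disk stays as it was.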
[zfs-discuss] add_install_client and ZFS and SMF incompatibility
Hi, I just set up an install server on my notebook and of course all the installer data is on a ZFS volume. I love the zfs compression=on command! It seems that the standard ./add_install_client script from the S10U2 Tools directory creates an entry in /etc/vfstab for a loopback mount of the Solaris miniroot into the /tftpboot directory. Unfortunately, at boot time (I'm using Nevada build 39), the mount_all script tries to mount the loopback mount from /etc/vfstab before ZFS gets its filesystems mounted. So the SMF filesystem/local method fails and I have to either mount all ZFS filesystems by hand and then re-run mount_all, or replace the vfstab entry with a simple symlink. Which only works until you run add_install_client the next time. Is this a known issue? Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: recommended hardware for a zfs/nfs NAS?
Saturating 100Mbit with a 64-bit CPU and redundant disks for $300-400 Pounds may be tough. Anything in the market can saturate 100Mbit easily; even with a single cheap IDE disk. The disks are generally a factor 5-10 faster than the 100Mbit network. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Priorities (was: ZFS on 32bit x86)
Darren J Moffat wrote: This is an @opensolaris.org alias it is about working together as a community and identifying problems and discovering solutions. I don't think it is at all appropriate to bring up Sun business choices here. Where that is appropriate is when Sun employees need to justify to their manager what they are working on. Darren brings up a good point here, and I thank him for making me remember that this isn't just a Sun-only developer list. However, this does bring to light a current problem: who is working on what, and how do the various sponsoring entities prioritize work? I've run into this problem on a couple of large Open Source projects, and we do need to make things a bit more transparent. We have the same problem over here in the Java group - how do we coordinate bugfixing and feature additions within a large community of developers and users, where developers may come from a variety of sources, and users may also be interested in providing not just feedback/RFEs, but actual sponsorship for developer time. Obviously, a developer is going to be most interested in producing work that their sponsor thinks is important (and, naturally, it is very possible for a developer to be his or her own sponsor). For a developer who doesn't have specific work directed by the sponsor, there needs to be some way for the community to prioritize work for that developer. That is, we as the community need to be able to let the developers know what is important to us, in an organized way. Personally, I'd like to have the ZFS community have an open bug and RFE system that looks like the one for Java (check out: http://bugs.sun.com/bugdatabase/index.jsp), or something that provides similar features. We (the users) would have a much easier way to hunt down things going on with developers' work, and developers would have a much easier time determining what is considered widely important to the user community. I've previously bitched about a lack of view of feature schedules for ZFS. This would solve that problem, also. How about it folks - would it be a good idea for me to explore what it takes to get such a bug/RFE setup implemented for the ZFS community on OpenSolaris.org? -Erik ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] add_install_client and ZFS and SMF incompatibility
Constantin Gonzalez [EMAIL PROTECTED] writes: Is this a known issue? Yes, I've raised this during ZFS Beta as SDR-0192. For some reason, I don't have a CR here. Rainer -- - Rainer Orth, Faculty of Technology, Bielefeld University ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] recommended hardware for a zfs/nfs NAS?
Dick Davies wrote: I was wondering if anyone could recommend hardware for a ZFS-based NAS for home use. The 'zfs on 32-bit' thread has scared me off a mini-ITX fanless setup, so I'm looking at sparc or opteron. Ideally it would: a) run quiet (blade 100/150 is ok, x4100 ain't :) ) b) take advantage of cheap disks (ide/sata, unless scsi suddenly got affordable) c) come in around the 300-400 pounds mark. Don't need massive storage, it just needs to be reliable and reasonably fast - I was thinking of maybe a 2-way 100GB mirror set. Graphics are a complete non-issue. It only needs to saturate 100Mbit (I'm not planning to use it for anything else, so CPU isn't important). Any used sun systems fit the bill, or should I be thinking of rolling my own opteron? Thanks! If it's just going to be a NAS, look for an AMD Sempron or Intel Celeron D (with 64-bit extensions, so you'll need the LGA775 socket version) based motherboard with 4 SATA ports on-board - check with the OpenSolaris folks for drivers. You should be able to get 4 mid-sized SATA drives (say in the 160GB range), and either RAID-Z or stripe/mirror them. That will be more than enough to keep a 100Mbit interface fully occupied, both reading and writing. Example:

  Socket 754 motherboard w/ 4 SATA ports (Biostar NF4 4X-A7)   $60
  Sempron 2600+                                                $75
  1GB RAM                                                      $50
  mid-tower case                                               $50
  (4) 80GB SATA drives                                    4 @  $50 each
  CD-ROM                                                       $20
  Total:                                                      $455

-Erik ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
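For completeness, a sketch of the two pool layouts Erik mentions, assuming the four drives show up as c1d0 through c4d0 (the device names are hypothetical):

  # single-parity RAID-Z across all four disks: roughly 3 disks' worth of usable space
  zpool create tank raidz c1d0 c2d0 c3d0 c4d0

  # or a stripe of two mirrors: roughly 2 disks' worth of space, faster resilvering
  zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0

Either layout survives any single disk failure; the mirrored layout can also survive some double failures, as long as both halves of the same pair don't die together.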
[zfs-discuss] Re: Re: recommended hardware for a zfs/nfs NAS?
Saturating 100Mbit with a 64-bit CPU and redundant disks for $300-400 Pounds may be tough. Anything in the market can saturate 100Mbit easily; even with a single cheap IDE disk. The disks are generally a factor 5-10 faster than the 100Mbit network. Casper Indeed, I stand corrected - must have been thinking 1000Mbit This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] recommended hardware for a zfs/nfs NAS?
Dick Davies wrote: I was wondering if anyone could recommend hardware for a ZFS-based NAS for home use. The 'zfs on 32-bit' thread has scared me off a mini-ITX fanless setup, so I'm looking at sparc or opteron. Ideally it would: I think the issue with ZFS on 32-bit revolves around the efficient use of memory: on a 32-bit system, even if you have lots of memory, ZFS won't use it. By contrast, on 64-bit systems, when you have lots of memory, ZFS will use it. In either case, if you only have a little bit of memory, ZFS's usage may dominate. [my simplification, I'll expect correction from the ZFS team if I'm wrong :-)] a) run quiet (blade 100/150 is ok, x4100 ain't :) ) b) take advantage of cheap disks (ide/sata, unless scsi suddenly got affordable) c) come in around the 300-400 pounds mark I don't think you will have much memory at that price. I'd go for 2 GBytes, no matter what processor you get. 512 MBytes is too little (I've got one of those here on the Ranch-net... for archive purposes only) -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
Joe Little wrote: On 6/23/06, Roch [EMAIL PROTECTED] wrote: Joe, you know this but for the benefit of others, I have to highlight that running any NFS server this way, may cause silent data corruption from client's point of view. Whenever a server keeps data in RAM this way and does not commit it to stable storage upon request from clients, that opens a time window for corruption. So a client writes to a page, then reads the same page, and if the server suffered a crash in between, the data may not match. So this is performance at the expense of data integrity. I agree, as a RAS guy this line of reasoning makes me nervous... I've never known anyone who regularly made this trade-off and didn't get burned. Yes.. ZFS in its normal mode has better data integrity. However, this may be a more ideal tradeoff if you have specific read/write patterns. The only pattern this makes sense for is the write-only pattern. That pattern has near zero utility. In my case, I'm going to use ZFS initially for my tier2 storage, with nightly write periods (needs to be short duration rsync from tier1) and mostly read periods throughout the rest of the day. I'd love to use ZFS as a tier1 service as well, but then you'd have to perform as a NetApp does. Same tricks, same NVRAM or initial write to local stable storage before writing to backend storage. 6MB/sec is closer to expected behavior for first tier at the expense of reliability. I don't know what the answer is for Sun to make ZFS 1st Tier quality with their NFS implementation and its sync happiness. I know the answer will not compromise data integrity. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Fwd: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
I should copy this to the list.

-- Forwarded message --
On 6/23/06, Joe Little [EMAIL PROTECTED] wrote:

I can post back to Roch what this latency is. I think the latency is a constant regardless of the zil or not. All that I do by disabling the zil is that I'm able to submit larger chunks at a time (faster) than doing 1k or worse blocks 3 times per file (the NFS fsync penalty).

Please send the script (I attached a modified version) along with the result. They need to see how it works to trust (or dispute) the result. Rule #1 in performance tuning is do not trust the report from an unproven tool :) I have some comments on the output below.

This is for a bit longer (16 trees of 6250 8k files, again with zil disabled):

Generating report from biorpt.sh.rec ...

=== Top 5 I/O types ===
DEVICE  T  BLKs  COUNT
------  -  ----  -----
sd2     W   256   3095
sd1     W   256   2843
sd1     W     2    201
sd2     W     2    197
sd1     W    32    185

This part tells me the majority of I/Os are 128KB writes on sd2 and sd1.

=== Top 5 worst I/O response time ===
DEVICE  T  BLKs     OFFSET  TIMESTAMP  TIME.ms
------  -  ----  ---------  ---------  -------
sd2     W   175  529070671  85.933843  3559.55
sd1     W   256  521097680  47.561918  3097.21
sd1     W   256  521151968  54.944253  3090.42
sd1     W   256  521152224  54.944207  3090.23
sd1     W    64  521152480  54.944241  3090.21

Longest response times are more than 3 seconds, ouch.

=== Top 5 Devices with largest number of I/Os ===
DEVICE  READ  AVG.ms  MB  WRITE  AVG.ms   MB   IOs  SEEK
------  ----  ------  --  -----  ------  ---  ----  ----
sd1        6    0.34   0   4948  387.88  413  4954    0%
sd2        6    0.25   0   4230  387.07  405  4236    0%
cmdk0     23    8.11   0    152    0.84    0   175   10%

Average response time of 300ms is bad. I calculate SEEK rate on a 512-byte block basis; since I/Os are mostly 128K, the seek rate is less than 1% (0), in other words I consider this mostly sequential I/O. I guess it's debatable whether a 512-byte-based calculation is meaningful.

=== Top 5 Devices with largest amount of data transfer ===
DEVICE  READ  AVG.ms  MB  WRITE  AVG.ms   MB  Tol.MB  MB/s
------  ----  ------  --  -----  ------  ---  ------  ----
sd1        6    0.34   0   4948  387.88  413     413     4
sd2        6    0.25   0   4230  387.07  405     405     4
cmdk0     23    8.11   0    152    0.84    0       0     0

=== Report saved in biorpt.sh.rec.rpt ===

I calculate the MB/s on a per-second basis, meaning as long as there's at least one finished I/O on the device in a second, that second is used in calculating throughput.

Tao

biorpt.sh
Description: Bourne shell script
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Priorities
How about it folks - would it be a good idea for me to explore what it takes to get such a bug/RFE setup implemented for the ZFS community on OpenSolaris.org? what's wrong with http://bugs.opensolaris.org/bugdatabase/index.jsp for finding bugs? i think we've been really good about taking reported problems and filing bugs - if others disagree, feel free to speak up. I think what you're asking for should be solved at the opensolaris community level (if its not already there) - not specifically for ZFS. eric ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: Fwd: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
On 6/23/06, Richard Elling [EMAIL PROTECTED] wrote: comment on analysis below... Tao Chen wrote:

=== Top 5 Devices with largest number of I/Os ===
DEVICE  READ  AVG.ms  MB  WRITE  AVG.ms   MB   IOs  SEEK
------  ----  ------  --  -----  ------  ---  ----  ----
sd1        6    0.34   0   4948  387.88  413  4954    0%
sd2        6    0.25   0   4230  387.07  405  4236    0%
cmdk0     23    8.11   0    152    0.84    0   175   10%

Average response time of 300ms is bad.

Average is totally useless with this sort of a distribution. I'd suggest using a statistical package to explore the distribution. Just a few 3-second latencies will skew the average quite a lot. -- richard

A summary report is nothing more than an indication of issues, or non-issues, so I agree that an average is just that, an average. However, a few 3-second latencies will not spoil the result too much when there are more than 4000 I/Os sampled. The script saves the raw data in a .rec file, so you can run whatever statistics tool you have against it. I am currently more worried about how accurate and useful the raw data is, which is generated from a DTrace command in the script. The raw record is in this format:

- Timestamp (sec.microsec)
- DeviceName
- W/R
- BLK_NO (offset)
- BLK_CNT (I/O size)
- IO_Time (I/O elapsed time, msec.xx)

Tao
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
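For reference, a minimal sketch of the kind of measurement a script like this presumably builds on: the DTrace io provider's start/done probes, keyed on the buf pointer (a generic illustration, not the actual contents of biorpt.sh):

  dtrace -qn '
    io:::start { ts[arg0] = timestamp; }
    io:::done /ts[arg0]/ {
        /* per-device I/O time, in microseconds */
        @lat[args[1]->dev_statname] = quantize((timestamp - ts[arg0]) / 1000);
        ts[arg0] = 0;
    }'

Because quantize() keeps the whole latency distribution per device rather than a single number, output like this also speaks to Richard's point about averages hiding a few multi-second outliers.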
Re: [zfs-discuss] Priorities
EK == eric kustarz [EMAIL PROTECTED] writes: EK what's wrong with http://bugs.opensolaris.org/bugdatabase/index.jsp EK for finding bugs? Unless they've fixed it recently, the keywords search doesn't actually check against the Bugster keywords field. And the information presented is pretty limited. I think there are other issues, but these are the ones that annoy me the most. We (the OpenSolaris core team) have been working with the people who own the b.o.o code to fix some of the most glaring issues in the short-term. We've also been working with the Bugster folks to come up with a long-term plan that puts the external community on more-or-less equal footing with Sun employees. (The difference would be that you have to be a Sun employee to see confidential information, like customer account names.) EK I think what you're asking for should be solved at the opensolaris EK community level (if its not already there) - not specifically for EK ZFS. Yes, please. If we can't work within the b.o.o framework (which is not an obvious conclusion to me), then at least let's implement something for the entire site. Having community-specific bug functionality is just going to mean duplicated work and an uneven user experience. mike ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Priorities
On Jun 23, 2006, at 1:09 PM, eric kustarz wrote: How about it folks - would it be a good idea for me to explore what it takes to get such a bug/RFE setup implemented for the ZFS community on OpenSolaris.org? what's wrong with http://bugs.opensolaris.org/bugdatabase/index.jsp for finding bugs? There's a LOT of things wrong with how b.s.o is presented. For us non-Sun people, b.s.o is a one-way ticket, and only when we're lucky. First, yes, we can search on bug keywords and categories. Great. Used to need a Sunsolve acct for this. But once we do that, we can only hope that the bugs we want to read about in detail aren't comprised solely of See Notes and that's it. It's like seeing To be continued... right before the climax of a movie. Useless and frustrating. Second, while there is a way for Joe Random to submit a bug, there is zero way for Joe Random to interact with a bug. No voting to bump or drop a priority, no easy way to find hot topic bugs, no way to add one's own notes to the issue. I guess the desperate just have to clog the system with new bugs and have them marked as dups or badger someone with a sun.com email address to do it for us. Third, much of end-to-end bug servicing from a non-Sun perspective is still an uphill battle, from acronyms and terms used to policies and coordination of work, e.g. Is someone in Sun or elsewhere already working on this particular bug I'm interested in? and questions which would stem from that basic one. In summary, the bug/RFE process is still a mystery after 1 year, and who knows if it'll stay the ginormous tease that it currently is. Really, it's still no better than if one had a Sunsolve account in years' past. /dale ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Priorities
* Erik Trimble [EMAIL PROTECTED] [2006-06-23 11:15]: It is a good start (yes, I know it's an interface to Bugster, just as the Java one I pointed out is too - in fact, it's probably the same code). And, I'm certainly not complaining about how well people have been taking to and addressing bugs. However, there are some significant shortcomings with the interface that need to be fixed. And, yes, this is true w/r/t the OpenSolaris community as a whole. Basically, the problem is that the OpenSolaris portal itself is extremely primitive, and really needs a big overhaul to make the information we have easily accessible in a coherent manner. Please come to either website-discuss or tools-discuss and share your thoughts for improvement (or at least de-primitivization). And, in addition, the bug portal isn't really useful for helping manage external (to Sun) contributors' work. And it doesn't give any real insight into who's working on what, and what schedules might be. I am not sure whether you are commenting on the lack of publication from the internal database (which may have this information), or the lack of this information more generally. - Stephen -- Stephen Hahn, PhD Solaris Kernel Development, Sun Microsystems [EMAIL PROTECTED] http://blogs.sun.com/sch/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Priorities
On Fri, Jun 23, 2006 at 02:20:54PM -0400, Dale Ghent wrote: Second, while there is a way for Joe Random to submit a bug, there is zero way for Joe Random to interact with a bug. No voting to bump or drop a priority, no easy way to find hot topic bugs, no way to add one's own notes to the issue. I guess the desperate just have to clog the system with new bugs and have them marked as dups or badger someone with a sun.com email address to do it for us. Aside: we track bug severity and priority separately. The former is for customers to decide, and each customer may assert different severities for the same bug, while the latter is for engineers and management to decide. As for the See comments problem, us engineers have been told to stop doing that, so that you should see very few _new_ CRs of that sort. In summary, the bug/RFE process is still a mystery after 1 year, and who knows if it'll stay the ginormous tease that it currently is. I hope not. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Priorities (moving forums...)
Please refer all followups to this thread over to the [EMAIL PROTECTED] list. On Fri, 2006-06-23 at 11:27 -0700, Stephen Hahn wrote: * Erik Trimble [EMAIL PROTECTED] [2006-06-23 11:15]: It is a good start (yes, I know it's an interface to Bugster, just as the Java one I pointed out is too - in fact, it's probably the same code). And, I'm certainly not complaining about how well people have been taking to and addressing bugs. However, there are some significant shortcomings with the interface that need to be fixed. And, yes, this is true w/r/t the OpenSolaris community as a whole. Basically, the problem is that the OpenSolaris portal itself is extremely primitive, and really needs a big overhaul to make the information we have easily accessible in a coherent manner. Please come to either website-discuss or tools-discuss and share your thoughts for improvement (or at least de-primitivization). And, in addition, the bug portal isn't really useful for helping manage external (to Sun) contributors' work. And it doesn't give any real insight into who's working on what, and what schedules might be. I am not sure whether you are commenting on the lack of publication from the internal database (which may have this information), or the lack of this information more generally. - Stephen As several others have pointed out, the current Bug/RFE interface is seriously broken for non-Sun users, and is missing quite a bit of functionality (both in the interface and in the data being stored) even for internal Sun folks. Off the top of my head:

1. The categories for bug submission and searching really need to be rethought. At the minimum, the search function should probably be more in line with the various communities on OS.org. That is, you probably should have main categories which line up with each of the O.S. communities, with subcategories being more specific.
2. Viewing bugs is a mess - access varies widely across external and internal users, bugs aren't consistently found/displayed, etc.
3. There is no development schedule information stored/available, e.g. when a particular Bug/RFE is expected to be fixed/included.
4. Who is working on a bug/RFE isn't available.
5. External users are effectively shut out of the bug/RFE database. It should be possible for a (properly authorized) external user to update a bug's status and/or take ownership of the bug/RFE.
6. A better community-centric bug/RFE prioritization method needs to be developed.
7. Bug/RFE bounties need to be considered, along with a method of funding and payout for them.
8. The UI for the whole Bug/RFE setup needs a drastic overhaul to make it simpler to view multiple/related bugs.

-- Erik Trimble Java System Support Mailstop: usca14-102 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Bandwidth disparity between NFS and ZFS
While dd'ing to an NFS filesystem, half of the bandwidth is unaccounted for. What dd reports amounts to almost exactly half of what zpool iostat or iostat show, even after accounting for the overhead of the two mirrored vdevs. Would anyone care to guess where it may be going? (This is measured over 10-second intervals. For 1-second intervals, the bandwidth to the disks jumps around from 40MB/s to 240MB/s.) With a local dd, everything adds up. This is with a b41 server and a Mac OS X 10.4 NFS client. I have verified that the bandwidth at the network interface is approximately that reported by dd, so the issue would appear to be within the server. Any suggestions would be welcome. Chris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
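Not an answer, but one way to narrow it down might be to watch the same 10-second interval at pool, vdev and raw-disk level side by side, so that per-mirror-side accounting can be separated from genuinely extra writes (the pool name 'tank' below is hypothetical):

  zpool iostat -v tank 10   # pool totals plus a per-vdev / per-disk breakdown
  iostat -xnz 10            # the same interval as the disks themselves see it

That at least separates what ZFS thinks it is writing from what the disks report, which may show whether the extra bandwidth is mirroring/metadata or an accounting artifact.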