Re: [zfs-discuss] X4540
On Jul 10, 2008, at 7:05 AM, Ross wrote: Oh god, I hope not. A patent on fitting a card in a PCI-E slot, or using nvram with RAID (which raid controllers have been doing for years) would just be ridiculous. This is nothing more than cache, and even with the American patent system I'd have thought it hard to get that past the obviousness test. How quickly they forget. Take a look at the Prestoserve User's Guide for a refresher... http://docs.sun.com/app/docs/doc/801-4896-11 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS NFS cannot write
Check the permissions on the directory... On Jun 5, 2008, at 1:06 PM, Gary Leong wrote: This is the first time I tried nfs with zfs. I shared the zfs filesystem with nfs, but I can't write to the files even though I mount it read-write. This is for Solaris 10 update 4. I wonder if there is a bug? ---server (sdw2-2) #zfs create -o sharenfs=on data/nfstest #zfs get all data/nfstest NAME PROPERTY VALUE SOURCE data/nfstest type filesystem - data/nfstest creation Thu Jun 5 13:22 2008 - data/nfstest used 40.7K - data/nfstest available 15.4T - data/nfstest referenced 40.7K - data/nfstest compressratio 1.00x - data/nfstest mounted yes - data/nfstest quota none default data/nfstest reservation none default data/nfstest recordsize 128K default data/nfstest mountpoint /data/nfstest default data/nfstest sharenfs on local data/nfstest checksum on default data/nfstest compression off default data/nfstest atime on inherited from data data/nfstest devices on default data/nfstest exec on default data/nfstest setuid on default data/nfstest readonly off default data/nfstest zoned off default data/nfstest snapdir hidden default data/nfstest aclmode groupmask default data/nfstest aclinherit secure default data/nfstest canmount on default data/nfstest shareiscsi off default data/nfstest xattr on default #share - /data/nfstest rw client #mount -o rw sdw2-2:/data/nfstest /data/nfsmount/ #mount -o rw sdw2-2:/data/nfstest /sdw2nfs/ touch /sdw2nfs/dummy.txt touch: /sdw2nfs/dummy.txt cannot create This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
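Two quick checks for this kind of failure, using the dataset from the post (the client name below is a placeholder, and the root-squash cause is an assumption): verify who owns the shared directory, and remember that a root user on the client is normally mapped to nobody by the server, so a root-owned, mode-755 directory will refuse the create.
# ls -ld /data/nfstest
# zfs set sharenfs=rw,root=clientname data/nfstest    (only if that client really should have root access)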
Re: [zfs-discuss] NFS4-sharing-ZFS issues
On May 21, 2008, at 1:43 PM, Will Murnane wrote: I'm looking at implementing home directories on ZFS. This will be about 400 users, each with a quota. The ZFS way of doing this AIUI is to create one filesystem per user, assign them a quota and/or reservation, and set sharenfs=on. So I tried it: # zfs create local-space/test # zfs set sharenfs=on local-space/test # zfs create local-space/test/foo # zfs create local-space/test/foo/bar # share - /export/local-space/test rw - /export/local-space/test/foo rw - /export/local-space/test/foo/bar rw All good so far. Now, I understand that with nfs in general, the child filesystems will not be mounted, and I do see this behavior on Linux. If I specify nfs4, the children are mounted as I expected: # mount -t nfs4 server:/export/local-space/test /mnt/ # cd /mnt/ # ls foo # ls foo bar Okay, all is well. Try the same thing on a Solaris client, though, and it doesn't work: # mount -o vers=4 ds3:/export/local-space/test /mnt/ # cd mnt # ls foo # ls foo nothing This behavior was a recent addition to the Solaris client, which is why you are seeing this lack of functionality. Any recent Solaris Express or OpenSolaris install will have the functionality you desire. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Not all filesystems shared over NFS
On May 18, 2008, at 3:39 AM, Johan Kooijman wrote: Morning all, situation is as follows: OpenSolaris NFS server, Linux client. I've created a ZFS filesystem and shared it over NFS: -bash-3.2# zfs list | grep vz datatank/vz 126M 457G 126M /datatank/vz datatank/vz/private 37K 457G 19K /datatank/vz/private datatank/vz/private/28999 18K 457G 18K /datatank/vz/private/28999 datatank/vz/root 37K 457G 19K /datatank/vz/root datatank/vz/root/28999 18K 457G 18K /datatank/vz/root/28999 -bash-3.2# cat /etc/dfs/sharetab /datatank/vz/root/28999 - nfs anon=0,sec=sys,[EMAIL PROTECTED]/24 /datatank/vz/root - nfs anon=0,sec=sys,[EMAIL PROTECTED]/24 /datatank/vz/private - nfs anon=0,sec=sys,[EMAIL PROTECTED]/24 /datatank/vz/private/28999 - nfs anon=0,sec=sys,[EMAIL PROTECTED]/24 /datatank/vz - nfs anon=0,sec=sys,[EMAIL PROTECTED]/24 So far, so good. I can mount it on my linux machine: [EMAIL PROTECTED] vz]# mount -t nfs 192.168.178.31:/datatank/vz on /vz type nfs (rw,addr=192.168.178.31) As you can see, I've created a file system datatank/vz/root/28999, which should appear on the Linux client. It doesn't: [EMAIL PROTECTED] vz]# ls -l /vz/private/ total 0 It does on the server: -bash-3.2# ls -l /datatank/vz/private/ total 3 drwxr-xr-x 2 root root 2 May 18 09:21 28999 Can anyone give me some directions on this? I believe that you will need to mount those filesystems directly. In later versions of the OpenSolaris NFSv4 client, those filesystems will be mounted automatically. I believe this feature is also available in later versions of the Linux NFSv4 client as well, but I don't happen to remember the specifics. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
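A minimal sketch of the explicit mounts described above, reusing the server address and paths from the post (the client-side mount points are assumed to already exist):
# mount -t nfs 192.168.178.31:/datatank/vz/private/28999 /vz/private/28999
# mount -t nfs 192.168.178.31:/datatank/vz/root/28999 /vz/root/28999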
Re: [zfs-discuss] share zfs hierarchy over nfs
On Apr 29, 2008, at 9:35 PM, Tim Wood wrote: Hi, I have a pool /zfs01 with two sub file systems /zfs01/rep1 and / zfs01/rep2. I used [i]zfs share[/i] to make all of these mountable over NFS, but clients have to mount either rep1 or rep2 individually. If I try to mount /zfs01 it shows directories for rep1 and rep2, but none of their contents. On a linux machine I think I'd have to set the [i]no_sub_tree_check [/i] flag in /etc/exports to let an NFS mount move through the different exports, but I'm just beginning with solaris, so I'm not sure what to do here. I found this post in the forum: http://opensolaris.org/jive/ thread.jspa?messageID=169354#169354 but that makes it sound like this issue was resolved by changing the NFS client behavior in solaris. Since my NFS client machines are going to be linux machines that doesn't help me any. My understanding is that the linux client has the same capabilities of the Solaris client in that it can traverse server side mount points dynamically. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] NFS async and ZFS zil_disable
On Apr 22, 2008, at 12:16 PM, msl wrote: Hello all, I think the two options are very similar from the client-side view, but I want to hear from the experts... So, can somebody talk a little about the two options? We have two different layers here, I think: 1) The async from the protocol stack, and the other... 2) From the filesystem point of view. That makes me think that the first option could be quicker for the client, because the ack is at a higher level (NFS protocol). The NFS client has control over WRITE requests in that it may ask to have them done async and then follow it with a COMMIT request to ensure the data is in stable-storage/disk. However, the NFS client has no control over namespace operations (file/directory create/remove/rename). These must be done synchronously -- no way for the client to direct the operational behavior of the server in these cases. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
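One way to watch this split on the server is the WRITE and COMMIT counters in the NFSv3 server statistics, sampled on an interval (the same form of the command Spencer suggests later in these threads):
# nfsstat -s -v3 60
A burst of WRITE calls followed by a smaller number of COMMIT calls is the asynchronous pattern described above; the create/mkdir/remove/rename counts have no such deferral.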
Re: [zfs-discuss] NFS async and ZFS zil_disable
On Apr 22, 2008, at 2:00 PM, msl wrote: On Apr 22, 2008, at 12:16 PM, msl wrote: Hello all, I think the two options are very similar from the client-side view, but I want to hear from the experts... So, can somebody talk a little about the two options? We have two different layers here, I think: 1) The async from the protocol stack, and the other... 2) From the filesystem point of view. That makes me think that the first option could be quicker for the client, because the ack is at a higher level (NFS protocol). The NFS client has control over WRITE requests in that it may ask to have them done async and then follow it with a COMMIT request to ensure the data is in stable-storage/disk. Great information... so, the sync option on the server (export) side is just a possible option for the client requests? I mean, is the sync/async option a requirement in an NFS write request operation? When I asked the question, I was talking about the server side; I did not know about the possibility of the client requesting sync/async. The Solaris NFS server does not offer a method to specify sync/async behavior for NFS WRITE requests. The Solaris server will do what the client asks it to do. However, the NFS client has no control over namespace operations (file/directory create/remove/rename). These must be done synchronously -- no way for the client to direct the operational behavior of the server in these cases. If I understand correctly, then zil_disable is a problem for the NFS semantics... I mean, the service will be compromised, because the NFS client can't control the namespace operations. Which is a big difference from my initial question. Yes, zil_disable can be a problem as described by Eric here: http://blogs.sun.com/erickustarz/entry/zil_disable Spencer Thanks a lot for your comments! Anybody else? ps.: how can I enable async in the NFS server on Solaris? Just add async to the export options? See above; not possible. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
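For reference, the way zil_disable was commonly set at the time was a kernel tunable in /etc/system (a sketch only; as the blog entry above explains, this discards the synchronous guarantees NFS clients depend on, so it is not a safe production setting):
set zfs:zil_disable = 1
A reboot was needed for the /etc/system entry to take effect; it could also be toggled on a live system with mdb -kw.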
Re: [zfs-discuss] Problem with sharing multiple zfs file systems
On Nov 27, 2007, at 1:36 AM, Anton B. Rang wrote: Given that it will be some time before NFSv4 support, let alone NFSv4 support for mount point crossing, in most client operating systems ... what obstacles are in the way of constructing an NFSv3 server which would 'do the right thing' transparently to clients so long as the file systems involved were within a single ZFS pool? So far I can think of (a) clients expect inode numbers to be unique -- this could be solved by making them (optionally) unique within a pool; (b) rename and link semantics depend on the file system -- for rename this is easy, for link it might require a cross-file- system hard link object, which is certainly doable. This would go a long way towards making ZFS-with-many-filesystems approaches more palatable. (Hmmm, how does CIFS support deal with the many-filesystems problem today?) What you describe is the nohide option that was first introduced in Irix and picked up in the Linux NFS server implementation. As you say, inode number uniqueness would be one key issue along with the others you mention. I have a bias but I would rather see effort placed into dealing with issues that stand in the way of effective deployment and use of NFSv4; it will be more effective at CIFS/NFS co-existence and deals with a number of other issues that can not be easily solved with NFSv3. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
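For comparison, the nohide behavior mentioned above is expressed on a Linux NFS server in /etc/exports roughly like this (paths and client name are hypothetical, and nohide has historically had caveats such as applying only to single-host exports):
/export        clienthost(rw,sync)
/export/foo    clienthost(rw,sync,nohide)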
Re: [zfs-discuss] Problem with sharing multiple zfs file systems
On Nov 27, 2007, at 9:48 AM, Richard Elling wrote: Anton B. Rang wrote: Given that it will be some time before NFSv4 support, let alone NFSv4 support for mount point crossing, in most client operating systems ... what obstacles are in the way of constructing an NFSv3 server which would 'do the right thing' transparently to clients so long as the file systems involved were within a single ZFS pool? So far I can think of (a) clients expect inode numbers to be unique -- this could be solved by making them (optionally) unique within a pool; (b) rename and link semantics depend on the file system -- for rename this is easy, for link it might require a cross-file-system hard link object, which is certainly doable. This would go a long way towards making ZFS-with-many-filesystems approaches more palatable. (Hmmm, how does CIFS support deal with the many-filesystems problem today?) One solution, which is only about 18 years old, is automount. Does anyone know how ubiquitous automount clients are? I think you have answered your own question. 18 years and we have to ask the question? It must not be very ubiquitous. :-) The fact is that automount has ended up meaning different things to the various operating environments. The maps and options are slightly different in implementations and therefore it leads to not being an effective method of managing a namespace (along with a host of other issues that stand in the way). We need to move away from automount usage and on to namespaces that are managed at the server side of things and integrated in a way to allow admins to effectively manage large environments. NFSv4 and the client's ability to move from filesystem to filesystem at the same server and then to be referred to another server gives us a reasonable base to start from. There is an effort that is going on to define the back end server namespace management and it turns out that it will be discussed at the next IETF meeting. If you are interested in the ideas to date, you can check out the following internet draft: http://www.ietf.org/internet-drafts/draft-tewari-federated-fs- protocol-00.txt Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
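For readers unfamiliar with it, the classic automount arrangement under discussion is a wildcard map along these lines (server name and paths are placeholders; the exact map and option syntax is one of the things that differs between implementations, which is the point being made above):
# /etc/auto_master
/home     auto_home
# /etc/auto_home
*         homeserver:/export/home/&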
Re: [zfs-discuss] Problem with sharing multiple zfs file systems
On Nov 21, 2007, at 2:11 PM, Simon Gao wrote: Here is one issue I am running into when setting up a new NFS server to share several zfs file systems. I created the following zfs file systems from a zfs pool called bigpool. The bigpool is the top level file system and mounted as /export/bigpool. file system mount point bigpool /export/bigpool bigpool/zfs1 /export/bigpool/zfs1 bigpool/zfs2 /export/bigpool/zfs2 All directories under /export are owned by a group called users. Also group users have write access to them. Next, I exported bigpool (zfs1 and zfs2 inherited from bigpool) as an NFS share. zfs set sharenfs=on bigpool On a Linux client, I can mount all shares directly without problem. If I mount /export/bigpool to /mnt/nfs_share on the Linux client, the ownership and permissions on /mnt/nfs_share match /export/bigpool on the nfs server. However, permissions on /mnt/nfs_share/zfs1 or /mnt/nfs_share/zfs2 are not inherited correctly. The group ownership is switched to root on /mnt/nfs_share/zfs1,zfs2 and write permission is removed. I expect /mnt/nfs_share/zfs1 should match /export/bigpool/zfs1, and the same for zfs2. Why do ownership and permissions not get inherited? When I directly mount /export/bigpool/zfs1 to /mnt/nfs_share/zfs1, then ownership and permissions match again. Since with ZFS, creating and using multiple file systems is recommended practice, does it mean that it will be a lot more trouble to manage NFS shares on the system? Is there a way to export only the top-level file system and let all permissions and ownership flow down correctly on the client side? Or maybe there are some special settings out there to solve my problem? Any help is appreciated. What you are describing is general NFS behavior. Nothing special about ZFS usage here. When mounting /export/bigpool at the client, the client observes the underlying directory /export/bigpool/zfs1 and hence the change in ownership and permissions. When the client mounts the path /export/bigpool/zfs1, it is accessing that filesystem's directory and has the ownership and other attributes that are expected of that filesystem. With an NFSv4 client that provides 'mirror mounts', the client will be able to mount /export/bigpool and have the underlying filesystems automatically mounted when accessed and the behavior you describe will be alleviated by the access of the desired filesystem. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] nfs-ownership
On Oct 17, 2007, at 11:25 AM, Claus Guttesen wrote: Did you mount both the parent and all the children on the client ? No, I just assumed that the sub-partitions would inherit the same uid/gid as the parent. I have done a chown -R. Ahhh, the issue is not permissions, but how the NFS server sees the various directories to share. Each dataset in the zpool is seen as a separate FS from the OS perspective; each is a separate NFS share. In which case each has to be mounted separately on the NFS client. Thank you for the clarification. When mounting the same partitions from a windows-client I get r/w access to both the parent- and child-partition. Will it be possible to implement such a feature in nfs? NFSv4 allows the client visibility into the shared filesystems at the server. It is up to the client to mount or access those individual filesystems. The Solaris client is being updated with this functionality (we have named it mirror-mounts); I don't know about the bsd client's ability to do the same. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] nfs-ownership
Claus, Is the mount using NFSv4? If so, there is likely a misguided mapping of the users/groups between the client and server. While not including BSD info, there is a little bit on NFSv4 user/group mappings at this blog: http://blogs.sun.com/nfsv4 Spencer On Oct 16, 2007, at 2:11 PM, Claus Guttesen wrote: Hi. I have created some zfs-partitions. First I create the home/user-partitions. Beneath that I create additional partitions. Then I have done a chown -R for that user. These partitions are shared using sharenfs=on. The owner- and group-id is 1009. These partitions are visible as the user assigned above. But when I mount the home/user partition from a FreeBSD-client, only the top-partition has the proper uid- and gid-assignment. The partitions beneath are assigned to root/wheel (uid 0 and gid 0 on FreeBSD). Am I doing something wrong? From nfs-client: ls -l spool drwxr-xr-x 181 print print 181 16 oct 21:00 2007-10-16 drwxr-xr-x 2 root wheel 2 11 oct 11:07 c8 From nfs-server: ls -l spool drwxr-xr-x 185 print print 185 Oct 16 21:10 2007-10-16 drwxr-xr-x 6 print print 6 Oct 13 17:10 c8 The folder 2007-10-16 is a regular folder below the nfs-mounted partition, c8 is a zfs-partition. -- regards Claus When lenity and cruelty play for a kingdom, the gentlest gamester is the soonest winner. Shakespeare ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
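If the mount is NFSv4, the usual suspect on the Solaris side is the NFSMAPID domain; a sketch of checking and setting it (the domain name is a placeholder, and the client must be configured with the matching domain by whatever mechanism its operating system uses):
# grep NFSMAPID_DOMAIN /etc/default/nfs
NFSMAPID_DOMAIN=example.com
# svcadm restart svc:/network/nfs/mapid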
Re: [zfs-discuss] Fileserver performance tests
On Oct 10, 2007, at 8:41 AM, Luke Lonergan wrote: Hi Eric, On 10/10/07 12:50 AM, eric kustarz [EMAIL PROTECTED] wrote: Since you were already using filebench, you could use the 'singlestreamwrite.f' and 'singlestreamread.f' workloads (with nthreads set to 20, iosize set to 128k) to achieve the same things. Yes but once again we see the utility of the zero software needed approach to benchmarking! The dd test rules for general audience on the mailing lists IMO. The other goodness aspect of the dd test is that the results are indisputable because dd is baked into the OS. And filebench will be in the next build in the same way. Spencer That all said - we don't have a simple dd benchmark for random seeking. With the latest version of filebench, you can then use the '-c' option to compare your results in a nice HTML friendly way. That's worth the effort. - Luke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
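For anyone wanting to reproduce the zero-software approach Luke describes, a typical dd streaming test looks roughly like this (the path and sizes are placeholders; the file should be larger than RAM so the read back is not served from cache):
# dd if=/dev/zero of=/pool/fs/ddtest bs=128k count=81920
# dd if=/pool/fs/ddtest of=/dev/null bs=128k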
Re: [zfs-discuss] Fileserver performance tests
On Oct 10, 2007, at 2:56 AM, Thomas Liesner wrote: Hi Eric, Are you talking about the documentation at: http://sourceforge.net/projects/filebench or: http://www.opensolaris.org/os/community/performance/filebench/ and: http://www.solarisinternals.com/wiki/index.php/FileBench ? I was talking about the solarisinternals wiki. I can't find any documentation at the sourceforge site and the opensolaris site refers to solarisinternals for more detailed documentation. The INSTALL document within the distribution refers to solarisinternals and pkgadd which of course isn't working without providing a package ;) This is the output of make within filebench/filebench: [EMAIL PROTECTED] # make make: Warning: Can't find `../Makefile.cmd': file or directory not found make: Fatal error in reader: Makefile, line 27: Read of include file `../Makefile.cmd' failed I am working to clean that up and will be posting binaries as well. Spencer Before looking at the results, decide if that really *is* your expected workload Sure enough I have to dig deeper into the filebench workloads and create my own workload to represent my expected workload even better, but the tasks within the fileserver workload are already quite representative (I could skip the append test though...) Regards, Tom This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] problem: file copy's aren't getting the current file
On Aug 30, 2007, at 12:35 PM, Richard Elling wrote: NFS clients can cache. This cache can be loosely synchronized for performance reasons. See the settings for actimeo and related variables in mount_nfs(1m) The NFS client will getattr/OPEN at the point where the application opens the file (close to open consistency) and actimeo will not change that behavior. The nocto mount option will disable that. If the client is copying an older version of the file, then the client is either not checking the file's modification time correctly or the NFS server is not telling the truth. Spencer -- richard Russ Petruzzelli wrote: I'm not sure if this is a zfs, zones, or solaris/nfs problem... So I'll start on this alias... Problem: I am seeing file copies from one machine to another grab an older file. (Worded differently: The cp command is not getting the most recent file.) For instance, On a T2000, Solaris 10u3, with zfs setup, and a zone I try to copy in a file from my swan home directory to a directory in the zone ... The file copied, is not the file currently in my home directory. It is an older version of it. I've suspected this for some time (months) but today was the first time I could actually see it happen. The niagara box seems to pull the file from some cache, but where? Thanks in advance for any pointers or configuration advice. This is wreaking havoc on my testing. Russ - --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
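For experimenting with the client caching behavior described above, the relevant Solaris mount_nfs knobs look roughly like this (server name and paths are placeholders; noac is expensive and meant only for debugging):
# mount -o vers=3,noac server:/export/data /mnt        (disable attribute and data caching)
# mount -o vers=3,actimeo=1 server:/export/data /mnt   (shrink the attribute cache window to 1 second)
# mount -o vers=3,nocto server:/export/data /mnt       (relax close-to-open consistency checking)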
Re: [zfs-discuss] Cluster File System Use Cases
On Jul 13, 2007, at 2:20 AM, Richard L. Hamilton wrote: Bringing this back towards ZFS-land, I think that there are some clever things we can do with snapshots and clones. But the age-old problem of arbitration rears its ugly head. I think I could write an option to expose ZFS snapshots to read-only clients. But in doing so, I don't see how to prevent an ill-behaved client from clobbering the data. To solve that problem, an arbiter must decide who can write where. The SCSI protocol has almost nothing to assist us in this cause, but NFS, QFS, and pxfs do. There is room for cleverness, but not at the SCSI or block level. -- richard Yeah; ISTR that IBM mainframe complexes with what they called shared DASD (DASD==Direct Access Storage Device, i.e. disk, drum, or the like) depended on extent reserves. IIRC, SCSI dropped extent reserve support, and indeed it was never widely nor reliably available anyway. AFAIK, all SCSI offers is reserves of an entire LUN; that doesn't even help with slices, let alone anything else. Nor (unlike either the VTOC structure on MVS or VxFS) is ZFS extent-based anyway; so even if extent reserves were available, they'd only help a little. Which means, as he says, some sort of arbitration. I wonder whether the hooks for putting the ZIL on a separate device will be of any use for the cluster filesystem problem; it almost makes me wonder if there could be any parallels between pNFS and a refactored ZFS. We are busy layering pNFS on ZFS in the NFSv4.1 project and hope to allow for coordination with client access and other interesting features. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Rsync update to ZFS server over SSH faster than over NFS?
On May 25, 2007, at 11:22 AM, Roch Bourbonnais wrote: On May 22, 2007, at 01:11, Nicolas Williams wrote: On Mon, May 21, 2007 at 06:09:46PM -0500, Albert Chin wrote: But still, how is tar/SSH any more multi-threaded than tar/NFS? It's not that it is, but that NFS sync semantics and ZFS sync semantics conspire against single-threaded performance. Hi Nic, I don't agree with the blanket statement. So to clarify. There are 2 independent things at play here. a) NFS sync semantics conspire against single-thread performance with any backend filesystem. However NVRAM normally offers some relief of the issue. b) ZFS sync semantics, along with the storage software + imprecise protocol in between, conspire against ZFS performance of some workloads on NVRAM backed storage. NFS being one of the affected workloads. The conjunction of the 2 causes worse than expected NFS performance over a ZFS backend running __on NVRAM backed storage__. If you are not considering NVRAM storage, then I know of no ZFS/NFS specific problems. Issue b) is being dealt with, by both Solaris and storage vendors (we need a refined protocol); Issue a) is not related to ZFS and is rather a fundamental NFS issue. Maybe a future NFS protocol will help. Net net: if one finds a way to 'disable cache flushing' on the storage side, then one reaches the state we'll be in, out of the box, when b) is implemented by Solaris _and_ the storage vendor. At that point, ZFS becomes a fine NFS server not only on JBOD as it is today, but also on NVRAM backed storage. I will add a third category, response time of individual requests. One can think of the ssh stream of filesystem data as one large remote procedure call that says put this directory tree and contents on the server. The time it takes is essentially the time it takes to transfer the filesystem data. The latency on the very last of the requests, amortized across the entire stream, is zero. For the NFS client, there is response time injected at each request and the best way to amortize this is through parallelism and that is very difficult for some applications. Add the items in a) and b) and there is a lot to deal with. Not insurmountable but it takes a little more effort to build an effective solution. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
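A rough illustration of that last point, with invented numbers: if each synchronous create-write-commit sequence costs about 5 ms of round-trip and server latency, 1,000 small files take at least 5 seconds when issued strictly one at a time, but on the order of 0.5 seconds with 10 requests kept outstanding; the tar-over-ssh stream pays essentially none of those per-file round trips.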
Re: [zfs-discuss] Re: ZFS+NFS on storedge 6120 (sun t4)
On Apr 21, 2007, at 9:46 AM, Andy Lubel wrote: so what you are saying is that if we were using NFS v4 things should be dramatically better? I certainly don't support this assertion (if it was being made). NFSv4 does have some advantages from the perspective of enabling more aggressive file data caching; that will enable NFSv4 to outperform NFSv3 in some specific workloads. In general, however, NFSv4 performs similarly to NFSv3. Spencer do you think this applies to any NFS v4 client or only Suns? -Original Message- From: [EMAIL PROTECTED] on behalf of Erblichs Sent: Sun 4/22/2007 4:50 AM To: Leon Koll Cc: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] Re: ZFS+NFS on storedge 6120 (sun t4) Leon Koll, As a knowledgeable outsider I can say something. The benchmark (SFS) page specifies NFSv3,v2 support, so I question whether you ran NFSv4. I would expect a major change in performance just from going to NFS version 4 with ZFS. The benchmark seems to stress your configuration enough that the latency to service NFS ops increases to the point of non-serviced NFS requests. However, you don't know what the byte count per IO op is. Reads are bottlenecked against the rtt of the connection and writes are normally sub 1K with a later commit. However, many ops are probably just file handle verifications which again are limited to your connection rtt (round trip time). So, my initial guess is that the number of NFS threads is somewhat related to the number of non-state (v4 now has state) per-file-handle ops. Thus, if a 64k ZFS block is being modified by 1 byte, COW would require a 64k byte read, 1 byte modify, and then allocation of another 64k block. So, for every write op, you COULD be writing a full ZFS block. This COW philosophy works best with extending delayed writes, etc., where later reads would make the trade-off of increased latency of the larger block on a read op versus being able to minimize the number of seeks on the write and read. Basically, increasing the block size from say 8k to 64K. Thus, your read latency goes up just to get the data off the disk while minimizing the number of seeks, and dropping the read-ahead logic for the needed 8k to 64k file offset. I do NOT know that THAT 4000 IO OPS load would match your maximal load and that your actual load would never increase past 2000 IO ops. Secondly, jumping from 2000 to 4000 seems to be too big of a jump for your environment. Going to 2500 or 3000 might be more appropriate. Lastly wrt the benchmark, some remnants (NFS and/or ZFS and/or benchmark) seem to remain that have a negative impact. Lastly, my guess is that this NFS and the benchmark are stressing small partial block writes and that is probably one of the worst case scenarios for ZFS. So, my guess is the proper analogy is trying to kill a gnat with a sledgehammer. Each write IO OP really needs to be equal to a full size ZFS block to get the full benefit of ZFS on a per byte basis. Mitchell Erblich Sr Software Engineer - Leon Koll wrote: Welcome to the club, Andy... I tried several times to attract the attention of the community to the dramatic performance degradation (about 3 times) of the NFS/ZFS vs. NFS/UFS combination - without any result: http://www.opensolaris.org/jive/thread.jspa?messageID=98592 [1], http://www.opensolaris.org/jive/thread.jspa?threadID=24015 [2].
Just look at two graphs in my posting dated August 2006 (http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html) to see how bad the situation was and, unfortunately, this situation hasn't changed much recently: http://photos1.blogger.com/blogger/7591/428/1600/sfs.1.png I don't think the storage array is a source of the problems you reported. It's somewhere else... -- leon This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: Re[2]: [zfs-discuss] Multi-tera, small-file filesystems
On Apr 18, 2007, at 6:44 PM, Robert Milkowski wrote: Hello Carson, Thursday, April 19, 2007, 1:22:17 AM, you wrote: CG Robert Milkowski wrote: We did some tests with Linux (2.4 and 2.6) and it seems there's a problem if you have thousands of nfs file systems - they won't all be mounted automatically, and even doing it manually (or in a script with a sleep between each mount) there seems to be a limit below 1000. We did not investigate further as in that environment all nfs clients are Solaris server (x86, sparc) and we see no problems with thousands of file systems. CG The Linux limitation is possibly due to privileged port exhaustion with CG TCP mounts, FYI. We've been thinking about the same lines (1024-some services already running). But still with few hundreds nfs entries Linux time outs end you end up with some file system not mounted, etc. See the Linux NFS FAQ at http://nfs.sourceforge.net/ Question/Answer B3. There is a limit of a few hundred NFS mounts. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Cluster File System Use Cases
The pNFS protocol doesn't preclude varying meta-data server designs and their various locking strategies. As an example, there has been work going on at University of Michigan/ CITI to extend the Linux/NFSv4 implementation to allow for a pNFS server on top of the Polyserve solution. Spencer On Mar 5, 2007, at 2:37 PM, Rayson Ho wrote: I read this paper on Sunday. Seems interesting: The Architecture of PolyServe Matrix Server: Implementing a Symmetric Cluster File System http://www.polyserve.com/requestinfo_formq1.php?pdf=2 What interested me the most is that the metadata and lock are spread across all the nodes. I read the Parallel NFS (pNFS) presentation, and seems like pNFS still has the metadata on one server... (Lisa, correct me if I am wrong). http://opensolaris.org/os/community/os_user_groups/frosug/pNFS/ FROSUG-pNFS.pdf Rayson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why number of NFS threads jumps to the max value?
On Mar 5, 2007, at 11:17 AM, Leon Koll wrote: On 3/5/07, Roch - PAE [EMAIL PROTECTED] wrote: Leon Koll writes: On 3/5/07, Roch - PAE [EMAIL PROTECTED] wrote: Leon Koll writes: On 2/28/07, Roch - PAE [EMAIL PROTECTED] wrote: http://bugs.opensolaris.org/bugdatabase/view_bug.do? bug_id=6467988 NFSD threads are created on a demand spike (all of them waiting on I/O) but thentend to stick around servicing moderate loads. -r Hello Roch, It's not my case. NFS stops to service after some point. And the reason is in ZFS. It never happens with NFS/UFS. Shortly, my scenario: 1st SFS run, 2000 requested IOPS. NFS is fine, ;low number of threads. 2st SFS run, 4000 requested IOPS. NFS cannot serve all requests, no of threads jumps to max 3rd SFS run, 2000 requested IOPS. NFS cannot serve all requests, no of threads jumps to max. System cannot get back to the same results under equal load (1st and 3rd). Reboot between 2nd and 3rd doesn't help. The only persistent thing is a directory structure that was created during the 2nd run (in SFS higher requested load - more directories/files created). I am sure it's a bug. I need help. I don't care that ZFS works N times worse than UFS. I really care that after heavy load everything is totally screwed. Thanks, -- Leon Hi Leon, How much is the slowdown between 1st and 3rd ? How filled is Typical case is: 1st: 1996 IOPS, latency 2.7 3rd: 1375 IOPS, latency 37.9 The large latency increase is the side effect of requesting more than what can be delivered. Queue builds up and latency follow. So it should not be the primary focus IMO. The Decrease in IOPS is the primary problem. One hypothesis is that over the life of the FS we're moving toward spreading access to the full disk platter. We can imagine some fragmentation hitting as well. I'm not sure how I'd test both hypothesis. the pool at each stage ? What does 'NFS stops to service' mean ? There is a lot of error messages on the NFS(SFS) client : sfs352: too many failed RPC calls - 416 good 27 bad sfs3132: too many failed RPC calls - 302 good 27 bad sfs3109: too many failed RPC calls - 533 good 31 bad sfs353: too many failed RPC calls - 301 good 28 bad sfs3144: too many failed RPC calls - 305 good 25 bad sfs3121: too many failed RPC calls - 311 good 30 bad sfs370: too many failed RPC calls - 315 good 27 bad Can this be timing out or queue full drops ? Might be a side effect of SFS requesting more than what can be delivered. I don't know was it timeouts or full drops. SFS marked such runs as INVALID. I can run whatever is needed to help to investigate the problem. If you have a D script that will tell us more, please send it to me. I appreciate your help. The failed RPCs are indeed a result of the SFS client timing out the requests it has made to the server. The server is being overloaded for its capabilities and the benchmark results show that. I agree with Roch that as the SFS benchmark adds more data to the filesystems that additional latency is added and for this particular configuration and the server is being over-driven. The helpful thing would be to run smaller increments in the benchmark to determine where the response time increases beyond what the SFS workload can handle. There have been a number of changes in ZFS recently that should help with SFS performance measurement but fundamentally it all depends on the configuration of the server (number of spindles and CPU available). So there may be a limit that is being reached based on the hardware configuration. What is your real goal here, Leon? 
Are you trying to gather SFS data to fit into sizing of a particular solution or just trying to gather performance results for other general comparisons? There are certainly better benchmarks than SFS for either sizing and comparison reasons. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
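Since a D script was requested: a rough, generic sketch (not from the original thread) that buckets server-side NFSv3 operation latency by procedure, which can show where response time grows as the benchmark loads up:
#!/usr/sbin/dtrace -s
fbt::rfs3_*:entry
{
        self->ts = timestamp;
}
fbt::rfs3_*:return
/self->ts/
{
        @lat[probefunc] = quantize(timestamp - self->ts);
        self->ts = 0;
}
tick-60s
{
        exit(0);
}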
Re: [zfs-discuss] suggestion: directory promotion to filesystem
On Feb 21, 2007, at 12:11 PM, Matthew Ahrens wrote: Adrian Saul wrote: Not hard to work around - zfs create and a mv/tar command and it is done... some time later. If there was say a zfs graft directory newfs command, you could just break of the directory as a new filesystem and away you go - no copying, no risking cleaning up the wrong files etc. Yep, this idea was previously discussed on this list -- search for zfs split and see the following RFE: 6400399 want zfs split Note that current draft specification for NFSv4.1 has the capability to split a filesystem such that the NFSv4.1 client will recognize it. Then the new filesystem can be migrated to another server is needed. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
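Until something like zfs split exists, the manual approach mentioned above amounts to the following sketch (dataset names are placeholders; the copy must complete before the old directory is removed):
# zfs create tank/data/projects_new
# (cd /tank/data/projects && tar cf - .) | (cd /tank/data/projects_new && tar xpf -)
# rm -rf /tank/data/projects
# zfs rename tank/data/projects_new tank/data/projects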
Re: [zfs-discuss] Honeycomb
On Wed, Dennis wrote: Hello, I just wanted to know if there is any news regarding Project Honeycomb? Wasn't it announced for end of 2006? Is there still development? http://www.sun.com/storagetek/honeycomb/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43
On Fri, Ben Rockwood wrote: eric kustarz wrote: So I'm guessing there's lots of files being created over NFS in one particular dataset? We should figure out how many creates/second you are doing over NFS (I should have put a timeout on the script). Here's a real simple one (from your snoop it looked like you're only doing NFSv3, so I'm not tracking NFSv4):
#!/usr/sbin/dtrace -s
rfs3_create:entry,
zfs_create:entry
{
        @creates[probefunc] = count();
}
tick-60s
{
        exit(0);
}
Eric, I love you. Running this bit of DTrace revealed more than 4,000 files being created in almost any given 60 second window. And I've only got one system that would fit that sort of mass file creation: our Joyent Connector product's Courier IMAP server which uses Maildir. As a test I simply shut down Courier and unmounted the mail NFS share for good measure and sure enough the problem vanished and could not be reproduced. 10 minutes later I re-enabled Courier and our problem came back. Clearly ZFS file creation is just amazingly heavy even with ZIL disabled. If creating 4,000 files in a minute squashes four 2.6GHz Opteron cores we're in big trouble in the longer term. In the meantime I'm going to find a new home for our IMAP mail so that the other things served from that NFS server at least aren't affected. You asked for the zpool and zfs info, which I don't want to share because it's confidential (if you want it privately I'll do so, but not on a public list), but I will say that it's a single massive zpool in which we're using less than 2% of the capacity. But in thinking about this problem, even if we used 2 or more pools, the CPU consumption still would have choked the system, right? This leaves me really nervous about what we'll do when it's not an internal mail server that's creating all those files but a customer. Oddly enough, this might be a very good reason to use iSCSI instead of NFS on the Thumper. Eric, I owe you a couple cases of beer for sure. I can't tell you how much I appreciate your help. Thanks to everyone else who chimed in with ideas and suggestions, all of you guys are the best! Good to hear that you have figured out what is happening, Ben. For future reference, there are two commands that you may want to make use of in observing the behavior of the NFS server and individual filesystems. There is the trusty nfsstat command. In this case, you would have been able to do something like: nfsstat -s -v3 60 This will provide all of the server side NFSv3 statistics on 60 second intervals. Then there is a new command fsstat that will provide vnode level activity on a per filesystem basis. Therefore, if the NFS server has multiple filesystems active and you want to look at just one, something like this can be helpful: fsstat /export/foo 60 Fsstat has a 'full' option that will list all of the vnode operations or just certain types. It also will watch a filesystem type (e.g. zfs, nfs). Very useful. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS/iSCSI target integration
On Thu, Darren J Moffat wrote: Ceri Davies wrote: For NFS, it's possible (but likely suboptimal) for clients to be configured to mount the filesystem from server A and fail over to server B, assuming that the pool import can happen quickly enough for them not to receive ENOENT. IIRC NFS client side failover is really only intended for read-only mounts. I can't remember though if this is enforced or not though. NFS client side failover is for read-only exports. No way to strictly enforce since the NFSv2/v3 protocols don't have support. The client attempts to ensure that active files look the same when failing over. NFSv4 has migration support such that a filesystem can move between servers but the administrative model is for that: movement and not server failover. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
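For reference, the read-only client-side failover described above is requested on a Solaris client by listing replicas in a single mount, roughly like this (server names and path are placeholders; all replicas must export the same read-only data):
# mount -o ro serverA:/export/data,serverB:/export/data /mnt/data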
Re: [zfs-discuss] ZFS/iSCSI target integration
On Wed, Adam Leventhal wrote: On Wed, Nov 01, 2006 at 01:17:02PM -0500, Torrey McMahon wrote: Is there going to be a method to override that on the import? I can see a situation where you want to import the pool for some kind of maintenance procedure but you don't want the iSCSI target to fire up automagically. There isn't -- to my knowledge -- a way to do this today for NFS shares. This would be a reasonable RFE that would apply to both NFS and iSCSI. In the case of NFS, this can be dangerous if the rest of the NFS server is allowed to come up and serve other filesystems. The non-shared filesystem will end up returning ESTALE errors to clients that are active on that filesystem. It should be an all or nothing selection... Spencer Also, what if I don't have the iSCSI target packages on the node I'm importing to? Error messages? Nothing? You'll get an error message reporting that it could not be shared. Adam -- Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS ACLs and Samba
On Thu, Joerg Schilling wrote: Spencer Shepler [EMAIL PROTECTED] wrote: On Wed, Jonathan Edwards wrote: On Oct 25, 2006, at 15:38, Roger Ripley wrote: IBM has contributed code for NFSv4 ACLs under AIX's JFS; hopefully Sun will not tarry in following their lead for ZFS. http://lists.samba.org/archive/samba-cvs/2006-September/070855.html I thought this was still in draft: http://ietf.org/internet-drafts/draft-ietf-nfsv4-acl-mapping-05.txt That I-D describes the Posix/NFSv4 mapping that can be done. NFSv4 ACLs to/from Samba/NT ACLs are a different story; no interdependency. NFSv4 ACLs are bitwise identical to WIN-NT ACLs; could you please explain why there is a difference for Samba? One known difference between NFSv4 ACLs and NT ACLs is information about how ACEs were populated via inheritance. There is a proposal in the NFSv4 WG at the moment to add this functionality to NFSv4.1. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS ACLs and Samba
On Wed, Jonathan Edwards wrote: On Oct 25, 2006, at 15:38, Roger Ripley wrote: IBM has contributed code for NFSv4 ACLs under AIX's JFS; hopefully Sun will not tarry in following their lead for ZFS. http://lists.samba.org/archive/samba-cvs/2006-September/070855.html I thought this was still in draft: http://ietf.org/internet-drafts/draft-ietf-nfsv4-acl-mapping-05.txt That I-D describes the Posix/NFSv4 mapping that can be done. NFSv4 ACLs to/from Samba/NT ACLs are a different story; no interdependency. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Fri, Joerg Schilling wrote: Spencer Shepler [EMAIL PROTECTED] wrote: I didn't comment on the error conditions that can occur during the writing of data upon close(). What you describe is the preferred method of obtaining any errors that occur during the writing of data. This occurs because the NFS client is writing asynchronously and the only method the application has of retrieving the error information is from the fsync() or close() call. At close(), it is too late to recover, so fsync() can be used to obtain any asynchronous error state. This doesn't change the fact that upon close() the NFS client will write data back to the server. This is done to meet the close-to-open semantics of NFS. Your wording did not match the reality; this is why I wrote this. You wrote that upon close() the client will first do something similar to fsync on that file. The problem is that this is done asynchronously and the close() return value does not contain an indication of whether the fsync did succeed. Sorry, the code in Solaris would behave as I described. Upon the application closing the file, modified data is written to the server. The client waits for completion of those writes. If there is an error, it is returned to the caller of close(). Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Fri, Joerg Schilling wrote: Spencer Shepler [EMAIL PROTECTED] wrote: Sorry, the code in Solaris would behave as I described. Upon the application closing the file, modified data is written to the server. The client waits for completion of those writes. If there is an error, it is returned to the caller of close(). So is this Solaris specific, or why are people warned to depend on the close() return code only? All unix NFS clients that I know of behave the way I described. I believe the warning about relying on close() is that by the time the application receives the error it is too late to recover. If the application uses fsync() and receives an error, the application can warn the user and they may be able to do something about it (your example of ENOSPC is a very good one). Space can be freed, and the fsync() can be done again and the client will again push the writes to the server and be successful. If an application doesn't care about recovery but wants the error to report back to the user, then close() is sufficient. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Thu, Joerg Schilling wrote: Spencer Shepler [EMAIL PROTECTED] wrote: On Thu, Joerg Schilling wrote: Spencer Shepler [EMAIL PROTECTED] wrote: The close-to-open behavior of NFS clients is what ensures that the file data is on stable storage when close() returns. In the 1980s this was definitely not the case. When did this change? It has not. NFS clients have always flushed (written) modified file data to the server before returning to the applications close(). The NFS client also asks that the data be committed to disk in this case. This is definitely wrong. Our developers did loose many files in the 1980s when the NFS file server did fill up the exported filesystem while several NFS clients did try to write back edited files at the same time. VI at that time did not call fsync and for this reason did not notice that the file could not be written back properly. What happens: All client did call statfs() and did asume that there is still space on the server. They all did allow to put blocks into the local clients buffer cache. VI did call close, but the client did notice the no space problem after the close did return and VI did not notice that the file was damaged and allowed the user to quit VI. Some time later, Sun did enhance VI to first call fsync() and then call close(). Only if both return 0, the file is granted to be on the server. Sun also did inform us to write applications this way in order to prevent lost file content. I didn't comment on the error conditions that can occur during the writing of data upon close(). What you describe is the preferred method of obtaining any errors that occur during the writing of data. This occurs because the NFS client is writing asynchronously and the only method the application has of retrieving the error information is from the fsync() or close() call. At close(), it is to late to recovery so fsync() can be used to obtain any asynchronous error state. This doesn't change the fact that upon close() the NFS client will write data back to the server. This is done to meet the close-to-open semantics of NFS. Having tar create/write/close files concurrently would be a big win over NFS mounts on almost any system. Do you have an idea on how to do this? My naive thought would be to have multiple threads that create and write file data upon extraction. This multithreaded behavior would provide better overall throughput of an extraction given NFS' response time characteristics. More outstanding requests results in better throughput. It isn't only the file data being written to disk that is the overhead of the extraction, it is the creation of the directories and files that must also be committed to disk in the case of NFS. This is the other part that makes things slower than local access. Doing this with tar (which fetches the data from a serial data stream) would only make sense in case that there will be threads that only have the task to wait for a final fsync()/close(). It would also make it harder to implement error control as it may be that a problem is detected late while another large file is being extracted. Star could not just quit with an error message but would need to delay the error caused exit. Sure, I can see that it would be difficult. My point is that tar is not only waiting upon the fsync()/close() but also on file and directory creation. There is a longer delay not only because of the network latency but also the latency to writing the filesystem data to stable storage. 
Parallel requests will tend to overcome the delay/bandwidth issues. Not easy but can be an advantage with respect to performance. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: NFS Performance and Tar
On Tue, eric kustarz wrote: Ben Rockwood wrote: I was really hoping for some option other than ZIL_DISABLE, but finally gave up the fight. Some people suggested NFSv4 helping over NFSv3 but it didn't... at least not enough to matter. ZIL_DISABLE was the solution, sadly. I'm running B43/X86 and hoping to get up to 48 or so soonish (I BFU'd it straight to B48 last night and brick'ed it). Here are the times. This is an untar (gtar xfj) of SIDEkick (http://www.cuddletech.com/blog/pivot/entry.php?id=491) on NFSv4 on a 20TB RAIDZ2 ZFS Pool: ZIL Enabled: real1m26.941s ZIL Disabled: real0m5.789s I'll update this post again when I finally get B48 or newer on the system and try it. Thanks to everyone for their suggestions. I imagine what's happening is that tar is a single-threaded application and it's basically doing: open, asynchronous write, close. This will go really fast locally. But for NFS due to the way it does cache consistency, on CLOSE, it must make sure that the writes are on stable storage, so it does a COMMIT, which basically turns your asynchronous write into a synchronous write. Which means you basically have a single-threaded app doing synchronous writes- ~ 1/2 disk rotational latency per write. Check out 'mount_nfs(1M)' and the 'nocto' option. It might be ok for you to relax the cache consistency for client's mount as you untar the file(s). Then remount w/out the 'nocto' option once you're done. This will not correct the problem because tar is extracting and therefore creating files and directories; those creates will be synchronous at the NFS server and there is no method to change this behavior at the client. Spencer Another option is to run multiple untars together. I'm guessing that you've got I/O to spare from ZFS's point of view. eric ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Lots of seeks?
On Tue, Anton B. Rang wrote: So while I'm feeling optimistic :-) we really ought to be able to do this in two I/O operations. If we have, say, 500K of data to write (including all of the metadata), we should be able to allocate a contiguous 500K block on disk and write that with a single operation. Then we update the uberblock. The only inherent problem preventing this right now is that we don't have general scatter/gather at the driver level (ugh). Fixing this bug would help the NFS server significantly given the general lack of continuity of incoming write data (split at mblk boundaries). Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SPEC SFS97 benchmark of ZFS,UFS,VxFS
On Mon, Leon Koll wrote: I performed a SPEC SFS97 benchmark on Solaris 10u2/Sparc with 4 64GB LUNs, connected via FC SAN. The filesystems that were created on the LUNs: UFS, VxFS, ZFS. Unfortunately the ZFS test couldn't complete because the box was hung under very moderate load (3000 IOPs). Additional tests were done using UFS and VxFS that were built on ZFS raw devices (Zvolumes). Results can be seen here: http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html Leon, Might I suggest that you provide the details as specified in the SPEC SFS run and reporting rules? They can be buried in a link from your blog but it would be helpful to have that information available to your readers. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS iSCSI: where do do the mirroring/raidz
On Wed, Darren J Moffat wrote: I have 12 36G disks (in a single D2 enclosure) connected to a V880 that I want to share to a v40z that is on the same gigabit network switch. I've already decided that NFS is not the answer - the performance of ON consolidation builds over NFS just doesn't cut it for me. ? With a locally attached 3510 array on a 4-way v40z, I have been able to do a full nighly build in 1 hour 7 minutes. With NFSv3 access, from the same system, to a couple of different NFS servers, I have been able to achieve 1 hour 15 minutes in one case and 1 hour 22 minutes in the other. Is that too slow? Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Wrong reported free space over NFS
On Fri, Eric Schrock wrote: On Thu, Jun 08, 2006 at 10:53:06PM -0500, Spencer Shepler wrote: On Thu, Eric Schrock wrote: The problem is that statvfs() only returns two values (total blocks and free blocks) from which we have to calculate three values: size, free, ? From statvfs(2) the following are returned in struct statvfs: fsblkcnt_t f_blocks;/* total # of blocks on file system in units of f_frsize */ fsblkcnt_t f_bfree; /* total # of free blocks */ fsblkcnt_t f_bavail;/* # of free blocks avail to So, the data is being passed back. Is there something I am missing? Yes, because these values aren't as straightforward as they seem. For example, consider the return values from UFS: $ truss -t statvfs -v statvfs df -h / statvfs64(/, 0x080479BC) = 0 bsize=8192 frsize=1024 blocks=8068757 bfree=2258725 bavail=2178038 files=972608 ffree=809612favail=809612 fsid=0x198 basetype=ufs namemax=255 flag=ST_NOTRUNC fstr= Filesystem size used avail capacity Mounted on /dev/dsk/c1d0s07.7G 5.5G 2.1G73%/ $ Notice that the values don't correspond to your assumption. In particular, 'bfree + bavail != blocks'. The two values for 'bfree' and 'bavail' are used for filesystems that have a notion of 'reserved' blocks, i.e. metadata blocks which are used by the filesystem but not available to the user in the form of free space. That's why you have two values, and if you look at the source code for df(1), you'll see that it never uses 'bfree' (except in rare internal calculations) because it's basically useless. I must have been half asleep when looking at this; thanks for the clue bat. Spencer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Wrong reported free space over NFS
On Thu, Eric Schrock wrote: The problem is that statvfs() only returns two values (total blocks and free blocks) from which we have to calculate three values: size, free, ? From statvfs(2) the following are returned in struct statvfs: fsblkcnt_t f_blocks;/* total # of blocks on file system in units of f_frsize */ fsblkcnt_t f_bfree; /* total # of free blocks */ fsblkcnt_t f_bavail;/* # of free blocks avail to So, the data is being passed back. Is there something I am missing? and available space. Prior to pooled storage, available = size - free. This isn't true with ZFS. On your local filesystem, df(1) recognizes it as a ZFS filesystem, and uses libzfs to get the real amount of available space. Over NFS, we have no choice but to stick with POSIX semantics, which means that we can never provide you with the right answer. For implementation details, check out adjust_total_blocks() in usr/src/cmd/fs.d/df.c. So, from the comments, that bit of df code seems to be adjusting for quotas if they exist? I am not sure I understand why zfs' VFS_STATVFS() function can't do what the df command is doing and then return the appropriate value to both df and the NFS server? So, in Robert's case, is that 17GB really available and if so that would seem to be an important thing to report to the NFS clients. Spencer On Thu, Jun 08, 2006 at 04:38:57PM -0700, Robert Milkowski wrote: NFS server (b39): bash-3.00# zfs get quota nfs-s5-s8/d5201 nfs-s5-p0/d5110 NAME PROPERTY VALUE SOURCE nfs-s5-p0/d5110 quota 600G local nfs-s5-s8/d5201 quota 600G local bash-3.00# bash-3.00# df -h | egrep d5201|d5110 nfs-s5-p0/d5110600G 527G73G88%/nfs-s5-p0/d5110 nfs-s5-s8/d5201600G 314G 269G54%/nfs-s5-s8/d5201 bash-3.00# NFS client (S10U1 + patches, NFSv3 mount over TCP): bash-3.00# df -h | egrep d5201|d5110 NFS-srv:/nfs-s5-p0/d5110 600G 527G73G88%/opt/d5110 NFS-srv:/nfs-s5-s8/d5201 583G 314G 269G54%/opt/d5201 bash-3.00# Well why I get 583GB size for d5201 on NFS client? ps. maybe I'm tired and missiong something really obvious...? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss