Re: Data corruption over NFS in -current
Martin Cracauer wrote:
> More findings.
>
> Reminder, with the original report I found:
> - files for no reason changing ownership and group to root/owngroupname
> - data corruption as in inserting binary junk obviously from ports
> - data corruption as in malformed ascii text that might be a bug I have in my code that is only exposed in FreeBSD
>
> I ran the script on a Linux machine in the same situation against the same NFS server, it worked fine. I haven't looked at blocksizes, NFS versions etc in play yet.
>
> I ran with oldnfs (reboot), which showed only the third problem.
>
> I re-ran with newnfs (reboot) which worked (all three problems absent).
>
> I then started building ports/lang/gcc47 at the same time as I re-started my crazy script and it took only a few seconds for an unexpected ownership to root to occur.
>
> My next steps are:
> - trying block sizes and other parameters, maybe use a different NFS version with the Linux client. My NFS server is newly upgraded to Linux kernel 3.1.5
> - running my script on a FreeBSD host with local disk to see whether problem #3 is a general problem that appears or is exposed only on FreeBSD
> - capture tcpdump as mentioned earlier
>
> I will probably have to turn debug off since this script run is dominated by system time now and gets 10x slower as it is now.

While poking around (partly related to this and partly related to the NFSv4.1 pNFS client work), I came across an ugly bug in the way the new NFS client handled system operations. (System operations are mainly NFSv4 Ops that manage state, such as Renew, which renews a lease for the open/lock state. Another case of this was the NFSv3 statfs when it did a Getattr because the server did not provide post-operation attributes in the reply.) It turns out that at least some Linux NFSv3 servers are in this category, and the fact that Martin was doing a large number of Statfs RPCs was indeed relevant.

Anyhow, the patch to fix the above seems to have resolved Martin's problem.

The patch is needed for the new NFS client if you are using NFSv4 mounts, or NFSv3 mounts against non-FreeBSD servers that don't provide post-op attributes in the Statfs RPC reply. (FreeBSD servers do provide post-op attributes; at least some Linux servers do not, and I don't know about others. You could check by capturing the packets for a df and then looking at the Statfs RPC reply in wireshark.)

Without the patch, there will be intermittent permission failures, since the wrong credentials get used for an RPC.

The patch is here and should be in head soon:
http://people.freebsd.org/~rmacklem/authcred.patch

Thanks go to Martin for pursuing this.

rick
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
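The check Rick describes (capture the packets for a df, then inspect the Statfs RPC reply) might be set up roughly as below. This is only a sketch under stated assumptions: the function name, the capture file name and the print-only approach are made up for illustration and do not come from the thread; the tcpdump options are the ones Rick quotes elsewhere in this thread, and 172.18.30.2 is the server address from Martin's fstab.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the tcpdump command
# that captures all traffic between this client and the NFS server.
# $1 = NFS server host or address, $2 = capture file to write.
fsstat_capture_cmd() {
    # -s 0 captures whole packets, -w writes the raw packets to a file.
    printf 'tcpdump -s 0 -w %s host %s\n' "$2" "$1"
}

# Show the command for the server address that appears in this thread.
fsstat_capture_cmd 172.18.30.2 fsstat.pcap
```

The printed command would be run as root on the client while a `df` of the NFS mount is done in another terminal; after stopping tcpdump, the capture file can be opened in wireshark to see whether the Statfs reply carries post-op attributes.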
Re: Data corruption over NFS in -current
On 13/01/2012 15:37, Martin Cracauer wrote:
> More findings.
>
> Reminder, with the original report I found:
> - files for no reason changing ownership and group to root/owngroupname
> - data corruption as in inserting binary junk obviously from ports
> - data corruption as in malformed ascii text that might be a bug I have in my code that is only exposed in FreeBSD
>
> I re-ran with newnfs (reboot) which worked (all three problems absent).
>
> I then started building ports/lang/gcc47 at the same time as I re-started my crazy script and it took only a few seconds for an unexpected ownership to root to occur.

Two more things to check:

1) Are you using tmpfs? Could you try without it?

2) Are you really sure your hardware is ok? If everything works fine after a reboot, it might mean that there is memory corruption and you don't use that specific memory until some time after the reboot. Try running memtest86.
Re: Data corruption over NFS in -current
More findings.

Reminder, with the original report I found:
- files for no reason changing ownership and group to root/owngroupname
- data corruption as in inserting binary junk obviously from ports
- data corruption as in malformed ascii text that might be a bug I have in my code that is only exposed in FreeBSD

I ran the script on a Linux machine in the same situation against the same NFS server, it worked fine. I haven't looked at blocksizes, NFS versions etc in play yet.

I ran with oldnfs (reboot), which showed only the third problem.

I re-ran with newnfs (reboot) which worked (all three problems absent).

I then started building ports/lang/gcc47 at the same time as I re-started my crazy script and it took only a few seconds for an unexpected ownership to root to occur.

My next steps are:
- trying block sizes and other parameters, maybe use a different NFS version with the Linux client. My NFS server is newly upgraded to Linux kernel 3.1.5
- running my script on a FreeBSD host with local disk to see whether problem #3 is a general problem that appears or is exposed only on FreeBSD
- capture tcpdump as mentioned earlier

I will probably have to turn debug off since this script run is dominated by system time now and gets 10x slower as it is now.

Martin
-- 
%%% Martin Cracauer craca...@cons.org http://www.cons.org/cracauer/
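Since the first visible symptom is a file silently changing owner, a test run can be stopped as soon as that happens. The following is a minimal sketch of such a watcher, not anything Martin actually ran: the function names, the polling approach and the interval are assumptions for illustration. It relies only on find(1) with `-type f` and `! -user`.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): list regular files under a
# directory whose owner is not the expected one.
# $1 = directory to scan, $2 = expected owner (user name or numeric uid).
list_foreign_owned() {
    find "$1" -type f ! -user "$2" -print
}

# Hypothetical polling loop: return as soon as at least one file with an
# unexpected owner shows up, printing the offending paths.
# $1 = directory, $2 = expected owner, $3 = seconds between scans.
wait_for_owner_change() {
    while :; do
        bad=$(list_foreign_owned "$1" "$2")
        if [ -n "$bad" ]; then
            printf '%s\n' "$bad"
            return 0
        fi
        sleep "$3"
    done
}
```

A run could then look like `wait_for_owner_change /path/to/workdir cracauer 5`, with the work directory being a placeholder, followed by stopping the test script and any running tcpdump once the function returns.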
Re: Data corruption over NFS in -current
Martin Cracauer wrote:
> More findings.
>
> Reminder, with the original report I found:
> - files for no reason changing ownership and group to root/owngroupname
> - data corruption as in inserting binary junk obviously from ports
> - data corruption as in malformed ascii text that might be a bug I have in my code that is only exposed in FreeBSD
>
> I ran the script on a Linux machine in the same situation against the same NFS server, it worked fine. I haven't looked at blocksizes, NFS versions etc in play yet.
>
> I ran with oldnfs (reboot), which showed only the third problem.
>
> I re-ran with newnfs (reboot) which worked (all three problems absent).

Since this test worked, it suggests that problem #3 is not a bug in your software, unless your runs aren't processing the same data. However, a test using a local disk to confirm this would be nice.

> I then started building ports/lang/gcc47 at the same time as I re-started my crazy script and it took only a few seconds for an unexpected ownership to root to occur.

Well, from my experience, isolating a problem like this is much easier if you can reproduce it reliably. I'd try this a few times and if doing ports/lang/gcc47 concurrently reproduces the problem reliably, then I'd use that for all the testing. (I'd suggest you re-do the above tests doing ports/lang/gcc47 concurrently with the script.)

Also, I'd run systat -vmstat or similar (others may have better suggestions than systat -vmstat?) while running the tests, to see if there might be a memory exhaustion issue. (Daniel mentioned he had seen this, if I understood his post correctly. Maybe he can elaborate on how he spotted the memory exhaustion?)

> My next steps are:
> - trying block sizes and other parameters, maybe use a different NFS version with the Linux client. My NFS server is newly upgraded to Linux kernel 3.1.5

or go back to the old version of the NFS server, if that is feasible. Two changes (new Linux NFS server and new FreeBSD version) at about the same time make it harder to point your finger at the problem.

> - running my script on a FreeBSD host with local disk to see whether problem #3 is a general problem that appears or is exposed only on FreeBSD

It might also be useful to run this FreeBSD host with local disk using the NFS mount and having a swap partition on the disk. (Again, related to what Daniel mentioned.)

> - capture tcpdump as mentioned earlier

If the combination of running the script and ports/lang/gcc47 reproduces the problem reliably, then doing a tcpdump should be straightforward.

Good luck with it. I'll admit I doubt this will be resolved quickly or easily, but pursuing it as far as you can find the time to do so will be appreciated by others who might run into the same problem.

rick

> I will probably have to turn debug off since this script run is dominated by system time now and gets 10x slower as it is now.
>
> Martin
Re: Data corruption over NFS in -current
> Stefan Bethke wrote on Wed, Jan 11, 2012 at 07:14:44PM +0100:
> > On 11.01.2012 at 17:57, Martin Cracauer wrote:
> > > I'm sorry for the unspecific bug report but I thought a heads-up is better than none.
> > >
> > > $ uname -a
> > > FreeBSD wings.cons.org 10.0-CURRENT FreeBSD 10.0-CURRENT #2: Wed Dec 28 12:19:21 EST 2011 craca...@wings.cons.org:/usr/src/sys/amd64/compile/WINGS amd64
> >
> > I'm sure Rick will want to know which NFS version, which client code (default new code I'm assuming) and which mount options...
>
> It's all default both in fstab and as reported by mount(8).
>
> This is a diskless PXE boot but the mount affected (usr) is not the root filesystem, so this should come in via fstab.
>
> BTW, my /usr/ports is another mount so the corruption is cross-mount (garbage from /usr/ports entering /usr).
>
> Appending nfsstat output.
>
> I am re-running things continuously to see how reproducible this is. This machine was recently updated from a -current almost a year old, so it's its first time with the new NFS client code.
>
> Martin

I've seen problems, but they were always related to programs running out of resources and not reporting it correctly, especially on dataless machines that run out of memory with no swap available.

btw, most of my servers are dataless (they boot via PXE but have local swap, var, etc).

hth,
danny

> > > I see filesystem corruption on NFS filesystems here. I am running a heavy shellscript that is noodling around with ascii files assembling them with awk and whatnot. Some actions are concurrent with up to 21 forks doing full-CPU load scripting. This machine is a K8 with a total of 8 cores, diskless NFS and memory filesystem for /tmp.
> > >
> > > I observe two problems:
> > > - for no reason whatsoever, some files change from my (user/group) cracauer/wheel to root/cracauer
> > > - the same files will later be corrupted. The beginning of the file is normal but then it has what looks like parts of /usr/ports, including our CVS files and binary junk, mostly zeros
> > >
> > > I did do some ports building lately but not at the same time that this problem manifested itself. I speculate some ports blocks were still resident in the filesystem buffer cache.
> > >
> > > Server is Linux.
> > >
> > > Martin
Re: Data corruption over NFS in -current
On 11.01.2012 at 17:57, Martin Cracauer wrote:
> I'm sorry for the unspecific bug report but I thought a heads-up is better than none.
>
> $ uname -a
> FreeBSD wings.cons.org 10.0-CURRENT FreeBSD 10.0-CURRENT #2: Wed Dec 28 12:19:21 EST 2011 craca...@wings.cons.org:/usr/src/sys/amd64/compile/WINGS amd64

I'm sure Rick will want to know which NFS version, which client code (default new code I'm assuming) and which mount options...

> I see filesystem corruption on NFS filesystems here. I am running a heavy shellscript that is noodling around with ascii files assembling them with awk and whatnot. Some actions are concurrent with up to 21 forks doing full-CPU load scripting. This machine is a K8 with a total of 8 cores, diskless NFS and memory filesystem for /tmp.
>
> I observe two problems:
> - for no reason whatsoever, some files change from my (user/group) cracauer/wheel to root/cracauer
> - the same files will later be corrupted. The beginning of the file is normal but then it has what looks like parts of /usr/ports, including our CVS files and binary junk, mostly zeros
>
> I did do some ports building lately but not at the same time that this problem manifested itself. I speculate some ports blocks were still resident in the filesystem buffer cache.
>
> Server is Linux.
>
> Martin

-- 
Stefan Bethke s...@lassitu.de Fon +49 151 14070811
Re: Data corruption over NFS in -current
Stefan Bethke wrote on Wed, Jan 11, 2012 at 07:14:44PM +0100:
> On 11.01.2012 at 17:57, Martin Cracauer wrote:
> > I'm sorry for the unspecific bug report but I thought a heads-up is better than none.
> >
> > $ uname -a
> > FreeBSD wings.cons.org 10.0-CURRENT FreeBSD 10.0-CURRENT #2: Wed Dec 28 12:19:21 EST 2011 craca...@wings.cons.org:/usr/src/sys/amd64/compile/WINGS amd64
>
> I'm sure Rick will want to know which NFS version, which client code (default new code I'm assuming) and which mount options...

It's all default both in fstab and as reported by mount(8).

This is a diskless PXE boot but the mount affected (usr) is not the root filesystem, so this should come in via fstab.

BTW, my /usr/ports is another mount so the corruption is cross-mount (garbage from /usr/ports entering /usr).

Appending nfsstat output.

I am re-running things continuously to see how reproducible this is. This machine was recently updated from a -current almost a year old, so it's its first time with the new NFS client code.

Martin
-- 
%%% Martin Cracauer craca...@cons.org http://www.cons.org/cracauer/

Client Info:
Rpc Counts:
Getattr Setattr Lookup Readlink Read Write Create Remove
94392942513117 3637266 2577 40227237 2824593333832 304567
Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
32522 5121 4856 20363 13954179035 0 3534382
Mknod Fsstat Fsinfo PathConf Commit
5 21127240 3 2999521782
Rpc Info:
TimedOut Invalid X Replies Retries Requests
0 0 0 0 167678419
Cache Info:
Attr Hits Misses Lkup Hits Misses BioR Hits Misses BioW Hits Misses
1933340911 73265447 1123380719 3636242 90975094450509 4917135 2824593
BioRL Hits Misses BioD Hits Misses DirE Hits Misses Accs Hits Misses
54732346 2577599049142917352394 0 733726346 3534382

Server Info:
Getattr Setattr Lookup Readlink Read Write Create Remove
0 0 0 0 0 0 0 0
Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
0 0 0 0 0 0 0 0
Mknod Fsstat Fsinfo PathConf Commit
0 0 0 0 0
Server Ret-Failed
0
Server Faults
0
Server Cache Stats:
Inprog Idem Non-idem Misses
0 0 0 0
Server Write Gathering:
WriteOps WriteRPC Opsaved
0 0 0
Re: Data corruption over NFS in -current
Martin Cracauer wrote:
> Stefan Bethke wrote on Wed, Jan 11, 2012 at 07:14:44PM +0100:
> > On 11.01.2012 at 17:57, Martin Cracauer wrote:
> > > I'm sorry for the unspecific bug report but I thought a heads-up is better than none.
> > >
> > > $ uname -a
> > > FreeBSD wings.cons.org 10.0-CURRENT FreeBSD 10.0-CURRENT #2: Wed Dec 28 12:19:21 EST 2011 craca...@wings.cons.org:/usr/src/sys/amd64/compile/WINGS amd64
> >
> > I'm sure Rick will want to know which NFS version, which client code (default new code I'm assuming) and which mount options...
>
> It's all default both in fstab and as reported by mount(8).

I assume that by the above statement, you mean that you don't specify any mount options in your /etc/fstab entry except rw? (If this isn't correct, please post your /etc/fstab entries for the NFS mounts.)

- If I am correct, in that you just specify rw, the main difference between the old and new NFS client will be the rsize/wsize used. The new NFS client will use MAX_BSIZE (64 KB) decreased to whatever the server says is the largest it can handle. This should be fine, unless the server says it can handle >= 64 KB, but actually only works correctly for 32 KB (which is what the old NFS client will default to, I think?).

A few things to try/check:
- Look locally on the server to see if the file is corrupted there.
- Try the old NFS client. (Set the fs type to oldnfs instead of nfs on the lines in your /etc/fstab.)
  - If switching to the old client helps, it might be a bug in the way the new client generates the create verifier. I just looked at the code and I'm not certain the code in the new client would work correctly for amd64. (I only have i386 to test with.)
  - I can easily generate a patch that changes the new client to do this the same way as the old client, but there is no point, unless the old client doesn't have the problem.
  - Exclusive create problems might explain the incorrect ownership, since it first does a create that will fill in user/group in whatever default way the Linux server chooses to and then does a Setattr RPC to change them to the correct values. If the Setattr RPC fails, then the file exists owned by whatever the server chooses. (I don't know if Linux servers use the gid of the directory or the gid of the requestor or ???)
- If you have a non-Linux NFS server, try running against that to see if it is a Linux server specific problem. (Since I haven't seen any other reports like this, I suspect it might be an interoperability problem related to the Linux server.)

Also, if you can reproduce the problem fairly easily, capture a packet trace via

    # tcpdump -s 0 -w xxx host server

running on the client (or similar). Then email me xxx as an attachment and I can look at it in wireshark. (If you choose to look at it in wireshark, I would suggest you look for Create RPCs to see if they are Exclusive Creates, plus try and see where the data for the corrupt file is written.)

Even if the capture is pretty large, it should be easy to find the interesting part, so long as you know the name of the corrupt file and search for that.

> This is a diskless PXE boot but the mount affected (usr) is not the root filesystem, so this should come in via fstab.
>
> BTW, my /usr/ports is another mount so the corruption is cross-mount (garbage from /usr/ports entering /usr).
>
> Appending nfsstat output.

nfsstat output is pretty useless for this kind of situation. I did find it interesting that you do so many Fsstat RPCs, but that shouldn't be a problem, it's just weird to see that.

rick

> I am re-running things continuously to see how reproducible this is. This machine was recently updated from a -current almost a year old, so it's its first time with the new NFS client code.
>
> Martin
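The two client-side experiments suggested above (switching the fs type to oldnfs, and limiting the I/O size to 32 KB) both come down to edited /etc/fstab entries. The sketch below prints what such entries might look like. The helper name is invented for illustration, the server and export path are the ones from Martin's fstab as quoted elsewhere in this thread, and `rsize`/`wsize` are the mount_nfs(8) options for the read and write sizes; the exact values to use remain a judgment call.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print one /etc/fstab line
# for an NFS mount, with tab-separated fields.
# $1 = server:/export, $2 = mount point, $3 = fs type, $4 = options.
nfs_fstab_line() {
    printf '%s\t%s\t%s\t%s\t0\t0\n' "$1" "$2" "$3" "$4"
}

# Variant 1: same mount, but served by the old NFS client.
nfs_fstab_line 172.18.30.2:/home/diskless/freebsd-current-usr /usr oldnfs rw

# Variant 2: new NFS client with read/write sizes pinned to 32 KB.
nfs_fstab_line 172.18.30.2:/home/diskless/freebsd-current-usr /usr nfs rw,rsize=32768,wsize=32768
```

Only one variant would be active at a time, and a reboot (or unmount and remount) is needed for the change to take effect, so each test run isolates a single difference.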
Re: Data corruption over NFS in -current
Rick Macklem wrote on Wed, Jan 11, 2012 at 08:42:25PM -0500:
> Martin Cracauer wrote:
> > Stefan Bethke wrote on Wed, Jan 11, 2012 at 07:14:44PM +0100:
> > > On 11.01.2012 at 17:57, Martin Cracauer wrote:
> > > > I'm sorry for the unspecific bug report but I thought a heads-up is better than none.
> > > >
> > > > $ uname -a
> > > > FreeBSD wings.cons.org 10.0-CURRENT FreeBSD 10.0-CURRENT #2: Wed Dec 28 12:19:21 EST 2011 craca...@wings.cons.org:/usr/src/sys/amd64/compile/WINGS amd64
> > >
> > > I'm sure Rick will want to know which NFS version, which client code (default new code I'm assuming) and which mount options...
> >
> > It's all default both in fstab and as reported by mount(8).
>
> I assume that by the above statement, you mean that you don't specify any mount options in your /etc/fstab entry except rw? (If this isn't correct, please post your /etc/fstab entries for the NFS mounts.)

172.18.30.2:/home/diskless/freebsd-current-usr /usr nfs rw 0 0
172.18.30.2:/home/diskless/usr-ports /usr/ports nfs rw 0 0

> - If I am correct, in that you just specify rw, the main difference between the old and new NFS client will be the rsize/wsize used. The new NFS client will use MAX_BSIZE (64 KB) decreased to whatever the server says is the largest it can handle. This should be fine, unless the server says it can handle >= 64 KB, but actually only works correctly for 32 KB (which is what the old NFS client will default to, I think?).

I'll try 32 KB.

> A few things to try/check:
> - Look locally on the server to see if the file is corrupted there.

Yes, it has the corrupted version of the file, and in a new run I had another file changed to root ownership, and that is the same from the server and the client standpoint.

The good news is that this seems fairly reproducible, the root ownership is back. This time I stopped the script when ownership changed so I don't know whether it would have gone forward with corrupting the file afterwards.

> - Try the old NFS client. (Set the fs type to oldnfs instead of nfs on the lines in your /etc/fstab.)
>   - If switching to the old client helps, it might be a bug in the way the new client generates the create verifier. I just looked at the code and I'm not certain the code in the new client would work correctly for amd64. (I only have i386 to test with.)
>   - I can easily generate a patch that changes the new client to do this the same way as the old client, but there is no point, unless the old client doesn't have the problem.
>   - Exclusive create problems might explain the incorrect ownership, since it first does a create that will fill in user/group in whatever default way the Linux server chooses to and then does a Setattr RPC to change them to the correct values. If the Setattr RPC fails, then the file exists owned by whatever the server chooses. (I don't know if Linux servers use the gid of the directory or the gid of the requestor or ???)
> - If you have a non-Linux NFS server, try running against that to see if it is a Linux server specific problem. (Since I haven't seen any other reports like this, I suspect it might be an interoperability problem related to the Linux server.)

I should mention that I also updated the server to Linux-3.1.5 two weeks ago. I'm not sure I put heavy load on it since then.

I will have a Linux NFS client do the same thing and try the FreeBSD things you mention.

> Also, if you can reproduce the problem fairly easily, capture a packet trace via
>
>     # tcpdump -s 0 -w xxx host server
>
> running on the client (or similar). Then email me xxx as an attachment and I can look at it in wireshark. (If you choose to look at it in wireshark, I would suggest you look for Create RPCs to see if they are Exclusive Creates, plus try and see where the data for the corrupt file is written.)
>
> Even if the capture is pretty large, it should be easy to find the interesting part, so long as you know the name of the corrupt file and search for that.

That's probably not practical, we are talking about hammering the NFS server with several CPU hours worth of parallel activity in a shellscript, but I'll do my best :-)

Martin
Re: Data corruption over NFS in -current
In the last episode (Jan 11), Martin Cracauer said:
> Rick Macklem wrote on Wed, Jan 11, 2012 at 08:42:25PM -0500:
> > Also, if you can reproduce the problem fairly easily, capture a packet trace via
> >
> >     # tcpdump -s 0 -w xxx host server
> >
> > running on the client (or similar). Then email me xxx as an attachment and I can look at it in wireshark. (If you choose to look at it in wireshark, I would suggest you look for Create RPCs to see if they are Exclusive Creates, plus try and see where the data for the corrupt file is written.)
> >
> > Even if the capture is pretty large, it should be easy to find the interesting part, so long as you know the name of the corrupt file and search for that.
>
> That's probably not practical, we are talking about hammering the NFS server with several CPU hours worth of parallel activity in a shellscript but I'll do my best :-)

The tcpdump options -C and -W can help here. For example, -C 1000 -W 10 will keep the most recent 10 GB of traffic by circularly writing to ten 1 GB capture files. All you need to do is kill the tcpdump when you discover the corruption, and work backwards through the logs until you find your file.

-- 
Dan Nelson dnel...@allantgroup.com
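Combining Dan's ring-buffer options with Rick's capture command might look like the sketch below. The function names and the capture file base name are assumptions for illustration; the flags themselves (-s, -C, -W, -w) are the ones mentioned in this thread, and tcpdump documents the -C unit as millions of bytes. The helper only prints the command so it can be checked before being run as root.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print a tcpdump command that
# keeps a fixed-size ring of capture files.
# $1 = NFS server, $2 = per-file size for -C (millions of bytes),
# $3 = number of files for -W, $4 = base file name for -w.
ring_capture_cmd() {
    printf 'tcpdump -s 0 -C %s -W %s -w %s host %s\n' "$2" "$3" "$4" "$1"
}

# Upper bound on retained capture data, in millions of bytes:
# per-file size times number of files.
ring_capture_size() {
    echo $(( $1 * $2 ))
}

ring_capture_cmd 172.18.30.2 1000 10 nfs-trace.pcap
ring_capture_size 1000 10
```

Sizing the ring is a trade-off: it has to hold at least the traffic generated between the moment a file goes wrong and the moment tcpdump is killed, so a faster detection step allows a smaller ring.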