Re: NFS client zeroing out blocks on write?
:
:In the last episode (Dec 04), Matthew Dillon said:
: Hmm. I thought we had fixed all the zeroing problems. Are you sure
: you compiled your current up from the latest source?
:
:Yep. The machine was a snapshot install from Nov 15 that I rebuilt
:world on the 23rd, and rebuilt the kernel on Dec 2nd.
:
: I am presuming there are no other clients accessing the output files
: while the split is running?
:
:Correct. "split" isn't even strictly necessary, but it makes it easier
:to generate multiple gigs worth of data across an nfsv2 mount point.
:Depending on my mount options, the glitch is sometimes infrequent
:enough to only occur once every 5-10 gig of generated data..
:
: Interesting. It looks very similar to a problem we fixed months ago.
: That problem was related only to NFSv3 and mmap(), and you aren't
: using mmap() here. It's disturbing to see this problem occur with
: both NFSv2 and NFSv3.
:
: I wonder if the problem occurs between a FreeBSD client and server.
:
:Most of my tests have been done on NFSv3 mounts since they are so much
:faster. I'll try another test run, NFSv2 mounting another FreeBSD box
:and see what happens.
:
:--
: Dan Nelson
: [EMAIL PROTECTED]

Dan, I know this may be placing an undue burden on you, but can you
try installing a 3.x snapshot to see if the bug exists there? If the
bug exists in 3.x then I'll know that it isn't due to changes I've made
in 4.x (or at least not likely due to those changes). If the bug does
not exist then it gives me a place to start looking.
The weird thing is that we are talking about a single process here,
and I would expect this type of bug to occur with multiple contending
processes. If it had just been an NFSv3 mount I would have suspected
the commit rpc code, but if it is occurring on NFSv2 as well it kinda
sounds like a preexisting bug that has just been brought out into the
light due to changes in the way NFS works (major NFS performance
improvements have been made in -current, for example, that allow NFS
to saturate the network more easily).

-Matt
Matthew Dillon
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
Re: NFS client zeroing out blocks on write?
In the last episode (Dec 04), Matthew Dillon said:
> Hmm. I thought we had fixed all the zeroing problems. Are you sure
> you compiled your current up from the latest source?

Yep. The machine was a snapshot install from Nov 15 that I rebuilt
world on the 23rd, and rebuilt the kernel on Dec 2nd.

> I am presuming there are no other clients accessing the output files
> while the split is running?

Correct. "split" isn't even strictly necessary, but it makes it easier
to generate multiple gigs worth of data across an nfsv2 mount point.
Depending on my mount options, the glitch is sometimes infrequent
enough to only occur once every 5-10 gig of generated data..

> Interesting. It looks very similar to a problem we fixed months ago.
> That problem was related only to NFSv3 and mmap(), and you aren't
> using mmap() here. It's disturbing to see this problem occur with
> both NFSv2 and NFSv3.
>
> I wonder if the problem occurs between a FreeBSD client and server.

Most of my tests have been done on NFSv3 mounts since they are so much
faster. I'll try another test run, NFSv2 mounting another FreeBSD box
and see what happens.

--
Dan Nelson
[EMAIL PROTECTED]
Re: NFS client zeroing out blocks on write?
:I just upgraded a server from 2.2.8 to -current (991201 kernel) and am
:seeing some NFS corruption. It looks like byte ranges are getting
:zeroed out by the client (or not getting written at all, and the server
:at the other end is filling with zeros?). I've seen it while writing
:to both a Solaris 2.6 server (NFSv3) and a Netware NFS server (NFSv2),
:so I'm pretty sure it's the client causing the problem.

Hmm. I thought we had fixed all the zeroing problems. Are you sure
you compiled your current up from the latest source?

I am presuming there are no other clients accessing the output files
while the split is running?

Also, change your blankcheck program to display the ranges in hex;
doing so brings the pattern to light.

: fileaa
: fileab
: 168173568-168179199(5632)    A062000-A0635FF
: 384966656-384972287(5632)    16F22000-16F235FF

Interesting. It looks very similar to a problem we fixed months ago.
That problem was related only to NFSv3 and mmap(), and you aren't
using mmap() here. It's disturbing to see this problem occur with
both NFSv2 and NFSv3.

I wonder if the problem occurs between a FreeBSD client and server.

-Matt
Matthew Dillon
[EMAIL PROTECTED]

:All the zeroed out blocks start on an 8k NFS boundary, and I have
:verified that the rest of the 8k block has the correct data in it.
:Each corrupted block is always a multiple of 512 bytes long (so far
:multiples are 6, 7, 11, and 12).
:
:On this example run, each file either has no corruption at all, or has
:corruption with all the zeroed out ranges the same size. Dunno if this
:matters, but it's interesting.
:
:If I run without nfsiod, or copy from a remote NFS mount to a remote
:NFS mount, the corruption goes way down but still happens. I got only
:one corrupted block in my 7-gig test run in each of those test cases.
:
:I'm afraid I don't know much about the internal workings of NFS, so I'm
:hoping my description is enough to pinpoint the problem.
:
:--
: Dan Nelson
: [EMAIL PROTECTED]
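
The hex suggestion above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not part of either poster's tooling: it takes some of the decimal ranges from the report, prints them in hex, and shows where each run starts within an 8K NFS block and how many 512-byte units it covers. The function name describe() and the constants NFS_BLOCK and SECTOR are names made up for this illustration; the 8K and 512-byte figures come from the report itself.

```python
# Sketch only: re-express the reported zeroed ranges in hex and check
# the two observations from the report (8K-aligned start, length a
# multiple of 512 bytes). Names here are illustrative, not from the thread.

NFS_BLOCK = 8192  # 8K NFS block size stated in the report
SECTOR = 512      # the report says run lengths are multiples of 512


def describe(start, end):
    """Summarize one inclusive byte range (start, end)."""
    length = end - start + 1
    return {
        "hex": f"{start:X}-{end:X}",
        "length": length,
        # 0 means the run begins exactly on an 8K block boundary
        "offset_in_block": start % NFS_BLOCK,
        # number of 512-byte units, or None if not a whole multiple
        "sectors": length // SECTOR if length % SECTOR == 0 else None,
    }


# Decimal ranges copied from the blankcheck output in the report.
REPORTED = [
    (168173568, 168179199),
    (384966656, 384972287),
    (203448320, 203451391),
    (372097024, 372103167),
    (561774592, 561778175),
]

for start, end in REPORTED:
    d = describe(start, end)
    print(f"{start}-{end}({d['length']})  {d['hex']}  "
          f"offset_in_block={d['offset_in_block']}  sectors={d['sectors']}")
```

In hex, a start offset that sits on an 8K boundary always has its low 13 bits clear, which is why the pattern is easier to spot than in decimal.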
NFS client zeroing out blocks on write?
I just upgraded a server from 2.2.8 to -current (991201 kernel) and am
seeing some NFS corruption. It looks like byte ranges are getting
zeroed out by the client (or not getting written at all, and the server
at the other end is filling with zeros?). I've seen it while writing
to both a Solaris 2.6 server (NFSv3) and a Netware NFS server (NFSv2),
so I'm pretty sure it's the client causing the problem.

Details:

4.0-CURRENT FreeBSD 4.0-CURRENT #2: Thu Dec 2 17:07:57 CST 1999
CPU: Dual Pentium III/Xeon 600 MHz
RAM: 256MB
NIC: fxp0, full-duplex 100mbit
NFS mount point: /mnt/filesystem/u01, mounting a Solaris 2.6 box also
with a 100mbit full-duplex net connection, 8K NFS blocksize, UDP, via
amd.

My testcase is a 7-gig text file that I'm copying around with the
following commands:

$ cd /net/remotesystem/u01
$ split -b 10 /u01/bigfile.txt file

creating seven 1-gig files fileaa .. fileag (running at a nice rate of
5-6 MB/sec :). I then run "blankcheck" (attached) to scan the file for
runs of zeroes, and get the following:

$ for i in filea{a,b,c,d,e,f} ; do echo $i ; ./blankcheck $i ; done
fileaa
fileab
168173568-168179199(5632)
384966656-384972287(5632)
385753088-385758719(5632)
( snip 156 lines just like the above, all ranges 5632 bytes in size )
464068608-464074239(5632)
464723968-464729599(5632)
465248256-465253887(5632)
fileac
203448320-203451391(3072)
filead
fileae
372097024-372103167(6144)
fileaf
561774592-561778175(3584)
$

All the zeroed out blocks start on an 8k NFS boundary, and I have
verified that the rest of the 8k block has the correct data in it.
Each corrupted block is always a multiple of 512 bytes long (so far
multiples are 6, 7, 11, and 12).

On this example run, each file either has no corruption at all, or has
corruption with all the zeroed out ranges the same size. Dunno if this
matters, but it's interesting.

If I run without nfsiod, or copy from a remote NFS mount to a remote
NFS mount, the corruption goes way down but still happens.
I got only one corrupted block in my 7-gig test run in each of those
test cases.

I'm afraid I don't know much about the internal workings of NFS, so I'm
hoping my description is enough to pinpoint the problem.

--
Dan Nelson
[EMAIL PROTECTED]

Copyright (c) 1992-1999 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
FreeBSD 4.0-CURRENT #2: Thu Dec 2 17:07:57 CST 1999
[EMAIL PROTECTED]:/usr/src/sys/compile/EMSSRV7
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium III/Xeon (596.92-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x673 Stepping = 3
Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,XMM>
real memory = 268427264 (262136K bytes)
avail memory = 257163264 (251136K bytes)
Programming 24 pins in IOAPIC #0
FreeBSD/SMP: Multiprocessor motherboard
cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee0
cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee0
io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec0
Preloaded elf kernel "kernel" at 0xc0303000.
VESA: v2.0, 2048k memory, flags:0x0, mode table:0xc02af1c2 (122)
VESA: ATI MACH64
Pentium Pro MTRR support enabled
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: Intel 82443BX (440 BX) host to PCI bridge on motherboard
pci0: PCI bus on pcib0
pcib1: Intel 82443BX (440 BX) PCI-PCI (AGP) bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
vga-pci0: ATI model 4757 graphics accelerator at device 0.0 on pci1
pcib2: PCI to PCI bridge (vendor=1011 device=0024) at device 2.0 on pci0
pci2: PCI bus on pcib2
ahc0: Adaptec 2944 Ultra SCSI adapter irq 21 at device 9.0 on pci2
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
ahc1: Adaptec aic7890/91 Ultra2 SCSI adapter irq 16 at device 11.0 on pci2
ahc1: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
isab0: Intel 82371AB PCI to ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
chip1: Intel PIIX4 IDE controller at device 7.1 on pci0
pci0: UHCI USB controller (vendor=0x8086, dev=0x7112) at 7.2 irq 19
Timecounter "PIIX" frequency 3579545 Hz
intpm0: Intel 82371AB Power management controller at device 7.3 on pci0
intpm0: I/O mapped 850
intpm0: intr IRQ 9 enabled revision 0
smbus0: System Management Bus on intsmb0
smb0: SMBus general purpose I/O on smbus0
intpm0: PM I/O mapped 800
fxp0: Intel EtherExpress Pro 10/100B Ethernet irq 18 at device 14.0 on pci0
fxp0: Ethernet address 00:90:27:dc:44:eb
fdc0: NEC 72065B or clone at port 0x3f0-0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5" drive on fdc0 drive 0
wdc1 at port 0x170-0x177 irq 15 on isa0
wdc1: unit 0 (atapi): SAMSUNG SC-140B/d005, removable, intr, dma, iordis
wcd0: drive speed 6875KB/sec, 128KB cache
wcd0: supported read types: CD-R, CD-RW, CD-DA, packet track
wcd0: Audio: play, 255 volume levels
wcd0:
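
The "blankcheck" program named in the original report was sent as an attachment and does not appear in this archive. For anyone wanting to repeat the test, a scanner along the same lines might look like the sketch below. It is an assumption about what such a tool does, not Dan's code: it reports inclusive byte ranges of consecutive zero bytes in the start-end(length) form seen in the report, with the hex form added as suggested in the follow-up. The function name zero_runs(), the 512-byte minimum run length and the 1 MB read size are all choices made for this illustration.

```python
# Sketch of a blankcheck-style scanner (hypothetical reimplementation;
# the original attachment is not in the archive).
import re
import sys

ZERO_RUN = re.compile(rb"\x00+")


def zero_runs(path, min_len=512, chunk_size=1 << 20):
    """Return inclusive (start, end) offsets of zero-byte runs >= min_len."""
    runs = []
    pending = None  # [start, end_exclusive] of the most recent run seen
    offset = 0      # file offset of the current chunk
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            for m in ZERO_RUN.finditer(chunk):
                start, end = offset + m.start(), offset + m.end()
                if pending is not None and pending[1] == start:
                    # Run continues across a chunk boundary: extend it.
                    pending[1] = end
                else:
                    if pending is not None and pending[1] - pending[0] >= min_len:
                        runs.append((pending[0], pending[1] - 1))
                    pending = [start, end]
            offset += len(chunk)
    if pending is not None and pending[1] - pending[0] >= min_len:
        runs.append((pending[0], pending[1] - 1))
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sys.argv[1:]:
        print(name)
        for start, end in zero_runs(name):
            print(f"{start}-{end}({end - start + 1})  {start:X}-{end:X}")
```

Reading in fixed-size chunks keeps memory use flat on gigabyte-sized files; the only subtlety is a run that straddles two reads, which is why the most recent run is held back until it is known to have ended.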