Re: NFS client zeroing out blocks on write?

1999-12-06 Thread Matthew Dillon


:
:In the last episode (Dec 04), Matthew Dillon said:
:> Hmm.  I thought we had fixed all the zeroing problems.  Are you sure
:> you compiled your current up from the latest source?
:
:Yep.  The machine was a snapshot install from Nov 15 that I rebuilt
:world on the 23rd, and rebuilt the kernel on Dec 2nd.
:>
:> I am presuming there are no other clients accessing the output files
:> while the split is running?
:
:Correct.  "split" isn't even strictly necessary, but it makes it easier
:to generate multiple gigs worth of data across an nfsv2 mount point. 
:Depending on my mount options, the glitch is sometimes infrequent
:enough to only occur once every 5-10 gig of generated data.
:
:> Interesting.  It looks very similar to a problem we fixed months ago.
:> That problem was related only to NFSv3 and mmap(), and you aren't
:> using mmap() here.  It's disturbing to see this problem occur with
:> both NFSv2 and NFSv3.
:>
:> I wonder if the problem occurs between a FreeBSD client and server.
:
:Most of my tests have been done on NFSv3 mounts since they are so much
:faster.  I'll try another test run, NFSv2 mounting another FreeBSD box
:and see what happens.
: 
:-- 
:   Dan Nelson
:   [EMAIL PROTECTED]

Dan, I know this may be placing an undue burden on you, but can you try
installing a 3.x snapshot to see if the bug exists there?  If the bug
exists in 3.x then I'll know that it isn't due to changes I've made
in 4.x (or at least not likely due to those changes).  If the bug does
not exist then it gives me a place to start looking.

The weird thing is that we are talking about a single process here, and
I would expect this type of bug to occur with multiple contending
processes.  If it had just been an NFSv3 mount I would have suspected
the commit RPC code, but if it is occurring on NFSv2 as well it kinda sounds
like a preexisting bug that has just been brought out into the light
due to changes in the way NFS works (major NFS performance improvements
have been made in -current, for example, that allow NFS to saturate the
network more easily).

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: NFS client zeroing out blocks on write?

1999-12-05 Thread Dan Nelson

In the last episode (Dec 04), Matthew Dillon said:
> Hmm.  I thought we had fixed all the zeroing problems.  Are you sure
> you compiled your current up from the latest source?

Yep.  The machine was a snapshot install from Nov 15 that I rebuilt
world on the 23rd, and rebuilt the kernel on Dec 2nd.
>
> I am presuming there are no other clients accessing the output files
> while the split is running?

Correct.  "split" isn't even strictly necessary, but it makes it easier
to generate multiple gigs worth of data across an nfsv2 mount point. 
Depending on my mount options, the glitch is sometimes infrequent
enough to only occur once every 5-10 gig of generated data.

> Interesting.  It looks very similar to a problem we fixed months ago.
> That problem was related only to NFSv3 and mmap(), and you aren't
> using mmap() here.  It's disturbing to see this problem occur with
> both NFSv2 and NFSv3.
>
> I wonder if the problem occurs between a FreeBSD client and server.

Most of my tests have been done on NFSv3 mounts since they are so much
faster.  I'll try another test run, NFSv2 mounting another FreeBSD box
and see what happens.
 
-- 
Dan Nelson
[EMAIL PROTECTED]





Re: NFS client zeroing out blocks on write?

1999-12-04 Thread Matthew Dillon

:I just upgraded a server from 2.2.8 to -current (991201 kernel) and am
:seeing some NFS corruption.  It looks like byte ranges are getting
:zeroed out by the client (or not getting written at all, and the server
:at the other end is filling with zeros?).  I've seen it while writing
:to both a Solaris 2.6 server (NFSv3) and a Netware NFS server (NFSv2),
:so I'm pretty sure it's the client causing the problem.

Hmm.  I thought we had fixed all the zeroing problems.  Are you sure
you compiled your current up from the latest source?

I am presuming there are no other clients accessing the output files
while the split is running?

Also, change your blankcheck program to display the ranges in hex;
doing so brings the pattern to light.

: fileaa
: fileab
: 168173568-168179199(5632) A062000-A0635FF
: 384966656-384972287(5632) 16F22000-16F235FF
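
As a rough illustration of the hex display suggested above, here is a
minimal Python sketch. It is not the actual blankcheck change; the helper
name to_hex_line is hypothetical, and it assumes blankcheck prints each
range as start-end(length) with an inclusive end offset, as in the sample
output.

```python
import re

# Matches blankcheck-style lines such as "168173568-168179199(5632)".
RANGE_RE = re.compile(r"^\s*(\d+)-(\d+)\((\d+)\)\s*$")


def to_hex_line(line):
    """Append hex start/end offsets to a decimal range line.

    Lines that are not ranges (for example file names) are returned
    stripped but otherwise unchanged.
    """
    m = RANGE_RE.match(line)
    if m is None:
        return line.strip()
    start, end, length = (int(g) for g in m.groups())
    return f"{start}-{end}({length}) {start:X}-{end:X}"


if __name__ == "__main__":
    for sample in ["fileab",
                   "168173568-168179199(5632)",
                   "384966656-384972287(5632)"]:
        print(to_hex_line(sample))
```

In hex, a start offset whose low 13 bits are zero (ending in 000 with an
even digit before it, such as ...2000) sits on an 8K boundary, which is
much easier to spot than in decimal.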

Interesting.  It looks very similar to a problem we fixed months ago.
That problem was related only to NFSv3 and mmap(), and you aren't using
mmap() here.  It's disturbing to see this problem occur with both NFSv2
and NFSv3.

I wonder if the problem occurs between a FreeBSD client and server.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

:All the zeroed out blocks start on an 8k NFS boundary, and I have
:verified that the rest of the 8k block has the correct data in it. 
:Each corrupted block is always a multiple of 512 bytes long (so far
:multiples are 6, 7, 11, and 12).
:
:On this example run, each file either has no corruption at all, or has
:corruption with all the zeroed out ranges the same size.  Dunno if this
:matters, but it's interesting.
:
:If I run without nfsiod, or copy from a remote NFS mount to a remote
:NFS mount, the corruption goes way down but still happens.  I got only
:one corrupted block in my 7-gig test run in each of those test cases. 
:
:I'm afraid I don't know much about the internal workings of NFS, so I'm
:hoping my description is enough to pinpoint the problem.  
:
:-- 
:   Dan Nelson
:   [EMAIL PROTECTED]







NFS client zeroing out blocks on write?

1999-12-03 Thread Dan Nelson


I just upgraded a server from 2.2.8 to -current (991201 kernel) and am
seeing some NFS corruption.  It looks like byte ranges are getting
zeroed out by the client (or not getting written at all, and the server
at the other end is filling with zeros?).  I've seen it while writing
to both a Solaris 2.6 server (NFSv3) and a Netware NFS server (NFSv2),
so I'm pretty sure it's the client causing the problem.

Details: 

4.0-CURRENT FreeBSD 4.0-CURRENT #2: Thu Dec 2 17:07:57 CST 1999
CPU: Dual Pentium III/Xeon 600 MHz
RAM: 256MB
NIC: fxp0, full-duplex 100mbit
NFS mount point: /mnt/filesystem/u01, mounting a Solaris 2.6 box also
   with a 100mbit full-duplex net connection, 8K NFS blocksize,
   UDP, via amd.

My testcase is a 7-gig text file that I'm copying around with the
following commands:

 $ cd /net/remotesystem/u01
 $ split -b 10 /u01/bigfile.txt file

creating seven 1-gig files fileaa .. fileag (running at a nice rate of
5-6 MB/sec :).  I then run "blankcheck" (attached) to scan the file for
runs of zeroes, and get the following:

 $ for i in filea{a,b,c,d,e,f} ; do echo $i ; ./blankcheck  $i ;  done
 fileaa
 fileab
 168173568-168179199(5632)
 384966656-384972287(5632)
 385753088-385758719(5632)
( snip 156 lines just like the above, all ranges 5632 bytes in size )
 464068608-464074239(5632)
 464723968-464729599(5632)
 465248256-465253887(5632)
 fileac
 203448320-203451391(3072)
 filead
 fileae
 372097024-372103167(6144)
 fileaf
 561774592-561778175(3584)
 $
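
The blankcheck attachment is not reproduced in this archive. For readers
who want to repeat the test, the following is a minimal Python sketch of a
scanner that reports runs of zero bytes in the same start-end(length)
format. It is an assumption about what blankcheck does, not the original
program; the function name zero_runs and the 512-byte minimum run length
are hypothetical choices.

```python
import re
import sys

NONZERO = re.compile(rb"[^\x00]")


def zero_runs(stream, min_len=512, chunk_size=1 << 16):
    """Yield (start, end_inclusive, length) for each run of zero bytes
    of at least min_len bytes in a binary stream."""
    offset = 0        # file offset of the first byte of the current chunk
    run_start = None  # file offset where the current zero run began
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pos = 0
        while pos < len(chunk):
            if run_start is None:
                pos = chunk.find(b"\0", pos)
                if pos < 0:
                    break  # no zero bytes left in this chunk
                run_start = offset + pos
            m = NONZERO.search(chunk, pos)
            if m is None:
                break  # the run continues into the next chunk
            end = offset + m.start()
            if end - run_start >= min_len:
                yield run_start, end - 1, end - run_start
            run_start = None
            pos = m.start()
        offset += len(chunk)
    # A run that reaches end-of-file is reported as well.
    if run_start is not None and offset - run_start >= min_len:
        yield run_start, offset - 1, offset - run_start


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for start, end, length in zero_runs(f):
                print(f" {start}-{end}({length})")
```

Reading in fixed-size chunks keeps memory use flat on 1-gig files, and
carrying run_start across chunks lets a run span a chunk boundary.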

All the zeroed out blocks start on an 8k NFS boundary, and I have
verified that the rest of the 8k block has the correct data in it. 
Each corrupted block is always a multiple of 512 bytes long (so far
multiples are 6, 7, 11, and 12).
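
The alignment claims above can be checked arithmetically from the listed
ranges. The sketch below is illustrative only: the ranges are copied from
the blankcheck output in this message (the snipped lines are not
included), the 8192 and 512 constants come from the description, and the
helper name describe is hypothetical.

```python
NFS_BLOCK = 8192  # 8K NFS block size from the mount details
SECTOR = 512

# (start, end_inclusive) pairs copied from the output above.
RANGES = [
    (168173568, 168179199),
    (384966656, 384972287),
    (385753088, 385758719),
    (464068608, 464074239),
    (203448320, 203451391),
    (372097024, 372103167),
    (561774592, 561778175),
]


def describe(start, end):
    """Return alignment facts for one zeroed range."""
    length = end - start + 1
    return {
        "length": length,
        "block_aligned": start % NFS_BLOCK == 0,
        # Number of 512-byte units, or None if not an exact multiple.
        "sectors": length // SECTOR if length % SECTOR == 0 else None,
        # Bytes of the same 8K block that follow the zeroed run.
        "tail_bytes": NFS_BLOCK - length,
    }


if __name__ == "__main__":
    for start, end in RANGES:
        d = describe(start, end)
        print(f"{start:#x} len={d['length']} aligned={d['block_aligned']} "
              f"sectors={d['sectors']} tail={d['tail_bytes']}")
```

Every listed range starts at a multiple of 8192 and is shorter than one
block, so the zeroed bytes are always the leading part of a single 8K
block.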

On this example run, each file either has no corruption at all, or has
corruption with all the zeroed out ranges the same size.  Dunno if this
matters, but it's interesting.

If I run without nfsiod, or copy from a remote NFS mount to a remote
NFS mount, the corruption goes way down but still happens.  I got only
one corrupted block in my 7-gig test run in each of those test cases. 

I'm afraid I don't know much about the internal workings of NFS, so I'm
hoping my description is enough to pinpoint the problem.  

-- 
Dan Nelson
[EMAIL PROTECTED]


Copyright (c) 1992-1999 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
FreeBSD 4.0-CURRENT #2: Thu Dec  2 17:07:57 CST 1999
[EMAIL PROTECTED]:/usr/src/sys/compile/EMSSRV7
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium III/Xeon (596.92-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x673  Stepping = 3
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,XMM>
real memory  = 268427264 (262136K bytes)
avail memory = 257163264 (251136K bytes)
Programming 24 pins in IOAPIC #0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  1, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id:  0, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec0
Preloaded elf kernel "kernel" at 0xc0303000.
VESA: v2.0, 2048k memory, flags:0x0, mode table:0xc02af1c2 (122)
VESA: ATI MACH64
Pentium Pro MTRR support enabled
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: Intel 82443BX (440 BX) host to PCI bridge on motherboard
pci0: PCI bus on pcib0
pcib1: Intel 82443BX (440 BX) PCI-PCI (AGP) bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
vga-pci0: ATI model 4757 graphics accelerator at device 0.0 on pci1
pcib2: PCI to PCI bridge (vendor=1011 device=0024) at device 2.0 on pci0
pci2: PCI bus on pcib2
ahc0: Adaptec 2944 Ultra SCSI adapter irq 21 at device 9.0 on pci2
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
ahc1: Adaptec aic7890/91 Ultra2 SCSI adapter irq 16 at device 11.0 on pci2
ahc1: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
isab0: Intel 82371AB PCI to ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
chip1: Intel PIIX4 IDE controller at device 7.1 on pci0
pci0: UHCI USB controller (vendor=0x8086, dev=0x7112) at 7.2 irq 19
Timecounter "PIIX"  frequency 3579545 Hz
intpm0: Intel 82371AB Power management controller at device 7.3 on pci0
intpm0: I/O mapped 850
intpm0: intr IRQ 9 enabled revision 0
smbus0: System Management Bus on intsmb0
smb0: SMBus general purpose I/O on smbus0
intpm0: PM I/O mapped 800 
fxp0: Intel EtherExpress Pro 10/100B Ethernet irq 18 at device 14.0 on pci0
fxp0: Ethernet address 00:90:27:dc:44:eb
fdc0: NEC 72065B or clone at port 0x3f0-0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5" drive on fdc0 drive 0
wdc1 at port 0x170-0x177 irq 15 on isa0
wdc1: unit 0 (atapi): SAMSUNG SC-140B/d005, removable, intr, dma, iordis
wcd0: drive speed 6875KB/sec, 128KB cache
wcd0: supported read types: CD-R, CD-RW, CD-DA, packet track
wcd0: Audio: play, 255 volume levels
wcd0: