> on my machine I am seeing very similar problems when I try to access
> many files, e.g.

Hi Alf, hi Hans, hi everyone!

I think I could reproduce the problem. Twice.

My configuration is

Client:
  AFS version: OpenAFS 1.4.8pre2-pdc50 built 2008-10-09
  Linux a11c31n01.pdc.kth.se 2.6.18-92.1.1.el5.centos.plus #1 SMP Fri Jun 20 18:05:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
  /usr/openafs/sbin/afsd -nosettime -stat 16000 -dcache 8000 -daemons 16 -volumes 256 -rxpck 2000 -files 50000 -afsdb

  # /usr/openafs/bin/cmdebug localhost -cache
  Chunk files:   50000
  Stat caches:   16000
  Data caches:   8000
  Volume caches: 256
  Chunk size:    1048576
  Cache size:    1828000 kB
  Set time:      no
  Cache type:    disk

Server:
  AFS version: OpenAFS 1.4.7-pdc48 built 2008-07-30
  Linux trevally.pdc.kth.se 2.6.18-92.1.1.el5.centos.plus #1 SMP Fri Jun 20 18:05:
  /usr/openafs/libexec/openafs/fileserver -nojumbo -p 128 -busyat 1200 -rxpck 800 -s 2400 -l 2400 -cb 200000 -b 480 -vc 2400
  patched with RX_MAX_FRAG=1

I am running a program called sob to write many and/or big files. Sob
fills the files with random junk.

  /afs/pdc.kth.se/home/p/pek/public_html/sob/sob.c

  $ ./sob -n 30000 -o 1000 -s 1k -b 1k -w
  Writing 30000 files of size 0.001MB, blocksize 1kB
  Wrote 29.297 MB in 32.070 s for 0.914 MB/s, 30000 files
  $ grep -r foooosoososs .
  $ ./sob -n 30000 -o 1000 -s 1k -b 1k -r
  Reading 30000 files of size 0.001MB, blocksize 1kB
  Failed to open file testfile.925 : Connection timed out

Nothing in FileLog on the server, nothing in /var/log/messages on the
client.

OK. Let's try again:

  $ ./sob -n 30000 -o 1000 -s 1k -b 1k -r
  Reading 30000 files of size 0.001MB, blocksize 1kB
  Failed to open file testfile.2531 : Connection timed out

Ok. Start tcpdump....

  $ ./sob -n 30000 -o 1000 -s 1k -b 1k -r
  Reading 30000 files of size 0.001MB, blocksize 1kB
  Failed to open file testfile.3863 : Connection timed out

Stop tcpdump.

  $ /usr/openafs/bin/fs getfid dir.3/testfile.3863
  File dir.3/testfile.3863 (537095594.13532.6800) contained in volume 537095594

But I can't find that fid in the dump.
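For anyone who wants to try the same access pattern without fetching sob.c, here is a rough Python sketch of what the runs above do. It is not sob itself. The function names are made up for illustration, and the directory layout (file i goes into dir.(i / files-per-directory), which I am assuming is what -o 1000 controls) is only inferred from the paths dir.3/testfile.3863 and dir.5/testfile.5572 in this mail.

```python
import os
import tempfile


def file_path(root, index, per_dir):
    # Assumed layout: with per_dir=1000, testfile.3863 lives in dir.3.
    return os.path.join(root, "dir.%d" % (index // per_dir), "testfile.%d" % index)


def write_files(root, count, per_dir, size, block):
    """Create `count` files of `size` random bytes, written `block` bytes at a time."""
    for i in range(count):
        path = file_path(root, i, per_dir)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        try:
            with open(path, "wb") as f:
                remaining = size
                while remaining > 0:
                    chunk = os.urandom(min(block, remaining))
                    f.write(chunk)
                    remaining -= len(chunk)
        except OSError as e:
            # Stop at the first failure, as the runs in this mail do.
            raise SystemExit("Failed to create file %s : %s" % (path, e.strerror))


def read_files(root, count, per_dir, block):
    """Read every file back in `block`-sized reads and return the total bytes read."""
    total = 0
    for i in range(count):
        path = file_path(root, i, per_dir)
        try:
            with open(path, "rb") as f:
                while True:
                    data = f.read(block)
                    if not data:
                        break
                    total += len(data)
        except OSError as e:
            raise SystemExit("Failed to open file %s : %s" % (path, e.strerror))
    return total


if __name__ == "__main__":
    # Small local demo in a scratch directory; point `root` at an AFS
    # directory and raise the counts to approximate the runs above.
    with tempfile.TemporaryDirectory() as root:
        write_files(root, count=50, per_dir=10, size=1024, block=1024)
        print("read back", read_files(root, count=50, per_dir=10, block=1024), "bytes")
```

The sketch aborts on the first open or create error so that the failing file name is the last thing printed, which makes it easy to feed into fs getfid afterwards.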
The dump is here:

  /afs/pdc.kth.se/home/h/haba/Public/rx.1223629511

Then I ran against a 1.4.8pre2 fileserver, and it bugs out real fast.
Restarted the AFS client on the client machine (same as above).

Server:
  AFS version: OpenAFS 1.4.8pre2-pdc49 built 2008-10-08
  Linux scad.pdc.kth.se 2.6.18-92.1.1.el5.centos.plus #1 SMP Fri Jun 20 18:05:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
  /usr/openafs/libexec/openafs/fileserver -nojumbo -p 128 -busyat 1200 -rxpck 800 -s 2400 -l 2400 -cb 200000 -b 480 -vc 2400
  RX_MAX_FRAG=1

  $ ./sob -n 30000 -o 1000 -s 1k -b 1k -w
  Writing 30000 files of size 0.001MB, blocksize 1kB
  Failed to create file testfile.5572 : Connection timed out
  [Exit 1 ]
  $ /usr/openafs/bin/fs getfid dir.5/testfile.5572
  File dir.5/testfile.5572 (537095587.16946.8506) contained in volume 537095587

Complete rx dump is here:

  /afs/pdc.kth.se/home/h/haba/Public/rx.1223631778

The last thing I see in the tcpdump is the storedata of the fid
_before_ (16944) the missing one. I suspect that neither store-status
nor store-data for dir.5/testfile.5572 (537095587.16946.8506) is ever
sent.

(The vldb at 130.237.237.230 not answering at the end of the dump is
"correct": there is an error in the vldb entry for a foreign cell.)

Harald.
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
