Jeffrey,

I suppose I must apologize for sending any incorrect data in my original post on 12/1. At that point in my problem diagnosis I thought my physical RAM was being used up, causing lots of paging to occur. I was using a machine with only 256 Meg physical. Based on your statements that followed about the AFS cache implementation, I changed to using a smaller cache. You would think this would have made the original problem of slow starting apps totally disappear, but it did not. The machine certainly is more responsive, because there is little or no paging going on, but the AFS cache still seems to degrade application startup times. This was confirmed once we started testing on a machine with a Gig of RAM where the Windows cache didn't even enter into the equation. So I'm now working on another problem, which definitely seems to be a bug in the AFS cache manager.

Ok, I've tried to simplify this as much as possible. My previous email documents the exact method to produce the bug, and it only takes about 5 minutes to reproduce. You should see the same symptoms at your site. You should be able to watch the handle count rise well above 256 handles. You should be able to obtain the same results more easily than I can gather them all into a detailed report for you (see data at bottom). If you are not seeing the same results, just let me know; I would be curious why.

Here is the method again, using correct units, with added text for clarity...

To reproduce the problem, use the following settings...

     Windows XP SP1, 1 Gig RAM, P4 3.0 Gig, 100 MBit connectivity
     OpenAFS 1.2.10
     Cache size:  8192K  ( 8 Meg cache )

Note: For those who need exacting definitions, this is an 8 Meg cache, not 32 Meg, not 256 Meg, not 8 Gig...just a simple 8 Meg cache. Units are checked. Based on your information, the current Windows AFS cache implementation should handle this cache size easily without problems.

     Chunk size:  32K
     Status Entries:  1000
     Background Threads: 6
     Service Threads: 8
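
For reference, the arithmetic behind the cache and chunk settings above can be sketched as follows. This is only a minimal Python sketch: the function name is hypothetical, and it assumes the cache is simply divided into fixed-size chunks, which may not match what the cache manager does internally.

```python
def chunks_in_cache(cache_kb: int, chunk_kb: int) -> int:
    """Number of whole chunks that fit in the cache.

    Assumption: the cache is split into equal fixed-size chunks.
    """
    if chunk_kb <= 0:
        raise ValueError("chunk size must be positive")
    return cache_kb // chunk_kb


# Settings from this report: 8192K cache, 32K chunks.
CHUNKS = chunks_in_cache(8192, 32)
print(CHUNKS)
```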

1. Make a temporary local directory to copy some files to...

c:\>mkdir "c:\temp\test"

2. Change into the temporary folder...

c:\>cd "c:\temp\test"

3. Make sure you start with a fresh cache...

c:\>net stop "IBM AFS Client"

c:\>del "c:\afscache"

Note: It may take some time here before the AFS service lets go of the cache; keep trying the delete until the file is gone. (I'm not sure why it sometimes takes so long for AFS to shut down. It's probably the same problem that manifests the handle leak.)

c:\>net start "IBM AFS Client"
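
The "keep trying the delete" part of step 3 could be scripted roughly as below. This is a hedged sketch, not part of the original procedure: the function name, the attempt count, and the delay are all assumptions, and it assumes that a file still held by the service shows up in Python as a PermissionError.

```python
import os
import time


def delete_with_retry(path: str, attempts: int = 30, delay_s: float = 2.0) -> bool:
    """Try to delete `path` repeatedly; return True once the file is gone."""
    for _ in range(attempts):
        try:
            os.remove(path)
        except FileNotFoundError:
            return True  # already gone, nothing to do
        except PermissionError:
            # Assumption: the service has not released the file yet.
            time.sleep(delay_s)
            continue
        return True  # delete succeeded
    return not os.path.exists(path)


# Hypothetical usage, between "net stop" and "net start":
#   delete_with_retry(r"c:\afscache")
```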


4. Now bring up the Task Manager and, using the View->Select Columns menu, enable the handle count column (and any others you want) so you can watch the handles for "afsd_service.exe".


5. Now, in the temporary directory at the command prompt, start a recursive copy of a large tree of files out of your cell's AFS space. It doesn't matter what files...any files will do.

c:\temp\test>xcopy "\\%computername%-afs\all\your-cell\dir1..." /s /e /f /c

The "/s /e /f /c" means...all subdirectories, even empty ones, show the files as they are being copied, and continue on errors.

Again, any files will do. You may need to copy a large number of files and/or some big files. At our site I just started the copy on a very large tree and let it go. For example, the following should work fine...

c:\temp\test>xcopy "\\%computername%-afs\all\your-cell-name-here\*.*" /s /e /f /c

(Make sure you don't have any symbolic links in AFS that might create a recursive loop in whatever tree of files you are copying. The xcopy.exe program will follow them if you do.)

As the copy progresses and the handles start rising, keep watching. After the handle count rose into the thousands, I just pressed CTRL+C or CTRL+Break. Depending on your AFS permissions, you may need a token to do the copy. Make sure the total size of the files being copied is well above the cache size of 8192K.
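
To check that enough data went through the cache, the size of the copied tree can be compared against the 8192K cache size. The sketch below is an illustration only, with hypothetical function names; it is meant to be run against the local destination folder after the copy, so that the check itself does not touch AFS.

```python
import os

CACHE_BYTES = 8192 * 1024  # the 8192K cache size from the settings above


def tree_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root` (links are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # skip files that cannot be read
    return total


def exceeds_cache(root: str, cache_bytes: int = CACHE_BYTES) -> bool:
    """True if the tree holds more data than the cache can hold."""
    return tree_size_bytes(root) > cache_bytes


# Hypothetical usage after the copy:
#   exceeds_cache(r"c:\temp\test")
```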

Now, if you watch the Task Manager's "afsd_service.exe" handle count, it will start out ok but soon rise out of control. Stopping the copy does not bring the handle count back down.
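
If watching Task Manager by hand is inconvenient, the handle count could also be sampled from a script. The sketch below is an assumption-laden alternative, not what I used: it assumes the `wmic` command-line tool is available on the test machine and that its `HandleCount` property matches the Task Manager column. The function names are hypothetical.

```python
import subprocess


def parse_handle_count(wmic_output: str) -> int:
    """Sum the numeric lines of `wmic ... get HandleCount` output.

    The header line and blank lines are ignored; returns 0 if no process matched.
    """
    stripped = (line.strip() for line in wmic_output.splitlines())
    return sum(int(line) for line in stripped if line.isdigit())


def sample_handle_count(image_name: str = "afsd_service.exe") -> int:
    """Ask wmic for the handle count of the named process (Windows only)."""
    result = subprocess.run(
        ["wmic", "process", "where", f"name='{image_name}'", "get", "HandleCount"],
        capture_output=True, text=True, check=True,
    )
    return parse_handle_count(result.stdout)
```

Calling sample_handle_count() every few seconds during the copy, and again after pressing CTRL+C, would give a log of the rise that could be attached to a report.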

Using the above method I was able to easily obtain the following numbers...

     Using the above config of 8 Meg cache with 32K chunks.
     After about 987 Meg copied from AFS to the local "c:\temp\test" folder.
     http://www.coe.uncc.edu/~rmdyer/test_8MB_afscache.jpg

Here's another, same scenario, just using the AFS client defaults for cache and chunk...
http://www.coe.uncc.edu/~rmdyer/test_32MB_afscache.jpg


Is this enough information? When you say..."Please add this data to the Request (#2628)". How do I do this?

Happy Holidays! Sorry to be such a problem (an ass).

Rodney

_______________________________________________
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info
