On Thu, 1 Apr 2010, Jeffrey Altman wrote:

On 3/31/2010 10:33 PM, ?? wrote:
Hi,

I want to know how many parallel read requests one volume can handle at
the same time, and how many parallel read requests one replicated volume
can handle at the same time.

In our AFS system, about one hundred people read a volume in parallel,
and each person issues about 500 read requests. I found that the AFS
client's /var/log/message file often shows error messages such as
"volume 536871264 is busy or server is down, recheck ".


Our experience is that AFS combined with a large batch farm is a denial
of service waiting to happen for rw volumes. Each batch process
registers a callback for the volume it is writing to, and eventually the
server is starved of available threads, so all the volumes served by
that server suffer performance hits. Essentially, the read requests are
limited by the number of threads on the server for the volume.
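
As a rough illustration of that thread limit, here is a toy model of
the burst described in the question (100 clients, 500 reads each). The
thread count and per-request service time are made-up numbers chosen
only for the arithmetic; they are not OpenAFS defaults, and the real
fileserver does not schedule requests in neat waves like this.

```python
import math


def drain_time(clients, requests_per_client, server_threads, service_seconds):
    """Toy model: at most `server_threads` requests are serviced at once,
    each taking `service_seconds`; everything else waits its turn."""
    total = clients * requests_per_client
    # Number of full "rounds" the thread pool needs to clear the burst.
    waves = math.ceil(total / server_threads)
    return total, waves, waves * service_seconds


# 100 clients x 500 reads comes from the question above.
# 128 threads and 10 ms per request are hypothetical values.
total, waves, seconds = drain_time(100, 500, 128, 0.01)
print(f"{total} requests, {waves} rounds, about {seconds:.2f} s to drain")
```

The point of the sketch is only that the drain time grows with the
total number of requests divided by the thread count, so requests for
every other volume on the same server queue behind the burst.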

We have a constant user education problem with this, especially since
the tipping point isn't reached until the user is sure everything is
working and "scales up" their runs to several hundred simultaneous
batch jobs.

In theory a read only replica volume should not be nearly as
resource intensive. However, we have found this is rarely
the case.

I suspect your real problem is that the jobs are opening dot files or
configuration/logging files in some volume that is on the same server
as the volume you are reading from. Most applications have some library
that assumes reading/writing small files in the home directory will
never be a problem.

AFS scales really well under the assumption of many machines each
accessing different volumes. It crashes and burns when the scenario
switches to many machines accessing the same volume.

- Booker C. Bense
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
