Re: [OpenAFS] volume 536871264 is busy or server is down, recheck

Booker Bense Thu, 01 Apr 2010 12:45:10 -0700

On Thu, 1 Apr 2010, Jeffrey Altman wrote:

On 3/31/2010 10:33 PM, ?? wrote:

Hi,


I want to know how many parallel read requests one volume can handle
at the same time, and how many parallel read requests one replicated
volume can handle at the same time.

In our AFS system, about one hundred people read a volume in
parallel, and each person issues about 500 read requests. I found that
the AFS client's /var/log/messages file often shows error messages
such as "volume 536871264 is busy or server is down, recheck".

Our experience is that AFS and a large batch farm is a denial of
service waiting to happen for rw volumes. What happens is that each
batch process registers a callback for the volume it is writing to, and
eventually the server gets starved for available threads and all the
volumes served by that server suffer performance hits. Essentially the
read requests are limited by the number of threads on the server for
the volume.
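
To put rough numbers on the thread limit, here is a back-of-the-envelope
sketch in Python. The 100 clients and 500 requests each come from the
question above; the thread count and per-request service time are purely
hypothetical placeholders, not measured or default OpenAFS values, and
the model ignores callbacks, caching and network effects entirely.

```python
import math


def drain_time_seconds(clients, requests_per_client, server_threads, service_time_s):
    """Lower bound on the time to serve every request when at most
    `server_threads` requests can be in service at once."""
    total_requests = clients * requests_per_client
    # Requests are served in "waves" of at most server_threads each.
    waves = math.ceil(total_requests / server_threads)
    return waves * service_time_s


# 100 clients x 500 requests, as in the question.
# ASSUMPTION: 128 threads and 10 ms per request are made-up figures.
estimate = drain_time_seconds(100, 500, server_threads=128, service_time_s=0.01)
print(f"at least {estimate:.2f} s with every server thread busy the whole time")
```

The point is only that the whole pool stays busy for the duration, so
anything else that needs a thread on that server has to wait behind the
batch jobs.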

We have a constant user education problem with this, especially
since the tipping point doesn't get triggered until the user is
sure everything is working and "scales up" their runs to several
hundred simultaneous batch jobs.

In theory a read only replica volume should not be nearly as
resource intensive. However, we have found this is rarely
the case.

I suspect your real problem is that the jobs are opening dotfiles
or configuration/logging files in some volume that is also
on the same server as the volume you are reading from. Most
applications have some library that assumes reading/writing to
small files in the home directory will never be a problem.
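
One possible way to keep those small files off the file server is to
start each batch job with HOME pointing at a scratch directory on the
node. The sketch below is only an illustration of that idea, not
something we have described above: the helper name run_with_local_home
is hypothetical, it assumes the system temp directory is on local disk
rather than in AFS, and it assumes the application can live with a
throwaway HOME whose contents are deleted when the job ends.

```python
import os
import subprocess
import sys
import tempfile


def run_with_local_home(cmd):
    """Run `cmd` with HOME set to a fresh temporary directory.

    ASSUMPTION: tempfile's default location (TMPDIR or /tmp) is
    node-local disk, so dotfile traffic never reaches the AFS server.
    """
    with tempfile.TemporaryDirectory(prefix="jobhome-") as local_home:
        env = dict(os.environ, HOME=local_home)
        # Everything written under local_home is removed on exit.
        return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Example: show which HOME the child process actually sees.
result = run_with_local_home(
    [sys.executable, "-c", "import os; print(os.environ['HOME'])"]
)
print(result.stdout.strip())
```

Jobs that really need something from the AFS home directory would have
to copy it in first, which at least turns many small accesses into one.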

AFS scales really well under the assumption of many machines each
accessing different volumes; it crashes and burns when the
scenario switches to many machines accessing the same volume.


_ Booker C. Bense
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
