On 11/28/2005 06:01:43 PM, Konstantin Kudin wrote:
> I have the following question. Occasionally on the cluster I am
> running Espresso (mostly cp.x) the storage becomes temporarily
> unavailable.
>
> This causes the jobs to die. It seems that while the output file
> can sometimes wait for the storage, the restart files absolutely
> cannot.
>
> So my question is if it is possible to make MPI wait for disk
> availability when writing the restart files?

I suspect this is an operating-system-level issue. When the operating system "talks" to the disk and gets no reply, it can either wait, or give up and report a failure. There is probably a configurable timeout. If it's a networked filesystem, the filesystem daemons are most likely responsible for that.

Actually, unless this has changed recently, Espresso doesn't use the MPI I/O functions: all I/O is handled by CPU 0, which reads and writes locally. The "local" disk may be (and most often is) a networked filesystem, but again, this is handled by the operating system and is completely transparent to Espresso.

I also guess that output behaves differently from input because output is buffered.

Gerardo
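The wait-or-fail decision described above is made by the operating system or filesystem layer. For example, an NFS client mounted with the `hard` option keeps retrying indefinitely, while a `soft` mount returns an error after its timeout. If that layer reports an error rather than blocking, an application could still choose to wait and retry instead of dying. The following is a minimal Python sketch of that idea, assuming write failures surface as `OSError`. The helper `write_with_retry` and its `max_wait`/`interval` parameters are hypothetical and not part of Espresso, whose I/O is written in Fortran and would need equivalent logic in its own routines.

```python
import os
import tempfile
import time


def write_with_retry(path, data, max_wait=600.0, interval=5.0):
    """Write bytes `data` to `path`, retrying while storage is unavailable.

    Hypothetical helper (not an Espresso API): retries on OSError every
    `interval` seconds until `max_wait` seconds have elapsed, then re-raises.
    """
    deadline = time.monotonic() + max_wait
    while True:
        try:
            # Write to a temporary file in the target directory first, so a
            # failed attempt never leaves a truncated restart file behind.
            directory = os.path.dirname(os.path.abspath(path))
            fd, tmp = tempfile.mkstemp(dir=directory)
            try:
                with os.fdopen(fd, "wb") as f:
                    f.write(data)
                    f.flush()
                    os.fsync(f.fileno())  # force the data out to the disk
                os.replace(tmp, path)  # atomic rename over the old file
            except BaseException:
                try:
                    os.remove(tmp)  # best-effort cleanup of the partial file
                except OSError:
                    pass  # storage may still be unavailable; ignore
                raise
            return
        except OSError:
            # Storage unavailable: give up only after the deadline passes.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```

Writing to a temporary file and renaming it means the previous restart file stays intact until a complete new one exists. Note that this only helps when the OS returns an error. With a `hard` NFS mount, the write call simply blocks until the server comes back.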
