Hi,

I have the following question. Occasionally, the storage on the cluster
where I run Espresso (mostly cp.x) becomes temporarily unavailable.

When this happens, the jobs die. It seems that while writes to the
output file can sometimes wait for the storage to come back, writes to
the restart files cannot.

For example, in a recent outage only 1 of my 7 jobs survived, and I
think this is because it did not need to write a restart file while the
disk was unavailable. In that case the writes to the output file
apparently did wait.

So my question is: is it possible to make MPI wait for the disk to
become available again when writing the restart files?
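To make the idea concrete, here is a rough Python sketch of the behaviour I have in mind. It is not how cp.x actually does its I/O, and the helper name write_restart, its parameters and the set of "transient" errno values are my own assumptions (which errno a filesystem returns during an outage varies). It retries the restart-file write with a growing pause instead of aborting, and writes to a temporary file first, so a failed attempt never clobbers the previous restart file.

```python
import errno
import os
import time

# Assumed set of errno values meaning "storage temporarily unavailable".
# ESTALE (stale NFS handle) is not defined on every platform, hence getattr.
TRANSIENT_ERRNOS = {
    code
    for code in (errno.EIO, errno.ENOENT, errno.EAGAIN, getattr(errno, "ESTALE", None))
    if code is not None
}


def write_restart(path, data, retries=5, delay=1.0):
    """Hypothetical helper: write bytes `data` to `path`, waiting and
    retrying while the storage looks temporarily unavailable.

    Returns the number of failed attempts before the write succeeded.
    """
    tmp = path + ".tmp"
    for attempt in range(retries + 1):
        try:
            with open(tmp, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # make sure the data actually reached the disk
            # Atomic rename: the old restart file stays intact until this point.
            os.replace(tmp, path)
            return attempt
        except OSError as e:
            # Give up on non-transient errors or once the retries are used up.
            if e.errno not in TRANSIENT_ERRNOS or attempt == retries:
                raise
            time.sleep(delay * 2 ** attempt)  # exponential backoff before retrying
```

With the defaults the job would wait roughly 1 + 2 + 4 + 8 + 16 seconds in total before giving up; in cp.x itself the equivalent logic would have to live in the Fortran routines that write the restart files.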

Note that when the disk becomes full, the failure looks different, and
the jobs die like this:

 2946  0.04081    0.0  418.3 -1164.45530 -1164.45530 -1163.99768 -1163.50459  0.0000  0.0000 -0.0001  2.0386
bm_list_5117:  p4_error: interrupt SIGx: 15
rm_l_1_5140:  p4_error: interrupt SIGx: 15
rm_l_1_5140: (196778.906250) net_send: could not write to fd=8, errno = 32
p0_5095: (196779.187500) net_send: could not write to fd=7, errno = 32
p3_8888:  p4_error: interrupt SIGx: 13
p2_8887:  p4_error: interrupt SIGx: 13
p2_8887: (196782.304688) net_send: could not write to fd=8, errno = 32
p3_8888: (196782.304688) net_send: could not write to fd=8, errno = 32
p1_5118: (196782.933594) net_send: could not write to fd=8, errno = 32
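For reference, the numeric codes in that log can be decoded with Python's standard errno and signal modules, assuming Linux/POSIX numbering (which I presume the cluster uses): errno 32 is EPIPE (broken pipe), signal 13 is SIGPIPE and signal 15 is SIGTERM. My reading, which may be wrong, is that some processes were sent SIGTERM and the remaining p4 processes then failed writing to the closed connections. The function name decode_log_codes is just for illustration.

```python
import errno
import os
import signal


def decode_log_codes(errnos=(32,), signums=(13, 15)):
    """Map the numeric codes seen in the p4 log to symbolic names.

    Assumes Linux/POSIX numbering; SIGPIPE does not exist on Windows.
    """
    names = {}
    for code in errnos:
        names[f"errno {code}"] = f"{errno.errorcode[code]} ({os.strerror(code)})"
    for num in signums:
        names[f"signal {num}"] = signal.Signals(num).name
    return names


for key, value in decode_log_codes().items():
    print(key, "->", value)
```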


 Kostya
