Chao:
You are experiencing a cache issue with the client core. Correct me if I
am wrong:
1. On the machine getting the error (machine #1), you have created or
accessed the file, /mnt/pvfs2/tavg/t.natlantic.1-1. So, machine #1's cache
now has the handle associated with this file.
2. On machine #2, the file is deleted and re-created. (NOTE: this
includes manually deleting and re-creating the file from the command
line somewhere.)
3. You access the file from machine #1 and get the error, because machine
#1 still has the old handle in its cache.
If this is the case, there are two things you can do:
1. Turn off caching or reduce the default caching timeout on machine #1,
which will slow your overall performance. The default timeout for a 2.9
installation is 60 seconds, since we are expecting large I/O with this
system. However, for smaller tests, you may need to lower the default or
even turn it off. You can modify the timeouts by manipulating them in the
/proc filesystem as root. The {acache,ncache} below is shorthand for both
caches; cat accepts it as is, but bash rejects a redirection that expands
to more than one file, so run each echo once per cache:
A. cat /proc/sys/pvfs2/{acache,ncache}/timeout-msecs - displays values
B. echo "number in millisecs" > /proc/sys/pvfs2/acache/timeout-msecs
(and the same for ncache) - changes values
C. echo 0 > /proc/sys/pvfs2/acache/timeout-msecs (and the same for
ncache) - turns off caching.
We have two caches, one for a file's attributes (acache) and one for the
directory entry by name (ncache). Most likely, it is the ncache that is
the problem here.
2. Change the way this file is processed to ensure that one machine is
coordinating the create/delete of this file, or use a different naming
convention, i.e., instead of deleting and recreating the file, use a
numbering scheme so each filename is different.
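As a rough illustration of the numbering idea, here is a minimal shell
sketch. The helper name numbered_name and the four-digit suffix are made up
for this example; POP's own output naming settings are not shown, so you
would need to adapt this to however your job sets the output file name.

```shell
#!/bin/sh
# Hypothetical helper: build a distinct output name per run or time step,
# so an existing name is never deleted and re-created.
#   $1 = base path (illustrative), $2 = sequence number (plain decimal)
numbered_name() {
    printf '%s.%04d\n' "$1" "$2"
}
```

For example, numbered_name /mnt/pvfs2/tavg/t.natlantic 3 gives
/mnt/pvfs2/tavg/t.natlantic.0003. Since each name is used only once,
machine #1 should not end up holding a cached handle for a file that has
since been replaced.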
You can verify that the cache is the problem by turning the cache off.
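To make that check concrete, here is a small sketch that shows or sets both
timeouts in one go. The function names and the PVFS2_PROC variable are my
own additions for illustration (PVFS2_PROC only exists so the functions can
be pointed at a scratch directory); the paths are the /proc files described
above, and writing to them needs root.

```shell
#!/bin/sh
# Sketch only: helper names and PVFS2_PROC are assumptions, not part of
# OrangeFS. The default is the /proc location given in this message.
PVFS2_PROC="${PVFS2_PROC:-/proc/sys/pvfs2}"

# Print the current timeout (in milliseconds) of each client cache.
show_cache_timeouts() {
    for cache in acache ncache; do
        printf '%s: %s\n' "$cache" "$(cat "$PVFS2_PROC/$cache/timeout-msecs")"
    done
}

# Set both timeouts to $1 milliseconds; 0 turns caching off.
# Each file is written separately, one redirection per cache.
set_cache_timeouts() {
    msecs="$1"
    for cache in acache ncache; do
        echo "$msecs" > "$PVFS2_PROC/$cache/timeout-msecs" || return 1
    done
}
```

Running set_cache_timeouts 0 as root on machine #1 turns both caches off
for the test, and set_cache_timeouts 60000 would put back the 60 second
value mentioned above. If the error goes away with the caches off, the
stale handle is the likely cause.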
Hope this helps!
Becky
On Fri, Jan 30, 2015 at 12:25 PM, Rob Latham <[email protected]> wrote:
>
>
> On 01/30/2015 11:14 AM, Chao Chen wrote:
>
>> Hi,
>> Thanks for your help.
>>
>> The error messages reported by Fortran and in /var/log/messages are as
>> follows (here is a subset of the repeated error messages):
>>
>
> Please remember to keep the list in your replies.
>
>> At line 303 of file io_binary.f90 (unit = 2, file = '')
>> Fortran runtime error: File '/mnt/pvfs2/tavg/t.natlantic.1-1' does not
>> exist
>> At line 303 of file io_binary.f90 (unit = 2, file = '')
>>
> ...
>
>> Fortran runtime error: File '/mnt/pvfs2/tavg/t.natlantic.1-1' does not
>> exist
>> Fortran runtime error: File '/mnt/pvfs2/tavg/t.natlantic.1-1' does not
>> exist
>>
>>
>> dmesg report:
>> [ 5382.553959] pvfs2_file_write: error in vectored write to handle
>> 922337203685476872, FILE: t.natlantic.1-0, returning -2
>>
> ...
>
>> [ 5382.577885] pvfs2_file_write: error in vectored write to handle
>> 922337203685476872, FILE: t.natlantic.1-0, returning -2
>> [ 5382.578733] pvfs2_file_write: error in vectored write to handle
>> 922337203685476872, FILE: t.natlantic.1-0, returning -2
>>
>>
> -2 is ENOENT: no such file or directory.
>
>> I can list files with ls on the mount point /mnt/pvfs2
>>
>
> How about all clients on all nodes? Are the permissions ok? If
> pvfs2-ping and pvfs2-ls also work, then I'm going to be running out of
> ideas...
>
>
>
>> POP supports NetCDF but doesn't support parallel-netcdf, and it relies
>> on Fortran I/O for parallel I/O.
>>
>
>
>
>> On Thu, Jan 29, 2015 at 4:42 PM, Rob Latham <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>
>>
>> On 01/29/2015 03:35 PM, Harms, Kevin N. wrote:
>>
>>
>> There is a good chance that the FORTRAN runtime is issuing a system
>> call that the orangefs module doesn't support (such as a lock). If you
>> can strace POP, you can probably find out which system call fails.
>>
>>
>> One option might be to use MPI-IO from Fortran. Doesn't POP support
>> parallel-netcdf? That too has Fortran bindings.
>>
>> ==rob
>>
>>
>> kevin
>>
>> Hi all,
>>
>>
>> Recently, I have been trying to run POP (Parallel Ocean Program,
>> written in Fortran) with OrangeFS. I found that POP always reports an
>> I/O error saying that a parameter is not correct if I configure it to
>> write its output to OrangeFS, but POP works fine if I configure its
>> output to go to NFS. I also ran IOR (written in C) with OrangeFS, and
>> everything works fine too. Does anybody have experience running
>> Fortran programs on OrangeFS? How can I figure out what the problem
>> is?
>>
>>
>> I tested both the kernel module client interface and the FUSE
>> interface, with orangefs-2.9.0.
>>
>>
>> Thanks
>>
>>
>>
>> --
>> Best Regards !
>> Chao
>>
>>
>>
>>
>>
>>
>> _________________________________________________
>> Pvfs2-users mailing list
>> Pvfs2-users@beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>
>>
>> --
>> Rob Latham
>> Mathematics and Computer Science Division
>> Argonne National Lab, IL USA
>>
>> --
>>
>> Best Regards !
>>
>> Chao Chen
>>
>>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA