Hi folks, I have a run on 256 PEs on a Lustre file system with the following code:

[snip]
    integer :: mype,npe,pe_min,pe_max,pe_prev,pe_next,mpi_my_real, &
               comm=mpi_comm_world,status(mpi_status_size),error, &
               mpi_realsize, thefile
    integer (kind=MPI_OFFSET_KIND) disp
    logical :: pe0,prl
    ! *************************************************************************
    call mpi_init(error)
    call mpi_comm_rank(comm,mype,error)
    call mpi_comm_size(comm, npe,error)
    call mpi_type_extent(mpi_real, mpi_realsize, error); call mpi_type_size(MPI_REAL8, mpi_realsize, error)
    pe0=mype==0
    .
    .
    .
    disp = mype*lu*mpi_realsize
    call mpi_barrier(comm,error)
    call mpi_file_open(comm,'output-parallel/dump.dat', MPI_MODE_RDONLY, mpi_info_null, thefile, error)
    call mpi_file_write_at(thefile, disp, u(1,nx1,ny1,nz1), lu, MPI_REAL8, mpi_status_ignore, error)
    call mpi_file_close(thefile, error)
    call mpi_barrier(comm,error)
[snip]

where lu is an integer that does not itself exceed the 32-bit limit.

When I exceed the 32-bit limit, that is, when my output file is larger than 2**31 bytes (roughly 2.1 GBytes), I get a file of only 327 MBytes instead of the expected 181 GBytes for a checkpoint. This of course leads to a segfault when restarting.

I am afraid this has something to do with the 32-bit limit on my file size, which might be calculated incorrectly in the offset (disp in my code) that I pass to mpi_file_write_at.

Any ideas on how I can narrow down the cause of the error, or - even better - on how to solve it?

Best wishes
Alexander
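
P.S. To make the suspicion about disp concrete, here is a minimal sketch of the arithmetic, not the production code. It runs without MPI. The name offset_kind is a stand-in for MPI_OFFSET_KIND (assumed here to be a 64-bit kind), and the value of lu is hypothetical, chosen only so that 256 PEs writing 8-byte reals add up to roughly the expected checkpoint size. It converts each factor to the 64-bit kind before multiplying and reports the first rank whose byte offset no longer fits into a default 32-bit integer. Whether the product mype*lu*mpi_realsize in the code above really is evaluated in 32 bits before being assigned to disp is an assumption that would still need checking.

```fortran
program offset_overflow_sketch
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none

  ! Stand-in for MPI_OFFSET_KIND (assumption: a 64-bit integer kind).
  integer, parameter :: offset_kind = int64

  integer(int32), parameter :: npe      = 256       ! number of PEs, as in the run above
  integer(int32), parameter :: realsize = 8         ! bytes per MPI_REAL8 value
  integer(int32), parameter :: lu       = 88000000  ! hypothetical element count per PE

  integer(int32)       :: mype
  integer(offset_kind) :: disp

  do mype = 0, npe - 1
     ! Convert every factor first, so the product is evaluated in 64 bits
     ! instead of in the default integer kind.
     disp = int(mype, offset_kind) * int(lu, offset_kind) * int(realsize, offset_kind)

     ! huge(0_int32) is the largest value a 32-bit integer can hold (2**31 - 1).
     if (disp > int(huge(0_int32), offset_kind)) then
        print '(a,i0,a,i0)', 'first rank with offset beyond 2**31-1: ', mype, &
                             ', disp = ', disp
        exit
     end if
  end do
end program offset_overflow_sketch
```

If the same kind of conversion were applied in the real code, the line would presumably read disp = int(mype,MPI_OFFSET_KIND)*int(lu,MPI_OFFSET_KIND)*int(mpi_realsize,MPI_OFFSET_KIND), but I have not verified that this alone explains the truncated file.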