Hi Marton,

Just in case you didn't run across this page:
http://www.nersc.gov/nusers/systems/bassi/programming.php

Cheers,
Lex Kemper
Department of Physics
University of Florida

Marci wrote:
> Hi Axel,
>
>> marton,
>>
>> are you trying to run the postprocessing on your local
>> machine or on the IBM machine?
>
> on the IBM machine. I had bad experiences with postprocessing on a
> different machine because of using the iotk package: converting binary
> files to text files and back is quite time consuming... (and I hate
> ssh-ing gigabytes of files)
>
>> that depends on what is causing this. it could just be that you
>> have an integer overflow, due to the size of your system, or it
>> could be that you try to read unformatted data on a different
>> endian machine. i would suggest you insert a print statement into
>> the code that prints out the values of DIRECT_IO_FACTOR and recl
>> as well as unf_recl and then get back to us with the information
>> about the architectures and these numbers (ideally also for the
>> smaller test, where it worked).
>
> Unfortunately, the espresso I'm using on BASSI was not compiled by
> myself, and now I'm scared of compiling mine because I'm not sure that
> it will be able to read the binary that was made with an espresso
> probably compiled with different compilers and/or compiler options.
> Yeah, I know... I should have compiled my own version of quantum
> espresso before making serious calculations to avoid these
> situations.
>
> So... I made some changes in diropn.f90 in espresso4.0/PW and compiled
> my own version of espresso (with this I get the same error) to print
> the values below in the case of the big run. Honestly, I do not really
> know much about this cluster, but I'm sure I'm using compiler xl
> fortran version 11.1.0.3 and library essl 4.2.0.3.
>
> recl: 415578000
> DIRECT_IO_FACTOR: 8
> unf_recl: -970343296
>
> On my home cluster, I used a parallelized espresso-4.0.3 on system
> "Intel Xeon E5410 @ 2.33GHz, 16 GB RAM" with ifort 10.1.015, intel mkl
> libraries 10.0.1.014 and openmpi-1.2.6 and with a smaller but similar
> system (same pseudos, same cutoff, only gamma point). As I said, there
> is no "wrong record length" error and I got the following values:
>
> recl: 97079200
> DIRECT_IO_FACTOR: 8
> unf_recl: 776633600
>
> If I'm right... 415578000*8 = 3324624000 which is bigger than the
> largest value of a signed 32-bit integer, maybe that causes the
> problem?
>
> Thanks for your help,
> Marton
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://www.democritos.it/mailman/listinfo/pw_forum
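
The arithmetic in the quoted message can be checked with a short sketch. The Python snippet below assumes that unf_recl is computed as DIRECT_IO_FACTOR * recl in a default signed 32-bit integer that wraps around on overflow; the actual expression in diropn.f90 is not quoted in this thread, so treat that formula as an assumption. The helper names wrap_int32 and unf_recl_32bit are hypothetical and exist only for this illustration.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer: 2147483647


def wrap_int32(value):
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


def unf_recl_32bit(recl, direct_io_factor=8):
    """Assumed formula: unf_recl = DIRECT_IO_FACTOR * recl in 32-bit arithmetic."""
    return wrap_int32(direct_io_factor * recl)


# recl values reported in this thread.
big_recl = 415578000    # large run on BASSI
small_recl = 97079200   # smaller run on the home cluster

for recl in (big_recl, small_recl):
    exact = 8 * recl  # exact product, no overflow in Python
    print(recl, exact, exact > INT32_MAX, unf_recl_32bit(recl))
```

Under that assumption, the wrapped product for the big run is -970343296, which matches the printed unf_recl, while the product for the smaller run (776633600) stays below the 32-bit limit and is unchanged. That is consistent with the integer-overflow explanation, though it does not by itself rule out other causes.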
