Andrea,

That's what seems to be happening, indeed. However, I had never seen this error before - such a small system is not worth running in parallel, especially on such a large number of processors! With a larger system you wouldn't hit this error. Try making a 2x2x1 supercell - it'll run with no problems.
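A 2x2x1 supercell can be requested directly in the fdf input. A minimal sketch, assuming the standard SuperCell block (integer multiples of the lattice vectors - check the manual for your SIESTA version):

```
%block SuperCell
  2  0  0
  0  2  0
  0  0  1
%endblock SuperCell
```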
As for efficiency - that's a whole other story :)

Cheers,

Marcos

On Wed, Jul 7, 2010 at 12:05 AM, <[email protected]> wrote:
> Dear Marcos,
>
> Many thanks for your prompt reply. Today I have performed a systematic
> study (for the same system) by changing the number of processors. I found
> out that SIESTA runs successfully in parallel mode up to a limit number of
> processors. Above that limit, the program stopped with the error message
> previously described (at the end of my original mail).
>
> In addition, when the program stopped, I found a message within the output
> file reading:
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Some processors are idle. Check PARALLEL_DIST
> You have too many processors for the system size !!!
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> and, by checking the PARALLEL_DIST file:
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Node  0 handles 4 orbitals.
> Node  1 handles 4 orbitals.
> Node  2 handles 4 orbitals.
> Node  3 handles 4 orbitals.
> Node  4 handles 4 orbitals.
> Node  5 handles 4 orbitals.
> Node  6 handles 4 orbitals.
> Node  7 handles 4 orbitals.
> Node  8 handles 4 orbitals.
> Node  9 handles 4 orbitals.
> Node 10 handles 4 orbitals.
> Node 11 handles 4 orbitals.
> Node 12 handles 4 orbitals.
> Node 13 handles 0 orbitals.
> Node 14 handles 0 orbitals.
> Node 15 handles 0 orbitals.
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> you can see that 3 processors are idle. Therefore, it seems that SIESTA
> stops whenever the distribution of orbitals leaves some processors idle.
> What do you think?
>
> Best regards,
>
> Andrea
>
>> Andrea,
>>
>> In principle nothing should change in terms of successful execution
>> of Siesta, from what I can remember. Basis sets and pseudos (as long as
>> the latter are in psf format) should not pose a problem. From your
>> error, it seems that the problem could be with your MPI, since the run
>> succeeds in serial mode. Have you performed a system update lately?
>> I think not too long ago someone was having problems like yours, and
>> re-compilation from scratch (mpi, then blacs, then scalapack, then
>> siesta) was the solution, because the libc's had been updated and that
>> interfered with the mpich libraries. This shouldn't be too difficult
>> to do, since you already compiled it once anyway. Unfortunately you
>> don't provide much information, such as when the error occurs, whether
>> the input is read correctly, and so on, so it is difficult to think of
>> a possible cause and a less painful solution. :)
>>
>> If re-compilation from scratch doesn't work, then it is indeed a
>> Siesta error. In cases like this, a "one-change-at-a-time" search for
>> the error is recommended.
>>
>> Some things you could try (one at a time):
>>
>> 1) Try re-initializing the DM from scratch, instead of using the old one.
>> 2) Try replacing the MP occupation with Fermi-Dirac smearing. MP
>>    apparently has a bug.
>> 3) Try using a standard DZP basis set, just for the sake of testing.
>> 4) Re-generate your pseudo - could it be that it is somehow corrupted?
>>    Some time ago, someone having problems with pseudos from a previous
>>    run found spurious characters at the end of the file.
>>
>> Cheers,
>>
>> Marcos
>>
>> On Mon, Jul 5, 2010 at 8:56 PM, <[email protected]> wrote:
>>> Hi,
>>>
>>> I have installed Siesta 2.0 on a cluster, and it runs successfully in
>>> parallel mode (this was checked for several systems).
>>>
>>> Now I want to take up again a problem that I studied in the past with
>>> Siesta 1.3 in serial mode. When I try to run the same problem with
>>> Siesta 2.0, it works in serial mode but not in parallel.
>>>
>>> Do I have to change anything in the pseudos & basis generated with
>>> Siesta 1.3 in serial mode, so that they work with Siesta 2.0 in
>>> parallel?
>>>
>>> The error message is at the end of my message. I also send the fdf
>>> file attached.
>>>
>>> Thanks in advance,
>>>
>>> Andrea
>>>
>>> -----------------------------------------------------------------------------
>>> Error Message:
>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>> Image            PC                Routine   Line      Source
>>> siesta           000000000045CAF2  Unknown   Unknown   Unknown
>>> siesta           0000000000448ACE  Unknown   Unknown   Unknown
>>> siesta           000000000056515D  Unknown   Unknown   Unknown
>>> siesta           0000000000410002  Unknown   Unknown   Unknown
>>> libc.so.6        0000003C33C1D8B4  Unknown   Unknown   Unknown
>>> siesta           000000000040FF29  Unknown   Unknown   Unknown
>>> forrtl: error (78): process killed (SIGTERM)
>>> Image            PC                Routine   Line      Source
>>> libmpich.so.1.0  00002B3CEF46D062  Unknown   Unknown   Unknown
>>> libmpich.so.1.0  00002B3CEF44FEFA  Unknown   Unknown   Unknown
>>> libmpich.so.1.0  00002B3CEF477CB6  Unknown   Unknown   Unknown
>>> libmpich.so.1.0  00002B3CEF461122  Unknown   Unknown   Unknown
>>> siesta           00000000007E116D  Unknown   Unknown   Unknown
>>> siesta           00000000007DF350  Unknown   Unknown   Unknown
>>> siesta           00000000006C451F  Unknown   Unknown   Unknown
>>> siesta           00000000006C58D2  Unknown   Unknown   Unknown
>>> siesta           000000000053D425  Unknown   Unknown   Unknown
>>> siesta           000000000045D081  Unknown   Unknown   Unknown
>>> siesta           0000000000448ACE  Unknown   Unknown   Unknown
>>> siesta           000000000056515D  Unknown   Unknown   Unknown
>>> siesta           0000000000410002  Unknown   Unknown   Unknown
>>> libc.so.6        0000003C33C1D8B4  Unknown   Unknown   Unknown
>>> siesta           000000000040FF29  Unknown   Unknown   Unknown
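For reference, the PARALLEL_DIST listing in the quoted thread is consistent with a simple contiguous block distribution of the 52 orbitals over 16 nodes. A minimal sketch (plain Python; `block_distribution` is a hypothetical helper written for illustration, not SIESTA's actual routine):

```python
import math

def block_distribution(n_orbitals, n_nodes):
    """Assign orbitals to nodes in contiguous blocks of size ceil(n/p).

    Illustration only: with 52 orbitals on 16 processors the block size
    is ceil(52/16) = 4, so 13 nodes get 4 orbitals each and the last 3
    nodes get nothing - matching the PARALLEL_DIST output above.
    """
    blocksize = math.ceil(n_orbitals / n_nodes)
    counts = []
    remaining = n_orbitals
    for _ in range(n_nodes):
        take = min(blocksize, remaining)
        counts.append(take)
        remaining -= take
    return counts

for node, n in enumerate(block_distribution(52, 16)):
    print(f"Node {node:2d} handles {n} orbitals.")
```

With fewer processors the same scheme gives every node at least one orbital, which fits Andrea's observation that the runs succeed only below some processor-count limit.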
