[Wien] Error in parallel Lapw1
Hi Dr. Blaha,

Thank you for your reply. I used a smaller number of k-points: 12 in a 5x5x3 grid (I requested 100 k-points). I got the same error. My computer has 64 GB of RAM and 16 cores (Intel Xeon processor). My .machines file is:

    # .machines is the control file for parallel execution. Add lines like
    #
    #   speed:machine_name
    #
    # for each machine specifying there relative speed. For mpi parallelization use
    #
    #   speed:machine_name:1 machine_name:1
    #   lapw0:machine_name:1 machine_name:1
    #
    # further options are:
    #
    #   granularity:number (for loadbalancing on irregularly used machines)
    #   residue:machine_name (on shared memory machines)
    #   extrafine (to distribute the remaining k-points one after the other)
    #
    # granularity sets the number of files that will be approximately
    # be generated by each processor; this is used for load-balancing.
    # On very homogeneous systems set number to 1
    # if after distributing the k-points to the various machines residual
    # k-points are left, they will be distributed to the residual-machine_name.
    #
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    granularity:1
    extrafine:1
    #
    # Uncomment for specific OMP-parallelization (overwriting a global OMP_NUM_THREADS)
    #
    #omp_global:4
    # or use program-specific parallelization:
    #omp_lapw0:4
    #omp_lapw1:4
    #omp_lapw2:4
    #omp_lapwso:4
    #omp_dstart:4
    #omp_sumpara:4
    #omp_nlvdw:4

I had RTmax 7 percent. The error [the message repeats the .machines file shown above at this point]. I do not have any idea what is wrong now.

--
Yrd Doc Dr. Murat Aycibin
Van Yuzuncu Yil Universitesi
Fizik Bolumu
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
Re: [Wien] Error in parallel Lapw1
We still don't know much about your case.

Please modify your .machines file and use only 2 instead of 16 lines with 1:localhost.

If this solves the problem, increase it to 4 or 6 (when you have 12 k-points) or 8 (if you have more k-points).

Also uncomment omp_global:2 or 4. Then you are still using all your cores, but you will need less memory.

On 08.02.2021 at 11:24, Murat Aycibin wrote:
> [quoted message trimmed; it repeats the original post in full]

--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at
WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at
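The suggestion in this reply (fewer 1:localhost lines, with omp_global uncommented) can be sketched as a small shell helper that prints such a .machines file. This is only an illustration: the function name make_machines and its two arguments are hypothetical and not part of WIEN2k, and the granularity and extrafine lines are simply copied from the file in the original post.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k): print a .machines file with
# a given number of k-point-parallel jobs on localhost and a given
# number of OpenMP threads per job.
make_machines() {
    njobs=$1      # number of "1:localhost" lines (k-point parallel jobs)
    nthreads=$2   # value written to omp_global (OpenMP threads per job)
    i=0
    while [ "$i" -lt "$njobs" ]; do
        echo "1:localhost"
        i=$((i + 1))
    done
    # Options taken unchanged from the file in the original post.
    echo "granularity:1"
    echo "extrafine:1"
    echo "omp_global:$nthreads"
}

# Example: 4 jobs with 4 threads each, i.e. 16 threads on a 16-core machine.
make_machines 4 4
```

To try it, the output could be redirected into the case directory, for example `make_machines 2 4 > .machines` for the first test with only 2 lines.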
Re: [Wien] Error in parallel Lapw1
You might also check what OMP_NUM_THREADS is set to on your system in .bashrc or .cshrc.

For example, on my Ubuntu system, I do:

    username@computername:~/Desktop$ grep OMP_NUM_THREADS ~/.bashrc
    export OMP_NUM_THREADS=1

As you can see, I'm using a different value than the default that would have been set by userconfig_lapw during installation of WIEN2k. I believe the default value is OMP_NUM_THREADS=4.

Is your Xeon processor an E5-2698 v3? If it is, the following link lists "# of Threads" as 32:

https://ark.intel.com/content/www/us/en/ark/products/81060/intel-xeon-processor-e5-2698-v3-40m-cache-2-30-ghz.html

Your .machines file requests 16 parallel jobs (one per 1:localhost line). If your OMP_NUM_THREADS is 4, you would be requesting 16 jobs * 4 threads/job = 64 threads. That would be 32 threads (= 64 requested threads - 32 processor threads) more than your processor can handle at one time. If you are using a different processor, you would have to look on Intel's website to find the "# of Threads" your particular processor can handle.

OMP_NUM_THREADS can of course be overridden by using omp_global in the .machines file.

If the problem is coming from a memory error, as previously discussed as a possibility in the post:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg20807.html

then you might want to check /var/log. The following post might help with that:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19703.html

You might also check what parallel_options are set to with the command:

    cat $WIENROOT/parallel_options

If the problem is related to passwordless login, one of the posts in the mailing list archive that might help is:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg02295.html

On 2/8/2021 5:33 AM, Peter Blaha wrote:
> [quoted message trimmed; it repeats the previous reply and the original post]
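The thread-count arithmetic in this reply (16 jobs times 4 OpenMP threads, against 32 hardware threads) can be reproduced with a few lines of shell. This is a rough sketch only: the function name thread_budget and its three arguments are hypothetical, and the hardware thread count has to be supplied by the user (on Linux, `nproc` is one way to obtain it).

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k): report how many threads a run
# would request and by how many that exceeds the hardware threads.
thread_budget() {
    njobs=$1   # number of 1:localhost lines in .machines
    omp=$2     # OMP_NUM_THREADS or omp_global value
    hw=$3      # hardware threads of the machine (assumed known)
    requested=$((njobs * omp))
    excess=$((requested - hw))
    # A negative difference means the request fits; report it as 0.
    if [ "$excess" -lt 0 ]; then
        excess=0
    fi
    echo "requested=$requested hardware=$hw excess=$excess"
}

# Numbers from the discussion: 16 jobs, 4 threads each, 32 hardware threads.
thread_budget 16 4 32
```

With 4 jobs instead of 16, `thread_budget 4 4 32` reports no excess, which matches the idea of trading 1:localhost lines for OpenMP threads.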