Re: [Wien] configuring parallel options using ssh
You state: "However, a .machines file with several machines will run using all required CPUs on the machine where launched (ignoring hosts)." That implies that you have not correctly configured the command that executes the mpi task. Without knowing what this command is on your system (mpirun, srun, other) it is impossible to say more than this.

On Mon, Sep 10, 2018, 13:57 Luc Fruchter wrote:
> Dear users,
>
> I failed configuring the parallel options to run cases on several
> machines, each of them with several CPUs, driven by ssh protocol.
>
> * Configuring the parallel options with: shared memory, MPI = 0, ssh
> protocol, allows to run parallel jobs using several CPUs on the same
> machine. However, a .machines file with several machines will run using
> all required CPUs on the machine where launched (ignoring hosts).
>
> - Configuring with: no shared memory, MPI = 0, ssh protocol, will run no
> parallel jobs, either on the same or different machines (Below is the
> output for the error in this case).
>
> All machines communicate without problem with ssh and no password, and
> have identical file paths.
>
> Thanks for helping
>
> --
>
> > lapw0 -p (20:33:36) starting parallel lapw0 at Mon Sep 10 20:33:36 CEST 2018
> .machine0 : processors
> running lapw0 in single mode
> 6.793u 0.073s 0:06.86 100.0% 0+0k 0+5152io 0pf+0w
>
> > lapw1 -p (20:33:43) starting parallel lapw1 at Mon Sep 10 20:33:43 CEST 2018
> -> starting parallel LAPW1 jobs at Mon Sep 10 20:33:43 CEST 2018
> running LAPW1 in parallel mode (using .machines)
> 1 number_of_parallel_jobs
> localhost(48) Summary of lapw1para:
> localhost k=48 user=0 wallclock=0
> 0.112u 0.158s 0:02.28 11.4% 0+0k 0+224io 0pf+0w
>
> > lapw2 -p (20:33:45) running LAPW2 in parallel mode
> ** LAPW2 crashed!
> 0.085u 0.062s 0:00.13 107.6% 0+0k 0+872io 0pf+0w
> error: command /root/Documents/WIEN2KROOT/lapw2para lapw2.def failed
>
> stop error

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
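[Editorial note: the reply above points at the mpi launch command. In WIEN2k this command is configured in $WIENROOT/parallel_options (written by siteconfig). The following is only an illustrative sketch for a plain mpirun setup; the exact WIEN_MPIRUN line depends on the MPI stack installed on your machines and must be adapted.]

```
# $WIENROOT/parallel_options -- csh syntax (illustrative sketch only)
setenv USE_REMOTE 1        # use ssh to reach the hosts listed in .machines
setenv MPI_REMOTE 0        # start mpi processes from the launching node
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
```

The _NP_, _HOSTS_, and _EXEC_ placeholders are filled in by the WIEN2k parallel scripts at run time; if your system uses srun or another launcher instead of mpirun, the WIEN_MPIRUN line must be rewritten for that launcher.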
Re: [Wien] configuring parallel options using ssh
How are the several machines connected? If the machines are connected using the still-common 10/100 Mb/s Ethernet [1], it is useless to do that [2]. As was mentioned before, you need either 1 Gb/s [3] or InfiniBand [4]. Are the machines set up to have a common (NFS) filesystem [5,6]? The given information (error message) is insufficient, so I doubt anyone can help. For parallel calculations, it usually helps to provide:

a) What command was used to run the parallel calculation? For example, a runsp -p command, or qsub job.pbs?

b) If you are using SRC_mpiutil [7] or a job script [8], which one? If it is not exactly one seen on a website that you can provide a link to, then what are the contents of your job script?

c) Did you set up the .machines file for a k-point parallel or an mpi parallel calculation? What are the contents of the .machines file? For a job script, it can be helpful to see both the job script and the .machines file it created. The high performance computing (HPC) clusters [9] that effectively use mpi can be quite unique, so in many cases it is not possible (for us on the mailing list) to run your job script and reproduce the .machines file unless it is done on the particular computer system that you are using.

d) If you search the mailing list archive, you should find that there are other output files that can contain information for identifying and resolving such an error, for example the standard input/output file [10-13].
[1] https://en.wikipedia.org/wiki/Fast_Ethernet
[2] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13632.html
[3] https://en.wikipedia.org/wiki/Gigabit_Ethernet ; https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg14035.html
[4] https://en.wikipedia.org/wiki/InfiniBand ; https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg05595.html
[5] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09554.html
[6] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09229.html
[7] http://susi.theochem.tuwien.ac.at/reg_user/unsupported/
[8] http://susi.theochem.tuwien.ac.at/reg_user/faq/pbs.html
[9] https://en.wikipedia.org/wiki/Supercomputer
[10] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13598.html
[11] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17317.html
[12] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15549.html
[13] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16551.html

On 9/10/2018 12:56 PM, Luc Fruchter wrote:
> Dear users,
>
> I failed configuring the parallel options to run cases on several
> machines, each of them with several CPUs, driven by ssh protocol.
>
> * Configuring the parallel options with: shared memory, MPI = 0, ssh
> protocol, allows to run parallel jobs using several CPUs on the same
> machine. However, a .machines file with several machines will run using
> all required CPUs on the machine where launched (ignoring hosts).
>
> - Configuring with: no shared memory, MPI = 0, ssh protocol, will run no
> parallel jobs, either on the same or different machines (Below is the
> output for the error in this case).
>
> All machines communicate without problem with ssh and no password, and
> have identical file paths.
> Thanks for helping
>
> --
>
> > lapw0 -p (20:33:36) starting parallel lapw0 at Mon Sep 10 20:33:36 CEST 2018
> .machine0 : processors
> running lapw0 in single mode
> 6.793u 0.073s 0:06.86 100.0% 0+0k 0+5152io 0pf+0w
>
> > lapw1 -p (20:33:43) starting parallel lapw1 at Mon Sep 10 20:33:43 CEST 2018
> -> starting parallel LAPW1 jobs at Mon Sep 10 20:33:43 CEST 2018
> running LAPW1 in parallel mode (using .machines)
> 1 number_of_parallel_jobs
> localhost(48) Summary of lapw1para:
> localhost k=48 user=0 wallclock=0
> 0.112u 0.158s 0:02.28 11.4% 0+0k 0+224io 0pf+0w
>
> > lapw2 -p (20:33:45) running LAPW2 in parallel mode
> ** LAPW2 crashed!
> 0.085u 0.062s 0:00.13 107.6% 0+0k 0+872io 0pf+0w
> error: command /root/Documents/WIEN2KROOT/lapw2para lapw2.def failed
>
> stop error
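[Editorial note: point (c) above matters because k-point parallel and mpi parallel runs use different .machines layouts. As a hedged illustration only (host names are placeholders; the WIEN2k user's guide is the authoritative reference for this format), a k-point parallel file distributing the k-points over two hosts, two jobs each, could look like:]

```
granularity:1
1:host1
1:host1
1:host2
1:host2
extrafine:1
```

An mpi-parallel file would instead assign several cores to each job line (e.g. `1:host1:4`) and typically adds a dedicated `lapw0:host1:4` line for lapw0.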
[Wien] Depositing atom positions
I want to make a general point on this list, namely the importance of depositing atomic positions in a standard format (e.g. CIF) with publications. I continue to see far too many DFT papers with a figure and a few selected bond lengths, but insufficient information for someone to reproduce the results in detail. This is particularly true for surface structures, but it is quite general in the DFT community. If you submit a paper to the Wien2k publication list, I will urge you to also deposit a CIF with it. (Sorry, Karlheinz, for the extra work.) There may be other repositories that could be used; I know that Northwestern University now has one, as do many of the DOE labs in the US.

--
Professor Laurence Marks
"Research is to see what everybody else has seen, and to think what nobody else has thought", Albert Szent-Gyorgi
www.numis.northwestern.edu ; Corrosion in 4D: MURI4D.numis.northwestern.edu
Partner of the CFW 100% program for gender equity, www.cfw.org/100-percent
Co-Editor, Acta Cryst A
[Wien] configuring parallel options using ssh
Dear users,

I failed configuring the parallel options to run cases on several machines, each of them with several CPUs, driven by ssh protocol.

* Configuring the parallel options with: shared memory, MPI = 0, ssh protocol, allows to run parallel jobs using several CPUs on the same machine. However, a .machines file with several machines will run using all required CPUs on the machine where launched (ignoring hosts).

- Configuring with: no shared memory, MPI = 0, ssh protocol, will run no parallel jobs, either on the same or different machines (below is the output for the error in this case).

All machines communicate without problem with ssh and no password, and have identical file paths.

Thanks for helping

--

> lapw0 -p (20:33:36) starting parallel lapw0 at Mon Sep 10 20:33:36 CEST 2018
.machine0 : processors
running lapw0 in single mode
6.793u 0.073s 0:06.86 100.0% 0+0k 0+5152io 0pf+0w

> lapw1 -p (20:33:43) starting parallel lapw1 at Mon Sep 10 20:33:43 CEST 2018
-> starting parallel LAPW1 jobs at Mon Sep 10 20:33:43 CEST 2018
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
localhost(48) Summary of lapw1para:
localhost k=48 user=0 wallclock=0
0.112u 0.158s 0:02.28 11.4% 0+0k 0+224io 0pf+0w

> lapw2 -p (20:33:45) running LAPW2 in parallel mode
** LAPW2 crashed!
0.085u 0.062s 0:00.13 107.6% 0+0k 0+872io 0pf+0w
error: command /root/Documents/WIEN2KROOT/lapw2para lapw2.def failed

> stop error
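[Editorial note: since k-point parallelism over ssh only needs passwordless logins and identical paths on all hosts, a quick sanity check from the launching machine is worth doing before debugging WIEN2k itself. A minimal shell sketch; the host names are placeholders for the entries in your .machines file:]

```
# Check that every host in .machines is reachable non-interactively
# and sees the same WIEN2k installation and working directory.
for h in host1 host2; do
    ssh -o BatchMode=yes "$h" "hostname; ls $WIENROOT/lapw1; pwd" \
        || echo "problem reaching $h non-interactively"
done
```

If any host prompts for a password, times out, or cannot see the case directory under the same path, the parallel scripts cannot dispatch jobs to it.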
Re: [Wien] error in mBJ
I should have given more details in my previous email. What is technically a problem with vacuum is calculating c using the average of grad(rho)/rho in the unit cell, because this average has no real meaning when there is vacuum. The solution is to manually choose a fixed value of c. Whether the results are then reliable is another question, one that is difficult to answer in advance, since it depends on the chosen value of c. A good choice for c may (or may not) be the value obtained for the bulk system most closely related to your layered systems.

On Monday 2018-09-10 14:59, mitra narimani wrote:
> Thank you for your response. But I have some questions. You say that mBJ is not technically appropriate for monolayers or nanolayers with vacuum. Are the results of mBJ for these cases unreliable? If we remove case.in0_grr and correct the value in case.grr, are the results unreliable again? And if your response is positive, what is the appropriate exchange-correlation potential for these cases?
[Wien] error in mBJ
Thank you for your response. But I have some questions. You say that mBJ is not technically appropriate for monolayers or nanolayers with vacuum. Are the results of mBJ for these cases unreliable? If we remove case.in0_grr and correct the value in case.grr, are the results unreliable again? And if your response is positive, what is the appropriate exchange-correlation potential for these cases?
Re: [Wien] Three questions
Dear Prof. Blaha,

Thank you for a rapid and detailed answer. This helps a lot!

Best regards,
Lukasz

On 9/10/2018 11:49 AM, Peter Blaha wrote:

1. CHARGE CONVERGENCE: I know this has been discussed before, but it seems it didn't change in Wien2k_18. Here is a typical example of the charge convergence history of the last few iterations of an FM+SOC (ferromagnetic + spin-orbit coupling) calculation:

:ENERGY convergence: 0 0.0001 .000132045000
:CHARGE convergence: 0 0.001 .0027578
:ENERGY convergence: 1 0.0001 .49765000
:CHARGE convergence: 0 0.001 .0050904
:ENERGY convergence: 1 0.0001 .55765000
:CHARGE convergence: 0 0.001 .0004672
:ENERGY convergence: 1 0.0001 .22005000
:CHARGE convergence: 1 0.001 -.0001786

Something strange happens with charge convergence, is this OK?

This is ok.

2. LIMITS in inso and in1c files: in order to avoid large vector files I am changing the energy limits in the inso and in1c files for band structure calculations. SCF is done with default inso and in1c files, then I do save_lapw, then I edit the case.inso and case.in1c files, and then I do an FM+SOC band structure calculation:

#!/bin/bash
x lapw1 -band -up -p
x lapw1 -band -dn -p
x lapwso -up -p

I am using emin/emax -0.5 0.5 in both files (inso and in1c) without changing anything else; then I have bands from the limited energy range. I just want to make sure that this procedure is fine.

No, this is NOT ok. You must not change Emax in case.in1c! If you do, your basis set for the spin-orbit step is limited and the eigenvalues will change compared to the scf calculation. You can reduce emax in case.inso if you are not interested in the bands at higher energies.

3. FM+SOC convergence: I am doing FM+SOC calculations for different in-plane magnetization M directions in a 2D Fe(001) ferromagnetic layer. Actually, an older Wien2k_14 version was not working well for this; results for generic M directions were really strange.
Wien2k_18 seems much better; however, when starting things from scratch (each M angle a completely separate calculation) it seems that for some M angles the result is off, as if it didn't actually converge properly. I am using a fairly dense mesh for the SCF (2000 k-points, 25 25 3), and -ec 0.0001 -cc 0.001. Should I maybe try the -fermit setting in init_lapw, and what would be a reasonable value? Do I always need to use instgen_lapw before init_lapw when starting a new case? Should I perhaps do each next M angle on top of a previously converged M angle (and save_lapw for each M angle)?

Doing separate calculations for different directions may / may not yield correct results. The proper (safe) way is to use ONE case.struct file for all cases:
i) select all directions of magnetization first.
ii) Produce (using init_so) struct files, which are the same for all cases (do not change after init_so), and use this struct file for ALL (also non-so) calculations.

This works as:
a) generate a normal struct file.
b) init -b -sp
c) init_so with the first direction, accept the generated structure
d) init_so with the second direction, accept ...
repeat this for all desired directions (or until you have a P1 structure (only identity)).
e) with this setup (no new init_lapw) execute:
runsp -ec ... and save as "non-so"
runsp -so ... and save as "so-dir_2"
restore non-so; edit case.inso and put the first direction in;
runsp -so ... and save as "so-dir_1"

In this way, you can also compare the force theorem (:SUM of the first iteration, 2 numbers!) with the scf solution.

Best,
Lukasz
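[Editorial note: the procedure in steps a)-e) can be condensed into a command sketch. The *_lapw command names follow the usual WIEN2k conventions; the save names and convergence criteria are only examples, not part of the original answer:]

```
init_lapw -b -sp                   # b) batch initialization, spin-polarized
initso_lapw                        # c) first M direction; accept the
                                   #    generated struct file
initso_lapw                        # d) second M direction; repeat for all
runsp_lapw -ec 0.0001 -cc 0.001    # e) converge without spin-orbit
save_lapw non-so
runsp_lapw -so -ec 0.0001 -cc 0.001
save_lapw so-dir_2
restore_lapw non-so                # back to the non-so density
# edit case.inso: put the first direction in
runsp_lapw -so -ec 0.0001 -cc 0.001
save_lapw so-dir_1
```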
Re: [Wien] Three questions
1. CHARGE CONVERGENCE: I know this has been discussed before, but it seems it didn't change in Wien2k_18. Here is a typical example of the charge convergence history of the last few iterations of an FM+SOC (ferromagnetic + spin-orbit coupling) calculation:

:ENERGY convergence: 0 0.0001 .000132045000
:CHARGE convergence: 0 0.001 .0027578
:ENERGY convergence: 1 0.0001 .49765000
:CHARGE convergence: 0 0.001 .0050904
:ENERGY convergence: 1 0.0001 .55765000
:CHARGE convergence: 0 0.001 .0004672
:ENERGY convergence: 1 0.0001 .22005000
:CHARGE convergence: 1 0.001 -.0001786

Something strange happens with charge convergence, is this OK?

This is ok.

2. LIMITS in inso and in1c files: in order to avoid large vector files I am changing the energy limits in the inso and in1c files for band structure calculations. SCF is done with default inso and in1c files, then I do save_lapw, then I edit the case.inso and case.in1c files, and then I do an FM+SOC band structure calculation:

#!/bin/bash
x lapw1 -band -up -p
x lapw1 -band -dn -p
x lapwso -up -p

I am using emin/emax -0.5 0.5 in both files (inso and in1c) without changing anything else; then I have bands from the limited energy range. I just want to make sure that this procedure is fine.

No, this is NOT ok. You must not change Emax in case.in1c! If you do, your basis set for the spin-orbit step is limited and the eigenvalues will change compared to the scf calculation. You can reduce emax in case.inso if you are not interested in the bands at higher energies.

3. FM+SOC convergence: I am doing FM+SOC calculations for different in-plane magnetization M directions in a 2D Fe(001) ferromagnetic layer. Actually, an older Wien2k_14 version was not working well for this; results for generic M directions were really strange. Wien2k_18 seems much better; however, when starting things from scratch (each M angle a completely separate calculation) it seems that for some M angles the result is off, as if it didn't actually converge properly.
I am using a fairly dense mesh for the SCF (2000 k-points, 25 25 3), and -ec 0.0001 -cc 0.001. Should I maybe try the -fermit setting in init_lapw, and what would be a reasonable value? Do I always need to use instgen_lapw before init_lapw when starting a new case? Should I perhaps do each next M angle on top of a previously converged M angle (and save_lapw for each M angle)?

Doing separate calculations for different directions may / may not yield correct results. The proper (safe) way is to use ONE case.struct file for all cases:
i) select all directions of magnetization first.
ii) Produce (using init_so) struct files, which are the same for all cases (do not change after init_so), and use this struct file for ALL (also non-so) calculations.

This works as:
a) generate a normal struct file.
b) init -b -sp
c) init_so with the first direction, accept the generated structure
d) init_so with the second direction, accept ...
repeat this for all desired directions (or until you have a P1 structure (only identity)).
e) with this setup (no new init_lapw) execute:
runsp -ec ... and save as "non-so"
runsp -so ... and save as "so-dir_2"
restore non-so; edit case.inso and put the first direction in;
runsp -so ... and save as "so-dir_1"

In this way, you can also compare the force theorem (:SUM of the first iteration, 2 numbers!) with the scf solution.

Best,
Lukasz

--
P.Blaha
--
Peter BLAHA, Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300  FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at  WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/TC_Blaha
[Wien] Three questions
Dear All,

Sorry for a longer email, but a couple of things have accumulated. I have three questions: one on charge convergence, another on setting the limits in case.inso and case.in1c to avoid large vector files, and a last one asking for expert advice on how to converge FM+SOC calculations properly.

1. CHARGE CONVERGENCE: I know this has been discussed before, but it seems it didn't change in Wien2k_18. Here is a typical example of the charge convergence history of the last few iterations of an FM+SOC (ferromagnetic + spin-orbit coupling) calculation:

:ENERGY convergence: 0 0.0001 .000132045000
:CHARGE convergence: 0 0.001 .0027578
:ENERGY convergence: 1 0.0001 .49765000
:CHARGE convergence: 0 0.001 .0050904
:ENERGY convergence: 1 0.0001 .55765000
:CHARGE convergence: 0 0.001 .0004672
:ENERGY convergence: 1 0.0001 .22005000
:CHARGE convergence: 1 0.001 -.0001786

Something strange happens with charge convergence, is this OK?

2. LIMITS in inso and in1c files: in order to avoid large vector files I am changing the energy limits in the inso and in1c files for band structure calculations. SCF is done with default inso and in1c files, then I do save_lapw, then I edit the case.inso and case.in1c files, and then I do an FM+SOC band structure calculation:

#!/bin/bash
x lapw1 -band -up -p
x lapw1 -band -dn -p
x lapwso -up -p

I am using emin/emax -0.5 0.5 in both files (inso and in1c) without changing anything else; then I have bands from the limited energy range. I just want to make sure that this procedure is fine.

3. FM+SOC convergence: I am doing FM+SOC calculations for different in-plane magnetization M directions in a 2D Fe(001) ferromagnetic layer. Actually, an older Wien2k_14 version was not working well for this; results for generic M directions were really strange. Wien2k_18 seems much better; however, when starting things from scratch (each M angle a completely separate calculation) it seems that for some M angles the result is off, as if it didn't actually converge properly.
I am using a fairly dense mesh for the SCF (2000 k-points, 25 25 3), and -ec 0.0001 -cc 0.001. Should I maybe try the -fermit setting in init_lapw, and what would be a reasonable value? Do I always need to use instgen_lapw before init_lapw when starting a new case? Should I perhaps do each next M angle on top of a previously converged M angle (and save_lapw for each M angle)?

Best,
Lukasz
Re: [Wien] error in mBJ
Hi,

For technical reasons, the mBJ method cannot really be used for systems with vacuum. See these for more details:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg03199.html
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09181.html
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13354.html

FT

On Sunday 2018-09-09 21:51, mitra narimani wrote:
> Hello dear users,
> I have a problem with the mBJ run of a monolayer quantum well. I relax my structure and run it within the GGA approach. This process doesn't give any error and everything goes well. But when I run this monolayer within the mBJ-GGA approach, in cycles after 8 or 9 errors occur in lcore, with
> 'CORE' - NSTOP= 362 positive eigenvalue for 4D Atom: 0 La1
> 'CORE' - Try to apply a potential shift in case.inc
> in the lcore.error file, and
> STOP in MINI, FORCES small
> in the mini.error file.
> Please help me to solve these errors.
> Best regards.