[Wien] Error in parallel lapw1

2021-02-08 Thread Murat Aycibin
Hi Dr. Blaha,
Thank you for your reply.
I took a smaller number of k-points: 12 in a 5x5x3 grid (before, I had defined 100 k-points). I got the same error. My computer has 64 GB RAM and 16 cores (Intel Xeon processor). My .machines file is:

# .machines is the control file for parallel execution. Add lines like
#
#   speed:machine_name
#
# for each machine, specifying their relative speed. For MPI parallelization use
#
#   speed:machine_name:1 machine_name:1
#   lapw0:machine_name:1 machine_name:1
#
# further options are:
#
#   granularity:number (for load balancing on irregularly used machines)
#   residue:machine_name  (on shared-memory machines)
#   extrafine (to distribute the remaining k-points one after the other)
#
# granularity sets the number of files that will approximately
# be generated by each processor; this is used for load balancing.
# On very homogeneous systems, set the number to 1.
# If, after distributing the k-points to the various machines, residual
# k-points are left, they will be distributed to the residue machine_name.
#
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
granularity:1
extrafine:1
#
# Uncomment for specific OMP-parallelization (overwriting a global OMP_NUM_THREADS)
#
#omp_global:4
# or use program-specific parallelization:
#omp_lapw0:4
#omp_lapw1:4
#omp_lapw2:4
#omp_lapwso:4
#omp_dstart:4
#omp_sumpara:4
#omp_nlvdw:4

I set RKmax to 7. The error is the same as before. I do not have any idea what is wrong now.

-- 
Asst. Prof. Dr. Murat Aycibin
Van Yuzuncu Yil University
Department of Physics
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] Error in parallel lapw1

2021-02-08 Thread Peter Blaha

We still don't know much about your case.

Please modify your .machines file and use only 2 instead of 16 lines with
1:localhost
If this solves the problem, increase it to 4 or 6 (when you have 12 
k-points) or 8 (if you have more k-points).

Also uncomment
omp_global:2   (or 4)
Then you are still using all your cores, but you will need less memory.
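For a 12-k-point case on a single 16-core node, this suggestion corresponds to a .machines file along the following lines. This is a sketch, not a file from the thread; the exact number of 1:localhost lines and the omp_global value should be tuned to the k-point count and available memory:

```
# .machines: 4 k-parallel lapw1/lapw2 jobs, 4 OpenMP threads each
# (4 jobs x 4 threads = 16 cores in total, but each job needs its
#  own memory, so fewer jobs means a much smaller total footprint)
1:localhost
1:localhost
1:localhost
1:localhost
granularity:1
extrafine:1
omp_global:4
```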

On 2021-02-08 at 11:24, Murat Aycibin wrote:




--
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at
-


Re: [Wien] Error in parallel lapw1

2021-02-08 Thread Gavin Abo
You might also check what OMP_NUM_THREADS is set to on your system in 
.bashrc or .cshrc.


For example, on my Ubuntu system, I do:

username@computername:~/Desktop$ grep OMP_NUM_THREADS ~/.bashrc
export OMP_NUM_THREADS=1

As you can see I'm using a different value than the default that would 
have been set by userconfig_lapw during installation of WIEN2k.  I 
believe the default value is OMP_NUM_THREADS=4.
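Whichever rc file set it, a quick way to see the value an interactive shell will actually hand to WIEN2k is to expand the variable directly (a generic shell check, not a WIEN2k command):

```shell
# Print the OMP_NUM_THREADS visible to the current shell;
# "unset" means the OpenMP runtime's own default applies.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
```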


Is your Xeon processor an E5-2698 v3?  If it is, the following link lists 
"# of Threads" as 32:


https://ark.intel.com/content/www/us/en/ark/products/81060/intel-xeon-processor-e5-2698-v3-40m-cache-2-30-ghz.html

With your .machines file requesting 16 cores, if your OMP_NUM_THREADS is 
4, you would be requesting 16 cores * 4 threads/core = 64 threads.  That 
would be 32 threads (= 64 requested threads - 32 processor threads) 
more than your processor can handle at one time.
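The arithmetic above can be sketched in a few lines of Python. The numbers are hard-coded from this thread (16 jobs, 4 threads, a 32-thread Xeon); nothing is read from any WIEN2k file:

```python
import os

k_parallel_jobs = 16   # one "1:localhost" line per lapw1 job in .machines
omp_threads = 4        # assumed OMP_NUM_THREADS per job
requested = k_parallel_jobs * omp_threads   # 16 * 4 = 64 threads requested

# Logical CPUs the OS reports (32 on an E5-2698 v3 with hyper-threading).
hardware_threads = os.cpu_count()
print(f"requested {requested} threads on {hardware_threads} hardware threads")
if hardware_threads and requested > hardware_threads:
    print(f"oversubscribed by {requested - hardware_threads} threads")
```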


If you are using a different processor, you would have to look on Intel's 
website to find out the "# of Threads" your particular processor can handle.


OMP_NUM_THREADS can of course be overridden by using omp_global in 
the .machines file.


If the problem comes from a memory error, as previously discussed as 
a possibility in this post:


https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg20807.html

Then, you might want to check /var/log.  The following post might help 
with that:


https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19703.html

You might also check what parallel_options are set to with the command:

cat $WIENROOT/parallel_options

If the problem is related to passwordless login, one of the posts in 
the mailing list archive that might help is:


https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg02295.html


On 2/8/2021 5:33 AM, Peter Blaha wrote:
