Re: [Wien] Installation with MPI and GNU compilers

2018-05-03 Thread Pavel Ondračka
t be possible to
> > call it directly similarly to the VML, however it would introduce a
> > lot of #ifdef LIBMVEC to the code which I guess is not a good idea.
> > I would like to have this working better out of the box so I'll
> > keep looking for some solution which would not require extensive
> > changes in the code or siteconfig script. Dunno if the authors are
> > accepting patches anyway...
> > 
> > Best regards
> > Pavel
> >  
> > > Ciao 
> > > Gerhard 
> > > 
> > > DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy: 
> > > "I think the problem, to be quite honest with you, 
> > > is that you have never actually known what the question is." 
> > > 
> > >  
> > > Dr. Gerhard H. Fecher 
> > > Institut of Inorganic and Analytical Chemistry 
> > > Johannes Gutenberg - University 
> > > 55099 Mainz 
> > > and 
> > > Max Planck Institute for Chemical Physics of Solids 
> > > 01187 Dresden 
> > >  
> > > Von: Pavel Ondračka [pavel.ondra...@email.cz] 
> > > Gesendet: Mittwoch, 2. Mai 2018 12:05 
> > > An: Fecher, Gerhard 
> > > Betreff: Re: [Wien] Installation with MPI and GNU compilers 
> > > 
> > > I'm using private answer since this might be getting too
> > > technical for 
> > > the list and in fact not interesting for majority of users... 
> > > 
> > > Fecher, Gerhard píše v St 02. 05. 2018 v 09:00 +: 
> > > > I never checked that: does the -DINTEL_VML switch correspond to
> > > the 
> > > > VML library routines of MKL 
> > > > or to the 
> > > > SVML library routines of the compiler 
> > > 
> > > The lapw1 calls directly the VML library, for example the vdcos,
> > > vdsin 
> > > functions, but I have not checked the rest of Wien2k. 
> > > 
> > > > this makes a difference, the svml routines are automatically
> > > invoked 
> > > > by the INTEL compiler if one uses -O2 optimization or higher. 
> > > > (check also the usage of the switches -vec, -no-vec, -vec-
> > > report) 
> > > > 
> > > > The VML routines of the MKL make only sense for appropriate
> > > sizes of 
> > > > the vectors, otherwise, they may even slow down the program
> > > (how much 
> > > > might also depend on threads etc.). 
> > > 
> > > The common usage of the VML in Wien2k is to call the VML
> > > functions with 
> > > a _large_ array as an argument. So if I understand it correctly
> > > the 
> > > vectorization is done inside the VML and the VML chooses the
> > > best 
> > > intrinsic. Since the arrays are large, there is a speedup in all
> > > cases. 
> > > 
> > > BTW are you sure the -O2 switch alone will give you the svml 
> > > intrinsic? IMO the svml intrinsic have different accuracy (might
> > > not be 
> > > strictly IEEE compliant as compared to the scalar variants) so I
> > > would 
> > > expect you need to specify it explicitly with some additional
> > > flag that 
> > > you are OK with this (e.g. for GCC you need the -ffast-math
> > > switch to 
> > > get the vectorized sse,avx goniometric fuctions from the
> > > libmvec). 
> > > 
> > > > A note (for the INTEL Fortran): 
> > > > I vaguely remember that the -DINTEL_VML switch did not bring
> > > any 
> > > > better performance, at that time one needed to give the -lsvml
> > > (with 
> > > > path to the compiler libs) explicitely. 
> > > > 
> > > > Ciao 
> > > > Gerhard 
> > > > 
> > > Best regards 
> > > Pavel 


Re: [Wien] Installation with MPI and GNU compilers

2018-05-02 Thread Rui Costa
 Changing "#if defined (INTEL_VML)" to "#if defined
(INTEL_VML_HAMILT)" in SRC_lapw1/hamilt.F
really improved Hamilt but seems like DIAG is a little slower. In my
pc (Intel(R)
Core(TM) i7-2630QM CPU @ 2.00GHz, 4 cores, 8 Gb RAM) the benchmark tests
went from:

Simulation                    Total (CPU/Wall)  Hamilt (CPU/Wall)  HNS (CPU/Wall)  DIAG (CPU/Wall)

Serial     1 kpt, 1 thread    82/82             41/41              6/6             35/35
           1 kpt, 2 thread    88/66             41/41              7/4             40/21
           1 kpt, 4 thread    112/61            41/41              9/3             64/17
kparallel  2 kpts, 1 th.      83/83             42/42              6/6             35/35
           2 kpts, 2 th.      117/82            44/44              8/4             65/34
           4 kpts, 1 th.      126/126           49/49              9/9             68/68
MPI        1 kpt, 1 mpi       1078/1080         618/620            77/77           383/383
           1 kpt, 2 mpi       1014/1112         392/394            104/104         518/618  <- pc stopped for a few minutes
           1 kpt, 4 mpi       699/701           210/211            87/88           402/403

to

Simulation                    Total (CPU/Wall)  Hamilt (CPU/Wall)  HNS (CPU/Wall)  DIAG (CPU/Wall)

Serial     1 kpt, 1 thread    50/50             8/8                6/6             36/36
           1 kpt, 2 thread    59/35             8/8                8/4             43/23
           1 kpt, 4 thread    89/30             8/8                10/3            71/19
kparallel  2 kpts, 1 th.      56/56             9/9                6/6             41/41
           2 kpts, 2 th.      86/50             9/9                9/5             68/36
           4 kpts, 1 th.      126/126           10/10              10/10           73/73
MPI        1 kpt, 1 mpi       540/541           78/79              77/77           385/385
           1 kpt, 2 mpi       695/699           81/83              96/96           518/520  <- ran this simulation twice, don't understand why it is slower than the above
           1 kpt, 4 mpi       509/511           45/46              81/81           383/384


Now my only remaining problem seems to be that the "-it" flag is not working.

Best regards,
Rui Costa.



Re: [Wien] Installation with MPI and GNU compilers

2018-05-02 Thread Rui Costa
I did the benchmark test with -DINTEL_VML_HAMILT, but since my email
was too big it was awaiting moderation, so I'll split it:

I added the print statement to the inilpw.f file and I get the same
results, i.e., it prints only:

iunit = 4
iunit = 5
iunit = 6


Even when I run the simulation with "run_lapw" it only prints those; the
difference is that it finishes without errors. From what I understood, the
error that I get probably appears because gfortran changed this behavior in an
update.


Re: [Wien] Installation with MPI and GNU compilers

2018-05-02 Thread Laurence Marks
When you say "as fast", do you mean for single-core machines or for multicore
with threads and/or mpi? Almost everything slow in Wien2k is
lapack/scalapack/elpa. For most parts of the code, with 30-200 atom problems,
ifort is good but not as critical as the libraries and the network.

On Wed, May 2, 2018, 16:05 Pavel Ondračka <pavel.ondra...@email.cz> wrote:

>
> -- Původní e-mail --
> Od: Fecher, Gerhard <fec...@uni-mainz.de>
> Komu: Pavel Ondračka <pavel.ondra...@email.cz>,
> wien@zeus.theochem.tuwien.ac.at <wien@zeus.theochem.tuwien.ac.at>
> Datum: 2. 5. 2018 16:08:06
> Předmět: AW: [Wien] Installation with MPI and GNU compilers
>
> Dear Pavel,
> maybe it's better to ask Laurence, seems he was writing the VML things.
>
> I didn't look into the code within the last years, what I found on a fast
> look is:
>
> The only place where the INTEL_VML is used any longer seems to be in
> Hamilt.f of LAPW1
> I found that it is commented in all other cases where it was once used.
>
> If you don't use INTEL_VML, the INTEL ifort will vectorice the loops in
> vectf.f of LAPW1 (see code in Hamilt.f that calls it)
> (as I mentioned, maybe one has to link the libsvml explicitely
>
>
>
> BTW is svml part of the MKL or do you need the ifort for that?
>
>
> For example
> -O2 -xHost -qopt-report=1 -qopt-report-phase=vec
> will show you which loops were vectorized
>
>
> Indeed, if I add the -O2 and -xHost to the default Wien2k flags (with
> ifort and MKL) there is no performance hit if I remove the -DINTEL_VML.
>
>
> I could not see that the svml has a reduced accuracy, however, you can set
> the performance/accuracy level in the VML.
> What you can do is to set a threshhold for the loop size (similar to
> unroll), might need some short study of the manual.
>
>
> Interesting, I will try to run some tests for the speed and accuracy of
> some basic trigonometric functions for ifort vs gfortran and standard glibc
> vs libmvec vs VML vs svml.
>
>
> I could not see that in W2kinit.F a threshold for the loops (size of the
> arrays) was set,
> only the precision was set there for the INTEL_VML script, however,
> I guess that Laurence used it where only large arrays appeared.
>
> NB: I enjoy more questions about how to increase the speed or how to
> improve the code.
>
>
> Well,  I do believe that the code is well optimized when you have the
> ifort + MKL, however the rest of the options is a somewhat worse.
>
>
> Since you can nowadays get the MKL library for free (but not the ifort)
> there is the combination of gfortran + MKL, which does not have any default
> config  and is slow as was reported by Rui in beginning of the thread. I'm
> quite sure this combination can be made almost as fast as the ifort + MKL
> (either by somewhat fixing the INTEL_VML define to fix the missing ifcore
> problem, or possibly by using the -mveclibabi=svml gfortran switch or some
> other trick). I'm not sure how many people have this setup though.
>
>
> The most problematic is the gfortran + OpenBLAS combination, where I was
> not able to force gfortran use the vectorized (SIMD) math. It works with C
> code (which is why my approach to making lapw1 fast includes porting the
> vectf.f to C) but not with Fortran. It is possible there is some way to
> make this work but I had no luck so far. The libmvec has a public interface
> so it might be possible to call it directly similarly to the VML, however
> it would introduce a lot of #ifdef LIBMVEC to the code which I guess is not
> a good idea. I would like to have this working better out of the box so
> I'll keep looking for some solution which would not require extensive
> changes in the code or siteconfig script. Dunno if the authors are
> accepting patches anyway...
>
>
> Best regards
>
> Pavel
>
>
>
>
> Ciao
> Gerhard
>
> DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
> "I think the problem, to be quite honest with you,
> is that you have never actually known what the question is."
>
> 
> Dr. Gerhard H. Fecher
> Institut of Inorganic and Analytical Chemistry
> Johannes Gutenberg - University
> 55099 Mainz
> and
> Max Planck Institute for Chemical Physics of Solids
> 01187 Dresden
> 
> Von: Pavel Ondračka [pavel.ondra...@email.cz]
> Gesendet: Mittwoch, 2. Mai 2018 12:05
> An: Fecher, Gerhard
> Betreff: Re: [Wien] Installation with MPI and GNU compilers
>
> I'm using private answer since this might be getting too technical for
> the list and in fact not interesting for majority of users...
>
> Fecher, Gerhard píše v St 02. 05.

Re: [Wien] Installation with MPI and GNU compilers

2018-05-02 Thread Pavel Ondračka

-- Original e-mail --
From: Fecher, Gerhard <fec...@uni-mainz.de>
To: Pavel Ondračka <pavel.ondra...@email.cz>, wien@zeus.theochem.tuwien.ac.at <wien@zeus.theochem.tuwien.ac.at>
Date: 2. 5. 2018 16:08:06
Subject: AW: [Wien] Installation with MPI and GNU compilers
"Dear Pavel,
maybe it's better to ask Laurence, seems he was writing the VML things. 

I didn't look into the code within the last years, what I found on a fast 
look is:

The only place where the INTEL_VML is used any longer seems to be in Hamilt.
f of LAPW1
I found that it is commented in all other cases where it was once used. 

If you don't use INTEL_VML, the INTEL ifort will vectorice the loops in 
vectf.f of LAPW1 (see code in Hamilt.f that calls it)
(as I mentioned, maybe one has to link the libsvml explicitely)"

BTW is svml part of the MKL or do you need the ifort for that?

"
For example
-O2 -xHost -qopt-report=1 -qopt-report-phase=vec
will show you which loops were vectorized"



Indeed, if I add the -O2 and -xHost to the default Wien2k flags (with ifort
and MKL) there is no performance hit if I remove the -DINTEL_VML.



"I could not see that the svml has a reduced accuracy, however, you can set
the performance/accuracy level in the VML.
What you can do is to set a threshhold for the loop size (similar to
unroll), might need some short study of the manual. "



Interesting, I will try to run some tests for the speed and accuracy of some
basic trigonometric functions for ifort vs gfortran and standard glibc vs 
libmvec vs VML vs svml.
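
For illustration, a minimal sketch of such a comparison for the VML part,
assuming MKL is linked so that vdcos is available (the exact numbers are of
course machine dependent; this is not WIEN2k code):

  program trig_bench
    implicit none
    integer, parameter :: n = 5000000
    double precision, allocatable :: a(:), y1(:), y2(:)
    double precision :: t0, t1, t2
    integer :: i
    allocate(a(n), y1(n), y2(n))
    do i = 1, n
       a(i) = 1.0d-6 * i                 ! large test array, as in the hamilt case
    end do
    call cpu_time(t0)
    do i = 1, n
       y1(i) = cos(a(i))                 ! scalar path: compiler / libm cosine
    end do
    call cpu_time(t1)
    call vdcos(n, a, y2)                 ! MKL VML path: whole array in one call
    call cpu_time(t2)
    print *, 'loop cos:', t1 - t0, ' s   vdcos:', t2 - t1, ' s'
    print *, 'max difference:', maxval(abs(y1 - y2))   ! crude accuracy check
  end program trig_bench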

"
I could not see that in W2kinit.F a threshold for the loops (size of the 
arrays) was set,
only the precision was set there for the INTEL_VML script, however,
I guess that Laurence used it where only large arrays appeared.

NB: I enjoy more questions about how to increase the speed or how to improve
the code. "



Well, I do believe that the code is well optimized when you have ifort
+ MKL; however, the rest of the options are somewhat worse.




Since you can nowadays get the MKL library for free (but not ifort),
there is the combination of gfortran + MKL, which does not have any default
config and is slow, as was reported by Rui at the beginning of the thread. I'm
quite sure this combination can be made almost as fast as ifort + MKL
(either by somehow fixing the INTEL_VML define to work around the missing ifcore
problem, or possibly by using the -mveclibabi=svml gfortran switch or some
other trick). I'm not sure how many people have this setup, though.





The most problematic is the gfortran + OpenBLAS combination, where I was not
able to force gfortran to use the vectorized (SIMD) math. It works with C code
(which is why my approach to making lapw1 fast includes porting vectf.f
to C) but not with Fortran. It is possible there is some way to make this
work, but I have had no luck so far. The libmvec has a public interface, so it
might be possible to call it directly, similarly to the VML; however, it would
introduce a lot of #ifdef LIBMVEC to the code, which I guess is not a good
idea. I would like to have this working better out of the box, so I'll keep
looking for some solution which would not require extensive changes in the
code or the siteconfig script. I don't know if the authors are accepting
patches anyway...





Best regards

Pavel


 
"
Ciao
Gerhard

DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."


Dr. Gerhard H. Fecher
Institut of Inorganic and Analytical Chemistry
Johannes Gutenberg - University
55099 Mainz
and
Max Planck Institute for Chemical Physics of Solids
01187 Dresden

From: Pavel Ondračka [pavel.ondra...@email.cz]
Sent: Wednesday, 2 May 2018 12:05
To: Fecher, Gerhard
Subject: Re: [Wien] Installation with MPI and GNU compilers

I'm using private answer since this might be getting too technical for
the list and in fact not interesting for majority of users...

Fecher, Gerhard wrote on Wed, 02. 05. 2018 at 09:00:
> I never checked that: does the -DINTEL_VML switch correspond to the
> VML library routines of MKL
> or to the
> SVML library routines of the compiler

The lapw1 calls directly the VML library, for example the vdcos, vdsin
functions, but I have not checked the rest of Wien2k.

> this makes a difference, the svml routines are automatically invoked
> by the INTEL compiler if one uses -O2 optimization or higher.
> (check also the usage of the switches -vec, -no-vec, -vec-report)
>
> The VML routines of the MKL make only sense for appropriate sizes of
> the vectors, otherwise, they may even slow down the program (how much 
> might also depend on threads etc.).

The common usage of the VML in Wien2k is to call the VML functions with 
a _larg

Re: [Wien] Installation with MPI and GNU compilers

2018-05-02 Thread Laurence Marks
In fact Peter added the vector code in lapw1, although I added it to aim
and lapw5. I did the W2kinit with some help.

I suspect I probably use the -DINTEL_VML parameter in W2kinit and perhaps
aim/lapw5 a bit sloppily, and it could be generalized. For instance it
makes sense to modify the code so -DOPENBLAS or similar is set and then
have some compile time #ifdef statements.

However, this gets to be somewhat tricky as I don't have access to all
compilers (and I suspect Peter does not either).

Also, W2kinit does some important things such as setting up some error handlers
and setting ulimit. (If you go back a few years you will find that every third
email on the list was about ulimit issues!) Setting this for other systems can
be tricky. I think we resolved the Mac issues, but they seem to recur.

And...one has to worry about compatibility and portability. While Fortran
is standard, C is less so, and system calls embedded in compilers can
change. Plus, gfortran is in a state of flux. When I recently tested the
mixer with it, I noticed that it gave a compile-time warning that DO loops
with floating-point variables are a "deleted" feature. (Fortunately the
mixer still seems to work.)
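
For reference, a tiny standalone example of the construct in question (a DO
loop with a real control variable), which per the message above still compiles
but is flagged as a deleted feature; with a strict -std= setting gfortran
rejects it:

  program real_do_loop
    implicit none
    real :: x, s
    s = 0.0
    do x = 0.0, 1.0, 0.1     ! real loop control variable: a deleted feature in modern Fortran
       s = s + x
    end do
    print *, 's =', s
  end program real_do_loop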

On Wed, May 2, 2018 at 9:08 AM, Fecher, Gerhard <fec...@uni-mainz.de> wrote:

> Dear Pavel,
> maybe it's better to ask Laurence, seems he was writing the VML things.
>
> I didn't look into the code within the last years, what I found on a fast
> look is:
>
> The only place where the INTEL_VML is used  any longer seems to be in
> Hamilt.f of LAPW1
> I found that it is commented in all other cases where it was once used.
>
> If you don't use INTEL_VML, the INTEL ifort will vectorice the loops in
> vectf.f of LAPW1 (see code in Hamilt.f that calls it)
> (as I mentioned, maybe one has to link the libsvml explicitely)
>
> For example
> -O2 -xHost -qopt-report=1 -qopt-report-phase=vec
> will show you which loops were vectorized
>
> I could not see that the svml has a reduced accuracy, however, you can set
> the performance/accuracy level in the VML.
> What you can do is to set a threshhold for the loop size (similar to
> unroll), might need some short study of the manual.
>
> I could not see that in W2kinit.F a threshold for the loops (size of the
> arrays) was set,
> only the precision was set there for the INTEL_VML script, however,
> I guess that Laurence used it where only large arrays appeared.
>
> NB: I enjoy more questions about how to increase the speed or how to
> improve the code.
>
>
> Ciao
> Gerhard
>
> DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
> "I think the problem, to be quite honest with you,
> is that you have never actually known what the question is."
>
> 
> Dr. Gerhard H. Fecher
> Institut of Inorganic and Analytical Chemistry
> Johannes Gutenberg - University
> 55099 Mainz
> and
> Max Planck Institute for Chemical Physics of Solids
> 01187 Dresden
> ________________
> Von: Pavel Ondračka [pavel.ondra...@email.cz]
> Gesendet: Mittwoch, 2. Mai 2018 12:05
> An: Fecher, Gerhard
> Betreff: Re: [Wien] Installation with MPI and GNU compilers
>
> I'm using private answer since this might be getting too technical for
> the list and in fact not interesting for majority of users...
>
> Fecher, Gerhard píše v St 02. 05. 2018 v 09:00 +:
> > I never checked that: does the  -DINTEL_VML switch correspond to the
> > VML library routines of MKL
> > or to the
> > SVML library routines of the compiler
>
> The lapw1 calls directly the VML library, for example the vdcos, vdsin
> functions, but I have not checked the rest of Wien2k.
>
> > this makes a difference, the svml routines are automatically invoked
> > by the INTEL compiler if one uses -O2 optimization or higher.
> > (check also the usage of the switches -vec, -no-vec, -vec-report)
> >
> > The VML routines of the MKL make only sense for appropriate sizes of
> > the vectors, otherwise, they may even slow down the program (how much
> > might also depend on threads etc.).
>
> The common usage of the VML in Wien2k is to call the VML functions with
>  a _large_ array as an argument. So if I understand it correctly the
> vectorization is done inside the VML and the VML chooses the best
> intrinsic. Since the arrays are large, there is a speedup in all cases.
>
> BTW are you sure the -O2 switch alone will give you the svml
> intrinsic? IMO the svml intrinsic have different accuracy (might not be
> strictly IEEE compliant as compared to the scalar variants) so I would
> expect you need to specify it explicitly with some additional flag that
> you are OK with this (e.g. for GCC you need the -ffast-math switch 

Re: [Wien] Installation with MPI and GNU compilers

2018-05-02 Thread Fecher, Gerhard
Dear Pavel,
maybe it's better to ask Laurence; it seems he wrote the VML things.

I didn't look into the code within the last few years; what I found on a quick
look is:

The only place where INTEL_VML is still used seems to be in Hamilt.f
of LAPW1;
I found that it is commented out in all other places where it was once used.

If you don't use INTEL_VML, the Intel ifort will vectorize the loops in vectf.f
of LAPW1 (see the code in Hamilt.f that calls it)
(as I mentioned, maybe one has to link libsvml explicitly)

For example 
-O2 -xHost -qopt-report=1 -qopt-report-phase=vec
will show you which loops were vectorized

I could not see that the svml has reduced accuracy; however, you can set the
performance/accuracy level in the VML.
What you can do is set a threshold for the loop size (similar to unroll); this
might need some short study of the manual.
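
For illustration, a minimal sketch of setting the VML accuracy level from
Fortran; this assumes MKL's mkl_vml.fi include file and its VMLSETMODE
function, and it is not the W2kinit.F code:

  program vml_mode_demo
    implicit none
    include 'mkl_vml.fi'
    integer :: oldmode
    oldmode = vmlsetmode(VML_LA)   ! VML_LA = low accuracy (fastest), VML_HA = high accuracy
    print *, 'previous VML mode:', oldmode
  end program vml_mode_demo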

I could not see that in W2kinit.F a threshold for the loops (size of the 
arrays) was set,
only the precision was set there for the INTEL_VML script, however,
I guess that Laurence used it where only large arrays appeared.

NB: I enjoy more questions about how to increase the speed or how to improve 
the code.
   

Ciao
Gerhard

DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."


Dr. Gerhard H. Fecher
Institut of Inorganic and Analytical Chemistry
Johannes Gutenberg - University
55099 Mainz
and
Max Planck Institute for Chemical Physics of Solids
01187 Dresden

From: Pavel Ondračka [pavel.ondra...@email.cz]
Sent: Wednesday, 2 May 2018 12:05
To: Fecher, Gerhard
Subject: Re: [Wien] Installation with MPI and GNU compilers

I'm using private answer since this might be getting too technical for
the list and in fact not interesting for majority of users...

Fecher, Gerhard wrote on Wed, 02. 05. 2018 at 09:00:
> I never checked that: does the  -DINTEL_VML switch correspond to the
> VML library routines of MKL
> or to the
> SVML library routines of the compiler

lapw1 calls the VML library directly, for example the vdcos and vdsin
functions, but I have not checked the rest of Wien2k.

> this makes a difference, the svml routines are automatically invoked
> by the INTEL compiler if one uses -O2 optimization or higher.
> (check also the usage of the switches -vec, -no-vec, -vec-report)
>
> The VML routines of the MKL make only sense for appropriate sizes of
> the vectors, otherwise, they may even slow down the program (how much
> might also depend on threads etc.).

The common usage of the VML in Wien2k is to call the VML functions with
 a _large_ array as an argument. So if I understand it correctly the
vectorization is done inside the VML and the VML chooses the best
intrinsic. Since the arrays are large, there is a speedup in all cases.
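
A minimal sketch of what such a call looks like, assuming MKL is linked (this
is only an illustration, not the hamilt.F code); vdcos/vdsin take the array
length, the input array and the output array:

  program vml_large_array
    implicit none
    integer, parameter :: n = 1000000
    double precision, allocatable :: arg(:), c(:), s(:)
    integer :: i
    allocate(arg(n), c(n), s(n))
    do i = 1, n
       arg(i) = 1.0d-3 * i
    end do
    call vdcos(n, arg, c)    ! vectorized cosine of the whole array in one call
    call vdsin(n, arg, s)    ! vectorized sine
    print *, c(1), s(1)
  end program vml_large_array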

BTW are you sure the -O2 switch alone will give you the svml
intrinsics? IMO the svml intrinsics have different accuracy (they might not be
strictly IEEE compliant compared to the scalar variants), so I would expect
that you need to state explicitly, with some additional flag, that you are OK
with this (e.g. for GCC you need the -ffast-math switch to get the vectorized
SSE/AVX trigonometric functions from libmvec).

> A note (for the INTEL Fortran):
> I vaguely remember that the -DINTEL_VML switch did not bring any
> better performance, at that time one needed to give the -lsvml (with
> path to the compiler libs) explicitely.
>
> Ciao
> Gerhard
>
Best regards
Pavel


Re: [Wien] Installation with MPI and GNU compilers

2018-05-02 Thread Fecher, Gerhard
I never checked that: does the -DINTEL_VML switch correspond to the
VML library routines of MKL
or to the
SVML library routines of the compiler?
This makes a difference: the svml routines are automatically invoked by the
Intel compiler if one uses -O2 optimization or higher.
(check also the usage of the switches -vec, -no-vec, -vec-report)

The VML routines of the MKL only make sense for appropriate sizes of the
vectors; otherwise, they may even slow down the program (how much might also
depend on threads etc.).

A note (for the Intel Fortran):
I vaguely remember that the -DINTEL_VML switch did not bring any better
performance; at that time one needed to give -lsvml (with the path to the
compiler libs) explicitly.

Ciao
Gerhard

DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."


Dr. Gerhard H. Fecher
Institut of Inorganic and Analytical Chemistry
Johannes Gutenberg - University
55099 Mainz
and
Max Planck Institute for Chemical Physics of Solids
01187 Dresden

From: Wien [wien-boun...@zeus.theochem.tuwien.ac.at] on behalf of Pavel
Ondračka [pavel.ondra...@email.cz]
Sent: Wednesday, 2 May 2018 10:30
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] Installation with MPI and GNU compilers

Rui Costa wrote on Mon, 30. 04. 2018 at 22:24 +0100:
> I have the VML libraries, i.e., the libmkl_vml_* files are in
> $MKLROOT/lib/intel_64, but when I tried compiling with -DINTEL_VML it
> gave me the error "Fatal Error: Can't open module file ‘ifcore.mod’
> for reading at (1): No such file or directory", and this file only
> comes with the compilers.

Yeah, I have not realized that the INTEL_VML ifdef also guards the use
of ifcore stuff, IMO this could be improved by using two defines, one
for the actual VML calls (which would be defined when MKL is present)
and one for the ifcore library calls (which would be defined only when
also the ifort is detected).

BTW as a quick hack to make the lapw1 fast, just change all the
#if defined (INTEL_VML)
lines in SRC_lapw1/hamilt.F
to
#if defined (INTEL_VML_HAMILT)
and add the -DINTEL_VML_HAMILT flag
this should be all that is needed to use the VML in lapw1

> To use the libmvec library I would have to change a few lines of code
> in the mkl libraries and that is beyond my computer skills.

Actually no changes to the MKL are required. The least obtrusive way as
described in https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.
at/msg16159.html only consist of copying single c file to SRC_lapw1
compiling it by hand and than rerunning make to link lapw1 with the new
object file (i.e. no changes to any Wien2k files are needed). However
the VML way is easier when you already have the MKL set up.

Best regards,
Pavel

> Best regards,
> Rui Costa.
>
> On 30 April 2018 at 20:57, Pavel Ondračka <pavel.ondra...@email.cz>
> wrote:
> > -- Původní e-mail --
> > Od: Rui Costa <ruicosta@gmail.com>
> > Komu: A Mailing list for WIEN2k users <w...@zeus.theochem.tuwien.ac
> > .at>
> > Datum: 30. 4. 2018 19:39:44
> > Předmět: Re: [Wien] Installation with MPI and GNU compilers
> >
> > > I was able to install wien2k with gfortran+MKL. Apparently the
> > > MKL libraries are free [https://software.intel.com/en-us/performa
> > > nce-libraries] but not the compilers.
> > >
> > > While doing the benchmark tests we noticed that during the Hamilt
> > > there was a huge difference between this and an ifort+MKL
> > > compilation, and as Pavel said, this comes from the VML
> > > functions. This is not the case during DIAG because while the
> > > DIAG belongs to MKL, Hamilt is from wien2k. I then tried to
> > > compile with these VML functions but I couldn't because I need an
> > > ifcore.mod file that comes with intel compilers I think, at least
> > > it is not in the free MKL version.
> > >
> > > Do you have any recommendation about the compilation options that
> > > could better optimize wien2k?
> >
> > Dear Rui,
> >
> > so to make this clear, your MKL comes without the VML, or are you
> > just not able to use/link them? I do not understand the part with
> > the ifcore.mod much, however the VML paths are guarded with some
> > ifdef magic, try adding  -DINTEL_VML to your flags (FOPT, FPOPT)
> > and see if it helps.
> >
> > The second option is to use the libmvec library (provided you have
> > fairly new glibc) but it is unsupported by the Wien2k team and
> > probably not tested by many people except me. If you cannot get the
> > VML working, look for o

Re: [Wien] Installation with MPI and GNU compilers

2018-05-02 Thread Pavel Ondračka
Rui Costa wrote on Mon, 30. 04. 2018 at 22:24 +0100:
> I have the VML libraries, i.e., the libmkl_vml_* files are in
> $MKLROOT/lib/intel_64, but when I tried compiling with -DINTEL_VML it
> gave me the error "Fatal Error: Can't open module file ‘ifcore.mod’
> for reading at (1): No such file or directory", and this file only
> comes with the compilers.

Yeah, I had not realized that the INTEL_VML ifdef also guards the use
of the ifcore stuff. IMO this could be improved by using two defines: one
for the actual VML calls (which would be defined when MKL is present)
and one for the ifcore library calls (which would be defined only when
ifort is also detected).

BTW, as a quick hack to make lapw1 fast, just change all the
#if defined (INTEL_VML)
lines in SRC_lapw1/hamilt.F
to
#if defined (INTEL_VML_HAMILT)
and add the -DINTEL_VML_HAMILT flag;
this should be all that is needed to use the VML in lapw1.
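
A hypothetical standalone illustration of the resulting guard (it mirrors the
pattern, it is not the hamilt.F source); in a preprocessed free-form file
(e.g. compiled with gfortran -cpp), the VML call is used only when
-DINTEL_VML_HAMILT is given and MKL is linked, with a plain loop as the
fallback:

  program vml_guard
    implicit none
    integer, parameter :: n = 100000
    double precision, allocatable :: arg(:), coskl(:)
    integer :: i
    allocate(arg(n), coskl(n))
    do i = 1, n
       arg(i) = 1.0d-4 * i
    end do
#if defined (INTEL_VML_HAMILT)
    call vdcos(n, arg, coskl)       ! MKL VML path, enabled by -DINTEL_VML_HAMILT
#else
    do i = 1, n
       coskl(i) = cos(arg(i))       ! portable fallback left to the compiler
    end do
#endif
    print *, coskl(n)
  end program vml_guard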

> To use the libmvec library I would have to change a few lines of code
> in the mkl libraries and that is beyond my computer skills.

Actually, no changes to the MKL are required. The least obtrusive way, as
described in https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16159.html,
only consists of copying a single C file to SRC_lapw1, compiling it by hand,
and then rerunning make to link lapw1 with the new object file (i.e. no
changes to any Wien2k files are needed). However, the VML way is easier when
you already have the MKL set up.

Best regards,
Pavel

> Best regards,
> Rui Costa.
> 
> On 30 April 2018 at 20:57, Pavel Ondračka <pavel.ondra...@email.cz>
> wrote:
> > -- Původní e-mail --
> > Od: Rui Costa <ruicosta@gmail.com>
> > Komu: A Mailing list for WIEN2k users <w...@zeus.theochem.tuwien.ac
> > .at>
> > Datum: 30. 4. 2018 19:39:44
> > Předmět: Re: [Wien] Installation with MPI and GNU compilers
> > 
> > > I was able to install wien2k with gfortran+MKL. Apparently the
> > > MKL libraries are free [https://software.intel.com/en-us/performa
> > > nce-libraries] but not the compilers.
> > > 
> > > While doing the benchmark tests we noticed that during the Hamilt
> > > there was a huge difference between this and an ifort+MKL
> > > compilation, and as Pavel said, this comes from the VML
> > > functions. This is not the case during DIAG because while the
> > > DIAG belongs to MKL, Hamilt is from wien2k. I then tried to
> > > compile with these VML functions but I couldn't because I need an
> > > ifcore.mod file that comes with intel compilers I think, at least
> > > it is not in the free MKL version.
> > > 
> > > Do you have any recommendation about the compilation options that
> > > could better optimize wien2k?
> > 
> > Dear Rui,
> > 
> > so to make this clear, your MKL comes without the VML, or are you
> > just not able to use/link them? I do not understand the part with
> > the ifcore.mod much, however the VML paths are guarded with some
> > ifdef magic, try adding  -DINTEL_VML to your flags (FOPT, FPOPT)
> > and see if it helps. 
> > 
> > The second option is to use the libmvec library (provided you have
> > fairly new glibc) but it is unsupported by the Wien2k team and
> > probably not tested by many people except me. If you cannot get the
> > VML working, look for older emails discussing libmvec or contact me
> > privately and I can give you some pointers. 
> > 
> > No idea about the -it problem though.
> > 
> > Best regards
> > Pavel


Re: [Wien] Installation with MPI and GNU compilers

2018-05-01 Thread Gavin Abo

Using:

64 bit Ubuntu 16.04.4 LTS
WIEN2k 17.1 (with the siteconfig, libxc, and gfortran patches [ 
https://github.com/gsabo/WIEN2k-Patches/tree/master/17.1 ])

username@computername:~$ gfortran --version
GNU Fortran (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609

In SRC_lapw1/inilpw.f, I added a print statement after line 196:

 READ (1,*,END=20,ERR=960) IUNIT,FNAME,STATUS,FORM,IRECL
 print*,'iunit = ',IUNIT

then recompiled using siteconfig and executed "run_lapw -it".

For ifort, it prints out all units for me as expected in the case.dayfile:

iunit = 4
iunit = 5
iunit = 6
...
iunit = 200

With gfortran, it prints only the first three units in the case.dayfile 
for me from the lapw1.def file:


iunit = 4
iunit = 5
iunit = 6

Does the same thing happen for you?  Part of the problem might be that the
IUNIT of 200 is never read, such that it is not opened on line 214 of
inilpw.f, because of a gfortran issue with the READ statement on line
196.  Changing the file iunits 5 and 6 to, say, 65 and 66 in x_lapw and
all the source files in SRC_lapw1 seems to get the print statement that
was added on line 197 to produce the same output as ifort.
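
A simplified, hypothetical version of that read loop (not the actual inilpw.f
code), just to show where a compiler difference in the list-directed READ
would cut the unit list short; it assumes a lapw1.def file in the working
directory:

  program def_read_demo
    implicit none
    integer :: iunit, irecl, ios
    character(len=80) :: fname, status, form
    open(1, file='lapw1.def', status='old')
    do
       read(1, *, iostat=ios) iunit, fname, status, form, irecl
       if (ios /= 0) exit     ! EOF is expected here; a parse error would also end the loop early
       print *, 'iunit = ', iunit
    end do
    close(1)
  end program def_read_demo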


However, an empty case.storeHinv_proc_0 file is still produced that 
results in that error on line 140 of the file SRC_lapw1/jacdavblock.F.


Before the write(200) statement on line 140 in jacdavblock.F, there is a 
read(200,iostat=ios) statement on line 96.  I'm not yet sure, but I'm 
wondering if a rewind or backspace statement has to be used after the 
read before the write statement is performed on line 140.


After the enddo on line 99, if I put on line 100:

rewind(200)

Then, a non-empty case.storeHinv_proc_0 is produced and the "WRITE not 
allowed after EOF marker" error goes away.  Though, I may not have the 
logic quite right yet even though the error disappears.  If someone else 
has time to further look into the problem, then hopefully this 
information can help.
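
A minimal standalone reproduction of that runtime error, assuming the unit has
been read up to its EOF marker first (this is not the jacdavblock.F code):

  program eof_write_demo
    implicit none
    integer :: ios
    double precision :: x
    open(200, file='store.tmp', form='unformatted')   ! created empty if it does not exist
    read(200, iostat=ios) x       ! hits the EOF marker, ios becomes nonzero
    rewind(200)                   ! without this, the next WRITE triggers
                                  ! "Sequential READ or WRITE not allowed after EOF marker"
    write(200) 1.0d0
    close(200, status='delete')
  end program eof_write_demo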


The source of the error is likely similar to what has been reported before:

https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16674.html
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13671.html

whenever I try to run a simulation with "-it" flag the simulations 
fail in the second cycle with a "Fortran runtime error". In this 
example I am doing TiC from the UG and executing the command "run_lapw 
-it":


hup: Command not found.
STOP  LAPW0 END
foreach: No match.
Note: The following floating-point exceptions are signalling:
IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
STOP  CORE  END
STOP  MIXER END
ec cc and fc_conv 0 1 1
in cycle 2    ETEST: 0   CTEST: 0
hup: Command not found.
STOP  LAPW0 END
At line 140 of file jacdavblock_tmp_.F (unit = 200, file =
'./TiC.storeHinv_proc_0')
Fortran runtime error: Sequential READ or WRITE not allowed after
EOF marker, possibly use REWIND or BACKSPACE

>   stop error


The "TiC.storeHinv_proc_0" file is empty and I can't find the file 
"jacdavblock_tmp_.F". What could be the problem?




Re: [Wien] Installation with MPI and GNU compilers

2018-04-30 Thread Rui Costa
I have the VML libraries, i.e., the libmkl_vml_* files are in
$MKLROOT/lib/intel_64, but when I tried compiling with -DINTEL_VML it gave
me the error "Fatal Error: Can't open module file ‘ifcore.mod’ for reading
at (1): No such file or directory", and this file only comes with the
compilers.

To use the libmvec library I would have to change a few lines of code in
the mkl libraries and that is beyond my computer skills.

Best regards,
Rui Costa.

On 30 April 2018 at 20:57, Pavel Ondračka <pavel.ondra...@email.cz> wrote:

> -- Původní e-mail --
> Od: Rui Costa <ruicosta@gmail.com>
> Komu: A Mailing list for WIEN2k users <wien@zeus.theochem.tuwien.ac.at>
> Datum: 30. 4. 2018 19:39:44
> Předmět: Re: [Wien] Installation with MPI and GNU compilers
>
> I was able to install wien2k with gfortran+MKL. Apparently the MKL
> libraries are free [https://software.intel.com/en-us/performance-libraries]
> but not the compilers.
>
> While doing the benchmark tests we noticed that during the Hamilt there
> was a huge difference between this and an ifort+MKL compilation, and as
> Pavel said, this comes from the VML functions. This is not the case during
> DIAG because while the DIAG belongs to MKL, Hamilt is from wien2k. I then
> tried to compile with these VML functions but I couldn't because I need an
> ifcore.mod file that comes with intel compilers I think, at least it is not
> in the free MKL version.
>
> Do you have any recommendation about the compilation options that could
> better optimize wien2k?
>
>
> Dear Rui,
>
>
> so to make this clear, your MKL comes without the VML, or are you just not
> able to use/link them? I do not understand the part with the ifcore.mod
> much, however the VML paths are guarded with some ifdef magic, try adding
> -DINTEL_VML to your flags (FOPT, FPOPT) and see if it helps.
>
>
> The second option is to use the libmvec library (provided you have fairly
> new glibc) but it is unsupported by the Wien2k team and probably not tested
> by many people except me. If you cannot get the VML working, look for older
> emails discussing libmvec or contact me privately and I can give you some
> pointers.
>
>
> No idea about the -it problem though.
>
>
> Best regards
>
> Pavel


Re: [Wien] Installation with MPI and GNU compilers

2018-04-30 Thread Pavel Ondračka
-- Original e-mail --
From: Rui Costa <ruicosta@gmail.com>
To: A Mailing list for WIEN2k users <wien@zeus.theochem.tuwien.ac.at>
Date: 30. 4. 2018 19:39:44
Subject: Re: [Wien] Installation with MPI and GNU compilers
"
I was able to install wien2k with gfortran+MKL. Apparently the MKL libraries
are free [https://software.intel.com/en-us/performance-libraries
(https://software.intel.com/en-us/performance-libraries)] but not the
compilers.



While doing the benchmark tests we noticed that during the Hamilt there was
a huge difference between this and an ifort+MKL compilation, and as Pavel 
said, this comes from the VML functions. This is not the case during DIAG 
because while the DIAG belongs to MKL, Hamilt is from wien2k. I then tried
to compile with these VML functions but I couldn't because I need an ifcore.
mod file that comes with intel compilers I think, at least it is not in the
free MKL version.




Do you have any recommendation about the compilation options that could 
better optimize wien2k?



"



Dear Rui,




so to make this clear: does your MKL come without the VML, or are you just not
able to use/link them? I do not understand the part with the ifcore.mod
much; however, the VML paths are guarded with some ifdef magic, so try adding
-DINTEL_VML to your flags (FOPT, FPOPT) and see if it helps.





The second option is to use the libmvec library (provided you have fairly 
new glibc) but it is unsupported by the Wien2k team and probably not tested
by many people except me. If you cannot get the VML working, look for older
emails discussing libmvec or contact me privately and I can give you some 
pointers.





No idea about the -it problem though.





Best regards

Pavel


Re: [Wien] Installation with MPI and GNU compilers

2018-04-30 Thread Rui Costa
I was able to install wien2k with gfortran+MKL. Apparently the MKL
libraries are free [https://software.intel.com/en-us/performance-libraries]
but not the compilers.

While doing the benchmark tests we noticed that during Hamilt there was
a huge difference between this and an ifort+MKL compilation, and as Pavel
said, this comes from the VML functions. This is not the case during DIAG,
because while DIAG belongs to MKL, Hamilt is from wien2k. I then tried
to compile with these VML functions but I couldn't, because I need an
ifcore.mod file that, I think, comes with the Intel compilers; at least it is
not in the free MKL version.

Do you have any recommendation about the compilation options that could
better optimize wien2k?

The ones I used are the following:

 ***
 * Specify compiler and linker options *
 ***


 Recommended options for system linuxgfortran are:
  Compiler options:-ffree-form -O2 -ffree-line-length-none
  Linker Flags:$(FOPT) -L../SRC_lib
  Preprocessor flags:  '-DParallel'
  R_LIB (LAPACK+BLAS): -lopenblas -llapack -lpthread

 Current settings:
  O   Compiler options:-ffree-form -O2 -ftree-vectorize
-ffree-line-length-none -fopenmp -m64 -I$(MKLROOT)/include
-I/opt/openmpi/include
  L   Linker Flags:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH)
-L/opt/openmpi/lib -L/opt/fftw3/lib -pthread
  P   Preprocessor flags   '-DParallel'
  R   R_LIBS (LAPACK+BLAS):
/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_blas95_lp64.a
/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_lapack95_lp64.a
-Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp
-lpthread -lm -ldl
  X   LIBX options:-DLIBXC -I/opt/etsf/include
  LIBXC-LIBS:  -L/opt/etsf/lib -lxcf03 -lxc

   ***
   * Specify parallel options and library settings   *
   ***

   Your current parallel settings (options and libraries) are:

 C   Parallel Compiler:  mpifort
 FP  Parallel Compiler Options:  -ffree-form -O2 -ftree-vectorize
-ffree-line-length-none -fopenmp -m64 -I$(MKLROOT)/include
-I/opt/openmpi/include
 MP  MPIRUN command: mpirun -np _NP_ -machinefile _HOSTS_
_EXEC_

   Additional setting for SLURM batch systems (is set to 1 otherwise):

 CN  Number of Cores:1

   Libraries:

 F   FFTW options:-DFFTW3 -I/opt/fftw3/include
 FFTW-LIBS:   -L/opt/fftw3/lib -lfftw3
 FFTW-PLIBS:  -lfftw3_mpi
 Sp  SCALAPACK:
 -L/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64
 -lmkl_scalapack_lp64

 -L/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64
-lmkl_blacs_openmpi_lp64
 E   ELPA options:
 ELPA-LIBS:

   Since you use gfortran you might need to specify additional libraries.
   You have to make sure that all necessary libraries are present (e.g.
MPI, ...)
   and can be found by the linker (specify, if necessary,
-L/Path_to_library )!

 RP  Parallel-Libs for gfortran:



Additionally, whenever I try to run a simulation with the "-it" flag, the
simulations fail in the second cycle with a "Fortran runtime error". In
this example I am doing TiC from the UG and executing the command "run_lapw
-it":

hup: Command not found.
STOP  LAPW0 END
foreach: No match.
Note: The following floating-point exceptions are signalling: IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
STOP  CORE  END
STOP  MIXER END
ec cc and fc_conv 0 1 1
in cycle 2    ETEST: 0   CTEST: 0
hup: Command not found.
STOP  LAPW0 END
At line 140 of file jacdavblock_tmp_.F (unit = 200, file =
'./TiC.storeHinv_proc_0')
Fortran runtime error: Sequential READ or WRITE not allowed after EOF
marker, possibly use REWIND or BACKSPACE

>   stop error


The "TiC.storeHinv_proc_0" file is empty and I can't find the file "
jacdavblock_tmp_.F". What could be the problem?

Best regards,
Rui Costa.

On 5 April 2018 at 11:18, Pavel Ondračka  wrote:

> Laurence Marks píše v St 04. 04. 2018 v 16:01 +:
> > I confess to being rather doubtful that gfortran+... is comparable to
> > ifort+... for Intel cpu, it might be for AMD. While the mkl vector
> > libraries are useful in a few codes such as aim, they are minor for
> > the main lapw[0-2].
>
> Well, some fast benchmark data then (serial benchmark single core):
> Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (haswell)
> Wien2k 17.1
>
> -
>
> gfortran 7.3.1 + OPENBLAS 0.2.20 + glibc 2.26 (with the custom patch to
> use libmvec):
>
> Time for 

Re: [Wien] Installation with MPI and GNU compilers

2018-04-05 Thread Pavel Ondračka
Laurence Marks wrote on Wed, 04. 04. 2018 at 16:01:
> I confess to being rather doubtful that gfortran+... is comparable to
> ifort+... for Intel cpu, it might be for AMD. While the mkl vector
> libraries are useful in a few codes such as aim, they are minor for
> the main lapw[0-2].

Well, some fast benchmark data then (serial benchmark single core):
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (haswell)
Wien2k 17.1

-

gfortran 7.3.1 + OPENBLAS 0.2.20 + glibc 2.26 (with the custom patch to
use libmvec):

Time for al,bl(hamilt, cpu/wall) :  0.2 0.2
Time for legendre (hamilt, cpu/wall) :  0.1 0.2
Time for phase(hamilt, cpu/wall) :  1.2 1.2
Time for us   (hamilt, cpu/wall) :  1.2 1.2
Time for overlaps (hamilt, cpu/wall) :  2.6 2.8
Time for distrib  (hamilt, cpu/wall) :  0.1 0.1
Time sum iouter   (hamilt, cpu/wall) :  5.5 5.8
 number of local orbitals, nlo (hamilt)  304
   allocate YL    2.5 MB   dimensions   15  3481  3
   allocate phsc  0.1 MB   dimensions   3481
Time for los  (hamilt, cpu/wall) :  0.4 0.3
Time for alm (hns) :  0.1
Time for vector  (hns) :  0.3
Time for vector2 (hns) :  0.3
Time for VxV (hns) :  2.1
Wall Time for VxV(hns) :  0.1
 245  Eigenvalues computed 
 Seclr4(Cholesky complete (CPU)) :            1.380  40754.14 Mflops
 Seclr4(Transform to eig.problem (CPU)) :     4.470  37745.44 Mflops
 Seclr4(Compute eigenvalues (CPU)) :         12.750  17643.13 Mflops
 Seclr4(Backtransform (CPU)) :                0.290  10237.08 Mflops
   TIME HAMILT (CPU)  = 5.8, HNS = 2.5, HORB = 0.0, DIAG = 18.9
   TIME HAMILT (WALL) = 6.1, HNS = 2.5, HORB = 0.0, DIAG = 19.0

real0m28.610s
user0m27.817s
sys 0m0.394s

---

Ifort 17.0.0 + MKL 2017.0:

Time for al,bl(hamilt, cpu/wall) :  0.2 0.2
Time for legendre (hamilt, cpu/wall) :  0.1 0.2
Time for phase(hamilt, cpu/wall) :  1.2 1.3
Time for us   (hamilt, cpu/wall) :  1.0 1.0
Time for overlaps (hamilt, cpu/wall) :  2.6 2.8
Time for distrib  (hamilt, cpu/wall) :  0.1 0.1
Time sum iouter   (hamilt, cpu/wall) :  5.4 5.6
 number of local orbitals, nlo (hamilt)  304
   allocate YL    2.5 MB   dimensions   15  3481  3
   allocate phsc  0.1 MB   dimensions   3481
Time for los  (hamilt, cpu/wall) :  0.2 0.2
Time for alm (hns) :  0.0
Time for vector  (hns) :  0.4
Time for vector2 (hns) :  0.4
Time for VxV (hns) :  2.1
Wall Time for VxV(hns) :  0.1
 245  Eigenvalues computed 
 Seclr4(Cholesky complete (CPU)) :            1.110  50667.31 Mflops
 Seclr4(Transform to eig.problem (CPU)) :     3.580  47129.09 Mflops
 Seclr4(Compute eigenvalues (CPU)) :         11.320  19873.04 Mflops
 Seclr4(Backtransform (CPU)) :                0.250  11875.01 Mflops
   TIME HAMILT (CPU)  = 5.7, HNS = 2.6, HORB = 0.0, DIAG = 16.3
   TIME HAMILT (WALL) = 5.9, HNS = 2.6, HORB = 0.0, DIAG = 16.3

real0m25.587s
user0m24.857s
sys 0m0.321s
-

So I apologize for my statement in the last email, which was too
ambitious. Indeed, in this particular case the opensource stack is ~12%
slower (25 vs 28 seconds). Most of this is in the DIAG part (which I
believe is where OpenBLAS comes into play). However, on some other (older)
Intel CPUs the DIAG part can be even faster with OpenBLAS; see the
already mentioned email by Prof. Blaha,
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15106.html,
where he tested on an i7-3930K (sandybridge). Hence for those older CPUs I
would expect the performance to be really comparable (with the small patch to
utilize libmvec in order to speed up the HAMILT part).

In general the opensource support is usually slow to materialize, hence the
performance on older CPUs is better. This is especially true for OpenBLAS,
where the optimizations for new CPUs and instruction sets are not
provided by Intel (contrary to gcc, gfortran and glibc, where Intel
engineers contribute directly), while the MKL and ifort have good
support from day 1.

I do agree that it is better to advise users to use MKL+ifort, since
when they have it properly installed the siteconfig is almost always
able to detect and build everything out of the box with the default config.
This is unfortunately not the case with the opensource libraries, where
the detection does not work most of the time due to distro differences and
the unfortunate fact that the majority of the needed libraries do not
provide any good means for autodetection 

Re: [Wien] Installation with MPI and GNU compilers

2018-04-04 Thread Laurence Marks
I confess to being rather doubtful that gfortran+... is comparable to
ifort+... for Intel CPUs; it might be for AMD. While the mkl vector
libraries are useful in a few codes such as aim, they are minor for the
main lapw[0-2].

On Wed, Apr 4, 2018, 10:55 Pavel Ondračka  wrote:

> Rui Costa píše v St 04. 04. 2018 v 14:21 +0100:
> > I will see what I can do about the Intel compilers. I've had a
> > question about this, supposedly the intel compilers are the fastest
> > [https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13021.html], but how
> > much faster are they than the others? I expect this to vary from case
> > to case but on average, how much faster are they?
>
> In fact the compiler (e.g. ifort vs gfortran) hardly makes a difference
> . The important part are the algebra libraries. The opensource OpenBLAS
> should be almost identical to Intels MKL (see
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15106.html
> for comparison of
> OenBLAS vs MKL). However in this old benchmark the opensource stack is
> still quite slower since the MKL also provides the VML library for
> vectorized math functions, which did not had any open source
> alternative for a long time. Recently there is the libmvec library
> which provides such functions (you need recent glibc), but there is no
> official Wien2k support for this. However it is actually quite easy to
> get it working (see
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16159.html ).
> Hence if you use the gfortran + OpenBLAS + libmvec the performance is
> virtually identical to ifort + MKL + VML. The setup is somewhat more
> difficult though.
>
> Best regards
> Pavel
>
> > My objective is not to do simulations with mpi in the computer that
> > I'm trying to install but to figure out how to install wien2k with
> > mpi and then give some guidelines to the IT technician. I spent two
> > weeks telling them that the simulations were not running because the
> > packages were not compiled and in the end everything was poorly
> > installed.
> >
> > Thank you for your help.
> >
> > Best regards,
> > Rui Costa.
> >


Re: [Wien] Installation with MPI and GNU compilers

2018-04-04 Thread Pavel Ondračka
Rui Costa wrote on Wed, 04. 04. 2018 at 14:21 +0100:
> I will see what I can do about the Intel compilers. I've had a
> question about this, supposedly the intel compilers are the fastest
> [https://www.mail-
> archive.com/wien@zeus.theochem.tuwien.ac.at/msg13021.html], but how
> much faster are they than the others? I expect this to vary from case
> to case but on average, how much faster are they?

In fact, the compiler (e.g. ifort vs gfortran) hardly makes a difference.
The important part is the algebra libraries. The opensource OpenBLAS
should be almost identical to Intel's MKL (see
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15106.html
for a comparison of OpenBLAS vs MKL). However, in this old benchmark the
opensource stack is still quite a bit slower, since the MKL also provides the
VML library for vectorized math functions, which did not have any open source
alternative for a long time. Recently there is the libmvec library, which
provides such functions (you need a recent glibc), but there is no official
Wien2k support for this. However, it is actually quite easy to get it working
(see https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16159.html ).
Hence if you use gfortran + OpenBLAS + libmvec, the performance is virtually
identical to ifort + MKL + VML. The setup is somewhat more difficult, though.

Best regards
Pavel

> My objective is not to do simulations with mpi in the computer that
> I'm trying to install but to figure out how to install wien2k with
> mpi and then give some guidelines to the IT technician. I spent two
> weeks telling them that the simulations were not running because the
> packages were not compiled and in the end everything was poorly
> installed.
> 
> Thank you for your help.
> 
> Best regards,
> Rui Costa.
> 



Re: [Wien] Installation with MPI and GNU compilers

2018-04-04 Thread Rui Costa
I will see what I can do about the Intel compilers. I've had a question
about this: supposedly the Intel compilers are the fastest [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13021.html],
but how much faster are they than the others? I expect this to vary from
case to case, but on average, how much faster are they?

My objective is not to run simulations with mpi on the computer where I'm
trying to install it, but to figure out how to install wien2k with mpi and then
give some guidelines to the IT technician. I spent two weeks telling them
that the simulations were not running because the packages were not
compiled, and in the end everything was poorly installed.

Thank you for your help.

Best regards,
Rui Costa.

On 4 April 2018 at 04:48, Gavin Abo  wrote:

> Some comments:
>
> I haven't seen many mailing list posts about using a gfortran-based mpi.
> That is probably because the clusters used for mpi are likely systems that
> cost something like $100k to $1 million.  Those systems usually seem to be
> running Intel MPI.  So companies, computing centers, and universities for
> example likely have no problem paying say $1,499 for Cluster Edition for
> C/C++ and Fortran [ https://software.intel.com/en-
> us/articles/academic-pricing ].
>
> How much I get paid to help you: $0
>
> How much the IT Support Technician at your organization likely gets paid
> to help you: average pay of about $52k per year according to glassdoor [
> https://www.glassdoor.com/Salaries/it-support-
> technician-salary-SRCH_KO0,21.htm ]
>
> Meaning, please try asking your clusters IT department or helpdesk first
> for help with this if you have one.
>
> I assume your using at least a GB-network cluster [
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13632.html
> , https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.
> at/msg09334.html ] as mpi is usually useless with a single multi-core
> personal computer [ https://www.mail-archive.com/
> wien@zeus.theochem.tuwien.ac.at/msg05470.html ,
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg07371.html
> ].
>
> If you plan to use Bandstructure in w2web of WIEN2k 17.1, a "At line 75 of
> file modules_tmp_.F (unit = 5, file = 'ubuntu.in1c')" error can occur with
> gfortran when the "x lapw1 -band" button is clicked, so you may want to
> apply band.pl and scf.pl [ https://www.mail-archive.com/
> wien@zeus.theochem.tuwien.ac.at/msg16069.html ] or band.patch and
> scf.patch [ https://github.com/gsabo/WIEN2k-Patches/tree/master/17.1 ].
> There are some other patches that might also be helpful for gfortran [
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17175.html
> ].  However, I currently don't have files (SRC_symmetry and x_lapw) to fix
> the gfortran error with "x dstart" if you use the init_lapw command, but
> you could probably get them from Prof. Blaha or make the changes yourself
> following the instructions given before [ https://www.mail-archive.com/
> wien@zeus.theochem.tuwien.ac.at/msg16674.html ].
>
> Yes, it usually better to have only one MPI implementation and one BLAS
> library.  However, it should not be a problem as long as you can keep them
> from mixing and conflicting with each other.  As I recall, all mpi is
> somewhat equal [ https://www.mail-archive.com/
> wien@zeus.theochem.tuwien.ac.at/msg09557.html ], though openmpi might be
> easy to compile [ https://www.mail-archive.com/
> wien@zeus.theochem.tuwien.ac.at/msg07343.html ].
>
> The libfftw3 and libfftw3_mpi of FFTW 3.x.x [
> http://www.fftw.org/download.html ] are what you need to use.  The
> liblfftw3xf and libfftw3xf_gnu (or libfftw3x_cdft.a) files if I recall
> correctly would be located or generated in the interfaces directory of an
> Intel ifort/mkl installation [ https://www.mail-archive.com/
> wien@zeus.theochem.tuwien.ac.at/msg07333.html ].  The mkl interface to
> fftw3xf I think still does not work with mpi (only for serial compilation)
> [ https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.
> at/msg06959.html ].
> Sorry, I currently don't have answers to your other questions, but I think
> you are headed in the right direction.
>
>
> On 4/3/2018 2:02 PM, Rui Costa wrote:
>
> Dear wien2k users,
>
> I'm trying to install wien2k_17.1 with mpi, fftw and elpa using the GNU
> compilers. I must say that I'm not an expert in linking and compiling the
> packages, so probably some things will be wrong.
>
> I'm using Ubuntu 16.04 LTS and have installed:
> - BLAS: OpenBlas-0.2.20 which is in /opt/OpenBLAS and the libblas3 (shared
> version) and libblas-dev (static version) of netlib using synaptic package
> manager which is in /usr/lib;
> - LAPACK: liblapack3 (shared version) and libpalack-dev (static version)
> of netlib using synaptic package manager which is in /usr/lib;
> - BLACS: libblacs-mpi-dev and libblacs-openmpi1 of netlib using synaptic
> package manager which is in /usr/lib
> - ScaLAPACK: 

Re: [Wien] Installation with MPI and GNU compilers

2018-04-03 Thread Gavin Abo

Some comments:

I haven't seen many mailing list posts about using a gfortran-based
MPI stack. That is probably because the clusters used for MPI are likely
systems that cost something like $100k to $1 million. Those systems
usually seem to be running Intel MPI. So companies, computing centers,
and universities, for example, likely have no problem paying, say, $1,499
for the Cluster Edition for C/C++ and Fortran [
https://software.intel.com/en-us/articles/academic-pricing ].


How much I get paid to help you: $0

How much the IT Support Technician at your organization likely gets paid
to help you: an average of about $52k per year, according to Glassdoor [
https://www.glassdoor.com/Salaries/it-support-technician-salary-SRCH_KO0,21.htm
].


Meaning: please try asking your cluster's IT department or helpdesk first
for help with this, if you have one.


I assume you're using at least a gigabit-network cluster [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13632.html
,
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09334.html
], as MPI is usually useless on a single multi-core personal computer [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg05470.html
,
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg07371.html
].


If you plan to use Bandstructure in w2web of WIEN2k 17.1, an "At line 75
of file modules_tmp_.F (unit = 5, file = 'ubuntu.in1c')" error can occur
with gfortran when the "x lapw1 -band" button is clicked, so you may
want to apply band.pl and scf.pl [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16069.html
] or band.patch and scf.patch [
https://github.com/gsabo/WIEN2k-Patches/tree/master/17.1 ]. There are
some other patches that might also be helpful for gfortran [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17175.html
]. However, I currently don't have files (SRC_symmetry and x_lapw) to
fix the gfortran error with "x dstart" if you use the init_lapw command,
but you could probably get them from Prof. Blaha or make the changes
yourself following the instructions given before [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16674.html
].
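
In case it helps, a minimal sketch of applying such patches, assuming
they are plain unified diffs made against the top-level WIEN2k directory
(the -p level is an assumption; the --dry-run shows whether it is right):

# run from the WIEN2k installation directory
cd $WIENROOT
patch -p0 --dry-run < band.patch   # check that it would apply cleanly
patch -p0 < band.patch
patch -p0 < scf.patch
# afterwards recompile the affected programs via siteconfig_lapw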


Yes, it is usually better to have only one MPI implementation and one
BLAS library. However, it should not be a problem as long as you can
keep them from mixing and conflicting with each other. As I recall, all
MPI implementations perform roughly equally [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09557.html
], though Open MPI might be easy to compile [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg07343.html
].
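
A quick way to check which implementations actually get picked up
(illustrative commands only; lapw1_mpi is simply the usual name of the
parallel binary under $WIENROOT):

# show the real compiler and libraries behind the MPI wrapper
mpif90 -show       # MPICH
mpif90 --showme    # Open MPI
# after compiling, check which BLAS/ScaLAPACK/MPI a binary is linked to
ldd $WIENROOT/lapw1_mpi | grep -Ei 'blas|scalapack|mpi'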


The libfftw3 and libfftw3_mpi of FFTW 3.x.x [
http://www.fftw.org/download.html ] are what you need to use. The
libfftw3xf and libfftw3xf_gnu (or libfftw3x_cdft.a) files, if I recall
correctly, would be located or generated in the interfaces directory of
an Intel ifort/MKL installation [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg07333.html
]. The MKL interface to fftw3xf I think still does not work with MPI
(only for serial compilation) [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg06959.html
].
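
For reference, a minimal sketch of building FFTW 3 from source with the
MPI interface enabled (the version and the /opt/fftw3 prefix are
arbitrary choices; --enable-mpi is the same option used in the original
question quoted below):

tar xf fftw-3.3.7.tar.gz && cd fftw-3.3.7
./configure --enable-mpi --prefix=/opt/fftw3
make -j4
make install    # may need sudo for a system-wide prefix
# this should leave both libfftw3 and libfftw3_mpi under /opt/fftw3/lib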


Sorry, I currently don't have answers to your other questions, but I 
think you are headed in the right direction.


On 4/3/2018 2:02 PM, Rui Costa wrote:

Dear wien2k users,

I'm trying to install WIEN2k 17.1 with MPI, FFTW and ELPA using the GNU
compilers. I must say that I'm not an expert in linking and compiling
these packages, so probably some things will be wrong.


I'm using Ubuntu 16.04 LTS and have installed:
- BLAS: OpenBLAS 0.2.20, which is in /opt/OpenBLAS, plus the libblas3
(shared) and libblas-dev (static) netlib packages installed with the
synaptic package manager, which are in /usr/lib;
- LAPACK: the liblapack3 (shared) and liblapack-dev (static) netlib
packages installed with the synaptic package manager, which are in
/usr/lib;
- BLACS: the libblacs-mpi-dev and libblacs-openmpi1 netlib packages
installed with the synaptic package manager, which are in /usr/lib;
- ScaLAPACK: libscalapack-mpi-dev, for which I don't find the shared or
static libraries (i.e., the libscalapack-mpi-dev.a or
libscalapack-mpi-dev.so files), and libscalapack-openmpi1, installed
with the synaptic package manager, which is in /usr/lib;
- MPI: mpich-3.2.1, installed in /usr/local/bin, /usr/local/lib and
/usr/local/include; since I installed the openmpi versions of BLACS and
ScaLAPACK, I also have Open MPI, whose binaries are in /usr/bin;
- FFTW: fftw-3.3.7, which I built from source with the option
./configure --enable-mpi and installed in /usr/local/lib/ and
/usr/lib/x86_64-linux-gnu;
- ELPA: libelpa-dev and libelpa3, again installed with the synaptic
package manager.
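
Regarding the ScaLAPACK item above: a quick way to see where those
Ubuntu packages actually put their files (illustrative commands; the
package names are the ones listed above):

# list the files installed by the -dev and runtime packages
dpkg -L libscalapack-mpi-dev
dpkg -L libscalapack-openmpi1
# or search the linker cache for the shared libraries
ldconfig -p | grep -i scalapack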



Questions:
1) I'm concerned that having two MPI implementations and two BLAS 
libraries might cause things to be compiled incorrectly. My idea was 
to install wien2k with MPICH since it seems to be the