[Wien] band character plotting with eece

2024-04-24 Thread Pavel Ondračka
Dear Wien2k mailing list,

is this enough for spaghetti with band character for onsite hybrids?

x lapw1 (-up/-dn) -orb -band
x lapw2 (-up/-dn) -qtl -band

or do I also need the -eece -orb switches for lapw2?

Best regards
Pavel

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] Few questions about onsite hybrids and so

2024-02-12 Thread Pavel Ondračka

Dear Prof. Blaha,

wow, thanks a lot for such a detailed answer. This was very helpful indeed.

Best regards
Pavel

"Hi,
Here are my comments. Most of them similar to what Laurence said.

> I'm trying to calculate a band structure of Tb3Ga5O12 magneto-optical 
> crystal (cubic Ia-3d, 80 atoms). While I consider myself quite
>
> Luckily I'm not shooting completely blind as I have some high-quality
> optical data where we can see some (very weak but also quite sharp and
> hence noticeable) f-f transitions in the band gap, so I have some idea
> how the Tb f states at least should look like. Significant optical
> absorption starts around 4 eV, but below that I see some very weak
> electronic transitions in the 0.2-0.9 eV range and around 2.5 and 3.5 eV
> (reportedly between f states located in the band gap). So I expect at
> least three bunches of f states in the band gap, one occupied and the
> others unoccupied.

Unfortunately, I don't believe that these optical f-f transitions can be
described by DFT. These are crystal-field-split multiplet
excitations, which are usually not accessible by DFT.

PS: Optical transitions create an electron-hole pair and excitonic
(correlation) effects can be very large.
XPS creates a free electron and a hole and although this is also not a
ground state, it is usually better described by groundstate DFT.

From your chemical formula one expects Tb3+, i.e. a fully occupied
spin-up 4f band and a single 4f electron occupied in spin-dn.
Of course, PBE gives a metal and the 4f-dn states are pinned at EF.
An orbital potential can split these states and single out a single 4f
electron/atom. However, with orbital potentials one can in many cases
obtain several different occupied orbitals, depending on the
starting density matrix. In other words, your solution may not be the
ground state, but a metastable state.
Therefore I'd first do GGA+SO, and "hope" that this gives me a bit larger
occupancy of the "correct" 4f orbital. When you then calculate the
density matrix from this solution, you may end up in the lowest-energy
orbitally ordered state. Eventually, you could also start from different
density matrices and see to which solutions you converge and compare
total energies (these manipulations are simpler in DFT+U than in EECE).

RMTs: Since we cannot use HDLOs for orbital potentials, too large
spheres are not good. However (in particular for 3d systems), small
spheres mean that only 80-90% of the d-charge is inside the sphere and
thus gets shifted by the orbital potential. Thus one needs a larger U
(or alpha) to get similar results with smaller RMTs.
For later 4f atoms, however, the 4f states are very localized (in Tb with
RMT=2.0, 97% of the 4f charge is inside the sphere; see case.outputst).
My personal choice would be RMT = 2.1 to 2.2.

Relaxation: Yes, you can safely relax the O atoms when SO is switched
off for them and the heavy atoms are fixed in case.inM.
If this is just a powder X-ray structure, the O-positions could be quite 
wrong.

Most 4f systems would be antiferromagnets, but with a very low Néel
temperature, which means that the energy difference between an AFM and
FM ordering is very small. These are local moments and they do not care 
too much how their neighbors are polarized.
>
> Regarding the f electron correction I opted for onsite hybrid and
> initialized it with init_orb_lapw -eece.
> UG says that it's better to use LDA for the exchange potential so I
> copied case.in0 to case.in0eece_lapw where I replaced "XC_PBE" on the
> first line with "EX_PBE VX_LDA EC_PBE VC_PBE".

This is a misunderstanding. I'd use PBE in case.in0 since the Ga/O
states should be much better described by PBE. However, for the double
counting correction, LDA is numerically preferred and the UG says:
"This is possible by copying case.in0 to case.in0eece_lda and specify
VX_LDA". Note: it is case.in0eece_lda, not case.in0eece_lapw
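
The copy-and-edit step described here could be sketched as follows. This is only a sketch under assumptions: the replacement keyword string is the one quoted in the question above and has not been checked against the UG, the target name case.in0eece_lda is the one given in this reply, and the helper name make_in0eece_lda is made up for illustration.

```python
from pathlib import Path


def make_in0eece_lda(case, workdir="."):
    """Copy case.in0 to case.in0eece_lda, switching the exchange potential to LDA.

    Assumption: the first line of case.in0 contains the keyword XC_PBE, and the
    replacement string is the one quoted in the question (not verified here).
    """
    src = Path(workdir) / f"{case}.in0"
    dst = Path(workdir) / f"{case}.in0eece_lda"
    lines = src.read_text().splitlines(keepends=True)
    if not lines or "XC_PBE" not in lines[0]:
        raise ValueError(f"{src}: expected XC_PBE on the first line")
    # Replace only the first occurrence so that any later text is left alone.
    lines[0] = lines[0].replace("XC_PBE", "EX_PBE VX_LDA EC_PBE VC_PBE", 1)
    dst.write_text("".join(lines))
    return dst
```

The original case.in0 is left untouched, so the regular PBE calculation for the Ga/O states is not affected.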

EECE vs DFT+U is a matter of taste. EECE has one adjustable parameter,
DFT+U 1-2 (U and J). For 4f systems the "effective U" (J=0) is often not
justified since the intraatomic J may be important. It may have quite
some influence on the orbital magnetic moment.
Anyway, both are approximations and for a proper gap you may need mBJ+U 
(or mBJ+EECE) with a smaller U (alpha).

> The onsite hybrid calculation converged fine, I get a nice splitting of 
> the f states (albeit a bit too much maybe).
> The other options would be +U obviously, I went for the hybrid because 
> it felt more rigorous, but I would also appreciate comments if someone 
> has maybe better experience with +U?
>
> Next step was to initialize spin-orbit interaction with init_so_lapw. I
> started with the default 001 but I want to also try other directions
> later and compare. I opted for no relativistic LOs (no support in
> optics) and enabled it only for Tb and Ga. symetso created a new
> structure (most notably I have more inequivalent Tb positions) and then
> I manually fixed case.inso case.indm 

Re: [Wien] Few questions about onsite hybrids and so

2024-02-12 Thread Pavel Ondračka
On Mon, 2024-02-12 at 20:57 +0800, Laurence Marks wrote:
> With an RMT for Tb of 2.43 the O2p will leak into the Tb sphere. I
> used 2.02. You may want to use -ecut .995 or similar rather than a
> fixed energy.

Will try, thanks.

> If your Ga & Tb positions are fixed then I guess -so might work in
> MSR1a, I have never tried.
> 
> N.B., I meant x-ray or neutron positions, the latter might be better
> for the O. In my opinion you should not use peaks in spectra or band
> gaps as these are excited state properties, and -eece is ground
> state. That said, optimizing the hybrid fraction for positions gave
> decent gaps for a few other cases as well. Never published as I have
> no explanation.

I fully agree that comparing band structure to optical spectra (and
optical band gaps) is tricky (unless one can also do BSE). On the
other hand, I have had good experience with XPS valence band
measurements. For example, I previously observed good agreement between
the position of some occupied defect states in the band gap as calculated
with (full hybrid) DFT and as observed by valence band XPS.

Anyway, thanks again for all the suggestions, I'll also check if I can
get good enough O positions from XRD to compare to the relaxed
positions as a function of the hybrid fraction...

Best regards
Pavel


Re: [Wien] Few questions about onsite hybrids and so

2024-02-12 Thread Pavel Ondračka
Dear prof. Marks,
thanks a lot for your comments, just some follow up (I was not sure
whether by "you can ask me offline" you meant private email, but
hopefully this can still be also interesting for the list):

On Mon, 2024-02-12 at 18:56 +0800, Laurence Marks wrote:
> Many comments/responses:
> a) You can do both forces and volume optimization with -eece, but not
> with -so.

Thanks for the clarification, this is very helpful, as I said I was
under the impression that I can relax O positions with so as long as I
don't turn on so for O ("UG 5.2.18 init_so_lapw: Since forces are not
correct for atoms with SO, it can be very useful to suppress SO for
light atoms (eg. the O-atoms in UO2 ), because then one can optimize
the O-positions.")

> b) For 4f what you did with case.in0eece is right, but check that it
> does not get overwritten. I had to edit an overwrite out of my
> runeece.

Will check, thanks.

> c) Expect the addition of -so to change things quite a lot -- and
> very little! The nett change in the energy will be very small, and
> you may want to think about the spin-ordering temperatures. Is your
> compound ferromagnetic, antiferromagnetic or what?

Honestly I have no idea, right now I have ferromagnetic, but this is
something I want to take a closer look at as well. It could be quite
complex, and it is maybe even questionable whether, within the
limitations of the collinear model, I can end up with results that are
relevant for the room temperature optical measurements my colleagues
are doing...

> d) People will tell you to use +U which will put the 4f electrons
> really low. My recommendation is to ignore them. As you noted they
> are in the valence regime.

Noted

> e) One way to fit the hybrid fraction is to get the best fit
> (approximately) to the x-ray positions. This turned out for me to be
> very reasonable.

Just to double check, by "X-ray positions" you mean refined atomic
positions from XRD or positions of the Tb states in XPS valence band
spectrum? XPS is something I definitely have on my TODO list.

> f) Beware too large RMTs. If you have these for the metal atoms then
> you get the tails of the O 2p states within those RMTs and that can
> give you artifacts.

To be honest I have no feeling here for what counts as too large RMTs in
this regard. I have 2.43 for Tb, 1.82 for Ga and 1.65 for O (this is
almost touching spheres). How big a decrease would you recommend, 5-10%?

> If you have other questions you can ask me offline if you want. You
> may want to look at DOI: 10.1103/PhysRevMaterials.2.025001,
> 10.1016/j.ultramic.2018.12.005, 10.1103/PhysRevMaterials.5.125002,
> 10.1021/acs.inorgchem.2c04107 Note that the XPS is dominated
> (cross-sections) by the 4f, and in TbScO3 they are at the Fermi edge
> (if it is Tb3+, Tb4+ will be simpler).

This is a very exhaustive list, thanks. Will definitely read through it.

Best regards
Pavel


[Wien] Few questions about onsite hybrids and so

2024-02-12 Thread Pavel Ondračka
Dear Wien2k mailing list,

I'm trying to calculate a band structure of Tb3Ga5O12 magneto-optical
crystal (cubic Ia-3d, 80 atoms). While I consider myself quite an
experienced Wien2k user, I've always managed to stay away from f block
elements, so my experience here is none. So besides the few questions I
have, I'll also try to summarize what I did; please correct me
if something was not OK.

Luckily I'm not shooting completely blind as I have some high-quality
optical data where we can see some (very weak but also quite sharp and
hence noticeable) f-f transitions in the band gap, so I have some idea
how the Tb f states at least should look like. Significant optical
absorption starts around 4 eV, but below that I see some very weak
electronic transitions in the 0.2-0.9 eV range and around 2.5 and 3.5 eV
(reportedly between f states located in the band gap). So I expect at
least three bunches of f states in the band gap, one occupied and the
others unoccupied.

I've started with spin-polarized PBE, I'm reasonably sure the structure
file is OK, albeit probably not much relaxed (but I was hoping I could
find equilibrium volume and do relaxation at a later point). I did not
opt for HDLOs even though the Tb sphere is quite big (2.43) since I
would also like to try to get few momentum matrix elements later with
optics, but I've increased the lmax to 14 and lvnsmax to 8 (lapw2 GMAX
16, fft factor 3 and 4x4x4 k-grid).

The initial runsp went fine but the band structure is far from OK, I
get only a single bunch of f states in the band gap clumped together
(some of them are occupied so it's metallic), but experimentally I
should get an insulator (although the difference between the
unoccupied and occupied f states in the band gap is only maybe 0.2 eV).

Regarding the f electron correction I opted for onsite hybrid and
initialized it with init_orb_lapw -eece.
UG says that it's better to use LDA for the exchange potential so I
copied case.in0 to case.in0eece_lapw where I replaced "XC_PBE" on the
first line with "EX_PBE VX_LDA EC_PBE VC_PBE".
The onsite hybrid calculation converged fine, I get a nice splitting of
the f states (albeit a bit too much maybe).
The other option would be +U obviously, I went for the hybrid because
it felt more rigorous, but I would also appreciate comments if someone
has maybe better experience with +U?

Next step was to initialize spin-orbit interaction with init_so_lapw. I
started with the default 001 but I want to also try other directions
later and compare. I opted for no relativistic LOs (no support in
optics) and enabled it only for Tb and Ga. symetso created a new
structure (most notably I have more inequivalent Tb positions) and then
I manually fixed case.inso, case.indm and case.inorb as the init_so
script warned me. I also guessed I should fix case.ineece (that seemed
straightforward) but then I thought I should also fix case.in2eece.
Reading UG gives the impression that case.in2eece is a normal case.in2
with extra EECE on the first line and then the optional 3a and 3b
lines. In the case.in2eece created automatically with
init_orb_lapw -eece the 3a and 3b lines looked like:
1
1 1 3
However reading UG this actually seems wrong? Because UG says (Section
7.9 page 166) the format for optional 3b is just two values:
jatom rho, l rho
so I wonder if the UG is wrong or if I'm actually applying the hybrid
correction to p instead of f?

Also, is there anything else I should fix manually after initializing
so on top of eece? Or should I do it the other way around (first so
and then eece)? The reasoning for doing eece first was that I get a
metal with plain PBE and an insulator with the onsite hybrid, so I
thought it might be easier to converge if I start so from an insulator
(but I still use TEMP smearing just to be sure I don't end up with
convergence problems if I get a metal during the convergence, as the
expected distance between the unoccupied and occupied f states is very
small).

I was also considering mBJ later, just to get some feeling how the
conduction band would shift but I'm not sure if this would work or not
on top of eece and so?

One last question is regarding the forces. From reading the UG I
understood that it should be OK to relax the oxygen positions with
onsite hybrid and so (as long as I don't have so or eece enabled for O
atoms). Is this correct? So will just switching to MSR1a and running
normal runsp -so -eece work or are some other fixes needed?

Best regards
Pavel


Re: [Wien] Speeding up calculations in parallel mose

2023-08-22 Thread Pavel Ondračka
Dear Viktor,

at 54 atoms, you should have enough k-points to run k-parallel which is
probably going to be the fastest option. So lapw1 + lapw2 k-parallel
and the rest OpenMP parallelized.

An example .machines file for your 8 cores could look like this:

1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
omp_global:8
omp_lapw1:1
omp_lapw2:1
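
For other core counts the same layout can be written by a few lines of Python. This is a minimal sketch that only reproduces the pattern of the example above (one "1:host" line per k-parallel job followed by the three omp_ lines); the function name machines_file and its parameters are made up for illustration.

```python
def machines_file(n_kjobs, host="localhost", omp_global=None,
                  omp_lapw1=1, omp_lapw2=1):
    """Return the text of a .machines file with n_kjobs k-parallel jobs.

    Assumption: omp_global defaults to the number of k-parallel jobs, as in
    the 8-core example above, while lapw1 and lapw2 run single-threaded.
    """
    if omp_global is None:
        omp_global = n_kjobs
    lines = [f"1:{host}" for _ in range(n_kjobs)]
    lines.append(f"omp_global:{omp_global}")
    lines.append(f"omp_lapw1:{omp_lapw1}")
    lines.append(f"omp_lapw2:{omp_lapw2}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the 8-core example; redirect the output to .machines.
    print(machines_file(8), end="")
```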

Additionally double check that you are indeed using the correct
libraries: go to your WIENROOT folder and run "ldd lapw1". You will see a
list of libraries it is linked against; make sure there is a
"libopenblas.so" and "libmvec.so" among them.
Also do the same for lapw0 and double check that it is compiled with
OpenMP (it links to libgomp and to an OpenMP parallel openblas; I'm not
familiar with Ubuntu but it should be the library that comes with the
libopenblas-openmp package).
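
The ldd check can also be scripted. The sketch below only looks for the library names mentioned above as substrings of the ldd output; the sample text is a shortened, hypothetical ldd output (paths and addresses are invented), and the helper names are made up for illustration.

```python
import subprocess


def ldd_output(binary):
    """Run ldd on a binary and return its text output."""
    result = subprocess.run(["ldd", binary], capture_output=True,
                            text=True, check=True)
    return result.stdout


def missing_libraries(ldd_text, required):
    """Return the required library names that do not appear in ldd_text."""
    return [name for name in required if name not in ldd_text]


# Hypothetical, shortened ldd output used only to demonstrate the check.
sample = (
    "\tlibopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007f0000000000)\n"
    "\tlibgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f0000100000)\n"
)
print(missing_libraries(sample, ["libopenblas.so", "libmvec.so", "libgomp"]))
```

On a real installation one would call missing_libraries(ldd_output("lapw1"), ...) from inside the WIENROOT folder; an empty list means all the named libraries were found.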

That should hopefully do the trick. In general you can also test the
serial Wien2k benchmark. At single thread it should run around 20s on
your CPU and it should scale quite reasonably maybe up to 4-6 threads
with OMP_NUM_THREADS so that is also something you can check.

Best regards
Pavel

On Tue, 2023-08-22 at 12:30 +0300, Victor Zenou wrote:
> Dear Wien2k users!
> I’m investigating a supercell of 54 tungsten atoms, with 1 helium atom
> and 1 hydrogen atom (primitive cell) at different interstitial sites.
> It takes ~46 hr per calculation cycle, and half of it (~23 hr) in
> parallel mode. The Wien2k version 23.2 was installed on Ubuntu
> 22.04.2 LTS using gfortran and I set OMP_NUM_THREADS to 1, and used 2
> parallel_jobs in the current work. The computer is built from an
> i7-10700 processor @ 2.90GHz (8 cores; 16 threads), 32 GB memory and
> a 500 GB SSD.
> In the past, using the same computer, it took me ~14 hr per cycle
> for the same calculations, meaning 2-4 times faster than today. The
> wien2k version was 21.1, but I can’t remember if the calculations
> were done in parallel, probably yes (I think the number of parallel
> jobs was chosen automatically), and I think I set OMP_NUM_THREADS
> to 4, but again I’m not sure.
> How can I speed up my calculations using the same computer?
> Best regards, Victor
> 


Re: [Wien] FFTW compiling

2023-06-06 Thread Pavel Ondračka
Besides what Gavin mentioned, even if you get the correct fftw-openmpi
(or elpa-mpi) package, Red Hat distros don't install mpi stuff by
default into /usr/lib64. The reason is that more than one MPI runtime
can be installed (like MPICH vs OpenMPI) and the mpi libraries are
MPI-runtime specific. Thus the MPI libraries for a specific MPI runtime
tend to be in their own directories like /usr/lib64/openmpi/lib/
and this path will only get added to the LD_LIBRARY_PATH if you load
the corresponding MPI module.
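
To see where (or whether) an MPI-specific library such as libfftw3_mpi is installed, one can search the candidate directories directly. This is a minimal sketch: the helper names are made up for illustration, and the two directories in the usage note are just the ones mentioned in this thread, not a complete list for any distro.

```python
import os
from pathlib import Path


def ld_library_dirs(env=os.environ):
    """Return the directories currently listed in LD_LIBRARY_PATH."""
    return [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]


def find_library(name_prefix, search_dirs):
    """Return all files whose name starts with name_prefix in search_dirs.

    Directories that do not exist are silently skipped.
    """
    hits = []
    for directory in search_dirs:
        path = Path(directory)
        if path.is_dir():
            hits.extend(sorted(str(f) for f in path.iterdir()
                               if f.name.startswith(name_prefix)))
    return hits
```

For example, find_library("libfftw3_mpi", ["/usr/lib64", "/usr/lib64/openmpi/lib"] + ld_library_dirs()) lists the matching files; if the result changes after loading the MPI module, the library lives in a runtime-specific directory.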

Additionally, the module files are also compiler and MPI specific so
they also reside in custom directories, like
/usr/lib64/gfortran/modules/openmpi/
Now the main issue is that these custom locations are just not possible
to set up correctly with siteconfig, the script just doesn't give you
that much freedom as this breaks some expectations it has. So manual
editing of Makefiles is needed. In general one really needs to know
what one is doing to link Wien2k against system libraries with Red Hat
distros. So yeah, I've been there and I don't recommend it. In that
regard the download source/configure/make/make install combo is usually
simpler.

This only concerns MPI though, if you can live with k-point + OpenMP
parallelization only, then linking against system OpenBLAS and FFTW is
quite simple.

Best regards
Pavel

On Sun, 2023-06-04 at 13:31 +, Ilias Miroslav, doc. RNDr., PhD.
wrote:
> Hello,
> 
> ad:
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18799.html
> 
> I have CentOS 7, but the elpa rpm packages do not contain the
> libfftw3_mpi.a file,  only /usr/lib64/libfftw3_threads.a
> 
> Can the libfftw3_threads.a file be used instead of libfftw3_mpi.a?
> 
> Miro
> 
> 
> PS:  List of all fftw library files:
> ls /usr/lib64/libfftw*          
> /usr/lib64/libfftw.a       /usr/lib64/libfftw3_omp.so.3.3.2*    
>  /usr/lib64/libfftw3f_omp.so.3@        /usr/lib64/libfftw3l_omp.so@
> /usr/lib64/libfftw.so@         /usr/lib64/libfftw3_threads.a    
>    /usr/lib64/libfftw3f_omp.so.3.3.2*    
>  /usr/lib64/libfftw3l_omp.so.3@
> /usr/lib64/libfftw.so.2@       /usr/lib64/libfftw3_threads.so@  
>    /usr/lib64/libfftw3f_threads.a      
>  /usr/lib64/libfftw3l_omp.so.3.3.2*
> /usr/lib64/libfftw.so.2.0.7*   /usr/lib64/libfftw3_threads.so.3@    
>  /usr/lib64/libfftw3f_threads.so@      
>  /usr/lib64/libfftw3l_threads.a
> /usr/lib64/libfftw3.a          /usr/lib64/libfftw3_threads.so.3.3.2*
>  /usr/lib64/libfftw3f_threads.so.3@    
>  /usr/lib64/libfftw3l_threads.so@
> /usr/lib64/libfftw3.so@        /usr/lib64/libfftw3f.a       
>  /usr/lib64/libfftw3f_threads.so.3.3.2*
>  /usr/lib64/libfftw3l_threads.so.3@
> /usr/lib64/libfftw3.so.3@      /usr/lib64/libfftw3f.so@     
>    /usr/lib64/libfftw3l.a    
>  /usr/lib64/libfftw3l_threads.so.3.3.2*
> /usr/lib64/libfftw3.so.3.3.2*  /usr/lib64/libfftw3f.so.3@       
>  /usr/lib64/libfftw3l.so@    
>  /usr/lib64/libfftw_threads.a
> /usr/lib64/libfftw3_omp.a      /usr/lib64/libfftw3f.so.3.3.2*   
>    /usr/lib64/libfftw3l.so.3@      /usr/lib64/libfftw_threads.so@
> /usr/lib64/libfftw3_omp.so@    /usr/lib64/libfftw3f_omp.a       
>  /usr/lib64/libfftw3l.so.3.3.2*      
>  /usr/lib64/libfftw_threads.so.2@
> /usr/lib64/libfftw3_omp.so.3@  /usr/lib64/libfftw3f_omp.so@     
>  /usr/lib64/libfftw3l_omp.a    
>  /usr/lib64/libfftw_threads.so.2.0.7*
> 


Re: [Wien] compilation of Wien2k with GNU OpenMPI, but with MKL library ?

2023-06-06 Thread Pavel Ondračka
I believe it should be possible, at least the MKL Link Line
Advisor
https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html
definitely allows selecting the GNU compiler and OpenMPI. But yeah, it
might be more painful than going either the fully Intel® oneAPI or the
GNU compilers+OpenBLAS+OpenMPI way.

Best regards
Pavel


On Mon, 2023-06-05 at 12:11 +0200, Peter Blaha wrote:
> As far as I know, you cannot mix libraries compiled with ifort or
> with GNU compilers. At least in previous times, the objects would
> have one or 2 "_" in their reference and it would not fit together.
> Maybe there are some options to fix this, but I do not know.
> 
> My recommendation is therefore: choose either Intel or GNU
> compilers.
> 
> For Intel you have to compile FFTW3 and ELPA yourself (see also the
> instructions in the UG, these are always only 3 commands and it is
> not so difficult) and can use the mkl for the rest.
> 
> For GNU you can use the Openblas and the corresponding Linux packages
> (if they exist) or you compile yourself with GNU. I don't know (but
> doubt) if you can link the mkl-blas,... with GNU, but you don't need
> mkl, because openblas is (almost) as good as mkl and "GNU-scalapack"
> comes with Linux.   
> 
> When using Intel, you can use either Intelmpi or Openmpi, but the
> name of the mkl blacs library is different for the 2 mpi versions.
> 
> 
> Am 05.06.2023 um 10:45 schrieb Ilias Miroslav, doc. RNDr., PhD.:
>  
> >   
> >  Ad:
> > https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg22466.html
> >  
> >  Dear Professor Blaha, 
> >  
> >  thanks for your answer. So to get Wien2k compiled with intel
> > compilers, one needs FFTW3 and ELPA compiled with Intel compilers.
> >  
> >  Now the question is : if I use  OpenMPI with FFTW3 and ELPA
> > libraries compiled with GNU compilers, will the MKL libraries -
> > blas,lapack, plus  scalapack and blacs work, right  ?
> >  
> >  Best, Miro
> >  
> >  


Re: [Wien] Wien2k access outside local area network

2023-04-14 Thread Pavel Ondračka
Hello Chithra,

it all depends on your network settings, in theory all you need is a
public IP and open ports. But this forum is not the right place to ask,
check with your local area network admin (who will also surely talk you
through the potential security risks).

Best regards
Pavel

On Fri, 2023-04-14 at 15:00 +0530, Chithra M Mathew wrote:
> Sir
> I have installed Wien2k software on my local area network desktop.
> Can I access my Wien2k software from outside the local area network?
> Is it possible? Please help me.
> 


Re: [Wien] benchmark test withi9-12900k

2023-01-31 Thread Pavel Ondračka
Hi Sandeep, 
> 
> I have a query regarding this.
> While performing serial or parallel calculations, on increasing omp
> from 1 to 8, the percentage use of the CPUs does not increase on the
> same scale (omp=2: 170 to 180%, omp=4: 300 to 330%, omp=8: only 500 to
> 550%). Is something wrong in configuring or compiling the software, or
> is it due to some limitations in hardware?
> Any suggestions?

There are several factors, one is the threading support in the
BLAS/LAPACK libraries and another is the deficiencies of the
Wien2k OpenMP parallelization. HW also comes into play, mostly in the
general sense that the lower the memory bandwidth you have, the earlier
you will see the flattening of the speedup with more threads.

If you look at the lapw1 output you can see how the total time is
mostly divided into 3 parts, for example:
   TIME HAMILT (CPU)  = 2.8, HNS = 2.9, HORB = 0.0, DIAG = 17.3, SYNC = 0.0
   TIME HAMILT (WALL) = 0.7, HNS = 0.8, HORB = 0.0, DIAG = 4.7, SYNC = 0.0

Scaling of the DIAG part is mostly based on how your libraries scale (MKL
does quite OK, but don't expect miracles).

HAMILT scaling is based on explicit Wien2k parallelization. That one
also doesn't scale too well past 4-6 cores. The reason is that I was mostly
learning OpenMP when I wrote it and I just went for the simplest "omp
parallel for" solution, probably at too high a level (also because the
support in ifort for higher OpenMP versions with more advanced constructs
was not so good at that time). I think that there could still be some
speedup if this were rewritten and the parallelization happened
at a different level, maybe more similarly to how it's parallelized with
MPI, so it fits better in the caches and could thus overcome the memory
bandwidth limits better when scaling to more cores.

HNS has no explicit threading at all and IIRC for the BLAS/LAPACK calls
there the library-level threading didn't help much. This could also be
improved by rewriting it to be more parallelization friendly (possibly
again mirroring how the MPI version does it, which scales fine IIRC),
but I'm not an algebra expert so I haven't even tried.
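
To compare the three parts quickly, the CPU and WALL lines can be parsed and divided. This is only a sketch: it assumes the two timing lines have exactly the layout shown in the example above, and the function names are made up. Note that CPU/WALL only tells how many cores were kept busy on average in each part, not the speedup relative to a single-thread run.

```python
import re

# Matches e.g. "HAMILT (CPU)  = 2.8" or "DIAG = 17.3" in a lapw1 timing line.
PART = re.compile(
    r"(HAMILT|HNS|HORB|DIAG|SYNC)\s*(?:\((?:CPU|WALL)\))?\s*=\s*([0-9.]+)")


def parse_times(line):
    """Return {part: seconds} for one 'TIME HAMILT (...)' line."""
    return {name: float(value) for name, value in PART.findall(line)}


def busy_cores(cpu_line, wall_line):
    """Return CPU/WALL per part, skipping parts with zero wall time."""
    cpu, wall = parse_times(cpu_line), parse_times(wall_line)
    return {k: cpu[k] / wall[k] for k in cpu if wall.get(k, 0.0) > 0.0}


cpu_line = "TIME HAMILT (CPU)  = 2.8, HNS = 2.9, HORB = 0.0, DIAG = 17.3, SYNC = 0.0"
wall_line = "TIME HAMILT (WALL) = 0.7, HNS = 0.8, HORB = 0.0, DIAG = 4.7, SYNC = 0.0"
ratios = busy_cores(cpu_line, wall_line)
for part, ratio in ratios.items():
    print(f"{part}: {ratio:.1f}")
```

For the example lines above the ratios come out at roughly 4.0 for HAMILT, 3.6 for HNS and 3.7 for DIAG.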

So yeah, no easy way how this can be improved, unless you know a bit
about OpenMP and want to try yourself (BTW prof. Blaha was always very
welcoming to contributions even though I'm not part of the Wien2k team
:-) ). 


Best regards
Pavel


Re: [Wien] How to find the exact value of infinte epsilon?

2021-11-29 Thread Pavel Ondračka
Dear Atefe Marasi,

to add on top of what Xavier said, and thus similarly to him I'm also
assuming that by "infinite epsilon, i.e., dielectric constant at high
frequency" you mean the electronic part of the dielectric tensor, or in
other words the part from the electronic excitations. That you can get
with the optic, joint and kram commands.

Regarding the band gap, if you have an insulator, the mBJ potential should
be a reasonable starting choice. The band gap values are OK and even
though the band dispersion and thus the momentum matrix elements are
not so good, it does not matter that much as the standard optic
calculation neglects excitons anyway. However even if you don't get the
proper shape of the imaginary dielectric function, in my experience
this does not matter that much if you are interested in the real part
of the dielectric function at E=0 (or at least reasonably far below the
band gap). I used mBJ+optic quite a lot for refractive index
calculations in the visible range for materials with band gaps as low as
3 eV and even there it worked quite OK.

What is very important is to increase the maximum energy emax value in
lapw1, optic and joint to be sure you include enough states to properly
account for the high energy processes. Otherwise you will get some
underestimation of the real part of the dielectric function at low
energies.
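
The reason for the underestimation can be read off the standard Kramers-Kronig relation at zero energy. The expression below is a sketch for an insulator (so that the imaginary part vanishes below the gap and the integrand is regular at zero); E_max is shorthand introduced here for the highest transition energy effectively covered by the chosen emax.

```latex
\varepsilon_1(0) = 1 + \frac{2}{\pi}\int_0^{\infty}
    \frac{\varepsilon_2(\omega')}{\omega'}\,\mathrm{d}\omega'
\quad\Longrightarrow\quad
\varepsilon_1^{\mathrm{trunc}}(0)
  = \varepsilon_1(0) - \frac{2}{\pi}\int_{E_{\max}}^{\infty}
    \frac{\varepsilon_2(\omega')}{\omega'}\,\mathrm{d}\omega'
  \le \varepsilon_1(0)
```

Since the imaginary part is non-negative, every transition cut off above E_max removes a positive contribution, so the truncated value can only come out too low.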

Best regards
Pavel

On Mon, 2021-11-29 at 11:46 +0100, xavier rocquefelte wrote:
> Dear Atefe Marasi,
> Infinite epsilon means that you extrapolate the epsilon value for the
> zero of energy. 
> You must plot the real part of the dielectric function to properly
> estimate this value. 
> You must be careful because if you have a band gap and a bad
> description of the gap value ... the estimation will be not good at
> all. 
> Regards
> Xavier
> 
> On 29/11/2021 11:38, Atefe Marasi wrote:


Re: [Wien] Compiling Wien2k 21.1 on Ubuntu 20.04 with gfortran

2021-11-26 Thread Pavel Ondračka
Hi Gavin,

I think steps 1-5 (more or less the first 6 pages) can be simplified to
something like "sudo apt install libxc-dev libopenblas-openmp-dev
libscalapack-openmpi-dev openmpi-common libopenmpi-dev libfftw3-dev
libfftw3-mpi-dev ...". (I just googled the package names as I'm not an
Ubuntu user, but I think you get the point).

IMO there is no point compiling libraries on your own, especially if
you don't use any extra flags or optimizations. I think everything
relevant is in Ubuntu repos (maybe except ELPA?).

I've been linking with system libraries (even the mpi ones like
scalapack and ELPA) in Fedora for ages with no issues and no noticeable
performance loss. Everything important for performance (like openblas,
fftw or elpa) has optimized kernels for multiple architectures and
proper runtime selection anyway.

Best regards
Pavel



On Fri, 2021-11-26 at 01:44 -0700, Gavin Abo wrote:
> I'm using Ubuntu 20.04 LTS  also but with a patched WIEN2k 21.1 that
> was compiled with gfortran and OpenBLAS.  The WIEN2k 21.1 bug fixes
> (patches) I got from the past posts in the mailing list.  A list of
> the url links to those posts are in the README file at [1].
> I also recently encountered a SIGSEGV segmentation fault (core
> dumped) runtime error when running lapw1 even though OpenBLAS 0.3.18
> compiled successfully.  I try to use the latest stable release of
> OpenBLAS, which is currently 0.3.18 [2].  However, in my case: My
> system has an AMD processor that targets Barcelona, and as it turns
> out, I found an OpenBLAS issue report at [3]. There it describes how
> OpenBLAS 0.3.15 works but OpenBLAS 0.3.16, 0.3.17, and 0.3.18 crashes
> for a processor that has a Barcelona target where the fix won't be
> available until a future 0.3.19 release.  As a workaround until
> 0.3.19 becomes available, I found that I could use the current
> OpenBLAS development version (0.3.18.dev) to have the fix.
> I have not tried the compile settings at [4].  I'm using just a
> 'basic' set of compile settings for being able to do serial, k-point
> parallel, or mpi parallel with WIEN2k.  By 'basic', I mean I am using
> non-optimized flags, as I haven't gone through the GNU documentation
> [5] to optimize all the flags for my specific processor.
> Should you be interested in the details of how I installed WIEN2k
> 21.1 on my system, I have made them available at [6], which you will
> probably find to be very similar to the older install described at
> [7].
>  [1] https://github.com/gsabo/WIEN2k-Patches/tree/master/21.1
>  [2] https://github.com/xianyi/OpenBLAS/releases
>  [3] https://github.com/xianyi/OpenBLAS/issues/3421
>  [4] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg21482.html
>  [5] https://gcc.gnu.org/wiki/GFortran
>  [6] https://github.com/gsabo/WIEN2k-Docs/blob/main/WIEN2k21.1_Install_with_gfortran.pdf
>  [7] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg21134.html
> 
> Kind Regards,
> Gavin
> WIEN2k user
> 
> On 11/24/2021 3:09 AM, David Holec wrote:
>  
> >  
> > Hi Pavel, 
> > 
> > Many thanks for your insights. As you know, I am not an expert on
> > how to compile codes, for me, this is sadly a trial and error
> > adventure.
> > 
> > I tried to compile it against the openblas library, but although
> > the compilation ends without any errors, I get a segmentation fault
> > when running lapw1 (on the test case
> > from http://www.wien2k.at/reg_user/benchmark/). The current setting
> > are:
> > 
> >  L   Linker Flags:    $(FOPT) -L/usr/lib/x86_64-linux-gnu/openblas64-openmp
> >  R   R_LIBS (LAPACK+BLAS):    /usr/lib/x86_64-linux-gnu/openblas64-openmp/libopenblas64.so.0 -lpthread
> >  
> > (The rest is as I wrote in my first email.) Here is the list of
> > linked libraries:
> >  $ ldd lapw1
> > linux-vdso.so.1 (0x7ffea57d6000)
> > libopenblas64.so.0 => /lib/x86_64-linux-gnu/libopenblas64.so.0 (0x14fe2b2e5000)
> > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x14fe2b2c2000)
> > libgfortran.so.5 => /lib/x86_64-linux-gnu/libgfortran.so.5 (0x14fe2affa000)
> > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x14fe2aeab000)
> > libmvec.so.1 => /lib/x86_64-linux-gnu/libmvec.so.1 (0x14fe2ae7f000)
> > libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x14fe2ae3d000)
> > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x14fe2ac49000)
> > /lib64/ld-linux-x86-64.so.2 (0x14fe2d4d3000)
> > libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x14fe2abff000)
> > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x14fe2abe4000)
> > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x14fe2abde000)
> >  
> > And here is the stacking fault (it doesn't tell me anything):
> > $x lapw1 
> >  
> >  Program received signal SIGSEGV: Segmentation fault - invalid
> > memory reference. 
> >  
> >  

Re: [Wien] Compiling Wien2k 21.1 on Ubuntu 20.04 with gfortran

2021-11-25 Thread Pavel Ondračka
Hi David,

this is as good as it gets, I guess. HAMILT was sped up by ~30%,
but it is not so noticeable because HAMILT actually scales much better
with the thread number than the rest (so it is already running much
faster than the other parts at 4 threads, and further improvements
are less visible). The speedup is more relevant in k-point parallel
scenarios.

Anyway, best of luck
Pavel

On Thu, 2021-11-25 at 09:58 +0100, David Holec wrote:
> Hi Pavel,
> 
> I have added now -DHAVE_LIBMVEC to the compiler options as you have
> suggested (and removed it from the preprocessor flags). 
> 
>  O   Compiler options:    -ffree-form -O2 -ftree-vectorize -march=native -ffree-line-length-none -ffpe-summary=none -DHAVE_LIBMVEC
>  P   Preprocessor flags   -DParallel
> 
> 
> Here are the results of the test case:
> 
> $ x lapw1  
> STOP  LAPW1 END
> 109.094u 1.126s 0:29.41 374.7%  0+0k 0+37864io 0pf+0w
> $ grep HORB *output1*
> test_case.output1:   TIME HAMILT (CPU)  =    11.9, HNS =    15.2, HORB = 0.0, DIAG =    82.4, SYNC = 0.0
> test_case.output1:   TIME HAMILT (WALL) = 3.1, HNS = 4.4, HORB = 0.0, DIAG =    21.2, SYNC = 0.0
> 
> 
> David
> 
> ---
> Dr David Holec
> Computational Materials Science group
> Department of Materials Science
> Montanuniversität Leoben
> 
> 
> 
> Franz-Josef-Strasse 18, A-8700 Leoben, Austria
> tel. +43-(0)3842-4024211
> fax. +43-(0)3842-4024202
> materials.unileoben.ac.at
> cms.unileoben.ac.at
> 
> WHERE RESEARCH MEETS FUTURE


Re: [Wien] Compiling Wien2k 21.1 on Ubuntu 20.04 with gfortran

2021-11-24 Thread Pavel Ondračka
On Wed, 2021-11-24 at 16:51 +0100, Peter Blaha wrote:
> Just for information:  the -DHAVE_LIBMVEC is a preprocessor option
> (like -DINTEL_VML for ifort) and will speedup the HAMILT part due to
> a vectorization of cosine/sine functions.

Sorry for not being specific enough, -DHAVE_LIBMVEC should go to the
Wien2k compiler options, specifically add it to the
"-ffree-form -O2 -ftree-vectorize -march=native -ffree-line-length-none
-ffpe-summary=none" stuff.

If you add it together with the -DParallel (as I now see in the other
email), it will be available only for the mpi builds.

> As far as I remember, it is available only with more recent
> gfortran/openblas versions, therefore not yet a "default" gfortran
> option.

This is actually a glibc feature (an alternative to Intel VML), introduced
with glibc 2.22, released in mid 2015. It is not on by default because
at the time I wrote that stuff not all distros had it. Nowadays there are
still enterprise distros like old RHEL, CentOS, or similar that use
an older glibc (however this is mostly HPC, where one would compile with
ifort anyway). All supported desktop distros like Fedora, Ubuntu,
openSUSE, etc. have it now, so it should be safe to add to the
gfortran/openblas flags by default in the next release.
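
A quick way to check whether a given machine meets the glibc 2.22
requirement mentioned above is sketched below in Python. The helper name
glibc_has_libmvec is hypothetical, and the check only looks at the version
number; it assumes an x86_64 system and does not verify that libmvec.so is
actually installed.

```python
import platform

def glibc_has_libmvec(version):
    """True if a glibc version string such as '2.31' is at least 2.22."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 22)

# platform.libc_ver() returns ('glibc', '<version>') on glibc-based Linux
# and empty strings where it cannot detect the C library.
lib, version = platform.libc_ver()
if lib == "glibc" and version:
    print(f"glibc {version}: new enough for libmvec = {glibc_has_libmvec(version)}")
else:
    print("glibc not detected on this system")
```

If the answer is False, leaving -DHAVE_LIBMVEC out of the compiler options
is the safe choice.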

Best regards
Pavel

> 
> Hi Pavel,
> 
> I don't think that that compiler flag has been used:
>  $ find . -name "Makefile" -exec grep "DHAVE_LIBMVEC" {} \;
> yields nothing in my Wien2k source directory.
> 
> David
> 
> Am 24.11.2021 um 14:43 schrieb Pavel Ondračka:
>  
> > Dear David,
> > 
> > nice, ~30 seconds instead of ~150 :-)
> > BTW is this already with "-DHAVE_LIBMVEC" in compiler options?
> > 


Re: [Wien] Compiling Wien2k 21.1 on Ubuntu 20.04 with gfortran

2021-11-24 Thread Pavel Ondračka
Dear David,

nice, ~30 seconds instead of ~150 :-)
BTW is this already with "-DHAVE_LIBMVEC" in compiler options?

For your real workflow you might also try to experiment with the number
of threads vs. the number of k-points in parallel (it seems you are
already running with 4 threads for the test_case, which has only 1
k-point), but for small cases with lots of k-points I would expect
k-point parallelization to be the best.

Now that you link with the OpenMP-threaded OpenBLAS, it is important
that the number of k-points run in parallel times number of threads
allowed for lapw1/lapw2 does not exceed the total number of cores. It
seems your environment already has OMP_NUM_THREADS set to 4 (at least),
so you need to set the OpenMP threading explicitly for Wien2k.

Specifically try this .machines file (k-point parallel in lapw1/lapw2 +
OpenMP elsewhere, assuming Intel(R) Xeon(R) CPU W3550 should have 4
physical cores) with run_lapw -p for your standard use-case
---
1:localhost
1:localhost
1:localhost
1:localhost
omp_global:4
omp_lapw1:1
omp_lapw2:1
---

or alternatively (two k-points and two threads in lapw1+2)
---
1:localhost
1:localhost
omp_global:4
omp_lapw1:2
omp_lapw2:2
---
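
The rule above (parallel k-point jobs times threads per job should not
exceed the core count) can be checked with a small Python sketch. It only
understands the line types used in the two examples; the function names
are hypothetical, and the fallback of omp_lapw1/omp_lapw2 to omp_global
when they are missing is my assumption about how the defaults work.

```python
def machines_load(text):
    """Return (k_parallel_jobs, threads_lapw1, threads_lapw2) for .machines text."""
    jobs = 0
    omp = {"omp_global": 1}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        if key.startswith("omp_"):
            omp[key] = int(value)
        elif key.isdigit():
            # lines such as "1:localhost" each start one k-point parallel job
            jobs += 1
    t1 = omp.get("omp_lapw1", omp["omp_global"])  # assumed fallback
    t2 = omp.get("omp_lapw2", omp["omp_global"])  # assumed fallback
    return jobs, t1, t2

def fits_on_cores(text, n_cores):
    """True if jobs * threads stays within n_cores for lapw1 and lapw2."""
    jobs, t1, t2 = machines_load(text)
    return max(jobs, 1) * max(t1, t2) <= n_cores

example = "1:localhost\n1:localhost\nomp_global:4\nomp_lapw1:2\nomp_lapw2:2\n"
print(machines_load(example), fits_on_cores(example, 4))
```

Both .machines files above pass this check for a 4-core machine, while 4
k-point jobs with 4 threads each would not.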

Best regards
Pavel

On Wed, 2021-11-24 at 13:55 +0100, David Holec wrote:
> Dear Pavel,
> 
> Many thanks again for your patience and guidance. With the
> libopenblas-openmp-dev package it seems to work well!
> 
> $ ldd lapw1
> linux-vdso.so.1 (0x7ffca83d8000)
> libopenblas.so.0 => /lib/x86_64-linux-gnu/libopenblas.so.0
> (0x14563f924000)
> libgfortran.so.5 => /lib/x86_64-linux-gnu/libgfortran.so.5
> (0x14563f65c000)
> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
> (0x14563f50d000)
> libmvec.so.1 => /lib/x86_64-linux-gnu/libmvec.so.1
> (0x14563f4e1000)
> libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1
> (0x14563f49f000)
> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> (0x14563f47c000)
> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
> (0x14563f288000)
> /lib64/ld-linux-x86-64.so.2 (0x145641b2d000)
> libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0
> (0x14563f23e000)
> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> (0x14563f223000)
> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
> (0x14563f21d000)
> 
> and these options in siteconfig:
>  L   Linker Flags:    $(FOPT) -L/usr/lib/x86_64-linux-gnu/openblas-openmp
>  R   R_LIBS (LAPACK+BLAS):    -lopenblas
> 
> I also get better timings now (though the TIME HAMILT are slightly
> longer, but overall improvement):
> $ ../x lapw1
> STOP  LAPW1 END
> 119.400u 1.937s 0:32.53 372.9%  0+0k 0+37864io 0pf+0w
> $ grep HORB *output1*
> test_case.output1:   TIME HAMILT (CPU)  =    17.3, HNS =    18.4, HORB = 0.0, DIAG =    85.0, SYNC = 0.0
> test_case.output1:   TIME HAMILT (WALL) = 4.6, HNS = 5.2, HORB = 0.0, DIAG =    22.0, SYNC = 0.0
> 
> Thanks for your help,
> David
> ---
> Dr David Holec
> Computational Materials Science group
> Department of Materials Science
> Montanuniversität Leoben
> 
> 
> 
> Franz-Josef-Strasse 18, A-8700 Leoben, Austria
> tel. +43-(0)3842-4024211
> fax. +43-(0)3842-4024202
> materials.unileoben.ac.at
> cms.unileoben.ac.at
> 
> WHERE RESEARCH MEETS FUTURE
> 
> 
> On Wed, 24 Nov 2021 at 12:39, Pavel Ondračka 
> wrote:
> > Hi David,
> > 
> > well, it is hard to say without the debug info why OpenBLAS crashes.
> > My guess is that you link with the 64bit interface, try to install the
> > standard one (openblas-openmp-devel) and replace openblas64-openmp with
> > openblas-openmp everywhere in your config. Also remove the -lpthread
> > (just to be safe, but in theory it should not matter), it is not needed
> > with OpenMP. If it still crashes, please recompile with debug info
> > enabled (add -g to compiler options) and send me the x lapw1 output via
> > PM.
> > 
> > BTW my response was mostly motivated by me suspecting you actually link
> > against slow netlib BLAS (which turned out to be the case) and I wanted
> > to warn others in case someone in the future would be using your
> > settings as a reference :-)
> > 
> > Best regards
> > Pavel
> > 
> > On Wed, 2021-11-24 at 11:09 +0100, David Holec wrote:
> > > Hi Pavel,
> > > 
> > > Many thanks for your insights. As you know, I am not an expert on
> > how
> > > to compile codes, for

Re: [Wien] Compiling Wien2k 21.1 on Ubuntu 20.04 with gfortran

2021-11-24 Thread Pavel Ondračka
 $lscpu
> Architecture:    x86_64
> CPU op-mode(s):  32-bit, 64-bit
> Byte Order:  Little Endian
> Vendor ID:   GenuineIntel
> CPU family:  6
> Model:   26
> Model name:  Intel(R) Xeon(R) CPU   W3550
>  @ 3.07GHz
> )
> 
> 
> ---
> Dr David Holec
> Computational Materials Science group
> Department of Materials Science
> Montanuniversität Leoben
> 
> 
> 
> Franz-Josef-Strasse 18, A-8700 Leoben, Austria
> tel. +43-(0)3842-4024211
> fax. +43-(0)3842-4024202
> materials.unileoben.ac.at
> cms.unileoben.ac.at
> 
> WHERE RESEARCH MEETS FUTURE
> 
> 
> On Wed, 24 Nov 2021 at 08:27, Pavel Ondračka
>  wrote:
> > Hi David,
> > 
> > as you said it works for you, so feel free to ignore, but I have some
> > further tips if you are interested. Ubuntu switches between the
> > different blas and lapack using the "alternatives", so it's difficult to
> > say if you actually link with the correct one.
> > 
> > "ldd lapw1" in WIENROOT should show which one is actually linked. What
> > you want to have is the openmp openblas
> > /usr/lib/x86_64-linux-gnu/openblas-openmp/libblas.so
> > /usr/lib/x86_64-linux-gnu/openblas-openmp/liblapack.so
> > or alternatively
> > /usr/lib/x86_64-linux-gnu/openblas-openmp/libopenblas.so
> > It looks like you linked with the pthread one. This is not a problem
> > when running with a single thread, but at higher thread counts this might
> > lead to oversubscription and slowdowns as the pthreaded openblas
> > doesn't respect the OMP_NUM_THREADS set by Wien2k. So I would recommend
> > relinking with the openmp OpenBLAS. BTW it is usually safer to link
> > with OpenBLAS explicitly using -lopenblas instead of -llapack -lblas,
> > to be sure you don't accidentally link the netlib one
> > (libopenblas is just the libblas and liblapack provided by OpenBLAS
> > merged together).
> > 
> > In general, an easy way to check performance is to run the serial
> > test_case from http://www.wien2k.at/reg_user/benchmark/. On modern CPUs
> > (at least avx2) the runtime should be around 15-25 seconds with a single
> > thread.
> > 
> > I see a total runtime of ~18 seconds on Fedora 35 with gfortran 11.2.1,
> > OpenBLAS and an AMD Ryzen 9 3900X 12-Core Processor.
> > Also look for the following line in test_case.output1, this is what I
> > have:
> > TIME HAMILT (WALL) =     2.2, HNS =     1.7, HORB =     0.0, DIAG =    14.0, SYNC =     0.0
> > The time in HAMILT mostly depends on your compiler and vectorization
> > settings, while DIAG is 99% lapack/blas related, so this can help
> > with the diagnostics if things are slow.
> > 
> > You might also get extra speedup of the HAMILT part by adding
> > "-DHAVE_LIBMVEC" to the Compiler options.
> > 
> > Best regards
> > Pavel
> > 
> > On Tue, 2021-11-23 at 11:07 +0100, David Holec wrote:
> > > Dear all,
> > > 
> > > I have just spent some time making Wien2k run on my single machine
> > > running Ubuntu 20.04 with gfortran/gcc. Since I am not an expert, it
> > > was a trial and error, but it seems that I found a working combination
> > > (sadly, the default parameters didn't work for me). Maybe this will
> > > help someone. Here are the settings that did the job for me:
> > > 
> > >   M   OpenMP switch:   -fopenmp
> > >   O   Compiler options:    -ffree-form -O2 -ftree-vectorize -march=native -ffree-line-length-none -ffpe-summary=none
> > >   L   Linker Flags:    $(FOPT) -L/usr/lib/x86_64-linux-gnu
> > >   P   Preprocessor flags   '-DParallel'
> > >   R   R_LIBS (LAPACK+BLAS):    -lblas -llapack -lpthread
> > >   F   FFTW options:    -DFFTW3 -DFFTW_OMP -I/usr/include
> > >   FFTW-LIBS:   -L/usr/lib/x86_64-linux-gnu -lfftw3 -lfftw3_omp
> > > 
> > > where the FFTW options were:
> > > 
> > >   R  FFTWROOT:  /usr/
> > >    V  FFTW_VERSION:  FFTW3
> > >    L  FFTW_LIB:  lib/x86_64-linux-gnu
> > >    N  FFTW_LIBNAME:  fftw3
> > > 
> > > Compiler versions:
> > > $ gcc -

Re: [Wien] Compiling Wien2k 21.1 on Ubuntu 20.04 with gfortran

2021-11-23 Thread Pavel Ondračka
Hi David,

as you said it works for you, so feel free to ignore, but I have some
further tips if you are interested. Ubuntu switches between the
different blas and lapack using the "alternatives", so it's difficult to
say if you actually link with the correct one.

"ldd lapw1" in WIENROOT should show which one is actually linked. What
you want to have is the openmp openblas
/usr/lib/x86_64-linux-gnu/openblas-openmp/libblas.so
/usr/lib/x86_64-linux-gnu/openblas-openmp/liblapack.so
or alternatively
/usr/lib/x86_64-linux-gnu/openblas-openmp/libopenblas.so
It looks like you linked with the pthread one. This is not a problem
when running with a single thread, but at higher thread counts this might
lead to oversubscription and slowdowns as the pthreaded openblas
doesn't respect the OMP_NUM_THREADS set by Wien2k. So I would recommend
relinking with the openmp OpenBLAS. BTW it is usually safer to link
with OpenBLAS explicitly using -lopenblas instead of -llapack -lblas,
to be sure you don't accidentally link the netlib one
(libopenblas is just the libblas and liblapack provided by OpenBLAS
merged together).
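
As a rough aid for reading the "ldd lapw1" output, the following Python
sketch flags the BLAS/LAPACK-related entries. The function name and the
wording of the notes are my own; the sketch only looks at library names,
so it cannot tell the pthread and openmp OpenBLAS builds apart when both
are installed under the same libopenblas.so.0 name.

```python
def classify_blas(ldd_output):
    """Return a list of (library, note) for BLAS/LAPACK entries in ldd output."""
    findings = []
    for line in ldd_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        name = parts[0]
        if name.startswith("libopenblas64"):
            findings.append((name, "OpenBLAS, 64-bit integer interface package"))
        elif name.startswith("libopenblas"):
            findings.append((name, "OpenBLAS"))
        elif name.startswith(("libblas", "liblapack")):
            findings.append((name, "generic name, may resolve to netlib via alternatives"))
    return findings

sample = """linux-vdso.so.1 (0x7ffca83d8000)
libopenblas.so.0 => /lib/x86_64-linux-gnu/libopenblas.so.0 (0x14563f924000)
libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x14563f49f000)"""
for lib, note in classify_blas(sample):
    print(lib, "->", note)
```

On a real system the text could come from running ldd on the lapw1 binary
and passing its standard output to classify_blas.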

In general, an easy way to check performance is to run the serial
test_case from http://www.wien2k.at/reg_user/benchmark/. On modern CPUs
(at least avx2) the runtime should be around 15-25 seconds with a single
thread.

I see a total runtime of ~18 seconds on Fedora 35 with gfortran 11.2.1,
OpenBLAS and an AMD Ryzen 9 3900X 12-Core Processor.
Also look for the following line in test_case.output1, this is what I
have:
TIME HAMILT (WALL) = 2.2, HNS = 1.7, HORB = 0.0, DIAG = 14.0, SYNC = 0.0
The time in HAMILT mostly depends on your compiler and vectorization
settings, while DIAG is 99% lapack/blas related, so this can help
with the diagnostics if things are slow.
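
For comparing several builds it can be handy to pull these numbers out of
the output1 file automatically. Below is a minimal Python sketch; the
function names are hypothetical and the regular expression is only fitted
to the "TIME HAMILT" line format quoted in this thread.

```python
import re

def parse_lapw1_times(line):
    """Extract e.g. {'HAMILT': 2.2, 'HNS': 1.7, ...} from a TIME HAMILT line."""
    pairs = re.findall(r"([A-Z]+)\s*(?:\([A-Z]+\))?\s*=\s*([0-9.]+)", line)
    return {key: float(value) for key, value in pairs}

def diag_share(times):
    """Fraction of the listed time spent in DIAG (the LAPACK/BLAS-bound part)."""
    total = sum(times.values())
    return times["DIAG"] / total if total else 0.0

line = "TIME HAMILT (WALL) = 2.2, HNS = 1.7, HORB = 0.0, DIAG = 14.0, SYNC = 0.0"
times = parse_lapw1_times(line)
print(times, f"DIAG share: {diag_share(times):.0%}")
```

A large DIAG share with a slow total runtime points at the BLAS/LAPACK
library, while a slow HAMILT points at the compiler flags.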

You might also get extra speedup of the HAMILT part by adding
"-DHAVE_LIBMVEC" to the Compiler options.

Best regards
Pavel

On Tue, 2021-11-23 at 11:07 +0100, David Holec wrote:
> Dear all,
> 
> I have just spent some time making Wien2k run on my single machine
> running Ubuntu 20.04 with gfortran/gcc. Since I am not an expert, it
> was a trial and error, but it seems that I found a working combination
> (sadly, the default parameters didn't work for me). Maybe this will
> help someone. Here are the settings that did the job for me:
> 
>   M   OpenMP switch:   -fopenmp
>   O   Compiler options:    -ffree-form -O2 -ftree-vectorize -march=native -ffree-line-length-none -ffpe-summary=none
>   L   Linker Flags:    $(FOPT) -L/usr/lib/x86_64-linux-gnu
>   P   Preprocessor flags   '-DParallel'
>   R   R_LIBS (LAPACK+BLAS):    -lblas -llapack -lpthread
>   F   FFTW options:    -DFFTW3 -DFFTW_OMP -I/usr/include
>   FFTW-LIBS:   -L/usr/lib/x86_64-linux-gnu -lfftw3 -lfftw3_omp
> 
> where the FFTW options were:
> 
>   R  FFTWROOT:  /usr/
>    V  FFTW_VERSION:  FFTW3
>    L  FFTW_LIB:  lib/x86_64-linux-gnu
>    N  FFTW_LIBNAME:  fftw3
> 
> Compiler versions:
> $ gcc --version
> gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
> gfortran --version
> GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
> 
> And I used the generic lapack and openblas packages provides by Ubuntu
> repos:
> liblapack-dev/focal,now 3.9.0-1build1 amd64 [installed]
> liblapack3/focal,now 3.9.0-1build1 amd64 [installed,automatic]
> 
> liblapack64-3/focal,now 3.9.0-1build1 amd64 [installed,automatic]
> liblapack64-dev/focal,now 3.9.0-1build1 amd64 [installed]
> 
> libblas-dev/focal,now 3.9.0-1build1 amd64 [installed]
> libblas3/focal,now 3.9.0-1build1 amd64 [installed,automatic]
> libblas64-3/focal,now 3.9.0-1build1 amd64 [installed,automatic]
> libblas64-dev/focal,now 3.9.0-1build1 amd64 [installed,automatic]
> 
> libopenblas64-0/focal-updates,now 0.3.8+ds-1ubuntu0.20.04.1 amd64
> [installed]
> libopenblas64-0-openmp/focal-updates,now 0.3.8+ds-1ubuntu0.20.04.1
> amd64 [installed]
> libopenblas64-0-pthread/focal-updates,now 0.3.8+ds-1ubuntu0.20.04.1
> amd64 [installed,automatic]
> 
> (I am not totally sure if I need all the libraries above, but
> certainly, with these, the compilation seems to work and I am able to
> run SCF cycles & Telnes calculations without errors :-)
> 
> All the best,
> David
> ---
> Dr David Holec
> Computational Materials Science group
> Department of Materials Science
> Montanuniversität Leoben
> 
> 
> 
> Franz-Josef-Strasse 18, A-8700 Leoben, Austria
> tel. +43-(0)3842-4024211
> fax. +43-(0)3842-4024202
> materials.unileoben.ac.at
> cms.unileoben.ac.at
> 
> WHERE RESEARCH MEETS FUTURE

Re: [Wien] generalized regular k-point grids

2021-10-14 Thread Pavel Ondračka
Hi,

> Yes, WIEN2k is ready to accept any k-grid when using smearing
> methods, 
> however, you need to supply the proper weights.

Great :-)
> 
> I do not know what symmetry your cell has, but I can see only weights
> of 
> 1, 2 and 4 ? Is this correct ?
> It means you have only 4 symmetry operations for this cell ??

Well, the posted k-point list was just the first few lines. The
maximum weight in my case is actually 8, and that is consistent with the 8
symmetry operations I have (orthorhombic Cmcm).
> 
> PS: Have you tried TEMP instead of TEMPS ? As far as I understand, TEMP
> corrects towards zero Kelvin and should be compatible with TETRA. And
> of 
> course with a very small smearing parameter, TEMPS should go towards 
> TEMP --> TETRA (if the k-mesh is good enough).

I did not, but I was consistent so I believe it should not matter? 

> PPS: Be aware of different coordinate systems for different lattices !
> VASP and WIEN2k may eventually ?? specify the coordinates in different 
> coordinates (cartesian vs. fractions of (non-orthogonal) rec. lattice
> vectors.

I'll check this, thanks for the pointer, but this is looking like it
might be the case and I might need some rescaling. Looking at the
Userguide "We use cartesian coordinates in units of 2π/a, 2π/b, 2π/c
for P, C, F and B cubic, tetragonal and orthorhombic lattices, but
internal coordinates for H and monoclinic/triclinic lattices". I'll
check the VASP KPOINTS format...

Best regards
Pavel
> 
> Peter Blaha
> 
> Am 10/14/21 um 2:52 PM schrieb Pavel Ondračka:
> > Dear Wien2k mailing list,
> > 
> > Is Wien2k ready for a general k-point grid or is some part of the
> > code
> > assuming regular grid?
> > 
> > I was reading some papers about how the generalized regular k-point
> > grids have better efficiency over the standard Monkhorst-Pack ones...
> > For example this paper has also an implementation
> > https://msg.byu.edu/docs/papers/autoGR.pdf
> > 
> > It generates a k-point list in VASP KPOINTS format:
> >   0.  0.  0. 1
> > -0.1667  0.1667  0. 2
> > -0.  0.  0. 2
> >   0.5000 -0.5000  0. 1
> >   0.0625 -0.0208  0.035714285714 4
> > -0.10416667  0.1458  0.035714285714 4
> > -0.2708  0.3125  0.035714285714 4
> > .
> > 
> > I just do the stupid thing and convert it to the .klist format by
> > multiplying with 1e9 and applying the proper formating, i.e.:
> >   1 0 0 010  1.0
> >   2-1 1 010  2.0
> >   3-3 3 010  2.0
> >   4 5-5 010  1.0
> >   5  6250 -2083  3571428510  4.0
> >   6-10416 14583  3571428510  4.0
> >   7-27083 31250  3571428510  4.0
> > .
> > 
> > Now everything seems to run OK at the first glance (lapw2 crashes
> > with
> > TETRA ofc but TEMPS seems to be OK) but the energies are not so close
> > (I would expect that at very large number of k-points it should give
> > the same results as standard Wien2k MP grid), but there is a
> > difference
> > of maybe 2mRy/atom. So I guess there is still something somewhere
> > missing for this to work?
> > 
> > Best regards
> > Pavel


[Wien] generalized regular k-point grids

2021-10-14 Thread Pavel Ondračka
Dear Wien2k mailing list,

Is Wien2k ready for a general k-point grid or is some part of the code
assuming regular grid?

I was reading some papers about how the generalized regular k-point
grids have better efficiency over the standard Monkhorst-Pack ones...
For example this paper has also an implementation
https://msg.byu.edu/docs/papers/autoGR.pdf

It generates a k-point list in VASP KPOINTS format:
 0.  0.  0. 1
-0.1667  0.1667  0. 2
-0.  0.  0. 2
 0.5000 -0.5000  0. 1
 0.0625 -0.0208  0.035714285714 4
-0.10416667  0.1458  0.035714285714 4
-0.2708  0.3125  0.035714285714 4
.

I just do the stupid thing and convert it to the .klist format by
multiplying with 1e9 and applying the proper formatting, i.e.:
 1 0 0 010  1.0
 2-1 1 010  2.0
 3-3 3 010  2.0
 4 5-5 010  1.0
 5  6250 -2083  3571428510  4.0
 6-10416 14583  3571428510  4.0
 7-27083 31250  3571428510  4.0
.
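
The integer conversion described above can also be done with a small exact
divisor per k-point instead of a fixed large multiplier. Here is a minimal
Python sketch; the function name and the max_denominator cutoff are my own
choices, the input values are only illustrative, and the sketch neither
writes the fixed-width .klist columns nor handles any change of
coordinate system.

```python
from fractions import Fraction
from math import lcm  # math.lcm needs Python 3.9 or newer

def to_integer_kpoint(coords, max_denominator=1000):
    """Turn fractional coordinates into (integer triple, common divisor)."""
    fracs = [Fraction(c).limit_denominator(max_denominator) for c in coords]
    divisor = lcm(*(f.denominator for f in fracs))
    ints = [int(f * divisor) for f in fracs]
    return ints, divisor

# Illustrative point: 1/16, -1/48 and 1/28 written as rounded decimals
ints, divisor = to_integer_kpoint([0.0625, -0.0208333333, 0.035714285714])
print(ints, divisor)
```

Recovering the exact fractions avoids the rounding that comes from cutting
decimals such as 1/6 or 1/28 after a fixed number of digits.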

Now everything seems to run OK at the first glance (lapw2 crashes with
TETRA ofc but TEMPS seems to be OK) but the energies are not so close
(I would expect that at very large number of k-points it should give
the same results as standard Wien2k MP grid), but there is a difference
of maybe 2mRy/atom. So I guess there is still something somewhere
missing for this to work?

Best regards
Pavel


Re: [Wien] install WIEN2k_21.1 on Ubuntu 21.04

2021-10-12 Thread Pavel Ondračka
Dear Victor,

I don't think that the Intel® oneAPI Base Toolkit actually includes the
fortran compiler, at least not according to:
https://software.intel.com/content/www/us/en/develop/tools/oneapi/all-toolkits.html#hpc-kit


I think you need Intel® oneAPI HPC Toolkit to get ifort.
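
To check what is actually reachable from the current environment (after
sourcing setvars.sh), something like the following Python sketch can be
used. The helper name is hypothetical, and the list of compiler names is
just my guess at what one might want to look for.

```python
import shutil

def find_compilers(names=("ifort", "ifx", "gfortran")):
    """Map each compiler name to its full path, or None if not in PATH."""
    return {name: shutil.which(name) for name in names}

for name, path in find_compilers().items():
    print(f"{name:10s} {path or 'not found in PATH'}")
```

If ifort shows up as not found even after installing the HPC Toolkit, the
environment script has most likely not been sourced in that shell.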

BTW in general such questions are better suited for the Intel support
forums...

Best regards
Pavel

On Tue, 2021-10-12 at 14:12 +0300, Victor Zenou wrote:
> Dear all
> 
> I’m trying to install WIEN2k_21.1 on Ubuntu 21.04.
> 
> I updated and installed few system packages:
> 
> sudo apt update
> 
> sudo apt install build-essential gcc-multilib rpm default-jre-headless
> python tcsh gnuplot
> 
> sudo apt install autoconf libtool ghostscript octave
> 
> 
> I downloaded and installed “Intel® oneAPI Base Toolkit”, which is supposed
> to have the ifort compiler and mkl. I also added a line to bashrc:
> 
> source /opt/intel/oneapi/setvars.sh
> 
> 
> 
> The command “ifort -v”, gives “Command 'ifort' not found,...”
> 
> 
> ./siteconfig_lapw give the following message:
> 
> It seems you do not have the intel fortran compiler in your path.
> 
> 
> So it seems that my intel compiler either does not exist or is not in the
> path.
> 
> How can I check that?
> 
> 
> Any suggestions on how to proceed?
> 
> 
> Best regards, victor Zenou


Re: [Wien] segmentation fault in lapwso

2021-08-19 Thread Pavel Ondračka
BTW I did the Valgrind run and there is nothing there. I don't have the
affected MKL, but with either OpenBLAS or the Netlib LAPACK/BLAS
there are no Valgrind defects at all in the Wien2k code, just some
harmless leaked memory. So yeah, confirming this is definitely MKL.

Pavel

On Thu, 2021-08-19 at 06:56 -0500, Laurence Marks wrote:
> A suggestion: check your mkl version, as there is a mkl bug that was
> recently fixed, see
> https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-LAPACK-subroutine-ZHEEVR-input-array-quot-isuppz/td-p/1150816
> _
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought", Albert Szent-Györgyi
> www.numis.northwestern.edu
> 
> On Thu, Aug 19, 2021, 06:45 Peter Blaha
>  wrote:
> > I'm still on vacations, so cannot test myself.
> > 
> > However, I experienced such problems before. It has to do with 
> > multithreading (1 thread works always fine) and the mkl routine
> > zheevr.
> > 
> > In my case I could fix the problem by enlarging the workspace
> > beyond 
> > what the routine calculates itself. (see comment in hmsec on line
> > 841).
> > 
> > Right below, the workspace was enlarged by a factor 10, which fixed
> > my 
> > problem. But I can easily envision that it might not be enough in
> > some 
> > other cases.
> > 
> > An alternative is to switch back to zheevx (commented in the code).
> > 
> > Peter Blaha
> > 
> > Am 18.08.2021 um 20:01 schrieb Pavel Ondračka:
> > > Right, I think that the reason deallocate is failing because the memory
> > > has been corrupted at some earlier point is quite clear, the only other
> > > option why it should crash would be that it was not allocated at all,
> > > which seems not to be the case here... The question is what corrupted
> > > the memory and even more strange is why does it work if we disable MKL
> > > multithreading?
> > > 
> > > It could indeed be that we are doing something wrong. I can imagine the
> > > memory could be corrupted in some BLAS call: if the number of
> > > columns/rows passed to the specific BLAS call is more than the actual
> > > size of the matrix, then this could easily happen (and the
> > > multithreading is somehow influencing the final value of the
> > > corrupted memory, and depending on the final value the deallocate could
> > > fail or pass somehow). This should be possible to diagnose with
> > > valgrind as suggested.
> > > 
> > > Luis, can you upload the testcase somewhere, or recompile with
> > > debuginfo as suggested by Laurence earlier, run
> > > "valgrind --track-origins=yes lapwso lapwso.def" and send the output?
> > > Just be warned, there is a massive slowdown with valgrind (up to 100x)
> > > and the logfile can get very large.
> > > 
> > > Best regards
> > > Pavel
> > > 
> > > 
> > > On Wed, 2021-08-18 at 12:10 -0500, Laurence Marks wrote:
> > > > Correction, I was looking at an older modules.F. It looks like it
> > > > should be
> > > > 
> > > > DEALLOCATE(vect,stat=IV) ; if(IV .ne. 0)write(*,*)IV
> > > > 
> > > > 
> > > > On Wed, Aug 18, 2021 at 11:23 AM Laurence Marks
> > > >  wrote:
> > > > > I do wonder about this. I suggest editing modules.F and changing
> > > > > lines 118 and 119 to
> > > > >        DEALLOCATE(en,stat=Ien) ; if(Ien .ne. 0)write(*,*)'Err en ',Ien
> > > > >        DEALLOCATE(vnorm,stat=Ivn) ; if(Ivn .ne. 0)write(*,*)'Err vnorm ',Ivn
> > > > > 
> > > > > There is every chance that the bug is not in those lines, but
> > > > > somewhere completely different. SIGSEGV often means that the code
> > > > > has been overwritten, for instance arrays going out of bounds.
> > > > > 
> > > > > You can also recompile with -g (don't change other options)
> > > > > added, and/or -C. Sometimes this is better. Or use other
> > > > > things
> > > > > like debuggers or valgrind.
> > > > > 

Re: [Wien] segmentation fault in lapwso

2021-08-18 Thread Pavel Ondračka
Right, I think it is quite clear that deallocate fails because the memory
has been corrupted at some earlier point; the only other reason for it
to crash would be that the array was not allocated at all, which seems
not to be the case here... The question is what corrupted the memory
and, even more strange, why does it work if we disable MKL
multithreading?

It could indeed be that we are doing something wrong. I can imagine the
memory being corrupted in some BLAS call: if the number of columns/rows
passed to the call is larger than the actual size of the matrix, this
could easily happen (with the multithreading somehow influencing the
final value of the corrupted memory, and depending on that value the
deallocate could fail or pass). This should be possible to diagnose with
valgrind as suggested.

Luis, can you upload the testcase somewhere, or recompile with
debuginfo as suggested by Laurence earlier, run
"valgrind --track-origins=yes lapwso lapwso.def" and send the output?
Just be warned, there is a massive slowdown with valgrind (up to 100x)
and the logfile can get very large.
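
A minimal sketch of how such a run could be wrapped so that the valgrind report ends up in a separate file. The helper name run_valgrind, the DRYRUN switch and the log file name are assumptions made for this example only; --track-origins and --log-file are standard valgrind options.

```shell
#!/bin/bash
# Hypothetical helper (not part of WIEN2k): run a program under valgrind
# and write the report to a log file instead of the terminal.
run_valgrind() {
    local log="$1"   # file that receives the valgrind report
    shift            # the remaining arguments are the program and its arguments
    # If DRYRUN is set and non-empty, the command is only echoed, not executed.
    ${DRYRUN:+echo} valgrind --track-origins=yes --log-file="$log" "$@"
}

# Example (expect a large slowdown and a large log file):
#   run_valgrind valgrind.log lapwso lapwso.def
```

With DRYRUN=1 the command line is only printed, which is a cheap way to check it before starting the slow real run.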

Best regards
Pavel


On Wed, 2021-08-18 at 12:10 -0500, Laurence Marks wrote:
> Correction, I was looking at an older modules.F. It looks like it
> should be
> 
> DEALLOCATE(vect,stat=IV) ; if(IV .ne. 0)write(*,*)IV
> 
> 
> On Wed, Aug 18, 2021 at 11:23 AM Laurence Marks
>  wrote:
> > I do wonder about this. I suggest editing modules.F and changing
> > lines 118 and 119 to
> >      DEALLOCATE(en,stat=Ien) ; if(Ien .ne. 0)write(*,*)'Err en ',Ien
> >      DEALLOCATE(vnorm,stat=Ivn) ; if(Ivn .ne. 0)write(*,*)'Err vnorm ',Ivn
> > 
> > There is every chance that the bug is not in those lines, but
> > somewhere completely different. SIGSEGV often means that the code
> > has been overwritten, for instance arrays going out of bounds.
> > 
> > You can also recompile with -g (don't change other options)
> > added, and/or -C. Sometimes this is better. Or use other things
> > like debuggers or valgrind.
> > 
> > On Wed, Aug 18, 2021 at 10:47 AM Pavel Ondračka
> >  wrote:
> > > I'm CCing the list back as the crash was now diagnosed to a likely
> > > MKL problem, see below for more details.
> > > > 
> > > > 
> > > > > So just to be clear, explicitly setting OMP_STACKSIZE=1g does
> > > > > not help to solve the issue?
> > > > > 
> > > > 
> > > > 
> > > > Right! OMP_STACKSIZE=1g with OMP_NUM_THREADS=4 does not solve
> > > > the problem!
> > > >  
> > > > > 
> > > > > The problem is that the OpenMP code in lapwso is very simple,
> > > > > so I'm having problems seeing how it could be causing the
> > > > > problems.
> > > > > 
> > > > > Could you also try to see what happens if run with:
> > > > > OMP_NUM_THREADS=1
> > > > > MKL_NUM_THREADS=4
> > > > > 
> > > > 
> > > > 
> > > > It does not work with these values, but I checked and it works
> > > > reverting them:
> > > > OMP_NUM_THREADS=4
> > > > MKL_NUM_THREADS=1
> > > 
> > > This was very helpful and IMO points to a problem with MKL instead
> > > of Wien2k.
> > > 
> > > Unfortunately, setting MKL_NUM_THREADS=1 globally will reduce the
> > > OpenMP performance, mostly in lapw1 but also in other places. So if
> > > you want to keep the OpenMP BLAS/LAPACK level parallelism you have
> > > to either find some MKL version that works (if you do, please
> > > report it here), link with OpenBLAS (using it for lapwso is enough)
> > > or create a simple wrapper that sets MKL_NUM_THREADS=1 just for
> > > lapwso, i.e., rename the lapwso binary in WIENROOT to lapwso_bin
> > > and create a new lapwso file there with:
> > > 
> > > #!/bin/bash
> > > MKL_NUM_THREADS=1 lapwso_bin "$@"
> > > 
> > > and set it to executable with chmod +x lapwso.
> > > 
> > > Or maybe MKL has a non-OpenMP version which you could link with
> > > just lapwso and use the standard one in other parts, but I don't
> > > know, I mostly use OpenBLAS. If you need some further help, let me
> > > know.
> > > 
> > > Reporting the issue to Intel could be also nice, however

Re: [Wien] segmentation fault in lapwso

2021-08-18 Thread Pavel Ondračka
I'm CCing the list back as the crash was now diagnosed to a likely MKL
problem, see below for more details.
> 
> 
> > So just to be clear, explicitly setting OMP_STACKSIZE=1g does not
> > help
> > to solve the issue?
> > 
> 
> 
> Right! OMP_STACKSIZE=1g with OMP_NUM_THREADS=4 does not solve the
> problem!
>  
> > 
> > The problem is that the OpenMP code in lapwso is very simple, so I'm
> > having problems seeing how it could be causing the problems.
> > 
> > Could you also try to see what happens if run with:
> > OMP_NUM_THREADS=1
> > MKL_NUM_THREADS=4
> > 
> 
> 
> It does not work with these values, but I checked and it works
> reverting them:
> OMP_NUM_THREADS=4
> MKL_NUM_THREADS=1

This was very helpful and IMO points to a problem with MKL instead of
Wien2k.
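
The two settings compared above can be reproduced with a small helper that sets both thread counts for one command only, without touching the rest of the session. This is just a sketch: run_with_threads is a made-up name, and the commands in the comments stand for whatever step is being tested.

```shell
#!/bin/bash
# Hypothetical helper: run one command with explicit OpenMP and MKL thread
# counts. The variables are set only for that command.
run_with_threads() {
    local omp="$1" mkl="$2"
    shift 2
    OMP_NUM_THREADS="$omp" MKL_NUM_THREADS="$mkl" "$@"
}

# The combinations reported in this thread:
#   run_with_threads 1 4 x lapwso    # crashed
#   run_with_threads 4 1 x lapwso    # worked
```

Running each combination separately like this keeps the comparison clean, since no exported variable from an earlier test can leak into the next one.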

Unfortunately, setting MKL_NUM_THREADS=1 globally will reduce the
OpenMP performance, mostly in lapw1 but also in other places. So if you
want to keep the OpenMP BLAS/LAPACK level parallelism you have to
either find some MKL version that works (if you do, please report it
here), link with OpenBLAS (using it for lapwso is enough) or create a
simple wrapper that sets MKL_NUM_THREADS=1 just for lapwso, i.e.,
rename the lapwso binary in WIENROOT to lapwso_bin and create a new
lapwso file there with:

#!/bin/bash
MKL_NUM_THREADS=1 lapwso_bin "$@"

and set it to executable with chmod +x lapwso.
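
The steps above could be scripted roughly like this. It is only a sketch: install_lapwso_wrapper is a made-up helper, and, unlike the two-line wrapper, the generated script looks for lapwso_bin next to itself so that it does not depend on PATH.

```shell
#!/bin/bash
# Hypothetical helper (not part of WIEN2k): replace lapwso in the given
# directory by a wrapper that forces MKL_NUM_THREADS=1 for lapwso only.
install_lapwso_wrapper() {
    local dir="$1"   # directory holding the lapwso binary, e.g. "$WIENROOT"
    [ -x "$dir/lapwso" ] || { echo "no executable lapwso in $dir" >&2; return 1; }
    [ -e "$dir/lapwso_bin" ] && { echo "lapwso_bin already exists" >&2; return 1; }
    # Keep the real binary under a new name.
    mv "$dir/lapwso" "$dir/lapwso_bin" || return 1
    # Write the wrapper; it passes all arguments on unchanged.
    cat > "$dir/lapwso" <<'EOF'
#!/bin/bash
MKL_NUM_THREADS=1 "$(dirname "$0")/lapwso_bin" "$@"
EOF
    chmod +x "$dir/lapwso"
}

# Example:
#   install_lapwso_wrapper "$WIENROOT"
```

The check for an existing lapwso_bin is there so that running the helper twice does not overwrite the real binary with the wrapper.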

Or maybe MKL has a non-OpenMP version which you could link with just
lapwso and use the standard one in other parts, but I don't know, I
mostly use OpenBLAS. If you need some further help, let me know.

Reporting the issue to Intel would also be nice, however I never had
any real luck there and it is also a bit problematic as you can't
provide a testcase due to Wien2k being proprietary code...

Best regards
Pavel

>  
> > 
> > This should disable the Wien2k-specific OpenMP parallelism but still
> > keep the rest of the parallelism at the BLAS/LAPACK level.
> > 
> 
> 
> So, perhaps, the problem is related to MKL!
>  
> > 
> > Another option is that something is going wrong before lapwso and the
> > lapwso crash is just the symptom. What happens if you run everything
> > up to lapwso without OpenMP (OMP_NUM_THREADS=1) and then enable it
> > just for lapwso?
> > 
> 
> 
> If I run lapw0 and lapw1 with OMP_NUM_THREADS=4 and then change it to 1
> just before lapwso, it works. 
> If I do the opposite, starting with OMP_NUM_THREADS=1 and then change
> it to 4 just before lapwso, it does not work.
> So I believe that the problem is really at lapwso.
>  
>    If you need more information, please, let me know!
>    All the best,
>     Luis




Re: [Wien] segmentation fault in lapwso

2021-08-18 Thread Pavel Ondračka
Dear Luis,

one very easy thing to try would be to set the environment variable
OMP_STACKSIZE to something large like "1g", i.e., "export
OMP_STACKSIZE=1g" before run_lapw. A small OpenMP stack size caused
issues for us previously, so that could be the case here as well. The
only explicit OMP loop in hsocalc.F allocates all private variables on
the stack and a few of them are arrays, so it is feasible that this is
the cause.

To prof. Blaha:
from a very brief visual inspection of the OpenMP code in lapwso, I
believe there could be another small issue with combined MPI and
OpenMP. At lines hsocalc.F:159 and hsocalc.F:160 the variables
ibf_local and ibi_local should probably be private. This should not be
the cause of the problems reported here though, as that would only
influence lapwso_mpi. The rest seems OK (at first glance).

Best regards
Pavel

On Tue, 2021-08-17 at 18:18 -0300, Luis Ogando wrote:
> Dear Wien2k Community,
>    Greetings!
>    This message is only to inform that I also had a segmentation
> fault problem with lapwso and Wien2k-21.
>    It was a very strange case. After a converged SCF cycle with mBJ
> and SO, I could not run "run_lapw -NI -so ...". In this case, I
> always got the following error after lapwso:
> 
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image              PC                Routine            Line      
>  Source            
> lapwso             0046A0EA  Unknown               Unknown
>  Unknown
> libpthread-2.28.s  1530B217B730  Unknown               Unknown
>  Unknown
> libiomp5.so        1530B1D132FB  Unknown               Unknown
>  Unknown
> libiomp5.so        1530B1D13049  Unknown               Unknown
>  Unknown
> libiomp5.so        1530B1D14B59  Unknown               Unknown
>  Unknown
> libiomp5.so        1530B1D161E8  Unknown               Unknown
>  Unknown
> libiomp5.so        1530B1D0C926  Unknown               Unknown
>  Unknown
> lapwso             0049CA86  Unknown               Unknown
>  Unknown
> lapwso             0040D77F  hmsout_mp_finit_h         119
>  modules.F
> lapwso             0042B94E  MAIN__                    622
>  lapwso.F
> lapwso             00404D22  Unknown               Unknown
>  Unknown
> libc-2.28.so       1530A3E3609B  __libc_start_main     Unknown
>  Unknown
> lapwso             00404C2A  Unknown               Unknown
>  Unknown
> 0.167u 0.051s 0:00.10 210.0% 0+0k 0+1976io 0pf+0w
> error: command   /home/ogando/Wien/Wien21/lapwso lapwso.def   failed
> 
>    The solution was to change OMP_NUM_THREADS from 4 to 1.
>    I checked and it also worked with OMP_NUM_THREADS equal to 2 but
> not 3.
>    If someone is interested in the compilation options or any other
> information, please ask.
>    All the best,
>                   Luis
> 
>    
> 
> Em qui., 10 de jun. de 2021 às 08:17, Fecher, Gerhard
>  escreveu:
> > Dear all,
> > while running a -so calculation I hit a segmentation fault in
> > lapwso
> > (see below) with the latest version Wien2k21.1 that does NOT appear
> > in 19.2.
> > (appeared for two different systems in fresh directories)
> > 
> > Did someone experience the same, or did I miss a report and may be
> > not up to date?
> > 
> > I used all settings the same (mostly default values), and the same
> > compilers and options (Intel OneAPI 2021 2.0 and Parallel Studio XE
> > 2017.4.056) for both versions, 21.1 and 19.2
> > 
> > forrtl: severe (174): SIGSEGV, segmentation fault occurred
> > Image              PC                Routine            Line       
> > Source             
> > lapwso             0046CE0A  Unknown               Unknown 
> > Unknown
> > libpthread-2.22.s  2AFBCC6DAB10  Unknown               Unknown 
> > Unknown
> > libiomp5.so        2AFBCCF2C8E8  Unknown               Unknown 
> > Unknown
> > lapwso             0049F7A6  Unknown               Unknown 
> > Unknown
> > lapwso             00421E9E  hmsec_                    926 
> > hmsec.F
> > 
> > line 926 is;       deallocate(meigve) 
> > indeed, if  this is the correct line at all.
> > 
> > indeed in 21.2 (I have seen that hmsec.F is different in 19.2)
> > 
> > Thanks for any suggestions that help
> > 
> > Gerhard
> > 
> > DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
> > "I think the problem, to be quite honest with you,
> > is that you have never actually known what the question is."
> > 
> > 
> > Dr. Gerhard H. Fecher
> > Institut of Physics
> > Johannes Gutenberg - University
> > 55099 Mainz

Re: [Wien] Installing WIEN2k, w2web not running

2021-04-14 Thread Pavel Ondračka
On Wed, 2021-04-14 at 06:12 +, delamora wrote:
> Thank you Pavel
> 
> I do a search
> dnf search perl-Sys-Hostname
> and I get;
> perl-Sys-Hostname-Long.noarch : Try every conceivable way to get full
> hostname
> I will try it

I'm not sure about the perl-Sys-Hostname-Long.noarch package. On my
Fedora 33 I have perl-Sys-Hostname.x86_64 package.

I have no idea why you can't find it. BTW, another way to determine
the correct package could be to check which package provides the
missing file (from your log this is: */Sys/Hostname.pm) with:
"dnf provides */Sys/Hostname.pm"

> By the way, is Pavel the same as Pablo but in your language? It seems
> that no.

I believe it is :-)

Best regards
Pavel



Re: [Wien] Installing WIEN2k, w2web not running

2021-04-13 Thread Pavel Ondračka
Hi Pablo,

there is a difference between the package with hostname utility and the
missing perl package with Sys::Hostname module. The proper package
should be perl-Sys-Hostname.
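
A small sketch of how the package behind a missing Perl module could be looked up on a dnf-based system such as the Fedora used in this thread. The helper name perl_module_glob is made up for the example; it only turns the module name from the error message into the file pattern that can be passed to "dnf provides".

```shell
#!/bin/bash
# Hypothetical helper: convert a Perl module name into a file glob,
# e.g. Sys::Hostname -> */Sys/Hostname.pm
perl_module_glob() {
    printf '*/%s.pm\n' "$(printf '%s' "$1" | sed 's|::|/|g')"
}

# Ask dnf which package ships the file (quotes keep the shell from
# expanding the glob itself):
#   dnf provides "$(perl_module_glob Sys::Hostname)"
# Check whether perl can load the module after installing the package:
#   perl -MSys::Hostname -e 1 && echo "Sys::Hostname found"
```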

Best regards
Pavel

On Tue, 2021-04-13 at 23:21 +, delamora wrote:
> I have a comment at the end
> Dear WIEN2k community,
> I am installing the WIEN2k package in a new computer, all seems to
> run well, except for
> w2web
> 
> I run w2web
> it says;
> Can't locate Sys/Hostname.pm in @INC (you may need to install the
> Sys::Hostname module) (@INC contains: /usr/local/lib64/perl5/5.32
> /usr/local/share/perl5/5.32 /usr/lib64/perl5/vendor_perl
> /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at
> /home/Programas/WIEN2k-19.1/w2web line 3.BEGIN failed--compilation
> aborted at /home/Programas/WIEN2k-19.1/w2web line 3.
> 
> I searched for "Sys::Hostname";
> dnf search Sys::Hostname
> => does not exist
> 
> I searched for "Hostname"; dnf search Hostname
> and I found it, so I try to install it
> => dnf install hostname.x86_64
> and I get as an answer
> => "package already installed"
> 
> ==
> I just want to comment that
> /home/Programas/WIEN2k-19.1/SRC_w2web/w2web
> gives;
> -
> #!/usr/bin/perl
> 
> use Sys::Hostname;
> --
> So I do not know why w2web is not running with Fedora 33
> 
> 
> Pablo
> 




Re: [Wien] consistent RKmax and sphere size settings

2021-04-07 Thread Pavel Ondračka
Thank you Laurence,

I was a bit worried because the FAQ you linked also says: "Of course
you should use identical Mg+O spheres for MgO and Mg(OH)2 for
consistency", so I was not 100% sure if keeping the same maximum
K-vector Kmax is enough.
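
For reference, the arithmetic from the reply quoted below can be written down as a short sketch. Since RKmax = RMT(min) * Kmax, keeping Kmax fixed means scaling RKmax with the smallest sphere radius; the second function is the straight-line rule through the two "OK" points mentioned there. The function names are made up for this example and the numbers are the ones from the quoted message, not general recommendations.

```shell
#!/bin/bash
# Keep Kmax = RKmax / RMT fixed: RKmax_new = RKmax_ref * RMT_new / RMT_ref
scale_rkmax() {   # args: reference RKmax, reference RMT, new RMT
    LC_ALL=C awk -v rk="$1" -v r0="$2" -v r1="$3" \
        'BEGIN { printf "%.2f\n", rk * r1 / r0 }'
}

# Straight line through (RMT=0.5, RKmax=3) and (RMT=2.0, RKmax=7)
interpolate_rkmax() {   # arg: RMT
    LC_ALL=C awk -v r="$1" \
        'BEGIN { printf "%.2f\n", 3 + (r - 0.5) * (7 - 3) / (2.0 - 0.5) }'
}

scale_rkmax 6.5 1.6 1.3    # prints 5.28
interpolate_rkmax 1.3      # prints 5.13
```

The two rules give similar but not identical values for the 1.3 sphere, which matches the remark that the relative convergence is close but not precisely the same.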

Should I also increase lmax and lvns for the larger spheres somehow? Or
would you keep it the same for small and large N spheres?

Best regards
Pavel

On Wed, 2021-04-07 at 15:33 -0500, Laurence Marks wrote:
> Have a look at http://www.wien2k.at/reg_user/faq/rkmax.html. If (say)
> with an RMT for the N of 1.6 a RKMAX of 6.5 is good enough, then when
> you reduce the RMT to 1.3 you can reduce the RKMAX to 6.5*1.3/1.6 =
> 5.28. This will not give you precisely the same relative convergence,
> but is close.
> 
> Another way is to say that an RKMAX of 7 is "OK" for RMTs of 2.0, an
> RKMAX of 3 for RMTs of 0.5, then interpolate using a straight line.
> This is similar.
> 
> On Wed, Apr 7, 2021 at 3:24 PM Pavel Ondračka
>  wrote:
> > Dear Wien2k mailing list,
> > 
> > I have a series of TiN and TiON amorphous-like structures where I
> > have some large differences in sphere sizes for N atoms. In most of
> > the structures the smallest N sphere is around 1.6-1.7, however in
> > some I have a few N atoms with 1.3 (the structures should be OK, this
> > much smaller size is due to some rare local configuration which
> > would correspond to something like an N split interstitial in a
> > crystalline structure).
> > 
> > My goal is to calculate core electron binding energies of N1s levels
> > of many atoms in the structures (at least 200 core-hole calculations)
> > and I need to be consistent over different structures in the series.
> > 
> > So usually I would just check what the smallest N sphere size in the
> > whole set is, force it for all N atoms in all structures and then use
> > the identical RKmax for all structures, just to be sure I'm
> > consistent. This is unfortunately not very efficient with respect to
> > the calculation speed as I have quite large cells (around 150 atoms).
> > Is there another way to save some CPU time and keep the consistency?
> > 
> > I was for example thinking about somehow forcing two different N
> > sphere sizes (one for the N split interstitial, of which I usually
> > have just one in the whole cell, and a larger one for the rest of the
> > N atoms). Then I would have a consistent sphere size for the rest of
> > the N atoms in the series and I could change the RKmax to keep the
> > same largest K-vector, which should be enough to guarantee
> > consistency for all N atoms except the split interstitials (but I
> > don't care that much about them). However, as far as I understand,
> > this is not possible?
> > 
> > Any ideas would be appreciated.
> > 
> > Best regards
> > Pavel Ondracka
> > 




[Wien] consistent RKmax and sphere size settings

2021-04-07 Thread Pavel Ondračka
Dear Wien2k mailing list,

I have a series of TiN and TiON amorphous-like structures where I have
some large differences in sphere sizes for N atoms. In most of the
structures the smallest N sphere is around 1.6-1.7, however in some I
have a few N atoms with 1.3 (the structures should be OK, this much
smaller size is due to some rare local configuration which would
correspond to something like an N split interstitial in a crystalline
structure).

My goal is to calculate core electron binding energies of N1s levels of
many atoms in the structures (at least 200 core-hole calculations) and
I need to be consistent over different structures in the series.

So usually I would just check what the smallest N sphere size in the
whole set is, force it for all N atoms in all structures and then use
the identical RKmax for all structures, just to be sure I'm consistent.
This is unfortunately not very efficient with respect to the
calculation speed as I have quite large cells (around 150 atoms). Is
there another way to save some CPU time and keep the consistency?

I was for example thinking about somehow forcing two different N
sphere sizes (one for the N split interstitial, of which I usually have
just one in the whole cell, and a larger one for the rest of the N
atoms). Then I would have a consistent sphere size for the rest of the
N atoms in the series and I could change the RKmax to keep the same
largest K-vector, which should be enough to guarantee consistency for
all N atoms except the split interstitials (but I don't care that much
about them). However, as far as I understand, this is not possible?

Any ideas would be appreciated.

Best regards
Pavel Ondracka



Re: [Wien] convergence criteria in scf file?

2020-11-04 Thread Pavel Ondračka
On Wed, 2020-11-04 at 07:30 +, Tran, Fabien wrote:
> Is the grep of :ENE, :DIS and :FOR not useful enough?

Right, this is what one would expect, but just to be 100% certain: the
scf is stopped when the change in :ENE, :DIS and (the maximum change?)
in :FOR over the last (two?) iterations is below the value specified in
the -ec, -cc or -fc argument?
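
To look at the last changes directly, something like the following sketch could be used on the scf file. It assumes that the numerical value is the last field on each :ENE and :DIS line, which should be checked against the actual file; the function names are made up for this example and it says nothing about how run_lapw itself evaluates the criteria.

```shell
#!/bin/bash
# Absolute difference between the last two :ENE values in an scf file.
# Assumption: the energy is the last whitespace-separated field of the line.
last_energy_change() {   # arg: scf file
    grep '^:ENE' "$1" | tail -n 2 |
        LC_ALL=C awk '{ v[NR] = $NF }
            END { if (NR < 2) exit 1
                  d = v[2] - v[1]; if (d < 0) d = -d
                  printf "%.8f\n", d }'
}

# Last value printed for a given label, e.g. :DIS (same assumption as above).
last_value() {   # args: label, scf file
    grep "^$1" "$2" | tail -n 1 | awk '{ print $NF }'
}

# Example (case.scf is a placeholder name):
#   last_energy_change case.scf
#   last_value :DIS case.scf
```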

BTW, what about the criteria themselves? New Wien2k versions print the
run_lapw command and arguments in :LABEL4 (so I can see what specific
values were passed to -ec, -cc and -fc), but the stuff I'm looking at
comes from some older version where this was not printed...

Best regards
Pavel



[Wien] convergence criteria in scf file?

2020-11-03 Thread Pavel Ondračka
Dear Wien2k mailing list,

I'm looking at some old saved calculations, where I don't have the
dayfile (but I vaguely remember some convergence troubles). So I'm now
looking at the scf file and wondering if it is possible to tell from
the scf file, what was the final convergence of energy, charge and
forces (and also what the convergence criteria passed to run_lapw
were).

Best regards
Pavel



Re: [Wien] finding density of states for individual bands

2020-10-12 Thread Pavel Ondračka
The problematic part is that while joint claims it can give you a DOS
for just one band (with the switch 2), this is actually not a DOS of a
single band but of a single band index. This will be the same thing
only if there is no band crossing (the difference will be obvious if
you do energy band structure plot with x spaghetti with and without
running x irrep before).
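
A toy illustration of the difference, not related to any real WIEN2k output: two model bands that cross at k = 0.5, E_a(k) = k and E_b(k) = 1 - k. Sorting the eigenvalues at each k point gives the "band index" picture, where index 1 is always the lower and index 2 always the higher energy. The function name and the model are made up for this example.

```shell
#!/bin/bash
# Print k, E_a, E_b, then the energies ordered by band index (lowest first).
band_index_demo() {
    LC_ALL=C awk 'BEGIN {
        for (i = 0; i <= 4; i++) {
            k  = i / 4
            ea = k                        # band a rises with k
            eb = 1 - k                    # band b falls with k, crossing a at k = 0.5
            lo = (ea < eb) ? ea : eb      # band index 1: lowest eigenvalue at this k
            hi = (ea < eb) ? eb : ea      # band index 2: highest eigenvalue at this k
            printf "%.2f %.2f %.2f %.2f %.2f\n", k, ea, eb, lo, hi
        }
    }'
}

band_index_demo
```

In this model each real band spans energies from 0 to 1, while band index 1 only covers 0 to 0.5 and band index 2 only 0.5 to 1, so a DOS per band index is not the DOS of either band.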

Best regards
Pavel

On Sun, 2020-10-11 at 05:58 +, Lee, Yongbin [A LAB] wrote:
> I guess you can do it with "joint".
> Check *.injoint which is at page 170 in UG.
> 
> Yongbin
> 
> From: Wien  on behalf of
> Joseph Ross 
> Sent: Saturday, October 10, 2020 4:40 PM
> To: wien@zeus.theochem.tuwien.ac.at 
> Subject: [Wien] finding density of states for individual bands
>  
> We have a semimetallic system which has an indirect overlap of some
> rather convoluted bands at Ef. In order to better understand the
> holes vs. electrons in this system we would like to find the density
> of states (and partial densities if possible) associated with
> individual bands, rather than the total. From my understanding &
> reading through the users guide, I think this is not a feature
> included in wien2k. However if we are overlooking something, or if
> there is a separate package that we could use to extract this type of
> information, we would be interested to know. Any suggestions on this
> are welcome.
> -Joe Ross
> -
> Joseph H. Ross Jr.
> Professor
> Department of Physics and Astronomy
> Texas A&M University
> 4242 TAMU
> College Station TX  77843-4242
> 979 845 3842 / 448 MPHY
> jhr...@tamu.edu / http://faculty.physics.tamu.edu/ross
> -
> 



[Wien] NOMAD and Wien2k

2020-10-09 Thread Pavel Ondračka
Dear Wien2k mailing list,

I'm experimenting with the NOMAD database (nomad-lab.eu) and since I
remembered some old post from prof. Blaha on this topic, I just thought
I would ask here for user experience, because so far it's not really
working that well for me.

So first of all, the most annoying thing is that NOMAD detects two
"mainfiles" per directory, specifically the scf and scf0 files, but as
far as I can see it can't parse anything useful from the scf0 file (not
even the potential). The main downside is that it thinks there are two
calculations while in fact there is just one, and therefore it creates
a lot of useless entries.

Another thing which I'm not sure how to approach is what to do when the
scf file is missing. I often do the main scf loop, then save_lapw to
another directory and after that I generate a new denser k-grid (for
DOS or optics) and just run lapw1, lapw2 and then tetra (or optic) and
so on. In this case I don't have an scf file in the directory with the
DOS or optic calculations, but I would still like to upload this. The
upload somehow works, because the scf0 file from the old run stays
there, so at least NOMAD recognizes there are some Wien2k data, but it
really can't parse anything from the scf0 file (and in general I think
that using the scf0 file is a bug). I can make it detect some metadata
by artificially creating a new fake scf file by combining the old and
new scfxxx files with cat, so that NOMAD can at least detect the
composition, potential and some other basic things, but this is clearly
not optimal.
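
For the record, the workaround with cat can be wrapped like this. It is only a sketch: combine_scf is a made-up helper and the file names in the example comment are placeholders, since which scfxxx files exist depends on what was run.

```shell
#!/bin/bash
# Hypothetical helper: concatenate the given files, in order, into one
# output file. Missing inputs are skipped silently.
combine_scf() {   # args: output file, then the input files
    local out="$1" f
    shift
    : > "$out"                      # start from an empty output file
    for f in "$@"; do
        [ -f "$f" ] && cat "$f" >> "$out"
    done
    return 0
}

# Example with placeholder names (old scf from the saved run first,
# then the files written by the new lapw1/lapw2 run):
#   combine_scf case.scf ../saved/case.scf case.scf1 case.scf2
```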

In general the NOMAD Wien2k parser cannot detect pretty much anything
beyond maybe structure information, and I will try to report all this
to the NOMAD developers to improve the parser, but I would be curious
if someone here had some better experience or can share some tricks,
how to make it work better for Wien2k with the current NOMAD state.

Best regards
Pavel



Re: [Wien] parallel instalation of Wien2k: elpa, fftw

2020-09-14 Thread Pavel Ondračka
OK, thanks for the explanation, I was not aware of this, therefore
please ignore my previous emails, as they are completely wrong. Sorry
for misleading the original poster.
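
The lookup described in the reply quoted below (shared library first, static archive as fallback) can be mimicked with a few lines of shell. This is a simplified sketch of the default behaviour only; resolve_lib is a made-up name, and a real linker has more rules (for example static-only linking options).

```shell
#!/bin/bash
# Hypothetical helper: mimic how "-lNAME" is resolved in a list of
# directories. In each directory libNAME.so is tried before libNAME.a.
resolve_lib() {   # args: library name without "lib" prefix, then directories
    local name="$1" dir ext
    shift
    for dir in "$@"; do
        for ext in so a; do
            if [ -f "$dir/lib$name.$ext" ]; then
                echo "$dir/lib$name.$ext"
                return 0
            fi
        done
    done
    return 1   # nothing found in any directory
}

# Example with the directory from this thread:
#   resolve_lib fftw3 /home/username/fftw3/lib
```

With only libfftw3.a present in the directory, as in the listing discussed here, the sketch returns the static archive, which is why -lfftw3 can still link.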

Best regards
Pavel

On Mon, 2020-09-14 at 07:30 -0500, Laurence Marks wrote:
> Linkers will by default (99.99% confidence) add ".so" to a name for
> dynamic; if that is not present they will add ".a". Hence use of
> -lfftw3 will pickup libfftw3.so or libfftw3.a.
> 
> _
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu
> 
> On Mon, Sep 14, 2020, 07:26 Pavel Ondračka 
> wrote:
> > On Mon, 2020-09-14 at 06:08 -0600, Gavin Abo wrote:
> > > See that "./configure --enable-mpi" was used.
> > > 
> > > Of note, sometimes -gcc-sys is needed:
> > > 
> > > 
> > https://urldefense.com/v3/__https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18664.html__;!!Dq0X2DkFhyF93HkjWTBQKhk!E4g1pwAyAnEalyTRgC-3W-G9PjrnjQyeQ2rogzXFW466AhffGupbkMeeLTiuuS6IFBptXQ$
> >  
> > 
> > Out of interest I went through the link and I don't see how the
> > linker
> > can find/use the FFTW libraries. The FFTW_LIBS and FFTW_PLIBS
> > clearly
> > specify to link with the dynamic libraries:
> >   FFTW_OPT   : -DFFTW3 -I/home/username/fftw3/include
> >   FFTW_LIBS  : -L/home/username/fftw3/lib -lfftw3
> >   FFTW_PLIBS : -lfftw3_mpi
> > 
> > however the directory with FFTW contains only the static libraries:
> > 
> > /home/username/fftw3/lib:
> > total 2108
> > drwxr-xr-x 3 username username    4096 May 27 22:57 cmake
> > -rw-r--r-- 1 username username 1933432 May 27 22:57 libfftw3.a
> > -rwxr-xr-x 1 username username     893 May 27 22:57 libfftw3.la
> > -rw-r--r-- 1 username username  201232 May 27 22:57 libfftw3_mpi.a
> > -rwxr-xr-x 1 username username     939 May 27 22:57 libfftw3_mpi.la
> > drwxr-xr-x 2 username username    4096 May 27 22:57 pkgconfig
> > 
> > IMO this could work only under two lucky circumstances:
> > either one has another libfftw3.so and libfftw3_mpi.so somewhere in
> > the
> > system path
> > or it in fact doesn't link with the FFTW libs
> > in /home/username/fftw3/lib but with the FFTW-compatible interface
> > inside the MKL
> > 
> > > On 9/13/2020 2:53 AM, Ilias Miroslav, doc. RNDr., PhD. wrote:
> > > > Hello,
> > > > 
> > > > to profit from parallelization, one has to install ELPA and the
> > > > parallel version fftw library.
> > > > 
> > > > For fftw library I used ./configure --enable-mpi, but Wien2k
> > > > installator says "!!!  WARNING:  No MPI version of the FFTW
> > library
> > > > found!" But my installed   fftw-3.3.8/lib
> > > > contains  also libfftw3_mpi.a .
> > > > 
> > > > Any clues what is wrong ? Maybe it would be better to have
> > defined
> > > > environmental variables from ELPA, fftw ?
> > > > 
> > > > Miro
> > >  
> > > 

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] parallel instalation of Wien2k: elpa, fftw

2020-09-14 Thread Pavel Ondračka
On Mon, 2020-09-14 at 06:08 -0600, Gavin Abo wrote:
> See that "./configure --enable-mpi" was used.
> 
> Of note, sometimes -gcc-sys is needed:
> 
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18664.html

Out of interest I went through the link and I don't see how the linker
can find/use the FFTW libraries. The FFTW_LIBS and FFTW_PLIBS clearly
specify to link with the dynamic libraries:
  FFTW_OPT   : -DFFTW3 -I/home/username/fftw3/include
  FFTW_LIBS  : -L/home/username/fftw3/lib -lfftw3
  FFTW_PLIBS : -lfftw3_mpi

however the directory with FFTW contains only the static libraries:

/home/username/fftw3/lib:
total 2108
drwxr-xr-x 3 username username    4096 May 27 22:57 cmake
-rw-r--r-- 1 username username 1933432 May 27 22:57 libfftw3.a
-rwxr-xr-x 1 username username     893 May 27 22:57 libfftw3.la
-rw-r--r-- 1 username username  201232 May 27 22:57 libfftw3_mpi.a
-rwxr-xr-x 1 username username     939 May 27 22:57 libfftw3_mpi.la
drwxr-xr-x 2 username username    4096 May 27 22:57 pkgconfig

IMO this could work only under two lucky circumstances:
either one has another libfftw3.so and libfftw3_mpi.so somewhere in the
system path
or it in fact doesn't link with the FFTW libs
in /home/username/fftw3/lib but with the FFTW-compatible interface
inside the MKL
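
To see quickly which kind of FFTW libraries a directory provides, a
minimal Python sketch along these lines may help. The function name
classify_fftw_libs and the throwaway demo directory are illustrative
assumptions, not part of WIEN2k or FFTW; the demo only mimics the file
names from the listing above.

```python
import os
import tempfile

def classify_fftw_libs(libdir):
    """List the static (.a) and shared (.so) FFTW libraries found in libdir."""
    found = {"static": [], "shared": []}
    for name in sorted(os.listdir(libdir)):
        if not name.startswith("libfftw3"):
            continue
        if name.endswith(".a"):
            found["static"].append(name)
        elif name.endswith(".so") or ".so." in name:
            found["shared"].append(name)
    return found

# Demo on a temporary directory that mimics the listing above (hypothetical files).
with tempfile.TemporaryDirectory() as demo_dir:
    for fname in ("libfftw3.a", "libfftw3.la", "libfftw3_mpi.a", "libfftw3_mpi.la"):
        open(os.path.join(demo_dir, fname), "w").close()
    demo_result = classify_fftw_libs(demo_dir)

print(demo_result)
```

For a real installation one would call classify_fftw_libs on the actual
lib directory; an empty "shared" list means only static archives are
present there.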

> On 9/13/2020 2:53 AM, Ilias Miroslav, doc. RNDr., PhD. wrote:
> > Hello,
> > 
> > to profit from parallelization, one has to install ELPA and the
> > parallel version fftw library.
> > 
> > For fftw library I used ./configure --enable-mpi, but Wien2k
> > installator says "!!!  WARNING:  No MPI version of the FFTW library
> > found!" But my installed   fftw-3.3.8/lib
> > contains  also libfftw3_mpi.a .
> > 
> > Any clues what is wrong ? Maybe it would be better to have defined
> > environmental variables from ELPA, fftw ?
> > 
> > Miro
>  
> 


Re: [Wien] parallel instalation of Wien2k: elpa, fftw

2020-09-14 Thread Pavel Ondračka
On Mon, 2020-09-14 at 06:51 -0500, Laurence Marks wrote:
> ?
> 
> I have never used dynamic fftw, and never had a problem so I doubt
> that is the issue.

I also don't have any issues with static linking, but I do fix the
Makefiles manually when siteconfig fails me. If you have a way to link
with the static libraries using siteconfig alone, i.e., by setting just
FFTWROOT, FFTW_VERSION, FFTW_LIB and FFTW_LIBNAME there, I would be
interested in your config...

Best regards
Pavel



Re: [Wien] parallel instalation of Wien2k: elpa, fftw

2020-09-14 Thread Pavel Ondračka
I think I see the issue: libfftw3_mpi.a is a static library. Please
build FFTW with --enable-shared (in addition to --enable-mpi) to get the
dynamic libraries as well; ./configure --help lists all the available
options.

If you have libfftw3_mpi.so and you still have issues, just post all of
your FFTW options and the full path to the directory where you have the
libraries (and the list of its contents), to debug further.

As a further remark, FFTW is such a common library that it should be
available on pretty much every system, so I would suggest saving
yourself some trouble and not compiling it on your own. If this is your
computer, just install the correct packages from the repos, or if this
is on a cluster, just use the FFTW module provided by the admins. This
approach has the advantage that the libraries will either be installed
into your system paths or loading the module will update the default
paths, so even if you mess up your FFTW settings in siteconfig, the
linker should still be able to find the libraries...

Best regards
Pavel

BTW linking with static libs is doable but to do it through the
siteconfig is pretty much impossible so if you really want to do it,
you need to edit Makefiles manually.

On Sun, 2020-09-13 at 08:53 +0000, Ilias Miroslav, doc. RNDr., PhD.
wrote:
> Hello,
> 
> to profit from parallelization, one has to install ELPA and the
> parallel version fftw library.
> 
> For fftw library I used ./configure --enable-mpi, but Wien2k
> installator says "!!!  WARNING:  No MPI version of the FFTW library
> found!" But my installed   fftw-3.3.8/lib
> contains  also libfftw3_mpi.a .
> 
> Any clues what is wrong ? Maybe it would be better to have defined
> environmental variables from ELPA, fftw ?
> 
> Miro
> 
> 


Re: [Wien] error in mixer

2020-04-24 Thread Pavel Ondračka
> forrtl: severe (168): Program Exception - illegal instruction

Did you compile Wien2k on a different machine than the one you are
running it on now? What were your compilation options? This looks like
your lapw1 binary was compiled with some instructions which are not
available on this machine...
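
One way to check this on an x86 Linux machine is to compare the
instruction-set flags the CPU advertises with the ones the binary was
built for. The Python sketch below is only an illustration: the helper
names and the sample text are hypothetical, and it assumes the
"flags" line format of /proc/cpuinfo.

```python
def cpu_flags(cpuinfo_text):
    """Collect the instruction-set flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags

def missing_flags(cpuinfo_text, required):
    """Return the required flags that the CPU does not advertise, sorted."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))

# Hypothetical sample: an older CPU that has AVX but not AVX2.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 sse4_1 sse4_2 avx\n"
print(missing_flags(sample, ["avx", "avx2"]))  # ['avx2']
```

On the actual machine the text would come from
open("/proc/cpuinfo").read(); any flag reported as missing that the
compiler was told to use would explain an illegal instruction error.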

Best regards
Pavel



Re: [Wien] Why the GAP appeared in DOS plot is lower than the GAP obtained during "Analysis"

2020-04-23 Thread Pavel Ondračka
Another option is that you applied some broadening to your DOS; the gap
in the DOS would then look smaller.
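
The effect can be illustrated with a toy model in Python. Everything
in this sketch is an assumption chosen for illustration: the discrete
levels with a true gap of 1 eV, the Gaussian broadening, and the
arbitrary threshold below which the DOS is treated as zero.

```python
import math

def broadened_dos(energy, levels, sigma):
    """Gaussian-broadened density of states of discrete levels at one energy."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((energy - e) / sigma) ** 2) for e in levels)

def apparent_gap(levels, sigma, threshold=0.05, step=0.005):
    """Width of the window around E = 0 where the broadened DOS stays below threshold."""
    def edge(direction):
        energy = 0.0
        while abs(energy) < 5.0 and broadened_dos(energy, levels, sigma) < threshold:
            energy += direction * step
        return energy
    return edge(+1) - edge(-1)

# Toy levels: valence states up to -0.5 eV, conduction states from +0.5 eV.
valence = [-0.5 - 0.05 * i for i in range(31)]
conduction = [0.5 + 0.05 * i for i in range(31)]
levels = valence + conduction

gap_sharp = apparent_gap(levels, sigma=0.02)
gap_smeared = apparent_gap(levels, sigma=0.10)
print(gap_sharp, gap_smeared)  # the more strongly broadened DOS shows the smaller gap
```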

Best regards
Pavel


On Thu, 2020-04-23 at 08:12 +0000, Tran, Fabien wrote:
> The gap shown in Analysis is :GAP in case.scf. If a different k-mesh
> (than the one used during the SCF calculation) is used for
> generating the DOS, then there may be a difference. Typically, one
> should increase the k-mesh for DOS.
> What is the difference between the gaps in Analysis and DOS in your
> case?
> 
> 
> From: Wien  on behalf of
> shamik chakrabarti 
> Sent: Thursday, April 23, 2020 10:01 AM
> To: A Mailing list for WIEN2k users
> Subject: [Wien] Why the GAP appeared in DOS plot is lower than the
> GAP obtained during "Analysis"
>  
> Dear Wien2k users,
>
>  We have seen that the GAP appeared in
> DOS plot is lower than the GAP obtained during "Analysis". Why it is
> so? what is the meaning of GAP appeared during "Analysis"
> 
> Looking forward to your reply in this regard.
> 
> with regards,
> 


Re: [Wien] Lapw.2 error

2020-04-17 Thread Pavel Ondračka
Not directly related to this thread, but since reports of this bug (and
some others with known fixes) keep reappearing, how about making a
19.2 (or 19.1.1) bugfix release (just the 19.1 code + fixes from Gavin's
repo)? Unless the next major version is already around the corner.

Just my two cents
Best regards
Pavel

On Fri, 2020-04-17 at 12:29 -0600, Gavin Abo wrote:
> Also, did compile with gfortran and not apply the bug fix to WIEN2k
> 19.1:
> 
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18771.html
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19741.html
> 
> 
> On 4/17/2020 12:20 PM, Laurence Marks wrote:
> > This may have nothing to do with it, but why is your directory
> > WIEN2k? The normal convention is, for instance, to have a directory
> > TiC in which your structure file is TiC.struct etc. I hope you are
> > not running in the installation directory, I suspect that could
> > lead to chaos!
> > 
> > On Fri, Apr 17, 2020 at 1:02 PM Johnathon Street  > > wrote:
> > > Prof. Blaha,
> > > 
> > > I am running Wien2k version 19.1 on Ubuntu. When running the SFC
> > > cycle I am receiving the following error in Lapw2. error file.
> > > 
> > >  'LAPW2' - can't open unit: 15  
> > >  
> > >  'LAPW2' -filename: WIEN2k.tmp  
> > >  
> > >  'LAPW2' -  status: scratch  form: unformatted 
> > > 
> > > I have searched the mailing list and found a possible solution
> > > would be to delete line 15 in the lapw2.def file which appears as
> > > below:
> > > 
> > >  2,'WIEN2k.nsh','unknown','unformatted',0
> > >  3,'WIEN2k.in1',   'unknown','formatted',0
> > >  4,'WIEN2k.inso',   'unknown','formatted',0
> > >  5,'WIEN2k.in2',   'old','formatted',0
> > >  6,'WIEN2k.output2','unknown','formatted',0
> > >  7,'WIEN2k.vorb','unknown','formatted',0
> > >  8,'WIEN2k.clmval','unknown','formatted',0
> > > 10,'./WIEN2k.vector', 'unknown','unformatted',9000
> > > 13,'WIEN2k.recprlist',  'unknown','unformatted',9000
> > > 14,'WIEN2k.kgen','unknown','formatted',0
> > > 16,'WIEN2k.qtl',   'unknown','formatted',0
> > > 17,'WIEN2k.weightaver','unknown','formatted',0
> > > 18,'WIEN2k.vsp',   'old','formatted',0
> > > 19,'WIEN2k.vns',   'unknown','formatted',0
> > > 20,'WIEN2k.struct', 'old','formatted',0
> > > 21,'WIEN2k.scf2','unknown','formatted',0
> > > 922,'WIEN2k.rotlm',   'unknown','formatted',0
> > > 23,'WIEN2k.radwf',   'unknown','formatted',0
> > > 26,'WIEN2k.weight',   'unknown','formatted',0
> > > 27,'WIEN2k.weightdn',   'unknown','formatted',0
> > > 29,'WIEN2k.energydn','unknown','formatted',0
> > > 30,'WIEN2k.energy', 'unknown','formatted',0
> > > 32,'WIEN2k.qdmft',   'unknown','formatted',0
> > > 34,'WIEN2k.oubwin',   'unknown','formatted',0
> > > 231,'WIEN2k.dmftsym',   'unknown','formatted',0
> > > 
> > > so that the lapw.def file looks like this.
> > > 
> > > line 15 prior to deletion is as follows:
> > > 
> > > 15, 'WIEN2k.tmp', 'scratch', unformatted' ,0
> > > 
> > > I continue to get the same error following deleting line 15. Do
> > > you have any suggestions 
> > > 
> > > Thank you,
> > > Johnathon Street
> > > 
> > > 
> > > 
> > 
> > 
> > -- 
> > Professor Laurence Marks
> > Department of Materials Science and Engineering
> > Northwestern University
> > www.numis.northwestern.edu
> > Corrosion in 4D: www.numis.northwestern.edu/MURI
> > Co-Editor, Acta Cryst A
> > "Research is to see what everybody else has seen, and to think what
> > nobody else has thought"
> > Albert Szent-Gyorgi
> > 
> > 

Re: [Wien] fold2Bolch - gfortran

2020-03-30 Thread Pavel Ondračka
Hi,

gfortran should in theory compile anything that ifort does (assuming the
code is valid standard Fortran, which might not be the case for code
which is only used/tested with ifort). Just set the compiler to
gfortran instead of ifort, use the same flags as for Wien2k
("-ffree-form -ffree-line-length-none -O2" might be a good start) and if
you run into any issues, provide the specific compile commands you use
and also the errors.

Best regards
Pavel 

On Mon, 2020-03-30 at 11:00 +0200, Catalina Coll wrote:
> Dear users and developers, 
> 
> I would like to know if is possible to compile fold2Bloch with
> gfortran (I currently have WIEN2k compiled with gfortran).
> 
> Thanks in advance.
> 
> Catalina Coll 
> PhD Candidate
> LENS - Laboratory of Electron Nanoscopy
> MIND - Micro-Nanotechnology and Nanoscopies for electrophotonic
> Devices
> IN2UB - Institute of Nanoscience and Nanotechnology
> Departament d'Enginyeria Electrònica i Biomèdica - Universitat de
> Barcelona
> 


Re: [Wien] TELNES calculation

2020-03-24 Thread Pavel Ondračka
Dear Ali,

please do core holes only for atoms with multiplicity 1 (otherwise you
add multiple core holes at once and get interaction between them, which
is what you want to avoid with the supercell in the first place)! Just
name (number) one oxygen atom for every non-equivalent oxygen position,
so that the symmetry is reduced as needed. Then of course you do the
core hole and TELNES calculations just for the named atoms and sum the
spectra with weights according to the correct multiplicities.
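
The final summation can be sketched in Python as follows. This is only
an illustration: the function name weighted_sum and the numbers are
hypothetical, the spectra are assumed to be already interpolated onto
one common energy grid, and dividing by the total multiplicity is a
normalization choice made here, not something TELNES prescribes.

```python
def weighted_sum(spectra, multiplicities):
    """Sum per-site spectra on a common energy grid, weighted by site multiplicity."""
    if len(spectra) != len(multiplicities):
        raise ValueError("need one multiplicity per spectrum")
    n_points = len(spectra[0])
    if any(len(s) != n_points for s in spectra):
        raise ValueError("all spectra must share the same energy grid")
    total_weight = float(sum(multiplicities))
    return [
        sum(m * s[i] for s, m in zip(spectra, multiplicities)) / total_weight
        for i in range(n_points)
    ]

# Hypothetical numbers: two inequivalent O sites with multiplicities 1 and 3.
site_a = [0.0, 1.0, 2.0]
site_b = [0.0, 2.0, 0.0]
print(weighted_sum([site_a, site_b], [1, 3]))  # [0.0, 1.75, 0.5]
```

The multiplicities here are those of the inequivalent sites in the
original cell, not of the single named atoms in the supercell.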

Best regards
Pavel

On Tue, 2020-03-24 at 07:37 +, Ali Baghizhadeh wrote:
> Dear Prof. Blaha
> Thank you very much. I did create supercell (2x2x1) and I am using
> LDA+U. Again some oxygen have multiplicity of 3, which may result in
> increase in the intensity of that specific oxygen. Currently I do
> non-spin polarized calculations, but I wish to introduce AFM state in
> the cell, on Fe ions. As oxygen is non-magnetic, I do not know how
> much the spin state of Fe ions will affect the TELNES spectra?
> 
> Best regards
> Ali
> From: Wien  on behalf of
> Peter Blaha 
> Sent: 24 March 2020 07:05:43
> To: wien@zeus.theochem.tuwien.ac.at
> Subject: Re: [Wien] TELNES calculation
>  
> ad 1) no case.inm has no effect on telnes. It is used only during
> run_lapw.
> ad 2) Yes, you should do the calculations for all non-equivalent O
> atoms 
> and sum the results including their multiplicity. (at least when you
> see 
> some differences in their corresponding DOS).
> 
> What you did not mention: You should create a supercell and create
> the 
> core holes in the supercells. Please read the corresponding
> literature 
> (or the XAS/TELNES sections in the UG and in our workshop lectures).
> 
> And: LuFeO3 is certainly a correlated material. Use GGA+U or mBJ for 
> these calculations.
> 
> Am 23.03.2020 um 22:42 schrieb Ali Baghizhadeh:
> > Dear WIEN2k users
> > 
> > I am trying to calculate the K-edge of oxygen in h-LuFeO3 using
> TELNES 
> > program. I have two questions regarding a structure having few
> oxygen 
> > ions of different Wyckoff positions and multiplicity. For K-edge
> oxygen 
> > calculation, I assume we change the occupancy of specific oxygen
> in 
> > case.inc and add an electron to background in case.inm to run SCF.
> > 
> > 1- After SCF convergence and before TELNES, should we modify 
> > again case.inm and remove the additional background electron or
> not?
> > 
> > 2- Should we repeat SCF calculation for all non-equivalent oxygens
> in 
> > the structure and sum spectra of all oxygens, to represent the 
> > experimental spectrum?
> > 
> > Thank you in advance.
> > 
> > 
> > Ali Baghi zadeh
> > 
> > 


Re: [Wien] PES issues

2020-03-16 Thread Pavel Ondračka
Thanks for the comments.

I played with it a bit more today and indeed c) looks the most
probable. In fact the fit is very unstable (as expected): when I change
the DOS a bit (for example with a slightly different broadening), the
fitted values can change significantly. But it is hard to tell for sure
without actually seeing the correlation matrix.
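
Why such a fit cannot be unique when two atoms have the same PDOS shape
can be shown with a small Python sketch. It is a toy least-squares
model with made-up numbers and hypothetical helper names, not the
algorithm used in the pes program.

```python
def residual(total, pdos_a, pdos_b, x1, x2):
    """Sum of squared deviations between the total DOS and x1*A + x2*B."""
    return sum((t - x1 * a - x2 * b) ** 2 for t, a, b in zip(total, pdos_a, pdos_b))

def normal_matrix_det(pdos_a, pdos_b):
    """Determinant of the 2x2 least-squares normal matrix for the two weights."""
    aa = sum(a * a for a in pdos_a)
    bb = sum(b * b for b in pdos_b)
    ab = sum(a * b for a, b in zip(pdos_a, pdos_b))
    return aa * bb - ab * ab

# Hypothetical PDOS shape shared by two equivalent atoms.
shape = [0.0, 1.0, 3.0, 2.0, 0.5]
total = [2.0 * v for v in shape]  # total DOS = 1.0*A + 1.0*A

print(normal_matrix_det(shape, shape))          # 0.0: the normal equations are singular
print(residual(total, shape, shape, 1.0, 1.0))  # 0.0
print(residual(total, shape, shape, 1.2, 0.8))  # zero up to rounding: only x1 + x2 is fixed
```

With nearly identical (instead of exactly identical) PDOS the
determinant is small but nonzero, so tiny changes in the DOS can move
the individual weights a lot.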

Best regards
Pavel

On Fri, 2020-03-13 at 15:41 +0100, Peter Blaha wrote:
> It looks a bit strange if this is a simple supercell without changes.
> 
> a) You should increase the limit for the fit (after the first fit, 
> changes are limited to +-0.20, with "l" you can redo the fit with a 
> larger limit until the "optimized weight" does not improve anymore.
> 
> b) In particular the N-s states look strange. These are well
> separated 
> bands of almost only 2s character. Thus the total DOS in this region 
> should be perfectly well represented by the sum of the renormalized-
> PDOS.
> 
> c) It could be that the fit is not unique, since without changes,
> all 
> PDOS of the different atoms should have the same shape and one can 
> probably have the same total DOS with some "arbitrarily" chosen
> weights.
> After all you fit:   Y = x1*A + x2*A  (A is the identical PDOS of
> atom 1 
> and 2). Then x1 and x2 are are not uniquely defined.
> 
> 
> On 3/13/20 3:27 PM, Pavel Ondračka wrote:
> > Dear Peter,
> > 
> > thanks for the new version. It seems to be working now (I've also
> > included the N2s states). The V states seem to be too strong with
> > respect to my experimental measurements, but maybe this is just a
> > problem with the crossections, I'll try to play with it a bit. What
> > I've however noticed is that the optimized q_sphere differ
> > significantly between different atoms of the same type:
> > 
> > >   Optimize q_sphere by fitting total DOS?(Y/n)
> > > Y
> > > Mean deviation of (total-DOS - sum(PDOS))**2 using
> > > outputst-weights:   optimized weights:   with limit:
> > >       12672.3980           1069.0124       +-0.20
> > > Partial Orbital   Case.outputst   Optimized
> > > V 4s              0.13420000      0.33429998
> > > V 4p              1.00000000      0.80000001
> > > V 3d              0.81809998      0.99000001
> > > V 4s              0.13420000      0.33429998
> > > V 4p              1.00000000      0.80000001
> > > V 3d              0.81809998      0.99000001
> > > V 4s              0.13420000      0.23524959
> > > V 4p              1.00000000      0.80000001
> > > V 3d              0.81809998      0.99000001
> > > V 4s              0.13420000      0.33429998
> > > V 4p              1.00000000      0.80000001
> > > V 3d              0.81809998      0.81147813
> > > N 2s              0.80360001      0.99000001
> > > N 2p              0.73790002      0.93790001
> > > N 2s              0.80360001      0.60360003
> > > N 2p              0.73790002      0.53790003
> > > N 2s              0.80360001      0.60360003
> > > N 2p              0.73790002      0.53790003
> > > N 2s              0.80360001      0.99000001
> > > N 2p              0.73790002      0.93790001
> > > N 2s              0.80360001      0.99000001
> > > N 2p              0.73790002      0.93790001
> > > N 2s              0.80360001      0.87147462
> > > N 2p              0.73790002      0.60641075
> > 
> > This is kinda unexpected as this is in fact a perfect cubic VN
> > supercell (with single named atom for core-hole calculations, but
> > no
> > core hole yet in this case).
> > 
> > Is the fitting working as expected?
> > 
> > Best regards
> > Pavel
> > 
> > On Fri, 2020-03-13 at 14:14 +0100, Peter Blaha wrote:
> > > Daer Pavel,
> > > 
> > > Thanks for your report.
> > > 
> > > I tried to reproduce it, but my version of pes has already been
> > > changed
> > > significantly. In particular optimize_charge.f is now quite
> > > different.
> > > 
> > > However, when compiling with -C I detected another problem with
> > > the
> > > variable "start" (never set to zero) and in read_dos.f.
> > > 
> > > I've uploaded SRC_pes.tar.gz into the "files" directory of the
> > > wien2k-download and you can download it from there, if
> > > interested.
> > > 
> > > PS: I've not fixed the gfortran problem yet, but will try to do
> > > it
> > > soon.
> > > So if it is not timely, you can also wait with the download until
> > > this
> > > is also fixed.
> > > 
> > 

Re: [Wien] PES issues

2020-03-13 Thread Pavel Ondračka
Dear Peter,

thanks for the new version. It seems to be working now (I've also
included the N 2s states). The V states seem to be too strong with
respect to my experimental measurements, but maybe this is just a
problem with the cross sections; I'll try to play with it a bit. What
I've noticed, however, is that the optimized q_sphere values differ
significantly between different atoms of the same type:

  Optimize q_sphere by fitting total DOS?(Y/n)
Y
Mean deviation of (total-DOS - sum(PDOS))**2 using
outputst-weights:   optimized weights:   with limit:
      12672.3980           1069.0124       +-0.20
Partial Orbital   Case.outputst   Optimized
V 4s              0.13420000      0.33429998
V 4p              1.00000000      0.80000001
V 3d              0.81809998      0.99000001
V 4s              0.13420000      0.33429998
V 4p              1.00000000      0.80000001
V 3d              0.81809998      0.99000001
V 4s              0.13420000      0.23524959
V 4p              1.00000000      0.80000001
V 3d              0.81809998      0.99000001
V 4s              0.13420000      0.33429998
V 4p              1.00000000      0.80000001
V 3d              0.81809998      0.81147813
N 2s              0.80360001      0.99000001
N 2p              0.73790002      0.93790001
N 2s              0.80360001      0.60360003
N 2p              0.73790002      0.53790003
N 2s              0.80360001      0.60360003
N 2p              0.73790002      0.53790003
N 2s              0.80360001      0.99000001
N 2p              0.73790002      0.93790001
N 2s              0.80360001      0.99000001
N 2p              0.73790002      0.93790001
N 2s              0.80360001      0.87147462
N 2p              0.73790002      0.60641075

This is kinda unexpected as this is in fact a perfect cubic VN
supercell (with single named atom for core-hole calculations, but no
core hole yet in this case).

Is the fitting working as expected?

Best regards
Pavel

On Fri, 2020-03-13 at 14:14 +0100, Peter Blaha wrote:
> Daer Pavel,
> 
> Thanks for your report.
> 
> I tried to reproduce it, but my version of pes has already been
> changed 
> significantly. In particular optimize_charge.f is now quite
> different.
> 
> However, when compiling with -C I detected another problem with the 
> variable "start" (never set to zero) and in read_dos.f.
> 
> I've uploaded SRC_pes.tar.gz into the "files" directory of the 
> wien2k-download and you can download it from there, if interested.
> 
> PS: I've not fixed the gfortran problem yet, but will try to do it
> soon. 
> So if it is not timely, you can also wait with the download until
> this 
> is also fixed.
> 
> PPS: Since you have N-s basis and it is used in the fit, you must 
> include its PDOS also. Thus you must include the PDOS for lower
> energy, 
> such that the N-s band is included. Otherwise the fit may give
> nonsense.
> 
> Best regards
> Peter
> 
> 
> On 3/13/20 11:58 AM, Pavel Ondračka wrote:
> > Dear Wien2k mailing list,
> > 
> > I'm experiencing a crash when trying to calculate valence band
> > spectra
> > for VN.
> > 
> > (This is a resend of previous email which is stuck in the queue due
> > to
> > being slightly over the size limit, now with a link instead. I
> > apologize for double posting if the original one eventually makes
> > it to
> > the list as well.)
> > 
> > There is out of bounds write during optimization of q_spheres:
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0040d494 in optimize_charge () at optimize_charge.f:95
> > 95 temp(l,recon_counter)=temp(l,j)+temp(l,i)
> > (gdb) print recon_counter
> > $1 = 27
> > (gdb) print output_counter
> > $2 = 24
> > (it tries to write at index 27) but the size is just 24 (defined
> > by output_counter)
> > 
> > The files needed to reproduce this and the terminal output
> > (together
> > with the manual keyboard input needed to reproduce the crash) are
> > at
> > https://drive.google.com/open?id=1NZ8lSkfrgigtdQZrDZLp8Y-mFf4uSyk_
> > . I'm not a regular user of the pes program so there is a high
> > chance that there is something wrong with my input.
> > 
> > BTW While taking a quick look I spotted some likely unrelated small
> > issues, for instance pes is also influenced by the well known issue
> > with gfortran using the units 5 and 6 (have to change it manually
> > to
> > something else otherwise stdin and stdout doesn't work) and there
> > are
> > some valgrind warnings even before the crash, for example:
> > 
> > ==57304== Conditional jump or move depends on uninitialised
> > value(s)
> > ==

[Wien] PES issues

2020-03-13 Thread Pavel Ondračka
Dear Wien2k mailing list,

I'm experiencing a crash when trying to calculate valence band spectra
for VN.

(This is a resend of previous email which is stuck in the queue due to
being slightly over the size limit, now with a link instead. I
apologize for double posting if the original one eventually makes it to
the list as well.)

There is out of bounds write during optimization of q_spheres:
Program received signal SIGSEGV, Segmentation fault.
0x0040d494 in optimize_charge () at optimize_charge.f:95
95 temp(l,recon_counter)=temp(l,j)+temp(l,i)
(gdb) print recon_counter
$1 = 27
(gdb) print output_counter
$2 = 24
(it tries to write at index 27, but the size is just 24, as defined
by output_counter)

The files needed to reproduce this and the terminal output (together
with the manual keyboard input needed to reproduce the crash) are at 
https://drive.google.com/open?id=1NZ8lSkfrgigtdQZrDZLp8Y-mFf4uSyk_ 
. I'm not a regular user of the pes program so there is a high
chance that there is something wrong with my input.

BTW While taking a quick look I spotted some likely unrelated small
issues, for instance pes is also influenced by the well known issue
with gfortran using the units 5 and 6 (have to change it manually to
something else otherwise stdin and stdout doesn't work) and there are
some valgrind warnings even before the crash, for example:

==57304== Conditional jump or move depends on uninitialised value(s)
==57304==at 0x419919: abs_smooth_ (SPLINE.f:173)
==57304==by 0x41852E: setup_ (SPLINE.f:91)
==57304==by 0x4199FE: spline_ (SPLINE.f:16)
==57304==by 0x41582B: read_database2_ (read_database2.f:124)
==57304==by 0x403FE7: MAIN__ (pes.f:151)
==57304==by 0x4066FF: main (pes.f:3)
==57304==  Uninitialised value was created by a stack allocation
==57304==at 0x4199B3: spline_ (SPLINE.f:1)

SPLINE.f:173
   if (x >= delta_x) then
   
The uninitialized variable is delta_x, which was passed from setup
(SPLINE.f:91):
call abs_smooth(m4 - m3, delta_x, w1)
It was itself allocated on the stack at the beginning of spline but is
not initialized anywhere as far as I can see. So I set it to 0.0d0 (the
default for ifort).

and one more which should be harmless...
==57389== Conditional jump or move depends on uninitialised value(s)
==57389==at 0x47213EC5: bcmp (vg_replace_strmem.c:1113)
==57389==by 0x474A3B7A: _gfortran_compare_string
(string_intrinsics_inc.c:98)
==57389==by 0x41677F: read_outputst_ (read_outputst.f:37)
==57389==by 0x4048E2: MAIN__ (pes.f:222)
==57389==by 0x4066FF: main (pes.f:3)
==57389==  Uninitialised value was created by a stack allocation
==57389==at 0x416559: read_outputst_ (read_outputst.f:1)

Best regards
Pavel



Re: [Wien] Ask for help

2020-01-28 Thread Pavel Ondračka
Dear Siham,

yes, you can get a full dielectric tensor from Wien2k, just select the
appropriate components in case.inop and case.injoint files.

If I remember correctly the orientation of the tensor is that the xx
direction of the dielectric tensor is in the direction of the a lattice
parameter, yy is in the plane defined by a and b lattice parameters
(and perpendicular to xx) and zz is perpendicular to yy and xx. So for
some arbitrary direction you will have to do the transformation
yourself.
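
For a field polarized along a unit vector n, the usual projection is
eps_n = n^T . eps . n, with n expressed in the same Cartesian frame as
the tensor. A minimal Python sketch, with a hypothetical function name
and made-up tensor values at a single photon energy:

```python
import math

def epsilon_along(eps, direction):
    """Project a 3x3 dielectric tensor onto a direction: n^T . eps . n."""
    norm = math.sqrt(sum(c * c for c in direction))
    n = [c / norm for c in direction]
    return sum(n[i] * eps[i][j] * n[j] for i in range(3) for j in range(3))

# Hypothetical diagonal tensor at one photon energy, Cartesian axes.
eps = [[4.0, 0.0, 0.0],
       [0.0, 4.0, 0.0],
       [0.0, 0.0, 9.0]]

print(epsilon_along(eps, [0, 0, 1]))  # 9.0
print(epsilon_along(eps, [1, 0, 1]))  # about 6.5, halfway between xx and zz
```

The same function works for complex tensor components, so it can be
applied to the real and imaginary parts together, energy by energy.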

Best regards
Pavel

On Tue, 2020-01-28 at 13:43 +0100, Siham Malki wrote:
> Dear All,
> I calculated the dielectric function with Wien2k, so i obtained this
> function vs energie , i need to know how to change the wave vector of
> light q for determine the variation of the dielectric function as
> function the wave vector q. Can you help me please.
> Best regards
> 
> 
> 


Re: [Wien] Wien2k 19.1 with linux+gfortran benchmarks

2019-12-12 Thread Pavel Ondračka
I concur.

In general, for the serial test case on a modern CPU (AVX2
instructions) your runtime should be around or below 30 seconds for a
single thread.

However, as this is an almost 10-year-old mobile CPU with just AVX
instructions, the total runtime of slightly above 1 minute is expected.

Regarding the scaling: even when not memory bound, I get around 35% of
the serial runtime with OpenBLAS (MKL scales slightly better). Small
speedups could probably be gained with some work on the HNS section (the
worst-scaling part that we have more or less under control), but for
the DIAG part we just depend on BLAS/LAPACK to scale properly.
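
The per-section scaling can be read off the wall times in the quoted
lapw1 output further down (1 thread vs. 4 threads). A short Python
sketch of that arithmetic; the dictionary layout and the function name
are just illustrative choices:

```python
# Wall times in seconds for 1 thread and 4 threads, from the quoted lapw1 output.
wall = {
    "HAMILT": (12.3, 3.7),
    "HNS": (8.8, 3.8),
    "DIAG": (48.2, 23.2),
}

def speedup_table(times, threads):
    """Return {section: (speedup, parallel efficiency)} rounded to two digits."""
    table = {}
    for name, (serial, parallel) in times.items():
        speedup = serial / parallel
        table[name] = (round(speedup, 2), round(speedup / threads, 2))
    return table

for name, (speedup, efficiency) in speedup_table(wall, 4).items():
    print(f"{name:7s} speedup {speedup:.2f}  efficiency {efficiency:.2f}")
```

This gives speedups of about 3.3 (HAMILT), 2.3 (HNS) and 2.1 (DIAG) on
four threads, i.e. DIAG dominates the remaining runtime.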

If you have multiple k-points and your total memory permits it, it's
best to use k-point parallelization and use OpenMP just for lapw0 and
mixer...

Pavel

On Thu, 2019-12-12 at 13:42 +0100, Peter Blaha wrote:
> It is perfectly ok for your hardware.
> 
> The cpu time is not so important for you, what counts is the WALL-
> time 
> (this is the time it really takes until it finishes).
> 
> You can see that Hamilt parallelizes fairly well (3.7 vs. 12.3
> seconds / 
> speedup factor 3.3), HNS is not so good (3.8 vs. 8.8 s / factor 2.3)
> and 
> DIAG is worse (23.2 vs. 48.2 / factor 2.1).
> 
> Part of the reason that you can never see a factor of 4 is the slow 
> memory access, so when 4 cores do some calculations, they have to
> wait 
> sometimes for data from the memory.
> 
> On machines with more cores and a better memory bus, you will get
> other 
> speed-ups, but basically no machine can use all cores with 100% 
> efficiency because of this limited memory access.
> 
> 
> On 12/12/19 1:07 PM, Hemza wrote:
> > Hi everybody:
> > I just finished updating my wien2k installation to 19.1 with
> > openMP 
> > support (linux (4.19.88), gfortran (9.2.0), openblas-lapack-openmp 
> > (0.3.7), fftw3 (3.3.8), libxc (4.3.4)), and patches from 
> > "https://github.com/gsabo/WIEN2k-Patches".
> > I intend to use it for relatively small cases (less than 25
> > atoms/unit 
> > cell). I ran 'x lapw1' on the test_case.
> > With OMP_NUM_THREADS=4 in bashrc:
> > --
> > $ x lapw1
> > STOP  LAPW1 END
> > 113.876u 2.097s 0:31.36 369.7%  0+0k 424+37840io 2pf+0w
> > $ grep HORB *output1*
> > test_case.output1:   TIME HAMILT (CPU)  = 13.5, HNS = 12.6, HORB = 0.0, DIAG = 87.3, SYNC = 0.0
> > test_case.output1:   TIME HAMILT (WALL) = 3.7, HNS = 3.8, HORB = 0.0, DIAG = 23.2, SYNC = 0.0
> > --
> > 
> > and with OMP_NUM_THREADS=1, I got:
> > -
> > $ x lapw1
> > STOP  LAPW1 END
> > 69.380u 0.339s 1:09.88 99.7%  0+0k 352+37848io 2pf+0w
> > $ grep HORB *output1*
> > test_case.output1:   TIME HAMILT (CPU)  = 12.0, HNS = 8.8, HORB = 0.0, DIAG = 48.1, SYNC = 0.0
> > test_case.output1:   TIME HAMILT (WALL) = 12.3, HNS = 8.8, HORB = 0.0, DIAG = 48.2, SYNC = 0.0
> > 
> > I do not feel I really understand the output, and I do not know if
> > these timings are good, so I am eager to read your comments!
> > 
> > My machine ('inxi -dm' output)
> > 
> > System:Host: dojo Kernel: 4.19.88-1-lts x86_64 bits: 64
> > Desktop: i3 
> > 4.17.1 Distro: Artix rolling
> > Machine:   Type: Laptop System: ASUSTeK product: K53SD v: 1.0
> > serial: 
> > 
> > Mobo: ASUSTeK model: K53SD v: 1.0 serial:  > required> 
> > BIOS: American Megatrends v: K53SD.202
> > date: 11/02/2011
> > Battery:   ID-1: BAT0 charge: 33.8 Wh condition: 33.8/59.4 Wh (57%)
> > Memory:RAM: total: 7.57 GiB used: 4.84 GiB (63.9%)
> > RAM Report: permissions: Unable to run dmidecode. Are
> > you root?
> > CPU:   Quad Core: Intel Core i7-2670QM type: MT MCP speed: 849
> > MHz 
> > min/max: 800/3100 MHz
> > Graphics:  Device-1: Intel 2nd Generation Core Processor Family 
> > Integrated Graphics driver: i915 v: kernel
> > Device-2: NVIDIA GF119M [GeForce 610M] driver: nouveau
> > v: 
> > kernel
> > Display: x11 server: X.org 1.20.6 driver:
> > intel,nouveau 
> > unloaded: fbdev,modesetting,vesa
> > resolution: 
> > Message: Unable to show advanced data. Required tool
> > glxinfo 
> > missing.
> > Network:   Device-1: Intel Centrino Wireless-N 100 driver: iwlwifi
> > Device-2: Qualcomm Atheros AR8151 v2.0 Gigabit
> > Ethernet 
> > driver: atl1c
> > Drives:Local Storage: total: 2.05 TiB used: 1.45 TiB (70.8%)
> > Info:  Processes: 300 Uptime: 1d 1h 46m Shell: bash inxi:
> > 3.0.26
> > -
> > 
> > regards
> > 

Re: [Wien] 24k points on 36processors ??. (a Fractional k-point per core)

2019-12-12 Thread Pavel Ondračka
I'm neither a PBS nor a csh expert, but to me it looks like you are
spawning just a single k-point job on each node, and also just a
single lapw0 process per node.

If you run just a single k-point job per node and there is still not
enough memory, then you probably need MPI. Or maybe the default memory
PBS gives you is not enough (maybe you need to specifically ask for a
larger amount, no idea).

For the example from my last email, to spawn 4 k-point jobs per node on
three nodes with 3 OpenMP threads each, the final .machines file should
look like this (with node1, node2 and node3 replaced by the actual node
names based on the $PBS_NODEFILE):

1:node1
1:node1
1:node1
1:node1
1:node2
1:node2
1:node2
1:node2
1:node3
1:node3
1:node3
1:node3
granularity:1
extrafine:1
omp_lapw2:3
omp_lapw1:3
omp_lapw0:4
omp_global:12
lapw0: node1:3 node2:3 node3:3

I would advise you to read the .machines file section of the manual once
more, try to understand what your .machines file should look like, and
then consult with whoever wrote your PBS script in the first place to
modify it so it generates the .machines file you need.
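
A minimal sketch of how a job script could produce the layout above,
written in Python rather than csh for brevity. The function names and
default values are hypothetical and simply mirror the example; the
input is assumed to be the contents of $PBS_NODEFILE, which typically
lists each host once per allocated slot.

```python
def unique_nodes(nodefile_text):
    """Return the host names from a PBS node file, deduplicated, in order."""
    names = (line.strip() for line in nodefile_text.splitlines())
    return list(dict.fromkeys(name for name in names if name))


def build_machines(nodes, kpoint_jobs_per_node=4, omp_lapw1=3, omp_lapw2=3,
                   omp_lapw0=4, omp_global=12, lapw0_mpi_per_node=3):
    """Build the text of a .machines file like the example above."""
    lines = []
    # One "1:host" line per k-point job that should run on that host.
    for node in nodes:
        lines.extend(f"1:{node}" for _ in range(kpoint_jobs_per_node))
    lines += [
        "granularity:1",
        "extrafine:1",
        f"omp_lapw2:{omp_lapw2}",
        f"omp_lapw1:{omp_lapw1}",
        f"omp_lapw0:{omp_lapw0}",
        f"omp_global:{omp_global}",
        # lapw0 runs MPI-parallel with a few processes on every node.
        "lapw0: " + " ".join(f"{n}:{lapw0_mpi_per_node}" for n in nodes),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example_nodefile = "node1\nnode1\nnode2\nnode2\nnode3\nnode3\n"
    print(build_machines(unique_nodes(example_nodefile)), end="")
```

In a real job script the node file text would come from the file named
by $PBS_NODEFILE and the result would be written to .machines in the
case directory.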

Best regards
Pavel

BTW you are actually not asking for 12 CPUs but just 8 CPUs...


On Thu, 2019-12-12 at 17:04 +0530, Ashwani Kumar wrote:
> Dear Sir,
>  Hyper-threading is disabled (just checked with the facility
> expert), so there are 12 physical cores per node (Intel Xeon,
> Nehalem-based arch.). Available memory is 4 GB/core (48 GB/node).
>  lapw1 stops with the error "insufficient virtual memory", so I
> thought it better to use 36 cores for 24 k-points, as extra (48 GB)
> memory will be available. I am using the PBS queuing system (WIEN2k
> V19.1 compiled with OpenMPI parallelization), which generates the
> *.machine file when the job script is executed. How, then, do I set
> the OpenMP threads in the *.machine file? (Job script file attached
> for your reference.)
> 
> thanks,
> A. kumar
> 
> On Thu, Dec 12, 2019 at 2:55 PM Pavel Ondračka <
> pavel.ondra...@email.cz> wrote:
> > Hi,
> > 
> > do you have hyperthreading or not (in other words does this number
> > of
> > 12 already mean there are 6 physical CPUs and 12 virtuals, or 12
> > physical)? This would influence the advice maybe a bit...
> > 
> > Otherwise you need to experiment, the optimal setting is heavily
> > dependent on your specific CPU, memory speed and what you are
> > calculating (system size). 
> > 
> > When talking about the 24 kpoints and 36 processors, than running
> > 4kpoints on each node  (12 kpoints in parallel) with 3 openmp
> > threads
> > each might be a reasonable setting.
> > 
> > It is also possible that just leaving some cores idle might be the
> > best
> > thing to do (as running a lot of k-points in parallel you can get
> > limited by the memory speed so leaving some cores idle means more
> > memory bandwidth for the others):
> > This would correspond to running 8 kpoints on each node or 4
> > kpoints on
> > each node with 2 openmp threads each.
> > 
> > The linux kernel and modern processors are also usually good at
> > handling some small overload and load balancing so you can also try
> > to
> > overload the system a bit, i.e., 8kpoints per node with 2 openmp
> > threads each. 
> > 
> > Just try the different settings (single lapw1 run for each should
> > be
> > enough to get some idea) and compare the timings.
> > 
> > Best regards
> > Pavel
> > 
> > BTW for lapw0 I would go with something like 3 MPI processes per
> > node
> > with 4 OpenMP threads for each in this case.
> > 
> > On Thu, 2019-12-12 at 12:28 +0530, Ashwani Kumar wrote:
> > > Hi, 
> > >This is related to no. of k-points which we provide during the
> > > initilization. No. of k-gen points given ; 120 with shifted mesh.
> > > Irr. k-points : 24k points. Running job on 3 nodes (12 x3
> > processors,
> > > 48 gb x 3 Ram). Job running on 24 processors only (with
> > granularity:
> > > 1, extrafine:1 in *.machine file) which means 1kpoint/1-core. How
> > can
> > > 24 k-points be made to run on 36  cores ?. Or how can 24 kpoints
> > can
> > > be distributed equally between 36 cores (or let's say 12 kpoints
> > on
> > > 24 processors to make calculation converge faster). 
> > > 
> > > thanks,
> > > A. Kumar

Re: [Wien] 24k points on 36processors ??. (a Fractional k-point per core)

2019-12-12 Thread Pavel Ondračka
Hi,

do you have hyperthreading or not (in other words, does this number of
12 already mean 6 physical CPUs and 12 virtual ones, or 12
physical)? This might influence the advice a bit...

Otherwise you need to experiment; the optimal setting depends heavily
on your specific CPU, memory speed and what you are
calculating (system size).

With 24 k-points and 36 processors, running
4 k-points on each node (12 k-points in parallel) with 3 OpenMP threads
each might be a reasonable setting.

It is also possible that leaving some cores idle is the best
thing to do (when running a lot of k-points in parallel you can get
limited by the memory speed, so leaving some cores idle means more
memory bandwidth for the others).
This would correspond to running 8 k-points on each node, or 4 k-points
on each node with 2 OpenMP threads each.

The Linux kernel and modern processors are also usually good at
handling a small overload and load balancing, so you can also try to
overload the system a bit, i.e., 8 k-points per node with 2 OpenMP
threads each.

Just try the different settings (a single lapw1 run for each should be
enough to get some idea) and compare the timings.
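
To make the options above easier to compare, here is a small sketch
that tabulates the load per node for each setting. It assumes the
numbers from this thread (3 nodes, 12 physical cores each, 24
k-points); the function names are only illustrative and the script
says nothing about which setting is actually fastest.

```python
import math

CORES_PER_NODE = 12  # physical cores per node in this thread
NODES = 3
KPOINTS = 24


def node_load(kpoint_jobs_per_node, omp_threads):
    """Number of busy threads on one node during lapw1/lapw2."""
    return kpoint_jobs_per_node * omp_threads


def classify(load, cores=CORES_PER_NODE):
    """Compare the load on a node with its number of cores."""
    if load < cores:
        return "underload"
    if load > cores:
        return "overload"
    return "full"


def rounds(kpoint_jobs_per_node, nodes=NODES, kpoints=KPOINTS):
    """How many k-points each parallel job has to process one after another."""
    return math.ceil(kpoints / (kpoint_jobs_per_node * nodes))


# (k-point jobs per node, OpenMP threads per job) as discussed above
for jobs, threads in [(4, 3), (8, 1), (4, 2), (8, 2)]:
    load = node_load(jobs, threads)
    print(f"{jobs} jobs x {threads} threads: load {load:2d}/{CORES_PER_NODE} "
          f"({classify(load)}), {rounds(jobs)} k-point(s) per job")
```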

Best regards
Pavel

BTW for lapw0 I would go with something like 3 MPI processes per node
with 4 OpenMP threads for each in this case.

On Thu, 2019-12-12 at 12:28 +0530, Ashwani Kumar wrote:
> Hi, 
>This is related to the number of k-points which we provide during the
> initialization. Number of k-points given to kgen: 120 with shifted
> mesh. Irreducible k-points: 24. Running the job on 3 nodes (12 x 3
> processors, 48 GB x 3 RAM). The job runs on 24 processors only (with
> granularity:1, extrafine:1 in the *.machine file), which means 1
> k-point per core. How can 24 k-points be made to run on 36 cores? Or
> how can 24 k-points be distributed equally between 36 cores (or let's
> say 12 k-points on 24 processors to make the calculation converge
> faster)?
> 
> thanks,
> A. Kumar



Re: [Wien] Issues with Wien2k installation

2019-12-03 Thread Pavel Ondračka
On Wed, 2019-12-04 at 06:29 +, Vidit Zala wrote:
> Dear Sir,
> I am using Wien2k version 19.1 on a thinkstation with i7 processor,
> having ubuntu installed in it. I have just installed Wien2k in the
> machine. The gfortran compiler and gcc are used.
> After the installation, I have made a structure using makestruct
> command and initialization was done using init_lapw command.
> While running the scf calculations, with run_lapw command, I am
> facing an error. I tried looking up in the mailing list to solve the
> issue, but haven't found the solution. I am facing the following
> error.
> hup: Command not found.

This is harmless (you can search some old mailing list threads for more
details).
> 
> no Fe.clmsum(_old) file found, which is necessary for lapw0

This is the real issue. The clmsum file contains the total charge
density and should have been created during the init_lapw step. Were
there any errors during the initialization? Is your structure
reasonable?

Best regards
Pavel

> !grep: *scf1*: No such file or directory
> grep: lapw2*.error:
> No such file or directory
> 
> >  
> stop error
> 
> Please guide me to solve this query.
> 
> Thanking you in anticipation.
> 
> Regards,
> Vidit Zala
> Gujarat University



Re: [Wien] lapw2 crashed error

2019-11-25 Thread Pavel Ondračka
Hi,

I can't comment on the lapw2 error, but just a small note about the
.machines file. The four "100:localhost" lines mean that you run
lapw1, lapw2 and hf in parallel over k-points (in four separate
processes). The "omp_global:4" line means that every Wien2k process
will try to use 4 threads. Therefore, the total load during lapw1 and
lapw2 will be 16 on your four cores, potentially leading to a large
overload and suboptimal speed.
I would suggest replacing "omp_global:4" with the lines "omp_lapw0:4"
and "omp_mixer:4" to use the OpenMP parallelization just in those parts of
the Wien2k where there is no parallelization over the kpoints.
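
As a rough sketch of the suggestion above (the helper names are made
up for illustration, and the parsing only covers the simple file
quoted below), the load can be estimated from the .machines text and
the omp_global line swapped for the two per-program lines:

```python
import re

MACHINES = """100:localhost
100:localhost
100:localhost
100:localhost
granularity:1
extrafine:1
omp_global:4
"""


def kpoint_jobs(machines_text):
    """Count the 'weight:host' lines, i.e. the parallel k-point jobs."""
    return sum(1 for line in machines_text.splitlines()
               if re.match(r"\d+:\S", line.strip()))


def omp_global_threads(machines_text):
    """Threads per process from omp_global, or 1 if it is not set."""
    for line in machines_text.splitlines():
        if line.strip().startswith("omp_global:"):
            return int(line.split(":", 1)[1])
    return 1


def split_omp_global(machines_text):
    """Replace omp_global:N by omp_lapw0:N and omp_mixer:N."""
    out = []
    for line in machines_text.splitlines():
        if line.strip().startswith("omp_global:"):
            threads = line.split(":", 1)[1].strip()
            out += [f"omp_lapw0:{threads}", f"omp_mixer:{threads}"]
        else:
            out.append(line)
    return "\n".join(out) + "\n"


print("load before:", kpoint_jobs(MACHINES) * omp_global_threads(MACHINES))
fixed = split_omp_global(MACHINES)
print("load after: ", kpoint_jobs(fixed) * omp_global_threads(fixed))
print(fixed, end="")
```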

Best regards
Pavel

On Tue, 2019-11-26 at 05:53 +0530, Peeyush kumar kamlesh wrote:
> Sir,
> I am using a single node with four cores. My machines file is below:
> __
> 100:localhost
> 100:localhost
> 100:localhost
> 100:localhost
> granularity:1
> extrafine:1
> omp_global:4
> 
> 
> On Mon, Nov 25, 2019 at 10:06 PM Peeyush kumar kamlesh <
> peeyush.physik@gmail.com> wrote:
> > Hello Wien2k user,
> > Greetings!
> > I am running scf cycle with hf potential. When I run the command
> > "run_lapw -hf -p", then after successful completion of 7 cycles, I
> > found error in cycle 8. In terminal it is represented as follows:
> > 
> > in cycle 8ETEST: .000491915000   CTEST: .0035867
> > hup: Command not found.
> >  LAPW0 END
> >  LAPW0 END
> >  LAPW1 END
> >  LAPW1 END
> >  LAPW1 END
> >  LAPW1 END
> > sed: Command not found.
> > LAPW2 - Error. Check file lapw2.error
> > cp: cannot stat '.in.tmp': No such file or directory
> > 
> > >   stop error
> > -
> > -
> > 
> > When I checked lapw2.error file I found following details:
> > _
> > 'LAPW2' - can't open unit: 10  
> >  
> >  'LAPW2' -filename: /case.vector  
> >  
> >  'LAPW2' -  status: unknown  form: unformatted
> >
> > **  testerror: Error in Parallel LAPW2
> > -
> > --
> > 
> > I also tried to search and understand the previous threads, but I
> > was unable to do so. Kindly suggest me why this error is appearing
> > and how can it be resolved?
> > 
> > Thanks and Regards
> > Peeyush Kumar Kamlesh
> 



[Wien] possible overload/underload with OpenMP or threading in general

2019-10-08 Thread Pavel Ondračka
Dear Wien2k mailing list,

in a recent discussion with profs. Marks and Blaha it was shown that
under some circumstances the threading parallelization in Wien2k and
its interaction with the environment variables of a threaded
BLAS/LAPACK (MKL, but possibly also OpenBLAS and others) can behave
unexpectedly, leading either to imperfect utilization of nodes
(underload) or to too many contending threads (overload), both of which
slow down the calculations.

Short story with just three points:

- Occasionally check the load of your nodes when running (either with
"top", a similar program, or your job scheduler's reporting). If it is
much higher or lower than the number of cores, then this could be a
problem, so please continue reading.
- If you have previously set MKL_NUM_THREADS, OPENBLAS_NUM_THREADS or
any other equivalent BLAS/LAPACK-specific threading variable, please
unset them.
- If you linked with non-default MKL settings or linked with a different
threaded BLAS/LAPACK such as OpenBLAS, please make sure that your
BLAS/LAPACK library is internally threaded with OpenMP (not pthreads,
TBB or any other threading library) and that it uses the same OpenMP
library as Wien2k (one example of such a problematic config would be
compiling Wien2k with gfortran using MKL, with libiomp5 for MKL
threading but libgomp for OpenMP threading in Wien2k itself).
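
The first two points can be checked with a few lines of Python. This
is only a sketch: the 25% tolerance is an arbitrary choice, the list
of variables covers just the two named above, and the 1-minute load
average is taken as a stand-in for what "top" shows.

```python
import os

# BLAS/LAPACK-specific variables named above that should be unset
BLAS_THREAD_VARS = ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def conflicting_blas_vars(env):
    """Return the BLAS threading variables that are set in `env`."""
    return [name for name in BLAS_THREAD_VARS if name in env]


def load_status(load, cores, tolerance=0.25):
    """Classify the load relative to the core count (tolerance is a guess)."""
    if load > cores * (1 + tolerance):
        return "overload"
    if load < cores * (1 - tolerance):
        return "underload"
    return "ok"


if __name__ == "__main__":
    for name in conflicting_blas_vars(os.environ):
        print(f"{name} is set, consider unsetting it")
    cores = os.cpu_count() or 1
    load_1min = os.getloadavg()[0]  # Unix only
    print(f"load {load_1min:.1f} on {cores} cores: "
          f"{load_status(load_1min, cores)}")
```

Run it on a compute node while a calculation is in progress; on an
idle machine it will of course report underload.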

Best regards
Pavel


P.S.: Long story for people interested in technical details:

Wien2k links with the threaded MKL by default and threaded OpenBLAS is
usually also the default which distributions provide. 

In Wien2k versions before 19, when running k-parallel without
OMP_NUM_THREADS set (or the BLAS-specific equivalent env variables), the
parallel BLAS/LAPACK libraries usually tried to use the maximum number
of cores, leading to overload if multiple k-points were running on a
single node. This was fixed in Wien2k 19.1, where the threading is now
explicitly controlled from the .machines file, and when no threading is
specified it defaults to one thread per process.

Another problem is with the BLAS/LAPACK-specific threading variables
such as MKL_NUM_THREADS, OPENBLAS_NUM_THREADS, etc. They have higher
priority than OMP_NUM_THREADS, which is set by Wien2k internally
based on the omp_xxx:y lines in the .machines file, and therefore can
override the optimal threading set by the user. Unsetting them will make
the parallel BLAS/LAPACK obey the settings from the .machines file.

More problems can occur when combining different threading models in
Wien2k and BLAS/LAPACK (such as OpenMP and POSIX threads) or using
OpenMP threading in both but with different OpenMP libraries (for
example Intel and GNU). This is most likely to happen when using
gfortran and a distro-provided OpenBLAS, as its default threading is
with pthreads.

The OpenMP parallelization in Wien2k works in such a way that there are
some explicit OpenMP parallel regions in which there might also be
BLAS/LAPACK calls. In other places the BLAS/LAPACK calls are done from
serial regions and we depend on parallelization at the BLAS/LAPACK
level. If using OpenMP and the same OpenMP library everywhere, then in
the first case the BLAS/LAPACK library will recognize that it is already
being called from a parallel region and run single-threaded, while in
the second case it will run with multiple threads as expected. If
combining threading models or different threading libraries, the
BLAS/LAPACK calls from OpenMP parallel regions have no way of knowing
there are already multiple threads, and each can spawn more threads,
leading again to overload.



Re: [Wien] lapw2c tries to read an anomalous amount of data

2019-07-29 Thread Pavel Ondračka
If you see the maximum memory used by the lapw1c process scale linearly
with the number of threads, even at low thread counts, then there is
something strange going on. In fact the most memory-consuming
diagonalization part should take more or less the same amount of memory
independent of the number of threads. The only part in which memory
consumption scales roughly with the number of threads is the hamilt
routine at the beginning of lapw1; however, unless you use more than
~10 threads, it should still consume less than the later
diagonalization. Therefore it should not increase the max memory
footprint, and in fact at a small number of threads the max memory
consumption of both lapw1 and lapw2 shouldn't depend much on the number
of threads. If you see something different, then please provide some
specific numbers so I can check it.

You are also right that the vector size is unchanged, and therefore the
total amount of IO is of course the same with and without OpenMP.
However, with the k-point parallelization, the issues can be caused by
all the processes running at the same time and accessing the vector
files simultaneously, therefore when you reduce the number of processes
the peak I/O access should be more balanced/reduced. Prof. Blaha
already explained this in his email on Friday.

Another thing which can unexpectedly influence the results is the
filesystem cache. The Linux kernel is usually quite clever about caching
recently accessed files in unused memory, so this can help
significantly as well (and again depends on the overall memory
pressure).

Hope this helps to explain it.
Pavel


On Mon, 2019-07-29 at 12:19 +0200, Luc Fruchter wrote:
> What comes out as a surprise (for me), is that the memory needed for 
> lapw2s does not scale with the number of CPUs, while it does for
> lapw1s: 
> when I reduce the number of CPUs, lapw1s memory scales down 
> proportionaly, while total .vector files size is unchanged, and so
> there 
> is no improvement in handling them with lapw2s.



Re: [Wien] lapw2c tries to read an anomalous amount of data

2019-07-26 Thread Pavel Ondračka
The easiest way to reduce memory pressure is to reduce the number
of k-points run in parallel... You should experiment with other
parallelization options. If running 4 k-points in parallel does not fit
in your memory (or is slow), try running, for example, just two but
with 2 OpenMP threads per process. Using MPI is another option and also
reduces memory required per CPU.
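
A back-of-the-envelope sketch of this trade-off. It assumes that the
peak memory of one lapw1/lapw2 process does not depend much on its
number of OpenMP threads, so the total is roughly the number of
parallel k-point jobs times the per-process peak. The 3 GB and 8 GB
figures are made-up example values, not measurements from this thread.

```python
def candidate_layouts(cores):
    """All (k-point jobs, OpenMP threads) pairs that use `cores` exactly."""
    return [(jobs, cores // jobs)
            for jobs in range(cores, 0, -1) if cores % jobs == 0]


def fits_in_memory(kpoint_jobs, mem_per_process_gb, node_mem_gb):
    """Rough check: every parallel k-point job needs its own copy."""
    return kpoint_jobs * mem_per_process_gb <= node_mem_gb


def pick_layout(cores, mem_per_process_gb, node_mem_gb):
    """Layout with the most k-point jobs that still fits, or None."""
    for jobs, threads in candidate_layouts(cores):
        if fits_in_memory(jobs, mem_per_process_gb, node_mem_gb):
            return jobs, threads
    return None


# Hypothetical example: 4 cores, 3 GB peak per process, 8 GB of RAM
print(candidate_layouts(4))
print(pick_layout(4, 3.0, 8.0))
```

If no layout fits (the function returns None), k-point parallelization
with OpenMP alone will not help and MPI is the remaining option.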

On Fri, 2019-07-26 at 09:37 +0100, Laurence Marks wrote:
> If I remember right, the largest piece of memory is the vector file
> so this should be a reasonable estimate.
> 
> During the scf convergence you can reduce this by carefully changing
> the numbers at the end of case.in1(c). You don't really need to go to
> 1.5 Ryd above E_F (and similarly reduce nband for ELPA). For DOS etc
> later you increase these and rerun lapw1 etc.
> 
> On Fri, Jul 26, 2019 at 9:27 AM Luc Fruchter 
> wrote:
> > Yes, I have shared memory. Swap on disk is disabled, so the system
> > must 
> > manage differently here.
> > 
> > I just wonder now: is there a way to estimate the memory needed for
> > the 
> > lapw2s, without running scf up to these ? Is this the total .vector
> > size ?
> 
> 



Re: [Wien] 3p or 4p PDOS for 3d transition metal

2019-07-12 Thread Pavel Ondračka
If I remember correctly, I had some similar problems in the past when
the width of the deep-lying states was too narrow (with respect to the
energy step). Try to reduce the energy step in tetra, or maybe increase
the broadening...

Best regards
Pavel

On Fri, 2019-07-12 at 15:30 +0800, 杨柯 wrote:
> Thanks for the reply.
> 
> The problem is that the energy levels of the 3p and 3p* orbitals,
> treated as valence states of the Co atom, are not shown in my
> case.dos1eVup file.
> I already changed the Emin in the case.inst file. Still, the 3p and 3p*
> states at about -4.5 Ry are not shown.
> I checked the case.qtlup file; the energy starts at -4 Ry. It looks
> like the energies around -4.5 Ry are lost.
> I have no idea what happened.
> Any suggestions are welcome.
> 
> 
> 
> 
> 
> > -Original Message-
> > From: t...@theochem.tuwien.ac.at
> > Sent: 2019-07-12 15:06:21 (Friday)
> > To: "A Mailing list for WIEN2k users" <
> > wien@zeus.theochem.tuwien.ac.at>
> > Cc: 
> > Subject: Re: [Wien] 3p or 4p PDOS for 3d transition metal
> > 
> > Hi,
> > 
> > In WIEN2k, the plotting of the DOS is only for the valence states
> > (the core states are not shown). If they were plotted, each core
> > state would just correspond to a vertical line.
> > 
> > In the case where two sets of states with the same angular momentum
> > but different principal quantum number are both treated as valence,
> > case.outputst tells you approximately where they should
> > appear in the DOS.
> > 
> > FT
> > 
> > On Friday 2019-07-12 03:13, 杨柯 wrote:
> > 
> > > Date: Fri, 12 Jul 2019 03:13:13
> > > From: 杨柯 
> > > Reply-To: A Mailing list for WIEN2k users <
> > > wien@zeus.theochem.tuwien.ac.at>
> > > To: A Mailing list for WIEN2k users <
> > > wien@zeus.theochem.tuwien.ac.at>
> > > Subject: Re: [Wien] 3p or 4p PDOS for 3d transition metal
> > > 
> > > Dear Blaha,
> > > 
> > > Thank you very much for your detailed reply.
> > > 
> > > I have another question that I hope you could help me.
> > > 
> > > The case.outputst file has the information about which orbital is
> > > treated as a core state and which as a valence
> > > state.
> > > Take my case, a Co atom, as an example. The 1s, 2s, 2p, 3s states
> > > were treated as core states.
> > > The 3p, 3d, 4s states were treated as valence states.
> > > Suppose the initial input is like this,
> > > and I use "x lapw2 -orb -up/-dn -qtl" to calculate
> > > the PDOS,
> > > "configure_int" to choose the tot, s, p and the projected
> > > d orbitals,
> > > and "x tetra -up/-dn" to show the DOS.
> > > It is obvious that the d orbital is the 3d orbital of the Co atom.
> > > But I'm not sure which orbitals the s, p PDOS correspond to:
> > > 3s, 3p or 3p, 4s?
> > > Is the principal quantum number (n) of the ns, np PDOS related to
> > > the orbitals chosen as core states and valence states?
> > > 
> > > 
> > > I hope you can help me clear up my doubts.
> > > 
> > > 
> > > 
> > > --
> > > Yours sincerely,
> > > Ke Yang
> > > Email: kyan...@fudan.edu.cn
> > > Address: Department of Physics, Fudan University, Handan Road
> > > 220, Shanghai 200433, China
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > > -Original Message-
> > > > From: "Peter Blaha" 
> > > > Sent: 2019-07-11 23:20:37 (Thursday)
> > > > To: wien@zeus.theochem.tuwien.ac.at
> > > > Cc:
> > > > Subject: Re: [Wien] 3p or 4p PDOS for 3d transition metal
> > > > 
> > > > Obviously, 3s,3p and 4s,4p states differ in their energy. The
> > > > principal
> > > > QNs are not "labelled" explicitly.
> > > > 
> > > > Depending on which TM atom you have, the 3s,3p states may range
> > > > at -2.0
> > > > (Sc) to -7 Ry (Cu). Eventually, their bandwidth can be very
> > > > small and
> > > > usually we do not "plot" the corresponding DOS.
> > > > 
> > > > The 4s,4p states are in the valence region around EF. However,
> > > > their
> > > > wave functions are very delocalized and thus have very little
> > > > contribution
> > > > inside the atomic sphere. One 4s electron may eventually have
> > > > only 0.1
> > > > electrons within the sphere, thus the corresponding PDOS will
> > > > be very small.
> > > > For these reasons, I'd recommend using the recent   xps
> > > > package  (see
> > > > UG). If you provide all possible PDOS (of all atoms and all l
> > > > values)
> > > > this package can renormalize the PDOS, such that the
> > > > interstitial DOS is
> > > > "removed" and distributed into the corresponding atomic PDOS.
> > > > 
> > > > 
> > > > 
> > > > Am 11.07.2019 um 16:54 schrieb 杨柯:
> > > > > Dear Blaha and others,
> > > > > 
> > > > >Now, I'm trying to plot the 3s,3p or 4s,4p PDOS for 3d
> > > > > transition metal.
> > > > > 
> > > > > I'm not sure the standard output s,p orbital for transition
> > > > > metal is 3s,3p or 4s,4p.
> > > > > 
> > > > > If not, how can I obtain the 3s,3p,4s,4p PDOS for the
> > > > > transition metal.
> > > > > 
> > > > > Any suggestion are welcome.
> > > > > 
> > > > > Thank you very much for your reply.
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > --
> > > > > 
> > > > > Yours 

Re: [Wien] FFTW compiling

2019-07-03 Thread Pavel Ondračka
Maybe Gavin can clarify but I'm not actually sure if the instructions
in the mentioned email are 100% correct. FFTW there is compiled with
the default static libs (libfftw3.a and libfftw3_mpi.a), but the actual
-lfftw3 -lfftw3_mpi flags used for linking are for the dynamic library
(libfftw3.so and libfftw3_mpi.so). My guess is that it worked for Gavin
due to having another dynamic fftw somewhere in the path (e.g., the
system fftw libraries)?

Anyway, if you can get libfftw3.a but not libfftw3_mpi.a even when you
pass --enable-mpi to fftw configure, there is likely some problem with
your MPI installation. Were there any warnings during fftw configure?
BTW on some distributions like Fedora, only installing openmpi or mpich
is not enough; you must also load it to set all the paths properly (for
example on Fedora with "module load openmpi"). This is distribution
specific though.

To get the dynamic library, just pass --enable-shared to fftw
configure; the Wien2k flags should then work (provided you can figure
out the missing MPI fftw lib). However, I would highly recommend getting
fftw from the distribution repositories (something like the
libfftw3-mpi-dev and libfftw3-dev packages is needed, but specific
package names vary with distribution as well), unless you are on some
enterprise system with a legacy
fftw version. 
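
To see quickly which of the four libraries discussed above actually
exist in a given directory, something like the following sketch can be
used. The function name is made up, and the directory in the example
is the FFTW-LIBS path from the quoted configuration below; adjust it
to your own install prefix.

```python
from pathlib import Path

# Static and dynamic, serial and MPI variants mentioned in this thread
FFTW_LIBS = ("libfftw3.a", "libfftw3_mpi.a", "libfftw3.so", "libfftw3_mpi.so")


def fftw_libs_present(libdir):
    """Map each expected FFTW library name to whether it exists in libdir."""
    libdir = Path(libdir)
    return {name: (libdir / name).exists() for name in FFTW_LIBS}


if __name__ == "__main__":
    for name, found in fftw_libs_present("/home/edison1/fftw3/lib64").items():
        print(f"{name:16s} {'found' if found else 'MISSING'}")
```

If the _mpi variants are missing while the serial ones are there, fftw
was most likely configured without working MPI support.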

Best regards
Pavel

On Wed, 2019-07-03 at 14:57 +0530, Riyajul Islam wrote:
> Hello WIEN2k users,
> I am facing a problem in compiling FFTW in siteconfiguration. I have
> followed all the steps as in "
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18664.html
> ". But I could not access the
> "/home/username/fftw3/mpi/.libs/libfftw3_mpi.a" as in the mailing
> list.
> The current settings for FFTW are:
> F   FFTW options:-DFFTW3 -I/home/edison1/fftw3/include
>   FFTW-LIBS:   -L/home/edison1/fftw3/lib64 -lfftw3
>   FFTW-PLIBS:  NOT FOUND!
> 
> How can I solve this issue?
> 
> Regards,
> Riyajul Islam
> ___
> Wien mailing list
> Wien@zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:  
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html



Re: [Wien] lapw2c tries to read an anomalous amount of data

2019-07-03 Thread Pavel Ondračka
Just out of curiosity, how do you measure the current disc usage per
process?

Also, how large is your memory consumption? Very high disc IO (and bad
speed in general) can also be associated with swapping. lapw2 can be
the most memory intensive part of the scf cycle.

Best regards
Pavel

On Tue, 2019-07-02 at 19:10 +0200, Luc Fruchter wrote:
> I am facing a problem with lapw2c on a machine running the 18.2
> version 
> of Wien2k. I suspect this is a machine problem, rather than a Wien2k 
> one, but would like to be sure:
> 
> As lapw2c runs in parallel in a cycle, the lapw2c processes will all
> try 
> to read a very large amount of data from the disk (several hundred
> Gb), 
> so keeping the system busy endless.
> 
> As the input files for lapw2c (.energy, .vector) are only a few Gb,
> this 
> reading of hundred Gb seems suspect, and rather a disk problem.
> However, 
> all lapw2c routines experience the same problem when run in
> parallel, 
> which I would not expect if the reading of one file was problematic.
> 
> On the other hand, the first cycles ran without any trouble.
> 
> Thanks



Re: [Wien] core-hole calculation in a molecule

2019-06-28 Thread Pavel Ondračka
So just a brief follow-up, in case someone finds this interesting.

First of all, I made a mistake in my previous calculations: there
actually is some dependence on the supercell size for Slater's
transition state approach. However, the difference in binding energies
is only ~0.05-0.1 eV when going from 5 Å to 10 Å of vacuum, and it
changes by less than ~0.01 eV when going further to 15 Å...

I've tried the Δ-SCF approach as well, and in this respect it is much
worse. The difference in binding energies is ~0.3-0.4 eV when going
from 5 Å to 10 Å of vacuum, and it changes by another ~0.2 eV when
going further to 15 Å...

The absolute energy values are better for the Δ-SCF approach (by approx.
0.5 eV), but since we are about 5 eV from the absolute experimental
values anyway, this is likely meaningless. For example, taking LDA
instead of PBE can change the absolute values by > 2 eV.

More importantly, the relative shifts between different carbon atoms
(with respect to experimental data) are also better for Slater's
transition state than for the Δ-SCF approach (with Slater's transition
state I get around 10% difference from experiment, while for Δ-SCF
it is more like 20%).

In general I'm very happy with the calculations now, except for the
speed ;-)

Best regards
Pavel


On Fri, 2019-06-21 at 07:39 +0200, Pavel Ondračka wrote:
> On Wed, 2019-06-19 at 16:25 +0200, Peter Blaha wrote:
> > This is certainly interesting.
> > 
> > For a molecule an alternative is to remove one electron and then
> > use 
> > E-tot(N) - E_tot(N-1) as binding energy. However, in this case due
> > to 
> > the charged cells, I'd expect quite some dependency on the cell
> > size
> > and 
> > some correction might be necessary.
> > 
> > Your findings indicate that Slater's transition state method is
> > much
> > better.
> 
> I will try the Δ-SCF approach as well to see if it behaves
> differently.
> But still, I've now done a lot of similar calculations and there was
> always some dependency on the cell size so this is a really big
> surprise...
> 
> BTW for Δ-SCF "E-tot(N) - E_tot(N-1)" is not enough, also μ is
> needed,
> which surprisingly no manuals mention...
> 
> > On the other hand: If you really want to do only organic molecules
> > (but 
> > many of them), any non-periodic molecular code (eg. NWChem, which
> > is 
> > free) will be MUCH cheaper and faster.
> 
> Right, the problem is that ultimately I would like to do the
> interaction with a surface as well (and look for changes), so I still
> do need a periodic boundary condition. In general I agree, when
> hydrogen comes into play the lapw approach is super slow... For now
> I'm
> just exploring this so burning some extra CPU time is not an issue if
> it ultimately saves me the troubles of learning yet another DFT code.
> 
> > Your last question, comparison to bulk materials, you have to find
> > out 
> > yourself.
> > I would not expect perfect agreement with experiment in all cases, 
> > simply because of the problem having a common Energy-zero (we use
> > EF
> > for 
> > this, but EF is well defined only in metals, but the VBM of an
> > insulator 
> > or the HOMO of a molecule is not the same "Fermi energy".
> > 
> > Suppose you put a molecule far away from a metal surface, the DFT 
> > simulation will give you a common EF (which is most likely not
> > where
> > the 
> > HOMO of the molecule is). Thus   (E-1s - EF) will be different if
> > you
> > do 
> > a combined system or the molecule alone, even when the molecule is
> > so 
> > far away that it behaves as a free molecule.
> 
> I'm actually hoping to use core electron binding energy of atoms far
> from the surface (in both the bulk and the molecule) as a reference
> to
> check the core electron binding energy shifts of atoms directly at
> the
> surface, but dunno how this will work in reality.
> 
> Thanks for all the feedback on the list (and in off-the-list emails
> I've received as well)
> Pavel
> 


Re: [Wien] core-hole calculation in a molecule

2019-06-20 Thread Pavel Ondračka
On Wed, 2019-06-19 at 16:25 +0200, Peter Blaha wrote:
> This is certainly interesting.
> 
> For a molecule an alternative is to remove one electron and then use 
> E-tot(N) - E_tot(N-1) as binding energy. However, in this case due
> to 
> the charged cells, I'd expect quite some dependency on the cell size
> and 
> some correction might be necessary.
> 
> Your findings indicate that Slater's transition state method is much
> better.

I will try the Δ-SCF approach as well to see if it behaves differently.
But still, I've now done a lot of similar calculations and there was
always some dependency on the cell size so this is a really big
surprise...

BTW for Δ-SCF "E-tot(N) - E_tot(N-1)" is not enough, also μ is needed,
which surprisingly no manuals mention...

> 
> On the other hand: If you really want to do only organic molecules
> (but 
> many of them), any non-periodic molecular code (eg. NWChem, which is 
> free) will be MUCH cheaper and faster.

Right, the problem is that ultimately I would like to do the
interaction with a surface as well (and look for changes), so I still
need periodic boundary conditions. In general I agree, when
hydrogen comes into play the LAPW approach is super slow... For now I'm
just exploring this, so burning some extra CPU time is not an issue if
it ultimately saves me the trouble of learning yet another DFT code.

> Your last question, comparison to bulk materials, you have to find
> out yourself.
> I would not expect perfect agreement with experiment in all cases,
> simply because of the problem of having a common energy zero (we use
> EF for this, but EF is well defined only in metals; the VBM of an
> insulator or the HOMO of a molecule is not the same "Fermi energy").
> 
> Suppose you put a molecule far away from a metal surface: the DFT
> simulation will give you a common EF (which is most likely not where
> the HOMO of the molecule is). Thus (E-1s - EF) will be different if
> you do a combined system or the molecule alone, even when the molecule
> is so far away that it behaves as a free molecule.

I'm actually hoping to use core electron binding energy of atoms far
from the surface (in both the bulk and the molecule) as a reference to
check the core electron binding energy shifts of atoms directly at the
surface, but dunno how this will work in reality.

Thanks for all the feedback on the list (and in off-the-list emails
I've received as well)
Pavel



[Wien] core-hole calculation in a molecule

2019-06-19 Thread Pavel Ondračka
Dear Wien2k mailing list,

I'm trying to calculate core electron binding energies using
Slater's transition state approach (half an electron removed from the
core, compensated by a background charge) in an organic molecule.
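
For readers unfamiliar with the method, here is a short sketch of why half an electron is removed. It assumes Janak's theorem, $\partial E_{\mathrm{tot}}/\partial n_i = \varepsilon_i$, and a midpoint approximation of the integral; the symbols $n_i$ (occupation of the core state), $\varepsilon_i$ (its eigenvalue) and the use of $E_F$ as the energy zero are notation introduced here, not taken from the thread.

```latex
\begin{align}
E_{\mathrm{tot}}(N) - E_{\mathrm{tot}}(N-1)
  &= \int_{0}^{1} \varepsilon_i(n_i)\,\mathrm{d}n_i
   \approx \varepsilon_i\!\left(n_i = \tfrac{1}{2}\right), \\
E_b &\approx E_F - \varepsilon_i\!\left(n_i = \tfrac{1}{2}\right).
\end{align}
```

In other words, a single SCF run with the core state half occupied gives the binding energy directly from the core eigenvalue, instead of from the difference of two total energies as in Δ-SCF.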

As part of the usual convergence checking I did four calculations with
different amounts of vacuum (5 Å, 10 Å, 15 Å and 20 Å in all directions)
in order to get a trend and try to extrapolate the final values.
This is similar to the approach I use for insulators (increasing the
supercell size) to estimate the supercell size error due to the
Coulomb interaction between the periodic images of the charged atom.
However, to my surprise, practically no change in the binding energies
(~0.01 eV) is observed. Thinking about it more, it makes sense though:
there is no screening in the vacuum, so there probably is no reduction
of the interaction (as in the simple electrostatic example where the
electric field intensity next to an infinite charged plane doesn't
depend on the distance to it).

I'm looking for advice on whether someone has already tried something
like this, and whether this kind of calculation (i.e., a core hole for
a molecule, a single atom, or even a 2D material) actually makes sense
from the physical point of view and also within the LAPW framework...
For now I'm comparing the relative shifts of the core electron binding
energies of different carbon atoms within the molecule, and the results
look to be in reasonable agreement with the literature. However, I'm
not sure how much I can trust the results and whether I can also
compare the values with bulk materials.

Any advice would be appreciated
Best regards
Pavel



Re: [Wien] compiling error of lapw1_mpi

2019-06-14 Thread Pavel Ondračka
The first option to fix this is to compile with ELPA (which I would
recommend anyway, as it is much faster).
To fix the compilation without ELPA, you can move the definitions
outside the #ifdef ELPA block, i.e.,
change line number 12 in seclr4.F from:
  NPE, myrowhs, mycolhs,ictxt,ictxtall, NBELW_PARA, &
to
  NPE, myrowhs, mycolhs,ictxt,ictxtall, NBELW_PARA, lcolhs, dlcolhs, &

and line number 14 from
  lcolhs, dlcolhs, elpa_switch, scala_switch, elpa_mem_default
to
  elpa_switch, scala_switch, elpa_mem_default

This should fix the compilation, but I'm not familiar enough with this
part of the code to say how it's intended to work. Hopefully prof. Blaha
can confirm this is the correct solution.

Best regards
Pavel
   

On Fri, 2019-06-14 at 11:31 +0200, Wien2k User wrote:
> Yes, we are compiling lapw1_mpi with just Intel MPI, without ELPA.
> 
> On Fri, 14 June 2019 at 11:21, Pavel Ondračka wrote:
> > I'll make a guess that this is with MPI but without ELPA?
> > 
> > The DLCOLHS and the others are defined inside "#ifdef ELPA" block
> > 
> > #ifdef ELPA
> >    lcolhs, dlcolhs, elpa_switch, scala_switch, elpa_mem_default
> > #else
> > 
> > but used also in "#ifdef parallel" 
> > blocks (specifically the one starting from line 515). There is also
> > error on line 712 probably for the same reason, but I got lost in
> > the
> > ifdef magic there so dunno.
> > 
> > Pavel
> > 
> > 
> > On Fri, 2019-06-14 at 09:36 +0200, Peter Blaha wrote:
> > > Sorry, but I cannot reproduce this.
> > > 
> > > On 6/14/19 1:10 AM, Wien2k User wrote:
> > > > Dear Prof. P. Blaha
> > > > 
> > > > I got this error when compiling lapw1_mpi with  mpiifort intel
> > > > cluster 
> > > > edition 2018
> > > > 
> > > > seclr4_tmp_.F(520): error #6404: This name does not have a type,
> > > > and must have an explicit type.   [DLCOLHS]
> > > > allocate(H(DLDHS,DLCOLHS))
> > > > ^
> > > > seclr4_tmp_.F(520): error #6385: The highest data type rank
> > > > permitted is 
> > > > INTEGER(KIND=8).   [DLCOLHS]
> > > > allocate(H(DLDHS,DLCOLHS))
> > > > ^
> > > > seclr4_tmp_.F(524): error #6385: The highest data type rank
> > > > permitted is 
> > > > INTEGER(KIND=8).   [DLCOLHS]
> > > > allocate(Z(DLDHS,DLCOLHS))
> > > > ^
> > > > seclr4_tmp_.F(712): error #6404: This name does not have a type,
> > > > and must have an explicit type.   [LCOLHS]
> > > >  deallocate(Z) ; allocate(Z(LDHS,LCOLHS))
> > > > ^
> > > > seclr4_tmp_.F(712): error #6385: The highest data type rank
> > > > permitted is 
> > > > INTEGER(KIND=8).   [LCOLHS]
> > > >  deallocate(Z) ; allocate(Z(LDHS,LCOLHS))
> > > > 


Re: [Wien] compiling error of lapw1_mpi

2019-06-14 Thread Pavel Ondračka
I'll make a guess that this is with MPI but without ELPA?

The DLCOLHS and the others are defined inside "#ifdef ELPA" block

#ifdef ELPA
   lcolhs, dlcolhs, elpa_switch, scala_switch, elpa_mem_default
#else

but used also in "#ifdef parallel" 
blocks (specifically the one starting from line 515). There is also
error on line 712 probably for the same reason, but I got lost in the
ifdef magic there so dunno.

Pavel
 

On Fri, 2019-06-14 at 09:36 +0200, Peter Blaha wrote:
> Sorry, but I cannot reproduce this.
> 
> On 6/14/19 1:10 AM, Wien2k User wrote:
> > Dear Prof. P. Blaha
> > 
> > I got this error when compiling lapw1_mpi with  mpiifort intel
> > cluster 
> > edition 2018
> > 
> > seclr4_tmp_.F(520): error #6404: This name does not have a type,
> > and 
> > must have an explicit type.   [DLCOLHS]
> > allocate(H(DLDHS,DLCOLHS))
> > ^
> > seclr4_tmp_.F(520): error #6385: The highest data type rank
> > permitted is 
> > INTEGER(KIND=8).   [DLCOLHS]
> > allocate(H(DLDHS,DLCOLHS))
> > ^
> > seclr4_tmp_.F(524): error #6385: The highest data type rank
> > permitted is 
> > INTEGER(KIND=8).   [DLCOLHS]
> > allocate(Z(DLDHS,DLCOLHS))
> > ^
> > seclr4_tmp_.F(712): error #6404: This name does not have a type,
> > and 
> > must have an explicit type.   [LCOLHS]
> >  deallocate(Z) ; allocate(Z(LDHS,LCOLHS))
> > ^
> > seclr4_tmp_.F(712): error #6385: The highest data type rank
> > permitted is 
> > INTEGER(KIND=8).   [LCOLHS]
> >  deallocate(Z) ; allocate(Z(LDHS,LCOLHS))
> > 


Re: [Wien] System configuration

2019-05-23 Thread Pavel Ondračka
Right, as was written in the previous email, the provided config is a
weird mix of ifort and gfortran options. Also, at some point in
siteconfig you chose a parallel build, which now fails.

> SRC_dstart/compile.msg:make: *** [para] Error 2

All the errors I have seen up to now are from building the parallel
MPI programs. It is likely that the serial programs still built fine.
BTW, even after fixing the flags (using, for example, the instructions
in Gavin's email) you will still be missing the MPI libraries, therefore
it will not help much. Unfortunately, I don't know how to disable the
parallel build after it has been enabled (and in general siteconfig is
not very good at completely clearing already set options), so you just
have to ignore the errors for now and hope that the rest is fine (or
clean your Wien2k folder, start from scratch with a fresh gfortran
config, and when it asks you about fine-grained parallelization just
say no).
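
The ifort/gfortran mix can be spotted before compiling. The sketch below only knows the flags that gfortran rejected in the compile.msg quoted further down in this thread (it is not a complete list of ifort options), and the function name ifort_only_flags is made up for illustration.

```shell
# Print every flag in the given option string that gfortran rejected in the
# compile.msg from this thread. Not an exhaustive list of ifort-only flags.
ifort_only_flags() {
    for f in $1; do
        case "$f" in
            -mp1|-prec_div|-pc80|-pad|-ip|-traceback|-assume) echo "$f" ;;
        esac
    done
}

# Example: check the FPOPT line of WIEN2k_OPTIONS (path is an assumption):
#   ifort_only_flags "$(grep '^current:FPOPT:' "$WIENROOT/WIEN2k_OPTIONS" | cut -d: -f3-)"
```

If the function prints anything while the compiler is gfortran, the corresponding option line in siteconfig should be reset to the recommended gfortran options.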

The more important question is: after using the new compile flags I
suggested in an earlier email, together with -lopenblas instead of the
-llapack -lblas flags for the linker (and optionally with the provided
patch), is lapw1 faster?

Best regards
Pavel

On Thu, 2019-05-23 at 20:18 -0600, Gavin Abo wrote:
> The -mp1, -pad, -traceback, and so on look like ifort-specific
> compiler flags.
> If you are using gfortran, compiler flags for gfortran need to be
> used for the Compiling Options in siteconfig.  A good starting
> point is to use the "Recommended options" by siteconfig for
> linuxgfortran, which are shown in the post [1], before you start
> customizing it with your own flags.  For example, gfortran has
> -fbacktrace [2] instead of the -traceback that ifort has [3].
> [1] 
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17903.html
> [2] https://gcc.gnu.org/onlinedocs/gfortran/Option-Summary.html
> [3] 
> https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-traceback
> 
> On 5/23/2019 12:37 PM, Indranil mal wrote:
> > I did the patching but after compiling I am getting the 
> > SRC_dstart/compile.msg:gfortran: error: buffered_io: No such file
> > or directory
> > SRC_dstart/compile.msg:gfortran: error: unrecognized command line
> > option ‘-mp1’
> > SRC_dstart/compile.msg:gfortran: error: unrecognized command line
> > option ‘-prec_div’; did you mean ‘-mrecip’?
> > SRC_dstart/compile.msg:gfortran: error: unrecognized command line
> > option ‘-pc80’; did you mean ‘-mpc80’?
> > SRC_dstart/compile.msg:gfortran: error: unrecognized command line
> > option ‘-pad’
> > SRC_dstart/compile.msg:gfortran: error: unrecognized command line
> > option ‘-ip’; did you mean ‘-p’?
> > SRC_dstart/compile.msg:gfortran: error: unrecognized command line
> > option ‘-traceback’
> > SRC_dstart/compile.msg:gfortran: error: unrecognized command line
> > option ‘-assume’; did you mean ‘-msse’?
> > SRC_dstart/compile.msg:make[1]: *** [module.o] Error 1
> > SRC_dstart/compile.msg:make: *** [para] Error 2
> > ...
> 


Re: [Wien] System configuration

2019-05-23 Thread Pavel Ondračka

I'm putting this back on the list as well, after I received several
private emails.

Your timing and the ldd output show that you are linking against the
reference LAPACK and BLAS. You need to replace -llapack -lblas in R_LIBS
with -lopenblas (this was discussed before in this thread:
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18194.html )
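
The ldd check can be automated. The sketch below classifies ldd output read from stdin; the function name blas_flavor and the output labels are made up for illustration. Note one assumption: on Debian/Ubuntu, libblas.so.3 can itself be an alternatives symlink to an optimized library, so "generic-libblas" only means that the binary asks for the generic name.

```shell
# Classify the BLAS a binary is dynamically linked against, from ldd output
# on stdin. "blas_flavor" and its labels are hypothetical names.
blas_flavor() {
    libs=$(cat)
    if printf '%s\n' "$libs" | grep -q 'libopenblas'; then
        echo "openblas"
    elif printf '%s\n' "$libs" | grep -q 'libmkl'; then
        echo "mkl"
    elif printf '%s\n' "$libs" | grep -q 'libblas\.so'; then
        echo "generic-libblas"
    else
        echo "unknown"
    fi
}

# Example (run in the Wien2k folder):
#   ldd lapw1 | blas_flavor
```

For the ldd output posted below this would print "generic-libblas", which matches the slow DIAG timing.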




Also, your config is a weird mix of ifort and gfortran options, which
results in a ton of errors for the parallel programs (as was shown in
another off-the-list email). At this moment this doesn't matter, as we
need to get the serial stuff working first.

Best regards
Pavel

"

grep "TIME HAMILT" test_case.output1
       TIME HAMILT (CPU)  =    22.8, HNS =    12.3, HORB =     0.0, DIAG =    78.9
       TIME HAMILT (WALL) =    22.9, HNS =    12.4, HORB =     0.0, DIAG =    78.9

"
 

"
current:FOPT:-ffree-form -O2 -ffree-line-length-none
current:FPOPT:-O1 -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
current:LDFLAGS:$(FOPT)
current:DPARALLEL:'-DParallel'
current:R_LIBS:-llapack -lblas -lpthread
current:FFTWROOT:
current:FFTW_VERSION:
current:FFTW_LIB:
current:FFTW_LIBNAME:
current:LIBXCROOT:/opt/etsf/
current:LIBXC_FORTRAN:xcf03
current:LIBXC_LIBNAME:xc
current:LIBXC_LIBDNAME:lib/
current:SCALAPACKROOT:
current:SCALAPACK_LIBNAME:
current:BLACSROOT:
current:BLACS_LIBNAME:
current:ELPAROOT:
current:ELPA_VERSION:
current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
current:CORES_PER_NODE:1
current:MKL_TARGET_ARCH:intel64
current:RP_LIBS:

linux-vdso.so.1 (0x7ffd78bac000)
liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x15344ad82000)
libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x15344ab15000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x15344a8f6000)
libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x15344a517000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x15344a179000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x153449d88000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x153449b7)
/lib64/ld-linux-x86-64.so.2 (0x15344ba2e000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x15344993)




On Thu, May 23, 2019 at 8:52 PM Indranil mal <indranil@gmail.com> wrote:

"

Thanks a lot.

Sir, my other calculations were running while I ran x lapw1; maybe that
is why the time is so long.

I have installed ifort, Intel MPI and MKL, but could not configure them;
that is why I am using gfortran and gcc (the basic GNU compilers) and
OpenBLAS. If you don't mind, you can access my PC through TeamViewer.

On Thu, May 23, 2019 at 7:50 PM Pavel Ondračka <pavel.ondra...@email.cz> wrote:

"Well,

first we need to figure out why your serial lapw1 is so slow...
You definitely don't have the libmvec patches; however, an almost
two-minute runtime suggests that even your BLAS might be bad?
In the test_case folder run:
$ grep "TIME HAMILT" test_case.output1
and post the output. Also please go to the Wien2k folder and send the
output of
$ cat WIEN2k_OPTION
and
$ ldd lapw1

The next Wien2k version will have this simplified; however, for now some
patching needs to be done. The other option would be to get MKL
and ifort from Intel and use them instead...

Anyway if you don't want MKL, you need to download the attached patch
to the SRC_lapw1 folder in Wien2k base folder.
Go to the folder, and apply the patch with (you might need the patch
package for that)
$ patch -p1 < lapw1.patch
then set the FOPT compile flags via siteconfig to:
-ffree-form -O2 -ffree-line-length-none -march=native -ftree-vectorize
-DHAVE_LIBMVEC -fopenmp
and recompile lapw1.
Now when you do again
$ ldd lapw1
it should show a line with "libmvec.so.1 => /lib64/libmvec.so.1"

Compare timings again with the test_case.
Also try:
$ OMP_NUM_THREADS=2 x lapw1
$ OMP_NUM_THREADS=4 x lapw1

And after each run show total timings as well as
$ grep "TIME HAMILT" test_case.output1
Hopefully, you are already linking the multithreaded Openblas (but
dunno what is the Ubuntu default)...

I'll help you with the parallel execution in the next step.

Best regards
Pavel

On Thu, 2019-05-23 at 18:58 +0530, Indranil mal wrote:
> Dear sir
>
> After running x lapw1  I got the following
>
> ~/test_case$ x lapw1
> STOP  LAPW1 END
> 114.577u 0.247s 1:54.82 99.9% 0+0k 0+51864io 0pf+0w
>
> I am using parallel k point execution only 8 GB memory is in use and
> for 100 atom (100 kpoints) calculation it is taking around 12 hours
> to complete one cycle.
> please help me.     
>
> Thanking you
>
> Indranil
>
> On Thu, May 23, 2019 at 11:22 AM Pavel Ondračka <
> pavel.ondra...@email.cz> wrote:
> > Hi Indranil,
> >
> > While the

Re: [Wien] System configuration

2019-05-23 Thread Pavel Ondračka

Hi Indranil,

I'm sending this again, this time also to the list (I hadn't noticed you
removed it), in the hope that it might be useful for someone else
optimizing with gfortran as well...

Pavel

"Well,

first we need to figure out why your serial lapw1 is so slow...
You definitely don't have the libmvec patches; however, an almost
two-minute runtime suggests that even your BLAS might be bad?
In the test_case folder run:
$ grep "TIME HAMILT" test_case.output1
and post the output. Also please go to the Wien2k folder and send the
output of
$ cat WIEN2k_OPTION
and
$ ldd lapw1

The next Wien2k version will have this simplified; however, for now some
patching needs to be done. The other option would be to get MKL
and ifort from Intel and use them instead...

Anyway if you don't want MKL, you need to download the attached patch
to the SRC_lapw1 folder in Wien2k base folder.
Go to the folder, and apply the patch with (you might need the patch
package for that)
$ patch -p1 < lapw1.patch
then set the FOPT compile flags via siteconfig to:
-ffree-form -O2 -ffree-line-length-none -march=native -ftree-vectorize 
-DHAVE_LIBMVEC -fopenmp
and recompile lapw1.
Now when you do again
$ ldd lapw1
it should show a line with "libmvec.so.1 => /lib64/libmvec.so.1"

Compare timings again with the test_case.
Also try:
$ OMP_NUM_THREADS=2 x lapw1
$ OMP_NUM_THREADS=4 x lapw1
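
The same scan can be wrapped in a small loop. This is just a convenience sketch; the function name scan_threads is made up, and it runs whatever command is passed to it under each thread count.

```shell
# Run a command once per thread count, setting OMP_NUM_THREADS each time.
# "scan_threads" is a hypothetical helper; usage: scan_threads "1 2 4" x lapw1
scan_threads() {
    threads="$1"
    shift
    for n in $threads; do
        echo "== OMP_NUM_THREADS=$n =="
        OMP_NUM_THREADS="$n" "$@" || return 1
    done
}
```

Prefixing the call with time, or grepping the output1 file after each run, gives the numbers asked for below.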

And after each run show total timings as well as
$ grep "TIME HAMILT" test_case.output1
Hopefully, you are already linking the multithreaded Openblas (but
dunno what is the Ubuntu default)...
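
To compare runs quickly, the four numbers on each TIME HAMILT line can be added up. The sketch below assumes the line format shown in this thread (TIME HAMILT (CPU) = ..., HNS = ..., HORB = ..., DIAG = ...); the function name sum_lapw1_times is made up for illustration.

```shell
# Sum the HAMILT, HNS, HORB and DIAG times of every "TIME HAMILT" line on stdin.
# "sum_lapw1_times" is a hypothetical helper name.
sum_lapw1_times() {
    grep 'TIME HAMILT' | sed 's/=/ = /g; s/,/ /g' | awk '{
        label = $3                      # "(CPU)" or "(WALL)"
        total = 0
        for (i = 1; i < NF; i++)        # the number follows each "="
            if ($i == "=") total += $(i + 1)
        printf "%s total = %.1f s\n", label, total
    }'
}

# Example:
#   sum_lapw1_times < test_case.output1
```

If DIAG dominates the total, the BLAS/LAPACK library is the first thing to look at; if HAMILT and HNS dominate, the compiler flags and the libmvec patch matter more.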

I'll help you with the parallel execution in the next step.

Best regards
Pavel

On Thu, 2019-05-23 at 18:58 +0530, Indranil mal wrote:
> Dear sir
>
> After running x lapw1 I got the following
>
> ~/test_case$ x lapw1
> STOP LAPW1 END
> 114.577u 0.247s 1:54.82 99.9% 0+0k 0+51864io 0pf+0w
>
> I am using parallel k point execution only 8 GB memory is in use and
> for 100 atom (100 kpoints) calculation it is taking around 12 hours
> to complete one cycle.
> please help me.
>
> Thanking you
>
> Indranil
>
> On Thu, May 23, 2019 at 11:22 AM Pavel Ondračka <
> pavel.ondra...@email.cz> wrote:
> > Hi Indranil,
> >
> > While k-point parallelization is usually the most efficient
> > (provided you have a sufficient number of k-points) and does not
> > need any extra libraries, for a 100-atom case it might be
> > problematic to fit 12 processes into 32 GB of memory. I assume you
> > are already using it, since you say you run on two cores?
> >
> > Instead, check the maximum memory requirement of lapw1 when run in
> > serial, and based on that find out how many processes you can run in
> > parallel; then for each of them place one line "1:localhost" into
> > the .machines file (there is no need to copy .machines from
> > templates or use random scripts; instead read the userguide to
> > understand what you are doing, it will save you time in the long
> > run). If you can run at least a few k-points in parallel, it might
> > be enough to speed it up significantly.
> >
> > For MPI you would need the openmpi-devel, scalapack-devel and
> > fftw3-devel packages (I'm not sure exactly how they are named on
> > Ubuntu). The scalapack configuration especially can be tricky; it
> > is probably easiest to start with lapw0, as this needs only MPI and
> > fftw.
> >
> > Also, based on my experience with the default gfortran settings, it
> > is likely that you haven't even optimized the single-core
> > performance. Try to download the serial benchmark
> > http://susi.theochem.tuwien.ac.at/reg_user/benchmark/test_case.tar.gz
> > untar it, run x lapw1 and report the timings (on an average i7 CPU
> > it should take below 30 seconds; if it takes significantly more,
> > you will need some more tweaks).
> >
> > Best regards
> > Pavel
> >
> > On Thu, 2019-05-23 at 10:42 +0530, Dr. K. C. Bhamu wrote:
> > > Hi,
> > >
> > > If you are doing a k-point parallel calculation (with more than 12
> > > k-points in the IBZ), then use the script below in the terminal
> > > where you want to run the calculation, or use it in your job
> > > script with the -p option of run(sp)_lapw (-so).
> > >
> > > If anyone knows how to repeat the nth line m times in a file, then
> > > this script can be changed.
> > >
> > > The script below simply copies the .machines file from the
> > > templates directory and updates it as per your need.
> > > So you do not need to copy it, open it in your favorite editor and
> >

Re: [Wien] System configuration

2019-05-22 Thread Pavel Ondračka
Hi Indranil,

While k-point parallelization is usually the most efficient
(provided you have a sufficient number of k-points) and does not need
any extra libraries, for a 100-atom case it might be problematic to fit
12 processes into 32 GB of memory. I assume you are already using it,
since you say you run on two cores?

Instead, check the maximum memory requirement of lapw1 when run in
serial, and based on that find out how many processes you can run in
parallel; then for each of them place one line "1:localhost" into the
.machines file (there is no need to copy .machines from templates or
use random scripts; instead read the userguide to understand what you
are doing, it will save you time in the long run). If you can run at
least a few k-points in parallel, it might be enough to speed it up
significantly.
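
As a minimal sketch of such a file: the function below writes N "1:localhost" lines for k-point parallel runs on one machine. The function name write_machines is made up, and the trailing granularity:1 and extrafine:1 lines are my assumption of common settings; check the userguide before relying on them.

```shell
# Write a .machines file with N k-point parallel jobs on localhost.
# "write_machines" is a hypothetical helper; usage: write_machines N [file]
write_machines() {
    n="$1"
    out="${2:-.machines}"
    case "$n" in
        ''|*[!0-9]*|0) echo "usage: write_machines N [file]" >&2; return 1 ;;
    esac
    {
        i=0
        while [ "$i" -lt "$n" ]; do
            echo "1:localhost"          # one serial lapw1/lapw2 job per line
            i=$((i + 1))
        done
        echo "granularity:1"            # assumed common setting
        echo "extrafine:1"              # assumed common setting
    } > "$out"
}

# Example: four parallel jobs, then run the SCF cycle with the -p switch
#   write_machines 4 && run_lapw -p
```

N should be chosen from the measured serial memory footprint of lapw1, not simply from the number of cores.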

For MPI you would need the openmpi-devel, scalapack-devel and
fftw3-devel packages (I'm not sure exactly how they are named on
Ubuntu). The scalapack configuration especially can be tricky; it is
probably easiest to start with lapw0, as this needs only MPI and fftw.

Also, based on my experience with the default gfortran settings, it is
likely that you haven't even optimized the single-core performance.
Try to download the serial benchmark
http://susi.theochem.tuwien.ac.at/reg_user/benchmark/test_case.tar.gz
untar it, run x lapw1 and report the timings (on an average i7 CPU it
should take below 30 seconds; if it takes significantly more, you will
need some more tweaks).

Best regards
Pavel

On Thu, 2019-05-23 at 10:42 +0530, Dr. K. C. Bhamu wrote:
> Hi,
> 
> If you are doing a k-point parallel calculation (with more than 12
> k-points in the IBZ), then use the script below in the terminal where
> you want to run the calculation, or use it in your job script with the
> -p option of run(sp)_lapw (-so).
> 
> If anyone knows how to repeat the nth line m times in a file, then
> this script can be changed.
> 
> The script below simply copies the .machines file from the templates
> directory and updates it as per your need.
> So you do not need to copy it, open it in your favorite editor and do
> it manually.
> 
> cp $WIENROOT/SRC_templates/.machines . ; grep localhost .machines |
> perl -ne 'print $_ x 6' > LOCALHOST.dat ; tail -n 2 .machines >
> grang.dat ; sed '22,25d' .machines > MACHINE.dat ; cat MACHINE.dat
> LOCALHOST.dat grang.dat > .machines ; rm LOCALHOST.dat MACHINE.dat
> grang.dat
> 
> regards
> Bhamu
> 
> 
> On Wed, May 22, 2019 at 10:52 PM Indranil mal wrote:
> > Respected sir/users,
> > I am using a PC with an Intel i7 8th gen (with 12 cores), 32 GB RAM
> > and a 2 TB HDD, running Ubuntu 18.04 LTS. I have installed
> > OpenBLAS-0.2.20 and am using the GNU Fortran and C compilers. I am
> > trying to run a system with 100 atoms, but only two cores are used;
> > the rest of them are idle and the calculation takes too long. I have
> > not installed MPI, ScaLAPACK or ELPA. Please advise what I should
> > do to utilize all of the cores of my CPU.
> > 
> > 
> > 
> > Thanking you 
> > 
> > Indranil


Re: [Wien] Wien2k on AVX512 CPUs

2019-02-27 Thread Pavel Ondračka
On Wed, 2019-02-27 at 06:54 -0600, Laurence Marks wrote:
> The script I used is below; it works fine with version 19.0.2.187
> 20190117. You might get warnings/issues with the compilation of their
> test programs; I hacked configure.ac to remove them.
> 
> I suspect the issue with HAMILT is misleading, as it has very little
> MKL. I suggest doing "grep Time case.output1" to look at the
> individual parts.
> 

You are of course right. Sigh, I need to read my mails at least once
more before sending... I was of course thinking about DIAG part
(specifically about the single ZHETRD call). HAMILT and HNS are OK. So at least 
that part I got right.
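
The "grep Time case.output1" suggestion can be taken one step further with a small script. This is only a sketch: it assumes the timing lines look like the "TIME HAMILT (CPU) = ..., HNS = ..., HORB = ..., DIAG = ..." lines quoted elsewhere in this thread, and the function name parse_time_lines is made up for the example. It sums the seconds per part over all matching lines, so the dominant part (here DIAG) is easy to spot.

```python
import re

# Sample lines in the format quoted in this thread (values from the avx512 run).
SAMPLE = """\
 TIME HAMILT (CPU)  = 5.1, HNS = 2.1, HORB = 0.0, DIAG = 15.3
 TIME HAMILT (WALL) = 5.4, HNS = 2.1, HORB = 0.0, DIAG = 15.3
"""

KIND_RE = re.compile(r"TIME\s+HAMILT\s+\((CPU|WALL)\)", re.IGNORECASE)
PART_RE = re.compile(
    r"(HAMILT|HNS|HORB|DIAG)\s*(?:\((?:CPU|WALL)\))?\s*=\s*([0-9.]+)",
    re.IGNORECASE,
)


def parse_time_lines(text):
    """Sum the seconds per lapw1 part, separately for CPU and WALL lines."""
    totals = {}
    for line in text.splitlines():
        kind = KIND_RE.search(line)
        if kind is None:
            continue  # not a timing line
        bucket = totals.setdefault(kind.group(1).upper(), {})
        for name, value in PART_RE.findall(line):
            name = name.upper()
            bucket[name] = bucket.get(name, 0.0) + float(value)
    return totals


if __name__ == "__main__":
    for kind, parts in parse_time_lines(SAMPLE).items():
        print(kind, parts)
```

For a real case one would read the text from case.output1 instead of SAMPLE; with several k-points the totals are sums over all of them.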
> ---
> export FFLAGS="-O2 -pc80 -msse4.2 -fminshared -axCORE-AVX512 -pad -ip
> -fimf-precision=high -prec_div -traceback -no-complex-limited-range
> -no-fast-transcendentals -no-ftz "
> export CFLAGS="-O2 -pc80 -msse4.2 -fminshared -axCORE-AVX512 -ip
> -fimf-precision=high -prec_div -traceback -no-complex-limited-range
> -no-fast-transcendentals -no-ftz "
> 
> 
> 
> 
> export MPICC=mpiicc
> export CC=mpiicc
> export CXX=mpiicc
> export F77=mpiifort
> export F90=mpiifort
> export FC=mpiifort
> export FCFLAGS=$FFLAGS
> export MPIFC=mpiifort
> export MPIF90=mpiifort 
> 
> 
> export SCALAPACK_LDFLAGS=
> export SCALAPACK_FCFLAGS=
> export CFLAGS+="-mkl=cluster"
> export FCFLAGS+="-mkl=cluster"
> export CXXFLAGS=$CFLAGS
> 
> 
> ./configure --prefix=/opt/elpaRC1 --disable-shared --enable-avx512 \
>   --disable-tests --disable-legacy-interface
> make install

Thank you, I'll test this and post results.

Best regards
Pavel



Re: [Wien] Wien2k on AVX512 CPUs

2019-02-27 Thread Pavel Ondračka
> > > working with this MKL_ENABLE_INSTRUCTIONS variable:
> > > --avx512
> > >  TIME HAMILT (CPU)  = 5.1, HNS = 2.1, HORB = 0.0, DIAG = 15.3
> > >  TIME HAMILT (WALL) = 5.4, HNS = 2.1, HORB = 0.0, DIAG = 15.3
> > > --avx2
> > >  TIME HAMILT (CPU)  = 5.8, HNS = 2.5, HORB = 0.0, DIAG = 16.3
> > >  TIME HAMILT (WALL) = 6.1, HNS = 2.5, HORB = 0.0, DIAG = 16.3
> > >
> > > However, when using OMP_NUM_THREADS=8, this difference is further
> > > reduced (probably due to memory bounds ?)
> > > --avx512
> > >  TIME HAMILT (CPU)  = 19.9, HNS = 7.7, HORB = 0.0, DIAG = 24.2
> > >  TIME HAMILT (WALL) = 2.6, HNS = 1.0, HORB = 0.0, DIAG = 3.2
> > > --avx2
> > >  TIME HAMILT (CPU)  = 20.0, HNS = 7.4, HORB = 0.0, DIAG = 27.0
> > >  TIME HAMILT (WALL) = 2.6, HNS = 1.0, HORB = 0.0, DIAG = 3.5
> > >
> > > Yes, we have the latest ELPA elpa-2018.11.001 installed (seems to run
> > > without problems and is overall significantly better than the old ELPA),
> > > but it requires a change in the user interface. The next release of
> > > WIEN2k will have two elpa versions supported, a ELPA15 (which is in
> > > WIEN2k_18), and a new ELPA interface for elpa versions later than 2017
> > > (this is somehow like FFTW2 and FFTW3 versions).
> > >
> > > So in essence: with the present code one cannot use ELPA-versions from
> > > 2017 or later.
> > >
> > > On 2/27/19 7:34 AM, Pavel Ondračka wrote:
> > > > Dear mailing list,
> > > >
> > > > just out of curiosity has anyone any experience running Wien2k on a
> > > > AVX512 capable machine (eg. the KNL accelerators or recent Intel
> > > > skylake-avx512 CPUs)?
> > > >
> > > > Recently my cluster updated to this skylake-avx512 machines however I'm
> > > > unable to get any better performance for Wien2k. In particular MKL seem
> > > > to suck, for example in single core performance (with the serial
> > > > test_case) the eigenvalue problem is actually faster when I forbid the
> > > > usage of AVX512 instructions:
> > > >
> > > > running with MKL_VERBOSE=1 MKL_ENABLE_INSTRUCTIONS=AVX2
> > > > MKL_VERBOSE ZHETRD(L,3481,0x2b74d8567cc0,3481,0x2b74d82121c0,0x2b74d8218e88,0x2b74ef769b00,0x2b74ef777490,452530,0) 10.21s CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
> > > >
> > > > with MKL_ENABLE_INSTRUCTIONS=AVX512
> > > > MKL_VERBOSE ZHETRD(L,3481,0x2b5397c96cc0,3481,0x2b53979411c0,0x2b5397947e88,0x2b53aee98b00,0x2b53aeea6490,452530,0) 12.31s CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
> > > >
> > > > This is somewhat compensated by speedups in the hamilt part (the VML
> > > > stuff and various ?GEMMs seem to be actually slightly faster), but
> > > > overall the performance is mostly the same wi

[Wien] Wien2k on AVX512 CPUs

2019-02-26 Thread Pavel Ondračka
Dear mailing list,

just out of curiosity, does anyone have any experience running Wien2k on an
AVX512 capable machine (e.g. the KNL accelerators or recent Intel
skylake-avx512 CPUs)?

Recently my cluster was updated to these skylake-avx512 machines, however I'm
unable to get any better performance for Wien2k. In particular MKL seems
to suck, for example in single core performance (with the serial
test_case) the eigenvalue problem is actually faster when I forbid the
usage of AVX512 instructions:

running with MKL_VERBOSE=1 MKL_ENABLE_INSTRUCTIONS=AVX2
MKL_VERBOSE ZHETRD(L,3481,0x2b74d8567cc0,3481,0x2b74d82121c0,0x2b74d8218e88,0x2b74ef769b00,0x2b74ef777490,452530,0) 10.21s CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1

with MKL_ENABLE_INSTRUCTIONS=AVX512
MKL_VERBOSE ZHETRD(L,3481,0x2b5397c96cc0,3481,0x2b53979411c0,0x2b5397947e88,0x2b53aee98b00,0x2b53aeea6490,452530,0) 12.31s CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1

This is somewhat compensated by speedups in the hamilt part (the VML
stuff and various ?GEMMs seem to be actually slightly faster), but
overall the performance is mostly the same with and without the AVX512
stuff. OpenBLAS is maybe 15% slower, so not an option either...
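
To put a number on the ZHETRD difference above, the two MKL_VERBOSE lines can be compared with a few lines of Python. This is a rough sketch only: it assumes the elapsed time is printed after the closing parenthesis as a number followed by a unit suffix (s, ms, us or ns), as in the two lines quoted here, and parse_mkl_verbose is a made-up helper name.

```python
import re

# Conversion of the unit suffix to seconds (assumed set of suffixes).
UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}

LINE_RE = re.compile(r"MKL_VERBOSE\s+(\w+)\(.*\)\s+([0-9.]+)(ns|us|ms|s)\b")

AVX2_LINE = (
    "MKL_VERBOSE ZHETRD(L,3481,0x2b74d8567cc0,3481,0x2b74d82121c0,"
    "0x2b74d8218e88,0x2b74ef769b00,0x2b74ef777490,452530,0) 10.21s "
    "CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1"
)
AVX512_LINE = (
    "MKL_VERBOSE ZHETRD(L,3481,0x2b5397c96cc0,3481,0x2b53979411c0,"
    "0x2b5397947e88,0x2b53aee98b00,0x2b53aeea6490,452530,0) 12.31s "
    "CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1"
)


def parse_mkl_verbose(line):
    """Return (routine, seconds) for one call line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    routine, value, unit = match.groups()
    return routine, float(value) * UNIT_SECONDS[unit]


if __name__ == "__main__":
    _, t_avx2 = parse_mkl_verbose(AVX2_LINE)
    _, t_avx512 = parse_mkl_verbose(AVX512_LINE)
    slowdown = (t_avx512 / t_avx2 - 1.0) * 100.0
    print(f"ZHETRD is {slowdown:.1f}% slower with AVX512 enabled")
```

For these two calls the AVX512 run comes out roughly a fifth slower than the AVX2 run.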

Moreover, for the MPI version I'm not able to get a correctly working ELPA
compiled with AVX512 support (I went for the latest elpa-2018.11.001
version); it just returns bogus results and diverges after
a few iterations. If someone has this working I'd be really grateful for
a working configure line, and advice on which elpa and which compiler
version this was.

Unfortunately I was not able to get any support from the cluster admins
beyond "We see a 30% per-core performance increase in average"
therefore asking here if anyone has experience with such machines.

Any advice would be appreciated.
Best regards
Pavel



[Wien] expand script name clash with the GNU coreutils utility expand

2019-02-17 Thread Pavel Ondračka
Dear Wien2k mailing list,

this is not a bugreport per se but it might be wise to remove the short
"expand" symlink to the expand_lapw script from the Wien2k package.

There is actually a name clash with the "expand" utility from GNU
coreutils (
https://www.gnu.org/software/coreutils/manual/html_node/expand-invocation.html#expand-invocation
, installed by default on any linux box). Most users are probably
unaffected as it is not that widely used but it has some uses (in my
case I was compiling the latest glibc and the build script uses it for
some text processing).

Best regards
Pavel 



Re: [Wien] N1s binding energy in TiN

2019-01-17 Thread Pavel Ondračka
On Thu, 2019-01-17 at 09:28 +0100, Peter Blaha wrote:
> Sorry for the confusion. The quoted values in the pdf are probably from
> a lousy calculation and are meant only to demonstrate the effect (if I
> remember correctly, only a 4x bigger P supercell, but for sure not
> converged, I also don't remember which functional) and only
> accidentally match experiment.

From my current testing the supercell effect (the core-hole interaction
with its periodic images) is minimal for TiN, e.g. the N1s binding
energy difference between 8 atoms and 216 atoms is only about 0.1eV (at
similar other parameters). The screening of the core hole is very good
here (at least for TiN but probably also in other metals; from my
testing cells around 64 atoms are reasonably converged). This is of course
a completely different story for insulators...

What I saw was that during my testing the difference between the small
8 atom cell with default parameters and the 216 atom cell with well
tweaked numerical parameters was below 0.2eV. Basically the only input
which makes any significant difference is the functional. Which made me
think that maybe I'm missing some secret ingredient. Probably not...
:-)

> My general experience is that core-eigenvalues (taken with respect to
> EF !!) are 10-15% off, while slater TS gives 1-2 %, i.e. an order of
> magnitude better.

Right, the main problem with the core-eigenvalues is not that they are
shifted in absolute values but rather that they show almost no chemical
shifts, which only become visible when the core-hole is added. The 1-2%
is also what I see here and in general is reasonable, I was just
wondering if I could do better.

Best regards
Pavel



Re: [Wien] N1s binding energy in TiN

2019-01-17 Thread Pavel Ondračka
On Thu, 2019-01-17 at 02:46 -0600, Laurence Marks wrote:
> My three cents. I think an agreement of 0.1eV should be considered as
> fortuitous. There are many issues which are glossed over even with
> the miraculous exact functional:
> 
> 1) The Slater method is a very clever use of the mean value theorem
> for an integral. However, it is only 1 value. You can check the
> literature, I remember seeing papers where people use a range of
> holes to more accurately do the integral.

Thanks for the pointer, I'll do some reading. This should not be a
problem for the delta-scf method, right?

I've always stuck with Slater's method up to now since, if I
understand the delta-scf method properly, for insulators you need
to place the electron in the background and E_b is then calculated
as E^tot_finalstate(N-1) - E^tot_initialstate(N) + μ. However I'm not
sure how to include the last part. And if you do just
E^tot_finalstate(N-1) - E^tot_initialstate(N) you get really bad
results. Metals are better since you can place the electron in the valence
band and do  E^tot_initialstate(N) - E^tot_finalstate(N), which gives
in practice almost identical results to Slater's approach.

BTW I did find it interesting that for metals and Slater's
transition state calculations it actually doesn't matter if you place
the extra half electron in the valence band or in the background (0.02eV
difference for TiN).

> 
> 2) The simple dft-based calculations assume that the final states are
> plane waves. Rigorously the exiting photoelectron in XPS is an
> evanescent Bloch wave (for a crystal). There is literature on this,
> but I doubt that it has been combined with DFT.

This is probably beyond my knowledge/interest, though I could check how
large a difference this could cause.

I've actually seen some articles where authors claim to calculate good
absolute binding energies (Ozaki, T., & Lee, C.-C. (2017). Absolute
Binding Energies of Core Levels in Solids from First Principles.
Physical Review Letters, 118(2), 026401. 
https://doi.org/10.1103/PhysRevLett.118.026401) and their way to good
absolute results was the exact coulomb cutoff method and some penalty
functional, but I'm not sure how much this would be applicable to the LAPW
method.

> 3) In experiments you have to worry about photoelectron diffraction,
> and there will be some shifts to higher apparent binding energy due
> to phonon inelastic scattering. And you have to worry about charging
> and band bending for insulators, chemisorption induced work function
> changes (how clean is your XPS?)

The photoelectron diffraction and phonons are probably something I can't
do anything about, except for hoping that they do not affect the
relative shifts. However, I've always thought that the work function,
charging, etc. could be neglected provided you do proper charge
compensation and align the measured spectra properly (either with the
adventitious carbon or preferably with respect to Au); then you should
have just the proper value with respect to the Fermi energy?

What I have found more problematic for semiconductors is that from
normal calculations you don't get the correct position of the Fermi level,
since it's somewhere in the band gap. In those cases I've tried to align
the experimental spectra to the valence band maximum for comparison
with calculations, which sort of works but introduces extra
uncertainties.

Anyway thank you for the thoughts.
Best regards
Pavel Ondračka
> 
> _
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu
> 
> On Thu, Jan 17, 2019, 02:29 Peter Blaha wrote:
> > Sorry for the confusion. The quoted values in the pdf are probably from
> > a lousy calculation and are meant only to demonstrate the effect (if I
> > remember correctly, only a 4x bigger P supercell, but for sure not
> > converged, I also don't remember which functional) and only
> > accidentally match experiment.
> >
> > My general experience is that core-eigenvalues (taken with respect
> > to EF !!) are 10-15% off, while slater TS gives 1-2 %, i.e. an order of
> > magnitude better.
> > 
> > 
> > On 1/17/19 9:15 AM, Pavel Ondračka wrote:
> > > Dear Wien2k mailing list,
> > > 
> > > I'm looking for some advice regarding the calculation of core level
> > > binding energies (to compare with XPS experiments). First of all there
> > > is this nice lecture where prof. Blaha actually shows some calculations
> > > http://susi.theochem.tuwien.ac.at/reg_user/textbooks/WIEN2k_lecture-

[Wien] N1s binding energy in TiN

2019-01-17 Thread Pavel Ondračka
Dear Wien2k mailing list,

I'm looking for some advice regarding the calculation of core level
binding energies (to compare with XPS experiments). First of all there
is this nice lecture where prof. Blaha actually shows some calculations
of core levels with perfect results:
http://susi.theochem.tuwien.ac.at/reg_user/textbooks/WIEN2k_lecture-notes_2011/Blaha_xas_eels.pdf
For example with TiN the deltaSCF method gets 397.1eV for the N1s level as
compared to 397.0eV in experiment. The trouble is that I'm not able to
reproduce this.

I've done some calculations before and I was never really happy with
the absolute values, which were always a few eV off, but I've always
thought this is just a limitation of the xc functional or methodology.
Hence seeing the nice results in the lecture surprised me. However, I'm
not able to reproduce the values even for the metals from the example. For
TiN I'm getting values of 404.8eV with Slater's transition state
approach and 404.6eV with delta-scf (here I'm using the formula for
metals E_b = E^tot_initialstate(N) - E^tot_finalstate(N), i.e. placing
the core-electron in the valence band and with PBE). I thought
that this might be a functional difference, since taking LDA
instead of PBE shifts the results by almost 4eV (to 400.9eV).
However with PBE I get the core energy ε_i as 377.4eV (consistent
with the mentioned pdf where it is 377.5eV) so maybe this is not just
about the functional? I've already checked convergence with supercell size
as well as numerical parameters and I'm actually out of ideas.
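
For orientation, the numbers above can be put side by side as deviations from the 397.0eV experimental value. This is plain arithmetic on the values quoted in this mail, nothing more; the dictionary labels are just descriptive names chosen for the example.

```python
EXPERIMENT_EV = 397.0  # experimental N1s binding energy in TiN quoted above

# Calculated values quoted in this mail (eV).
CALCULATED_EV = {
    "Slater TS (PBE)": 404.8,
    "delta-SCF (PBE)": 404.6,
    "LDA (as quoted)": 400.9,
    "core eigenvalue (PBE)": 377.4,
    "delta-SCF (lecture pdf)": 397.1,
}


def deviations(values, reference):
    """Return {label: (absolute deviation in eV, relative deviation in %)}."""
    return {
        label: (value - reference, (value - reference) / reference * 100.0)
        for label, value in values.items()
    }


if __name__ == "__main__":
    for label, (abs_dev, rel_dev) in deviations(CALCULATED_EV, EXPERIMENT_EV).items():
        print(f"{label:26s} {abs_dev:+6.1f} eV  {rel_dev:+5.2f} %")
```

On these numbers the transition state and delta-scf values sit about 2% above experiment, while the bare core eigenvalue is about 5% below it.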

To be honest, I'm not much concerned personally about the discrepancy
since the chemical shifts seem to be reasonable even if the absolute
values are not. I just think that if it is possible to get the absolute
values right (or at least closer to experiment) as in the lecture pdf,
the results would of course look way better, therefore I'll be grateful
for any comments and help.

Best regards
Pavel






Re: [Wien] non-zero value of epsilon 2 below the energy band gap

2018-12-16 Thread Pavel Ondračka
Dear Anup Pradhan Sakhya,

this is probably just due to the broadening procedure. If you set
broadening to zero in kram, then you should have no absorption below the
band gap (with plain optic and no scissor). Or just take a look at the
output from joint, which should contain the unbroadened imaginary part
of the dielectric tensor (epsilon 2).

In general if you want to compare your spectra against experiment some
broadening is needed for a good agreement (under the assumption that
your DOS is already good and the approximations in the optic package work
for your material). There are processes which cause absorption below
the gap also in the experiment (the Urbach tail, instrumental
broadening, etc.) hence this is nothing to worry about.

The Lorentzian broadening in kram is not really a good model
though, especially regarding the absorption below the band gap, since
it decays quite slowly and hence can produce nonzero absorption way too
far below the gap. I have much better experience with Gaussian
broadening (and it should be theoretically a better match for the
processes which happen in experiment). Unfortunately, kram only
supports Lorentzian broadening.
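
To illustrate how differently the two line shapes decay, here is a small sketch comparing a normalized Lorentzian and a normalized Gaussian of the same FWHM at a few distances from the peak. It only compares the textbook line shapes and says nothing about how kram implements its broadening; the 0.1 eV width is an arbitrary illustrative value.

```python
import math


def lorentzian(x, fwhm):
    """Area-normalized Lorentzian centered at 0 with the given FWHM."""
    gamma = fwhm / 2.0  # half width at half maximum
    return (gamma / math.pi) / (x * x + gamma * gamma)


def gaussian(x, fwhm):
    """Area-normalized Gaussian centered at 0 with the given FWHM."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    fwhm = 0.1  # eV, illustrative value only
    for distance in (0.1, 0.3, 0.5):  # eV away from the peak
        print(
            f"{distance:.1f} eV: Lorentzian {lorentzian(distance, fwhm):.3e}, "
            f"Gaussian {gaussian(distance, fwhm):.3e}"
        )
```

The Lorentzian tail falls off only like 1/x^2, so a few widths away from an absorption edge it is still many orders of magnitude above the Gaussian, which is the origin of the spurious absorption far below the gap.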

Best regards
Pavel 


On Sat, 2018-12-15 at 01:19 +0530, Anup Shakya wrote:
> Dear All,
> 
> I have performed calculations for two double perovskite oxide
> materials and the band gap of the material is found to be more than
> 1.3 eV for both materials. The calculations have been performed
> using GGA+U, since it contains rare earth materials. The value of U
> have been used from the literature. The energy convergence was
> performed till 0.0001 Ry and the optical properties were calculated
> using optic. However, the imaginary part of the dielectric constant
> (epsilon 2) shows non-negligible value below the energy band gap. PBE
> was used as the exchange correlation functional and if I am not wrong
> then for the calculation of epsilon 2 the contribution of inter-band
> transitions are taken into consideration and the intra-bands are
> neglected. Then what is the reason for the non-zero value of epsilon
> 2 below the energy band gap?
>  If anyone could suggest some views, I would be very grateful. Please
> let me know, if anyone needs more information.
> 
> Sincerely,
> Anup Pradhan Sakhya,
> TIFR.




Re: [Wien] "OpenBlas" package instead of default "blas"

2018-11-20 Thread Pavel Ondračka
On Mon, 2018-11-19 at 23:54 +0530, Ashwani Kumar wrote:
> Dear Dr. Pavel Ondracka,
> In previous thread,  
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18098.html
> 
> you advised to use the OpenBLAS package to extract the best performance from
> the processor. Since I was having problems with the wien2k installation, I
> went with Dr. Gavin's set of instructions (for the lapack devel package).
> Now I want to speed up the wien2k execution (even simple oxides take
> much time). Further, I noted that at any time only one thread remains
> 100% busy, the rest of the threads show a load level of 1-5%.
> Configuration of my PC: i7-8700 (6 cores, 12 threads), 8 GB RAM (can
> be upgraded to 16 GB), Fedora 28, graphics card (gtx...)
> 
> I understand that "openBlas" needs to be installed and R_path set to
> -lopenblas. I also want to utilize thread level parallelism if it
> boosts the processor's performance further by a factor of >= 1.5
> times.

Dear A. Kumar,

I don't fully understand your comment about the thread load. Wien2k
does not currently spawn multiple threads (unless you use threaded
blas/lapack). The k-point (or MPI) parallel calculations spawn multiple
processes but those should never be at 1-5% load...

IMO there are likely two problems here:
1) If you are only using one machine and your case has a lot of k-points
(and you are not memory-bound), what you want is k-point parallelism.
This can be done with the .machines file (and the -p switch). If you
are only using a single machine your .machines file should contain a
"1:localhost" line for every processor on your computer (i.e. in your
specific case a reasonable .machines file would have 6 identical lines,
maybe even 12 with hyperthreading, but you need to test your optimal
setup). Please check the userguide for more details about the k-point
parallel execution and the .machines file in general.

2) regarding the openblas: what you need is an openblas devel package.
In the beginning I suggest the serial openblas ("dnf install
openblas-devel") and setting R_LIBS to just "-lopenblas". If you want to
squeeze out more speed (and you are using only a single computer), also add
"-ftree-vectorize -march=native" to your FOPT flags.
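
The single-machine .machines layout from point 1 can be sketched as a tiny generator. This is only an illustration: the function name machines_text is made up, and it writes nothing but the "1:localhost" lines described above (any other lines a .machines file may carry are covered in the userguide and are not handled here).

```python
def machines_text(n_jobs, host="localhost"):
    """Return .machines content with one '1:<host>' line per k-point job."""
    if n_jobs < 1:
        raise ValueError("need at least one parallel job")
    return "".join(f"1:{host}\n" for _ in range(n_jobs))


if __name__ == "__main__":
    # 6 physical cores as on the i7-8700 discussed above; try 12 to test hyperthreading.
    print(machines_text(6), end="")
```

The printed text would go into a file called .machines in the case directory, and the calculation is then started with the -p switch.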

If you really want to go with the threaded openblas I can help you
later but IMO this should not be needed in the beginning (as the
k-point parallelism is the optimal one). You will also need some further
tricks to make lapw1 fast with the libmvec. Either see 
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16159.html
or I can provide some new patches which do the same with OpenMP (but
first get the k-point parallelism and serial openblas working).

Hope this helps

Best regards
Pavel

> 
> Waiting for your expert advice,
> 
> thanks,
> A. Kumar



[Wien] crash in tetra -enefile

2018-11-12 Thread Pavel Ondračka
Dear Wien2k mailing list,
there is a small bug in tetra -enefile which could occasionally result
in a crash like this:

x tetra -enefile
Program received signal SIGSEGV: Segmentation fault - invalid memory
reference.
Backtrace for this error:
Segmentation fault (core dumped)
0.023u 0.389s 0:00.56 71.4% 0+0k 0+216io 0pf+0w

This is an uninitialized variable problem (here reproducible with
gfortran 8.2.1 and with the right star alignment due to the dependence
on random uninitialized memory) 

The crash happens at tetra.f:464
tetra.f:462   if(nnsum_dos.gt.0) then
tetra.f:463 do i=1,nnsum_dos
tetra.f:464   WRITE(6,1176) i,(isumdos(i,i1),i1=1,nnsum_dos_max)
with an out-of-bounds read of isumdos. I don't have any SUM in my int
file, hence the "if(nnsum_dos.gt.0)" should be false, but nnsum_dos is
uninitialized at this point.

valgrind:
==30563== Conditional jump or move depends on uninitialised value(s)
==30563==at 0x40A38E: MAIN__ (tetra.f:462)
==30563==by 0x40B3B3: main (tetra.f:6)
(gdb) print nnsum_dos
$1 = 528

The variable is supposed to be set here:
tetra.f:256  nnsum_dos=0
tetra.f:257  read(5,'(a)',end=91) system
tetra.f:258  if(system(1:3).ne.'SUM') goto 91
tetra.f:259  read(system(5:70),*,ERR=91,END=91) nnsum_dos,nnsum_dos_max

however the entire block is skipped with -enefile due to 
tetra.f:216   if(enefile) goto 200
which jumps to
tetra.f:343  200  CONTINUE

The solution is to zero-initialize the nnsum_dos variable earlier
(before the goto 200 jump or at the file beginning).

While the crash looks scary, it is likely harmless since it crashes
almost at the end where all important data should be written anyway,
reporting nevertheless. 

Best regards
Pavel



Re: [Wien] Help Request for making WIEN2K (ver18.2) programs executable.

2018-10-24 Thread Pavel Ondračka
It would probably be reasonable to add "-ffpe-summary=none" to the
default gfortran flags so as not to scare new users, as this "issue" is being
brought up over and over again (hint to prof. Blaha).




In fact, I tried quite hard to debug a lot of those things and in general to
hunt down the uninitialized stuff, and the remaining ones (at least in the
common codepaths) should be harmless. IIRC the denormals in lapw0 and lapw2
come from something like "exp(-big_number)", which is completely OK, and the
scary IEEE_INVALID_FLAG and IEEE_DIVIDE_BY_ZERO ones in mixer come from the
lapack-netlib/SRC/ieeeck.f function, which intentionally does all the divide by
zero and infinity math to check if it can depend on compliance with the
IEEE 754 standard.




Regarding ifort vs gfortran, this is mostly a matter of personal taste.
gfortran strictly adheres to the Fortran specification and this caused some
issues in the Wien2k code. gfortran is also somewhat slower but this is due
to much more conservative default flags. At -O2 gfortran won't do any SIMD
vectorization or loop unrolling in general, no link-time (interprocedural)
optimizations, and it strictly adheres to IEEE compliance (and does a lot of
other stuff, like properly setting errno after library calls etc., which
ifort does not do with the Wien2k flags). I can maybe someday write a
blogpost on which flags to use to get ifort-like behavior and speed.





Best regards

Pavel







-- Original e-mail --
From: Gavin Abo
To: wien@zeus.theochem.tuwien.ac.at
Date: 24. 10. 2018 4:19:56
Subject: Re: [Wien] Help Request for making WIEN2K (ver18.2) programs
executable.
"Those gfortran warnings have been seen in symmetry [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17396.html
] and dstart [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17389.html
], which if I recall correctly were resolved by the fixes seen on the
updated page [ http://susi.theochem.tuwien.ac.at/reg_user/updates/ ]:

VERSION_18.1: 1.6.2018
SRC_dstart: fix of zamt initialization
SRC_symmetry: setting yvec,zvec=0

From the above, you can see that uninitialized variables in the code
tend to be the cause of those type of warnings.  Apparently, ifort 
either handles them better (or ignores the issue).

It is not surprising and perhaps expected that those warnings might
appear in more programs than just lapw0.  As I mentioned before, WIEN2k
compiled with gfortran is less vetted and less maintained by the
developers [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18018.html
].  I suppose that is one drawback of using that free compiler. Though,
I suppose Intel's recent 2016 or newer ifort compiler standardization
changes breaking some of the WIEN2k code is perhaps not better in some
cases [
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17542.html
].  So, pick your poison.

As mentioned on stackoverflow page at the link in the previous post at
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17385.html ,

I think the statement there describes well the meaning of those warning 
messages:

They can be a "hint about numerical problems in your code, but it is not 
an error per se."

Adding "-ffpe-summary=none" to the compiler settings might suppress that
warning message when compiling with gfortran. However, I don't recommend 
doing that as it might suppress other more important warnings/errors. 
Someday in the future, 'maybe' the WIEN2k code could be reprogrammed to 
remove that warning message (which I think would be the proper way to
remove that message).

In summary, unless you can fix the code yourself to remove the error,
you would have to ignore them and continue on with your calculation
unless it results in something absurdly wrong.

P.S., Pavel has been good at gfortran debugging and resolving those,
which has been quite appreciated.  Though, I believe he is a user (not
developer) less obligated to help fix such problems.

On 10/23/2018 2:27 PM, Ashwani Kumar wrote:
> thank you, Program compiled successfully without any error. I tried to 
> remove manually but still some error occurred, Followed Mr. Gavin's
> method (from previous thread) for LIBXC link to R_LIBS. Done
> successfully. But while running an example of TiC (to check everything 
> is fine), STDOUT file shows warning message (for lapw0 and lapw2) but 
> program executed without error. I checked makefile, makefile.orig (and 
> makefile.orig_14 also for lapw0) and found nothing suspicious.
> **

> in cycle 9    ETEST: .1542   CTEST: .0009143
> STOP  LAPW0 END
> Note: The following floating-point exceptions are signalling:
> IEEE_DENORMAL
> STOP  LAPW1 END
> STOP  LAPW2 END
> Note: The following floating-point exceptions are signalling:
> IEEE_DENORMAL
> STOP  CORE  END
> Note: The following floating-point exceptions are signalling:
> 

Re: [Wien] Fwd: Help Request for making WIEN2K (ver18.2) programs executable.

2018-10-23 Thread Pavel Ondračka

Dear Ashwani,

the problem is that the libxc modules are not installed in /usr/include on
Fedora (and some other distros). This is kinda stupid (but the rationale
is that the mod files are not headers in the standard sense, but rather
binary (compiler and arch dependent) files). They are in
$(LIBDIR)/gfortran/modules/ on Fedora. Unfortunately the siteconfig is not
flexible enough to allow you to specify this directory. Therefore it is
currently not possible to compile with libxc on Fedora without manually
changing the SRC_lapw0 Makefile.





You would do best to remove libxc altogether. But to be honest I don't
know how to do that from siteconfig (it does not allow me to reset
LIBXCROOT to empty). Hence your best chance is to edit the WIEN2k_OPTIONS file
manually and delete all the lines starting with current:LIBXC*  (or
hopefully someone more experienced can advise how to reset LIBXCROOT to
empty from siteconfig) and regenerate the makefiles.





If you really need the libxc, set:

LIBXCROOT =  /usr/
LIBXC_FORTRAN = xcf03
LIBXC_LIBDNAME = lib64
LIBXC_LIBNAME = xc
and manually edit the

"  LIBXC_FOPT = -DLIBXC -I$(LIBXCROOT)include"

line in SRC_lapw0/Makefile to

" LIBXC_FOPT = -DLIBXC -I$(LIBXCROOT)/$(LIBXC_LIBDNAME)/gfortran/modules"


(BTW also check that you have libxc-devel package installed "dnf install 
libxc-devel")
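
If you prefer not to edit the Makefile by hand, the one-line change above can be scripted. This is only a sketch under the assumption that the LIBXC_FOPT line looks exactly as quoted above; patch_libxc_fopt is a made-up helper, not part of WIEN2k, and it works on the Makefile text so it can be tried on a copy first.

```python
OLD_INCLUDE = "-I$(LIBXCROOT)include"
NEW_INCLUDE = "-I$(LIBXCROOT)/$(LIBXC_LIBDNAME)/gfortran/modules"


def patch_libxc_fopt(makefile_text):
    """Point LIBXC_FOPT at the gfortran module directory instead of include."""
    patched = []
    for line in makefile_text.splitlines(keepends=True):
        # Only touch the LIBXC_FOPT definition, leave every other line as it is.
        if line.lstrip().startswith("LIBXC_FOPT") and OLD_INCLUDE in line:
            line = line.replace(OLD_INCLUDE, NEW_INCLUDE)
        patched.append(line)
    return "".join(patched)


if __name__ == "__main__":
    sample = "  LIBXC_FOPT = -DLIBXC -I$(LIBXCROOT)include\n"
    print(patch_libxc_fopt(sample), end="")
```

Running it a second time on already patched text changes nothing, so it is safe to repeat after regenerating the makefiles.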





Best regards

Pavel




BTW it's a pity that siteconfig doesn't use the most common way
of package detection (i.e. the package config files). Nowadays all
packages such as fftw, elpa, OpenBLAS or libxc (with scalapack being the
exception) have proper package configs (at least upstream) and
information like where to find the fortran modules (or other required
compile/link flags) can be found in them.



-- Original e-mail --
From: Ashwani Kumar
To: wien@zeus.theochem.tuwien.ac.at
Date: 23. 10. 2018 6:56:47
Subject: [Wien] Fwd: Help Request for making WIEN2K (ver18.2) programs
executable.
"
Mr. Pavel, I have just noted down your point (and will apply it once I start
using WIEN2K and get more comfortable with the code). SPEED MATTERS A LOT.

Thanks Mr. Gavin. The earlier issue is solved. Now lapw0 and lapw2 are not
executable, which I suspect is due to LIBXC (or maybe not). Your previous
reply indicated not to use LIBXC. I re-installed everything fresh but the
LIBXC setting remains there. Please find the compile errors:


Compiling All Program: **
Compile time errors (if any) were:

SRC_lapw0/compile.msg:Fatal Error: Can't open module file ‘xc_f03_lib_m.mod’ for reading at (1): No such file or directory
SRC_lapw0/compile.msg:make[1]: *** [Makefile:170: inputpars.o] Error 1
SRC_lapw0/compile.msg:make: *** [Makefile:119: seq] Error 2

Check file compile.msg in the corresponding SRC_* directory for the
compilation log and more info on any compilation problem.

**

Compiling lapw0 alone :
**
RC_lapw0 ...
if [ -f .parallel ]; then \
rm -f .parallel modules.o W2kinit.o fft_modules.o reallocate.o energy.o getff1.o getfft.o gtfnam.o lapw0.o outerr.o rean0.o rean3.o rean4.o setff1.o setff2.o setfft.o xcpot1.o xcpot3.o eramps.o *.mod; \
fi
touch .sequential
make ./lapw0 FORT=gfortran FFLAGS=' -ffree-form -O2 -ffree-line-length-none -DLIBXC -I/usr/include '
make[1]: Entering directory '/home/hardy/WIEN2K/SRC_lapw0'
make[1]: Circular pwxad4.o <- pwxad4.o dependency dropped.
gfortran -ffree-form -O2 -ffree-line-length-none -DLIBXC -I/usr/include -c inputpars.F
inputpars.F:6:10:

use xc_f03_lib_m
1
Fatal Error: Can't open module file ‘xc_f03_lib_m.mod’ for reading at (1): No such file or directory
compilation terminated.
make[1]: *** [Makefile:170: inputpars.o] Error 1
make[1]: Leaving directory '/home/hardy/WIEN2K/SRC_lapw0'
make: *** [Makefile:119: seq] Error 2
make: *** No rule to make target 'complex'. Stop.
Copying programs
WARNING: no executable found in SRC_lapw0. Check compile.msg in this directory

done.

Compile time errors (if any) were:

SRC_lapw0/compile.msg:Fatal Error: Can't open module file ‘xc_f03_lib_m.mod’ for reading at (1): No such file or directory
SRC_lapw0/compile.msg:make[1]: *** [Makefile:170: inputpars.o] Error 1
SRC_lapw0/compile.msg:make: *** [Makefile:119: seq] Error 2

**

init_lapw executes successfully, while run_lapw shows this error:

/home/hardy/WIEN2K/lapw0: Command not found.
grep: lapw2*.error: No such file or directory

>   stop error






thanking you,

A. Kumar










[Wien] OpenMP parallelization in lapw0

2018-10-19 Thread Pavel Ondračka
Dear Wien2k mailing list,

there is some ongoing work to use OpenMP in lapw0 to provide another
level of parallelization in addition to the MPI (which parallelizes
over atoms).

The current version should be almost production ready and was already
tested with some standard stuff (LDA,PBE,sp,etc.) by myself and prof.
Blaha. I'm looking for some experienced Wien2k users for further
testing. If you are willing to help, just email me directly and I'll
provide the patches and further instructions.

The OpenMP version allows for efficient parallelization of small cases
where the MPI version is too heavy a hammer (or in general for
single-computer installations if you don't want to mess with MPI). It can
also be quite effective on large clusters in combination with MPI
(when there are more cores than atoms).

Best regards
Pavel

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] $DEC! NOOPTIMIZE equivalents in gfortran?

2018-08-09 Thread Pavel Ondračka
I can look at gfortran; what is your test case?

I tried to take a quick look with the full mixer using one random TiO2
case. I put a breakpoint after some random Kahan sum (specifically this
was at charge.f:150 in Wien2k 18.2) and I looked for the differences
between O0 and O2. I was actually looking for small differences, but
the value of sum was 0 with -O2 vs 739.29 with -O0!

Hence in this case it looks like either the different optimization
levels influence the program flow, or the optimizations caused the
shift of the breakpoint to some other place.  

It might also be possible that this is a gdb problem, since there is a
lot of
** On entry to DHSEQR parameter number  4 had an illegal value
** On entry to DGEBAL parameter number  3 had an illegal value
** On entry to DGEHRD  parameter number  2 had an illegal value
spam, which I have no idea about.

BTW valgrind is also not happy with the mixer (even at -O0 there are a
lot of "Use of uninitialised value ..." and "On entry to DHSEQR parameter
number  4 had an illegal value" messages).

If you can produce a simple testcase, I'd be happy to look into the
Kahan sum problem, but at the moment I can't reproduce with the full
mixer due to the aforementioned problems.

Best regards
Pavel

Laurence Marks wrote on Wed, 08. 08. 2018 at 11:44 -0500:
> I am testing adding the compiler directive !DEC$ NOOPTIMIZE to the
> Kahan summations in charge.f in order to prevent ifort from
> optimizing the summation away. It seems to help.
> 
> Does anyone know if there are equivalents in gfortran or other
> compilers? (I can't find anything for gfortran.)
> 
> N.B., if anyone has experience with directives and wants to suggest
> others that may be faster but will avoid optimizing away the
> summation I am open to suggestions.
> 


Re: [Wien] crash with -it

2018-07-13 Thread Pavel Ondračka
On Wed, 2018-07-11 at 11:21 +0200, Peter Blaha wrote:
> I've been able to reproduce the problem with gfortran, and also made
> a 
> fix, which according to my tests seems to work.
> 
> Please try the attached jacdavblock.F file.
> 
> This fix is not necessary if ifort is used, but should not harm
> either.
> 
> Regards

It works fine now, thanks for the fix.
Best regards
Pavel


Re: [Wien] compilation problems in the new pes module

2018-07-11 Thread Pavel Ondračka
On Wed, 2018-07-11 at 08:41 +0200, Peter Blaha wrote:
> PES is for valence PES.
> Ti 3s and 3p are "core" states (from the chemists point of view).
> 
> You should specify the PDOS as I said before: Use
> 
> configure_int
> 
> total
> 1 s,d
> 2 s,p
> 
> Regards

Thanks, it seems I misunderstood the terminology; in the Wien2k
context I understood "core" states to be the states which are defined in
case.inc.

BTW even with the fixed DOS files, there are still some problems with
gfortran. Besides the few uninitialized variables indicated earlier,
the biggest problems are with quad precision. A lot of the
uninitialized values come from the fact that the following code:

REAL*16,DIMENSION(1:5)  :: temp_data
READ(4,*,IOSTAT=io)(temp_data(n),n=1,5)
where the file contains lines like this:
 100. 42.75000   1.998000  0.142 -1.219E-06
does not work, i.e. the values that are read are completely messed up:
(gdb) print temp_data
$6 = (0, 2.56176902595376107227e+2170, 2.01476099468585859245e+3117,
-5.32047756928494954508e-2034, )

IMO this should work (i.e. this is likely a gfortran problem).
Surprisingly, if I try to isolate this in a simple test case, it works,
so there is some subtle bug going on.

Converting the code to doubles fixes this particular case, but this is
more of a workaround than a fix. I'll report when I have a better idea of
what is going on. Thanks for all the help.

Best regards
Pavel
 
> 
> On 10.07.2018 at 22:48, Pavel Ondračka wrote:
> > -- Original e-mail --
> > From: Peter Blaha
> > To: wien@zeus.theochem.tuwien.ac.at
> > Date: 10. 7. 2018 22:01:12
> > Subject: Re: [Wien] compilation problems in the new pes module
> > 
> > 
> > I guess your case.int (and thus the dos files) is wrong.
> > 
> > 
> > This is definitely possible ;-)
> > 
> > The output says:
> > 
> > Valence orbitals according to periodic table data:
> > Ti4s3d
> > O 2s2p
> > 
> > 
> > What about the Ti 3s and Ti 3p? I can see them in my DOS (around
> > -55 and -32 eV as expected); are they too deep?
> > 
> > 
> > so we need the Ti 4s and 3d PDOS and O 2s,2p PDOS (and the
> > total DOS)
> > 
> > 
> > Thanks for the clarification, but I think I still don't quite get
> > this. According to your comment I need only the total DOS, Ti4s dos,
> > Ti3d dos, O2s dos and O2p dos?
> > 
> > The userguide says "You need to generate the partial DOS for ALL
> > atoms and ALL “chemical angular momenta” (eg. C-s and C-p; or Ti-s
> > and Ti-d) using the tetra program." This is also not very clear, since
> > contrary to your comment it does not speak about the total DOS at all.
> > It's also not very clear to me whether the "generate the partial DOS
> > for ALL atoms" belongs to the rest of the sentence, i.e. whether I also
> > need the O-total dos and Ti-total dos in addition to the O-s, O-p,
> > Ti-s, Ti-d?
> > 
> > 
> > Would you be so kind to share some example case as I believe it
> > might 
> > save some further explanation?
> > 
> > BTW can the [Bagheri and Blaha 2018] manuscript be already
> > accessed 
> > somewhere?
> > 
> > 
> > Best regards
> > 
> > Pavel Ondračka
> > 
> > 
> > 
> > 
> > You always have to define the "chemical valence orbitals", but
> > not all
> > possible PDOS.
> > 
> > 
> > 
> > 
> > 


Re: [Wien] compilation problems in the new pes module

2018-07-10 Thread Pavel Ondračka
-- Original e-mail --
From: Peter Blaha
To: wien@zeus.theochem.tuwien.ac.at
Date: 10. 7. 2018 22:01:12
Subject: Re: [Wien] compilation problems in the new pes module
"I guess your case.int (and thus the dos files) is wrong."



This is definitely possible ;-)

""
 
"The output says:

Valence orbitals according to periodic table data:
Ti4s3d
O 2s2p"



What about the Ti 3s and Ti 3p? I can see them in my DOS (around -55 and
-32 eV as expected); are they too deep?

"
so we need the Ti 4s and 3d PDOS and O 2s,2p PDOS (and the total DOS)"



Thanks for the clarification, but I think I still don't quite get this.
According to your comment I need only the total DOS, Ti4s dos, Ti3d dos, O2s
dos and O2p dos?

The userguide says "You need to generate the partial DOS for ALL atoms and
ALL “chemical angular momenta” (eg. C-s and C-p; or Ti-s and Ti-d) using the
tetra program." This is also not very clear, since contrary to your comment it
does not speak about the total DOS at all. It's also not very clear to me
whether the "generate the partial DOS for ALL atoms" belongs to the rest of the
sentence, i.e. whether I also need the O-total dos and Ti-total dos in addition
to the O-s, O-p, Ti-s, Ti-d?




Would you be so kind to share some example case as I believe it might save
some further explanation?

BTW can the [Bagheri and Blaha 2018] manuscript be already accessed
somewhere?




Best regards

Pavel Ondračka


"

You always have to define the "chemical valence orbitals", but not all
possible PDOS.
"""






Re: [Wien] compilation problems in the new pes module

2018-07-10 Thread Pavel Ondračka
So after applying the suggested compilation fixes (plus the one
uninitialized-variable fix suggested later) I started to test the pes
program. My test case was simple anatase TiO2. The program finishes
fine, however the spectrum is strange (it looks like the Ti3p peak is
almost invisible). So I ran it under valgrind to see if there are some
obvious errors.



==16577== Conditional jump or move depends on uninitialised value(s)
==16577==at 0x40DD43: read_database2_ (read_database2.f:26)
==16577==by 0x402CE3: MAIN__ (pes.f:151)
==16577==by 0x40176C: main (pes.f:3)
==16577==  Uninitialised value was created by a stack allocation
==16577==at 0x40DC5A: read_database2_ (read_database2.f:1)

ios is used uninitialized at 
read_database2.f:26  do while (ios == 0)

---

==22496== Conditional jump or move depends on uninitialised value(s)
==22496==at 0x41054E: abs_smooth (SPLINE.f:170)
==22496==by 0x41054E: setup_ (SPLINE.f:88)
==22496==by 0x41099C: spline_ (SPLINE.f:15)
==22496==by 0x40E791: read_database2_ (read_database2.f:111)
==22496==by 0x402CE3: MAIN__ (pes.f:151)
==22496==by 0x40176C: main (pes.f:3)
==22496==  Uninitialised value was created by a stack allocation
==22496==at 0x410980: spline_ (SPLINE.f:1)

This is the delta_x variable, first defined in SPLINE and passed down to
setup in
SPLINE.f:15   call  setup(delta_x,X,F,N,strt,stp,J,interpolation)
then further down to abs_smooth at
SPLINE.f:88   call abs_smooth(m4 - m3, delta_x, w1)
where it is finally used in
SPLINE.f:170  if (x >= delta_x) then
without initialization.

---

==29640== Conditional jump or move depends on uninitialised value(s)
==29640==at 0x40E3EA: unpolarized (read_database2.f:153)
==29640==by 0x40E3EA: read_database2_ (read_database2.f:79)
==29640==by 0x402CE3: MAIN__ (pes.f:151)
==29640==by 0x40176C: main (pes.f:3)
==29640==  Uninitialised value was created by a stack allocation
==29640==at 0x40DC5A: read_database2_ (read_database2.f:1)

This is the counter variable at
read_database2.f:153  do I=1,counter
Even though it should be initialized at
read_database2.f:40   counter   = 0
that line was not reached at this point: it is guarded by several
ifs, so in some cases the variable can stay
uninitialized. Should the counter ever be 0 at the read_database2.f:153
line?

--

==22325== Conditional jump or move depends on uninitialised value(s)
==22325==at 0x4801F7A0: __multf3 (in /usr/lib64/libgcc_s-8-
20180502.so.1)
==22325==by 0x41251F: unpolarized_ (read_database2.f:155)
==22325==by 0x412E4A: read_database2_ (read_database2.f:79)
==22325==by 0x403414: MAIN__ (pes.f:151)
==22325==by 0x40560A: main (pes.f:3)
==22325==  Uninitialised value was created by a stack allocation
==22325==at 0x4125C9: read_database2_ (read_database2.f:1)

This is the uninitialized beta variable. As with the warning above,
we somehow missed the initialization block starting at
read_database2.f:46 and ending at read_database2.f:56, and there are
more troubles ahead (another 1000+ valgrind warnings).

At this moment I stopped, since I suspect something went wrong earlier
and there is no reason to debug code paths which should probably never
be reached. It is possible that I messed something up in the
initialization or my input files are not in order...


The complete output looks like this:

 
 Please enter the excitation energy (ev):
1486.6
 nat:   2
 mult:   2   4
 aname :Ti O 
 ___
 Valence orbitals according to periodic table data:
 Ti4s3d
 O 2s2p
 ___
 opening   9 100k_7Rk_PBE.dos2ev
 opening  10 100k_7Rk_PBE.dos3ev
 
 Valence Partial orbital found in case.DOS file:
 Ti4s  1   
 Ti3d  1   
 O 2s  2   
 O 2p  2   
 
 Enter database (default: 1)
 1  Total cross section & Asymmetry & Non-dipole parameters 100-1
eV 
(Trzhaskovskaya etal., Unpolarized & linearly polarized X-ray
source)
Recommended option !
 2  Total cross section  10-1500eV 
(Yeh & Lindau 1985; Unpolarized X-ray source)
 3  No cross sections (for testing renormalization) 
1
 
 Enter Calulation Scheme (default: 1)
   1   Unpolarized X-ray source (general)
  
 Linearlly polarized X-ray source
   2   Dipole & NON Dipole - parallel
   3   Dipole  - perpendicular
   4   Dipole & NON Dipole - perpendicular
   5   LDAD
1
 __
Partial OrbitalCross section
Ti4s0.54346D-01
Ti3d0.54346D-01
O 2s0.54346D-01
O 2p0.54346D-01
 
 Continue with q_sphere?(Recommended)(Y/n)
y
Partial OrbitalAverage q_sphere
Ti4s 0.7378D-01

[Wien] crash with -it

2018-07-10 Thread Pavel Ondračka
Dear Wien2k mailing list,

I noticed that the old crash with -it (with gfortran only, of course)
is still present in the 18.1 version:

At line 140 of file jacdavblock_tmp_.F (unit = 200, file =
'./case.storeHinv_1_proc_0')
Fortran runtime error: Sequential READ or WRITE not allowed after EOF
marker, possibly use REWIND or BACKSPACE

It was first reported here:
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17338.html
and some analysis was also provided by Gavin:
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17343.html
but it has probably been forgotten since then. The workaround is
simply to use the -noHinv flag, hence this is not really critical, but
it would be nice to have it fixed in some future release anyway.

Best regards
Pavel



Re: [Wien] supported ELPA versions?

2018-07-10 Thread Pavel Ondračka
On Mon, 2018-07-09 at 14:20 +, Ruh, Thomas wrote:
> Dear Pavel,
> 
> we used the ELPA version 2015.11.001 for implementation. There this
> module is still present - apparently there was a change of the
> interface between this version and the newer ones.
> I will look into getting the newest ELPA working with WIEN2k,
> however, at the moment, I am having problems even compiling the
> newest ELPA version on our system. Did the installation work for you
> without problem?

Dear Thomas,
I had the 2017.05.003 and 2018.05.001 packages already available on my
system, that's why I tried them in the first place. However, a quick
compile of the 2018.05.001 version works fine (I tested only the MPI
version, no OMP or GPU stuff though). What problems do you see?

> I will keep you updated regarding the usability of newer versions,
> but in the meantime you could use version 2015.11.001.

Thanks, 2015.11.001 compiles fine, but now I'm seeing some:

Operating system error: Cannot allocate memory
Allocation would exceed memory limit

which is strange, since this is with the mpi-benchmark, which doesn't
need that much memory in the first place. I'm only starting testing
now, with a few processes on a single node and with >50GB of free memory,
so memory pressure seems unlikely; with scalapack only it uses <10% of
available memory. I checked the ulimit as an obvious suspect, but it is
set to unlimited. I will need to dig into the code to see how much memory
it actually wants...
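
For reference, a minimal sketch of how the relevant limits can be printed with the standard shell built-in. Which of these limits (if any) matters for this particular error is not known from the thread; the variable names are just illustrative.

```shell
#!/bin/sh
# Query the per-process limits of the current shell; processes started
# from this shell inherit them.
vmem_limit=$(ulimit -v)    # virtual memory, in kB or "unlimited"
data_limit=$(ulimit -d)    # data segment size
stack_limit=$(ulimit -s)   # stack size

echo "virtual memory: $vmem_limit"
echo "data segment:   $data_limit"
echo "stack size:     $stack_limit"
```

When jobs run on several nodes, the limits in effect on the remote nodes may differ from those of the login shell, so it can be worth printing them from inside the job itself.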

Best regards
Pavel

> Kind regards,
> Thomas
> ________
> From: Wien on behalf of Pavel Ondračka
> Sent: Monday, 09 July 2018 15:26
> To: wien@zeus.theochem.tuwien.ac.at
> Subject: [Wien] supported ELPA versions?
> 
> Dear Wien2k mailing list,
> 
> what is the recommended ELPA version? I've tried the 2017.05.003 and
> 2018.05.001 versions with no luck (missing
> mod_blacs_infrastructure.mod
> file).
> 
> Best regards
> Pavel


Re: [Wien] compilation problems in the new pes module

2018-07-10 Thread Pavel Ondračka
Thanks for the fixes, the code compiles now. I've prepared a patch so
that other users don't have to patch by hand, and also for Gavin if he
continues the great work of collecting fixes in his repo. Copy to the
SRC_pes folder and apply with patch -p1 < pes-patch.txt
Best regards
Pavel


On Tue, 2018-07-10 at 08:33 +0200, Peter Blaha wrote:
> Thanks for the report. See inlined comments.
> 
> PS: Unfortunately, when I looked into the code, I saw it is in
> terrible 
> shape. It mixes real*4 up to real*16 variables randomly and has a
> couple 
> of unclear things in it (for instance just before calling spline
> 
> Peter Blaha
> 
> > I'm interested in the new pes module. Unfortunately, the
> > compilation of
> > the module faces some problems with gfortran, specifically:
> > 
> > -
> > 
> > pes.f:114:19:
> > 
> >   read (*,'(I)') database
> > 1
> > Error: Nonnegative width required in format string at (1)
> > pes.f:146:21:
> > 
> > read (*,'(i)') scheme
> >   1
> > Error: Nonnegative width required in format string at (1)
> > 
> > - This is nonstandard behavior, looking at the expected values it
> > should be probably I1 in both cases
> 
> 
> Yes I1 is fine.
> 
> > 
> > 
> > pes.f:235:39:
> > 
> >  500 format(A,A16,2x,A16,2x,<7>(A16,2x))
> > 1
> > Error: Unexpected element ‘<’ in format string at (1)
> > pes.f:239:42:
> > 
> >   600 format(f16.8,2x,e16.8,2x,<7>(e16.8,2x))
> >1
> > Error: Unexpected element ‘<’ in format string at (1)
> > ind_p.f:39:26:
> > 
> > 100 format(<15>A1)
> >1
> > Error: Unexpected element ‘<’ in format string at (1)
> > optimize_charge.f:239:21:
> > 
> >   1013 FORMAT(<3>A15)
> >   1
> > Error: Unexpected element ‘<’ in format string at (1)
> > 
> >   - Another nonstandard ifort specific stuff. Since the value is
> > constant the brackets are not needed anyway.
> 
> Yes, the "<" and ">" characters should simply be removed.
> 
> 
> > 
> > 
> > 
> > pes.f:266:22:
> > 
> > 800 format(4x,I)
> >1
> > Error: Non-negative width required in format string at (1)
> > optimize_charge.f:64:25:
> > 
> >1001 FORMAT(3x,A1,I)
> 
> This should be I3
> 
> 
> >   1
> > Error: Nonnegative width required in format string at (1)
> > read_dos.f:41:21:
> > 
> >   301 FORMAT (7x,I)
> 
> This should be I5
> 
> >   1
> > Error: Nonnegative width required in format string at (1)
> > read_dos.f:44:45:
> > 
> >400 format(4x,f10.5,10x,i3,10x,i8,20x,f)
> 
> should be f10.5
> 
> >   1
> > Error: Nonnegative width required in format string at (1)
> > 
> > - No idea here about the required width, but needs to be set too.
> > 
> > 
> > 
> > pes.f:279:26:
> > 
> >if((ERROR.eq.0).AND.(STR.eq.'#')) then
> 
> Yes, of course this should be STTR instead of STR
> 
> >1
> > Error: Operands of comparison operator ‘.eq.’ at (1) are
> > INTEGER(4)/CHARACTER(1)
> > 
> > -It looks like the STR is undefined, probably a typo (did author
> > want
> > STTR in the comparison)?
> > 
> > 
> > 
> > read_dos.f:51:36:
> > 
> > 600 format(f10.5,f14.8)
> 
> Should simply be: 600 format(f10.5,7f14.8)
> 
> >  1
> > Error: Unexpected element ‘<’ in format string at (1)
> > Find_p.f:46:25:
> > 
> >200 format(A1)
> 
> It should be 15A1
> 
> >   1
> > Error: Unexpected element ‘<’ in format string at (1)
> > Find_p.f:50:25:
> > 
> >300 format(A1)
> 
> Also here: 15A1
> 
> >   1
> > Error: Unexpected element ‘<’ in format string at (1)
> > 
> > - Can be rewritten with combination of internal output and string
> > formats.
> > 
> > for example:
> > write(Anumber,200)(temp(l),l=1,k-1)
> > 200 format(A1)
> > 
> > should be equivalent to
> > 
> > character(len=10) :: frmt
> > write(frmt,'("(",I0,"A1)")') j-1
> > write(Anumber,frmt)(temp(l),l=1,k-1)
> > 
> > 
> > 
> > optimize_charge.f:103:9:
> > 
> > IF(PCHECK(j).EQ. .FALSE.)THEN
> >   1
> > Error: Logicals at (1) must be compared with .eqv. instead of .eq.
> > optimize_charge.f:329:12:
> > 
> >   IF (CHECK.EQ..FALSE.) THEN
> >  1
> > Error: Logicals at (1) must be compared with .eqv. instead of .eq.
> > read_database2.f:68:5:
> > 
> >if (data_exist.eq..false.)then
> >   1
> > Error: Logicals at (1) must be compared with .eqv. instead of .eq.
> > 
> > - Use .eqv. as suggested.
> 
> Yes, in all these cases it should be   .eqv.
> 
> > 
> > -
> > 
> > SPLINE.f:15:14:
> > 
> >  call  setup(p0, p1, p2, p3,
> > 

[Wien] supported ELPA versions?

2018-07-09 Thread Pavel Ondračka
Dear Wien2k mailing list,

what is the recommended ELPA version? I've tried the 2017.05.003 and
2018.05.001 versions with no luck (missing mod_blacs_infrastructure.mod
file).

Best regards
Pavel


[Wien] compilation problems in the new pes module

2018-07-09 Thread Pavel Ondračka
Dear Wien2k mailing list,
I'm interested in the new pes module. Unfortunately, the compilation of
the module faces some problems with gfortran, specifically:

-

pes.f:114:19:

 read (*,'(I)') database
   1
Error: Nonnegative width required in format string at (1)
pes.f:146:21:

   read (*,'(i)') scheme
 1
Error: Nonnegative width required in format string at (1)

- This is nonstandard behavior; judging by the expected values, it
should probably be I1 in both cases.



pes.f:235:39:

500 format(A,A16,2x,A16,2x,<7>(A16,2x))
   1
Error: Unexpected element ‘<’ in format string at (1)
pes.f:239:42:

 600 format(f16.8,2x,e16.8,2x,<7>(e16.8,2x))
  1
Error: Unexpected element ‘<’ in format string at (1)
ind_p.f:39:26:

   100 format(<15>A1)
  1
Error: Unexpected element ‘<’ in format string at (1)
optimize_charge.f:239:21:

 1013 FORMAT(<3>A15)
 1
Error: Unexpected element ‘<’ in format string at (1)

 - More nonstandard, ifort-specific syntax. Since the value is
constant, the brackets are not needed anyway.



pes.f:266:22:

   800 format(4x,I)
  1
Error: Non-negative width required in format string at (1)
optimize_charge.f:64:25:

  1001 FORMAT(3x,A1,I)
 1
Error: Nonnegative width required in format string at (1)
read_dos.f:41:21:

 301 FORMAT (7x,I)
 1
Error: Nonnegative width required in format string at (1)
read_dos.f:44:45:

  400 format(4x,f10.5,10x,i3,10x,i8,20x,f)
 1
Error: Nonnegative width required in format string at (1)

- I have no idea what the required widths are here, but they need to be set too.



pes.f:279:26:

  if((ERROR.eq.0).AND.(STR.eq.'#')) then
  1
Error: Operands of comparison operator ‘.eq.’ at (1) are
INTEGER(4)/CHARACTER(1)

- It looks like STR is undefined, probably a typo (did the author want
STTR in the comparison)?



read_dos.f:51:36:

   600 format(f10.5,f14.8)
1
Error: Unexpected element ‘<’ in format string at (1)
Find_p.f:46:25:

  200 format(A1)
 1
Error: Unexpected element ‘<’ in format string at (1)
Find_p.f:50:25:

  300 format(A1)
 1
Error: Unexpected element ‘<’ in format string at (1)

- These can be rewritten with a combination of an internal write and a
format string.

for example:
write(Anumber,200)(temp(l),l=1,k-1)
200 format(A1)

should be equivalent to

character(len=10) :: frmt
write(frmt,'("(",I0,"A1)")') j-1
write(Anumber,frmt)(temp(l),l=1,k-1)



optimize_charge.f:103:9:

   IF(PCHECK(j).EQ. .FALSE.)THEN
 1
Error: Logicals at (1) must be compared with .eqv. instead of .eq.
optimize_charge.f:329:12:

 IF (CHECK.EQ..FALSE.) THEN
1
Error: Logicals at (1) must be compared with .eqv. instead of .eq.
read_database2.f:68:5:

  if (data_exist.eq..false.)then
 1
Error: Logicals at (1) must be compared with .eqv. instead of .eq.

- Use .eqv. as suggested.

-

SPLINE.f:15:14:

call  setup(p0, p1, p2, p3, delta_x,X,F,N,strt,stp,J,interpolation)
  1
Error: Explicit interface required for ‘setup’ at (1): allocatable
argument

- No idea here :-(

-

read_int.f:18:25:

  read(22,100),ndos
 1
Warning: Legacy Extension: Comma before i/o item list at (1)

Find_p.f:66:65:

   write(output_names(output_counter),500)
,aname(m),composition(m,n),m
 1
Warning: Legacy Extension: Comma before i/o item list at (1)

- Some unrelated, harmless, easy-to-fix warnings.
-

Most of the fixes are probably obvious except the missing length for
the read formats, where the proper fix requires some knowledge about
the input structuring and also the "Explicit interface required" stuff.

Best regards
Pavel



Re: [Wien] IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

2018-05-29 Thread Pavel Ondračka
On Tue, 2018-05-29 at 13:58 +0200, Peter Blaha wrote:
> I'm working on the new distribution, WIEN2k_18, where this variable 
> should be initialized.
> 
> It should come out within the next days ...

Thanks a lot, I'm looking forward to the new release.

Best regards
Pavel
> 
> On 05/29/2018 12:36 PM, Pavel Ondračka wrote:
> > On Wed, 2018-05-09 at 14:17 -0500, Laurence Marks wrote:
> > > You are right, this is a bug (potentially severe). In my local
> > > version I replaced somm with a more standard integration as somm
> > > interpolates to the origin (which is not right).
> > 
> > Will there be a fix for this bug?
> > 
> > Best regards
> > Pavel
> > > 
> > > On Wed, May 9, 2018 at 2:11 PM, Pavel Ondračka wrote:
> > > > -- Original e-mail --
> > > > From: Laurence Marks
> > > > To: A Mailing list for WIEN2k users
> > > > Date: 9. 5. 2018 18:18:21
> > > > Subject: Re: [Wien] IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
> > > > 
> > > > > Zamt is set by somm, which is a slightly inaccurate Simpson
> > > > > summation (the accuracy does not matter). Hence it does not
> > > > > need
> > > > > to be set previously.
> > > > 
> > > > Right, it is indeed set in the somm subroutine. The problem is
> > > > that the DA variable (the local name for the zamt variable in the
> > > > somm subroutine) is set for the first time on line 11, while it is
> > > > first used (read) earlier, on line 10 (and this is the line
> > > > gcc and valgrind complain about)! Hence when the subroutine is
> > > > called for the first time, the result depends on a read from
> > > > uninitialized memory.
> > > > Best regards
> > > > Pavel
> > > >   
> > > > > _
> > > > > Professor Laurence Marks
> > > > > "Research is to see what everybody else has seen, and to
> > > > > think
> > > > > what nobody else has thought", Albert Szent-Gyorgi
> > > > > www.numis.northwestern.edu
> > > > > 
> > > > > On Wed, May 9, 2018, 10:42 AM Pavel Ondračka wrote:
> > > > > > Laurence Marks wrote on Wed, 09. 05. 2018 at 11:51 +:
> > > > > > > This appears to be due to a silly approach in gfortran, and
> > > > > > > almost certainly is not an error/problem and can be ignored --
> > > > > > > see
> > > > > > > https://stackoverflow.com/questions/44308577/ieee-underflow-flag-ieee-denormal-in-fortran-77.
> > > > > > > 
> > > > > > 
> > > > > > While I do agree that in most cases this is harmless, it
> > > > > > can
> > > > > > also
> > > > > > suggest a bug.
> > > > > > 
> > > > > > I only looked at dstart for the TiC and I have this
> > > > > > specific
> > > > > > example:
> > > > > > 
> > > > > > dstart for a TiC case prints a "Note: The following
> > > > > > floating-
> > > > > > point
> > > > > > exceptions are signalling: IEEE_DENORMAL" line. If you trap
> > > > > > this with
> > > > > > -ffpe-trap='denormal' flag, to inspect in gdb, the
> > > > > > offending
> > > > > > line is
> > > > > > then:
> > > > > > Program received signal SIGFPE, Arithmetic exception.
> > > > > > 0x0041dc48 in somm (dr=..., dp=...,
> > > > > > dpas=0.

Re: [Wien] IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

2018-05-29 Thread Pavel Ondračka
On Wed, 2018-05-09 at 14:17 -0500, Laurence Marks wrote:
> You are right, this is a bug (potentially severe). In my local
> version I replaced somm with a more standard integration as somm
> interpolates to the origin (which is not right).

Will there be a fix for this bug?

Best regards
Pavel
> 
> On Wed, May 9, 2018 at 2:11 PM, Pavel Ondračka wrote:
> > -- Original e-mail --
> > From: Laurence Marks
> > To: A Mailing list for WIEN2k users
> > Date: 9. 5. 2018 18:18:21
> > Subject: Re: [Wien] IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
> > 
> > > Zamt is set by somm, which is a slightly inaccurate Simpson
> > > summation (the accuracy does not matter). Hence it does not need
> > > to be set previously.
> > 
> > Right, it is indeed set in the somm subroutine. The problem is that
> > the DA variable (the local name for the zamt variable in the somm
> > subroutine) is set for the first time on line 11, while it is first
> > read earlier, on line 10 (and this is the line that gcc and valgrind
> > complain about)! Hence, when the subroutine is called for the first
> > time, the result depends on a read from uninitialized memory.
> > Best regards
> > Pavel
> >  
> > > _
> > > Professor Laurence Marks
> > > "Research is to see what everybody else has seen, and to think
> > > what nobody else has thought", Albert Szent-Gyorgi
> > > www.numis.northwestern.edu
> > > 
> > > On Wed, May 9, 2018, 10:42 AM Pavel Ondračka wrote:
> > > > Laurence Marks wrote on Wed 09. 05. 2018 at 11:51 +:
> > > > > This appears to be due to a silly approach in gfortran, and
> > > > > almost certainly is not an error/problem and can be ignored --
> > > > > see https://stackoverflow.com/questions/44308577/ieee-underflow-flag-ieee-denormal-in-fortran-77.
> > > > > 
> > > > 
> > > > While I do agree that in most cases this is harmless, it can
> > > > also
> > > > suggest a bug.
> > > > 
> > > > I only looked at dstart for the TiC and I have this specific
> > > > example:
> > > > 
> > > > dstart for a TiC case prints a "Note: The following floating-
> > > > point
> > > > exceptions are signalling: IEEE_DENORMAL" line. If you trap
> > > > this with
> > > > -ffpe-trap='denormal' flag, to inspect in gdb, the offending
> > > > line is
> > > > then:
> > > > Program received signal SIGFPE, Arithmetic exception.
> > > > 0x0041dc48 in somm (dr=..., dp=...,
> > > > dpas=0.013585429144994965,
> > > > da=1.1588924125005173e-310, m=0, np=935) at somm.f:10
> > > > 10            D1=DA+MM
> > > > 
> > > > so this line looks completely harmless and some prints show why
> > > > the
> > > > compiler notes about this:
> > > > 
> > > > (gdb) print DA
> > > > $1 = 1.1588924125005173e-310
> > > > (gdb) print MM
> > > > $2 = 1
> > > > 
> > > > i.e. we just add a really small number (denormal, since the
> > > > smallest normal double is around 2.2e-308) to 1, which is
> > > > completely OK. But let's take a look at where this incredibly
> > > > small value comes from...
> > > > 
> > > > (gdb) up
> > > > #1  0x004143c6 in make_spheres (lcore=.FALSE., luse=7)
> > > > at
> > > > make_spheres.F:81
> > > > 81   call
> > > > somm(rat(1,ia),rhoat(1,ia),dx(ia),zamt,0,nptat(ia))
> > > > 
> > > > So this is the zamt variable. Surprisingly, grepping around for
> > > > zamt finds nothing. As far as I can see, it is not declared or
> > > > initialized anywhere in dstart. If I just missed something, please
> > > > correct me! The same goes for zamt1 and zamt2.
> > > > 
> > > > Note that in this case we get lucky since the random memory
> > > > value i

Re: [Wien] Problems when trying to plot E vs c/a

2018-05-18 Thread Pavel Ondračka
Riyajul Islam wrote on Fri 18. 05. 2018 at 19:25 +0530:
> I also have the same problem with the E vs c/a plot. When I replace
> optimize.pl with your attached one, I get an error:
> Failed to exec /home/dipraj/wien2k/SRC_w2web/htdocs/exec/optimize.pl
> : Permission denied

Dear Riyajul,

the file is probably missing the execute permission; try:
chmod +x /home/dipraj/wien2k/SRC_w2web/htdocs/exec/optimize.pl

Best regards
Pavel
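
For anyone who prefers to check the permission from a script first, here is a minimal Python sketch of what the chmod +x above does. The helper name ensure_executable and the temporary file standing in for optimize.pl are only illustrative; note that a plain chmod +x on the command line also honours the umask, which this sketch does not.

```python
import os
import stat
import tempfile

def ensure_executable(path):
    """Add the execute bit for user, group and others, then report the result."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)

# Throwaway file standing in for optimize.pl (hypothetical stand-in).
with tempfile.NamedTemporaryFile(suffix=".pl", delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o644)                    # readable, but not executable
was_executable = os.access(demo_path, os.X_OK)
now_executable = ensure_executable(demo_path)
os.remove(demo_path)
print(was_executable, now_executable)
```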

> On 17 May 2018 at 17:00, Fecher, Gerhard  wrote:
> > Hallo Peter,
> > thanks for the files.
> > unfortunately, optimize.pl still doesn't show the result of the
> > fit (the plot is there)
> > output is in a shortened version:
> > 
> > Fit of:  E = a1 + a2*x + a3*x^2 + a4*x^3 + a5*x^4
> > a1  1.000 
> > a2  0.000  1.000 
> > a3 -0.725 -0.000  1.000 
> > a4 -0.000 -0.930  0.000  1.000 
> > a5  0.648  0.000 -0.985 -0.000  1.000 
> > 
> > line 174 should use at least   tail -15   (instead of -5),
> > which results in the output of the parameters and the correlation
> > matrix:
> > 
> > Fit of:  E = a1 + a2*x + a3*x^2 + a4*x^3 + a5*x^4
> > Final set of parameters            Asymptotic Standard Error
> > =======================            ==========================
> > a1  = -5573.9                      +/- 3.634e-06    (6.519e-08%)
> > a2  = 4.23124e-06                  +/- 9.205e-06    (217.5%)
> > a3  = 0.000137795                  +/- 2.93e-05     (21.26%)
> > a4  = 7.61902e-06                  +/- 1.037e-05    (136.1%)
> > a5  = -1.43164e-05                 +/- 2.725e-05    (190.3%)
> > 
> > correlation matrix of the fit parameters:
> > a1 a2 a3 a4 a5 
> > a1  1.000 
> > a2  0.000  1.000 
> > a3 -0.725 -0.000  1.000 
> > a4 -0.000 -0.930  0.000  1.000 
> > a5  0.648  0.000 -0.985 -0.000  1.000
> > 
> > or, as a shorter version, use   tail -15 fit.log | head -7
> > because I don't think that the correlation matrix is needed in the
> > w2web output (it's found in fit.log anyway);
> > the result is then only
> >  
> > Fit of:  E = a1 + a2*x + a3*x^2 + a4*x^3 + a5*x^4
> > Final set of parameters            Asymptotic Standard Error
> > =======================            ==========================
> > a1  = -5573.9                      +/- 3.634e-06    (6.519e-08%)
> > a2  = 4.23124e-06                  +/- 9.205e-06    (217.5%)
> > a3  = 0.000137795                  +/- 2.93e-05     (21.26%)
> > a4  = 7.61902e-06                  +/- 1.037e-05    (136.1%)
> > a5  = -1.43164e-05                 +/- 2.725e-05    (190.3%)
> > 
> > the optimize.pl file changed in the latter way is attached
> > 
> > 
> > Ciao
> > Gerhard
> > 
> > DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
> > "I think the problem, to be quite honest with you,
> > is that you have never actually known what the question is."
> > 
> > 
> > Dr. Gerhard H. Fecher
> > Institut of Inorganic and Analytical Chemistry
> > Johannes Gutenberg - University
> > 55099 Mainz
> > and
> > Max Planck Institute for Chemical Physics of Solids
> > 01187 Dresden
> > 
> > From: Wien [wien-boun...@zeus.theochem.tuwien.ac.at] on behalf of
> > Peter Blaha [pbl...@theochem.tuwien.ac.at]
> > Sent: Thursday, 17 May 2018 12:32
> > To: wien@zeus.theochem.tuwien.ac.at
> > Subject: Re: [Wien] Problems when trying to plot E vs c/a
> > 
> > Thanks for the report.
> > 
> > Modified   eplot_lapw
> > and
> > SRC_w2web/htdocs/exec/optimize.pl
> > 
> > attached.
> > 
> > On 05/16/2018 04:20 PM, Fecher, Gerhard wrote:
> > > Dear c/a fitters,
> > > This concerns the latest Wien2k version
> > > I receive only the content of
> > > test_opt.analysis
> > > when I try with w2web to plot E vs c/a
> > > but neither the result of the fit nor the plot are shown,
> > > this seems to be a problem with the present version of the
> > >  eplot
> > > script
> > >
> > > when I use the eplot script of version 14.2 it is nearly ok,
> > however,
> > > there are still two issues: instead of the result of the fit, the
> > "correlation matrix of the fit parameters" is shown
> > > and the figure is missing.
> > > Reason is that eplot and optimize.pl do not work well together:
> > >   optimize.pl
> > > prints the last 5 lines of fit.log (but the result is before
> > these lines) and expects the graph as case.c_over_a.png (but has a
> > different name .coa.)
> > >
> > > this can be solved by changing the two lines
> > > line 169change "CASE.c_over_a.png"
> > >   $umps = qx(cp $DIR/$CASE.coa.png $tempdir/$SID-$$.png);
> > > line 173change "tail -5"
> > >   $OUT .= qx(cd $DIR;echo '  ';echo "Fit of:  E = a1 +
> > a2*x + a3*x^2 + a4*x^3 + a5*x^4";tail -15 fit.log);
> > >
> > > or indeed, by changing eplot (I just did not find fast how to
> > supress the output of the correlation 
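
The effect of the tail -15 fit.log | head -7 pipeline discussed in this thread can be sketched in Python as follows. The log content is a made-up stand-in (5 preamble lines, 7 result lines, 8 trailing lines for the correlation matrix part), chosen only to mirror the line counts visible in the quoted output; it is not a real fit.log.

```python
def tail_then_head(lines, tail_count=15, head_count=7):
    """Emulate `tail -<tail_count> | head -<head_count>` on a list of lines."""
    if tail_count <= 0 or head_count <= 0:
        return []
    return lines[-tail_count:][:head_count]

# Hypothetical stand-in for fit.log.
fake_log = (
    [f"preamble {i}" for i in range(5)]
    + [f"result {i}" for i in range(7)]     # header, underline, a1..a5
    + [f"matrix {i}" for i in range(8)]     # correlation matrix part
)

selected = tail_then_head(fake_log)          # what tail -15 | head -7 keeps
last_five = tail_then_head(fake_log, 5, 5)   # what the old tail -5 keeps
print("\n".join(selected))
```

With this layout the old tail -5 only ever reaches the matrix rows, which matches the symptom described above.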

Re: [Wien] IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

2018-05-09 Thread Pavel Ondračka
-- Original e-mail --
From: Laurence Marks
To: A Mailing list for WIEN2k users
Date: 9. 5. 2018 21:18:20
Subject: Re: [Wien] IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
"
You are right, this is a bug (potentially severe). In my local version I 
replaced somm with a more standard integration as somm interpolates to the
origin (which is not right).

 
"
Thanks for looking into this; hopefully the fix is not too hard. BTW, if I
provide a similar analysis for the other occurrences of floating-point
exceptions and/or valgrind uninitialized read/write errors, would you be
willing to check whether those are harmless or real problems as well?




Best regards

Pavel



Re: [Wien] IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

2018-05-09 Thread Pavel Ondračka
-- Original e-mail --
From: Laurence Marks <l-ma...@northwestern.edu>
To: A Mailing list for WIEN2k users <wien@zeus.theochem.tuwien.ac.at>
Date: 9. 5. 2018 18:18:21
Subject: Re: [Wien] IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
"
Zamt is set by somm, which is a slightly inaccurate Simpson summation (the
accuracy does not matter). Hence it does not need to be set previously.

"



Right, it is indeed set in the somm subroutine. The problem is that the DA
variable (the local name for the zamt variable in the somm subroutine) is set
for the first time on line 11, while it is first read earlier, on line 10
(and this is the line that gcc and valgrind complain about)! Hence, when the
subroutine is called for the first time, the result depends on a read from
uninitialized memory.
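
To illustrate the "set before read" point independently of the actual somm source (which is not reproduced here), below is a generic Python sketch of a composite Simpson summation on a logarithmic radial mesh in which the accumulator is explicitly initialized before it is ever read. The function name simpson_log_mesh, the starting radius r0 and the test integrand are assumptions made for the example; only the step 0.013585429144994965 and the point count 935 are taken from the gdb output in this thread. This is not a replacement for somm and does not reproduce its interpolation to the origin.

```python
import math

def simpson_log_mesh(r, f, h):
    """Composite Simpson sum of f(r) dr on a mesh r[i] = r[0] * exp(i * h)."""
    n = len(r)
    if n < 3 or n % 2 == 0:
        raise ValueError("need an odd number of mesh points (>= 3)")
    total = 0.0                          # accumulator is set before any read
    for i in range(n):
        if i == 0 or i == n - 1:
            weight = 1.0
        elif i % 2 == 1:
            weight = 4.0
        else:
            weight = 2.0
        total += weight * f[i] * r[i]    # dr = r dx on a logarithmic mesh
    return total * h / 3.0

h = 0.013585429144994965                 # dpas from the gdb output
n = 935                                  # np from the gdb output
r0 = 1.0e-5                              # hypothetical first mesh point
r = [r0 * math.exp(i * h) for i in range(n)]
f = [x * x for x in r]                   # test integrand f(r) = r^2
approx = simpson_log_mesh(r, f, h)
exact = (r[-1] ** 3 - r[0] ** 3) / 3.0
print(approx, exact)
```

Because the result is returned rather than accumulated into a caller-supplied variable, the caller cannot accidentally pass in garbage.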

Best regards

Pavel


 
"


_
Professor Laurence Marks
"Research is to see what everybody else has seen, and to think what nobody
else has thought", Albert Szent-Gyorgi
www.numis.northwestern.edu




On Wed, May 9, 2018, 10:42 AM Pavel Ondračka <pavel.ondra...@email.cz>
wrote:

"Laurence Marks wrote on Wed 09. 05. 2018 at 11:51 +:
> This appears to be due to a silly approach in gfortran, and almost
> certainly is not an error/problem and can be ignored -- see
> https://stackoverflow.com/questions/44308577/ieee-underflow-flag-ieee-denormal-in-fortran-77.
>

While I do agree that in most cases this is harmless, it can also
suggest a bug.

I only looked at dstart for the TiC and I have this specific example:

dstart for a TiC case prints a "Note: The following floating-point
exceptions are signalling: IEEE_DENORMAL" line. If you trap this with
-ffpe-trap='denormal' flag, to inspect in gdb, the offending line is
then:
Program received signal SIGFPE, Arithmetic exception.
0x0041dc48 in somm (dr=..., dp=..., dpas=0.013585429144994965,
da=1.1588924125005173e-310, m=0, np=935) at somm.f:10
10            D1=DA+MM

so this line looks completely harmless and some prints show why the
compiler notes about this:

(gdb) print DA
$1 = 1.1588924125005173e-310
(gdb) print MM
$2 = 1

i.e. we just add a really small number (denormal, since the smallest
normal double is around 2.2e-308) to 1, which is completely OK. But let's
take a look at where this incredibly small value comes from...

(gdb) up
#1  0x004143c6 in make_spheres (lcore=.FALSE., luse=7) at
make_spheres.F:81
81               call
 somm(rat(1,ia),rhoat(1,ia),dx(ia),zamt,0,nptat(ia))

So this is the zamt variable. Surprisingly, grepping around for zamt
finds nothing. As far as I can see, it is not declared or initialized
anywhere in dstart. If I just missed something, please correct me! The
same goes for zamt1 and zamt2.

Note that in this case we get lucky since the random memory value is
effectively zero; however, this might in my opinion lead to problems if
you hit random memory with another value.
In fact, running dstart in valgrind shows this as well and the
terminal is spammed with "Conditional jump or move depends on
uninitialised value(s)" and "Use of uninitialised value of size 8".

IMO this is a bug, so either the line needs to be changed to
somm(rat(1,ia),rhoat(1,ia),dx(ia),0,0,nptat(ia))
or the zamt variable needs to be declared and initialized somewhere.
But I actually have no idea about the physical meaning of the code so
please correct me if I just missed something.

Best regards
Pavel

Re: [Wien] IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

2018-05-09 Thread Pavel Ondračka
Laurence Marks wrote on Wed 09. 05. 2018 at 11:51 +:
> This appears to be due to a silly approach in gfortran, and almost
> certainly is not an error/problem and can be ignored -- see
> https://stackoverflow.com/questions/44308577/ieee-underflow-flag-ieee-denormal-in-fortran-77.
> 

While I do agree that in most cases this is harmless, it can also
suggest a bug.

I only looked at dstart for the TiC and I have this specific example:

dstart for a TiC case prints a "Note: The following floating-point
exceptions are signalling: IEEE_DENORMAL" line. If you trap this with
-ffpe-trap='denormal' flag, to inspect in gdb, the offending line is
then:
Program received signal SIGFPE, Arithmetic exception.
0x0041dc48 in somm (dr=..., dp=..., dpas=0.013585429144994965,
da=1.1588924125005173e-310, m=0, np=935) at somm.f:10
10            D1=DA+MM

so this line looks completely harmless and some prints show why the
compiler notes about this:

(gdb) print DA
$1 = 1.1588924125005173e-310
(gdb) print MM
$2 = 1

i.e. we just add a really small number (denormal, since the smallest
normal double is around 2.2e-308) to 1, which is completely OK. But let's
take a look at where this incredibly small value comes from...
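
As a quick cross-check of the numbers above, here is a small Python sketch that classifies the DA value from the gdb session. The smallest normal double is about 2.2e-308 (1.8e308 is the largest finite double), so the value is indeed subnormal, and adding it to 1 changes nothing. The variable names are only illustrative.

```python
import sys

da = 1.1588924125005173e-310    # value of DA printed by gdb
mm = 1                          # value of MM printed by gdb

smallest_normal = sys.float_info.min     # about 2.2250738585072014e-308
is_subnormal = 0.0 < abs(da) < smallest_normal
d1 = da + mm                             # the operation on somm.f line 10

print("smallest normal double:", smallest_normal)
print("DA is subnormal:", is_subnormal)
print("DA + MM == 1.0:", d1 == 1.0)
```

So the arithmetic itself is harmless; the interesting part is only where such a value came from.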

(gdb) up
#1  0x004143c6 in make_spheres (lcore=.FALSE., luse=7) at
make_spheres.F:81
81   call
somm(rat(1,ia),rhoat(1,ia),dx(ia),zamt,0,nptat(ia))

So this is the zamt variable. Surprisingly, grepping around for zamt
finds nothing. As far as I can see, it is not declared or initialized
anywhere in dstart. If I just missed something, please correct me! The
same goes for zamt1 and zamt2.

Note that in this case we get lucky since the random memory value is
effectively zero; however, this might in my opinion lead to problems if
you hit random memory with another value.
In fact, running dstart in valgrind shows this as well and the
terminal is spammed with "Conditional jump or move depends on
uninitialised value(s)" and "Use of uninitialised value of size 8".

IMO this is a bug, so either the line needs to be changed to 
somm(rat(1,ia),rhoat(1,ia),dx(ia),0,0,nptat(ia))
or the zamt variable needs to be declared and initialized somewhere.
But I actually have no idea about the physical meaning of the code so
please correct me if I just missed something.

Best regards
Pavel


Re: [Wien] Installation with MPI and GNU compilers

2018-05-03 Thread Pavel Ondračka
Laurence Marks wrote on Wed 02. 05. 2018 at 21:17 +:
> When you say "as fast" do you mean for single core machines or
> multicore with threads and/or mpi? Almost everything slow in Wien2k
> is lapack/scalapack/elpa. For most parts of the code with 30-200 atom
> problems ifort is good but not as critical as the libraries and
> network.

Yeah, I was mostly talking about normal/high-end PCs, not HPC
clusters, which always have ifort + MKL set up. I think that the
gfortran + MKL / gfortran + OpenBLAS setups are mostly relevant for
people new to Wien2k who just want to try it or run some simple/medium
sized cases (or people like myself who prefer open source software).

Personally, I think it is quite hard for new users to get Wien2k
running properly. In a perfect world you would just tell the user to
install the appropriate packages with apt-get/dnf/whatever install
openblas fftw elpa etc... and have siteconfig autodetect them via the
usual means (pkgconfig).

I have a few ideas for siteconfig improvements and will try to come up
with some patches, but I'll start a new thread for that later.

Best regards
Pavel 

> On Wed, May 2, 2018, 16:05 Pavel Ondračka <pavel.ondra...@email.cz>
> wrote:
> > -- Original e-mail --
> > From: Fecher, Gerhard <fec...@uni-mainz.de>
> > To: Pavel Ondračka <pavel.ondra...@email.cz>,
> > wien@zeus.theochem.tuwien.ac.at <wien@zeus.theochem.tuwien.ac.at>
> > Date: 2. 5. 2018 16:08:06
> > Subject: AW: [Wien] Installation with MPI and GNU compilers
> > > Dear Pavel, 
> > > maybe it's better to ask Laurence, seems he was writing the VML
> > > things. 
> > > 
> > > I didn't look into the code within the last years, what I found
> > > on a fast look is: 
> > > 
> > > The only place where the INTEL_VML is used any longer seems to be
> > > in Hamilt.f of LAPW1 
> > > I found that it is commented in all other cases where it was once
> > > used. 
> > > 
> > > If you don't use INTEL_VML, the INTEL ifort will vectorice the
> > > loops in vectf.f of LAPW1 (see code in Hamilt.f that calls it) 
> > > (as I mentioned, maybe one has to link the libsvml explicitely
> >  
> > 
> >  
> > 
> > BTW is svml part of the MKL or do you need the ifort for that? 
> > > For example 
> > > -O2 -xHost -qopt-report=1 -qopt-report-phase=vec 
> > > will show you which loops were vectorized
> > 
> > Indeed, if I add the -O2 and -xHost to the default Wien2k flags
> > (with ifort and MKL) there is no performance hit if I remove the
> > -DINTEL_VML.
> > 
> > > I could not see that the svml has a reduced accuracy, however,
> > > you can set the performance/accuracy level in the VML. 
> > > What you can do is to set a threshhold for the loop size (similar
> > > to unroll), might need some short study of the manual.
> > 
> > Interesting, I will try to run some tests for the speed and
> > accuracy of some basic trigonometric functions for ifort vs
> > gfortran and standard glibc vs libmvec vs VML vs svml.
> > > I could not see that in W2kinit.F a threshold for the loops (size
> > > of the arrays) was set, 
> > > only the precision was set there for the INTEL_VML script,
> > > however, 
> > > I guess that Laurence used it where only large arrays appeared. 
> > > 
> > > NB: I enjoy more questions about how to increase the speed or how
> > > to improve the code.
> > 
> > Well,  I do believe that the code is well optimized when you have
> > the ifort + MKL, however the rest of the options is a somewhat
> > worse.
> > 
> > Since you can nowadays get the MKL library for free (but not the
> > ifort) there is the combination of gfortran + MKL, which does not
> > have any default config  and is slow as was reported by Rui in
> > beginning of the thread. I'm quite sure this combination can be
> > made almost as fast as the ifort + MKL (either by somewhat fixing
> > the INTEL_VML define to fix the missing ifcore problem, or possibly
> > by using the -mveclibabi=svml gfortran switch or some other trick).
> > I'm not sure how many people have this setup though. 
> > 
> > The most problematic is the gfortran + OpenBLAS combination, where
> > I was not able to force gfortran use the vectorized (SIMD) math. It
> > works with C code (which is why my approach to making lapw1 fast
> > includes porting the vectf.f to C) but not with Fortran. It is
> > possible there is some way to make this work but I had no luck so
> > far. The libmvec has a public interface so it migh

Re: [Wien] Installation with MPI and GNU compilers

2018-05-02 Thread Pavel Ondračka

-- Original e-mail --
From: Fecher, Gerhard <fec...@uni-mainz.de>
To: Pavel Ondračka <pavel.ondra...@email.cz>,
wien@zeus.theochem.tuwien.ac.at <wien@zeus.theochem.tuwien.ac.at>
Date: 2. 5. 2018 16:08:06
Subject: AW: [Wien] Installation with MPI and GNU compilers
"Dear Pavel,
maybe it's better to ask Laurence, seems he was writing the VML things. 

I didn't look into the code within the last years, what I found on a fast 
look is:

The only place where the INTEL_VML is used any longer seems to be in
Hamilt.f of LAPW1
I found that it is commented in all other cases where it was once used.

If you don't use INTEL_VML, the INTEL ifort will vectorize the loops in
vectf.f of LAPW1 (see code in Hamilt.f that calls it)
(as I mentioned, maybe one has to link the libsvml explicitly"

BTW is svml part of the MKL or do you need the ifort for that?

"
For example
-O2 -xHost -qopt-report=1 -qopt-report-phase=vec
will show you which loops were vectorized"



Indeed, if I add the -O2 and -xHost to the default Wien2k flags (with ifort
and MKL) there is no performance hit if I remove the -DINTEL_VML.



"I could not see that the svml has a reduced accuracy, however, you can set
the performance/accuracy level in the VML.
What you can do is to set a threshhold for the loop size (similar to
unroll), might need some short study of the manual. "



Interesting, I will try to run some tests for the speed and accuracy of some
basic trigonometric functions for ifort vs gfortran and standard glibc vs 
libmvec vs VML vs svml.

"
I could not see that in W2kinit.F a threshold for the loops (size of the 
arrays) was set,
only the precision was set there for the INTEL_VML script, however,
I guess that Laurence used it where only large arrays appeared.

NB: I enjoy more questions about how to increase the speed or how to improve
the code. "



Well, I do believe that the code is well optimized when you have ifort
+ MKL; however, the other options are somewhat worse.

Since you can nowadays get the MKL library for free (but not ifort),
there is the combination of gfortran + MKL, which does not have any default
config and is slow, as was reported by Rui at the beginning of the thread. I'm
quite sure this combination can be made almost as fast as ifort + MKL
(either by fixing the INTEL_VML define to avoid the missing ifcore
problem, or possibly by using the -mveclibabi=svml gfortran switch or some
other trick). I'm not sure how many people have this setup though.

The most problematic is the gfortran + OpenBLAS combination, where I was not
able to force gfortran to use the vectorized (SIMD) math. It works with C code
(which is why my approach to making lapw1 fast includes porting vectf.f
to C) but not with Fortran. It is possible there is some way to make this
work, but I have had no luck so far. libmvec has a public interface, so it
might be possible to call it directly, similarly to the VML; however, it would
introduce a lot of #ifdef LIBMVEC to the code, which I guess is not a good
idea. I would like to have this working better out of the box, so I'll keep
looking for a solution which would not require extensive changes in the
code or the siteconfig script. Dunno if the authors are accepting patches
anyway...





Best regards

Pavel


 
"
Ciao
Gerhard

DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."


Dr. Gerhard H. Fecher
Institut of Inorganic and Analytical Chemistry
Johannes Gutenberg - University
55099 Mainz
and
Max Planck Institute for Chemical Physics of Solids
01187 Dresden

From: Pavel Ondračka [pavel.ondra...@email.cz]
Sent: Wednesday, 2 May 2018 12:05
To: Fecher, Gerhard
Subject: Re: [Wien] Installation with MPI and GNU compilers

I'm using private answer since this might be getting too technical for
the list and in fact not interesting for majority of users...

Fecher, Gerhard wrote on Wed 02. 05. 2018 at 09:00 +:
> I never checked that: does the -DINTEL_VML switch correspond to the
> VML library routines of MKL
> or to the
> SVML library routines of the compiler

The lapw1 calls directly the VML library, for example the vdcos, vdsin
functions, but I have not checked the rest of Wien2k.

> this makes a difference, the svml routines are automatically invoked
> by the INTEL compiler if one uses -O2 optimization or higher.
> (check also the usage of the switches -vec, -no-vec, -vec-report)
>
> The VML routines of the MKL make only sense for appropriate sizes of
> the vectors, otherwise, they may even slow down the program (how much 
> might also depend on threads etc.).

The common usage of the VML in Wien2k is to call the VML functions with 
a _larg

Re: [Wien] Installation with MPI and GNU compilers

2018-05-02 Thread Pavel Ondračka
Rui Costa wrote on Mon 30. 04. 2018 at 22:24 +0100:
> I have the VML libraries, i.e., the libmkl_vml_* files are in
> $MKLROOT/lib/intel_64, but when I tried compiling with -DINTEL_VML it
> gave me the error "Fatal Error: Can't open module file ‘ifcore.mod’
> for reading at (1): No such file or directory", and this file only
> comes with the compilers.

Yeah, I had not realized that the INTEL_VML ifdef also guards the use
of the ifcore stuff. IMO this could be improved by using two defines: one
for the actual VML calls (which would be defined when MKL is present)
and one for the ifcore library calls (which would be defined only when
ifort is also detected).

BTW as a quick hack to make the lapw1 fast, just change all the
#if defined (INTEL_VML)
lines in SRC_lapw1/hamilt.F
to 
#if defined (INTEL_VML_HAMILT)
and add the -DINTEL_VML_HAMILT flag
this should be all that is needed to use the VML in lapw1
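
The guard renaming described above can also be scripted. The following Python sketch rewrites only lines of the form "#if defined (INTEL_VML)" and is demonstrated on a made-up snippet, not on the real SRC_lapw1/hamilt.F. The function name rename_guard and the sample Fortran lines are assumptions for illustration; back up the real file before applying anything like this to it.

```python
import re

OLD_GUARD = "INTEL_VML"
NEW_GUARD = "INTEL_VML_HAMILT"

# Matches "#if defined (INTEL_VML)" with flexible spacing, but not
# guards that merely start with INTEL_VML (e.g. INTEL_VML_HAMILT).
GUARD_PATTERN = re.compile(
    r"^(\s*#\s*if\s+defined\s*\(\s*)" + OLD_GUARD + r"(\s*\))",
    re.MULTILINE,
)

def rename_guard(source_text):
    """Return source_text with the preprocessor guard renamed."""
    return GUARD_PATTERN.sub(r"\g<1>" + NEW_GUARD + r"\g<2>", source_text)

# Hypothetical sample; the real file content will differ.
sample = (
    "#if defined (INTEL_VML)\n"
    "      call vdcos(n, x, y)\n"
    "#else\n"
    "      y = cos(x)\n"
    "#endif\n"
)
patched = rename_guard(sample)
print(patched)
```

Running the function twice gives the same result as running it once, so it is safe to re-apply. The -DINTEL_VML_HAMILT flag still has to be added to the compiler options by hand.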

> To use the libmvec library I would have to change a few lines of code
> in the mkl libraries and that is beyond my computer skills.

Actually, no changes to the MKL are required. The least obtrusive way, as
described in
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16159.html
only consists of copying a single C file to SRC_lapw1,
compiling it by hand and then rerunning make to link lapw1 with the new
object file (i.e. no changes to any Wien2k files are needed). However,
the VML way is easier when you already have the MKL set up.

Best regards,
Pavel

> Best regards,
> Rui Costa.
> 
> On 30 April 2018 at 20:57, Pavel Ondračka <pavel.ondra...@email.cz>
> wrote:
> > -- Original e-mail --
> > From: Rui Costa <ruicosta@gmail.com>
> > To: A Mailing list for WIEN2k users
> > <w...@zeus.theochem.tuwien.ac.at>
> > Date: 30. 4. 2018 19:39:44
> > Subject: Re: [Wien] Installation with MPI and GNU compilers
> > 
> > > I was able to install wien2k with gfortran+MKL. Apparently the
> > > MKL libraries are free
> > > [https://software.intel.com/en-us/performance-libraries] but not
> > > the compilers.
> > > While doing the benchmark tests we noticed that during the Hamilt
> > > there was a huge difference between this and an ifort+MKL
> > > compilation, and as Pavel said, this comes from the VML
> > > functions. This is not the case during DIAG because while the
> > > DIAG belongs to MKL, Hamilt is from wien2k. I then tried to
> > > compile with these VML functions but I couldn't because I need an
> > > ifcore.mod file that comes with intel compilers I think, at least
> > > it is not in the free MKL version.
> > > 
> > > Do you have any recommendation about the compilation options that
> > > could better optimize wien2k?
> > 
> > Dear Rui,
> > 
> > so to make this clear, your MKL comes without the VML, or are you
> > just not able to use/link them? I do not understand the part with
> > the ifcore.mod much, however the VML paths are guarded with some
> > ifdef magic, try adding  -DINTEL_VML to your flags (FOPT, FPOPT)
> > and see if it helps. 
> > 
> > The second option is to use the libmvec library (provided you have
> > fairly new glibc) but it is unsupported by the Wien2k team and
> > probably not tested by many people except me. If you cannot get the
> > VML working, look for older emails discussing libmvec or contact me
> > privately and I can give you some pointers. 
> > 
> > No idea about the -it problem though.
> > 
> > Best regards
> > Pavel
> > 


Re: [Wien] Installation with MPI and GNU compilers

2018-04-30 Thread Pavel Ondračka
-- Original e-mail --
From: Rui Costa
To: A Mailing list for WIEN2k users
Date: 30. 4. 2018 19:39:44
Subject: Re: [Wien] Installation with MPI and GNU compilers
"
I was able to install wien2k with gfortran+MKL. Apparently the MKL libraries
are free [https://software.intel.com/en-us/performance-libraries] but not the
compilers.

While doing the benchmark tests we noticed that during the Hamilt there was
a huge difference between this and an ifort+MKL compilation, and as Pavel
said, this comes from the VML functions. This is not the case during DIAG
because while the DIAG belongs to MKL, Hamilt is from wien2k. I then tried
to compile with these VML functions but I couldn't because I need an
ifcore.mod file that comes with intel compilers I think, at least it is not
in the free MKL version.




Do you have any recommendation about the compilation options that could 
better optimize wien2k?



"



Dear Rui,




so to make this clear, your MKL comes without the VML, or are you just not
able to use/link them? I do not understand the part with the ifcore.mod
much; however, the VML paths are guarded with some ifdef magic. Try adding
-DINTEL_VML to your flags (FOPT, FPOPT) and see if it helps.





The second option is to use the libmvec library (provided you have a fairly
new glibc), but it is unsupported by the Wien2k team and probably not tested
by many people except me. If you cannot get the VML working, look for older
emails discussing libmvec or contact me privately and I can give you some
pointers.





No idea about the -it problem though.





Best regards

Pavel


Re: [Wien] Installation with MPI and GNU compilers

2018-04-05 Thread Pavel Ondračka
Laurence Marks wrote on Wed 04. 04. 2018 at 16:01 +:
> I confess to being rather doubtful that gfortran+... is comparable to
> ifort+... for Intel cpu, it might be for AMD. While the mkl vector
> libraries are useful in a few codes such as aim, they are minor for
> the main lapw[0-2].

Well, here is some quick benchmark data then (serial benchmark, single core):
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (haswell)
Wien2k 17.1

-

gfortran 7.3.1 + OpenBLAS 0.2.20 + glibc 2.26 (with the custom patch to use libmvec):

Time for al,bl    (hamilt, cpu/wall) :  0.2 0.2
Time for legendre (hamilt, cpu/wall) :  0.1 0.2
Time for phase    (hamilt, cpu/wall) :  1.2 1.2
Time for us       (hamilt, cpu/wall) :  1.2 1.2
Time for overlaps (hamilt, cpu/wall) :  2.6 2.8
Time for distrib  (hamilt, cpu/wall) :  0.1 0.1
Time sum iouter   (hamilt, cpu/wall) :  5.5 5.8
 number of local orbitals, nlo (hamilt)  304
   allocate YL   2.5 MB  dimensions 15  3481 3
   allocate phsc 0.1 MB  dimensions  3481
Time for los      (hamilt, cpu/wall) :  0.4 0.3
Time for alm      (hns) :  0.1
Time for vector   (hns) :  0.3
Time for vector2  (hns) :  0.3
Time for VxV      (hns) :  2.1
Wall Time for VxV (hns) :  0.1
 245  Eigenvalues computed
 Seclr4(Cholesky complete (CPU)) :          1.380  40754.14 Mflops
 Seclr4(Transform to eig.problem (CPU)) :   4.470  37745.44 Mflops
 Seclr4(Compute eigenvalues (CPU)) :       12.750  17643.13 Mflops
 Seclr4(Backtransform (CPU)) :              0.290  10237.08 Mflops
   TIME HAMILT (CPU)  = 5.8, HNS = 2.5, HORB = 0.0, DIAG = 18.9
   TIME HAMILT (WALL) = 6.1, HNS = 2.5, HORB = 0.0, DIAG = 19.0

real    0m28.610s
user    0m27.817s
sys     0m0.394s

---

ifort 17.0.0 + MKL 2017.0:

Time for al,bl    (hamilt, cpu/wall) :  0.2 0.2
Time for legendre (hamilt, cpu/wall) :  0.1 0.2
Time for phase    (hamilt, cpu/wall) :  1.2 1.3
Time for us       (hamilt, cpu/wall) :  1.0 1.0
Time for overlaps (hamilt, cpu/wall) :  2.6 2.8
Time for distrib  (hamilt, cpu/wall) :  0.1 0.1
Time sum iouter   (hamilt, cpu/wall) :  5.4 5.6
 number of local orbitals, nlo (hamilt)  304
   allocate YL   2.5 MB  dimensions 15  3481 3
   allocate phsc 0.1 MB  dimensions  3481
Time for los      (hamilt, cpu/wall) :  0.2 0.2
Time for alm      (hns) :  0.0
Time for vector   (hns) :  0.4
Time for vector2  (hns) :  0.4
Time for VxV      (hns) :  2.1
Wall Time for VxV (hns) :  0.1
 245  Eigenvalues computed
 Seclr4(Cholesky complete (CPU)) :          1.110  50667.31 Mflops
 Seclr4(Transform to eig.problem (CPU)) :   3.580  47129.09 Mflops
 Seclr4(Compute eigenvalues (CPU)) :       11.320  19873.04 Mflops
 Seclr4(Backtransform (CPU)) :              0.250  11875.01 Mflops
   TIME HAMILT (CPU)  = 5.7, HNS = 2.6, HORB = 0.0, DIAG = 16.3
   TIME HAMILT (WALL) = 5.9, HNS = 2.6, HORB = 0.0, DIAG = 16.3

real    0m25.587s
user    0m24.857s
sys     0m0.321s
-

So I apologize: my statement in the last email was too ambitious. In this
particular case the open-source stack is indeed ~12% slower (28 vs 25
seconds). Most of the difference is in the DIAG part (which I believe is
where OpenBLAS comes into play). However, on some other (older) Intel CPUs
the DIAG part can be even faster with OpenBLAS; see the already mentioned
email by prof. Blaha
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15106.html
where he tested on an i7-3930K (sandybridge). Hence for those older CPUs I
would expect the performance to be really comparable (with the small patch
to utilize libmvec in order to speed up the HAMILT part).
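
For reference, the ~12% figure can be reproduced from the wall-clock ("real") times and the DIAG (WALL) entries of the two runs quoted above. This is only a minimal Python sketch; the variable names are made up for illustration, and the numbers are copied from the logs:

```python
# Wall-clock ("real") and DIAG (WALL) times in seconds, copied from the logs above.
gnu = {"real": 28.610, "diag": 19.0}    # gfortran + OpenBLAS + libmvec
intel = {"real": 25.587, "diag": 16.3}  # ifort + MKL

# Relative slowdown of the open-source stack with respect to ifort + MKL.
slowdown = gnu["real"] / intel["real"] - 1.0

# Absolute difference in total run time and the part of it spent in DIAG.
extra_total = gnu["real"] - intel["real"]
extra_diag = gnu["diag"] - intel["diag"]

print(f"open-source stack slower by {slowdown:.1%}")
print(f"extra time: {extra_total:.2f} s in total, {extra_diag:.1f} s of it in DIAG")
```

The slowdown comes out just under 12%, and nearly all of the roughly 3 s difference is accounted for by DIAG, which is consistent with HAMILT being on par once libmvec is used.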

In general, open-source support for new hardware is usually slow to
materialize, hence the relative performance on older CPUs is better. This
holds especially for OpenBLAS, where the optimizations for new CPUs and
instruction sets are not provided by Intel (contrary to gcc, gfortran and
glibc, where Intel engineers contribute directly), while MKL and ifort have
good support from day 1.

I do agree that it is better to advise users to use MKL+ifort, since
when they have it properly installed, siteconfig is almost always
able to detect and build everything out of the box with the default config.
This is unfortunately not the case with the open-source libraries, where
the detection does not work most of the time due to distro differences and
the unfortunate fact that the majority of the needed libraries do not
provide any good means for autodetection 

Re: [Wien] Installation with MPI and GNU compilers

2018-04-04 Thread Pavel Ondračka
Rui Costa wrote on Wed 04. 04. 2018 at 14:21 +0100:
> I will see what I can do about the Intel compilers. I've had a
> question about this, supposedly the intel compilers are the fastest
> [https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13021.html],
> but how much faster are they than the others? I expect this to vary
> from case to case but on average, how much faster are they?

In fact the compiler (e.g. ifort vs gfortran) hardly makes a difference.
The important part is the algebra libraries. The open-source OpenBLAS
should be almost identical to Intel's MKL (see
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15106.html
for a comparison of OpenBLAS vs MKL). However, in that old benchmark the
open-source stack is still considerably slower, since MKL also provides the
VML library of vectorized math functions, which did not have any open-source
alternative for a long time. Recently the libmvec library, which provides
such functions, has become available (you need a recent glibc), but there is
no official Wien2k support for it. However, it is actually quite easy to get
it working (see
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16159.html ).
Hence if you use gfortran + OpenBLAS + libmvec the performance is
virtually identical to ifort + MKL + VML. The setup is somewhat more
difficult though.
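
As a rough aid for the "recent glibc" requirement, here is a minimal Python sketch that reports the glibc version and whether a libmvec shared library can be found. The 2.22 threshold (to my knowledge the first glibc release that shipped libmvec, on x86_64) and the helper names are assumptions of this sketch; they are not stated in this thread and are not part of Wien2k:

```python
import ctypes.util
import platform

# Assumption of this sketch: libmvec first shipped with glibc 2.22 (x86_64).
MIN_GLIBC = (2, 22)


def glibc_new_enough(version: str) -> bool:
    """Return True if a glibc version string such as '2.26' is >= MIN_GLIBC."""
    parts = tuple(int(part) for part in version.split(".")[:2])
    return parts >= MIN_GLIBC


def report() -> str:
    """Describe whether this machine looks ready for a libmvec build."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return "not a glibc system (or version unknown)"
    found = ctypes.util.find_library("mvec")  # None if libmvec is not found
    status = "new enough" if glibc_new_enough(version) else "too old"
    return f"glibc {version} ({status}), libmvec: {found or 'not found'}"


print(report())
```

This only tells you whether the library is present; the Wien2k source still needs the patch described in the linked email to actually use it.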

Best regards
Pavel

> My objective is not to do simulations with mpi in the computer that
> I'm trying to install but to figure out how to install wien2k with
> mpi and then give some guidelines to the IT technician. I spent two
> weeks telling them that the simulations were not running because the
> packages were not compiled and in the end everything was poorly
> installed.
> 
> Thank you for your help.
> 
> Best regards,
> Rui Costa.
> 
