Hello Gerhard and Peter,

I am using ifx 2025.1.1, and I also read that OpenMP reductions cause a segfault with Intel compilers. They recommend serializing the loops or removing the line that performs the reduction to eliminate the segfault.

https://github.com/flang-compiler/flang/issues/56


I have answered Peter's questions below, inline between his comments.

So can I comment out the reduction? Is it not needed? I already serialized the first cycle by setting omp_lapw0:1. After the first cycle, lapw0 runs smoothly even with 8 OpenMP threads.


Best regards,

Michael


On 08.06.2025 at 10:27, Fecher, Gerhard wrote:
Dear Peter and Michael,
I get the segmentation fault with oneAPI 2024.2 and oneAPI 2025.1;
it already appears with -O1.

As I mentioned some time ago: when I comment out the $omp directives at lines
1649 ff., the program runs smoothly.

It seems that this is an old, unresolved problem, as it is mentioned in a
comment by jdoumont dated 30/7/20
(although it does not seem to depend on the size of the calculation).

Ciao
Gerhard

DEEP THOUGHT in D. Adams, The Hitchhiker's Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."

====================================
Dr. Gerhard H. Fecher
Institute of Physics
Johannes Gutenberg University
55099 Mainz
________________________________________
From: Wien [[email protected]] on behalf of Peter Blaha
[[email protected]]
Sent: Saturday, 7 June 2025 20:40
To: [email protected]
Subject: Re: [Wien] New findings on the lapw0 seg fault core dump error

Very curious.

Is the "number of PW" in case.clmsum identical after init_lapw and after
the first cycle?
The number of PW is 2239 both in the starting case.clmsum and in the case.clmsum after the first cycle.
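The comparison asked for here can be scripted. This is a minimal sketch, assuming the plane-wave count sits on a header line containing the text "NUMBER OF PW" (inferred from the "number of PW" mentioned in this thread, not checked against the WIEN2k source) as the first purely numeric field, and that a copy of the dstart density was saved under the hypothetical name case.clmsum_dstart before the first cycle overwrote case.clmsum:

```shell
#!/bin/sh
# Print the plane-wave count stored in a WIEN2k density file.
# ASSUMPTION: the count is the first purely numeric field on the line
# containing "NUMBER OF PW" (matched case-insensitively).
pw_count() {  # usage: pw_count DENSITY_FILE
  grep -i 'number of pw' "$1" | awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+$/) { print $i; exit }
  }'
}

# Hypothetical usage in the case directory:
#   cp case.clmsum case.clmsum_dstart   # right after init_lapw
#   run_lapw -i 1                       # first SCF cycle
#   echo "dstart: $(pw_count case.clmsum_dstart)  cycle 1: $(pw_count case.clmsum)"
```

If no such header line exists, pw_count prints nothing, which is itself a hint that the assumed file layout does not match.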

Since this is a small case: can you manually look at the
Fourier coefficients in clmsum? Any "huge" numbers? Any *** numbers?
No big numbers, no ****.

After dstart, I guess none of the FK (Fourier coefficients) are zero. After
the mixer (after the 1st iteration), the later ones should be zero.
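The manual inspection of the Fourier coefficients can be roughly automated. The sketch below is hypothetical: scan_fourier is an illustrative helper, it assumes the coefficients follow the "NUMBER OF PW" line in Fortran E format (possibly with no blank between a value and a following negative value), and the threshold of 100 for a "huge" coefficient is an arbitrary choice. It reports overflowed fields (asterisks), large values, and coefficient lines that are entirely zero, which relates to the expectation above about zeros after the mixer:

```shell
#!/bin/sh
# Scan the plane-wave part of a WIEN2k density file.
# ASSUMPTIONS: the Fourier coefficients follow the "NUMBER OF PW" line and
# are in Fortran E format; the default threshold (100) is arbitrary.
# Prints: overflow=<lines with ****> large=<values above threshold>
#         zero=<lines whose values are all zero>
scan_fourier() {  # usage: scan_fourier DENSITY_FILE [THRESHOLD]
  awk -v thr="${2:-100}" '
    BEGIN { thr += 0 }
    tolower($0) ~ /number of pw/ { inpw = 1; next }
    !inpw { next }
    index($0, "*") { stars++; next }     # Fortran writes **** on overflow
    {
      s = $0; n = 0; nz = 0
      # Extract E-format numbers one by one, even when they touch.
      while (match(s, /[-+]?[0-9]*\.[0-9]+[Ee][-+]?[0-9]+/)) {
        v = substr(s, RSTART, RLENGTH) + 0
        s = substr(s, RSTART + RLENGTH)
        n++
        if (v == 0) nz++
        if (v > thr || v < -thr) big++
      }
      if (n > 0 && nz == n) zeros++
    }
    END { printf "overflow=%d large=%d zero=%d\n", stars + 0, big + 0, zeros + 0 }
  ' "$1"
}

# Hypothetical usage: compare the dstart density with the one after cycle 1.
#   scan_fourier case.clmsum_dstart
#   scan_fourier case.clmsum
```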

My guess is a problem in the libthread library of your compiler version
(ifx 2025.xxx?). Did the problem not show up with previous compilers?
I am using ifx 2025.1.1


On 07.06.2025 at 18:18, Michael Fechtelkord via Wien wrote:
Smiles... no, it is MgF2, just two atoms in a cubic cell. And it does not
depend on the structure: it crashes for all of them in the first cycle
using the clmsum from init_lapw.

On 07.06.2025 at 17:34, Peter Blaha wrote:
Is this a big supercell ?

The only thing I can imagine is that the number of PWs is larger
after dstart than after the 1st cycle.
Grep for "PW" in the clmsum files from dstart and after the 1st cycle.
Perhaps reduce the number of PWs until it works, as a temporary fix.
It might be a "stack" problem; I think one can increase the stack size
somehow, but I can't remember how.
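One common way to raise these limits is sketched below. It rests on an assumption not confirmed in this thread: that the crash is a stack overflow, because an OpenMP array reduction typically gives each thread a private copy of the reduced arrays on its stack. The main thread's stack is governed by the shell limit, the worker threads' stacks by the standard OpenMP environment variable OMP_STACKSIZE; the value 512M is only an example. The lines would go into the shell or job script before run_lapw is started:

```shell
#!/bin/sh
# ASSUMPTION: the segfault is a stack overflow caused by the per-thread
# private copies of the reduction arrays.
# Main thread: raise the shell stack limit (to the hard limit if
# "unlimited" is not permitted).
ulimit -s unlimited 2>/dev/null || ulimit -s "$(ulimit -H -s)"
# Worker threads: stack size read by the OpenMP runtime at startup.
# 512M is an arbitrary example value.
export OMP_STACKSIZE=512M
```

If lapw0 then survives the first cycle with several threads, that would support the stack hypothesis; if not, the cause lies elsewhere.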

On 06.06.2025 at 22:25, Michael Fechtelkord via Wien wrote:
And an additional comment:


lapw0 crashes only in the first cycle, and only with OMP_NUM_THREADS
higher than 1. When I set omp_lapw0:1 for the first cycle (using -i 1
in run_lapw) and set it back to omp_lapw0:8 after the first run, the
complete SCF cycle runs without a problem. It seems to be a problem
with the initial case.clmsum file (init_lapw -b -prec 1).
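The workaround described here can be scripted. This is a hedged sketch: set_omp_lapw0 is a hypothetical helper, it assumes .machines contains a line of the form omp_lapw0:N and that GNU sed (for in-place editing with -i) is available, and any other run_lapw options normally used would have to be added:

```shell
#!/bin/sh
# Hypothetical helper: rewrite the "omp_lapw0:N" line of a .machines file.
# ASSUMPTION: GNU sed (in-place editing with -i).
set_omp_lapw0() {  # usage: set_omp_lapw0 MACHINES_FILE THREADS
  sed -i "s/^omp_lapw0:.*/omp_lapw0:$2/" "$1"
}

# Hypothetical sequence in the case directory, after init_lapw:
#   set_omp_lapw0 .machines 1 && run_lapw -i 1   # first cycle, 1 thread
#   set_omp_lapw0 .machines 8 && run_lapw        # remaining cycles
```

Only the omp_lapw0 line is touched, so other settings such as omp_global stay as they are.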


On 06.06.2025 at 22:07, Michael Fechtelkord via Wien wrote:
Hello Peter,


omp_lapw0 in .machines was 8. I reduced it from 8 to 4, then to 2,
and finally to 1. Only with omp_lapw0:1 does lapw0 not crash.

omp_global:2


Best regards,

Michael


On 06.06.2025 at 17:59, Peter Blaha wrote:
What was your OMP_NUM_THREADS variable?

Set it to 1, 2, ... and check if the error occurs again.
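A small loop for this test, as a sketch: try_threads is a hypothetical helper that runs a command with OMP_NUM_THREADS set to 1, 2, 4 and 8 and prints each exit status. It assumes lapw0 is started directly, as in the core dump reported further down in this thread ("lapw0 lapw0.def"), with the lapw0.def left in the case directory by a previous x lapw0 run; each run overwrites lapw0's output files:

```shell
#!/bin/sh
# Hypothetical helper: run a command with 1, 2, 4 and 8 OpenMP threads
# and print the exit status of each run. On Linux, a status of 139
# (128 + 11) means the process was killed by SIGSEGV.
try_threads() {  # usage: try_threads COMMAND [ARGS...]
  for n in 1 2 4 8; do
    OMP_NUM_THREADS=$n "$@" >/dev/null 2>&1
    echo "OMP_NUM_THREADS=$n exit=$?"
  done
}

# Hypothetical usage in the case directory:
#   try_threads "$WIENROOT/lapw0" lapw0.def
```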

On 06.06.2025 at 14:07, Michael Fechtelkord via Wien wrote:
I debugged the core dump with gdb, using a lapw0 compiled with
debugging symbols.

The debugger gave me the line that causes the core dump:

----------------------------------------

Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on'
to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/WIEN2k/lapw0 lapw0.def'.
Program terminated with signal SIGSEGV, Segmentation fault.

#0  0x000000000048b89b in MAIN__.DIR.OMP.PARALLEL.LOOP.12.split63842.split63939 () at lapw0.F:1649
1649    !$omp parallel do reduction(+:rhopw00,cwk,cvout) &


[Current thread is 1 (Thread 0x14823edbe740 (LWP 339344))]

------------------------------------
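For anyone wanting to reproduce this kind of backtrace, a minimal sketch: core_backtrace is a hypothetical helper that prints gdb's backtrace for a core file non-interactively. It assumes lapw0 was rebuilt with debugging symbols (-g) and that core dumps were enabled, e.g. with ulimit -c unlimited, before the crash; the name and location of the core file depend on the system:

```shell
#!/bin/sh
# Hypothetical helper: print the backtrace stored in a core file.
# ASSUMPTIONS: PROGRAM was built with -g, so gdb can map addresses to
# source lines (e.g. "at lapw0.F:1649"); core dumps were enabled with
# "ulimit -c unlimited" in the shell that ran the program.
core_backtrace() {  # usage: core_backtrace PROGRAM COREFILE
  gdb -q -batch -ex 'bt' "$1" "$2"
}

# Hypothetical usage (the core file name varies between systems):
#   core_backtrace /usr/local/WIEN2k/lapw0 core
```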

Maybe somebody has an idea how to fix it.


Best regards

Michael


On 17.05.2025 at 13:48, Michael Fechtelkord via Wien wrote:
Hello everybody,


I have new results concerning the lapw0 crash (segmentation fault,
core dump), which happens only occasionally.

It seems that the crucial factor is the case.clmsum file. (I am no
expert here.) But if this is somehow the key, it might be what
sometimes triggers the lapw0 crash.

I calculated MgF2 and replaced the newly generated clmsum with an
older one, and then there was no crash. I cannot attach the files
because they are too large.


I am not experienced enough in debugging to find out why and where it happens.


Best regards,

Michael


--
Dr. Michael Fechtelkord

Institut für Geologie, Mineralogie und Geophysik
Ruhr-Universität Bochum
Universitätsstr. 150
D-44780 Bochum

Phone: +49 (234) 32-24380
Fax:  +49 (234) 32-04380
Email: [email protected]
Web Page: https://www.ruhr-uni-bochum.de/kristallographie/kc/mitarbeiter/fechtelkord/


_______________________________________________
Wien mailing list
[email protected]
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/[email protected]/index.html
--
-----------------------------------------------------------------------
Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email: [email protected]
WWW:   http://www.imc.tuwien.ac.at      WIEN2k: http://www.wien2k.at
-------------------------------------------------------------------------
