Re: [QE-users] R: Data Parallelism and GPU Support for Quantum Espresso

2023-07-02 Thread Paolo Giannozzi
Apparently the Intel compiler doesn't like the instruction at line 78 of 
PW/src/manypw.f90 (other compilers don't complain), introduced for the 
reason explained in the code just above it. As a quick fix, just replace 
STATUS='DELETE' with STATUS='KEEP'.
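
A minimal sketch of that quick fix as a shell helper, for anyone who prefers not to edit the file by hand. The function name patch_close_status is hypothetical, line 78 is the line number quoted above and may differ in other QE versions, and the rebuild step (make pw from the top of the source tree) is an assumption about how manypw.x is built, so check both before relying on it.

```shell
# Hypothetical helper: replace STATUS='DELETE' with STATUS='KEEP' on one
# line of a Fortran source file, keeping a backup copy as <file>.orig.
patch_close_status() {
    pcs_file=$1
    pcs_line=${2:-78}   # 78 is the line named in this message; verify it first
    cp "$pcs_file" "$pcs_file.orig"
    # Only the given line is touched; other lines are copied unchanged.
    sed "${pcs_line}s/STATUS='DELETE'/STATUS='KEEP'/" "$pcs_file.orig" > "$pcs_file"
}

# Intended use from the top of the QE source tree (assumed, not run here):
#   patch_close_status PW/src/manypw.f90 78
#   make pw
```

If the pattern is not found on that line, the file is left unchanged, so it is worth comparing the file with its .orig backup before recompiling.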


Paolo


On 02/07/2023 09:41, Prashant Govindarajan via users wrote:
Hi! Thanks a lot for the suggestions. I tried *manypw.x*  just with 2 
input files to see if it works. Basically, my directory consists of 
(drive links of text files provided)


 1. Input 1 -- espresso_0.in
 2. Input 2 -- espresso_1.in
 3. Slurm job script -- script *(1 node, 8 tasks per node, 4 cpus per task,
    QE version 7.0)*

The command I run (on Compute Canada) is the following

srun --cpus-per-task=$SLURM_CPUS_PER_TASK manypw.x -ni 2 -i espresso


When I submit the job, Input 2 starts and it immediately stops with the 
error message as described below, and Input 1 never starts.


--
      Program PWSCF v.7.0 starts on  2Jul2023 at  13:13:48

      This program is part of the open-source Quantum ESPRESSO suite
      for quantum simulation of materials; please cite
          "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
          "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
          "P. Giannozzi et al., J. Chem. Phys. 152 154105 (2020);
           URL http://www.quantum-espresso.org",
      in publications or presentations arising from this work. More details at
      http://www.quantum-espresso.org/quote



      Parallel version (MPI & OpenMP), running on      32 processor cores
      Number of MPI processes:                 8
      Threads/MPI process:                     4

      MPI processes distributed on     1 nodes
      path-images division:  nimage    =       2
      R & G space division:  proc/nbgrp/npool/nimage =       4
      35167 MiB available memory on the printing compute node when the environment starts


forrtl: Operation not permitted
forrtl: severe (28): CLOSE error, unit 6, file "Unknown"
Image              PC                Routine            Line        Source
manypw.x           0152002B  for__io_return        Unknown  Unknown
manypw.x           0151221F  for_close             Unknown  Unknown
manypw.x           004C6BAF  Unknown               Unknown  Unknown
manypw.x           00413F67  Unknown               Unknown  Unknown
manypw.x           00413D12  Unknown               Unknown  Unknown
libc-2.30.so       2B46AD812E1B  __libc_start_main     Unknown  Unknown
manypw.x           00413C2A  Unknown               Unknown  Unknown
srun: error: bc11237: task 0: Exited with exit code 28
--

I got the output file for one of the inputs as mentioned before, and it 
almost reaches completion when the above error appears. The output file 
is espresso_1.out. My understanding is that it starts on one of the 
inputs, but when it tries to start the other input in parallel, it tries 
to access something it does not have permission for. I'm not sure if 
this is a common error, but I could not find a solution for it anywhere. 
What would be the best way to resolve this? Please let me know if you 
need any further information from me.


Actually I have faced similar issues while using just pw.x and having a 
'for loop' over multiple input crystals, so I thought I should find the 
root cause of such errors.


Thanks and Regards
*Prashant Govindarajan*


On Fri, Jun 30, 2023 at 4:34 AM Paolo Giannozzi wrote:


On 6/30/23 10:27, Pietro Davide Delugas wrote:

 > About the parallel execution: in QE, there is the manypw.x application
 > that can run many inputs in parallel.

its usage is described in the header of PW/src/manypw.f90

Paolo
-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,

Univ. Udine, via delle Scienze 208, 33100 Udine Italy, +39-0432-558216
___
The Quantum ESPRESSO community stands by the Ukrainian
people and expresses its concerns about the devastating
effects that the Russian military offensive has on their
country and on the free and peaceful scientific, cultural,
and economic cooperation amongst peoples
___
Quantum ESPRESSO is supported by MaX (http://www.max-centre.eu/)
users mailing list users@lists.quantum-espresso.org


Re: [QE-users] R: Data Parallelism and GPU Support for Quantum Espresso

2023-07-02 Thread Prashant Govindarajan via users
Hi! Thanks a lot for the suggestions. I tried *manypw.x*  just with 2 input
files to see if it works. Basically, my directory consists of (drive links
of text files provided)

   1. Input 1 -- espresso_0.in
   2. Input 2 -- espresso_1.in
   3. Slurm job script -- script

*(1 node, 8 tasks per node, 4 cpus per task, QE version 7.0)*

The command I run (on Compute Canada) is the following

srun --cpus-per-task=$SLURM_CPUS_PER_TASK manypw.x -ni 2 -i espresso


When I submit the job, Input 2 starts and it immediately stops with the
error message as described below, and Input 1 never starts.

--
 Program PWSCF v.7.0 starts on  2Jul2023 at  13:13:48

 This program is part of the open-source Quantum ESPRESSO suite
 for quantum simulation of materials; please cite
 "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
 "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
 "P. Giannozzi et al., J. Chem. Phys. 152 154105 (2020);
  URL http://www.quantum-espresso.org",
 in publications or presentations arising from this work. More details at
 http://www.quantum-espresso.org/quote

 Parallel version (MPI & OpenMP), running on  32 processor cores
 Number of MPI processes: 8
 Threads/MPI process: 4

 MPI processes distributed on 1 nodes
 path-images division:  nimage=   2
 R & G space division:  proc/nbgrp/npool/nimage =   4
 35167 MiB available memory on the printing compute node when the environment starts

forrtl: Operation not permitted
forrtl: severe (28): CLOSE error, unit 6, file "Unknown"
Image  PC  Routine  Line  Source
manypw.x   0152002B  for__io_returnUnknown  Unknown
manypw.x   0151221F  for_close Unknown  Unknown
manypw.x   004C6BAF  Unknown   Unknown  Unknown
manypw.x   00413F67  Unknown   Unknown  Unknown
manypw.x   00413D12  Unknown   Unknown  Unknown
libc-2.30.so   2B46AD812E1B  __libc_start_main Unknown  Unknown
manypw.x   00413C2A  Unknown   Unknown  Unknown
srun: error: bc11237: task 0: Exited with exit code 28
--
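
As a side check, the figures in the header of the log above are consistent with each other, which suggests the processes were split between the two images as requested. A small sketch of the arithmetic, assuming the default of one pool and one band group (neither is shown in the log):

```shell
# Figures copied from the log above.
mpi_procs=8          # "Number of MPI processes"
threads_per_proc=4   # "Threads/MPI process"
nimage=2             # "path-images division: nimage" (from -ni 2)
# Assumptions: npool and nbgrp are not printed in the log; 1 is assumed.
npool=1
nbgrp=1

cores=$((mpi_procs * threads_per_proc))
procs_per_group=$((mpi_procs / nbgrp / npool / nimage))

echo "processor cores in use: $cores"
echo "proc/nbgrp/npool/nimage: $procs_per_group"
```

This gives 32 cores and 4 MPI processes per image, matching the "running on 32 processor cores" and "proc/nbgrp/npool/nimage = 4" lines, so the crash does not look like a mismatch between the number of tasks and the number of images.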

I got the output file for one of the inputs as mentioned before, and it
almost reaches completion when the above error appears. The output file is
espresso_1.out.
My understanding is that it starts on one of the inputs, but when it tries
to start the other input in parallel, it tries to access something it does
not have permission for. I'm not sure if this is a common error, but I
could not find a solution for it anywhere. What would be the best way to
resolve this? Please let me know if you need any further information from
me.

Actually I have faced similar issues while using just pw.x and having a
'for loop' over multiple input crystals, so I thought I should find the
root cause of such errors.

Thanks and Regards
*Prashant Govindarajan*


On Fri, Jun 30, 2023 at 4:34 AM Paolo Giannozzi wrote:

> On 6/30/23 10:27, Pietro Davide Delugas wrote:
>
> > About the parallel execution: in QE, there is the manypw.x application
> > that can run many inputs in parallel.
>
> its usage is described in the header of PW/src/manypw.f90
>
> Paolo
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine Italy, +39-0432-558216