Dear Sara,

At least the error message is clear now: there's no memory left on the GPU.

You could have guessed this in advance by inspecting the first lines of
the output where the memory estimator reports:

     Estimated static dynamical RAM per process >     652.61 MB
     Estimated max dynamical RAM per process >      16.82 GB
     Estimated total dynamical RAM >    1210.88 GB

The second entry is the important one: you have one process per GPU and
16 GB of memory on each card, so an estimated 16.82 GB per process does
not fit. Although the estimate is for RAM, it's generally a good guess
for the GPU memory as well.
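
The check described above can be sketched as a small script. This is only
an illustration: the function names are hypothetical, the line format is
taken from the output quoted above, only MB and GB units are handled, and
the MB to GB conversion factor of 1024 is an assumption.

```python
import re

# Lines copied from the pw.x output quoted above.
REPORT = """\
     Estimated static dynamical RAM per process >     652.61 MB
     Estimated max dynamical RAM per process >      16.82 GB
     Estimated total dynamical RAM >    1210.88 GB
"""

# Assumption: 1 GB = 1024 MB; only these two units are handled here.
UNIT_TO_GB = {"MB": 1.0 / 1024.0, "GB": 1.0}


def max_ram_per_process_gb(text):
    """Return the 'max dynamical RAM per process' estimate in GB, or None."""
    match = re.search(
        r"Estimated max dynamical RAM per process\s*>\s*([\d.]+)\s*(MB|GB)",
        text,
    )
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * UNIT_TO_GB[unit]


def fits_on_gpu(text, gpu_memory_gb, procs_per_gpu=1):
    """Rough check: does the per-process estimate fit in the card's memory?"""
    estimate = max_ram_per_process_gb(text)
    if estimate is None:
        raise ValueError("memory estimate not found in output")
    return estimate * procs_per_gpu <= gpu_memory_gb


if __name__ == "__main__":
    print(max_ram_per_process_gb(REPORT))           # 16.82
    print(fits_on_gpu(REPORT, gpu_memory_gb=16.0))  # False
```

With one process per 16 GB card the check fails, which matches the
allocation error reported for cegterg.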

Try using fewer pools, so that the data of each k-point are spread over
more processes (or more nodes if you desperately need this to run fast).
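
To see what changing the number of pools means in practice, here is a
rough sketch using the numbers from this thread (72 MPI processes, 147
k-points). The helper names are hypothetical, and the rule that the pool
count must divide the number of processes reflects my understanding of how
pw.x splits processes into pools. It only counts processes and k-points
per pool; it does not predict memory usage.

```python
import math


def pool_layout(n_procs, n_kpoints, n_pools):
    """Return (processes per pool, max k-points per pool) for a pool count."""
    if n_procs % n_pools != 0:
        raise ValueError("n_pools must divide the number of MPI processes")
    return n_procs // n_pools, math.ceil(n_kpoints / n_pools)


def candidate_pools(n_procs, n_kpoints):
    """All pool counts that divide n_procs and do not exceed n_kpoints."""
    return [
        p for p in range(1, n_procs + 1)
        if n_procs % p == 0 and p <= n_kpoints
    ]


if __name__ == "__main__":
    # 72 processes and 147 k-points are the values quoted in this thread.
    for n_pools in candidate_pools(72, 147):
        procs, kpts = pool_layout(72, 147, n_pools)
        print(f"{n_pools:3d} pools: {procs:2d} processes/pool, "
              f"up to {kpts:3d} k-points/pool")
```

Fewer pools means more processes (and GPUs) share the wavefunctions of
each k-point, at the price of more k-points handled one after the other in
each pool. The pool count itself is set on the pw.x command line with
-nk (or -npool).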

Best,
Pietro



On 8/31/20 6:54 PM, Sara Postorino wrote:
Thank you for your response,

I ran it again with 6.5 (I couldn't install 6.6a1); it uses the serial
eigensolver.

Now I get:
      Band Structure Calculation
      Davidson diagonalization with overlap

      Computing kpt #:     1  of     9 on this pool
  Really copied g2kin H->D
  Really copied evc H->D
  Really copied et H->D

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      Error in routine  cegterg (1):
       cannot allocate vc_d
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

      stopping ...

I attach the input and output.

I'll put the rest on GitLab.

Thank you,
Sara


On Sun, 30 Aug 2020 at 23:18, Pietro Bonfa <[email protected]> wrote:

    Dear Sara,

    I'd suggest checking the following:

    1. verify that the serial eigensolver is used (it's written at the
    beginning of the output);

    2. use the latest version (6.6a1) that will correctly report problems
    with memory allocations during the iterative diagonalization.

    Could you please also open an issue at
    https://gitlab.com/QEF/q-e-gpu/-/issues
    and attach the input, the pseudopotentials and the job script that
    you are using?

    Thank you,
    kind regards,
    Pietro



    On 8/29/20 6:33 PM, Sara Postorino wrote:
     > Hi QE users,
     >
     > I am running PW on Marconi100 and experiencing problems during
     > diagonalization. I am using version 6.5 (autoload of the modules
     > on m100).
     > My system is a MoTe2 bilayer with a 39x39x1 k mesh and many bands,
     > because I will do a GW calculation on top of it. (The calculation
     > works if I do not add many bands.)
     > I tried with 4000 and 3000 bands using Davidson diagonalization,
     > running on 18 nodes:
     > Parallel version (MPI & OpenMP), running on    2304 processor cores
     >       Number of MPI processes:                72
     >       Threads/MPI process:                    32
     > When doing the calculation of the first k-point I get:
     >
     >   Really copied g2kin H->D
     >   Really copied evc H->D
     >   Really copied et H->D
     >   Really copied vrs H->D
     >   dp_memcpy_d2h_c2dinvalid pitch argument           12
     >
     > I also tried the conjugate gradient algorithm, but it gets stuck at
     >
     >   Really copied evc H->D
     >   Really copied et H->D
     >   Really copied h_diag H->D
     >   Really copied becp%nc H->D
     >   Really copied g2kin H->D
     >   Really copied vrs H->D
     >
     > And here it takes forever. I left it running for more than 1 hour
     > and it didn't finish one k-point, and since I have 147 k-points the
     > computation would be very expensive even if it worked.
     >
     > I also tried to go down to 1000 bands (I need way more) and got
     >   Really copied g2kin H->D
     >   Really copied evc H->D
     >   Really copied et H->D
     >   Really copied vrs H->D
     >   zhegvdx_gpu error: cusolverDnZpotrf failed!
     >
     >
     > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     >       Error in routine  cdiaghg_gpu (1):
     >        zhegvdx_gpu failed
     >
     > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     >
     > Do you have any suggestion on how to fix this issue?
     > Thanks
     >
     > Sara Postorino
     > PhD student
     > University of Rome Tor Vergata
     >
     > _______________________________________________
     > Quantum ESPRESSO is supported by MaX (www.max-centre.eu/quantum-espresso)
     > users mailing list [email protected]
     > https://lists.quantum-espresso.org/mailman/listinfo/users
