Dear Dr. Gavin Abo,
Thank you very much for your answers. My system does not have an HFI card,
but I will try the second solution.
On Fri, Jul 20, 2018 at 02:54, Gavin Abo wrote:
Good to hear that the "unable to get host address" and "unable to
connect to server" errors are gone after you fixed the hosts file on
each node.
Regarding the "no hfi units are available" error, if your system has
Intel OP HFI cards, then maybe they just need to be configured to work [
As I said, this is in your IB (or similar) fabric.
On Thu, Jul 19, 2018 at 11:54 AM, karima Physique wrote:
Dear Prof. Laurence Marks,
*I note that I am using the latest version of the Intel compilers (Intel
Parallel Studio Cluster Edition).*
*I read about the possible solutions but did not find one related
to Intel.*
*Do you have any solution for this problem?*
On Thu, Jul 19, 2018 at 16:02,
See
https://www.google.com/search?q=no+hfi+units+are+available+(err%3D23)
This appears to be an issue with your local mpi/fabric.
On Thu, Jul 19, 2018 at 8:03 AM, karima Physique wrote:
*Dear Dr. Gavin Abo,*
Actually, the problem was solved by adding the hostname to the hosts file
on all the nodes, not only on the master node.
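
For anyone repeating this fix, here is a minimal Python sketch of a check that every node name appears in a hosts file. The addresses and the names `master`, `node1` and `node2` are made up for illustration and are not taken from this cluster.

```python
def hostnames_in_hosts_file(text):
    """Return the set of names defined in hosts-file formatted text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        names.update(fields[1:])  # first field is the IP address, the rest are names
    return names


def missing_hosts(text, required):
    """Return the required hostnames that the hosts text does not define."""
    defined = hostnames_in_hosts_file(text)
    return [name for name in required if name not in defined]


# Hypothetical hosts-file content and node names, for illustration only.
example = """
127.0.0.1     localhost
192.168.1.10  master
192.168.1.11  node1   # compute node
"""
print(missing_hosts(example, ["master", "node1", "node2"]))
```

On a real node the text would come from reading /etc/hosts, and the check would have to be run on every node, since the fix above was needed on all of them.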
Now the calculation works very well, but at each execution of LAPW0 in the
SCF cycle I get this error, which does not affect the calculations:
*""calcul.23539PSM2
*Thank you, Dr. Gavin Abo.*
*I checked the /etc/hosts file and it is OK.*
*But why does lapw1_mpi work fine on all the nodes, while dstart_mpi and
lapw0_mpi do not work on the nodes?*
On Thu, Jul 19, 2018 at 04:23, Gavin Abo wrote:
As the error message says, one possible cause is the connection being
blocked by a firewall.
Another possible cause is a ssh passwordless access problem:
https://stackoverflow.com/questions/19565795/unable-to-execute-mpich2-on-multiple-machines-on-ubuntu-12-04-hydu-sock-connect
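
One rough way to test the passwordless ssh cause is to run ssh in batch mode, so that it fails instead of asking for a password. The Python sketch below assumes the OpenSSH client is installed. The function names are made up for illustration, and `node1` in the comment is a hypothetical host name.

```python
import subprocess


def ssh_check_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes disables password prompts; ConnectTimeout limits the wait.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def passwordless_ssh_ok(host):
    """Return True if running `true` on the host over ssh succeeds without a prompt."""
    try:
        result = subprocess.run(
            ssh_check_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


# Example call (hypothetical host name): passwordless_ssh_ok("node1")
```

A False result only shows that the login did not succeed non-interactively. It does not tell a key problem apart from a blocked connection, so the firewall still has to be checked separately.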
Yet, another