Re: [Wien] time difference among nodes

2015-09-29 Thread Gavin Abo
From the 'top' outputs sent before, it looks like the administrators might
have configured the system with no swap:


r1i1n2

Swap:  0M total,  0M used,  0M free, 10563M cached

r1i1n3

Swap:  0M total,  0M used,  0M free, 23089M cached

Keep in mind that having swap can mean the difference between degraded
performance and a hard crash under low memory [
http://unix.stackexchange.com/questions/190398/do-i-need-swap-space-if-i-have-more-than-enough-amount-of-ram
].
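
A quick way to verify this on a node is the memory summary itself; a minimal
sketch, assuming a standard Linux node (it can go inside a batch job if there
is no SSH access):

free -m     # memory and swap totals in MB; "cached" is reclaimable page cache
swapon -s   # list configured swap devices; no output means no swap at all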


On 9/29/2015 5:57 AM, Laurence Marks wrote:


If it happens again, one thing to ask them to check is swap usage and 
how much memory is cached. On some of my nodes I have noticed that 
they do not always release cached memory, and can start swapping. If 
this happens the job will get very slow. The commands to use to clear 
the cache can be found at
http://www.tecmint.com/clear-ram-memory-cache-buffer-and-swap-space-on-linux/ 
or similar. (Needs root access.) Top can also show memory use.


While there should be no need to do this, I have noticed that I need
to do it every 3 hours on 4 nodes - the other 20 don't need it. It is an
issue mainly for big calculations.


Alternatively, it may have been something else: a zombie process, big log
files, or other things. Rebooting gets rid of a lot of system caches and
helps -- even on my Android tablet every week or two. It's murky waters.


---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
http://www.numis.northwestern.edu
Corrosion in 4D http://MURI4D.numis.northwestern.edu


Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what 
nobody else has thought"

Albert Szent-Gyorgi

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-29 Thread Luis Ogando
Hi Lyudmila,

   Unfortunately, they do not have "top mode 1" output corresponding to the
problem period.
   Thanks again.
   All the best,
 Luis


2015-09-29 10:37 GMT-03:00 Lyudmila Dobysheva :

> 29.09.2015 14:57, Laurence Marks wrote:
>
>> If it happens again, one thing to ask them to check is swap usage and
>> how much memory is cached.
>>
> ...
>
>> Alternatively it was something else, a zombie, big log files or other
>> things. Rebooting gets rid of a lot of system caches and helps
>>
>
> My bet is that parallelization was lost on that node for some unclear reason
> (maybe the bad swapping/caching threw the parallel options out of memory and
> all jobs were sent to only one processor of the node).
>
> I would like to know what the administrator saw in the "1" mode of the top
> command.
>
> Best wishes
>   Lyudmila Dobysheva
> --
> Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
> 426001 Izhevsk, ul.Kirova 132
> RUSSIA
> --
> Tel.:7(3412) 432045(office), 722529(Fax)
> E-mail: l...@ftiudm.ru, lyuk...@mail.ru (office)
> lyuk...@gmail.com (home)
> Skype:  lyuka17 (home), lyuka18 (office)
> http://ftiudm.ru/content/view/25/103/lang,english/
> --
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-29 Thread Luis Ogando
Hi Lyudmila,

   Thanks again !
   I will ask them.
   All the best,
  Luis


2015-09-29 10:37 GMT-03:00 Lyudmila Dobysheva :

> 29.09.2015 14:57, Laurence Marks wrote:
>
>> If it happens again, one thing to ask them to check is swap usage and
>> how much memory is cached.
>>
> ...
>
>> Alternatively it was something else, a zombie, big log files or other
>> things. Rebooting gets rid of a lot of system caches and helps
>>
>
> My bet is that parallelization was lost on that node for some unclear reason
> (maybe the bad swapping/caching threw the parallel options out of memory and
> all jobs were sent to only one processor of the node).
>
> I would like to know what the administrator saw in the "1" mode of the top
> command.
>
> Best wishes
>   Lyudmila Dobysheva
> --
> Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
> 426001 Izhevsk, ul.Kirova 132
> RUSSIA
> --
> Tel.:7(3412) 432045(office), 722529(Fax)
> E-mail: l...@ftiudm.ru, lyuk...@mail.ru (office)
> lyuk...@gmail.com (home)
> Skype:  lyuka17 (home), lyuka18 (office)
> http://ftiudm.ru/content/view/25/103/lang,english/
> --
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-29 Thread Lyudmila Dobysheva

29.09.2015 14:57, Laurence Marks wrote:

If it happens again, one thing to ask them to check is swap usage and
how much memory is cached.

...

Alternatively it was something else, a zombie, big log files or other
things. Rebooting gets rid of a lot of system caches and helps


My bet is that parallelization was lost on that node for some unclear reason
(maybe the bad swapping/caching threw the parallel options out of memory and
all jobs were sent to only one processor of the node).


I would like to know what the administrator saw in the "1" mode of the top
command.


Best wishes
  Lyudmila Dobysheva
--
Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
426001 Izhevsk, ul.Kirova 132
RUSSIA
--
Tel.:7(3412) 432045(office), 722529(Fax)
E-mail: l...@ftiudm.ru, lyuk...@mail.ru (office)
lyuk...@gmail.com (home)
Skype:  lyuka17 (home), lyuka18 (office)
http://ftiudm.ru/content/view/25/103/lang,english/
--
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-29 Thread Luis Ogando
Dear Prof. Marks,

   Thanks !
   I will send your message to the administrators !
   All the best,
   Luis


2015-09-29 8:57 GMT-03:00 Laurence Marks :

> If it happens again, one thing to ask them to check is swap usage and how
> much memory is cached. On some of my nodes I have noticed that they do not
> always release cached memory, and can start swapping. If this happens the
> job will get very slow. The commands to use to clear the cache can be found
> at
>
> http://www.tecmint.com/clear-ram-memory-cache-buffer-and-swap-space-on-linux/
> or similar. (Needs root access.) Top can also show memory use.
>
> While there should be no need to do this, I have noticed that I need to do
> it every 3hrs on 4 nodes - the other 20 don't need it. It is an issue
> mainly for big calculations.
>
> Alternatively it was something else, a zombie, big log files or other
> things. Rebooting gets rid of a lot of system caches and helps -- even on
> my Android tablet every week or two. It's murky waters.
>
> ---
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> http://www.numis.northwestern.edu
> Corrosion in 4D http://MURI4D.numis.northwestern.edu
> Co-Editor, Acta Cryst A
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought"
> Albert Szent-Gyorgi
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-29 Thread Laurence Marks
If it happens again, one thing to ask them to check is swap usage and how
much memory is cached. On some of my nodes I have noticed that they do not
always release cached memory, and can start swapping. If this happens the
job will get very slow. The commands to use to clear the cache can be found
at
http://www.tecmint.com/clear-ram-memory-cache-buffer-and-swap-space-on-linux/
or similar. (Needs root access.) Top can also show memory use.
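
For reference, a minimal sketch of the cache-clearing step described at that
link (needs root; writing to drop_caches only discards clean, reclaimable
cache, so no data is lost):

sync                               # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes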

While there should be no need to do this, I have noticed that I need to do
it every 3 hours on 4 nodes - the other 20 don't need it. It is an issue
mainly for big calculations.

Alternatively, it may have been something else: a zombie process, big log
files, or other things. Rebooting gets rid of a lot of system caches and
helps -- even on my Android tablet every week or two. It's murky waters.

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
http://www.numis.northwestern.edu
Corrosion in 4D http://MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-29 Thread Luis Ogando
Hi Elias,

   There were no other jobs in the specific queue I was using and the nodes
are dedicated to that queue, so, it was the opportunity to reboot them
without furious reactions from other users.
   After trying everything suggested by the Wien2k community, the
administrators resignedly remembered the words of wisdom given by the
cluster guru, Shakespeare, and followed the suggestion given by Lyudmila
Dobysheva. In other words, they killed my job, restarted all the nodes, and
I resubmitted the calculation.
   All the best,
 Luis


2015-09-29 3:50 GMT-03:00 Elias Assmann :

> On 09/28/2015 01:58 PM, Luis Ogando wrote:
> > The problem is solved ! The solution was one suggested by Lyudmila
> > Dobysheva : reboot the nodes. We will never know the origin of the
> > problem, but, honestly, I do not care !
>
> Good to hear that!  So, how did you get the admins to reboot them?
>
> > "There are more things in heaven and earth, Horatio, Than are
> > dreamt of in your philosophy."
>
> That is an apt quote for people working on clusters ;-).
>
>
> Elias
>
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-28 Thread Elias Assmann

On 09/28/2015 01:58 PM, Luis Ogando wrote:
> The problem is solved ! The solution was one suggested by Lyudmila 
> Dobysheva : reboot the nodes. We will never know the origin of the 
> problem, but, honestly, I do not care !

Good to hear that!  So, how did you get the admins to reboot them?

> "There are more things in heaven and earth, Horatio, Than are
> dreamt of in your philosophy."

That is an apt quote for people working on clusters ;-).


Elias

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-28 Thread Luis Ogando
Dear Wien2k community,

   I would like to thank so many hints !
   The problem is solved ! The solution was one suggested by Lyudmila
Dobysheva : reboot the nodes. We will never know the origin of the problem,
but, honestly, I do not care !

"There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy."
- *Hamlet* (1.5.167-8), Hamlet to Horatio
 Shakespeare

   I would like to thank you all again.
   All the best,
 Luis

2015-09-25 5:56 GMT-03:00 Pawel Lesniak :

> Hello,
>
> I'd suggest trying three things.
>
> First of all - does your cluster allow running interactive jobs? If yes,
> then you should create an interactive job to run /bin/bash. I'm not
> familiar with PBS, but in SGE/OGE, if you print the cluster queues with
> "qstat -f" you'll see "I" in the qtype column, which means that the given
> queue allows running interactive jobs. Using bash you should be able to
> run top on a given node without SSH access.
>
> Regardless of success or failure, you should be able to look at node
> statistics using the "qhost" command. You should see at least the
> current load, memory usage and swap usage. In SGE/OGE there's a switch "-j"
> to qhost which will also show you what jobs are currently running on each
> node. This lets you see the load of the machine interactively
> instead of a view at a single point in time.
>
> The last idea is to prepare a job to run at the same time on the same
> node as Wien2K, consisting of several
> "ps auxww | grep ogando >> ${HOME}/ps.output; sleep 2"
> commands. It will give you some information on what's going on. Think of
> it as a non-interactive top stored in a text file.
>
>
> Best regards,
>
> Pawel Lesniak
>
>

Re: [Wien] time difference among nodes

2015-09-25 Thread Pawel Lesniak

Hello,

I'd suggest trying three things.

First of all - does your cluster allow running interactive jobs? If yes,
then you should create an interactive job to run /bin/bash. I'm not
familiar with PBS, but in SGE/OGE, if you print the cluster queues with
"qstat -f" you'll see "I" in the qtype column, which means that the given
queue allows running interactive jobs. Using bash you should be able to
run top on a given node without SSH access.


Regardless of success or failure, you should be able to look at node
statistics using the "qhost" command. You should see at least the current
load, memory usage and swap usage. In SGE/OGE there's a switch "-j" to
qhost which will also show you what jobs are currently running on each
node. This lets you see the load of the machine interactively instead of
a view at a single point in time.


The last idea is to prepare a job to run at the same time on the same
node as Wien2K, consisting of several

"ps auxww | grep ogando >> ${HOME}/ps.output; sleep 2"
commands. It will give you some information on what's going on. Think of
it as a non-interactive top stored in a text file.
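
A minimal sketch of such a monitoring job in bash (the user name and output
file follow the example above; the iteration bound is an assumption so the
job ends on its own after 20 minutes):

#!/bin/bash
# Append a process-table snapshot every 2 seconds; the "[o]gando" pattern
# keeps grep from matching its own command line.
for i in $(seq 1 600); do
    ps auxww | grep "[o]gando" >> ${HOME}/ps.output
    sleep 2
done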



Best regards,

Pawel Lesniak


W dniu 23.09.2015 o 14:25, Luis Ogando pisze:

OK ! In this case, I will try it !
   Many thanks,
  Luis


___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html

Re: [Wien] time difference among nodes

2015-09-25 Thread Elias Assmann

Sounds like a nasty problem …  In terms of strategy, I think the first
thing should be to find out if the node is really to blame.  If so,
you have to convince the admins and/or find a way to avoid it.  If
not, you can turn to figuring out whatever else (presumably in your
Wien2k setup) is causing the trouble.

On 09/24/2015 07:37 PM, Luis Ogando wrote:
> First of all, I wonder: To what extent is this problem
> reproducible? E.g., does your job always run on the same 4 nodes?
> 
> Yes.
> 
> Is it always the same node(s) that are slow?
> 
> Yes

It seems unusual that your job should always be assigned the same
nodes, but okay.  If you get your job to run on a different set it
could help establish if the node is really to blame.  In some queuing
systems, you can request specific nodes.  Or you could submit two
copies of your job.

> The strangest part: at the beginning of this month, the same
> calculation was running properly. I had a crash for convergence
> problems and when I reduced the "mixing factor" in case.inm (it is
> now 0.04 in pre-convergence scf cycle) the problems started.
> Obviously, I do not believe that the mixing factor is the problem.
> 
> No. All the executables are running slowly in the problematic
> node.

I would try to widen the tests then -- restart the calculation from
scratch, try a different case, try other programs …

> Users can do nothing. The administrator sent me the "top's" and I
> have asked him for simultaneous ones.

Like I said, even if you have no direct access you can put it in a job
script.  Something along these lines (in bash):

run &

# remember the PID of the background job
pid=$(jobs -p %1)

# poll while the job is still alive
while kill -0 "$pid" 2>/dev/null; do
   for n in $NODES; do
      ssh $n top -bn1 >> $n.top
      # plus whatever else you want to check
   done
   sleep 30   # pause between snapshots instead of busy-looping
done

wait



Elias

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-24 Thread Luis Ogando
Dear Prof. Marks,

   As I suspected, users cannot use ganglia. Our administrators are very
protective !!

Dear Elias Assmann,

   Many thanks for your comments. I will try to comment on some of them.


First of all, I wonder: To what extent is this problem reproducible?
> E.g., does your job always run on the same 4 nodes?


Yes.


> Is it always the
> same node(s) that are slow?


Yes


> Does the problem also show up in other
> calculations (maybe just changing the number of k-points, or
> restarting the same case from scratch).


The strangest part: at the beginning of this month, the same calculation
was running properly. I had a crash due to convergence problems, and when I
reduced the "mixing factor" in case.inm (it is now 0.04 in the
pre-convergence scf cycle), the problems started. Obviously, I do not
believe that the mixing factor is the problem.


> Is it only lapw1 that is slow?
>

No. All the executables are running slowly in the problematic node.


>
> Second, how did you make those ‘top’s?  As for ‘lapw0’ and ‘lapw1’, I
> am guessing that this is just because the snapshots were taken at
> different times (notice that the CPU times of lapw0 on the two nodes
> are quite different, too).
>

Users can do nothing. The administrator sent me the "top's" and I have
asked him for simultaneous ones.


>
> About the CPU usage on ‘n2’, I find this very suspicious.  If it is as
> Peter said that the jobs are in the initialization and therefore not
> computing much, that may be fine; but I have to disagree with his
> assessment, because the memory usage of lapw1 on the two nodes is
> basically the same (if anything, the image sizes on ‘n2’ are slightly
> larger).  Note also that it is *not* the case that other processes are
> using the CPU; the total usage is at 7.5 %.
>
> It would be good to clarify that by getting a ‘top’ such that we know
> that lapw1 had been running for a while.  To this end, top has an ‘-n’
> option which says how many frames to output, e.g. ‘top -bn 10’.
>
> I am also curious about the load averages.  ‘n2’ has larger “mid-term”
> and “long-term” load averages than the others, and its “short-term”
> average is just as large.  I am not sure what that means.
>
> On 09/23/2015 02:21 PM, Luis Ogando wrote:
> > I can not access the nodes. SSH among them is forbidden ! We have
> > to ask the administrators for anything !! It is the hell !! Of
> > course, only the PBS jobs can "travel" among the nodes.
>
> I do not know about PBS Pro, but Torque and SGE have an option (I
> think ‘-I’ in either case) to submit an interactive job where you get
> a login on a node.  Of course that is only a realistic option when the
> queuing time is not too long.  Otherwise, any information that a more
> sophisticated tool can give you will also be available from the
> command line (just more painful to extract!) via ‘top’, ‘ps’, ‘/proc’,
> etc.  You can also put these things in a jobs script (which you
> apparently already did with ‘top’).
>
>
> Good luck,
>
> Elias
>

Finally, I would like to thank everyone for the comments and say that if I
did not comment on them, it is because the administrators said they cannot
be the origin of the problem: "everything is OK" (?).
   All the best,
  Luis
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-24 Thread Elias Assmann

Luis,

First of all, I wonder: To what extent is this problem reproducible?
E.g., does your job always run on the same 4 nodes?  Is it always the
same node(s) that are slow?  Does the problem also show up in other
calculations (maybe just changing the number of k-points, or
restarting the same case from scratch).  Is it only lapw1 that is slow?

Second, how did you make those ‘top’s?  As for ‘lapw0’ and ‘lapw1’, I
am guessing that this is just because the snapshots were taken at
different times (notice that the CPU times of lapw0 on the two nodes
are quite different, too).

About the CPU usage on ‘n2’, I find this very suspicious.  If it is as
Peter said that the jobs are in the initialization and therefore not
computing much, that may be fine; but I have to disagree with his
assessment, because the memory usage of lapw1 on the two nodes is
basically the same (if anything, the image sizes on ‘n2’ are slightly
larger).  Note also that it is *not* the case that other processes are
using the CPU; the total usage is at 7.5 %.

It would be good to clarify that by getting a ‘top’ such that we know
that lapw1 had been running for a while.  To this end, top has an ‘-n’
option which says how many frames to output, e.g. ‘top -bn 10’.

I am also curious about the load averages.  ‘n2’ has larger “mid-term”
and “long-term” load averages than the others, and its “short-term”
average is just as large.  I am not sure what that means.

On 09/23/2015 02:21 PM, Luis Ogando wrote:
> I can not access the nodes. SSH among them is forbidden ! We have
> to ask the administrators for anything !! It is the hell !! Of
> course, only the PBS jobs can "travel" among the nodes.

I do not know about PBS Pro, but Torque and SGE have an option (I
think ‘-I’ in either case) to submit an interactive job where you get
a login on a node.  Of course that is only a realistic option when the
queuing time is not too long.  Otherwise, any information that a more
sophisticated tool can give you will also be available from the
command line (just more painful to extract!) via ‘top’, ‘ps’, ‘/proc’,
etc.  You can also put these things in a jobs script (which you
apparently already did with ‘top’).
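
(For Torque, a hypothetical example of such an interactive submission -- the
queue name and resource string here are assumptions for illustration:

qsub -I -q myqueue -l nodes=1:ppn=12

In SGE, 'qlogin' or 'qrsh' plays the same role.)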


Good luck,

Elias


___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Luis Ogando
OK ! In this case, I will try it !
   Many thanks,
  Luis


2015-09-23 9:23 GMT-03:00 Laurence Marks :

> Ganglia is web based, you don't need ssh. Please read the link I sent.
>
> ---
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> http://www.numis.northwestern.edu
> Corrosion in 4D http://MURI4D.numis.northwestern.edu
> Co-Editor, Acta Cryst A
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought"
> Albert Szent-Gyorgi
> On Sep 23, 2015 07:21, "Luis Ogando"  wrote:
>
>>Hi,
>>
>>I can not access the nodes. SSH among them is forbidden ! We have to
>> ask the administrators for anything !! It is the hell !!
>>Of course, only the PBS jobs can "travel" among the nodes.
>>All the best,
>>Luis
>>
>>
>> 2015-09-23 9:14 GMT-03:00 Laurence Marks :
>>
>>> Nooo!
>>>
>>> You should use ganglia yourself.
>>>
>>> ---
>>> Professor Laurence Marks
>>> Department of Materials Science and Engineering
>>> Northwestern University
>>> http://www.numis.northwestern.edu
>>> Corrosion in 4D http://MURI4D.numis.northwestern.edu
>>> Co-Editor, Acta Cryst A
>>> "Research is to see what everybody else has seen, and to think what
>>> nobody else has thought"
>>> Albert Szent-Gyorgi
>>> On Sep 23, 2015 07:13, "Luis Ogando"  wrote:
>>>
 Dear Prof. Marks,

Thank you for your comment.
I sent your suggestions to the administrators.
All the best,
 Luis


 2015-09-23 8:56 GMT-03:00 Laurence Marks :

> It is hard to work this out remotely, particularly with unfriendly
> sys_admin.
>
> I would find out if you have ganglia available, see
> http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/linux/bks/SGI_Admin/books/ICEX_Admin_Guide/sgi_html/ch05.html#Z1190844523tls
> . This is much more useful than top. Try doing http://... to relevant
> head or admin nodes.
>
> ---
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> http://www.numis.northwestern.edu
> Corrosion in 4D http://MURI4D.numis.northwestern.edu
> Co-Editor, Acta Cryst A
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought"
> Albert Szent-Gyorgi
> On Sep 23, 2015 06:31, "Luis Ogando"  wrote:
>
>> Dear Prof. Blaha and Lyudmila Dobysheva,
>>
>>Many thanks for your comments !
>>Unfortunately, users have no privileges in the cluster. I will
>> send your comments to the administrators and let's see what happens.
>>Many thanks again,
>> Luis
>>
>>
>>
>>
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Laurence Marks
Ganglia is web based, you don't need ssh. Please read the link I sent.

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
http://www.numis.northwestern.edu
Corrosion in 4D http://MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi
On Sep 23, 2015 07:21, "Luis Ogando"  wrote:

>Hi,
>
>I can not access the nodes. SSH among them is forbidden ! We have to
> ask the administrators for anything !! It is the hell !!
>Of course, only the PBS jobs can "travel" among the nodes.
>All the best,
>Luis
>
>
> 2015-09-23 9:14 GMT-03:00 Laurence Marks :
>
>> Nooo!
>>
>> You should use ganglia yourself.
>>
>> ---
>> Professor Laurence Marks
>> Department of Materials Science and Engineering
>> Northwestern University
>> http://www.numis.northwestern.edu
>> Corrosion in 4D http://MURI4D.numis.northwestern.edu
>> Co-Editor, Acta Cryst A
>> "Research is to see what everybody else has seen, and to think what
>> nobody else has thought"
>> Albert Szent-Gyorgi
>> On Sep 23, 2015 07:13, "Luis Ogando"  wrote:
>>
>>> Dear Prof. Marks,
>>>
>>>Thank you for your comment.
>>>I sent your suggestions to the administrators.
>>>All the best,
>>> Luis
>>>
>>>
>>> 2015-09-23 8:56 GMT-03:00 Laurence Marks :
>>>
 It is hard to work this out remotely, particularly with unfriendly
 sys_admin.

 I would find out if you have ganglia available, see
 http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/linux/bks/SGI_Admin/books/ICEX_Admin_Guide/sgi_html/ch05.html#Z1190844523tls
 . This is much more useful than top. Try doing http://... to relevant
 head or admin nodes.

 ---
 Professor Laurence Marks
 Department of Materials Science and Engineering
 Northwestern University
 http://www.numis.northwestern.edu
 Corrosion in 4D http://MURI4D.numis.northwestern.edu
 Co-Editor, Acta Cryst A
 "Research is to see what everybody else has seen, and to think what
 nobody else has thought"
 Albert Szent-Gyorgi
 On Sep 23, 2015 06:31, "Luis Ogando"  wrote:

> Dear Prof. Blaha and Lyudmila Dobysheva,
>
>Many thanks for your comments !
>Unfortunately, users have no privileges in the cluster. I will send
> your comments to the administrators and let's see what happens.
>Many thanks again,
> Luis
>
>
>
>
>
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Luis Ogando
   Hi,

   I cannot access the nodes. SSH among them is forbidden ! We have to ask
the administrators for anything !! It is hell !!
   Of course, only the PBS jobs can "travel" among the nodes.
   All the best,
   Luis


2015-09-23 9:14 GMT-03:00 Laurence Marks :

> Nooo!
>
> You should use ganglia yourself.
>
> ---
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> http://www.numis.northwestern.edu
> Corrosion in 4D http://MURI4D.numis.northwestern.edu
> Co-Editor, Acta Cryst A
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought"
> Albert Szent-Gyorgi
> On Sep 23, 2015 07:13, "Luis Ogando"  wrote:
>
>> Dear Prof. Marks,
>>
>>Thank you for your comment.
>>I sent your suggestions to the administrators.
>>All the best,
>> Luis
>>
>>
>> 2015-09-23 8:56 GMT-03:00 Laurence Marks :
>>
>>> It is hard to work this out remotely, particularly with unfriendly
>>> sys_admin.
>>>
>>> I would find out if you have ganglia available, see
>>> http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/linux/bks/SGI_Admin/books/ICEX_Admin_Guide/sgi_html/ch05.html#Z1190844523tls
>>> . This is much more useful than top. Try doing http://... to relevant
>>> head or admin nodes.
>>>
>>> ---
>>> Professor Laurence Marks
>>> Department of Materials Science and Engineering
>>> Northwestern University
>>> http://www.numis.northwestern.edu
>>> Corrosion in 4D http://MURI4D.numis.northwestern.edu
>>> Co-Editor, Acta Cryst A
>>> "Research is to see what everybody else has seen, and to think what
>>> nobody else has thought"
>>> Albert Szent-Gyorgi
>>> On Sep 23, 2015 06:31, "Luis Ogando"  wrote:
>>>
 Dear Prof. Blaha and Lyudmila Dobysheva,

Many thanks for your comments !
Unfortunately, users have no privileges in the cluster. I will send
 your comments to the administrators and let's see what happens.
Many thanks again,
 Luis




___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Laurence Marks
Nooo!

You should use ganglia yourself.

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
http://www.numis.northwestern.edu
Corrosion in 4D http://MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi
On Sep 23, 2015 07:13, "Luis Ogando"  wrote:

> Dear Prof. Marks,
>
>Thank you for your comment.
>I sent your suggestions to the administrators.
>All the best,
> Luis
>
>
> 2015-09-23 8:56 GMT-03:00 Laurence Marks :
>
>> It is hard to work this out remotely, particularly with unfriendly
>> sys_admin.
>>
>> I would find out if you have ganglia available, see
>> http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/linux/bks/SGI_Admin/books/ICEX_Admin_Guide/sgi_html/ch05.html#Z1190844523tls
>> . This is much more useful than top. Try doing http://... to relevant
>> head or admin nodes.
>>
>> ---
>> Professor Laurence Marks
>> Department of Materials Science and Engineering
>> Northwestern University
>> http://www.numis.northwestern.edu
>> Corrosion in 4D http://MURI4D.numis.northwestern.edu
>> Co-Editor, Acta Cryst A
>> "Research is to see what everybody else has seen, and to think what
>> nobody else has thought"
>> Albert Szent-Gyorgi
>> On Sep 23, 2015 06:31, "Luis Ogando"  wrote:
>>
>>> Dear Prof. Blaha and Lyudmila Dobysheva,
>>>
>>>Many thanks for your comments !
>>>Unfortunately, users have no privileges in the cluster. I will send
>>> your comments to the administrators and let's see what happens.
>>>Many thanks again,
>>> Luis
>>>
>>>
>>>
>>>
>
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Luis Ogando
Dear Prof. Marks,

   Thank you for your comment.
   I sent your suggestions to the administrators.
   All the best,
Luis


2015-09-23 8:56 GMT-03:00 Laurence Marks :

> It is hard to work this out remotely, particularly with unfriendly
> sys_admin.
>
> I would find out if you have ganglia available, see
> http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/linux/bks/SGI_Admin/books/ICEX_Admin_Guide/sgi_html/ch05.html#Z1190844523tls
> . This is much more useful than top. Try doing http://... to relevant
> head or admin nodes.
>
> ---
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> http://www.numis.northwestern.edu
> Corrosion in 4D http://MURI4D.numis.northwestern.edu
> Co-Editor, Acta Cryst A
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought"
> Albert Szent-Gyorgi
> On Sep 23, 2015 06:31, "Luis Ogando"  wrote:
>
>> Dear Prof. Blaha and Lyudmila Dobysheva,
>>
>>Many thanks for your comments !
>>Unfortunately, users have no privileges in the cluster. I will send
>> your comments to the administrators and let's see what happens.
>>Many thanks again,
>> Luis
>>
>>
>>
>>
>
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Laurence Marks
It is hard to work this out remotely, particularly with an unfriendly
sys_admin.

I would find out if you have ganglia available, see
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/linux/bks/SGI_Admin/books/ICEX_Admin_Guide/sgi_html/ch05.html#Z1190844523tls
. This is much more useful than top. Try doing http://... to relevant head
or admin nodes.

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
http://www.numis.northwestern.edu
Corrosion in 4D http://MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi
On Sep 23, 2015 06:31, "Luis Ogando"  wrote:

> Dear Prof. Blaha and Lyudmila Dobysheva,
>
>Many thanks for your comments !
>Unfortunately, users have no privileges in the cluster. I will send
> your comments to the administrators and let's see what happens.
>Many thanks again,
> Luis
>
>
>
>
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Luis Ogando
Dear Prof. Blaha and Lyudmila Dobysheva,

   Many thanks for your comments !
   Unfortunately, users have no privileges in the cluster. I will send your
comments to the administrators and let's see what happens.
   Many thanks again,
Luis
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Lyudmila Dobysheva

23.09.2015 12:22, Lyudmila Dobysheva wrote:

the jobs are all at one processor of the node


To be sure, try this:
In top on n2, type "1" to show individual CPU usage.
It is better to do this after some time, once the starting phase has passed.

23.09.2015 11:25, Peter Blaha wrote:
> With only a few seconds cpu time, the job is just in the starting phase


Lyudmila Dobysheva
--
Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
426001 Izhevsk, ul.Kirova 132
RUSSIA
--
Tel.:7(3412) 432045(office), 722529(Fax)
E-mail: l...@ftiudm.ru, lyuk...@mail.ru (office)
lyuk...@gmail.com (home)
Skype:  lyuka17 (home), lyuka18 (office)
http://ftiudm.ru/content/view/25/103/lang,english/
--
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Peter Blaha
With only a few seconds cpu time, the job is just in the starting phase 
(allocating memory, reading files, distributing data) and thus cpu-load 
is very low.


A few seconds later, this should reach about 100 % for each lapw1_mpi.

On 09/23/2015 11:20 AM, Lyudmila Dobysheva wrote:

22.09.2015 23:08, Luis Ogando wrote:

r1i1n2
  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 2096 ogando  20   0  927m 642m  20m R    9  1.8 0:09.30 lapw1c_mpi
 2109 ogando  20   0  926m 633m  17m R    9  1.8 0:14.58 lapw1c_mpi
 2122 ogando  20   0  924m 633m  19m R    9  1.8 0:09.65 lapw1c_mpi
 2124 ogando  20   0  922m 627m  15m R    9  1.7 0:06.72 lapw1c_mpi
 2108 ogando  20   0  927m 633m  17m R    8  1.8 0:09.04 lapw1c_mpi
 2110 ogando  20   0  926m 633m  17m R    8  1.7 0:09.01 lapw1c_mpi
 2111 ogando  20   0  924m 627m  13m R    8  1.7 0:14.56 lapw1c_mpi
 2095 ogando  20   0  930m 641m  17m R    8  1.8 0:09.32 lapw1c_mpi
 2121 ogando  20   0  927m 634m  17m R    8  1.8 0:06.76 lapw1c_mpi
 2123 ogando  20   0  924m 632m  18m R    8  1.7 0:09.65 lapw1c_mpi
 2098 ogando  20   0  922m 634m  16m R    8  1.8 0:06.71 lapw1c_mpi
 2097 ogando  20   0  927m 641m  19m R    7  1.8 0:06.75 lapw1c_mpi


If we sum up the %CPU we obtain 100%, so the jobs are all at one node,
sure.
What does this mean? Maybe the .machines file?
Or the parallel_options in the WIEN2k root on n2?

Best wishes
   Lyudmila Dobysheva
--
Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
426001 Izhevsk, ul.Kirova 132
RUSSIA
--
Tel.:7(3412) 432045(office), 722529(Fax)
E-mail: l...@ftiudm.ru, lyuk...@mail.ru (office)
 lyuk...@gmail.com (home)
Skype:  lyuka17 (home), lyuka18 (office)
http://ftiudm.ru/content/view/25/103/lang,english/
--


--

  P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.atWIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Lyudmila Dobysheva

23.09.2015 12:20, Lyudmila Dobysheva wrote:

the jobs are all at one node

at one processor of the node, of course

  Lyudmila Dobysheva
--
Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
426001 Izhevsk, ul.Kirova 132
RUSSIA
--
Tel.:7(3412) 432045(office), 722529(Fax)
E-mail: l...@ftiudm.ru, lyuk...@mail.ru (office)
lyuk...@gmail.com (home)
Skype:  lyuka17 (home), lyuka18 (office)
http://ftiudm.ru/content/view/25/103/lang,english/
--
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Lyudmila Dobysheva

22.09.2015 23:08, Luis Ogando wrote:

r1i1n2
  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 2096 ogando  20   0  927m 642m  20m R    9  1.8 0:09.30 lapw1c_mpi
 2109 ogando  20   0  926m 633m  17m R    9  1.8 0:14.58 lapw1c_mpi
 2122 ogando  20   0  924m 633m  19m R    9  1.8 0:09.65 lapw1c_mpi
 2124 ogando  20   0  922m 627m  15m R    9  1.7 0:06.72 lapw1c_mpi
 2108 ogando  20   0  927m 633m  17m R    8  1.8 0:09.04 lapw1c_mpi
 2110 ogando  20   0  926m 633m  17m R    8  1.7 0:09.01 lapw1c_mpi
 2111 ogando  20   0  924m 627m  13m R    8  1.7 0:14.56 lapw1c_mpi
 2095 ogando  20   0  930m 641m  17m R    8  1.8 0:09.32 lapw1c_mpi
 2121 ogando  20   0  927m 634m  17m R    8  1.8 0:06.76 lapw1c_mpi
 2123 ogando  20   0  924m 632m  18m R    8  1.7 0:09.65 lapw1c_mpi
 2098 ogando  20   0  922m 634m  16m R    8  1.8 0:06.71 lapw1c_mpi
 2097 ogando  20   0  927m 641m  19m R    7  1.8 0:06.75 lapw1c_mpi


If we sum up the %CPU we obtain 100%, so the jobs are all at one node, 
sure.

What does this mean? Maybe the .machines file?
Or the parallel_options in the WIEN2k root on n2?
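
As a quick check, the %CPU column can be summed directly from a batch
snapshot of top; a sketch, assuming the column layout above, where %CPU is
the 9th field:

top -bn1 | grep lapw1c_mpi | awk '{sum += $9} END {print sum}'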

Best wishes
  Lyudmila Dobysheva
--
Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
426001 Izhevsk, ul.Kirova 132
RUSSIA
--
Tel.:7(3412) 432045(office), 722529(Fax)
E-mail: l...@ftiudm.ru, lyuk...@mail.ru (office)
lyuk...@gmail.com (home)
Skype:  lyuka17 (home), lyuka18 (office)
http://ftiudm.ru/content/view/25/103/lang,english/
--
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-23 Thread Lyudmila Dobysheva

22.09.2015 23:08, Luis Ogando wrote:

r1i1n1 -
top - 17:40:46 up 12 days, 9 min,  2 users,  load average: 10.55, 4.34, 1.74
Cpu(s): 100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
r1i1n2 -
top - 17:42:30 up 221 days,  6:29,  1 user,  load average: 10.76, 9.59, 8.79
Cpu(s):  7.5%us,  0.1%sy,  0.0%ni, 92.4%id,  0.0%wa,  0.0%hi,  0.0%si,
r1i1n3 -
top - 17:42:50 up 56 days,  3:25,  1 user,  load average: 10.57, 6.02, 2.59
Cpu(s): 99.5%us,  0.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,


1) The first difference I see: the node in question has not been
restarted for 221 days. I'd start by rebooting (the problem may
disappear, and you will never know why it happened).


2) You didn't check:
> 2015-09-18 23:24 GMT-03:00 Laurence Marks
>  * Bad memory
>  * Full disc

Try "df" on n2 and on some other node for comparison; check and send the
output.
Also check which directory is used as the working directory on the nodes
(there should be something like "export SCRATCH=./" in .bashrc; run
"set > aaa" and check the variable SCRATCH in the file aaa). Compare it
with the output of df.
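
For example (a sketch; "aaa" is just the scratch file name used above):

   df -h              # compare free space on n2 and on a healthy node
   set > aaa          # dump all shell variables into the file aaa
   grep SCRATCH aaa   # where does WIEN2k write its scratch files?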


3) Just to be sure: you showed us top output for user ogando only. I hope
you really verified that there were no other users (in top on n2, press
"u" and give a blank answer to "Which user (blank for all)"). Top writes
"1 user", but there should be at least root, syslog, statd and so forth.
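
If interactive top is inconvenient, the same snapshot can be captured
non-interactively (a sketch using standard top batch options):

   top -b -n 1 > top_n2.txt   # one snapshot, all users, all processes
   head -40 top_n2.txt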


> We also have the first two nodes executing lapw0_mpi while the other
> two are executing lapw1c_mpi. Is this normal ?

I do not know; it looks suspicious but, IMHO, it is not connected with
the problem under discussion.


Best wishes
  Lyudmila Dobysheva


On 09/21/2015 02:51 PM, Luis Ogando wrote:
7) The mystery : two weeks ago, everything was working properly !!
 On Sep 18, 2015 8:58 PM, "Luis Ogando" wrote:
 I am using Wien2k in an SGI cluster with 32 nodes. My calculation is
 running in 4 nodes that have the same characteristics, and only my job
 is running in these 4 nodes.
 I noticed that one of these 4 nodes is spending more than 20 times the
 time spent by the other 3 nodes in the run_lapw execution.
 Could someone imagine a reason for this ? Any advice ?

--
Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
426001 Izhevsk, ul.Kirova 132
RUSSIA
--
Tel.:7(3412) 432045(office), 722529(Fax)
E-mail: l...@ftiudm.ru, lyuk...@mail.ru (office)
lyuk...@gmail.com (home)
Skype:  lyuka17 (home), lyuka18 (office)
http://ftiudm.ru/content/view/25/103/lang,english/
--
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-22 Thread Peter Blaha
Of course, at the "same time" ONLY lapw0_mpi OR lapw1_mpi should be
running.
However, I assume you did these "tops" sequentially, one after the
other??? And of course, in an scf cycle, after a few minutes running
lapw0, lapw1 will start ...

Do these tests in several windows in parallel.

The only suspicious info is the memory consumption. On the slow node you
see:

> Mem:  36176M total,  8820M used, 27355M free,
on the fast one:
> Mem:  36176M total, 36080M used,    96M free,

It may indicate that the slow node has a different configuration; in
particular, it does not seem to buffer I/O, ... but keeps only the
running programs (12 x 500 MB) in memory. The fast one uses "all" memory,
which is typically used by the operating system to hold various daemons
and buffers permanently in memory.
The latter behavior is what I normally see on my nodes and what should
be the default behavior.
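
The two memory configurations can be compared quickly and without root
(a sketch with standard Linux tools; run it on the slow node and on a
fast one):

   free -m   # total/used/free/buffers/cached, in MB
   grep -E 'MemTotal|MemFree|Buffers|Cached|Dirty' /proc/meminfo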




On 09/22/2015 10:08 PM, Luis Ogando wrote:

Trying to decrease the size of a previous message !!!

--
Dear Prof. Blaha and Marks,

Please, find below the "top" output for my calculation.
As you can see, there is a huge difference in CPU use for the r1i1n2
node (the problematic one). What could be the reason ? What can I do ?
   We also have the first two nodes executing lapw0_mpi while the other
two are executing lapw1c_mpi. Is this normal ?
Thank you again,
 Luis


r1i1n0

top - 17:41:29 up 11 days,  8:49,  2 users,  load average: 10.95, 4.99, 2.01
Tasks: 248 total,  13 running, 235 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.9%us,  0.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
  0.0%st
Mem:  36176M total,  8820M used, 27355M free,     0M buffers
Swap:     0M total,     0M used,     0M free,  7248M cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6670 ogando  20   0  517m  70m  14m R  100  0.2   2:22.27 lapw0_mpi
 6671 ogando  20   0  511m  71m  19m R  100  0.2   2:22.57 lapw0_mpi
 6672 ogando  20   0  512m  67m  15m R  100  0.2   2:22.26 lapw0_mpi
 6673 ogando  20   0  511m  69m  18m R  100  0.2   2:22.49 lapw0_mpi
 6674 ogando  20   0  511m  64m  13m R  100  0.2   2:22.69 lapw0_mpi
 6675 ogando  20   0  511m  67m  16m R  100  0.2   2:22.63 lapw0_mpi
 6676 ogando  20   0  511m  63m  12m R  100  0.2   2:22.24 lapw0_mpi
 6677 ogando  20   0  511m  62m  11m R  100  0.2   2:22.59 lapw0_mpi
 6679 ogando  20   0  511m  67m  16m R  100  0.2   2:22.20 lapw0_mpi
 6681 ogando  20   0  512m  62m  11m R  100  0.2   2:22.70 lapw0_mpi
 6678 ogando  20   0  511m  64m  13m R  100  0.2   2:22.64 lapw0_mpi
 6680 ogando  20   0  510m  62m  12m R  100  0.2   2:22.55 lapw0_mpi
  924 ogando  20   0 12916 1620  996 S    0  0.0   0:00.28 run_lapw
 6506 ogando  20   0 13024 1820  992 S    0  0.0   0:00.02 x
 6527 ogando  20   0 12740 1456  996 S    0  0.0   0:00.02 lapw0para
 6669 ogando  20   0 74180 3632 2236 S    0  0.0   0:00.09 mpirun
17182 ogando  20   0 13308 1892 1060 S    0  0.0   0:00.13 csh
17183 ogando  20   0 10364  656  396 S    0  0.0   0:00.40 pbs_demux
17203 ogando  20   0 12932 1720 1008 S    0  0.0   0:00.07 csh


r1i1n1

top - 17:40:46 up 12 days, 9 min,  2 users,  load average: 10.55, 4.34, 1.74
Tasks: 242 total,  13 running, 229 sleeping,   0 stopped,   0 zombie
Cpu(s): 100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
  0.0%st
Mem:  36176M total, 36080M used,    96M free,     0M buffers
Swap:     0M total,     0M used,     0M free, 34456M cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27446 ogando  20   0  516m  65m 9368 R  100  0.2   1:34.78 lapw0_mpi
27447 ogando  20   0  517m  66m 9432 R  100  0.2   1:35.16 lapw0_mpi
27448 ogando  20   0  516m  65m 9412 R  100  0.2   1:34.88 lapw0_mpi
27449 ogando  20   0  516m  65m 9464 R  100  0.2   1:33.37 lapw0_mpi
27450 ogando  20   0  515m  65m 9440 R  100  0.2   1:33.96 lapw0_mpi
27453 ogando  20   0  516m  65m 9480 R  100  0.2   1:35.44 lapw0_mpi
27454 ogando  20   0  515m  65m 9424 R  100  0.2   1:35.85 lapw0_mpi
27455 ogando  20   0  516m  65m 9452 R  100  0.2   1:34.47 lapw0_mpi
27456 ogando  20   0  516m  65m 9440 R  100  0.2   1:34.78 lapw0_mpi
27457 ogando  20   0  516m  65m 9420 R  100  0.2   1:30.90 lapw0_mpi
27451 ogando  20   0  517m  65m 9472 R  100  0.2   1:34.65 lapw0_mpi
27452 ogando  20   0  516m  65m 9436 R  100  0.2   1:33.63 lapw0_mpi
27445 ogando  20   0 67540 3336 2052 S    0  0.0   0:00.11 orted

r1i1n2

top - 17:42:30 up 221 days,  6:29,  1 user,  load average: 10.76, 9.59, 8.79
Tasks: 242 total,  13 running, 229 sleeping

[Wien] time difference among nodes

2015-09-22 Thread Luis Ogando
Trying to decrease the size of a previous message !!!

--
Dear Prof. Blaha and Marks,

   Please, find below the "top" output for my calculation.
   As you can see, there is a huge difference in CPU use for the r1i1n2
node (the problematic one). What could be the reason ? What can I do ?
  We also have the first two nodes executing lapw0_mpi while the other two
are executing lapw1c_mpi. Is this normal ?
   Thank you again,
Luis


r1i1n0

top - 17:41:29 up 11 days,  8:49,  2 users,  load average: 10.95, 4.99, 2.01
Tasks: 248 total,  13 running, 235 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.9%us,  0.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
 0.0%st
Mem:  36176M total,  8820M used, 27355M free,     0M buffers
Swap:     0M total,     0M used,     0M free,  7248M cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6670 ogando  20   0  517m  70m  14m R  100  0.2   2:22.27 lapw0_mpi
 6671 ogando  20   0  511m  71m  19m R  100  0.2   2:22.57 lapw0_mpi
 6672 ogando  20   0  512m  67m  15m R  100  0.2   2:22.26 lapw0_mpi
 6673 ogando  20   0  511m  69m  18m R  100  0.2   2:22.49 lapw0_mpi
 6674 ogando  20   0  511m  64m  13m R  100  0.2   2:22.69 lapw0_mpi
 6675 ogando  20   0  511m  67m  16m R  100  0.2   2:22.63 lapw0_mpi
 6676 ogando  20   0  511m  63m  12m R  100  0.2   2:22.24 lapw0_mpi
 6677 ogando  20   0  511m  62m  11m R  100  0.2   2:22.59 lapw0_mpi
 6679 ogando  20   0  511m  67m  16m R  100  0.2   2:22.20 lapw0_mpi
 6681 ogando  20   0  512m  62m  11m R  100  0.2   2:22.70 lapw0_mpi
 6678 ogando  20   0  511m  64m  13m R  100  0.2   2:22.64 lapw0_mpi
 6680 ogando  20   0  510m  62m  12m R  100  0.2   2:22.55 lapw0_mpi
  924 ogando  20   0 12916 1620  996 S    0  0.0   0:00.28 run_lapw
 6506 ogando  20   0 13024 1820  992 S    0  0.0   0:00.02 x
 6527 ogando  20   0 12740 1456  996 S    0  0.0   0:00.02 lapw0para
 6669 ogando  20   0 74180 3632 2236 S    0  0.0   0:00.09 mpirun
17182 ogando  20   0 13308 1892 1060 S    0  0.0   0:00.13 csh
17183 ogando  20   0 10364  656  396 S    0  0.0   0:00.40 pbs_demux
17203 ogando  20   0 12932 1720 1008 S    0  0.0   0:00.07 csh


r1i1n1

top - 17:40:46 up 12 days, 9 min,  2 users,  load average: 10.55, 4.34, 1.74
Tasks: 242 total,  13 running, 229 sleeping,   0 stopped,   0 zombie
Cpu(s): 100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
 0.0%st
Mem:  36176M total, 36080M used,    96M free,     0M buffers
Swap:     0M total,     0M used,     0M free, 34456M cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27446 ogando  20   0  516m  65m 9368 R  100  0.2   1:34.78 lapw0_mpi
27447 ogando  20   0  517m  66m 9432 R  100  0.2   1:35.16 lapw0_mpi
27448 ogando  20   0  516m  65m 9412 R  100  0.2   1:34.88 lapw0_mpi
27449 ogando  20   0  516m  65m 9464 R  100  0.2   1:33.37 lapw0_mpi
27450 ogando  20   0  515m  65m 9440 R  100  0.2   1:33.96 lapw0_mpi
27453 ogando  20   0  516m  65m 9480 R  100  0.2   1:35.44 lapw0_mpi
27454 ogando  20   0  515m  65m 9424 R  100  0.2   1:35.85 lapw0_mpi
27455 ogando  20   0  516m  65m 9452 R  100  0.2   1:34.47 lapw0_mpi
27456 ogando  20   0  516m  65m 9440 R  100  0.2   1:34.78 lapw0_mpi
27457 ogando  20   0  516m  65m 9420 R  100  0.2   1:30.90 lapw0_mpi
27451 ogando  20   0  517m  65m 9472 R  100  0.2   1:34.65 lapw0_mpi
27452 ogando  20   0  516m  65m 9436 R  100  0.2   1:33.63 lapw0_mpi
27445 ogando  20   0 67540 3336 2052 S    0  0.0   0:00.11 orted

r1i1n2

top - 17:42:30 up 221 days,  6:29,  1 user,  load average: 10.76, 9.59, 8.79
Tasks: 242 total,  13 running, 229 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.5%us,  0.1%sy,  0.0%ni, 92.4%id,  0.0%wa,  0.0%hi,  0.0%si,
 0.0%st
Mem:  36176M total, 31464M used,  4712M free,     0M buffers
Swap:     0M total,     0M used,     0M free, 10563M cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2096 ogando  20   0  927m 642m  20m R    9  1.8   0:09.30 lapw1c_mpi
 2109 ogando  20   0  926m 633m  17m R    9  1.8   0:14.58 lapw1c_mpi
 2122 ogando  20   0  924m 633m  19m R    9  1.8   0:09.65 lapw1c_mpi
 2124 ogando  20   0  922m 627m  15m R    9  1.7   0:06.72 lapw1c_mpi
 2108 ogando  20   0  927m 633m  17m R    8  1.8   0:09.04 lapw1c_mpi
 2110 ogando  20   0  926m 633m  17m R    8  1.7   0:09.01 lapw1c_mpi
 2111 ogando  20   0  924m 627m  13m R    8  1.7   0:14.56 lapw1c_mpi
 2095 ogando  20   0  930m 641m  17m R    8  1.8   0:09.32 lapw1c_mpi
 2121 ogando  20   0  927m 634m  17m R    8  1.8   0:06.76 lapw1c_mpi

Re: [Wien] time difference among nodes

2015-09-21 Thread Luis Ogando
Dear Professor Blaha,

   Thank you !
   My .machines file is OK.
   I will ask the administrator to follow your other suggestions (users do
not have privileges).
   All the best,
   Luis


2015-09-21 10:22 GMT-03:00 Peter Blaha :

> a) Check your .machines file.  Does it meet your expectations, or does
> this node have too large a load?
>
> b) Can you interactively log in to these nodes while your job is running ?
> If yes, log in on 2 nodes (in two windows) and run top
>
> c) If nothing obvious is wrong so far, test the network by doing some
> bigger copying from/to these nodes from your $home (or $scratch) to see if
> file-io is killing you.
>
>
> On 09/21/2015 02:51 PM, Luis Ogando wrote:
>
>> Dear Prof. Marks,
>>
>> Many thanks for your help.
>> The administrators said that everything is OK, the software is the
>> problem (the easy answer) : no zombies, no other jobs on the node, ... !!
>> Let me give you more information to see if you can imagine other
>> possibilities:
>>
>> 1) Intel Xeon Six Core 5680, 3.33GHz
>>
>> 2) Intel(R) Fortran/CC/OpenMPI Intel(R) 64 Compiler XE for applications
>> running on Intel(R) 64, Version 12.1.1.256 Build 20111011
>>
>> 3) OpenMPI 1.6.5
>>
>> 4) PBS Pro 11.0.2
>>
>> 5) OpenMPI built using  --with-tm  due to prohibited ssh among nodes  (
>> http://www.open-mpi.org/faq/?category=building#build-rte-tm )
>>
>> 6) Wien2k 14.2
>>
>> 7) The mystery : two weeks ago, everything was working properly !!
>>
>> Many thanks again !
>> All the best,
>> Luis
>>
>> 2015-09-18 23:24 GMT-03:00 Laurence Marks <laurence.ma...@gmail.com>:
>>
>> Almost certainly one or more of:
>> * Other jobs on the node
>> * Zombie process(es)
>> * Too many mpi
>> * Bad memory
>> * Full disc
>> * Too hot
>>
>> If you have it, use ganglia; if not, ssh in and use top/ps or whatever
>> SGI has. If you cannot sudo, get help from someone who can.
>>
>> On Sep 18, 2015 8:58 PM, "Luis Ogando" <lcoda...@gmail.com> wrote:
>>
>> Dear Wien2k community,
>>
>> I am using Wien2k in an SGI cluster with 32 nodes. My
>> calculation is running in 4 nodes that have the same
>> characteristics and only my job is running in these 4 nodes.
>> I noticed that one of these 4 nodes is spending more than 20
>> times the time spent by the other 3 nodes in the run_lapw
>> execution.
>> Could someone imagine a reason for this ? Any advice ?
>> All the best,
>>  Luis
>>
>>
>> ___
>> Wien mailing list
>> Wien@zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> SEARCH the MAILING-LIST at:
>>
>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>
>>
>>
>>
>> ___
>> Wien mailing list
>> Wien@zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> SEARCH the MAILING-LIST at:
>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>
>>
> --
>
>   P.Blaha
> --
> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
> Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
> Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
> WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
> --
>
> ___
> Wien mailing list
> Wien@zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-21 Thread Peter Blaha
a) Check your .machines file.  Does it meet your expectations, or does
this node have too large a load?
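
For comparison, a minimal sketch of a .machines file for four 12-core
nodes as in this thread (the node names, the k-point split and the core
counts are only illustrative; see the WIEN2k user's guide for the exact
syntax that matches your setup):

   # lapw0 over all four nodes, one k-point-parallel job per node
   lapw0: r1i1n0:12 r1i1n1:12 r1i1n2:12 r1i1n3:12
   1: r1i1n0:12
   1: r1i1n1:12
   1: r1i1n2:12
   1: r1i1n3:12
   granularity:1
   extrafine:1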


b) Can you interactively log in to these nodes while your job is running ?
If yes, log in on 2 nodes (in two windows) and run top

c) If nothing obvious is wrong so far, test the network by doing some
bigger copying from/to these nodes and your $home (or $scratch) to see
if file I/O is killing you.
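
For instance (a sketch; the file size and the paths are only
illustrative):

   dd if=/dev/zero of=$SCRATCH/ddtest bs=1M count=1024   # ~1 GB test file
   time cp $SCRATCH/ddtest $HOME/ddtest                  # node -> $home
   time cp $HOME/ddtest $SCRATCH/ddtest.back             # $home -> node
   rm -f $SCRATCH/ddtest $SCRATCH/ddtest.back $HOME/ddtest

Run it on the slow node and on a fast one; a large difference in the copy
times points to the network or the file server rather than the CPUs.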



On 09/21/2015 02:51 PM, Luis Ogando wrote:

Dear Prof. Marks,

Many thanks for your help.
The administrators said that everything is OK, the software is the
problem (the easy answer) : no zombies, no other jobs on the node, ... !!
Let me give you more information to see if you can imagine other
possibilities:

1) Intel Xeon Six Core 5680, 3.33GHz

2) Intel(R) Fortran/CC/OpenMPI Intel(R) 64 Compiler XE for applications
running on Intel(R) 64, Version 12.1.1.256 Build 20111011

3) OpenMPI 1.6.5

4) PBS Pro 11.0.2

5) OpenMPI built using  --with-tm  due to prohibited ssh among nodes  (
http://www.open-mpi.org/faq/?category=building#build-rte-tm )

6) Wien2k 14.2

7) The mystery : two weeks ago, everything was working properly !!

Many thanks again !
All the best,
Luis

2015-09-18 23:24 GMT-03:00 Laurence Marks <laurence.ma...@gmail.com>:

Almost certainly one or more of:
* Other jobs on the node
* Zombie process(es)
* Too many mpi
* Bad memory
* Full disc
* Too hot

If you have it, use ganglia; if not, ssh in and use top/ps or whatever
SGI has. If you cannot sudo, get help from someone who can.

On Sep 18, 2015 8:58 PM, "Luis Ogando" <lcoda...@gmail.com> wrote:

Dear Wien2k community,

I am using Wien2k in an SGI cluster with 32 nodes. My
calculation is running in 4 nodes that have the same
characteristics and only my job is running in these 4 nodes.
I noticed that one of these 4 nodes is spending more than 20
times the time spent by the other 3 nodes in the run_lapw execution.
Could someone imagine a reason for this ? Any advice ?
All the best,
 Luis


___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at 
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html




___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html



--

  P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-21 Thread Luis Ogando
Dear Prof. Marks,

   Many thanks for your help.
   The administrators said that everything is OK, the software is the
problem (the easy answer) : no zombies, no other jobs on the node, ... !!
   Let me give you more information to see if you can imagine other
possibilities:

1) Intel Xeon Six Core 5680, 3.33GHz

2) Intel(R) Fortran/CC/OpenMPI Intel(R) 64 Compiler XE for applications
running on Intel(R) 64, Version 12.1.1.256 Build 20111011

3) OpenMPI 1.6.5

4) PBS Pro 11.0.2

5) OpenMPI built using  --with-tm  because ssh among nodes is prohibited
(see the build sketch after this list;
http://www.open-mpi.org/faq/?category=building#build-rte-tm )

6) Wien2k 14.2

7) The mystery : two weeks ago, everything was working properly !!
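
For reference, a sketch of such a build (the PBS installation prefix
/opt/pbs is only an assumption; adjust it to your cluster):

   ./configure --with-tm=/opt/pbs --prefix=$HOME/openmpi-1.6.5
   make -j4 && make install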

   Many thanks again !
   All the best,
   Luis

2015-09-18 23:24 GMT-03:00 Laurence Marks :

> Almost certainly one or more of:
> * Other jobs on the node
> * Zombie process(es)
> * Too many mpi
> * Bad memory
> * Full disc
> * Too hot
>
> If you have it, use ganglia; if not, ssh in and use top/ps or whatever SGI
> has. If you cannot sudo, get help from someone who can.
> On Sep 18, 2015 8:58 PM, "Luis Ogando"  wrote:
>
>> Dear Wien2k community,
>>
>> I am using Wien2k in an SGI cluster with 32 nodes. My calculation is
>> running in 4 nodes that have the same characteristics and only my job is
>> running in these 4 nodes.
>>I noticed that one of these 4 nodes is spending more than 20 times the
>> time spent by the other 3 nodes in the run_lapw execution.
>>Could someone imagine a reason for this ? Any advice ?
>>All the best,
>> Luis
>>
>>
>
> ___
> Wien mailing list
> Wien@zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>
>
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] time difference among nodes

2015-09-18 Thread Laurence Marks
Almost certainly one or more of:
* Other jobs on the node
* Zombie process(es)
* Too many mpi
* Bad memory
* Full disc
* Too hot

If you have it, use ganglia; if not, ssh in and use top/ps or whatever SGI
has. If you cannot sudo, get help from someone who can.
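
Without sudo, a first pass over most of these points is still possible
from a normal account (a sketch with standard tools):

   uptime                        # load: other jobs or too many mpi?
   ps aux --sort=-%cpu | head    # biggest CPU users; zombies show state Z
   free -m                       # memory pressure
   df -h                         # full disc?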
On Sep 18, 2015 8:58 PM, "Luis Ogando"  wrote:

> Dear Wien2k community,
>
> I am using Wien2k in an SGI cluster with 32 nodes. My calculation is
> running in 4 nodes that have the same characteristics and only my job is
> running in these 4 nodes.
>I noticed that one of these 4 nodes is spending more than 20 times the
> time spent by the other 3 nodes in the run_lapw execution.
>Could someone imagine a reason for this ? Any advice ?
>All the best,
> Luis
>
>
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


[Wien] time difference among nodes

2015-09-18 Thread Luis Ogando
Dear Wien2k community,

   I am using Wien2k in an SGI cluster with 32 nodes. My calculation is
running in 4 nodes that have the same characteristics and only my job is
running in these 4 nodes.
   I noticed that one of these 4 nodes is spending more than 20 times the
time spent by the other 3 nodes in the run_lapw execution.
   Could someone imagine a reason for this ? Any advice ?
   All the best,
Luis
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html