Hi Richard,

If I were you, I would double-check the following items.

1. /etc/hosts, ssh keys, nagios/nrpe, and gmetad/gmond are all in sync across
all the nodes.
2. Make sure that the root user can ssh between all the nodes, in both
directions, without a password.
3. All the job-submission daemons are running on the right nodes:
    (torque-server and torque-mom on the head node, torque-mom on the client
nodes, and maui on the head node).
    I assume that you are using Torque as the resource manager (RM) and Maui
as the scheduler.
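If it helps, here is a rough sketch of how the three checks above could be scripted from the head node. It is only an illustration under some assumptions: the script name check_cluster.py is hypothetical, it assumes Python 3.7+ on the machine running it, it treats the head node's own /etc/hosts as the reference copy, and it assumes the standard process names pbs_server and pbs_mom for Torque and maui for Maui. It does not look at nagios/nrpe or gmetad/gmond, and it only tests ssh from the machine it runs on, so to cover "both directions" you would run it on every node.

```python
#!/usr/bin/env python3
"""Sketch (hypothetical check_cluster.py): sanity-check nodes after an IP change."""
import subprocess
import sys
from typing import Dict, List, Optional, Tuple


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every hostname and alias in /etc/hosts-style text to its IP address.

    If a name appears on more than one line, the later line wins.
    """
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def hosts_mismatches(reference: str, other: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Names whose IP in `other` is different from, or missing relative to, `reference`."""
    ref, oth = parse_hosts(reference), parse_hosts(other)
    return {n: (ip, oth.get(n)) for n, ip in ref.items() if oth.get(n) != ip}


def run_on(node: str, command: List[str], timeout: int = 10) -> Tuple[int, str]:
    """Run `command` on `node` over ssh and return (exit code, stdout).

    BatchMode=yes makes ssh fail instead of prompting for a password, and a
    changed host key (common after an IP change) also makes it fail.
    """
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", node] + command
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return 255, ""
    return result.returncode, result.stdout


def check_node(node: str, reference_hosts: str, daemons: List[str]) -> List[str]:
    """Return the problems found on one node (an empty list means all checks passed)."""
    rc, _ = run_on(node, ["true"])
    if rc != 0:
        return [f"{node}: passwordless root ssh failed (exit {rc})"]
    problems = []
    _, remote_hosts = run_on(node, ["cat", "/etc/hosts"])
    for name, (want, got) in hosts_mismatches(reference_hosts, remote_hosts).items():
        problems.append(f"{node}: /etc/hosts maps {name} to {got}, expected {want}")
    for daemon in daemons:
        rc, _ = run_on(node, ["pgrep", "-x", daemon])  # exact process-name match
        if rc != 0:
            problems.append(f"{node}: {daemon} is not running")
    return problems


def main(argv: List[str]) -> int:
    if len(argv) < 2:
        print("usage: check_cluster.py HEADNODE [CLIENT ...]", file=sys.stderr)
        return 2
    head, clients = argv[1], argv[2:]
    with open("/etc/hosts") as f:
        reference = f.read()  # the local copy is treated as the reference
    # Assumed process names: pbs_server/pbs_mom (Torque) and maui (Maui).
    problems = check_node(head, reference, ["pbs_server", "pbs_mom", "maui"])
    for node in clients:
        problems += check_node(node, reference, ["pbs_mom"])
    print("\n".join(problems) or "all checks passed")
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, `python3 check_cluster.py oscar_server node1 node2` would print one line per problem found, or "all checks passed". A stale /etc/hosts entry or a host key that still refers to the old address shows up as a mismatch or an ssh failure.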

Regards,

--
- DongInn



> On May 30, 2016, at 7:25 PM, Richard Young <richard.yo...@usq.edu.au> wrote:
> 
> I was hoping somebody would be able to help me with the following problem.
> 
> Recently I applied updates and did some reconfiguration on a RHEL6.8
> cluster running Oscar. The major change was to the IP address of the
> oscar_server, which was required because of changes to the network structure.
> The new IP address has been applied to /etc/hosts, ssh keys, nagios/nrpe,
> gmetad/gmond, etc. However, I have missed something, because no jobs will now
> run on the cluster. The jobs basically sit in the queue and then get cancelled
> because they have hit their walltime.
> 
> Has anybody come across this problem before, and could anyone supply some
> insight into how to fix it?
> 
> Thanks
> 
> ---------------------------------------------------------------------
> Richard A. Young
> ICT Services
> HPC Systems Engineer
> University of Southern Queensland
> Toowoomba, Queensland 4350
> Australia
> Email: richard.yo...@usq.edu.au   Phone: (07) 46315557
> Mob:   0437544370          Fax:   (07) 46312798
> ---------------------------------------------------------------------
> 
> 
> 
> _____________________________________________________________
> This email (including any attached files) is confidential and is for the 
> intended recipient(s) only. If you received this email by mistake, please, as 
> a courtesy, tell the sender, then delete this email.
> 
> The views and opinions are the originator's and do not necessarily reflect 
> those of the University of Southern Queensland. Although all reasonable 
> precautions were taken to ensure that this email contained no viruses at the 
> time it was sent we accept no liability for any losses arising from its 
> receipt.
> 
> The University of Southern Queensland is a registered provider of education 
> with the Australian Government.
> (CRICOS Institution Code QLD 00244B / NSW 02225M, TEQSA PRV12081 )
> 
> 
> _______________________________________________
> Oscar-users mailing list
> Oscar-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oscar-users


