[gridengine users] Random Crash when launching to an empty queue

2014-01-27 Thread thomas . forde
I have a strange problem that occurs at random: when I launch many jobs from the same project into an empty queue system, the first 4-5 jobs always crash with random errors, most of them about not being able to read a file. If I launch jobs manually everything goes fine, if I

Re: [gridengine users] Reserve cluster for large job

2014-01-27 Thread Joe Borġ
Hi Reuti, They're pretty mammoth cluster nodes. But anyway, the load is set to 99 to stop it from rejecting jobs based on load (in this test queue). We're not currently using the checkpoint feature either. Regards, Joseph David Borġ josephb.org On 23 January 2014 10:11, Reuti

Re: [gridengine users] Random Crash when launching to an empty queue

2014-01-27 Thread thomas . forde
That's right, it's the jobs that are crashing. I have a Python script that generates the jobs and sends them into the queue with qsub. I have been testing and digging for a while now, and it seems that when I launch multiple jobs under the same project, they read from the same _mesh folder, and

[gridengine users] Using modules form the compute nodes

2014-01-27 Thread Txema Heredia
Hi all, I have been trying to use modulefiles from my compute nodes to no avail. When a job starts, the modulecmd command is in the path, but the module function is nowhere to be found. I have tried to add calls to /etc/profile.d/modules.sh in both /etc/bashrc and ~/.bashrc, and even
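The symptom described above (modulecmd on the path, but no module function) can be guarded against inside the job script itself. The following is only a minimal sketch, assuming the init script lives at the /etc/profile.d/modules.sh path mentioned in the message; MODULES_INIT and ensure_module_function are hypothetical names introduced for illustration and are not part of Grid Engine or Environment Modules.

```shell
#!/bin/sh
# Hypothetical job-script preamble (a sketch, not an official recipe).
# MODULES_INIT is an assumed variable; its default is the path quoted in the thread.
MODULES_INIT="${MODULES_INIT:-/etc/profile.d/modules.sh}"

# ensure_module_function: succeed if a `module` command or function is
# available, sourcing the init script first when it is not.
ensure_module_function() {
    # Already defined in this shell: nothing to do.
    if command -v module >/dev/null 2>&1; then
        return 0
    fi
    # Otherwise source the init script, if it is readable.
    if [ -r "$MODULES_INIT" ]; then
        . "$MODULES_INIT"
    fi
    # Report whether `module` is available now.
    command -v module >/dev/null 2>&1
}

# Typical use near the top of a job script (left commented out here):
# ensure_module_function || { echo "module function unavailable" >&2; exit 1; }
```

Sourcing the init script from the job itself avoids depending on which startup files the job's shell happens to read on the compute node.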

Re: [gridengine users] Using modules form the compute nodes

2014-01-27 Thread Reuti
Hi, On 27.01.2014 at 17:26, Txema Heredia wrote: I have been trying to use modulefiles from my compute nodes to no avail. When a job starts, the modulecmd command is in the path, but the module function is nowhere to be found. I have tried to add calls to /etc/profile.d/modules.sh in

Re: [gridengine users] Using modules form the compute nodes

2014-01-27 Thread bergman
In the message dated: Mon, 27 Jan 2014 17:50:58 +0100, The pithy ruminations from Reuti on Re: [gridengine users] Using modules form the compute nodes were: = Hi, = = On 27.01.2014 at 17:26, Txema Heredia wrote: = = I have been trying to use modulefiles from my compute nodes to no avail.

Re: [gridengine users] Using modules form the compute nodes

2014-01-27 Thread Ed Lauzier
Eventually this will be fixed. qsub -cwd -v module should work; qsub -cwd -V . does not transfer functions. Maybe Univa will fix it, if not already. -Ed -Original Message- From: berg...@merctech.com [mailto:berg...@merctech.com] Sent: Monday, January 27,

[gridengine users] soge 8.1.6 - on a standalone aws ec2 instance with dhcp

2014-01-27 Thread Ed Lauzier
Hi, Does anyone have experience with AWS EC2 and gridengine for a standalone instance running GE with DHCP enabled? Is there a way to configure the AMI so that if the IP address and hostname change it will not affect the instance of GE? Ex: 16 cores, 30 GB RAM with 200 GB EBS storage, and where the ip

Re: [gridengine users] soge 8.1.6 - on a standalone aws ec2 instance with dhcp

2014-01-27 Thread Chi Chan
There is nothing that special about AWS EC2. What you get is a Xen VM on a standard DHCP network. You can set the localhost hostname using standard methods like /etc/hosts when your VM boots. --Chi On Mon, Jan 27, 2014 at 3:24 PM, Ed Lauzier elauzi...@perlstar.com wrote: Hi, Anyone have

Re: [gridengine users] soge 8.1.6 - on a standalone aws ec2 instance with dhcp

2014-01-27 Thread Ed Lauzier
I try to avoid using the loopback address for the hostname, but it may be OK for a standalone instance (like Fedora desktop systems used to do, and maybe still do). I'll give it a shot and see how it works. I also use X11 graphical tools in the instance (RStudio, R plots, etc.)

Re: [gridengine users] soge 8.1.6 - on a standalone aws ec2 instance with dhcp

2014-01-27 Thread Bill Bryce
Grid Engine certainly doesn't like it when you use the loopback 127.0.0.1 for the hostname, if that is what you were intending to do. Regards, Bill. On Jan 27, 2014, at 4:33 PM, Ed Lauzier elauzi...@perlstar.com wrote: I try to avoid using the loopback address for hostname, but may be ok

Re: [gridengine users] soge 8.1.6 - on a standalone aws ec2 instance with dhcp

2014-01-27 Thread Ed Lauzier
Update: Finally coaxed it to work after a bit of cleanup. Put the desired hostname on the loopback address; example: 127.0.0.1 localhost.localdomain localhost myhostname.mydomain.com myhostname; act_qmaster set to the desired hostname; cleaned up queue, admin, and submit host entries; set hostname
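Since the thread turns on whether the Grid Engine hostname ends up on the loopback address, a small check against a hosts-format file may help when reproducing this setup. This is a sketch only: hosts_address and is_loopback are hypothetical helper names, and the file and hostname are passed in as arguments rather than assumed.

```shell
#!/bin/sh
# hosts_address FILE NAME: print the address of the first entry in a
# hosts-format FILE that lists NAME (hypothetical helper, for illustration).
hosts_address() {
    awk -v name="$2" '
        $1 ~ /^#/ { next }                   # skip whole-line comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # stop at a trailing comment
                if ($i == name) { print $1; exit }
            }
        }
    ' "$1"
}

# is_loopback ADDRESS: succeed if ADDRESS is an IPv4 or IPv6 loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (left commented out; myhostname is the placeholder from the thread):
# addr="$(hosts_address /etc/hosts myhostname)"
# is_loopback "$addr" && echo "myhostname maps to loopback address $addr"
```

The check reads only the given file, so it does not reflect DNS or other name services that may also be consulted on the host.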

Re: [gridengine users] Using modules form the compute nodes

2014-01-27 Thread Txema Heredia
On 27/01/14 17:50, Reuti wrote: Hi, On 27.01.2014 at 17:26, Txema Heredia wrote: I have been trying to use modulefiles from my compute nodes to no avail. When a job starts, the modulecmd command is in the path, but the module function is nowhere to be found. I have tried to add calls