Re: [gridengine users] Negative complex values

2015-06-08 Thread Simon Andrews
Thanks - I'll try doing it that way and hopefully we can get everything stable again. Thanks to everyone for the quick reply - and a fix! Simon.
On 08/06/2015 18:08, "Alex Chekholko" wrote:
> IIRC the JOB vs YES for h_vmem for parallel vs batch is a "known issue".
>
> So we modified our environment

Re: [gridengine users] command runs in grid engine but does not complete.

2015-06-08 Thread Alex Chekholko
What was the "grid reconfiguration"? On 06/08/2015 11:42 AM, Dan Hyatt wrote: We are running a binary program called metaanalysis, which the user says was working prior to a grid reconfiguration. qsub -cwd -b y /dsg_cent/bin/metal < c22srcfile.txt > c22SBP.log This starts, runs, creates the

Re: [gridengine users] command runs in grid engine but does not complete.

2015-06-08 Thread Chris Dagdigian
The most common scenario when "it works from the command line" but "it does not work in grid engine" is usually:
- Different shell environment between command-line and batch execution (especially if SGE is running in POSIX_COMPLIANT mode)
- Different ENV variables between CLI and batch environme
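One way to check the first two causes above is to capture the output of `env` once from an interactive shell and once from inside a batch job, then compare the two. The following is a minimal Python sketch under that assumption; `parse_env_dump` and `diff_env` are hypothetical helper names, the sample values are invented for illustration, and values that span several lines are not handled.

```python
def parse_env_dump(text):
    """Parse `env`-style output (one NAME=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '=' at all
            env[name] = value
    return env


def diff_env(cli_env, batch_env):
    """Return {name: (cli_value, batch_value)} for every variable that differs.

    A value of None means the variable is missing on that side.
    """
    diffs = {}
    for name in sorted(set(cli_env) | set(batch_env)):
        cli_value = cli_env.get(name)
        batch_value = batch_env.get(name)
        if cli_value != batch_value:
            diffs[name] = (cli_value, batch_value)
    return diffs


# Invented sample dumps; in practice these would be read from two captured files.
cli_dump = "PATH=/usr/local/bin:/usr/bin\nSHELL=/bin/bash\nLD_LIBRARY_PATH=/opt/lib\n"
batch_dump = "PATH=/usr/bin\nSHELL=/bin/sh\n"

for name, (cli_value, batch_value) in diff_env(
    parse_env_dump(cli_dump), parse_env_dump(batch_dump)
).items():
    print(f"{name}: cli={cli_value!r} batch={batch_value!r}")
```

Any variable reported here (a shorter PATH, a missing library path, a different SHELL) is a candidate explanation for a program that behaves differently under the scheduler.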

[gridengine users] command runs in grid engine but does not complete.

2015-06-08 Thread Dan Hyatt
We are running a binary program called metaanalysis, which the user says was working prior to a grid reconfiguration.
qsub -cwd -b y /dsg_cent/bin/metal < c22srcfile.txt > c22SBP.log
This starts, runs, creates the logs, and then fails to create the data files
qsub -cwd -b y /dsg_cent/bin/me

Re: [gridengine users] qlogin + X11 + pam_sge_authorize ?

2015-06-08 Thread Alex Chekholko
On 06/08/2015 12:49 AM, William Hay wrote: On Fri, 5 Jun 2015 22:16:12 + Alex Chekholko wrote: Hi all, I have a standard grid engine cluster (sge-8.1.8 tarball from Dave Love's site) where users use qlogin to get interactive shells on compute nodes, and we use a qlogin wrapper script to e

Re: [gridengine users] Negative complex values

2015-06-08 Thread Alex Chekholko
IIRC the JOB vs YES for h_vmem for parallel vs batch is a "known issue".
So we modified our environment back to "YES" instead of "JOB" and adjusted the rest of the environment appropriately, e.g. your jsv.
So our h_vmem requests are all "per slot".
On 6/8/15 9:00 AM, Simon Andrews wrote: Tha

Re: [gridengine users] Negative complex values

2015-06-08 Thread Simon Andrews
Thanks for replying! Am I reading that right, that if the resource is allocated per job then it doesn't actually need to be available? If that's the case, what is the correct way to set up a job level resource which we can use for scheduling? I suppose I could change the resource to be slot leve

Re: [gridengine users] Negative complex values

2015-06-08 Thread Feng Zhang
Hi Simon, As you defined the h_vmem as "JOB", according to the manual: " A consumable defined by 'y' is a per slot consumables which means the limit is multiplied by the number of slots being used by the job before being applied. In case of 'j' the consumable is a per jo
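The difference between the two consumable settings quoted above can be illustrated with a small arithmetic sketch in Python. It only models the rule as stated in the quote (a per-slot limit is multiplied by the number of slots, a per-job one is not); `debited_amount` is a hypothetical name, and this is not how Grid Engine itself is implemented.

```python
def debited_amount(request_gb, slots, consumable):
    """Total amount counted against the resource for one job.

    consumable: "YES" (per slot) or "JOB" (per job), case-insensitive.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    mode = consumable.upper()
    if mode == "YES":
        # Per-slot consumable: the request is multiplied by the slot count.
        return request_gb * slots
    if mode == "JOB":
        # Per-job consumable: the request is counted once, whatever the slot count.
        return request_gb
    raise ValueError(f"unknown consumable setting: {consumable!r}")


# An 8-slot parallel job requesting h_vmem=4G (illustrative numbers only).
print(debited_amount(4, 8, "YES"))
print(debited_amount(4, 8, "JOB"))
```

Under a per-slot setting, a user who wants a fixed total for the whole job therefore has to divide that total by the slot count before making the request.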

Re: [gridengine users] Negative complex values

2015-06-08 Thread Simon Andrews
Having done a bit of investigation it seems that the problem we're hitting is that our h_vmem limits aren't being respected if the jobs are being submitted as parallel jobs. If I put two jobs in:
$ qsub -o test.log -l h_vmem=1000G hostname
Your job 343719 ("hostname") has been submitted
$ qsub

[gridengine users] Negative complex values

2015-06-08 Thread Simon Andrews
Our cluster seems to have ended up in a strange state, and I don't understand why. We have set up h_vmem to be a consumable resource so that users can't exhaust the memory on any compute node. This has been working OK and in our tests it all seemed to be right, but we've now found that somehow

Re: [gridengine users] qlogin + X11 + pam_sge_authorize ?

2015-06-08 Thread William Hay
On Fri, 5 Jun 2015 22:16:12 + Alex Chekholko wrote:
> Hi all,
>
> I have a standard grid engine cluster (sge-8.1.8 tarball from Dave Love's site) where users use qlogin to get interactive shells on compute nodes, and we use a qlogin wrapper script to enable X11 forwarding, by using