Re: [gridengine users] OGS status

2015-11-10 Thread Rayson Ho
On Tue, Nov 10, 2015 at 7:08 AM, Taras Shapovalov wrote: > Hi guys, > > Can we state now that OGS has not been developed any more (already 4 year) > and we should not recommend our customers to install it? The last release was a bug fix version GAed in 2012, so it was released 2-3 years ago. Ron

Re: [gridengine users] Project status

2014-03-18 Thread Rayson Ho
Hi Joshua, We are still developing new features for our users, especially new features from consulting projects. We are just too busy to integrate all of them into the open source branch. If you look at our feature list, you will find that we were the first to develop the ARM Linux port, the cgroups

Re: [gridengine users] Grid on beaglebone

2014-02-13 Thread Rayson Ho
In 2011, we released OGS/GE 2011.11, which includes the ARM port. We tried it on a Raspberry Pi-like platform, but any ARM Linux platform is supported. The code was then copied & referenced by other GE forks as well, so just use any recent version of Grid Engine, and you can compile & run your fav

Re: [gridengine users] Compatibility of 6.2u4 and 6.2u5?

2014-02-04 Thread Rayson Ho
On Tue, Feb 4, 2014 at 6:40 PM, Simon Matthews wrote: > I found binaries for 6.2u5 both in the form of a tarfile and packed in rpms > (in the EPEL repository for CentOS). However, if I try to use qstat from > these against my 6.2u4 qmaster, I get a timeout message: > error: failed receiving gdi re

Re: [gridengine users] Submitting a script vs a binary

2014-01-23 Thread Rayson Ho
There is slightly less overhead to "submit" a binary, because there is no need to spool the job script. Also, we used to tell our users that they can change the job scripts after submission, and that's one small advantage of submitting job scripts. However, in most use cases users just need t

Re: [gridengine users] cygwin support by scalable logic

2013-06-26 Thread Rayson Ho
We still need more testing for 64-bit Windows. Currently, 32-bit works for us. Rayson On Wed, Jun 26, 2013 at 11:16 AM, Raghavendra wrote: > Hi, > > I found that scalable logic guys have implemented support of cygwin from > the link bellow: > > http://blogs.scalablelogic.com/2012/06/grid-engi

Re: [gridengine users] Scalable Logic

2013-01-31 Thread Rayson Ho
Hi, Sorry we missed your emails -- we grew from just 1 business unit (we used to only have Grid Engine support) to 2 business units now. We now have:
- Grid Engine support
- HPC Cloud services
And we route our customers & users accordingly. Looks like your email address did not belong to any of the

[gridengine users] Cluster Data Management

2012-12-19 Thread Rayson Ho
Somewhat related to the email message below: I am going to play with a StoragePod 2.0 for the next project in 2013... The machine uses off-the-shelf components like motherboard, CPU, 45 disks, memory, power supply... and the real intellectual property that was released by Backblaze is the 4U custom

[gridengine users] A 10,000-node Grid Engine Cluster in Amazon EC2

2012-12-19 Thread Rayson Ho
...2012 at 10:52 AM, Rayson Ho wrote: > > This year, we tested the scalability of Open Grid Scheduler / Grid > Engine on the cloud -- we ran a 10,000-node cluster on EC2 (we could > have used Gompute's hardware but obviously there are more important > workloads in

Re: [gridengine users] will changes to a hard limit in a queue config roll down into running jobs?

2012-11-15 Thread Rayson Ho
I just checked the source: "JB_hard_wallclock_gmt" (an execd-internal data structure that is first set when the job starts running, with its value taken from h_rt) is only handled by the execd. So if the execd is not running (just do what Reuti mentioned in his email below), then the job *shoul

[gridengine users] SC12 & HPC in the Cloud

2012-11-15 Thread Rayson Ho
For those at SC12 who are interested in running your HPC jobs in the Cloud, or just open source Grid Engine in general, please visit the Gompute booth (#3436): http://www.gompute.com/-/sc12-nov-10-16-salt-lake-city-usa This year, we tested the scalability of Open Grid Scheduler / Grid Engine on t

Re: [gridengine users] ulimits not taking in GE

2012-11-14 Thread Rayson Ho
Joseph, You need to set S_DESCRIPTORS, H_DESCRIPTORS with the "execd_params" option in sge_conf: http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_conf.html Example: H_DESCRIPTORS=1 Rayson On Wed, Nov 14, 2012 at 1:51 PM, Joseph Farran wrote: > Hi All. > > I increased our ulimit

Re: [gridengine users] disaster recovery of grid engine setup..

2012-11-06 Thread Rayson Ho
On Tue, Nov 6, 2012 at 5:47 AM, Paul Simpson wrote: > The child process does not get killed which leaves an over > stressed machine which leads to knock on errors. Are these parallel (MPI?) jobs? > From reading this list, we > are not alone in suffering from this. Can anyone shred light on this

Re: [gridengine users] New to OGE. Installation questions

2012-11-02 Thread Rayson Ho
Hi Arnau, Thanks for using OGE (the official name for it is OGS/GE, but a few people call it OGE as well, so we don't really have a preference). The SRPM build does not use the version of jemalloc shipped with the source (in fact, it also does not use the hwloc library included), but instead uses

Re: [gridengine users] Q: how to set up a job in which multiple executables run simultaneously? Also: Open MP environment variables honored?

2012-10-26 Thread Rayson Ho
On Fri, Oct 26, 2012 at 1:55 PM, Wagner, Justin wrote: > while (!foo_done_file){}; > > while (!bar_done_file){}; > > while (!baz_done_file){}; Or use bash's built-in wait, like this: http://stackoverflow.com/questions/356100/how-to-wait-in-bash-for-several-subprocesses-to-finish-and-return-exit-
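The same idea (start all the programs in the background, then block until every one has finished, instead of spinning in `while (!..._done_file)` loops) can be sketched in Python with the standard subprocess module. This is only an illustration of the wait pattern, not a Grid Engine feature; the commands below are placeholders standing in for the real foo/bar/baz executables.

```python
import subprocess
import sys


def run_all(commands):
    """Start every command at once, then wait until all of them have exited.

    Returns the exit codes in the same order as `commands`. This mirrors
    `foo & bar & baz & wait` in bash: no busy-waiting on "done" files.
    """
    # Launch everything first so the programs run simultaneously.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks without burning CPU until that child exits.
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Placeholder commands; replace them with the real executables.
    cmds = [
        [sys.executable, "-c", "import time; time.sleep(0.2)"],
        [sys.executable, "-c", "raise SystemExit(3)"],
        [sys.executable, "-c", "print('baz done')"],
    ]
    print(run_all(cmds))
```

Checking the returned exit codes also tells the job script which of the simultaneous programs failed, something the done-file loops cannot.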

Re: [gridengine users] h_vmem negative values?

2012-10-18 Thread Rayson Ho
Alex, Can you run qhost and see if the memory value is also negative?? If it is, then this bug has been fixed in every release of OGS/GE. Rayson On Thu, Oct 18, 2012 at 6:53 PM, Alex Chekholko wrote: > Hi, > > Running Rocks 6, so whatever GE version is included there. > > h_vmem is set consumab

Re: [gridengine users] setting environment variables in prolog?

2012-10-09 Thread Rayson Ho
The prolog runs as a separate process, so any environment variables it sets are discarded when the prolog is done. To implement what you need, you can look at the starter method (the starter_method queue parameter)... Rayson On Tue, Oct 9, 2012 at 6:40 PM, Orion Poplawski wrote: > Is there a way to set environment variables in a prolog script that persi
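A small Python sketch of why this happens, and why a wrapper that starts the job itself behaves differently. `child_sets_env` stands in for a prolog: a separate process whose environment changes vanish when it exits. `run_like_starter` stands in for the starter-method idea: the variable is set first and the job is then launched from that environment, so it inherits the value. Both function names are hypothetical, and how a real starter_method receives the job command should be checked against queue_conf(5); a real starter would normally exec() the job rather than run it as a child.

```python
import os
import subprocess
import sys


def child_sets_env(var, value):
    """Run a separate process that sets `var` in its own environment and exits.

    Like a prolog, it cannot change the environment of anything outside itself,
    so the parent's view of `var` is unchanged afterwards.
    """
    code = f"import os; os.environ[{var!r}] = {value!r}"
    subprocess.run([sys.executable, "-c", code], check=True)
    return os.environ.get(var)


def run_like_starter(var, value, cmd):
    """Set `var`, then start the job command so it inherits the variable.

    Returns the job's stdout (stripped) for inspection.
    """
    env = dict(os.environ, **{var: value})
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Returns None here as long as DEMO_PROLOG_VAR was not already set.
    print(child_sets_env("DEMO_PROLOG_VAR", "hello"))
    job = [sys.executable, "-c", "import os; print(os.environ['DEMO_PROLOG_VAR'])"]
    print(run_like_starter("DEMO_PROLOG_VAR", "hello", job))
```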

Re: [gridengine users] [Gridscheduler-users] Announce: gewake 1.1

2012-10-02 Thread Rayson Ho
Thanks Mark! It is much better than the method of reading the access time of /dev/mouse that I used ages ago! Rayson On Tue, Oct 2, 2012 at 1:21 PM, Bober, Mark wrote: > Nice! That's a lot cleaner than my hacked-up set of scripts. I'll try it > as soon as I can. > > I noted your comment about

Re: [gridengine users] queues not erroring out when jobs error out

2012-09-04 Thread Rayson Ho
blem, which is not related to grid > engine. But in the meantime, I wonder if there is some workaround for my > filesystem issue. > > Is there a way to make the load sensor check more frequent? > > Regards, > Alex > > > On 09/04/2012 12:03 PM, Rayson Ho wrote: >

Re: [gridengine users] queues not erroring out when jobs error out

2012-09-04 Thread Rayson Ho
Hi Alex, That's the correct behavior (for SSTATE_OPEN_OUTPUT), or else a user can DoS the cluster easily by pointing the input or output file to a path that can't be opened by the user. Rayson On Tue, Sep 4, 2012 at 2:50 PM, Alex Chekholko wrote: > Hi, > > I have a cluster with Rayson's OGE f

Re: [gridengine users] sge_master Daemon crashing

2012-08-31 Thread Rayson Ho
Oh, and when it crashes, e.g.: Program received signal SIGSEGV, Segmentation fault. ... (gdb) where You will then see the stack trace. Rayson On Fri, Aug 31, 2012 at 3:36 PM, Rayson Ho wrote: > - Set SGE_ND in the env > - At the shell, gdb sge_qmaster , and then "r". >

Re: [gridengine users] sge_master Daemon crashing

2012-08-31 Thread Rayson Ho
- Set SGE_ND in the env
- At the shell, run "gdb sge_qmaster", and then "r".
Rayson On Fri, Aug 31, 2012 at 3:33 PM, Bob Tupper wrote: > Can you please explain in more detail how to launch with the debugger > enabled? > > Thanks > > > On 08/31/2012 12:25 PM, Rayson

Re: [gridengine users] sge_master Daemon crashing

2012-08-31 Thread Rayson Ho
per wrote: > Thanks for your help. > I do have PE defined. But it crashes with just a simple job that just > sleeps. > Crashes every time. > -Bob > > > > On 08/31/2012 11:59 AM, Rayson Ho wrote: >> >> Do you have parallel (or PE) jobs in your cluster?? A bug in SGE

Re: [gridengine users] sge_master Daemon crashing

2012-08-31 Thread Rayson Ho
Do you have parallel (or PE) jobs in your cluster?? A bug in SGE 6.2u5 can cause the qmaster to seg fault when it receives the job reports from parallel jobs. Rayson On Fri, Aug 31, 2012 at 2:52 PM, Bob Tupper wrote: > Greetings, > > Hope someone can help me out. > I have a 6.2u5 install on ce

Re: [gridengine users] Compiling grid engine 2011.11 on solaris 11

2012-08-30 Thread Rayson Ho
e > Fujitsu Semiconductor Design (Chengdu) Co. Ltd., > Phone : (86)28-85150023 ext.8826 > E-mail:harris...@cn.fujitsu.com > > > -Original Message- > From: Rayson Ho [mailto:rayray...@gmail.com] > Sent: Thursday, August 30, 2012 11:25 PM > To: Harris He, Kun - CD > Cc:

Re: [gridengine users] Compiling grid engine 2011.11 on solaris 11

2012-08-30 Thread Rayson Ho
- > From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On > Behalf Of Rayson Ho > Sent: Tuesday, February 14, 2012 12:45 AM > To: Pierre Girard > Cc: users@gridengine.org > Subject: Re: [gridengine users] Compiling grid engine 2011.11 on solaris 11 > >

Re: [gridengine users] Solaris 5.8 no go?

2012-08-29 Thread Rayson Ho
Hi Harris, From the error message, it looks like you don't have make installed... Are you able to run "make" from an interactive shell?? Rayson On Wed, Aug 29, 2012 at 10:08 PM, Harris He, Kun - CD wrote: > Dear All, > > > > I encounter a problem recently. > > The version GE2011.11p1 that i

Re: [gridengine users] Segfault trying to start qmaster w/GE2011.11p1

2012-08-27 Thread Rayson Ho
x08239fae in sge_monitor_init () > #3 0x08056463 in sge_signaler_main () > #4 0x00166a49 in start_thread () from /lib/libpthread.so.0 > #5 0x00259e5e in clone () from /lib/libc.so.6 > > Let me know if you want/need any more info > > -Karl > On Mon, Aug 27, 2012 at 10

Re: [gridengine users] Segfault trying to start qmaster w/GE2011.11p1

2012-08-27 Thread Rayson Ho
Most of our users don't run 32-bit Linux, but we tested it and it worked for us (on earlier RHEL versions, though not on CentOS 6.3). Can you run the qmaster under gdb so that gdb can show the stack trace? Rayson On Mon, Aug 27, 2012 at 10:35 AM, Karl Vollmer wrote: > Hello, > > I recently tried in

Re: [gridengine users] SGE 6.2u5 - submitting to whole nodes

2012-08-22 Thread Rayson Ho
There's Exclusive Scheduling for allocating the whole node: http://docs.oracle.com/cd/E24901_01/doc.62/e21978/management.htm#sthref431 Rayson On Wed, Aug 22, 2012 at 3:39 PM, Henrichs, Juryk wrote: > Hi, > > we have a heterogeneous cluster consisting of nodes with 32 and 48 > cpu's. Some of o

Re: [gridengine users] qrsh & modules environment

2012-08-15 Thread Rayson Ho
On Wed, Aug 15, 2012 at 1:55 PM, Joseph Farran wrote: > This was another issued that Son of Grid Engine sge_8.1.1 solved that > GE2011.11 had issues with. Just for the record, GE 2011.11p1 fixed a module-related issue. However, I've re-read the whole discussion, and I don't see any clean a

Re: [gridengine users] Jobs not using all cores

2012-08-09 Thread Rayson Ho
It's the correct behavior: once jobs are running, those nodes have load, and by default Grid Engine dispatches jobs to the least loaded machines first. You can take a look at this blog entry, "N1GE 6 - Scheduler Hacks: "least used" / "fill up" configuration": http://wiki.gridengine.info/wiki/in
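To make the difference between the two placement styles concrete, here is a toy Python model. It is not the scheduler's code: `pick_least_loaded` reflects the default "least used" behavior described above, and `pick_fill_up` walks hosts in a fixed order (the `order` list is a hypothetical stand-in for whatever fixed ranking a fill-up configuration uses). The `Host` type and its fields are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    load: float      # current (normalized) load reported by the host
    free_slots: int  # slots still available on the host


def pick_least_loaded(hosts):
    """'Least used': the lightest-loaded host that still has a free slot."""
    candidates = [h for h in hosts if h.free_slots > 0]
    return min(candidates, key=lambda h: h.load).name if candidates else None


def pick_fill_up(hosts, order):
    """'Fill up': the first host in a fixed ranking that still has room.

    `order` is a hypothetical fixed host ranking (e.g. sequence numbers).
    """
    rank = {name: i for i, name in enumerate(order)}
    candidates = [h for h in hosts if h.free_slots > 0]
    return min(candidates, key=lambda h: rank[h.name]).name if candidates else None


if __name__ == "__main__":
    hosts = [Host("node01", 0.9, 2), Host("node02", 0.1, 4), Host("node03", 0.0, 0)]
    print(pick_least_loaded(hosts))                           # spreads jobs out
    print(pick_fill_up(hosts, ["node01", "node02", "node03"]))  # packs jobs
```

With the default ordering, a busy node with free cores is skipped while an idler node exists, which is why jobs appear "not to use all cores" on the first machines.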

Re: [gridengine users] Install SGE2011 at CentOS

2012-08-06 Thread Rayson Ho
On Mon, Aug 6, 2012 at 11:10 PM, Harris He, Kun - CD wrote: > But when I run: scripts/distinst -all -local -noexit > > A error information shows that: > > --- > > Installing Libjuti.so > > “Libdb-4.4.so” not found. Assuming binaries are statically linked. Hi, You can safely i

Re: [gridengine users] "Can't close file usage"

2012-08-03 Thread Rayson Ho
Check whether the execd spool directory is local or NFS-shared. The usage file is written into the execd spool, not the job's working dir. Rayson On Fri, Aug 3, 2012 at 2:07 PM, Simon Matthews wrote: > Can anyone tell me what is going on here? > > The machine has plenty of disk space. The directory f

Re: [gridengine users] getting information on finished job

2012-08-02 Thread Rayson Ho
Maybe you can run "qacct -j" with the job ID? Rayson On Thu, Aug 2, 2012 at 10:28 AM, Lionel SPINELLI wrote: > Hello all, > > I would like to know if it is possible from a submit host to get information > on a job that ended normally. > I mean, using "qstat -f" or "qstat -j " I can get information on pending

Re: [gridengine users] start_gui_installer ( re-adding nodes )

2012-08-01 Thread Rayson Ho
Hi Joseph, If you want to add a node from scratch, then you can try "install_execd" - it should create all the needed files & the needed queues, etc for the node. If the qmaster already has the queues defined, and everything is the same (node name, etc) except that the node's filesystem is gone,

Re: [gridengine users] $SGE_STDOUT_PATH & $SGE_STDERR_PATH but for PE Environments?

2012-08-01 Thread Rayson Ho
Hi Joseph, Sorry I don't have time to check the code... what is SGE_STDOUT_PATH set to in your setup, and what's the expected behavior?? Rayson On Wed, Aug 1, 2012 at 2:51 PM, Joseph Farran wrote: > Hi. > > GE Environments variables are set for SGE_STDOUT_PATH & SGE_STDERR_PATH for > standard

Re: [gridengine users] RE : Managing user password

2012-07-26 Thread Rayson Ho
your responses... I will try to find a solution... > > > Lionel > > From: Rayson Ho [rayray...@gmail.com] > Sent: Thursday, 26 July 2012 17:37 > To: Reuti > Cc: Lionel SPINELLI; users@gridengine.org > Subject: Re: [grid

Re: [gridengine users] Managing user password

2012-07-26 Thread Rayson Ho
On Thu, Jul 26, 2012 at 11:32 AM, Reuti wrote: >> But when I try to submit a job, I got an error telling "error: can't open >> output file "/home/tommy/simple.sh.o60": Permission denied >> >> Obviously, if i use the -o and -e option of qsub to put the output and error >> logs to a shared disk wi

Re: [gridengine users] Managing user password

2012-07-26 Thread Rayson Ho
On Thu, Jul 26, 2012 at 11:17 AM, Lionel SPINELLI wrote: > But when I try to submit a job, I got an error telling "error: can't open > output file "/home/tommy/simple.sh.o60": Permission denied > > Obviously, if i use the -o and -e option of qsub to put the output and error > logs to a shared di

Re: [gridengine users] Use SGE for cluster stats

2012-07-16 Thread Rayson Ho
qacct, ACRo, etc?? In fact, many people write their own scripts to parse data in the accounting & reporting files: http://gridscheduler.sourceforge.net/htmlman/htmlman5/accounting.html http://gridscheduler.sourceforge.net/htmlman/htmlman5/reporting.html Rayson On Mon, Jul 16, 2012 at 2:58 PM,
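As an example of such a home-grown script, here is a minimal Python sketch that totals wallclock time per job owner from accounting records. It assumes the colon-separated layout of accounting(5) and that the leading fields come in the order listed in `FIELDS`; that order is my reading of the man page, so verify it against your version. Comment and short/malformed lines are skipped, and any later fields are ignored.

```python
from collections import defaultdict

# Leading fields of an accounting(5) record, in order (assumed from the man
# page; verify against your Grid Engine version). Later fields are ignored.
FIELDS = [
    "qname", "hostname", "group", "owner", "job_name", "job_number",
    "account", "priority", "submission_time", "start_time", "end_time",
    "failed", "exit_status", "ru_wallclock",
]


def parse_accounting(lines):
    """Yield one dict per job record, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split(":")
        if len(parts) < len(FIELDS):
            continue  # truncated or malformed record
        yield dict(zip(FIELDS, parts))


def wallclock_by_owner(lines):
    """Total ru_wallclock seconds per job owner."""
    totals = defaultdict(float)
    for rec in parse_accounting(lines):
        totals[rec["owner"]] += float(rec["ru_wallclock"])
    return dict(totals)


if __name__ == "__main__":
    import sys
    # Usage: python this_script.py $SGE_ROOT/$SGE_CELL/common/accounting
    with open(sys.argv[1]) as f:
        print(wallclock_by_owner(f))
```

The same `parse_accounting` generator can feed other per-user or per-queue summaries; the reporting file has a different, record-type-based layout and would need its own parser.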

Re: [gridengine users] Futex leap-second bug for GridEngine?

2012-07-13 Thread Rayson Ho
Others fixed it by running date -s "`date`" : Ref: http://gridengine.org/pipermail/users/2012-July/004089.html Rayson On Fri, Jul 13, 2012 at 9:31 PM, Daniel Povey wrote: > Has anyone noticed their sge_execd proceses suddenly taking up a lot of CPU, > possibly since around July 2nd this year?

Re: [gridengine users] cgroups Integration in OGS/GE 2011.11 update 1

2012-07-12 Thread Rayson Ho
On Thu, Jul 12, 2012 at 6:24 AM, William Hay wrote: > It's more than a month later and AFAICT your public SVN hasn't > advanced beyond 2011.11p1. > Is there any way to get hold of a version with your cgroups PDC(inc > enforcing rss)? We'd like to evaluate it on our test cluster. Hi William, We

Re: [gridengine users] queues behaving differently

2012-07-11 Thread Rayson Ho
On Wed, Jul 11, 2012 at 9:38 AM, John Young wrote: > Hmmm... If it was an OS issue, I would expect both queues to behave > the same way since both are under the same OS, but they don't. Again, as a normal user & then as root, run an interactive shell - but let me clarify, *outside of Grid Engine

Re: [gridengine users] queues behaving differently

2012-07-10 Thread Rayson Ho
On Tue, Jul 10, 2012 at 4:23 PM, John Young wrote: > With this in place, it seems odd that from one of my queues I > get a default setting for the number of descriptors of 1024. > > So I have two questions really: > > 1. Why am I getting different behavior from the two queues? Could be OS issue -

Re: [gridengine users] Default Shell bash not always found

2012-07-10 Thread Rayson Ho
On Tue, Jul 10, 2012 at 3:51 PM, Reuti wrote: >> Failure was that user's .bashrc were not being read. In each user's >> account, I have it sourcing a system wide shell script which sets certain >> things up, like our module environment configuration and so nothing was >> being setup. > > Yep,

Re: [gridengine users] queues behaving differently

2012-07-10 Thread Rayson Ho
On Tue, Jul 10, 2012 at 4:02 PM, John Young wrote: >> If you really have a real use-case for setting the # of descriptors in >> the queue config, then let us know and we can implement that in OGS/GE >> (... when time permits). >> > Well... I have an engineer here who want to run a 2048 core job.

Re: [gridengine users] queues behaving differently

2012-07-10 Thread Rayson Ho
The number of file descriptors is not one of the queue limits; see the message I sent to the list 2 months ago: http://gridengine.org/pipermail/users/2012-May/003705.html If you have a real use-case for setting the # of descriptors in the queue config, then let us know and we can implement
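To see which descriptor limit a job actually ends up with (for example the default of 1024 mentioned in this thread), one option is to print it from inside a job and compare it with an interactive shell outside Grid Engine. This is a minimal, Unix-only sketch using Python's resource module; it only reports the limits. Raising them for jobs is done with S_DESCRIPTORS / H_DESCRIPTORS in execd_params (see sge_conf(5)).

```python
import resource


def describe_nofile_limit():
    """Return the (soft, hard) open-file limits of the current process.

    RLIM_INFINITY is reported as the string 'unlimited' for readability.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_nofile_limit()
    # Submit this as a job, then run it in a normal shell, and compare.
    print(f"open files: soft={soft} hard={hard}")
```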

Re: [gridengine users] Stop executing jobs on job error?

2012-07-10 Thread Rayson Ho
On Tue, Jul 10, 2012 at 5:45 AM, Reuti wrote: > > Just to note, that the path can be accessed by $SGE_JOB_SPOOL_DIR. Thanks Reuti - it will be useful to David. I forgot this environment var as I have not used this hack for almost a year... basically since getting the job exit status in epilog w

Re: [gridengine users] Default Shell bash not always found

2012-07-10 Thread Rayson Ho
On Tue, Jul 10, 2012 at 1:48 PM, Joseph Farran wrote: > I was using the same identical script, so it's still a mystery why the > script ran on some nodes while it failed on others, but now with this change > it works on all nodes and that is good enough. Those settings affect whether the global l

Re: [gridengine users] Stop executing jobs on job error?

2012-07-09 Thread Rayson Ho
an you > point me to any info on what this error state does? Presumably it > stops the existing and future jobs until the error state is cleared, > but I'd love do know how to clear the error state, etc. > > Thanks! > David > > ps is there a PPA available for OGE on Ubunt

Re: [gridengine users] Stop executing jobs on job error?

2012-07-09 Thread Rayson Ho
If you are using Open Grid Scheduler/Grid Engine 2011.11 or later, then the $SGE_JOBEXIT_STAT variable is set in the epilog, so you can check the exit status of the job. Then, in the epilog, you can close the queue or in fact do whatever you want - like disable scheduling new jobs, etc. T
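A minimal Python sketch of that epilog logic, assuming $SGE_JOBEXIT_STAT is available as described above. The decision is kept in a pure function so it is easy to test, and the `qmod -d` command (disable a queue instance) is only printed, not executed. The queue instance name "all.q@node01" is hypothetical; in a real epilog you would pass in the actual instance, for example built from the epilog's $queue / $host arguments (an assumption to check against queue_conf(5)).

```python
import os


def read_exit_status(environ=os.environ):
    """Read SGE_JOBEXIT_STAT; return None if it is unset or not an integer."""
    raw = environ.get("SGE_JOBEXIT_STAT")
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


def epilog_action(exit_status, queue_instance):
    """Decide what the epilog should do for a given job exit status.

    Returns None for a successful job, otherwise the qmod command that would
    disable the queue instance so no new jobs are scheduled there.
    """
    if exit_status == 0:
        return None
    return ["qmod", "-d", queue_instance]


if __name__ == "__main__":
    status = read_exit_status()
    # "all.q@node01" is a hypothetical queue instance name.
    cmd = epilog_action(status, "all.q@node01") if status is not None else None
    if cmd:
        print("would run:", " ".join(cmd))
```

Printing instead of running keeps the sketch safe to try; once the policy looks right, the command list can be handed to subprocess.run().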

Re: [gridengine users] Java out of memory errors with gridengine on Ubuntu

2012-07-09 Thread Rayson Ho
See this blog post I wrote back in 2005: http://web.archive.org/web/20051219011530/http://gridengine.info/articles/2005/10/10/unlimited-stack-limit-on-solaris Rayson On Mon, Jul 9, 2012 at 10:41 AM, Peter van Heusden wrote: > Hi there > > I'm using gridengine 6.2u5-4 on Ubuntu 12.04. I've se

Re: [gridengine users] GE2011.11 compilation error

2012-07-04 Thread Rayson Ho
On Wed, Jul 4, 2012 at 10:38 AM, Reuti wrote: >> ed.screen.c:(.text+0x7d3): undefined reference to `tputs' > > This looks like the ncurses-devel package is missing. It's a common problem with newer versions of RHEL-based distros. I downloaded CentOS 6.2 & Oracle Linux 6.3... I will install them s

Re: [gridengine users] qconf -aattr/-mattr support for "rqs" object ?

2012-07-03 Thread Rayson Ho
Off the top of my head... it's "resource_quota". Rayson On Tue, Jul 3, 2012 at 10:27 AM, CB wrote: > Hi, > > In OGS/GE 2011.11 man page, qconf, it says: > > >-aattr obj_spec attr_name val obj_instance,... > > > > Allows adding specifications to a si

Re: [gridengine users] New version of GE2011 ?

2012-07-03 Thread Rayson Ho
for Cygwin binary: http://dl.dropbox.com/u/47200624/GE2011.11p1/ge2011.11-cygwin-p1.tar.bz2 Rayson On Mon, Jul 2, 2012 at 5:24 PM, Joseph Farran wrote: > When is GE 2011.11 update 1 with cgroups planned on being released? > > > On 07/02/2012 01:56 PM, Rayson Ho wrote: >> >>

Re: [gridengine users] New version of GE2011 ?

2012-07-02 Thread Rayson Ho
GE 2011.11 patch 1 was released a while ago (back in April or May I believe): http://dl.dropbox.com/u/47200624/GE2011.11p1/GE2011.11p1.tar.gz But we have not released certified binaries for it yet, so you will need to compile from source, or wait for GE 2011.11 update 1 that has the Grid Engine cgr

Re: [gridengine users] ge2011.11 installation

2012-07-02 Thread Rayson Ho
such file or directory Rayson On Mon, Jul 2, 2012 at 12:41 PM, george he wrote: > Hello Rayson, > > That file is a link to chkconfig: > > /usr/lib/lsb/install_initd -> ../../../sbin/chkconfig > > How do you use another way to install the init script? > > Thanks, >

Re: [gridengine users] ge2011.11 installation

2012-07-02 Thread Rayson Ho
Check if you have /usr/lib/lsb/install_initd on your system. On my Fedora 17, that file is missing, and thus the install script uses another way to install the init scripts. Rayson On Mon, Jul 2, 2012 at 11:52 AM, george he wrote: > Hello all, > > I downloaded ge2011.11 from > http://dl.dropbo

Re: [gridengine users] Jobs "adding up" resource reports (cpu, mem, io) from other jobs in the same node

2012-06-28 Thread Rayson Ho
On Thu, Jun 28, 2012 at 8:11 AM, Txema Heredia Genestar wrote: > 2-20100 I'll change it to 2-3 to see if it still happens. Yes, it can be due to a collision in the GID range with some valid user GIDs - maybe real users are assigned those GIDs?? > But I have checked the gid for all
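One way to look for such a collision is to compare the configured gid_range (cluster configuration, see sge_conf(5)) with the groups a node knows about. This is a minimal Python sketch; it assumes gid_range is written as one or more comma-separated ranges like "20000-20100". `grp.getgrall()` only lists groups the machine can enumerate, so directory-service groups may be missing from the output.

```python
import grp


def parse_gid_range(spec):
    """Parse a gid_range value like '20000-20100' or '20000-20050,30000-30010'."""
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        low = int(first)
        high = int(last) if last else low  # a single number is a one-GID range
        ranges.append((low, high))
    return ranges


def colliding_groups(spec, groups):
    """Return the (name, gid) pairs whose GID falls inside the gid_range."""
    ranges = parse_gid_range(spec)
    return [(name, gid) for name, gid in groups
            if any(low <= gid <= high for low, high in ranges)]


if __name__ == "__main__":
    # Check every group this machine can enumerate.
    local = [(g.gr_name, g.gr_gid) for g in grp.getgrall()]
    print(colliding_groups("20000-20100", local))
```

An empty list means no enumerable group overlaps the range; any hit is a candidate for the "adding up" resource reports, since jobs are tracked by those additional group IDs.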

Re: [gridengine users] file permission change on $TMP directory

2012-06-27 Thread Rayson Ho
On Wed, Jun 27, 2012 at 12:51 PM, CB wrote: > The one last ToDo > is the trace file, which is owned by the job owner and it still has > world-readable permission. > See the attached file for hardening the trace file (patch file against Open Grid Scheduler trunk - should work with older versio

Re: [gridengine users] file permission change on $TMP directory

2012-06-27 Thread Rayson Ho
On Wed, Jun 27, 2012 at 1:26 PM, Reuti wrote: > The job scripts need to be readable by each individual user who is running > the job at execution time. What permission did you put on the execd's spool > directory? For binary jobs it will work (unless you want to use $PE_HOSTFILE > or alike). Y

Re: [gridengine users] file permission change on $TMP directory

2012-06-27 Thread Rayson Ho
Didn't know that you wanted to harden the spool files as well (that requirement was not in your original email) - but can't you just change the permission of the execd spool directory?? I just did a quick experiment and jobs seem to run fine. With the permission set at the execd local spool dir, th

Re: [gridengine users] GridEngine and soft user limits not being respected.

2012-06-23 Thread Rayson Ho
Grid Engine does not parse /etc/security/limits.conf, so you may want to set the limit in the queue configuration: http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html Rayson On Sat, Jun 23, 2012 at 9:43 PM, Daniel Povey wrote: > We have a problem in our queue that GridEngine

Re: [gridengine users] Possible OGE Bug with Subordinate Field when using OGE GUI

2012-06-21 Thread Rayson Ho
ges to the code. > > If I had any control over this software product, I would have broken the > ability to change the subordinate field with the GUI until the bug was > corrected. > > > > On 06/21/2012 08:39 AM, Rayson Ho wrote: >> >> It's a known bug, it was

Re: [gridengine users] Possible OGE Bug with Subordinate Field when using OGE GUI

2012-06-21 Thread Rayson Ho
On Thu, Jun 21, 2012 at 11:31 AM, Reuti wrote: >> If it's a bug, is this a new bug or already known bug? > > Maybe it depends on the fork. It's a known bug; it was in the Sun bug database. We didn't fix it because not that many people use qmon (i.e. a lot of us just use the command line interface)

Re: [gridengine users] file permission change on $TMP directory

2012-06-20 Thread Rayson Ho
See if it is this one: daemons/execd/tmpdir.c (I have not tested it myself so I could be wrong!) Rayson On Wed, Jun 20, 2012 at 2:09 PM, CB wrote: > Hi, > > I am using the GE2011.11 release. > > When a job dispatched to a node, it creates $TMP directory, which is usually > located at /tmp on

Re: [gridengine users] $SGE_ROOT/$SGE_CELL/common/sge_qstat

2012-06-19 Thread Rayson Ho
If you add it into your ~/.sge_qstat then you will notice the difference... Rayson On Tue, Jun 19, 2012 at 7:06 PM, Wagner, Justin wrote: > Perhaps I am missing the point of this discussion, but we have no problem > defining a "-u *" in the $SGE_ROOT/$SGE_CELL/common/sge_qstat file which > s

Re: [gridengine users] $SGE_ROOT/$SGE_CELL/common/sge_qstat

2012-06-18 Thread Rayson Ho
Hi Kelvin, The interaction is a bit more complicated than that... I will update the list on the details later, but it actually makes sense to limit the filter when we see "-u * -u user", because most likely we do not want to see all the jobs when we have a second -u. However, currently Grid Engin

Re: [gridengine users] $SGE_ROOT/$SGE_CELL/common/sge_qstat

2012-06-18 Thread Rayson Ho
Try: -u * (without the quotes). Note that quoting the "*" is only needed when you are typing it at the shell. Rayson On Mon, Jun 18, 2012 at 5:33 PM, Joseph Farran wrote: > Hello. > > I like to make qstat do "qstat -u "*" as the default to see all user jobs. > I added: > > -u "*" > > to our //default/common/sge_qstat >

[gridengine users] OGS/Grid Engine cgroups integration demo at ISC'12

2012-06-18 Thread Rayson Ho
The Gompute team has set up an OGS/GE installation with cgroups integration. For those who are interested, please visit their booth (#560): https://www.gompute.com/web/guest/isc12 Gridcore/Gompute is a member of the Open Grid Scheduler Project, and we have received help from them since last yea

Re: [gridengine users] GE2011.11 and ge6.2u5

2012-06-15 Thread Rayson Ho
> ru_maxrss    0
> ru_ixrss     0
> ru_ismrss    0
> ru_idrss     0
> ru_isrss     0
> ru_minflt    0
> ru_majflt    0
> ru_nswap     0
> ru_inblock   0
> ru_oublock   0

Re: [gridengine users] GE2011.11 and ge6.2u5

2012-06-15 Thread Rayson Ho
     0.000 > arid undefined > > > > On Fri, Jun 15, 2012 at 11:27 AM, Michael Coffman > wrote: >> >> On Fri, Jun 15, 2012 at 11:11 AM, Rayson Ho wrote: >>> >>> Can you set "execd_params" to KEEP_ACTIVE for this host?? (See the >>

Re: [gridengine users] GE2011.11 and ge6.2u5

2012-06-15 Thread Rayson Ho
>> On Fri, Jun 15, 2012 at 12:58 PM, Michael Coffman >> wrote: >> > On Fri, Jun 15, 2012 at 10:11 AM, Rayson Ho wrote: >> >> >> >> On Fri, Jun 15, 2012 at 12:01 PM, Michael Coffman >> >> wrote: >> >> > From the qmaster

Re: [gridengine users] GE2011.11 and ge6.2u5

2012-06-15 Thread Rayson Ho
he execd_params or else you will fill up your local spool dir eventually with job information.) Rayson On Fri, Jun 15, 2012 at 12:58 PM, Michael Coffman wrote: > On Fri, Jun 15, 2012 at 10:11 AM, Rayson Ho wrote: >> >> On Fri, Jun 15, 2012 at 12:01 PM, Michael Coffman >> w

Re: [gridengine users] Subordinate Queues

2012-06-15 Thread Rayson Ho
On Fri, Jun 15, 2012 at 12:29 PM, Reuti wrote: > If you have two queue instances on a maschine, and want to limit the number > of slots across both queue instances to avoid oversubscription, you need to > define the overall limit in the exechost definition (to be complete: or in an > RQS, but i
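Reuti's point can be illustrated with a toy model: each queue instance has its own slots value, and a host-wide limit (slots in the exechost definition, or an RQS) caps the total across all queue instances on that host. This is only a model of the arithmetic, not the scheduler's code; the queue names and numbers are hypothetical.

```python
def free_slots(queue, queue_slots, used, host_limit=None):
    """Slots a new job could get in `queue` on one host.

    queue_slots: {queue_name: slots configured for that queue instance}
    used:        {queue_name: slots currently occupied on this host}
    host_limit:  optional host-wide cap (e.g. slots=N in the exechost)
    """
    per_queue = queue_slots[queue] - used.get(queue, 0)
    if host_limit is None:
        # Without a host-wide cap, each queue instance is counted separately.
        return max(per_queue, 0)
    per_host = host_limit - sum(used.values())
    return max(min(per_queue, per_host), 0)


if __name__ == "__main__":
    slots = {"high.q": 8, "low.q": 8}   # two queue instances on an 8-core host
    busy = {"low.q": 8}                 # low.q is already full
    print(free_slots("high.q", slots, busy))                # oversubscribes
    print(free_slots("high.q", slots, busy, host_limit=8))  # capped at the host
```

Without the host-level limit, an 8-core node with two 8-slot queue instances can end up running 16 jobs.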

Re: [gridengine users] Subordinate Queues

2012-06-15 Thread Rayson Ho
On Fri, Jun 15, 2012 at 12:20 PM, Joseph Farran wrote: > Sorry, I don't follow.   Assume I am a OGE Newbie :-).    I am not new to > the concept as I currently have the setup I described working under > Torque/Maui, so now I am trying to duplicate the same setup but under OGE. Hi Joseph, Think o

Re: [gridengine users] GE2011.11 and ge6.2u5

2012-06-15 Thread Rayson Ho
On Fri, Jun 15, 2012 at 12:01 PM, Michael Coffman wrote: > From the qmaster messages file: > 06/14/2012 21:29:39|worker|gemaster|W|job 3885.1 failed on host > cs428.ftc.avagotech.net general before job because: 06/14/2012 21:29:37 > [20339:8436]: can't open file job_pid: Permission denied > > I ch

Re: [gridengine users] ge2011.11

2012-06-14 Thread Rayson Ho
Hi Joseph, (Sorry - was busy and didn't respond to your private email earlier.) All releases of Grid Engine from us are compatible with each other, meaning that you can mix & match versions. You don't need to stop all running jobs if you know how to do it (but it is not an officially supported wa

Re: [gridengine users] build GE2011.11

2012-06-13 Thread Rayson Ho
On Wed, Jun 13, 2012 at 2:14 AM, mahbube rustaee wrote: > Any document for build GE2011 on windows 7? You can just go to the qmake directory, and directly invoke "configure" from there. Rayson ___ users mailing list users@gridengine.org https://grideng

Re: [gridengine users] qalter not successful

2012-06-13 Thread Rayson Ho
On Wed, Jun 13, 2012 at 6:36 PM, Reuti wrote: > Am 14.06.2012 um 00:20 schrieb Kevin Buckley: >> Is it possible to alter the local execd configuration so that a new >> instance could be >> started and have the node then accept other tasks, whilst still retaining the >> original communication ports

Re: [gridengine users] PE Job Suspend / Resume

2012-06-12 Thread Rayson Ho
On Wed, Jun 13, 2012 at 1:47 AM, Erik Soyez wrote: > You probably need some kind of cronjob to suspend and unsuspend your > parallel jobs correctly. Or does anyone have a patch for this? Erik, So is/was it really working when you tried it with SGE 6.2u5?? I have not looked into the code that han

Re: [gridengine users] PE Job Suspend / Resume

2012-06-12 Thread Rayson Ho
g entry for the Grid Engine cgroups integration: http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html Rayson > > Joseph > > On 6/12/2012 5:19 PM, Rayson Ho wrote: > > On Tue, Jun 12, 2012 at 8:10 PM, Joseph Farran wrote: > > If you guys are that paranoid abou

Re: [gridengine users] PE Job Suspend / Resume

2012-06-12 Thread Rayson Ho
On Tue, Jun 12, 2012 at 8:10 PM, Joseph Farran wrote: > If you guys are that paranoid about PE suspension, how about adding an > on/off flag for this since the code is already there and let the admin pick? Hi Joseph, I just want to understand the background a bit more, that's all... Esp. now we

Re: [gridengine users] PE Job Suspend / Resume

2012-06-11 Thread Rayson Ho
On Tue, Jun 12, 2012 at 12:58 AM, Joseph A. Farran wrote: > Yes it makes sense not to introduce new options. Hi Joseph, Sorry I was busy with many things (apparently - as it's almost 2am here and I am still up!), and I've asked Ron to handle some of the mailing list questions... > I am not fam

Re: [gridengine users] PE Job Suspend / Resume

2012-06-11 Thread Rayson Ho
the tree so we will need to start the discussion again and see if it really is a good idea to suspend parallel jobs. Rayson On Mon, Jun 11, 2012 at 4:21 PM, Rayson Ho wrote: > Only rank 0 of the job is suspended if I recall correctly - it was > designed specifically because not all paralle

Re: [gridengine users] PE Job Suspend / Resume

2012-06-11 Thread Rayson Ho
Only rank 0 of the job is suspended if I recall correctly - it was designed specifically because not all parallel jobs are able to handle suspend/restart correctly - for example you can get TCP timeouts and things like that. Rayson On Mon, Jun 11, 2012 at 3:53 PM, Joseph Farran wrote: > Hi. >

Re: [gridengine users] Collecting of scheduler job information is turned off

2012-06-08 Thread Rayson Ho
This time it is for performance reasons... in fact I was at a site that was experiencing performance issues & the qmaster was using more memory than they would like (their qmaster has many services running on the machine). So they also turned scheduler info off (the default was on at that time - and yes

Re: [gridengine users] Linux Groups

2012-06-08 Thread Rayson Ho
On Fri, Jun 8, 2012 at 2:37 PM, Rayson Ho wrote: > That's the primary group ID we are talking about. You can think of it > as Grid Engine only checks the primary group ID, so you need to have > the primary group ID configured properly or else it won't work. Let me clarify.

Re: [gridengine users] Linux Groups

2012-06-08 Thread Rayson Ho
't work. Rayson On Fri, Jun 8, 2012 at 2:26 PM, Joseph Farran wrote: > > > On 06/08/2012 11:19 AM, Rayson Ho wrote: >> >>  but if Joseph is OK with using a cron >> job to sync membership then I can leave it aside for now - I will >> need to work on a

Re: [gridengine users] Linux Groups

2012-06-08 Thread Rayson Ho
Thanks William - I was also wondering how others do this in the field. As far as I know, only the primary group is considered - it was like that since many, many years ago. But I was not sure how you guys define ACLs that need to handle the non-primary group case. IMO, using external tools to syn
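Syncing group membership into a Grid Engine ACL with an external tool, for example from the cron job Joseph mentions, could look roughly like the following. This is a minimal Python sketch, not a tool shipped with Grid Engine. It assumes the ACL should contain every user whose primary *or* secondary Linux group is a given group. It only builds `qconf -au` / `qconf -du` command lines; reading the ACL's current contents (e.g. from `qconf -su`) is left out. The ACL name `research_acl`, the group name and the user names are hypothetical.

```python
from __future__ import annotations

import grp
import pwd
import shlex


def linux_group_members(group_name: str) -> set[str]:
    """Return users whose primary OR secondary group is group_name."""
    gr = grp.getgrnam(group_name)
    # /etc/group (gr_mem) normally lists only supplementary members...
    members = set(gr.gr_mem)
    # ...so users who have this group as their primary GID are found via passwd.
    members.update(pw.pw_name for pw in pwd.getpwall() if pw.pw_gid == gr.gr_gid)
    return members


def acl_sync_commands(acl_name: str, wanted: set[str], current: set[str]) -> list[list[str]]:
    """Build qconf command lines that make the ACL match `wanted`."""
    commands = []
    to_add = sorted(wanted - current)
    to_remove = sorted(current - wanted)
    if to_add:
        # qconf -au user[,user,...] acl_name : add users to an access list
        commands.append(["qconf", "-au", ",".join(to_add), acl_name])
    if to_remove:
        # qconf -du user[,user,...] acl_name : delete users from an access list
        commands.append(["qconf", "-du", ",".join(to_remove), acl_name])
    return commands


if __name__ == "__main__":
    # Hypothetical data; in a cron job `wanted` would come from
    # linux_group_members("research") and `current` from the existing ACL.
    for cmd in acl_sync_commands("research_acl", {"alice", "bob", "carol"}, {"bob", "dave"}):
        print(shlex.join(cmd))
```

Printing the commands rather than running them gives an easy dry run. A cron wrapper could instead pass each list to `subprocess.run(cmd, check=True)`.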

Re: [gridengine users] cgroups Integration in OGS/GE 2011.11 update 1

2012-06-08 Thread Rayson Ho
On Fri, Jun 1, 2012 at 6:19 AM, Mark Dixon wrote: > My underlying concern is that sometimes it is appropriate to set an address > space limit and sometimes it isn't, for the reasons we both put forward > previously in this thread. Users should therefore have some control over it. > > I hope we agr

Re: [gridengine users] Display a warning from jsv.sh to the user via stderr

2012-06-07 Thread Rayson Ho
on stdout. >>> >>> What I would like to have, is the following: >>> >>> $ id=$(echo echo 1 | qsub -l h_vmem=1g -jsv >>> $SGE_ROOT/util/resources/jsv/jsv.sh | cut -d " " -f3) && echo "id=$id" >>> WARNING: something went wr

Re: [gridengine users] how to restrict sge_execd to subnet

2012-06-07 Thread Rayson Ho
What are you trying to achieve?? Doesn't the Linux firewall handle the job nicely for you already?? The execd only handles requests from the qmaster and, in the tight PE integration case, from the machine running rank 0 of the parallel job. Is it a security-related thing, or is there a re

Re: [gridengine users] Qmon not launching

2012-06-06 Thread Rayson Ho
Also, besides switching users with "su -" (without logging out & back in to X again, i.e. without using a different instance of the X server), you can compare the strace output and see if your account has settings that affect the reading of X fonts. Rayson On Wed, Jun 6, 2012 at 7:03 PM, Reuti wrote: > Am

Re: [gridengine users] guess required memory

2012-06-06 Thread Rayson Ho
On Wed, Jun 6, 2012 at 10:48 AM, Reuti wrote: > It depends on the actual program, but usually threads are working on the same > memory area. It would be interesting if the applications runs up to a certain > number of threads and then starts to fail. I remember Shannon & Ron discussed (on the o

Re: [gridengine users] Parallel Environment

2012-06-05 Thread Rayson Ho
On Wed, Jun 6, 2012 at 12:33 AM, Joseph A. Farran wrote: > Yes you guys are doing a great job and I've done my share of programming > back in the stone age, so I do appreciate how difficult it is to upkeep > something this big and complex.    OGE is very nice so far and it seems very > flexible.

Re: [gridengine users] guess required memory

2012-06-05 Thread Rayson Ho
On Wed, Jun 6, 2012 at 12:03 AM, Nakata Nakata wrote: > SGE6.2U5. > I ran the OpenMP executable outside of a job. I can see its resident memory via the "top" command (1.5G). How many threads do you usually use to run the OpenMP code?? > when the job is submitted, the h_vmem value should be set to a high value, otherwise j
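Behind this h_vmem question is the gap between what `top` shows (resident memory) and what h_vmem limits (virtual memory). In a multi-threaded OpenMP program the virtual size is often well above the resident size, for example because each thread gets its own stack reservation. One way to guess a value is to run the program outside the scheduler and read the kernel's counters from `/proc/<pid>/status`. The Python sketch below is an illustration under assumptions, not a Grid Engine tool. It is Linux-only, the 25% safety margin is an arbitrary choice, and the helper names are hypothetical.

```python
from __future__ import annotations

import math
import os
import sys


def parse_vm_fields(status_text: str) -> dict[str, int]:
    """Extract the Vm* fields (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        if sep and key.startswith("Vm") and parts and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields


def suggest_h_vmem_mb(fields: dict[str, int], margin: float = 0.25) -> int:
    """Peak virtual size plus a safety margin, rounded up to whole MiB."""
    # VmPeak = peak virtual (address space) size; fall back to the current VmSize.
    peak_kb = fields.get("VmPeak", fields["VmSize"])
    return math.ceil(peak_kb * (1 + margin) / 1024)


if __name__ == "__main__":
    # Pass the PID of the running OpenMP program; defaults to this process.
    pid = sys.argv[1] if len(sys.argv) > 1 and sys.argv[1].isdigit() else "self"
    path = f"/proc/{pid}/status"
    if os.path.exists(path):
        with open(path) as fh:
            vm = parse_vm_fields(fh.read())
        # VmHWM is the peak resident size, i.e. roughly what top's RES peaked at.
        print(f"peak virtual: {vm.get('VmPeak')} kB, peak resident: {vm.get('VmHWM')} kB")
        print(f"suggested h_vmem: {suggest_h_vmem_mb(vm)}M")
```

Because VmPeak is the high-water mark so far, it is best read shortly before the program finishes, from another shell, with the program's PID as the argument. Running it with the same thread count as the real job matters, since each extra thread can add to the virtual size.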

Re: [gridengine users] Parallel Environment

2012-06-05 Thread Rayson Ho
On Wed, Jun 6, 2012 at 12:03 AM, Joseph A. Farran wrote: > Ok.   I was not sure if OGE had a default way of telling it the Parallel > name to use if none was given. I think the client-side JSV way is the best way to pick the default PE... unless Reuti or others have other better ways. (There's al

Re: [gridengine users] Parallel Environment

2012-06-05 Thread Rayson Ho
On Tue, Jun 5, 2012 at 11:20 PM, Joseph A. Farran wrote: > Yeah, you have the advantage of knowing this product inside out and what is > difficult for us is simple, trivial and probably boring for you. I guess he means that you can try some DIY testing without worrying too much... and s
