On Tue, Nov 10, 2015 at 7:08 AM, Taras Shapovalov
wrote:
> Hi guys,
>
> Can we state now that OGS is no longer being developed (already 4 years)
> and that we should not recommend that our customers install it?
The last release was a bug fix version GAed in 2012, so it was
released 2-3 years ago.
Ron
Hi Joshua,
We are still developing new features for our users, especially new
features from consulting projects. We are just too busy to integrate
all of them into the open source branch. If you look at our feature
list, you will find that we are the first to develop the ARM Linux
port, the cgroups
In 2011, we released OGS / GE 2011.11 that includes the ARM port. We
tried it on a Raspberry Pi like platform, but any ARM Linux platform
is supported.
The code was then copied & referenced by other GE forks as well, so
just use any recent version of Grid Engine, and you can compile & run
your fav
On Tue, Feb 4, 2014 at 6:40 PM, Simon Matthews
wrote:
> I found binaries for 6.2u5 both in the form of a tarfile and packed in rpms
> (in the EPEL repository for CentOS). However, if I try to use qstat from
> these against my 6.2u4 qmaster, I get a timeout message:
> error: failed receiving gdi re
There is slightly less overhead to "submit" a binary, because there is
no need to spool the job script.
Also, we used to tell our users that they can change the job scripts
after submission, and that's one small advantage of submitting job
scripts. However, in most use cases users just need t
We still need more testing for 64-bit Windows. Currently, 32-bit works for
us.
Rayson
On Wed, Jun 26, 2013 at 11:16 AM, Raghavendra wrote:
> Hi,
>
> I found that scalable logic guys have implemented support of cygwin from
> the link below:
>
> http://blogs.scalablelogic.com/2012/06/grid-engi
Hi,
Sorry we missed your emails -- we grew from just 1 business unit (used
to only have Grid Engine support) to 2 business units now. We now
have:
- Grid Engine support
- HPC Cloud services
And we route our customers & users accordingly. Looks like your email
address did not belong to any of the
Somewhat related to the email message below: I am going to play with a
StoragePod 2.0 for the next project in 2013... The machine uses
off-the-shelf components like motherboard, CPU, 45 disks, memory,
power supply... and the real intellectual property that was released
by Backblaze is the 4U custom
012 at 10:52 AM, Rayson Ho wrote:
>
> This year, we tested the scalability of Open Grid Scheduler / Grid
> Engine on the cloud -- we ran a 10,000-node cluster on EC2 (we could
> have used Gompute's hardware but obviously there are more important
> workloads in
I just checked the source, "JB_hard_wallclock_gmt" (which is an execd
internal data structure first set when the job starts running, and the
value is pulled from h_rt) is only handled by the execd, so if the
execd is not running (just do what Reuti mentioned in his email
below), then the job *shoul
For those at SC12 who are interested in running your HPC jobs in the
Cloud, or just open source Grid Engine in general, please visit the
Gompute booth (#3436):
http://www.gompute.com/-/sc12-nov-10-16-salt-lake-city-usa
This year, we tested the scalability of Open Grid Scheduler / Grid
Engine on t
Joseph,
You need to set S_DESCRIPTORS, H_DESCRIPTORS with the "execd_params"
option in sge_conf:
http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_conf.html
Example: H_DESCRIPTORS=1
Rayson
On Wed, Nov 14, 2012 at 1:51 PM, Joseph Farran wrote:
> Hi All.
>
> I increased our ulimit
On Tue, Nov 6, 2012 at 5:47 AM, Paul Simpson wrote:
> The child process does not get killed which leaves an over
> stressed machine which leads to knock on errors.
Are these parallel (MPI?) jobs?
> From reading this list, we
> are not alone in suffering from this. Can anyone shed light on this
Hi Arnau,
Thanks for using OGE (the official name for it is OGS/GE, but a few
people call it OGE as well, so we don't really have a preference).
The SRPM build does not use the version of jemalloc shipped with the
source (in fact, it also does not use the hwloc library included), but
instead uses
On Fri, Oct 26, 2012 at 1:55 PM, Wagner, Justin wrote:
> while (!foo_done_file){};
>
> while (!bar_done_file){};
>
> while (!baz_done_file){};
Or use bash's built-in wait, like this:
http://stackoverflow.com/questions/356100/how-to-wait-in-bash-for-several-subprocesses-to-finish-and-return-exit-
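As a rough illustration of the wait suggestion above, here is a minimal sketch. The three background subshells are stand-ins (an assumption) for whatever produces the foo, bar and baz done files; the idea is that the shell blocks in wait instead of spinning in a polling loop.

```shell
# Start three background tasks; each subshell is a placeholder for a real step.
pids=""
for name in foo bar baz; do
    ( sleep 1; echo "$name done" ) &
    pids="$pids $!"
done

# Wait for each PID in turn so that individual failures can be counted.
failed=0
for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
done
echo "all tasks finished, failures: $failed"
```

Waiting on each PID separately, rather than calling a bare wait, keeps the exit status of every task available to the script.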
Alex,
Can you run qhost and see if the memory value is also negative??
If it is, then this bug has been fixed in every release of OGS/GE.
Rayson
On Thu, Oct 18, 2012 at 6:53 PM, Alex Chekholko wrote:
> Hi,
>
> Running Rocks 6, so whatever GE version is included there.
>
> h_vmem is set consumab
Prolog runs as a separate process, so all env vars are discarded when
the prolog is done.
To implement what you need, you can look at the starter method...
Rayson
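To make the prolog vs. starter method point concrete, here is a small sketch that runs without Grid Engine. The variable name MY_SETTING and the starter function are made up for illustration; in a real setup the wrapper would presumably be a standalone script named in the queue's starter_method setting, which should be checked against queue_conf(5).

```shell
# 1) A child process (like the prolog) cannot change its parent's environment.
sh -c 'MY_SETTING=from_prolog; export MY_SETTING'
echo "after prolog-like child: ${MY_SETTING:-unset}"

# 2) A starter-style wrapper exports the variable and then execs the job
#    command it was given, so the job itself inherits the variable.
starter() {
    ( MY_SETTING=from_starter; export MY_SETTING; exec "$@" )
}
result=$(starter sh -c 'echo "$MY_SETTING"')
echo "inside job: $result"
```

The first echo prints "unset" (assuming MY_SETTING was not already set in the calling shell), while the job started through the wrapper sees the value.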
On Tue, Oct 9, 2012 at 6:40 PM, Orion Poplawski wrote:
> Is there a way to set environment variables in a prolog script that persi
Thanks Mark! It is much better than reading the access time of
/dev/mouse method that I used to use ages ago!
Rayson
On Tue, Oct 2, 2012 at 1:21 PM, Bober, Mark wrote:
> Nice! That's a lot cleaner than my hacked-up set of scripts. I'll try it
> as soon as I can.
>
> I noted your comment about
blem, which is not related to grid
> engine. But in the meantime, I wonder if there is some workaround for my
> filesystem issue.
>
> Is there a way to make the load sensor check more frequent?
>
> Regards,
> Alex
>
>
> On 09/04/2012 12:03 PM, Rayson Ho wrote:
Hi Alex,
That's the correct behavior (for SSTATE_OPEN_OUTPUT), or else a user
can DoS the cluster easily by pointing the input or output file to a
path that can't be opened by the user.
Rayson
On Tue, Sep 4, 2012 at 2:50 PM, Alex Chekholko wrote:
> Hi,
>
> I have a cluster with Rayson's OGE f
Oh, and when it crashes, eg.:
Program received signal SIGSEGV, Segmentation fault.
...
(gdb) where
You will then see the stack trace.
Rayson
On Fri, Aug 31, 2012 at 3:36 PM, Rayson Ho wrote:
> - Set SGE_ND in the env
> - At the shell, gdb sge_qmaster , and then "r".
>
- Set SGE_ND in the env
- At the shell, gdb sge_qmaster , and then "r".
Rayson
On Fri, Aug 31, 2012 at 3:33 PM, Bob Tupper wrote:
> Can you please explain in more detail how to launch with the debugger
> enabled?
>
> Thanks
>
>
> On 08/31/2012 12:25 PM, Rayson
per wrote:
> Thanks for your help.
> I do have PE defined. But it crashes with just a simple job that just
> sleeps.
> Crashes every time.
> -Bob
>
>
>
> On 08/31/2012 11:59 AM, Rayson Ho wrote:
>>
>> Do you have parallel (or PE) jobs in your cluster?? A bug in SGE
Do you have parallel (or PE) jobs in your cluster?? A bug in SGE 6.2u5
can cause the qmaster to seg fault when it receives the job reports
from parallel jobs.
Rayson
On Fri, Aug 31, 2012 at 2:52 PM, Bob Tupper wrote:
> Greetings,
>
> Hope someone can help me out.
> I have a 6.2u5 install on ce
e
> Fujitsu Semiconductor Design (Chengdu) Co. Ltd.,
> Phone : (86)28-85150023 ext.8826
> E-mail:harris...@cn.fujitsu.com
>
>
> -Original Message-
> From: Rayson Ho [mailto:rayray...@gmail.com]
> Sent: Thursday, August 30, 2012 11:25 PM
> To: Harris He, Kun - CD
> Cc:
-
> From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On
> Behalf Of Rayson Ho
> Sent: Tuesday, February 14, 2012 12:45 AM
> To: Pierre Girard
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] Compiling grid engine 2011.11 on solaris 11
>
>
Hi Harris,
From the error message, it looks like you don't have make installed...
Are you able to run "make" from an interactive shell??
Rayson
On Wed, Aug 29, 2012 at 10:08 PM, Harris He, Kun - CD
wrote:
> Dear All,
>
>
>
> I encounter a problem recently.
>
> The version GE2011.11p1 that i
x08239fae in sge_monitor_init ()
> #3 0x08056463 in sge_signaler_main ()
> #4 0x00166a49 in start_thread () from /lib/libpthread.so.0
> #5 0x00259e5e in clone () from /lib/libc.so.6
>
> Let me know if you want/need any more info
>
> -Karl
> On Mon, Aug 27, 2012 at 10
Most of our users don't run 32-bit Linux, but we tested it and it
worked for us (earlier RHEL versions, but not CentOS 6.3). Can you run
the qmaster under gdb so that gdb would show the stack trace?
Rayson
On Mon, Aug 27, 2012 at 10:35 AM, Karl Vollmer wrote:
> Hello,
>
> I recently tried in
There's Exclusive Scheduling for allocating the whole node:
http://docs.oracle.com/cd/E24901_01/doc.62/e21978/management.htm#sthref431
Rayson
On Wed, Aug 22, 2012 at 3:39 PM, Henrichs, Juryk
wrote:
> Hi,
>
> we have a heterogeneous cluster consisting of nodes with 32 and 48
> cpu's. Some of o
On Wed, Aug 15, 2012 at 1:55 PM, Joseph Farran wrote:
> This was another issue that Son of Grid Engine sge_8.1.1 solved that
> GE2011.11 had issues with.
Just for the record, GE 2011.11p1 fixed a module related issue.
However, I've re-read the whole discussion again, and I don't see any
clean a
It's the correct behavior: when jobs are running on a machine, there is
load, and by default Grid Engine dispatches jobs to the least loaded
machines first.
You can take a look at this blog entry, "N1GE 6 - Scheduler Hacks:
"least used" / "fill up" configuration":
http://wiki.gridengine.info/wiki/in
On Mon, Aug 6, 2012 at 11:10 PM, Harris He, Kun - CD
wrote:
> But when I run: scripts/distinst -all -local -noexit
>
> A error information shows that:
>
> ---
>
> Installing Libjuti.so
>
> “Libdb-4.4.so” not found. Assuming binaries are statically linked.
Hi,
You can safely i
Check whether the execd spool directory is local or NFS shared. The usage
file is written into the execd spool, not the job's working dir.
Rayson
On Fri, Aug 3, 2012 at 2:07 PM, Simon Matthews
wrote:
> Can anyone tell me what is going on here?
>
> The machine has plenty of disk space. The directory f
Maybe you can run qacct -j ?
Rayson
On Thu, Aug 2, 2012 at 10:28 AM, Lionel SPINELLI
wrote:
> Hello all,
>
> I would like to know if it is possible from a submit host to get information
> on a job that ended normally.
> I mean, using "qstat -f" or "qstat -j " I can get information on pending
Hi Joseph,
If you want to add a node from scratch, then you can try
"install_execd" - it should create all the needed files & the needed
queues, etc for the node.
If the qmaster already has the queues defined, and everything is the
same (node name, etc) except that the node's filesystem is gone,
Hi Joseph,
Sorry I don't have time to check the code... what is SGE_STDOUT_PATH
set to in your setup, and what's the expected behavior??
Rayson
On Wed, Aug 1, 2012 at 2:51 PM, Joseph Farran wrote:
> Hi.
>
> GE Environments variables are set for SGE_STDOUT_PATH & SGE_STDERR_PATH for
> standard
your responses... I will try to find a solution...
>
>
> Lionel
> ____
> De : Rayson Ho [rayray...@gmail.com]
> Date d'envoi : jeudi 26 juillet 2012 17:37
> À : Reuti
> Cc : Lionel SPINELLI; users@gridengine.org
> Objet : Re: [grid
On Thu, Jul 26, 2012 at 11:32 AM, Reuti wrote:
>> But when I try to submit a job, I got an error telling "error: can't open
>> output file "/home/tommy/simple.sh.o60": Permission denied
>>
>> Obviously, if i use the -o and -e option of qsub to put the output and error
>> logs to a shared disk wi
On Thu, Jul 26, 2012 at 11:17 AM, Lionel SPINELLI
wrote:
> But when I try to submit a job, I got an error telling "error: can't open
> output file "/home/tommy/simple.sh.o60": Permission denied
>
> Obviously, if i use the -o and -e option of qsub to put the output and error
> logs to a shared di
qacct, ACRo, etc??
In fact, many people write their own scripts to parse data in the
accounting & reporting files:
http://gridscheduler.sourceforge.net/htmlman/htmlman5/accounting.html
http://gridscheduler.sourceforge.net/htmlman/htmlman5/reporting.html
Rayson
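As an example of the kind of home-grown parsing mentioned above, here is a minimal awk sketch. It assumes the colon-separated layout described in accounting(5), with the owner in field 4 and ru_wallclock in field 14; the field positions should be checked against the man page linked above, and the two records are made up and cut off after field 14.

```shell
# Two made-up accounting records (only the first 14 fields are present).
sample='all.q:node01:staff:alice:sim.sh:101:sge:0:1342450000:1342450010:1342450610:0:0:600
all.q:node02:staff:bob:fit.sh:102:sge:0:1342450100:1342450120:1342450180:0:1:60'

# Sum wallclock seconds and count jobs per owner, skipping comment lines.
summary=$(printf '%s\n' "$sample" | awk -F: '
    !/^#/ { wall[$4] += $14; jobs[$4]++ }
    END   { for (o in jobs) printf "%s jobs=%d wallclock=%d\n", o, jobs[o], wall[o] }
' | sort)
echo "$summary"
```

On a live cluster the printf would be replaced by reading the accounting file itself, whose location depends on the installation.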
On Mon, Jul 16, 2012 at 2:58 PM,
Others fixed it by running date -s "`date`" :
Ref: http://gridengine.org/pipermail/users/2012-July/004089.html
Rayson
On Fri, Jul 13, 2012 at 9:31 PM, Daniel Povey wrote:
> Has anyone noticed their sge_execd processes suddenly taking up a lot of CPU,
> possibly since around July 2nd this year?
On Thu, Jul 12, 2012 at 6:24 AM, William Hay wrote:
> It's more than a month later and AFAICT your public SVN hasn't
> advanced beyond 2011.11p1.
> Is there any way to get hold of a version with your cgroups PDC(inc
> enforcing rss)? We'd like to evaluate it on our test cluster.
Hi William,
We
On Wed, Jul 11, 2012 at 9:38 AM, John Young wrote:
> Hmmm... If it was an OS issue, I would expect both queues to behave
> the same way since both are under the same OS, but they don't.
Again, as a normal user & then as root, run an interactive shell - but
let me clarify, *outside of Grid Engine
On Tue, Jul 10, 2012 at 4:23 PM, John Young wrote:
> With this in place, it seems odd that from one of my queues I
> get a default setting for the number of descriptors of 1024.
>
> So I have two questions really:
>
> 1. Why am I getting different behavior from the two queues?
Could be OS issue -
On Tue, Jul 10, 2012 at 3:51 PM, Reuti wrote:
>> Failure was that user's .bashrc were not being read. In each user's
>> account, I have it sourcing a system wide shell script which sets certain
>> things up, like our module environment configuration and so nothing was
>> being setup.
>
> Yep,
On Tue, Jul 10, 2012 at 4:02 PM, John Young wrote:
>> If you really have a real use-case for setting the # of descriptors in
>> the queue config, then let us know and we can implement that in OGS/GE
>> (... when time permits).
>>
> Well... I have an engineer here who want to run a 2048 core job.
The number of file descriptors is not part of the queue limit, see the
message I sent to the list 2 months ago:
http://gridengine.org/pipermail/users/2012-May/003705.html
If you really have a real use-case for setting the # of descriptors in
the queue config, then let us know and we can implement
On Tue, Jul 10, 2012 at 5:45 AM, Reuti wrote:
>
> Just to note, that the path can be accessed by $SGE_JOB_SPOOL_DIR.
Thanks Reuti - it will be useful to David.
I forgot this environment var as I have not used this hack for almost
a year... basically since getting the job exit status in epilog w
On Tue, Jul 10, 2012 at 1:48 PM, Joseph Farran wrote:
> I was using the same identical script, so it's still a mystery why the
> script ran on some nodes while it failed on others, but now with this change
> it works on all nodes and that is good enough.
Those settings affect whether the global l
an you
> point me to any info on what this error state does? Presumably it
> stops the existing and future jobs until the error state is cleared,
> but I'd love do know how to clear the error state, etc.
>
> Thanks!
> David
>
> ps is there a PPA available for OGE on Ubunt
If you are using Open Grid Scheduler/Grid Engine 2011.11 or later,
then there is the $SGE_JOBEXIT_STAT variable set in the epilog, so you
can check the exit status of the job. And then in the epilog, you can
close the queue or in fact do whatever you want - like disable
scheduling new jobs, etc.
T
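A dry-run sketch of the epilog idea above: it only prints the command an epilog might run, so it can be tried without a cluster. The policy (disable the queue instance on any non-zero status) and the names all.q and node01 are assumptions for illustration; a real epilog would also need sufficient privileges to run qmod.

```shell
# Print the action an epilog might take for a given job exit status.
# Arguments: exit status, queue name, host name (all supplied by the caller).
epilog_action() {
    status=$1; queue=$2; host=$3
    if [ "$status" -ne 0 ]; then
        # Example policy (an assumption): disable the queue instance on failure.
        echo "qmod -d ${queue}@${host}"
    else
        echo "none"
    fi
}

# In a real epilog the status would come from $SGE_JOBEXIT_STAT;
# all.q and node01 are placeholder names.
epilog_action "${SGE_JOBEXIT_STAT:-0}" all.q node01
```

Keeping the decision in a function that echoes the command makes it easy to test the policy before letting the epilog actually execute anything.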
See this blog post I wrote back in 2005:
http://web.archive.org/web/20051219011530/http://gridengine.info/articles/2005/10/10/unlimited-stack-limit-on-solaris
Rayson
On Mon, Jul 9, 2012 at 10:41 AM, Peter van Heusden wrote:
> Hi there
>
> I'm using gridengine 6.2u5-4 on Ubuntu 12.04. I've se
On Wed, Jul 4, 2012 at 10:38 AM, Reuti wrote:
>> ed.screen.c:(.text+0x7d3): undefined reference to `tputs'
>
> This looks like the ncurses-devel package is missing.
It's a common problem with newer versions of RHEL-based distros. I
downloaded CentOS 6.2 & Oracle Linux 6.3... I will install them s
Off the top of my head... it's "resource_quota".
Rayson
On Tue, Jul 3, 2012 at 10:27 AM, CB wrote:
> Hi,
>
> In OGS/GE 2011.11 man page, qconf, it says:
>
>
>-aattr obj_spec attr_name val obj_instance,...
>
>
>
> Allows adding specifications to a si
for Cygwin binary:
http://dl.dropbox.com/u/47200624/GE2011.11p1/ge2011.11-cygwin-p1.tar.bz2
Rayson
On Mon, Jul 2, 2012 at 5:24 PM, Joseph Farran wrote:
> When is GE 2011.11 update 1 with cgroups planned on being released?
>
>
> On 07/02/2012 01:56 PM, Rayson Ho wrote:
>>
>>
GE 2011.11 patch 1 was released a while ago (back in April or May I believe):
http://dl.dropbox.com/u/47200624/GE2011.11p1/GE2011.11p1.tar.gz
But we did not release certified binaries for it yet, so you will need
to compile from source, or wait for GE 2011.11 update 1 that has the
Grid Engine cgr
such file or directory
Rayson
On Mon, Jul 2, 2012 at 12:41 PM, george he wrote:
> Hello Rayson,
>
> That file is a link to chkconfig:
>
> /usr/lib/lsb/install_initd -> ../../../sbin/chkconfig
>
> How do you use another way to install the init script?
>
> Thanks,
>
Check if you have /usr/lib/lsb/install_initd on your system. On my
Fedora 17, that file is missing, and thus the install script uses
another way to install the init scripts.
Rayson
On Mon, Jul 2, 2012 at 11:52 AM, george he wrote:
> Hello all,
>
> I downloaded ge2011.11 from
> http://dl.dropbo
On Thu, Jun 28, 2012 at 8:11 AM, Txema Heredia Genestar
wrote:
> 2-20100 I'll change it to 2-3 to see if it still happens.
Yes, it can be due to a collision in the GID range with some valid
user GIDs - maybe real users are assigned those GIDs??
> But I have checked the gid for all
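One way to look for such a collision is to list the groups whose GID falls inside the configured range. The sketch below assumes a range of 20000-20100 purely for illustration (the actual range in the quoted message is not legible here) and uses made-up group entries in /etc/group format.

```shell
# Print "name:gid" for every group entry whose GID lies within [lo, hi].
gids_in_range() {
    lo=$1; hi=$2
    awk -F: -v lo="$lo" -v hi="$hi" '$3 >= lo && $3 <= hi { print $1 ":" $3 }'
}

# Made-up sample entries; on a real host one could pipe in "getent group".
sample='staff:x:50:
proj1:x:20005:alice,bob
proj2:x:30000:carol'

hits=$(printf '%s\n' "$sample" | gids_in_range 20000 20100)
echo "$hits"
```

The same filter could be pointed at the passwd database (field 4) to catch primary GIDs that are not listed in the group file.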
On Wed, Jun 27, 2012 at 12:51 PM, CB wrote:
> The one last ToDo
> is the trace file, which is owned by the job owner and it still has
> world-readable permission.
>
See the attached file for hardening the trace file (patch file against
Open Grid Scheduler trunk - should work with older versio
On Wed, Jun 27, 2012 at 1:26 PM, Reuti wrote:
> The job scripts need to be readable by each individual user who is running
> the job at execution time. What permission did you put on the execd's spool
> directory? For binary jobs it will work (unless you want to use $PE_HOSTFILE
> or alike).
Y
Didn't know that you wanted to harden the spool files as well (that
requirement was not in your original email) - but can't you just
change the permission of the execd spool directory?? I just did a quick
experiment, and jobs seem to run fine.
With the permission set at the execd local spool dir, th
Grid Engine does not parse /etc/security/limits.conf, and you may want
to set the limit in the queue configuration:
http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html
Rayson
On Sat, Jun 23, 2012 at 9:43 PM, Daniel Povey wrote:
> We have a problem in our queue that GridEngine
ges to the code.
>
> If I had any control over this software product, I would have broken the
> ability to change the subordinate field with the GUI until the bug was
> corrected.
>
>
>
> On 06/21/2012 08:39 AM, Rayson Ho wrote:
>>
>> It's a known bug, it was
On Thu, Jun 21, 2012 at 11:31 AM, Reuti wrote:
>> If it's a bug, is this a new bug or already known bug?
>
> Maybe it depends on the fork.
It's a known bug, it was in the Sun bug database. We didn't fix it
because not that many people use qmon (ie. a lot of us just use the
command line interface)
See if it is this one: daemons/execd/tmpdir.c
(I have not tested it myself so I could be wrong!)
Rayson
On Wed, Jun 20, 2012 at 2:09 PM, CB wrote:
> Hi,
>
> I am using the GE2011.11 release.
>
> When a job dispatched to a node, it creates $TMP directory, which is usually
> located at /tmp on
If you add it into your ~/.sge_qstat then you will notice the difference...
Rayson
On Tue, Jun 19, 2012 at 7:06 PM, Wagner, Justin wrote:
> Perhaps I am missing the point of this discussion, but we have no problem
> defining a "-u *" in the $SGE_ROOT/$SGE_CELL/common/sge_qstat file which
> s
Hi Kelvin,
The interaction is a bit more complicated than that... I will update
the list on the details later, but it actually makes sense to limit
the filter when we see:
-u * -u user
Because most likely we do not want to see all the jobs when we have a
second -u. However, currently Grid Engin
Try:
-u *
Note that the quotes around * are only needed when you are typing the command at a shell.
Rayson
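The reason quoting matters at the command line can be shown without qstat at all. In this sketch printf simply echoes the argument list that a command would receive; the temporary directory and the job1.sh/job2.sh file names are made up for the demonstration.

```shell
# Create a scratch directory with two files for the glob to match.
tmp=$(mktemp -d)
touch "$tmp/job1.sh" "$tmp/job2.sh"

# Unquoted: the shell expands * to the file names before the command runs.
unquoted=$(cd "$tmp" && printf '%s ' qstat -u *)
# Quoted: the command receives a literal *.
quoted=$(cd "$tmp" && printf '%s ' qstat -u "*")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -rf "$tmp"
```

A defaults file such as sge_qstat is presumably not passed through the shell, which would explain why the bare -u * form is suggested there.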
On Mon, Jun 18, 2012 at 5:33 PM, Joseph Farran wrote:
> Hello.
>
> I like to make qstat do "qstat -u "*" as the default to see all user jobs.
> I added:
>
> -u "*"
>
> to our //default/common/sge_qstat
>
The Gompute team has setup an OGS/GE installation with cgroups integration.
For those who are interested, please visit their booth at 560:
https://www.gompute.com/web/guest/isc12
Gridcore/Gompute is a member of the Open Grid Scheduler Project, and
we have received help from them since last yea
> ru_maxrss 0
>> > ru_ixrss 0
>> > ru_ismrss 0
>> > ru_idrss 0
>> > ru_isrss 0
>> > ru_minflt 0
>> > ru_majflt 0
>> > ru_nswap 0
>> > ru_inblock 0
>> > ru_oublock 0
0.000
> arid undefined
>
>
>
> On Fri, Jun 15, 2012 at 11:27 AM, Michael Coffman
> wrote:
>>
>> On Fri, Jun 15, 2012 at 11:11 AM, Rayson Ho wrote:
>>>
>>> Can you set "execd_params" to KEEP_ACTIVE for this host?? (See the
>>
>> On Fri, Jun 15, 2012 at 12:58 PM, Michael Coffman
>> wrote:
>> > On Fri, Jun 15, 2012 at 10:11 AM, Rayson Ho wrote:
>> >>
>> >> On Fri, Jun 15, 2012 at 12:01 PM, Michael Coffman
>> >> wrote:
>> >> > From the qmaster
he execd_params or else you will fill up your
local spool dir eventually with job information.)
Rayson
On Fri, Jun 15, 2012 at 12:58 PM, Michael Coffman
wrote:
> On Fri, Jun 15, 2012 at 10:11 AM, Rayson Ho wrote:
>>
>> On Fri, Jun 15, 2012 at 12:01 PM, Michael Coffman
>> w
On Fri, Jun 15, 2012 at 12:29 PM, Reuti wrote:
> If you have two queue instances on a machine, and want to limit the number
> of slots across both queue instances to avoid oversubscription, you need to
> define the overall limit in the exechost definition (to be complete: or in an
> RQS, but i
On Fri, Jun 15, 2012 at 12:20 PM, Joseph Farran wrote:
> Sorry, I don't follow. Assume I am a OGE Newbie :-). I am not new to
> the concept as I currently have the setup I described working under
> Torque/Maui, so now I am trying to duplicate the same setup but under OGE.
Hi Joseph,
Think o
On Fri, Jun 15, 2012 at 12:01 PM, Michael Coffman
wrote:
> From the qmaster messages file:
> 06/14/2012 21:29:39|worker|gemaster|W|job 3885.1 failed on host
> cs428.ftc.avagotech.net general before job because: 06/14/2012 21:29:37
> [20339:8436]: can't open file job_pid: Permission denied
>
> I ch
Hi Joseph,
(Sorry - was busy and didn't respond to your private email earlier.)
All releases of Grid Engine from us are compatible with each other,
meaning that you can mix & match versions. You don't need to stop all
running jobs if you know how to do it (but it is not an officially
supported wa
On Wed, Jun 13, 2012 at 2:14 AM, mahbube rustaee wrote:
> Is there any documentation for building GE2011 on Windows 7?
You can just go to the qmake directory, and directly invoke
"configure" from there.
Rayson
___
users mailing list
users@gridengine.org
https://grideng
On Wed, Jun 13, 2012 at 6:36 PM, Reuti wrote:
> Am 14.06.2012 um 00:20 schrieb Kevin Buckley:
>> Is it possible to alter the local execd configuration so that a new
>> instance could be
>> started and have the node then accept other tasks, whilst still retaining the
>> original communication ports
On Wed, Jun 13, 2012 at 1:47 AM, Erik Soyez
wrote:
> You probably need some kind of cronjob to suspend and unsuspend your
> parallel jobs correctly. Or does anyone have a patch for this?
Erik,
So is/was it really working when you try it with SGE 6.2u5??
I have not looked into the code that han
g entry for the Grid
Engine cgroups integration:
http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html
Rayson
>
> Joseph
>
> On 6/12/2012 5:19 PM, Rayson Ho wrote:
>
> On Tue, Jun 12, 2012 at 8:10 PM, Joseph Farran wrote:
>
> If you guys are that paranoid abou
On Tue, Jun 12, 2012 at 8:10 PM, Joseph Farran wrote:
> If you guys are that paranoid about PE suspension, how about adding an
> on/off flag for this since the code is already there and let the admin pick?
Hi Joseph,
I just want to understand the background a bit more, that's all...
Esp. now we
On Tue, Jun 12, 2012 at 12:58 AM, Joseph A. Farran wrote:
> Yes it makes sense not to introduce new options.
Hi Joseph,
Sorry I was busy with many things (apparently - as it's almost 2am
here and I am still up!), and I've asked Ron to handle some of the
mailing list questions...
> I am not fam
the tree so we will need to start the
discussion again and see if it really is a good idea to suspend
parallel jobs.
Rayson
On Mon, Jun 11, 2012 at 4:21 PM, Rayson Ho wrote:
> Only rank 0 of the job is suspended if I recall correctly - it was
> designed specifically because not all paralle
Only rank 0 of the job is suspended if I recall correctly - it was
designed specifically because not all parallel jobs are able to handle
suspend/restart correctly - for example you can get TCP timeouts and
things like those.
Rayson
On Mon, Jun 11, 2012 at 3:53 PM, Joseph Farran wrote:
> Hi.
>
This time it is for performance reasons... in fact I was at a site that
was experiencing performance issues & the qmaster was using more
memory than they would like (their qmaster has many services running on
the machine). So they also turned scheduler info off (the default was
on at that time - and yes
On Fri, Jun 8, 2012 at 2:37 PM, Rayson Ho wrote:
> That's the primary group ID we are talking about. You can think of it
> as Grid Engine only checks the primary group ID, so you need to have
> the primary group ID configured properly or else it won't work.
Let me clarify.
't work.
Rayson
On Fri, Jun 8, 2012 at 2:26 PM, Joseph Farran wrote:
>
>
> On 06/08/2012 11:19 AM, Rayson Ho wrote:
>>
>> but if Joseph is OK with using a cron
>> job to sync. membership then I can leave it aside for now - I will
>> need to work on a
Thanks William - I was also wondering how others do this in the field.
As far as I know, only the primary group is considered - it was like
that since many, many years ago. But I was not sure how you guys
define ACLs that need to handle the non-primary group case.
IMO, using external tools to syn
On Fri, Jun 1, 2012 at 6:19 AM, Mark Dixon wrote:
> My underlying concern is that sometimes it is appropriate to set an address
> space limit and sometimes it isn't, for the reasons we both put forward
> previously in this thread. Users should therefore have some control over it.
>
> I hope we agr
on stdout.
>>>
>>> What I would like to have, is the following:
>>>
>>> $ id=$(echo echo 1 | qsub -l h_vmem=1g -jsv
>>> $SGE_ROOT/util/resources/jsv/jsv.sh | cut -d " " -f3) && echo "id=$id"
>>> WARNING: something went wr
What are you trying to achieve?? Doesn't the Linux firewall handle the
job nicely for you already??
The execd only handles requests from the qmaster, and also in the
tight PE integration case from the machine running the Rank 0 of the
parallel job.
It is a security related thing, or there is a re
Also, besides switching users with "su -" (and without logging out &
in of X again - ie. using a different instance of X server), you can
compare the strace output and see if your account has settings that
affect the reading of X fonts.
Rayson
On Wed, Jun 6, 2012 at 7:03 PM, Reuti wrote:
> Am
On Wed, Jun 6, 2012 at 10:48 AM, Reuti wrote:
> It depends on the actual program, but usually threads are working on the same
> memory area. It would be interesting if the applications runs up to a certain
> number of threads and then starts to fail.
I remember Shannon & Ron discussed (on the o
On Wed, Jun 6, 2012 at 12:33 AM, Joseph A. Farran wrote:
> Yes you guys are doing a great job and I've done my share of programming
> back in the stone age, so I do appreciate how difficult it is to upkeep
> something this big and complex. OGE is very nice so far and it seems very
> flexible.
On Wed, Jun 6, 2012 at 12:03 AM, Nakata Nakata wrote:
> SGE6.2U5.
> I ran the OpenMP executable outside of a job. I can see its resident memory via the "top"
> command (1.5G)
How many threads do you usually use to run the OpenMP code??
> when job submitted h_vmem value should be set to high value otherwise j
On Wed, Jun 6, 2012 at 12:03 AM, Joseph A. Farran wrote:
> Ok. I was not sure if OGE had a default way of telling it the Parallel
> name to use if none was given.
I think the client-side JSV way is the best way to pick the default
PE... unless Reuti or others have other better ways. (There's al
On Tue, Jun 5, 2012 at 11:20 PM, Joseph A. Farran wrote:
> Yeah, you have the advantage of knowing this product inside out and what is
> difficult for us is simple, trivial and probably boring for you.
I guess he means that you can try some DIY testing without
worrying too much... and s