s from overbooking a node.
Thank you,
Nico
--
Alex Chekholko ch...@stanford.edu
debugging (man sge_dl).
William
--
Alex Chekholko ch...@stanford.edu 347-401-4860
OK, we figured it out, the user has an orphaned tmux session which has a
shell in it with the command
watch qdel *
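In case anyone else hits this, a quick way to hunt down and kill such an orphaned session (the session name below is just a placeholder):
pgrep -af 'watch qdel'    # find the stray watch process and which user owns it
tmux ls                   # list that user's tmux sessions (run as the user)
tmux kill-session -t 0    # kill the offending session, e.g. session 0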
On 11/30/2015 04:27 PM, Alex Chekholko wrote:
Hi,
My qmaster messages log is continuously printing, every 2s:
11/30/2015 16:21:28|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:21:30|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:21:32|worker|scg3-hn01|E|The job * of user(s)
On 06/08/2015 12:49 AM, William Hay wrote:
On Fri, 5 Jun 2015 22:16:12 +
Alex Chekholko ch...@stanford.edu wrote:
Hi all,
I have a standard grid engine cluster (sge-8.1.8 tarball from Dave
Love's site) where users use qlogin to get interactive shells on compute
nodes, and we use a qlogin
has released
the mem_free and slots for others to use.
If anyone has advice on how I can set this up to accomplish my three
goals, it would be very much appreciated. I can post any configuration /
logs / details required.
Thanks,
-Chris Tobey
--
Alex Chekholko ch...@stanford.edu
http://arc.liv.ac.uk/SGE/htmlman/htmlman8/pam_sge-qrsh-setup.html
is for, no matter how many times I re-read those man pages. Do I need
both pam_sge-qrsh-setup and pam_sge_authorize?
Regards,
--
Alex Chekholko ch...@stanford.edu
There is a general API called DRMAA
http://en.wikipedia.org/wiki/DRMAA
We have an example local app that uses it to track state of grid engine
jobs:
https://github.com/StanfordBioinformatics/SJM
I found one result for DRMAA Jenkins in Google:
On 11/6/14, 12:22 AM, William Hay wrote:
On Wed, 5 Nov 2014 23:08:53 +
Alex Chekholko ch...@stanford.edu wrote:
So I have a bash env var called BASH_FUNC_module() and when the Tcl
JSV tries to make a Tcl variable with that name, it errors out because
of the parentheses. I think
need to do to make the -V flag to qsub work again?
I found a page like this, but we don't have mismatched bash versions:
https://rc.fas.harvard.edu/shellshock-update-issue-temporarily-affecting-slurm-jobs-software-modules/
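A possible workaround (untested) might be to strip the exported function from the environment before calling qsub, e.g.:
unset -f module    # removes BASH_FUNC_module() from the environment passed along by qsub -V
qsub -V job.sh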
Regards,
--
Alex Chekholko ch...@stanford.edu
--
Alex Chekholko ch...@stanford.edu
--
Alex Chekholko ch...@stanford.edu
--
Alex Chekholko ch...@stanford.edu
Normally you would leave the old install in place while the old jobs finish.
Since you used a different communications port and different path, you
can have the new daemons running alongside the old daemons and they
don't know about each other.
Direct users to submit jobs to the new install.
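For example, a submit host can be pointed at the new cell just by sourcing its settings file (the path below is made up; use whatever you chose for the new install):
. /opt/sge-new/default/common/settings.sh   # sets SGE_ROOT, SGE_CELL, ports and PATH
qsub job.sh                                 # now goes to the new qmaster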
Hi,
Right, you want:
queue_sort_method load and load_formula slots.
Per Reuti's link.
You can easily test by submitting some test jobs on an empty cluster and
seeing that they all get assigned to the same host, leaving other hosts
empty, until the host with the most slots used is filled up.
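For reference, both knobs live in the scheduler configuration:
qconf -msconf
# then set:
queue_sort_method     load
load_formula          slots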
Hi Brett,
I saw the issue with JOB vs YES when the jobs were requesting more than
one complex. e.g. qsub -l h_vmem=10G,h_stack=10M was not hitting memory
limits when JOB was set.
If your jobs are only requesting one complex, the JOB setting should
work as expected, and you can try both ways
showing negative values.
So on clusters where our users also request h_stack in addition to
h_vmem, we've had to set it back to 'YES' and have the users do the
division by slots at job submission.
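For example, a job that should get 16G total across 8 slots would be submitted roughly like this (the PE name 'smp' is just an example):
qsub -pe smp 8 -l h_vmem=2G,h_stack=10M job.sh   # 2G per slot x 8 slots = 16G for the job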
Regards,
--
Alex Chekholko ch...@stanford.edu
only requests this
one complex.
Suggestions?
Regards,
--
Alex Chekholko ch...@stanford.edu
Hi Jake,
You can do 'qhost -F h_vmem,mem_free,virtual_free', that might be a
useful view for you.
In general, I've only ever used one of the three complexes above.
Which one(s) do you have defined for the execution hosts? e.g.
qconf -se compute-1-7
h_vmem will map to 'ulimit -v'
mem_free
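If you end up using h_vmem as a per-host consumable, it has to be marked consumable in qconf -mc and then given a value on each exec host, along these lines (the 64G is just an example):
qconf -me compute-1-7
# and add to complex_values:
complex_values        h_vmem=64G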
linux-x64   32  12.81   63.0G    8.4G    9.8G   27.8M
Host Resource(s): hc:h_vmem=1.000G
Is there anything I can do to diagnose this issue?
On 10/31/12 3:19 PM, Dave Love wrote:
Alex Chekholko ch...@stanford.edu writes:
Hi Reuti,
Thanks for your response, here's the output
, Oct 18, 2012 at 6:53 PM, Alex Chekholko ch...@stanford.edu wrote:
Hi,
Running Rocks 6, so whatever GE version is included there.
h_vmem is set consumable and per job, 4G default:
-bash-4.1$ qconf -sc |grep h_vmem
h_vmem      h_vmem     MEMORY    <=    YES    JOB    4G    0
each exec
wrote:
On 19.10.2012 at 20:58, Alex Chekholko wrote:
qhost values seem fine:
...
scg3-0-11  lx26-amd64  32  27.15  63.0G  38.3G  9.8G  393.6M
scg3-0-12  lx26-amd64  32  27.36  63.0G  38.7G  9.8G   33.6M
scg3-0-13  lx26-amd64  32  22.61  63.0G
after the job already started?
--
Alex Chekholko ch...@stanford.edu
On 10/18/12 3:14 AM, Reuti wrote:
On 28.09.2012 at 21:15, Alex Chekholko wrote:
Hi,
I have a fresh install of Open Grid Scheduler on a new cluster and I see that when I do a qlogin session, the
environment variable TERM gets set to "dumb". On my other clusters I see it gets set to "linux".
to be cycled for the setting to take effect? I
can try that next.
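In the meantime, comparing the interactive-job settings between the two clusters might narrow it down, e.g.:
qconf -sconf | grep -E 'qlogin|rlogin|rsh'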
Regards,
--
Alex Chekholko ch...@stanford.edu
limits.conf?
I bet if you restart the process, it'll get the settings from your
current root shell, and all will be well.
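One way to confirm what the running daemon actually has (on Linux), e.g. for sge_execd:
cat /proc/$(pgrep -o sge_execd)/limits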
Regards,
--
Alex Chekholko ch...@stanford.edu
it for that.
I don't see why this would require a reinstall. You probably need to
shut it all down and modify the bootstrap file, but that should be it.
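For reference, the file in question is the cell's bootstrap file:
$SGE_ROOT/$SGE_CELL/common/bootstrap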
IIRC it depends on whether their GE binaries were compiled with Kerberos
support or not.
--
Alex Chekholko ch...@stanford.edu
testq=1 will go in that
queue.
relevant thread:
http://gridengine.org/pipermail/users/2012-March/002972.html
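Roughly, the setup looks like this (the complex and queue names are just examples):
# 1. define a FORCED complex (qconf -mc), one line like:
#name   shortcut  type  relop  requestable  consumable  default  urgency
testq   testq     BOOL  ==     FORCED       NO          0        0
# 2. attach it to the queue, so only jobs that request it may run there:
qconf -mattr queue complex_values testq=1 test.q
# 3. users then submit with:
qsub -l testq=1 job.sh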
On 10/05/2012 03:36 PM, Orion Poplawski wrote:
I'd like to create a queue that only allows jobs that have requested a
specific resource requirement. Is that possible?
--
Alex
On 10/05/2012 04:09 AM, Dave Love wrote:
Alex Chekholko ch...@stanford.edu writes:
Hi Christoph,
We do have it working with AUKS, which is mostly outside GE.
Is that better than DESY's arcx system, or are they roughly equivalent?
I can't remember why arcx seemed a better bet on a quick look
,
Christoph
Sent from my Windows Phone
--
Alex Chekholko ch...@stanford.edu
.
Regards,
--
Alex Chekholko ch...@stanford.edu
blade/node infrastructure.
Thanks!
--JC
--
Alex Chekholko ch...@stanford.edu
I found a reference: "Some Cisco switches are even permanently
configured this way -- they can receive pause frames but never emit them."
http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
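On the Linux side, what a NIC is set to do with pause frames can be checked with ethtool (interface name is an example):
ethtool -a eth0    # shows Autonegotiate / RX / TX pause settings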
On 09/13/2012 11:17 AM, Alex Chekholko wrote:
3. What of hardware flow control
when you get a chance?
--
Alex Chekholko ch...@stanford.edu
Hi,
For a low-level gridengine test, you can use the 'qping' command between
various grid engine daemons.
I would guess that during the time that you see the commlib error, your
qping also wouldn't work.
E.g. from an exec host:
qping -info QMASTER_NAME 6444 qmaster 1
See the qping man page.
specific to the queue. per
http://arc.liv.ac.uk/SGE/howto/troubleshooting.html
We also have a load sensor that checks for the presence of this
filesystem, but the load sensor only updates every few minutes, while
the filesystem tends to disappear for only about 60s.
Regards,
--
Alex Chekholko
, for PE jobs, it seems as if GE does not abide by the load_formula.
Does the scheduler use a different load formula for single-core jobs
versus parallel jobs using the PE environment setup?
Joseph
--
Alex Chekholko ch...@stanford.edu
ECS, VUW, NZ
--
Alex Chekholko ch...@stanford.edu 347-401-4860
that they don't overlap. The fact that the problem doesn't come up with
multiple parallel jobs may be because of load thresholds.
What do you think of my solution? If it's nonsense, can anyone suggest
what the problem may be?
--
Alex Chekholko ch...@stanford.edu
', the
latter is mem_free in 'qconf -se'.
Regards,
--
Alex Chekholko ch...@stanford.edu
a sensor instead.
e.g.
http://gridscheduler.sourceforge.net/howto/loadsensor.html
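A minimal sketch of such a sensor, following the protocol in that howto (the complex name 'scratch_ok' and the mount point are made up):
#!/bin/sh
# report 1/0 for whether /scratch is currently mounted
HOST=`uname -n`
while :; do
  read input || exit 1
  [ "$input" = quit ] && exit 0
  if mountpoint -q /scratch; then val=1; else val=0; fi
  echo "begin"
  echo "$HOST:scratch_ok:$val"
  echo "end"
done
The complex (scratch_ok here) still has to exist in qconf -mc, and the script has to be registered as load_sensor in the host or global configuration.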
Regards,
--
Alex Chekholko ch...@stanford.edu
Regards,
--
Alex Chekholko ch...@stanford.edu
with finished jobs.
thanks in advance for help
gerard
--
Alex Chekholko ch...@stanford.edu 347-401-4860
!
-- Daniel Boone
--
Alex Chekholko ch...@stanford.edu 347-401-4860
you might have.
Good Luck,
Mark
Mark Suhovecky
HPC System Administrator
Center for Research Computing
University of Notre Dame
suhove...@nd.edu
From: users-boun...@gridengine.org [users-boun...@gridengine.org] On Behalf Of
Alex Chekholko [ch
will
work with modern GE 6.2u5+?
Regards,
--
Alex Chekholko ch...@stanford.edu
where you specify which parallel environments your queue
definition has.
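That attribute is pe_list; for example (queue and PE names are just examples):
qconf -sq all.q | grep pe_list
qconf -mattr queue pe_list "smp mpi" all.q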
--
Alex Chekholko ch...@stanford.edu
Yes, think of it as a high water mark over the duration of the process.
6.2u5 has a bug in accounting for that particular metric, but previous and later
versions do not.
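Assuming the metric in question is maxvmem, it's the value reported after the fact by qacct, e.g.:
qacct -j <jobid> | grep maxvmem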
- Original Message -
From: William Deegan b...@baddogconsulting.com
To: Gridengine Users Group users@gridengine.org