Re: [gridengine users] Reported memory usage too high

2016-05-09 Thread Alex Chekholko
s from overbooking a node. Thank you, Nico ___ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] sgemaster crash

2016-02-05 Thread Alex Chekholko
ebugging (man sge_dl). William -- Alex Chekholko ch...@stanford.edu 347-401-4860

Re: [gridengine users] qmaster worker error message

2015-12-01 Thread Alex Chekholko
OK, we figured it out: the user has an orphaned tmux session with a shell in it running the command 'watch qdel *'. On 11/30/2015 04:27 PM, Alex Chekholko wrote: Hi, My qmaster messages log is continuously printing, every 2s: 11/30/2015 16:21:28|worker|scg3-hn01|E|The job * of user(s

[gridengine users] qmaster worker error message

2015-11-30 Thread Alex Chekholko
Hi, My qmaster messages log is continuously printing, every 2s: 11/30/2015 16:21:28|worker|scg3-hn01|E|The job * of user(s) yangili does not exist 11/30/2015 16:21:30|worker|scg3-hn01|E|The job * of user(s) yangili does not exist 11/30/2015 16:21:32|worker|scg3-hn01|E|The job * of user(s)
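The log lines quoted above appear to be pipe-delimited. A minimal Python sketch for tallying repeated error lines, assuming five fields per line; the field names (timestamp, thread, host, severity, message) and both function names are my own labels for illustration, not names taken from the Grid Engine documentation:

```python
from collections import Counter


def parse_message_line(line):
    """Split one messages-log line into its pipe-delimited fields.

    The field names are illustrative assumptions, not official names.
    """
    timestamp, thread, host, severity, message = line.split("|", 4)
    return {
        "timestamp": timestamp,
        "thread": thread,
        "host": host,
        "severity": severity,
        "message": message,
    }


def count_repeated_errors(lines):
    """Count how often each (host, message) pair occurs among 'E' lines."""
    counts = Counter()
    for line in lines:
        record = parse_message_line(line)
        if record["severity"] == "E":
            counts[(record["host"], record["message"])] += 1
    return counts


# The two sample lines are copied from the message above.
sample = [
    "11/30/2015 16:21:28|worker|scg3-hn01|E|The job * of user(s) yangili does not exist",
    "11/30/2015 16:21:30|worker|scg3-hn01|E|The job * of user(s) yangili does not exist",
]
```

A message that repeats every two seconds shows up as a single key with a steadily growing count, which makes it easy to spot a looping client.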

Re: [gridengine users] qlogin + X11 + pam_sge_authorize ?

2015-06-08 Thread Alex Chekholko
On 06/08/2015 12:49 AM, William Hay wrote: On Fri, 5 Jun 2015 22:16:12 + Alex Chekholko ch...@stanford.edu wrote: Hi all, I have a standard grid engine cluster (sge-8.1.8 tarball from Dave Love's site) where users use qlogin to get interactive shells on compute nodes, and we use a qlogin

Re: [gridengine users] qlogin, interactive sessions, and cgroups

2015-06-05 Thread Alex Chekholko
has released the mem_free and slots for others to use. If anyone has advice on how I can set this up to accomplish my three goals, it would be very much appreciated. I can post any configuration / logs / details required. Thanks, -Chris Tobey -- Alex Chekholko ch...@stanford.edu

[gridengine users] qlogin + X11 + pam_sge_authorize ?

2015-06-05 Thread Alex Chekholko
://arc.liv.ac.uk/SGE/htmlman/htmlman8/pam_sge-qrsh-setup.html is for, no matter how many times I re-read those man pages. Do I need both pam_sge-qrsh-setup and pam_sge_authorize? Regards, -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Jenkins integration?

2015-02-04 Thread Alex Chekholko
There is a general API called DRMAA http://en.wikipedia.org/wiki/DRMAA We have an example local app that uses it to track state of grid engine jobs: https://github.com/StanfordBioinformatics/SJM I found one result for DRMAA Jenkins in Google:

Re: [gridengine users] tcl jsv error - bash related

2014-11-06 Thread Alex Chekholko
On 11/6/14, 12:22 AM, William Hay wrote: On Wed, 5 Nov 2014 23:08:53 + Alex Chekholko ch...@stanford.edu wrote: So I have a bash env var called BASH_FUNC_module() and when the tcl jsv tries to make a TCL variable with that name, it errors out because of the parenthesis. I think
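To check which exported names contain parentheses, like the BASH_FUNC_module() variable described above, one could scan the environment. A minimal Python sketch; the helper name is hypothetical, and this only lists the offending names, it is not a fix for the JSV script itself:

```python
import os


def names_with_parentheses(env):
    """Return the sorted variable names that contain '(' or ')'."""
    return sorted(name for name in env if "(" in name or ")" in name)


if __name__ == "__main__":
    # Print every environment variable name that contains a parenthesis.
    for name in names_with_parentheses(os.environ):
        print(name)
```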

[gridengine users] tcl jsv error - bash related

2014-11-05 Thread Alex Chekholko
need to do to make the -V flag to qsub work again? I found a page like this, but we don't have mismatched bash versions: https://rc.fas.harvard.edu/shellshock-update-issue-temporarily-affecting-slurm-jobs-software-modules/ Regards, -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Basic information working on cluster

2014-04-15 Thread Alex Chekholko
-- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Grid engine does not send no job-mails

2013-10-24 Thread Alex Chekholko
/ --- -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Resource Reservation logging

2013-10-04 Thread Alex Chekholko
-- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Upgrading form SGE to OGE

2013-09-11 Thread Alex Chekholko
Normally you would leave the old install in place while the old jobs finish. Since you used a different communications port and different path, you can have the new daemons running alongside the old daemons and they don't know about each other. Direct users to submit jobs to the new

Re: [gridengine users] modify host selection algorithm?

2013-04-29 Thread Alex Chekholko
Hi, Right, you want: queue_sort_method load and load_formula slots. Per Reuti's link. You can easily test by submitting some test jobs on an empty cluster and see that they all get assigned to the same host, leaving other hosts empty, until the host with the most slots used is filled up.

Re: [gridengine users] h_vmem not actually restricting memory usage?

2013-02-08 Thread Alex Chekholko
Hi Brett, I saw the issue with JOB vs YES when the jobs were requesting more than one complex. e.g. qsub -l h_vmem=10G,h_stack=10M was not hitting memory limits when JOB was set. If your jobs are only requesting one complex, the JOB setting should work as expected, and you can try both ways

Re: [gridengine users] strategy for h_vmem and multi-slot jobs?

2013-02-01 Thread Alex Chekholko
showing negative values. So on clusters where our users request also h_stack in addition to h_vmem, we've had to set it back to 'YES' and have the users do the division by slots at job submission. Regards, -- Alex Chekholko ch...@stanford.edu
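The division by slots mentioned above can be sketched in Python. This assumes h_vmem is a per-slot consumable (consumable set to 'YES'), so the requested value is multiplied by the slot count; the function name and the choice to round up to whole megabytes are my own assumptions:

```python
def per_slot_h_vmem(total_mb, slots):
    """Return the per-slot h_vmem request, in megabytes, as an 'NNNM' string.

    Rounds up so that slots * per-slot value is never below total_mb.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    per_slot = -(-total_mb // slots)  # ceiling division on integers
    return f"{per_slot}M"
```

For example, a job that needs 10240 MB in total across 4 slots would request h_vmem=2560M per slot.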

[gridengine users] h_vmem not honored at all?

2013-01-09 Thread Alex Chekholko
only requests this one complex. Suggestions? Regards, -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Memory allocation woes. Any thoughts?

2012-12-11 Thread Alex Chekholko
Hi Jake, You can do 'qhost -F h_vmem,mem_free,virtual_free', that might be a useful view for you. In general, I've only ever used one of the three complexes above. Which one(s) do you have defined for the execution hosts? e.g. qconf -se compute-1-7 h_vmem will map to 'ulimit -v' mem_free

Re: [gridengine users] h_vmem negative values?

2012-11-28 Thread Alex Chekholko
linux-x64 32 12.81 63.0G 8.4G 9.8G 27.8M Host Resource(s): hc:h_vmem=1.000G Is there anything I can do to diagnose this issue? On 10/31/12 3:19 PM, Dave Love wrote: Alex Chekholko ch...@stanford.edu writes: Hi Reuti, Thanks for your response, here's the output

Re: [gridengine users] h_vmem negative values?

2012-10-19 Thread Alex Chekholko
, Oct 18, 2012 at 6:53 PM, Alex Chekholko ch...@stanford.edu wrote: Hi, Running Rocks 6, so whatever GE version is included there. h_vmem is set consumable and per job, 4G default: -bash-4.1$ qconf -sc |grep h_vmem h_vmem h_vmem MEMORY =YES JOB 4G 0 each exec

Re: [gridengine users] h_vmem negative values?

2012-10-19 Thread Alex Chekholko
wrote: On 19.10.2012 at 20:58, Alex Chekholko wrote: qhost values seem fine: ... scg3-0-11 lx26-amd64 32 27.15 63.0G 38.3G 9.8G 393.6M scg3-0-12 lx26-amd64 32 27.36 63.0G 38.7G 9.8G 33.6M scg3-0-13 lx26-amd64 32 22.61 63.0G

Re: [gridengine users] h_vmem negative values?

2012-10-19 Thread Alex Chekholko
after the job already started? -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] qlogin sets TERM var to dumb?

2012-10-18 Thread Alex Chekholko
On 10/18/12 3:14 AM, Reuti wrote: On 28.09.2012 at 21:15, Alex Chekholko wrote: Hi, I have a fresh install of Open Grid Scheduler on a new cluster and I see that when I do a qlogin session, the environment variable TERM gets set to dumb. On my other clusters I see it gets set to linux

[gridengine users] h_vmem negative values?

2012-10-18 Thread Alex Chekholko
to be cycled for the setting to take effect? I can try that next. Regards, -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] qmaster: hard descriptor limit and soft descriptor limit?

2012-10-12 Thread Alex Chekholko
limits.conf? I bet if you restart the process, it'll get the settings from your current root shell, and all will be well. Regards, -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] KRB5 support

2012-10-05 Thread Alex Chekholko
it for that. I don't see why this would require a reinstall. You probably need to shut it all down and modify the bootstrap file, but that should be it. IIRC it depends on whether their GE binaries were compiled with Kerberos support or not. -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Limit jobs in a queue to those requesting a resource

2012-10-05 Thread Alex Chekholko
testq=1 will go in that queue. relevant thread: http://gridengine.org/pipermail/users/2012-March/002972.html On 10/05/2012 03:36 PM, Orion Poplawski wrote: I'd like to create a queue that only allows jobs that have requested a specific resource requirement. Is that possible? -- Alex

Re: [gridengine users] SGE with KRB5

2012-10-05 Thread Alex Chekholko
On 10/05/2012 04:09 AM, Dave Love wrote: Alex Chekholko ch...@stanford.edu writes: Hi Christoph, We do have it working with AUKS, which is mostly outside GE. Is that better than DESY's arcx system, or are they roughly equivalent? I can't remember why arcx seemed a better bet on a quick look

Re: [gridengine users] SGE with KRB5

2012-10-02 Thread Alex Chekholko
, Christoph Sent from my Windows Phone -- Alex Chekholko ch...@stanford.edu

[gridengine users] qlogin sets TERM var to dumb?

2012-09-28 Thread Alex Chekholko
. Regards, -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] 10GbE TOR's and performance tuning.

2012-09-13 Thread Alex Chekholko
blade/node infrastructure. Thanks! --JC -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] 10GbE TOR's and performance tuning.

2012-09-13 Thread Alex Chekholko
I found a reference: "Some Cisco switches are even permanently configured this way -- they can receive pause frames but never emit them." http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html On 09/13/2012 11:17 AM, Alex Chekholko wrote: 3. What of hardware flow control

[gridengine users] AMD 62xx performance was:Re: How to Restrict forking?

2012-09-12 Thread Alex Chekholko
when you get a chance? -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] debugging commlib errors?

2012-09-05 Thread Alex Chekholko
Hi, For a low-level gridengine test, you can use the 'qping' command between various grid engine daemons. I would guess that during the time that you see the commlib error, your qping also wouldn't work. E.g. from an exec host: qping -info QMASTER_NAME 6444 qmaster 1 See the qping man

[gridengine users] queues not erroring out when jobs error out

2012-09-04 Thread Alex Chekholko
specific to the queue. per http://arc.liv.ac.uk/SGE/howto/troubleshooting.html We also have a load sensor that checks for the presence of this filesystem, but the load sensor only updates every few minutes, while the filesystem tends to disappear for only about 60s. Regards, -- Alex Chekholko

Re: [gridengine users] load_formula and PE jobs

2012-08-13 Thread Alex Chekholko
, for PE jobs, it seems as if GE does not abide by the load_formula. Does the scheduler use a different load formula for single core jobs versus parallel jobs using the PE environment setup? Joseph -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] qalter not successful

2012-06-13 Thread Alex Chekholko
ECS, VUW, NZ -- Alex Chekholko ch...@stanford.edu 347-401-4860

Re: [gridengine users] Over-subscription of hosts -- the role of slots and queues

2012-06-05 Thread Alex Chekholko
that they don't overlap. The fact that the problem doesn't come up with multiple parallel jobs may be because of load thresholds. What do you think of my solution? If it's nonsense, can anyone suggest what the problem may be? -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Requesting h_vmem or memfree

2012-05-25 Thread Alex Chekholko
', the latter is mem_free in 'qconf -se'. Regards, -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Dynamic Resources?

2012-04-18 Thread Alex Chekholko
a sensor instead. e.g. http://gridscheduler.sourceforge.net/howto/loadsensor.html Regards, -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] MATLAB + SGE experts?

2012-01-19 Thread Alex Chekholko
/ Regards, -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] qacct in xml?

2011-11-23 Thread Alex Chekholko
with finished jobs. thanks in advance for help gerard -- Alex Chekholko ch...@stanford.edu 347-401-4860

Re: [gridengine users] SC11 anyone?

2011-11-16 Thread Alex Chekholko
! -- Daniel Boone -- Alex Chekholko ch...@stanford.edu 347-401-4860

Re: [gridengine users] current status of Kerberos support (and maybe AFS)?

2011-10-03 Thread Alex Chekholko
you might have. Good Luck, Mark Mark Suhovecky HPC System Administrator Center for Research Computing University of Notre Dame suhove...@nd.edu From: users-boun...@gridengine.org [users-boun...@gridengine.org] On Behalf Of Alex Chekholko [ch

[gridengine users] current status of Kerberos support (and maybe AFS)?

2011-09-30 Thread Alex Chekholko
will work with modern GE 6.2u5+? Regards, -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Queue and Parallell Environments

2011-08-08 Thread Alex Chekholko
where you specify which parallel environments your queue definition has. -- Alex Chekholko ch...@stanford.edu

Re: [gridengine users] Maximum memory for running process?

2011-08-02 Thread Alex Chekholko
Yes, think of it as a high water mark over the duration of the process. 6.2u5 has a bug in accounting that particular metric, but previous and later versions do not. - Original Message - From: William Deegan b...@baddogconsulting.com To: Gridengine Users Group users@gridengine.org
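The "high water mark" idea above can be illustrated with a short Python sketch: given periodic memory samples for a process, the reported figure is the largest one ever observed, not the latest or the average. The function name and the sample values are hypothetical, and this is not how Grid Engine itself collects the number:

```python
def high_water_mark(samples):
    """Return the largest value seen across an iterable of usage samples."""
    peak = 0
    for value in samples:
        if value > peak:
            peak = value
    return peak
```

For samples of 100, 250 and 180 (in any unit), the high water mark is 250, even though usage had dropped to 180 by the end.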