Thank you all!! It's running now. I had been changing several things at once, but I think it started working when I moved the load monitor to /tmp instead of leaving it under my home directory, which is NFS-mounted. Now I need to retrace my steps from my toy load monitor to my real one.

Also, judging from my echoes, it is being invoked every 40 seconds. I believe I've seen that number as the "load report time" in the Cluster Configuration tab of the qmon GUI. I assume I can change that number, but I also assume that doing so would change the frequency of all of SGE's other internal load monitors, which is probably not a good idea. My load monitor is very simple and could run at a higher rate, but I don't want to upset the apple cart.
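
For anyone retracing the same steps, here is a minimal sketch of what a toy sensor along these lines might look like once it logs to local disk. It is not the exact script from this thread: it keeps the complex name earl_ecs_jun, the /tmp/LD log and the PID_FILE path from the earlier post, but it swaps the `ps h -p ... | wc -l` check for `kill -0`, and the `--serve` guard at the bottom is an assumption added only so the functions can be tried without blocking on stdin.

```shell
#!/bin/bash
# Sketch of a toy SGE load sensor. Anything marked "assumed" is illustrative.

LOG=${LOG:-/tmp/LD}    # diagnostics go to local disk, not the NFS-mounted home
PID_FILE=${PID_FILE:-/tmp/jaeger/0p1/EDB/ECS_JUN_SS3_SL4h/PID_FILE}
HST=$(uname -n)

# Print 1 if PID_FILE names a live process, 0 otherwise.
server_up() {
    local pid
    [ -r "$PID_FILE" ] || { echo 0; return; }
    pid=$(cat "$PID_FILE")
    # kill -0 sends no signal, it only tests whether the process exists.
    # It also fails for another user's process unless we are root.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo 1
    else
        echo 0
    fi
}

# One load report in the begin / host:complex:value / end format.
report() {
    echo "begin"
    echo "$HST:earl_ecs_jun:$(server_up)"
    echo "end"
}

# Main loop: one report per line received on stdin, stop on "quit" or EOF.
serve() {
    local input
    echo "START $(date)" >>"$LOG"
    while read -r input; do
        echo "READ $(date)" >>"$LOG"
        [ "$input" = "quit" ] && break
        report
    done
    echo "END $(date)" >>"$LOG"
}

# Assumed guard: only enter the loop when asked to, so the functions above
# can be exercised by hand without the script waiting on stdin.
if [ "${1:-}" = "--serve" ]; then
    serve
fi
```

Piping `printf '\nquit\n'` into the script run with `--serve` should produce one begin/end block and then stop. For a real sensor the guard would be dropped so that `serve` always runs, since as far as I know sge_execd starts the script without arguments.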

Thanks again to all who helped. I'm a user type, not a sys admin. Luckily the real sys admin gave me SGE manager privileges to sort this out.

On Fri, Apr 20, 2012 at 10:40 AM, Txema Heredia Genestar <[email protected]> wrote:

> Hi Earl,
>
> Have you restarted the execution daemon in that host?
> If there are running jobs, you can "softstop" it ( /etc/init.d/sgeexecd
> softstop ), and then start it again.
>
> Txema
>
>
> On 20/04/12 17:24, Earl Lazarus wrote:
>
> Indeed the script would not have been able to echo to my home directory,
> so I changed the destination of the echo to /tmp/LD. I then changed the
> name of script slightly and went into qmon and fixed the new spelling for
> both global and the one host where I am looking. After a few minutes there
> is no sign of my echoes in /tmp or the script name in a ps -elf on the one
> host.
>
> On Fri, Apr 20, 2012 at 9:42 AM, Reuti <[email protected]> wrote:
>
>> On 20.04.2012 at 16:28, Earl Lazarus wrote:
>>
>> > Yes the load sensor is under my home directory which is visible on all
>> > machines. Would it be a true statement that my load sensor should be
>> > running as soon as I specify it in the host configuration? I need not
>> > submit jobs that reference the load that it is measuring.
>>
>> If you change the definition in a local host configuration it's
>> necessary to change the global configuration to distribute it to the node
>> (just remove a blank somewhere). Then after 2 cycles of the
>> load_report_time the process should be visible on a node:
>>
>> $ ps -e f
>> ...
>>  5081 ?        Sl   125:47 /usr/sge/bin/lx24-amd64/sge_execd
>>  5147 ?        S      2:39  \_ /bin/sh /usr/sge/cluster/tmpspace.sh
>>
>> -- Reuti
>>
>>
>> > On Thu, Apr 19, 2012 at 9:30 PM, Ron Chen <[email protected]> wrote:
>> > I don't have access to a Unix machine now, so I assume the script works.
>> >
>> > However, it is always the execution daemons that run the load sensors, so
>> > make sure the load sensor is available on all the machines.
>> >
>> > -Ron
>> >
>> >
>> > ________________________________
>> > From: Earl Lazarus <[email protected]>
>> > To: Rayson Ho <[email protected]>
>> > Cc: [email protected]
>> > Sent: Thursday, April 19, 2012 9:49 PM
>> > Subject: Re: [gridengine users] Load sensors
>> >
>> >
>> > Here is the load sensor...it basically checks to see if a server is running on the host, returning 1 if yes
>> > and 0 if no. It currently contains diagnostic prints to my home directory. It runs fine from the command prompt.
>> >
>> > When is a user provided load monitor actually run? Every time the scheduler runs?
>> >
>> > #!/bin/bash
>> > #PURPOSE SGE load monitor
>> > #
>> > #
>> > good(){
>> >     echo "begin"
>> >     echo "$hst:earl_ecs_jun:1"
>> >     echo "end"
>> > }
>> > bad(){
>> >     echo "begin"
>> >     echo "$hst:earl_ecs_jun:0"
>> >     echo "end"
>> > }
>> > echo START `date` >>/home/elazarus/LD
>> > hst=$(uname -n)
>> > pf="PID_FILE"
>> > while [ 1 ] ; do
>> >     read input
>> >     result=$?
>> >     echo READ `date` >>/home/elazarus/LD
>> >     if [ $result != 0 ] ; then
>> >         exit 1
>> >     fi
>> >     if [ "$input" = "quit" ] ; then
>> >         echo END `date` >>/home/elazarus/LD
>> >         exit 0
>> >     fi
>> >     # --ASSERT VALID QUERY
>> >     tmpname=/tmp/jaeger/0p1/EDB/ECS_JUN_SS3_SL4h
>> >     if [ -d $tmpname ] ; then
>> >         cd $tmpname
>> >         # --EXAMINE THE PID_FILE
>> >         if [ -e $pf ] ; then
>> >             # --FOUND PID_FILE
>> >             pid=$(cat $pf)
>> >             l=$(ps h -p $pid |wc -l)
>> >             if [ $l -eq 0 ] ; then
>> >                 # --CANNOT FIND THE SPECIFIED PROCESS
>> >                 bad
>> >             else
>> >                 # --IT'S RUNNING!!
>> >                 good
>> >             fi
>> >         else
>> >             # --NO PID_FILE
>> >             bad
>> >         fi
>> >     else
>> >         # --NO SERVER DIRECTORY
>> >         bad
>> >     fi
>> > done
>> >
>> >
>> > On Thu, Apr 19, 2012 at 7:18 PM, Rayson Ho <[email protected]> wrote:
>> >
>> > >Can you post your load sensor, or at least the main structure of your
>> > >load sensor script??
>> > >
>> > >If you run the script interactively, what do you get??
>> > >
>> > >Rayson
>> > >
>> > >
>> > >On Thu, Apr 19, 2012 at 8:14 PM, Earl Lazarus <[email protected]> wrote:
>> > >> I followed all of those directions...it just doesn't run. Permissions are
>> > >> 777.
>> > >> I put an "echo START `date` >>/home/<myid>/LD"
>> > >>
>> > >> The file is always empty.
>> > >>
>> > >>
>> > >> On Thu, Apr 19, 2012 at 12:37 PM, Rayson Ho <[email protected]>
>> > >> wrote:
>> > >>>
>> > >>> There is not a lot of actual "REQUIREMENTS" for a load sensor. As long
>> > >>> as it prints the proper values to standard output, then it is good
>> > >>> enough in most cases.
>> > >>>
>> > >>> You can get more detail from Oracle's doc:
>> > >>>
>> > >>> http://docs.oracle.com/cd/E24901_01/doc.62/e21978/configuration.htm#sthref182
>> > >>>
>> > >>> Rayson
>> > >>>
>> > >>>
>> > >>> On Thu, Apr 19, 2012 at 1:31 PM, Earl Lazarus <[email protected]>
>> > >>> wrote:
>> > >>> > Based upon earlier postings, it looks like a load sensor will solve my
>> > >>> > problem. Others have
>> > >>> > pointed to the following link (which contains an example of a load
>> > >>> > sensor
>> > >>> > script).
>> > >>> >
>> > >>> > http://gridscheduler.sourceforge.net/howto/loadsensor.html
>> > >>> >
>> > >>> > The example script at this site contains a "read" statement and seems to
>> > >>> > communicate with SGE via "echo". Is there someplace where I can
>> > >>> > find the actual REQUIREMENTS for a load sensor script instead of
>> > >>> > having to reverse engineer the requirements from an example?
>> > >>> >
>> > >>> > _______________________________________________
>> > >>> > users mailing list
>> > >>> > [email protected]
>> > >>> > https://gridengine.org/mailman/listinfo/users
