Hi Earl,
Have you restarted the execution daemon on that host?
If there are running jobs, you can "softstop" it
(/etc/init.d/sgeexecd softstop) and then start it again.
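
The restart described above can be sketched as a small shell function. The path /etc/init.d/sgeexecd is the one quoted in this message. The helper name restart_execd, and the assumption that the same init script accepts "start" as well as "softstop", are not from the thread, so check the script on your installation first.

```shell
#!/bin/sh
# Sketch: restart the execution daemon without touching running jobs.
# restart_execd is a hypothetical helper name; the default path is the
# one mentioned in this thread.
restart_execd() {
    init_script=${1:-/etc/init.d/sgeexecd}
    if [ ! -x "$init_script" ]; then
        echo "init script not found or not executable: $init_script" >&2
        return 1
    fi
    # softstop shuts down the daemon but leaves running jobs alone.
    "$init_script" softstop || return 1
    # Assumption: the same script also accepts "start".
    "$init_script" start
}
```

Run it as root on the affected host, for example with a plain "restart_execd". A restarted daemon rereads its configuration, including the load sensor setting.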
Txema
On 20/04/12 17:24, Earl Lazarus wrote:
Indeed, the script would not have been able to echo to my home
directory, so I changed the destination of the echo to /tmp/LD. I
then changed the name of the script slightly and went into qmon and
entered the new spelling for both global and the one host I am looking at.
After a few minutes there is no sign of my echoes in /tmp, nor of the
script name in a ps -elf on that host.
On Fri, Apr 20, 2012 at 9:42 AM, Reuti <[email protected]> wrote:
On 20.04.2012 at 16:28, Earl Lazarus wrote:
> Yes the load sensor is under my home directory which is visible
> on all machines. Would it be a true statement that my load sensor
> should be running as soon as I specify it in the host
> configuration? I need not submit jobs that reference the load
> that it is measuring.
If you change the definition in a local host configuration it's
necessary to change the global configuration to distribute it to
the node (just remove a blank somewhere). Then after 2 cycles of
the load_report_time the process should be visible on a node:
$ ps -e f
...
5081 ? Sl 125:47 /usr/sge/bin/lx24-amd64/sge_execd
5147 ? S 2:39 \_ /bin/sh /usr/sge/cluster/tmpspace.sh
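
To check the same thing without reading the whole process tree, a sketch along these lines lists only the direct children of sge_execd. It assumes a Linux host with procps (pgrep and ps --ppid); list_children is a hypothetical helper name, not part of Grid Engine.

```shell
#!/bin/sh
# list_children PID: print the pid and command line of each direct child.
list_children() {
    ps -o pid=,args= --ppid "$1"
}

# Look up the execution daemon by its exact process name.
execd_pid=$(pgrep -x sge_execd | head -n 1)
if [ -n "$execd_pid" ]; then
    # A configured load sensor should show up here once it has started.
    list_children "$execd_pid"
else
    echo "sge_execd is not running on $(uname -n)" >&2
fi
```

If the daemon is running but the sensor script is missing from the output after two load_report_time cycles, the configuration change has probably not reached that node.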
-- Reuti
> On Thu, Apr 19, 2012 at 9:30 PM, Ron Chen <[email protected]> wrote:
> I don't have access to a Unix machine now, so I assume the script works.
>
> However, it is always the execution daemons that run the load sensors, so
> make sure the load sensor is available on all the machines.
>
> -Ron
>
>
>
> ________________________________
> From: Earl Lazarus <[email protected]>
> To: Rayson Ho <[email protected]>
> Cc: [email protected]
> Sent: Thursday, April 19, 2012 9:49 PM
> Subject: Re: [gridengine users] Load sensors
>
>
> Here is the load sensor...it basically checks to see if a
> server is running on the host, returning 1 if yes
> and 0 if no. It currently contains diagnostic prints to my home
> directory. It runs fine from the command prompt.
>
> When is a user-provided load monitor actually run? Every time
> the scheduler runs?
>
> #!/bin/bash
> # PURPOSE: SGE load monitor
> #
> # Report that the server is running (value 1).
> good(){
>     echo "begin"
>     echo "$hst:earl_ecs_jun:1"
>     echo "end"
> }
> # Report that the server is not running (value 0).
> bad(){
>     echo "begin"
>     echo "$hst:earl_ecs_jun:0"
>     echo "end"
> }
> echo "START $(date)" >>/home/elazarus/LD
> hst=$(uname -n)
> pf="PID_FILE"
> while true ; do
>     # Block until a request line arrives on standard input.
>     read -r input
>     result=$?
>     echo "READ $(date)" >>/home/elazarus/LD
>     if [ "$result" -ne 0 ] ; then
>         exit 1
>     fi
>     if [ "$input" = "quit" ] ; then
>         echo "END $(date)" >>/home/elazarus/LD
>         exit 0
>     fi
>     # --ASSERT VALID QUERY
>     tmpname=/tmp/jaeger/0p1/EDB/ECS_JUN_SS3_SL4h
>     if [ -d "$tmpname" ] ; then
>         cd "$tmpname"
>         # --EXAMINE THE PID_FILE
>         if [ -e "$pf" ] ; then
>             # --FOUND PID_FILE
>             pid=$(cat "$pf")
>             l=$(ps h -p "$pid" | wc -l)
>             if [ "$l" -eq 0 ] ; then
>                 # --CANNOT FIND THE SPECIFIED PROCESS
>                 bad
>             else
>                 # --IT'S RUNNING!!
>                 good
>             fi
>         else
>             # --NO PID_FILE
>             bad
>         fi
>     else
>         # --NO SERVER DIRECTORY
>         bad
>     fi
> done
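
The request/response loop that the script above implements can be exercised by hand, without the execution daemon. The following is a minimal sketch: the temporary file name demo_sensor and the fixed value 1 are illustrative assumptions, while the complex name earl_ecs_jun is taken from the script above. One empty line on standard input asks for one report, and the line "quit" ends the sensor.

```shell
#!/bin/sh
# Write a stripped-down load sensor to a temporary file.
sensor=$(mktemp "${TMPDIR:-/tmp}/demo_sensor.XXXXXX")
cat > "$sensor" <<'EOF'
#!/bin/sh
host=$(uname -n)
while read -r input; do
    [ "$input" = "quit" ] && exit 0
    # One report: begin, then host:complex:value lines, then end.
    echo "begin"
    echo "$host:earl_ecs_jun:1"
    echo "end"
done
# read failed (stdin closed): stop, as the original script does.
exit 1
EOF

# Play the role of the daemon: one empty line requests a report,
# then "quit" tells the sensor to exit.
report=$(printf '\nquit\n' | sh "$sensor")
echo "$report"
rm -f "$sensor"
```

If a sensor driven this way prints nothing between "begin" and "end", or never prints at all, the daemon will not receive a usable report from it either.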
>
>
>
>
> On Thu, Apr 19, 2012 at 7:18 PM, Rayson Ho <[email protected]> wrote:
>
> >Can you post your load sensor, or at least the main structure of your
> >load sensor script??
> >
> >If you run the script interactively, what do you get??
> >
> >Rayson
> >
> >
> >
> >
> >On Thu, Apr 19, 2012 at 8:14 PM, Earl Lazarus <[email protected]> wrote:
> >> I followed all of those directions...it just doesn't run. Permissions are
> >> 777.
> >> I put an "echo START `date` >>/home/<myid>/LD"
> >>
> >> The file is always empty.
> >>
> >>
> >> On Thu, Apr 19, 2012 at 12:37 PM, Rayson Ho <[email protected]>
> >> wrote:
> >>>
> >>> There are not many actual "REQUIREMENTS" for a load sensor. As long
> >>> as it prints the proper values to standard output, it is good
> >>> enough in most cases.
> >>>
> >>> You can get more detail from Oracle's doc:
> >>>
> >>> http://docs.oracle.com/cd/E24901_01/doc.62/e21978/configuration.htm#sthref182
> >>>
> >>> Rayson
> >>>
> >>>
> >>>
> >>> On Thu, Apr 19, 2012 at 1:31 PM, Earl Lazarus <[email protected]>
> >>> wrote:
> >>> > Based upon earlier postings, it looks like a load sensor will solve my
> >>> > problem. Others have pointed to the following link (which contains an
> >>> > example of a load sensor script).
> >>> >
> >>> > http://gridscheduler.sourceforge.net/howto/loadsensor.html
> >>> >
> >>> > The example script at this site contains a "read" statement and seems to
> >>> > communicate with SGE via "echo". Is there someplace where I can
> >>> > find the actual REQUIREMENTS for a load sensor script instead of
> >>> > having to reverse engineer the requirements from an example?
> >>> >
> >>> > _______________________________________________
> >>> > users mailing list
> >>> > [email protected]
> >>> > https://gridengine.org/mailman/listinfo/users
> >>> >