Hi Earl,

Have you restarted the execution daemon on that host?
If jobs are running, you can "softstop" it (/etc/init.d/sgeexecd softstop) and then start it again.
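
As a rough sketch of that restart (not a tested recipe): restart_execd is a name made up here, and the init script path is simply the one mentioned above, so adjust it if your installation differs. It assumes the init script accepts "start" as well as "softstop", and that softstop stops only sge_execd and leaves running jobs alone.

```shell
#!/bin/sh
# restart_execd: hypothetical helper, not part of Grid Engine.
# Runs "softstop" and then "start" on the execd init script.
# The default path is the one quoted in this thread (assumption for your site).
restart_execd() {
    init_script="${1:-/etc/init.d/sgeexecd}"
    # softstop is meant to stop sge_execd without touching running jobs
    "$init_script" softstop || return 1
    "$init_script" start
}
# Usage (as root on the execution host): restart_execd
```

The restarted daemon should then start the load sensor named in its current configuration.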

Txema


On 20/04/12 17:24, Earl Lazarus wrote:
Indeed, the script would not have been able to write to my home directory, so I changed the destination of the echo to /tmp/LD. I then changed the name of the script slightly, went into qmon, and entered the new name in both the global configuration and the configuration of the one host I am looking at. After a few minutes there is no sign of my echoes in /tmp, nor of the script name in the output of ps -elf on that host.

On Fri, Apr 20, 2012 at 9:42 AM, Reuti <[email protected]> wrote:

    On 20.04.2012 at 16:28, Earl Lazarus wrote:

    > Yes the load sensor is under my home directory which is visible
    > on all machines.  Would it be a true statement that my load sensor
    > should be running as soon as I specify it in the host
    > configuration?  I need not submit jobs that reference the load
    > that it is measuring.

    If you change the definition in a local host configuration, you
    also have to change the global configuration so that the change is
    distributed to the node (a trivial edit such as removing a blank
    somewhere is enough). Then, after 2 cycles of the load_report_time,
    the process should be visible on the node:

    $ ps -e f
    ...
     5081 ?        Sl   125:47 /usr/sge/bin/lx24-amd64/sge_execd
     5147 ?        S      2:39  \_ /bin/sh /usr/sge/cluster/tmpspace.sh

    -- Reuti
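
To check this without reading the whole process tree by eye, something like the following might help. It is only a sketch: list_execd_children is a helper name made up here, and the usage line assumes a procps-style ps that accepts -eo pid=,ppid=,args=. It prints every process whose parent is sge_execd, which is where a running load sensor should show up.

```shell
#!/bin/sh
# list_execd_children: hypothetical helper, not part of Grid Engine.
# Reads "pid ppid args" lines on stdin and prints "pid args" for every
# process whose parent's executable is named sge_execd.
list_execd_children() {
    awk '
        {
            cmd = $3
            sub(/.*\//, "", cmd)              # basename of the executable
            line = $0
            sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", line)   # keep args only
            pid[NR] = $1; ppid[NR] = $2; args[NR] = line
            if (cmd == "sge_execd") execd[$1] = 1
        }
        END {
            for (i = 1; i <= NR; i++)
                if (ppid[i] in execd) print pid[i], args[i]
        }
    '
}
# Usage on the execution host:
#   ps -eo pid=,ppid=,args= | list_execd_children
```

If the sensor script is missing from the output after a couple of load_report_time cycles, the execution daemon has not started it.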


    > On Thu, Apr 19, 2012 at 9:30 PM, Ron Chen <[email protected]> wrote:
    > I don't have access to a Unix machine now, so I assume the
    > script works.
    >
    > However, it is always the execution daemons that run the load
    > sensors, so make sure the load sensor is available on all the
    > machines.
    >
    >  -Ron
    >
    >
    >
    > ________________________________
    > From: Earl Lazarus <[email protected]>
    > To: Rayson Ho <[email protected]>
    > Cc: [email protected]
    > Sent: Thursday, April 19, 2012 9:49 PM
    > Subject: Re: [gridengine users] Load sensors
    >
    >
    > Here is the load sensor. It basically checks whether a server
    > is running on the host, returning 1 if yes and 0 if no.  It
    > currently contains diagnostic prints to my home directory.  It
    > runs fine from the command prompt.
    >
    > When is a user-provided load monitor actually run?  Every time
    > the scheduler runs?
    >
    > #!/bin/bash
    > #PURPOSE  SGE load monitor
    > #
    > #
    > good(){
    >    echo "begin"
    >    echo "$hst:earl_ecs_jun:1"
    >    echo "end"
    > }
    > bad(){
    >    echo "begin"
    >    echo "$hst:earl_ecs_jun:0"
    >    echo "end"
    > }
    >    echo START `date` >>/home/elazarus/LD
    >    hst=$(uname -n)
    >    pf="PID_FILE"
    >    while [ 1 ] ; do
    >       read input
    >       result=$?
    >       echo READ `date` >>/home/elazarus/LD
    >       if [ $result != 0 ] ; then
    >          exit 1
    >       fi
    >       if [ "$input" = "quit" ] ; then
    >          echo END `date` >>/home/elazarus/LD
    >          exit 0
    >       fi
    > #     --ASSERT VALID QUERY
    >       tmpname=/tmp/jaeger/0p1/EDB/ECS_JUN_SS3_SL4h
    >       if [ -d $tmpname ] ; then
    >          cd $tmpname
    > #        --EXAMINE THE PID_FILE
    >          if [ -e $pf ] ; then
    > #           --FOUND PID_FILE
    >             pid=$(cat $pf)
    >             l=$(ps h -p $pid |wc -l)
    >             if [ $l -eq 0 ] ; then
    > #              --CANNOT FIND THE SPECIFIED PROCESS
    >                bad
    >             else
    > #              --IT'S RUNNING!!
    >                good
    >             fi
    >          else
    > #           --NO PID_FILE
    >             bad
    >          fi
    >       else
    > #        --NO SERVER DIRECTORY
    >          bad
    >       fi
    >    done
    >
    >
    >
    >
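
For comparison, here is a minimal sketch of the same sensor with a few defensive changes. It is an illustration, not a drop-in replacement. It keeps the begin / host:name:value / end reporting and the "quit" handling from the script above, logs to /tmp/LD instead of a home directory, and quotes its variables. The function names are made up here, and SERVER_DIR and LOG are overridable defaults taken from this thread. It also swaps the "ps h -p" test for kill -0, which only works if the sensor is allowed to signal the server process.

```shell
#!/bin/sh
# Sketch only: the names below are illustrative; defaults come from this thread.
SERVER_DIR="${SERVER_DIR:-/tmp/jaeger/0p1/EDB/ECS_JUN_SS3_SL4h}"
LOG="${LOG:-/tmp/LD}"

# server_running: succeed if PID_FILE exists and names a live process.
# kill -0 needs permission to signal that process; use "ps -p" instead if
# the sensor runs as a different, non-root user.
server_running() {
    [ -r "$SERVER_DIR/PID_FILE" ] || return 1
    pid=$(cat "$SERVER_DIR/PID_FILE")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# report: print one load report block for the complex earl_ecs_jun.
report() {
    echo "begin"
    echo "$(uname -n):earl_ecs_jun:$1"
    echo "end"
}

# sensor_loop: answer each line read on stdin until "quit" or end of input.
sensor_loop() {
    echo "START $(date)" >>"$LOG"
    while read -r input; do
        [ "$input" = "quit" ] && break
        if server_running; then report 1; else report 0; fi
    done
    echo "END $(date)" >>"$LOG"
}
# In the real script the last line would simply be: sensor_loop
```

Piping an empty line followed by "quit" into sensor_loop from a terminal should print one report block. That is a quick way to test the script outside of SGE.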
    > On Thu, Apr 19, 2012 at 7:18 PM, Rayson Ho <[email protected]> wrote:
    >
    > >Can you post your load sensor, or at least the main structure of your
    > >load sensor script??
    > >
    > >If you run the script interactively, what do you get??
    > >
    > >Rayson
    > >
    > >
    > >
    > >
    > >On Thu, Apr 19, 2012 at 8:14 PM, Earl Lazarus <[email protected]> wrote:
    > >> I followed all of those directions...it just doesn't run.
    > >> Permissions are 777.
    > >> I put an "echo START `date` >>/home/<myid>/LD" in the script.
    > >>
    > >> The file is always empty.
    > >>
    > >>
    > >> On Thu, Apr 19, 2012 at 12:37 PM, Rayson Ho <[email protected]>
    > >> wrote:
    > >>>
    > >>> There are not many actual "REQUIREMENTS" for a load sensor.
    > >>> As long as it prints the proper values to standard output, it
    > >>> is good enough in most cases.
    > >>>
    > >>> You can get more detail from Oracle's doc:
    > >>>
    > >>> http://docs.oracle.com/cd/E24901_01/doc.62/e21978/configuration.htm#sthref182
    > >>>
    > >>> Rayson
    > >>>
    > >>>
    > >>>
    > >>> On Thu, Apr 19, 2012 at 1:31 PM, Earl Lazarus <[email protected]>
    > >>> wrote:
    > >>> > Based upon earlier postings, it looks like a load sensor will
    > >>> > solve my problem.  Others have pointed to the following link
    > >>> > (which contains an example of a load sensor script).
    > >>> >
    > >>> > http://gridscheduler.sourceforge.net/howto/loadsensor.html
    > >>> >
    > >>> > The example script at this site contains a "read" statement
    > >>> > and seems to communicate with SGE via "echo".  Is there
    > >>> > someplace where I can find the actual REQUIREMENTS for a load
    > >>> > sensor script instead of having to reverse engineer the
    > >>> > requirements from an example?
    > >>> >
    > >>> > _______________________________________________
    > >>> > users mailing list
    > >>> > [email protected] <mailto:[email protected]>
    > >>> > https://gridengine.org/mailman/listinfo/users
    > >>> >
    > >>
    > >>
    > >>
    > >>
    > >
    >



