Can you just start with something simple and also try to use the most
basic load sensor?
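
By "most basic" I mean something along the lines of the sketch below. It only shows the protocol from the howto page (read a line, answer with a begin/end block, stop on "quit"). The complex name demo_value is made up for illustration, and the last line just simulates execd by piping requests in. A real sensor would instead end by calling sensor_loop on its own standard input.

```shell
#!/bin/sh
# Sketch of the load sensor protocol. "demo_value" is a hypothetical
# complex name, not one taken from this thread.

host=$(uname -n)

# Print one report: "begin", one host:name:value line, "end".
report() {
    echo "begin"
    echo "$host:demo_value:$1"
    echo "end"
}

# Answer every line read from stdin with one report.
# Return 0 on "quit" and 1 if stdin is closed unexpectedly.
sensor_loop() {
    while read -r input; do
        if [ "$input" = "quit" ]; then
            return 0
        fi
        report 1
    done
    return 1
}

# Simulate execd asking for two reports and then shutting the sensor down.
printf '\n\nquit\n' | sensor_loop
```

It writes nothing to disk, so file permissions on a log file cannot get in the way while you test.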

This is how I configured mine:

1) Put the load sensor in the global config by running "qconf -mconf":
  load_sensor                  /tmp/loadsensor
  loglevel                     log_info

Setting the log level to log_info lets us see what's going on.


2) Make sure that root can run the load sensor. I think root squash
also bans root from reading (not only writing) files on NFS. Hint: log on
as root and run the script.
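
A quick way to check this before running the script by hand is sketched below. The function name check_sensor is my own, and the path is the one from step 1. Run it as root on the execution host.

```shell
#!/bin/sh
# Sketch: report whether the current user can read and execute a script.
# check_sensor is a hypothetical helper name, not part of Grid Engine.
check_sensor() {
    if [ -r "$1" ] && [ -x "$1" ]; then
        echo "ok: $(id -un) can read and execute $1"
    else
        echo "problem: $(id -un) cannot read or execute $1"
        return 1
    fi
}

# Check the path configured in step 1; "|| true" keeps the script going
# so that the message is the only effect of a failed check.
check_sensor /tmp/loadsensor || true
```

If this reports a problem for a script on NFS but the same check passes as a normal user, root squash is the likely cause.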


3) Just to be sure, restart execd. You should then see something like
the following in the log file
($SGE_ROOT/default/spool/<node name>/messages):

04/20/2012 11:28:29|  main|computer|I|using "/tmp/loadsensor" for load_sensor
...
04/20/2012 11:28:29|  main|computer|I|starting load sensor /tmp/loadsensor
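
To pull just those lines out of a busy messages file, something like this sketch should do. It assumes the cell is called "default" and that the spool directory is named exactly as uname -n prints the host name, which may not hold on your installation. show_sensor_log is a name I made up.

```shell
#!/bin/sh
# Sketch: print only the execd log lines that mention the load sensor.
# show_sensor_log is a hypothetical helper; pass it the messages file.
show_sensor_log() {
    grep -E 'load[_ ]sensor' "$1"
}

# Assumed location: cell "default", spool directory named after uname -n.
msgfile="${SGE_ROOT:-}/default/spool/$(uname -n)/messages"

# Only touch the real log when SGE_ROOT is set and the file is readable.
if [ -n "${SGE_ROOT:-}" ] && [ -r "$msgfile" ]; then
    show_sensor_log "$msgfile" || echo "no load sensor lines in $msgfile"
fi
```

If nothing matches after the restart, execd most likely never picked up the load_sensor setting.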

Rayson



On Fri, Apr 20, 2012 at 11:24 AM, Earl Lazarus <[email protected]> wrote:
> Indeed the script would not have been able to echo to my home directory, so
> I changed the destination of the echo to /tmp/LD.  I then changed the name
> of script slightly and went into qmon and fixed the new spelling for both
> global and the one host where I am looking.  After a few minutes there is no
> sign of my echoes in /tmp or the script name in a ps -elf on the one host.
>
>
> On Fri, Apr 20, 2012 at 9:42 AM, Reuti <[email protected]> wrote:
>>
>> On 20.04.2012 at 16:28, Earl Lazarus wrote:
>>
>> > Yes the load sensor is under my home directory which is visible on all
>> > machines.  Would it be a true statement that my load sensor should be
>> > running as soon as I specify it in the host configuration?  I need not
>> > submit jobs that reference the load that it is measuring.
>>
>> If you change the definition in a local host configuration it's necessary
>> to change the global configuration to distribute it to the node (just remove
>> a blank somewhere). Then after 2 cycles of the load_report_time the process
>> should be visible on a node:
>>
>> $ ps -e f
>> ...
>>  5081 ?        Sl   125:47 /usr/sge/bin/lx24-amd64/sge_execd
>>  5147 ?        S      2:39  \_ /bin/sh /usr/sge/cluster/tmpspace.sh
>>
>> -- Reuti
>>
>>
>> > On Thu, Apr 19, 2012 at 9:30 PM, Ron Chen <[email protected]> wrote:
>> > I don't have access to a Unix machine now, so I assume the script works.
>> >
>> > However, it is always the execution daemons that run the load sensors, so
>> > make sure the load sensor is available on all the machines.
>> >
>> >  -Ron
>> >
>> >
>> >
>> > ________________________________
>> > From: Earl Lazarus <[email protected]>
>> > To: Rayson Ho <[email protected]>
>> > Cc: [email protected]
>> > Sent: Thursday, April 19, 2012 9:49 PM
>> > Subject: Re: [gridengine users] Load sensors
>> >
>> >
>> > Here is  the load sensor...it basically checks to see if a server is
>> > running on the host, returning 1 if yes
>> > and 0 if no.  It currently contains diagnostic prints to my home
>> > directory.   It runs fine from the command prompt.
>> >
>> > When is a user provided load monitor actually run?  Every time the
>> > scheduler runs?
>> >
>> > #!/bin/bash
>> > #PURPOSE  SGE load monitor
>> > #
>> > #
>> > good(){
>> >    echo "begin"
>> >    echo "$hst:earl_ecs_jun:1"
>> >    echo "end"
>> > }
>> > bad(){
>> >    echo "begin"
>> >    echo "$hst:earl_ecs_jun:0"
>> >    echo "end"
>> > }
>> >    echo START `date`  >>/home/elazarus/LD
>> >    hst=$(uname -n)
>> >    pf="PID_FILE"
>> >    while [ 1 ] ; do
>> >       read input
>> >       result=$?
>> >       echo READ `date`  >>/home/elazarus/LD
>> >       if [ $result != 0 ] ; then
>> >          exit 1
>> >       fi
>> >       if [ "$input" = "quit" ] ; then
>> >          echo END `date`  >>/home/elazarus/LD
>> >          exit 0
>> >       fi
>> > #     --ASSERT VALID QUERY
>> >       tmpname=/tmp/jaeger/0p1/EDB/ECS_JUN_SS3_SL4h
>> >       if [ -d $tmpname ] ; then
>> >          cd $tmpname
>> > #        --EXAMINE THE PID_FILE
>> >          if [ -e $pf ] ; then
>> > #           --FOUND PID_FILE
>> >             pid=$(cat $pf)
>> >             l=$(ps h -p $pid |wc -l)
>> >             if [ $l -eq 0 ] ; then
>> > #              --CANNOT FIND THE SPECIFIED PROCESS
>> >                bad
>> >             else
>> > #              --IT'S RUNNING!!
>> >                good
>> >             fi
>> >          else
>> > #           --NO PID_FILE
>> >             bad
>> >          fi
>> >       else
>> > #        --NO SERVER DIRECTORY
>> >          bad
>> >       fi
>> >    done
>> >
>> >
>> >
>> >
>> > On Thu, Apr 19, 2012 at 7:18 PM, Rayson Ho <[email protected]> wrote:
>> >
>> > >Can you post your load sensor, or at least the main structure of your
>> > >load sensor script?
>> > >
>> > >If you run the script interactively, what do you get?
>> > >
>> > >Rayson
>> > >
>> > >
>> > >
>> > >
>> > >On Thu, Apr 19, 2012 at 8:14 PM, Earl Lazarus <[email protected]> wrote:
>> > >> I followed all of those directions...it just doesn't run.
>> > >> Permissions are 777.
>> > >> I put an "echo START `date` >>/home/<myid>/LD"
>> > >>
>> > >> The file is always empty.
>> > >>
>> > >>
>> > >> On Thu, Apr 19, 2012 at 12:37 PM, Rayson Ho <[email protected]> wrote:
>> > >>>
>> > >>> There are not a lot of actual "REQUIREMENTS" for a load sensor. As long
>> > >>> as it prints the proper values to standard output, it is good
>> > >>> enough in most cases.
>> > >>>
>> > >>> You can get more detail from Oracle's doc:
>> > >>>
>> > >>>
>> > >>>
>> > >>> http://docs.oracle.com/cd/E24901_01/doc.62/e21978/configuration.htm#sthref182
>> > >>>
>> > >>> Rayson
>> > >>>
>> > >>>
>> > >>>
>> > >>> On Thu, Apr 19, 2012 at 1:31 PM, Earl Lazarus <[email protected]> wrote:
>> > >>> > Based upon earlier postings, it looks like a load sensor will solve my
>> > >>> > problem.  Others have pointed to the following link (which contains an
>> > >>> > example of a load sensor script).
>> > >>> >
>> > >>> > http://gridscheduler.sourceforge.net/howto/loadsensor.html
>> > >>> >
>> > >>> > The example script at this site contains a "read" statement and seems to
>> > >>> > communicate with SGE via "echo".  Is there someplace where I can
>> > >>> > find the actual REQUIREMENTS for a load sensor script instead of
>> > >>> > having to reverse engineer the requirements from an example?
>> > >>> >
>> > >>> > _______________________________________________
>> > >>> > users mailing list
>> > >>> > [email protected]
>> > >>> > https://gridengine.org/mailman/listinfo/users
>> > >>> >
>> > >>
>> > >>
>> > >>
>> > >>
>> > >
>> >
>>
>
>
>
