New package created -- test-drive it by downloading the tarball and
untarring it in your $SGE_ROOT.
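
For example (a minimal sketch -- "<downloaded-tarball>" is a
placeholder for the actual file name from the page below):

# cd $SGE_ROOT
# tar xzf <downloaded-tarball>.tar.gz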

Tested on our machines (mostly Intel, with a few older AMD boxes).
The topology discovery code (which is used by loadcheck) was also
tested on a few Magny-Cours boxes, but the binding code was not -- it
would be great if someone with multi-socket Magny-Cours machines were
willing to do some testing for us!

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

During the pre-alpha release period, I mentioned that optimizing for
NUMA machine characteristics was on the ToDo list -- this feature is
not included in the beta release yet. Unlike core binding, which needs
SGE to keep track of core usage, NUMA policy is local to the job, so a
simple numactl call can be added to the job script to handle most (if
not all) of the use cases.

Thanks!
Rayson



On Mon, Apr 18, 2011 at 2:17 PM, Rayson Ho <[email protected]> wrote:
> For those who had issues with earlier versions, please try the latest
> loadcheck v4:
>
> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>
> I compiled the binary on Oracle Linux, which is compatible with RHEL
> 5.x, Scientific Linux 5.x, and CentOS 5.x. I tested the binary on the
> standard Red Hat kernel, on Oracle's enhanced "Unbreakable Enterprise
> Kernel", and on Fedora 13 and Ubuntu 10.04 LTS.
>
> Rayson
>
>
>
> On Thu, Apr 14, 2011 at 8:28 AM, Rayson Ho <[email protected]> wrote:
>> Hi Chansup,
>>
>> I think I fixed it last night, and I uploaded the loadcheck binary and
>> updated the page:
>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>
>> Or you can download it directly from:
>> http://gridscheduler.sourceforge.net/projects/hwloc/loadcheckv2.tar.gz
>>
>> Again, thanks for the help, guys!!
>>
>> Rayson
>>
>>
>> On Wed, Apr 13, 2011 at 11:38 AM, Rayson Ho <[email protected]> wrote:
>>> On Wed, Apr 13, 2011 at 9:14 AM, CB <[email protected]> wrote:
>>>> The number of sockets (two in total) and cores (24 in total) on a node
>>>> with two 12-core Magny-Cours processors is correct
>>>
>>> First of all, thanks Chansup, Ansgar, and Alex (who contacted me
>>> offline) for testing the code!
>>>
>>> This is good news: it means the get_topology() code is correct, and
>>> hwloc is able to handle the Magny-Cours topology.
>>>
>>>
>>>> but the description of the internal processor ids is redundant and misleading.
>>>
>>> This is in fact my bad, but I think I know how to fix it :-D
>>>
>>> I will let you guys know when I have the fix, and I will post the new
>>> version on the Open Grid Scheduler project page.
>>>
>>> Again, many thanks!!
>>>
>>> Rayson
>>>
>>>
>>>
>>>>
>>>> # ./loadcheck
>>>> arch            lx26-amd64
>>>> num_proc        24
>>>> m_socket        2
>>>> m_core          24
>>>> m_topology      SCCCCCCCCCCCCSCCCCCCCCCCCC
>>>> load_short      24.14
>>>> load_medium     24.00
>>>> load_long       22.36
>>>> mem_free        31241.601562M
>>>> swap_free       2047.992188M
>>>> virtual_free    33289.593750M
>>>> mem_total       64562.503906M
>>>> swap_total      2047.992188M
>>>> virtual_total   66610.496094M
>>>> mem_used        33320.902344M
>>>> swap_used       0.000000M
>>>> virtual_used    33320.902344M
>>>> cpu             100.0%
>>>>
>>>> # ./loadcheck -cb
>>>> Your SGE Linux version has built-in core binding functionality!
>>>> Your Linux kernel version is: 2.6.27.10-grsec
>>>> Amount of sockets:              2
>>>> Amount of cores:                24
>>>> Topology:                       SCCCCCCCCCCCCSCCCCCCCCCCCC
>>>> Mapping of logical socket and core numbers to internal
>>>> Internal processor ids for socket     0 core     0:      0
>>>> Internal processor ids for socket     0 core     1:      1
>>>> Internal processor ids for socket     0 core     2:      2
>>>> Internal processor ids for socket     0 core     3:      3
>>>> Internal processor ids for socket     0 core     4:      4
>>>> Internal processor ids for socket     0 core     5:      5
>>>> Internal processor ids for socket     0 core     6:      6
>>>> Internal processor ids for socket     0 core     7:      7
>>>> Internal processor ids for socket     0 core     8:      8
>>>> Internal processor ids for socket     0 core     9:      9
>>>> Internal processor ids for socket     0 core    10:     10
>>>> Internal processor ids for socket     0 core    11:     11
>>>> Internal processor ids for socket     0 core    12:     12
>>>> Internal processor ids for socket     0 core    13:     13
>>>> Internal processor ids for socket     0 core    14:     14
>>>> Internal processor ids for socket     0 core    15:     15
>>>> Internal processor ids for socket     0 core    16:     16
>>>> Internal processor ids for socket     0 core    17:     17
>>>> Internal processor ids for socket     0 core    18:     18
>>>> Internal processor ids for socket     0 core    19:     19
>>>> Internal processor ids for socket     0 core    20:     20
>>>> Internal processor ids for socket     0 core    21:     21
>>>> Internal processor ids for socket     0 core    22:     22
>>>> Internal processor ids for socket     0 core    23:     23
>>>> Internal processor ids for socket     1 core     0:      0
>>>> Internal processor ids for socket     1 core     1:      1
>>>> Internal processor ids for socket     1 core     2:      2
>>>> Internal processor ids for socket     1 core     3:      3
>>>> Internal processor ids for socket     1 core     4:      4
>>>> Internal processor ids for socket     1 core     5:      5
>>>> Internal processor ids for socket     1 core     6:      6
>>>> Internal processor ids for socket     1 core     7:      7
>>>> Internal processor ids for socket     1 core     8:      8
>>>> Internal processor ids for socket     1 core     9:      9
>>>> Internal processor ids for socket     1 core    10:     10
>>>> Internal processor ids for socket     1 core    11:     11
>>>> Internal processor ids for socket     1 core    12:     12
>>>> Internal processor ids for socket     1 core    13:     13
>>>> Internal processor ids for socket     1 core    14:     14
>>>> Internal processor ids for socket     1 core    15:     15
>>>> Internal processor ids for socket     1 core    16:     16
>>>> Internal processor ids for socket     1 core    17:     17
>>>> Internal processor ids for socket     1 core    18:     18
>>>> Internal processor ids for socket     1 core    19:     19
>>>> Internal processor ids for socket     1 core    20:     20
>>>> Internal processor ids for socket     1 core    21:     21
>>>> Internal processor ids for socket     1 core    22:     22
>>>> Internal processor ids for socket     1 core    23:     23
>>>>
>>>> I would expect the following:
>>>> Mapping of logical socket and core numbers to internal
>>>> Internal processor ids for socket     0 core     0:      0
>>>> Internal processor ids for socket     0 core     1:      1
>>>> Internal processor ids for socket     0 core     2:      2
>>>> Internal processor ids for socket     0 core     3:      3
>>>> Internal processor ids for socket     0 core     4:      4
>>>> Internal processor ids for socket     0 core     5:      5
>>>> Internal processor ids for socket     0 core     6:      6
>>>> Internal processor ids for socket     0 core     7:      7
>>>> Internal processor ids for socket     0 core     8:      8
>>>> Internal processor ids for socket     0 core     9:      9
>>>> Internal processor ids for socket     0 core    10:     10
>>>> Internal processor ids for socket     0 core    11:     11
>>>> Internal processor ids for socket     1 core     0:     12
>>>> Internal processor ids for socket     1 core     1:     13
>>>> Internal processor ids for socket     1 core     2:     14
>>>> Internal processor ids for socket     1 core     3:     15
>>>> Internal processor ids for socket     1 core     4:     16
>>>> Internal processor ids for socket     1 core     5:     17
>>>> Internal processor ids for socket     1 core     6:     18
>>>> Internal processor ids for socket     1 core     7:     19
>>>> Internal processor ids for socket     1 core     8:     20
>>>> Internal processor ids for socket     1 core     9:     21
>>>> Internal processor ids for socket     1 core    10:     22
>>>> Internal processor ids for socket     1 core    11:     23
>>>>
>>>> Any comments?
>>>>
>>>> thanks,
>>>> - Chansup
>>>>
>>>> On Tue, Apr 12, 2011 at 4:13 PM, Rayson Ho <[email protected]> wrote:
>>>>> Ansgar,
>>>>>
>>>>> We are in the final stages of the hwloc migration -- please give
>>>>> our new hwloc-enabled loadcheck a try:
>>>>>
>>>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>>>>
>>>>> Rayson
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 14, 2011 at 11:11 AM, Esztermann, Ansgar
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> On Mar 12, 2011, at 1:04, Dave Love wrote:
>>>>>>
>>>>>>> "Esztermann, Ansgar" <[email protected]> writes:
>>>>>>>
>>>>>>>> Well, core IDs are unique only within the same socket ID (for older 
>>>>>>>> CPUs, say Harpertown), so I would assume the same holds for node IDs 
>>>>>>>> -- it's just that node IDs aren't displayed for Magny-Cours.
>>>>>>>
>>>>>>> What exactly would you expect?  hwloc's lstopo(1) gives the following
>>>>>>> under current RedHat 5 (Linux 2.6.18-238.5.1.el5) on a Supermicro H8DGT
>>>>>>> (Opteron 6134).  It seems to have the information exposed, but I'm not
>>>>>>> sure how it should be.  (I guess GE should move to hwloc rather than
>>>>>>> PLPA, which is now deprecated and not maintained.)
>>>>>>>
>>>>>>> Machine (63GB)
>>>>>>>  Socket #0 (32GB)
>>>>>>>    NUMANode #0 (phys=0 16GB) + L3 #0 (5118KB)
>>>>>>>      L2 #0 (512KB) + L1 #0 (64KB) + Core #0 + PU #0 (phys=0)
>>>>>>>      L2 #1 (512KB) + L1 #1 (64KB) + Core #1 + PU #1 (phys=1)
>>>>>>>      L2 #2 (512KB) + L1 #2 (64KB) + Core #2 + PU #2 (phys=2)
>>>>>>>      L2 #3 (512KB) + L1 #3 (64KB) + Core #3 + PU #3 (phys=3)
>>>>>>>    NUMANode #1 (phys=1 16GB) + L3 #1 (5118KB)
>>>>>>>      L2 #4 (512KB) + L1 #4 (64KB) + Core #4 + PU #4 (phys=4)
>>>>>>>      L2 #5 (512KB) + L1 #5 (64KB) + Core #5 + PU #5 (phys=5)
>>>>>>>      L2 #6 (512KB) + L1 #6 (64KB) + Core #6 + PU #6 (phys=6)
>>>>>>>      L2 #7 (512KB) + L1 #7 (64KB) + Core #7 + PU #7 (phys=7)
>>>>>> ...
>>>>>>
>>>>>> That's exactly what I'd expect...
>>>>>> The interface at /sys/devices/system/cpu/cpuN/topology/ doesn't know 
>>>>>> about NUMANodes, only about Sockets and cores. Thus, cores #0 and #4 in 
>>>>>> the output above have the same core ID, and SGE interprets that as being 
>>>>>> one core with two threads.
>>>>>>
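>>>>>> To illustrate (a hypothetical session -- the paths are the standard
>>>>>> sysfs topology files, and the values are what the lstopo output
>>>>>> above implies):
>>>>>>
>>>>>> # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
>>>>>> 0
>>>>>> # cat /sys/devices/system/cpu/cpu4/topology/physical_package_id
>>>>>> 0
>>>>>> # cat /sys/devices/system/cpu/cpu0/topology/core_id
>>>>>> 0
>>>>>> # cat /sys/devices/system/cpu/cpu4/topology/core_id
>>>>>> 0
>>>>>>
>>>>>> Two distinct cores report the same physical_package_id and core_id,
>>>>>> with nothing to show that they sit on different NUMA nodes.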
>>>>>>
>>>>>> A.
>>>>>> --
>>>>>> Ansgar Esztermann
>>>>>> DV-Systemadministration
>>>>>> Max-Planck-Institut für biophysikalische Chemie, Abteilung 105
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
