New package created -- test-drive it by downloading the tarball and
untarring it in your $SGE_ROOT.

Tested on our machines (mostly Intel, with a few older AMD boxes). The
topology discovery code (which is used by loadcheck) was also tested on
a few Magny-Cours boxes, but the binding code was not -- it would be
great if someone with multi-socket Magny-Cours machines were willing to
do some testing for us!

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

During the pre-alpha release period, I mentioned that optimizing for
NUMA machine characteristics was on the ToDo list; that feature is not
included in the beta release yet. However, unlike core binding, which
needs SGE to keep track of core usage, NUMA policy is local to the job,
so a simple numactl call added to the job script handles most (if not
all) of the use cases.
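For example, here is a minimal sketch of such a job script. The PE
name, slot count, binding request, and application name are
placeholders for illustration only -- adjust them for your site:

  #!/bin/sh
  #$ -N numa_example
  #$ -pe smp 4              # hypothetical PE name; use your site's PE
  #$ -binding linear:4      # optional SGE core-binding request

  # Keep the job's memory allocations on the NUMA node(s) of the CPUs
  # it runs on; swap in --membind=<nodes> or --cpunodebind=<nodes>
  # if you want stricter placement.
  numactl --localalloc ./my_numa_aware_app

Because the policy only affects the job's own allocations, nothing has
to be tracked on the scheduler side -- which is exactly why it can live
in the job script.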
Thanks!

Rayson


On Mon, Apr 18, 2011 at 2:17 PM, Rayson Ho <[email protected]> wrote:
> For those who had issues with earlier version, please try the latest
> loadcheck v4:
>
> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>
> I compiled the binary on Oracle Linux, which is compatible with RHEL
> 5.x, Scientific Linux or Centos 5.x. I tested the binary on the
> standard Red Hat kernel, and Oracle enhanced "Unbreakable Enterprise
> Kernel", Fedora 13, Ubuntu 10.04 LTS.
>
> Rayson
>
>
>
> On Thu, Apr 14, 2011 at 8:28 AM, Rayson Ho <[email protected]> wrote:
>> Hi Chansup,
>>
>> I think I fixed it last night, and I uploaded the loadcheck binary and
>> updated the page:
>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>
>> Or you can download it directly from:
>> http://gridscheduler.sourceforge.net/projects/hwloc/loadcheckv2.tar.gz
>>
>> Again, thanks for the help guys!!
>>
>> Rayson
>>
>>
>> On Wed, Apr 13, 2011 at 11:38 AM, Rayson Ho <[email protected]> wrote:
>>> On Wed, Apr 13, 2011 at 9:14 AM, CB <[email protected]> wrote:
>>>> The amount of sockets (total two) and cores (total 24) of two 12-core
>>>> magny-cour processor node is correct
>>>
>>> First of all, thanks Chansup, Ansgar, and Alex (who contacted me
>>> offline) for testing the code!
>>>
>>> This is good, as the get_topology() code is correct, and hwloc is able
>>> to handle the Magny-Cours topology.
>>>
>>>
>>>> but there is redundant and misleading description for interprocessor ids.
>>>
>>> This is in fact my bad, but I think I know how to fix it :-D
>>>
>>> I will let you guys know when I have the fix, and I will post the new
>>> version on the Open Grid Scheduler project page.
>>>
>>> Again, many thanks!!
>>>
>>> Rayson
>>>
>>>
>>>
>>>>
>>>> # ./loadcheck
>>>> arch            lx26-amd64
>>>> num_proc        24
>>>> m_socket        2
>>>> m_core          24
>>>> m_topology      SCCCCCCCCCCCCSCCCCCCCCCCCC
>>>> load_short      24.14
>>>> load_medium     24.00
>>>> load_long       22.36
>>>> mem_free        31241.601562M
>>>> swap_free       2047.992188M
>>>> virtual_free    33289.593750M
>>>> mem_total       64562.503906M
>>>> swap_total      2047.992188M
>>>> virtual_total   66610.496094M
>>>> mem_used        33320.902344M
>>>> swap_used       0.000000M
>>>> virtual_used    33320.902344M
>>>> cpu             100.0%
>>>>
>>>> # ./loadcheck -cb
>>>> Your SGE Linux version has built-in core binding functionality!
>>>> Your Linux kernel version is: 2.6.27.10-grsec
>>>> Amount of sockets: 2
>>>> Amount of cores: 24
>>>> Topology: SCCCCCCCCCCCCSCCCCCCCCCCCC
>>>> Mapping of logical socket and core numbers to internal
>>>> Internal processor ids for socket 0 core 0: 0
>>>> Internal processor ids for socket 0 core 1: 1
>>>> Internal processor ids for socket 0 core 2: 2
>>>> Internal processor ids for socket 0 core 3: 3
>>>> Internal processor ids for socket 0 core 4: 4
>>>> Internal processor ids for socket 0 core 5: 5
>>>> Internal processor ids for socket 0 core 6: 6
>>>> Internal processor ids for socket 0 core 7: 7
>>>> Internal processor ids for socket 0 core 8: 8
>>>> Internal processor ids for socket 0 core 9: 9
>>>> Internal processor ids for socket 0 core 10: 10
>>>> Internal processor ids for socket 0 core 11: 11
>>>> Internal processor ids for socket 0 core 12: 12
>>>> Internal processor ids for socket 0 core 13: 13
>>>> Internal processor ids for socket 0 core 14: 14
>>>> Internal processor ids for socket 0 core 15: 15
>>>> Internal processor ids for socket 0 core 16: 16
>>>> Internal processor ids for socket 0 core 17: 17
>>>> Internal processor ids for socket 0 core 18: 18
>>>> Internal processor ids for socket 0 core 19: 19
>>>> Internal processor ids for socket 0 core 20: 20
>>>> Internal processor ids for socket 0 core 21: 21
>>>> Internal processor ids for socket 0 core 22: 22
>>>> Internal processor ids for socket 0 core 23: 23
>>>> Internal processor ids for socket 1 core 0: 0
>>>> Internal processor ids for socket 1 core 1: 1
>>>> Internal processor ids for socket 1 core 2: 2
>>>> Internal processor ids for socket 1 core 3: 3
>>>> Internal processor ids for socket 1 core 4: 4
>>>> Internal processor ids for socket 1 core 5: 5
>>>> Internal processor ids for socket 1 core 6: 6
>>>> Internal processor ids for socket 1 core 7: 7
>>>> Internal processor ids for socket 1 core 8: 8
>>>> Internal processor ids for socket 1 core 9: 9
>>>> Internal processor ids for socket 1 core 10: 10
>>>> Internal processor ids for socket 1 core 11: 11
>>>> Internal processor ids for socket 1 core 12: 12
>>>> Internal processor ids for socket 1 core 13: 13
>>>> Internal processor ids for socket 1 core 14: 14
>>>> Internal processor ids for socket 1 core 15: 15
>>>> Internal processor ids for socket 1 core 16: 16
>>>> Internal processor ids for socket 1 core 17: 17
>>>> Internal processor ids for socket 1 core 18: 18
>>>> Internal processor ids for socket 1 core 19: 19
>>>> Internal processor ids for socket 1 core 20: 20
>>>> Internal processor ids for socket 1 core 21: 21
>>>> Internal processor ids for socket 1 core 22: 22
>>>> Internal processor ids for socket 1 core 23: 23
>>>>
>>>> I would expect the following:
>>>> Mapping of logical socket and core numbers to internal
>>>> Internal processor ids for socket 0 core 0: 0
>>>> Internal processor ids for socket 0 core 1: 1
>>>> Internal processor ids for socket 0 core 2: 2
>>>> Internal processor ids for socket 0 core 3: 3
>>>> Internal processor ids for socket 0 core 4: 4
>>>> Internal processor ids for socket 0 core 5: 5
>>>> Internal processor ids for socket 0 core 6: 6
>>>> Internal processor ids for socket 0 core 7: 7
>>>> Internal processor ids for socket 0 core 8: 8
>>>> Internal processor ids for socket 0 core 9: 9
>>>> Internal processor ids for socket 0 core 10: 10
>>>> Internal processor ids for socket 0 core 11: 11
>>>> Internal processor ids for socket 1 core 0: 12
>>>> Internal processor ids for socket 1 core 1: 13
>>>> Internal processor ids for socket 1 core 2: 14
>>>> Internal processor ids for socket 1 core 3: 15
>>>> Internal processor ids for socket 1 core 4: 16
>>>> Internal processor ids for socket 1 core 5: 17
>>>> Internal processor ids for socket 1 core 6: 18
>>>> Internal processor ids for socket 1 core 7: 19
>>>> Internal processor ids for socket 1 core 8: 20
>>>> Internal processor ids for socket 1 core 9: 21
>>>> Internal processor ids for socket 1 core 10: 22
>>>> Internal processor ids for socket 1 core 11: 23
>>>>
>>>> Any comments?
>>>>
>>>> thanks,
>>>> - Chansup
>>>>
>>>> On Tue, Apr 12, 2011 at 4:13 PM, Rayson Ho <[email protected]> wrote:
>>>>> Ansgar,
>>>>>
>>>>> We are in the final stages of hwloc migration, please give our new
>>>>> hwloc enabled loadcheck a try:
>>>>>
>>>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>>>>
>>>>> Rayson
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 14, 2011 at 11:11 AM, Esztermann, Ansgar
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> On Mar 12, 2011, at 1:04 , Dave Love wrote:
>>>>>>
>>>>>>> "Esztermann, Ansgar" <[email protected]> writes:
>>>>>>>
>>>>>>>> Well, core IDs are unique only within the same socket ID (for older
>>>>>>>> CPUs, say Harpertown), so I would assume the same holds for node IDs
>>>>>>>> -- it's just that node IDs aren't displayed for Magny-Cours.
>>>>>>>
>>>>>>> What exactly would you expect? hwloc's lstopo(1) gives the following
>>>>>>> under current RedHat 5 (Linux 2.6.18-238.5.1.el5) on a Supermicro H8DGT
>>>>>>> (Opteron 6134). It seems to have the information exposed, but I'm not
>>>>>>> sure how it should be. (I guess GE should move to hwloc rather than
>>>>>>> PLPA, which is now deprecated and not maintained.)
>>>>>>>
>>>>>>> Machine (63GB)
>>>>>>>   Socket #0 (32GB)
>>>>>>>     NUMANode #0 (phys=0 16GB) + L3 #0 (5118KB)
>>>>>>>       L2 #0 (512KB) + L1 #0 (64KB) + Core #0 + PU #0 (phys=0)
>>>>>>>       L2 #1 (512KB) + L1 #1 (64KB) + Core #1 + PU #1 (phys=1)
>>>>>>>       L2 #2 (512KB) + L1 #2 (64KB) + Core #2 + PU #2 (phys=2)
>>>>>>>       L2 #3 (512KB) + L1 #3 (64KB) + Core #3 + PU #3 (phys=3)
>>>>>>>     NUMANode #1 (phys=1 16GB) + L3 #1 (5118KB)
>>>>>>>       L2 #4 (512KB) + L1 #4 (64KB) + Core #4 + PU #4 (phys=4)
>>>>>>>       L2 #5 (512KB) + L1 #5 (64KB) + Core #5 + PU #5 (phys=5)
>>>>>>>       L2 #6 (512KB) + L1 #6 (64KB) + Core #6 + PU #6 (phys=6)
>>>>>>>       L2 #7 (512KB) + L1 #7 (64KB) + Core #7 + PU #7 (phys=7)
>>>>>> ...
>>>>>>
>>>>>> That's exactly what I'd expect...
>>>>>> The interface at /sys/devices/system/cpu/cpuN/topology/ doesn't know
>>>>>> about NUMANodes, only about Sockets and cores. Thus, cores #0 and #4 in
>>>>>> the output above have the same core ID, and SGE interprets that as being
>>>>>> one core with two threads.
>>>>>>
>>>>>>
>>>>>> A.
>>>>>> --
>>>>>> Ansgar Esztermann
>>>>>> DV-Systemadministration
>>>>>> Max-Planck-Institut für biophysikalische Chemie, Abteilung 105
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> [email protected]
>>>>>> https://gridengine.org/mailman/listinfo/users
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
>>>>>
>>>>
>>>
>>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
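P.S. For anyone who wants to see what PLPA/SGE had to work with on such
a box, the following is just a sketch that dumps the kernel's
socket/core view -- the same /sys interface Ansgar mentions above,
which has no notion of NUMA nodes. The exact physical_package_id and
core_id values you get are hardware- and kernel-dependent:

  # Print each logical CPU's socket (physical_package_id) and core_id
  # as reported by the kernel topology interface.
  for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    printf '%s: physical_package_id=%s core_id=%s\n' \
      "$(basename "$cpu")" \
      "$(cat "$cpu"/topology/physical_package_id)" \
      "$(cat "$cpu"/topology/core_id)" ;
  done

Compare that with hwloc's lstopo output, which shows the same machine
with the NUMANode level included.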
