Re: [hwloc-users] unusual memory binding results
I simplified things: instead of writing to a 2D array, I allocate a 1D array of bytes and touch the pages in a linear fashion.
Then I call syscall(__NR_move_pages, ...) and retrieve a status array with one entry for each page in the data.

When I allocate 511 pages and touch alternate pages on alternate numa nodes, I get

Numa page binding 511
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
1 0 1 0

but as soon as I increase to 512 pages, it breaks.

Numa page binding 512
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0

On the 8 numa node machine it sometimes gives the right answer even with 512 pages.

Still baffled

JB

-----Original Message-----
From: hwloc-users On Behalf Of Biddiscombe, John A.
Sent: 28 January 2019 16:14
To: Brice Goglin
Cc: Hardware locality user list
Subject: Re: [hwloc-users] unusual memory binding results

Brice

>Can you print the pattern before and after thread 1 touched its pages, or even in the middle ?
>It looks like somebody is touching too many pages here.

Experimenting with different threads touching one or more pages, I get unpredictable results.

Here on the 8 numa node device, the result is perfect.
I am only allowing thread 3 and 7 to write a single memory location

get_numa_domain() 8 Domain Numa pattern



3---



7---

Contents of memory locations
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
63 0 0 0 0 0 0 0

You can see that core 26 (numa domain 3) wrote to memory, and so did core 63 (domain 7).

Now I run it a second time and look, it's rubbish

get_numa_domain() 8 Domain Numa pattern
3---
3---
3---
3---
3---
3---
3---
3---

Contents of memory locations
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
63 0 0 0 0 0 0 0

after allowing the data to be read by a random thread

3777
3777
3777
3777
3777
3777
3777
3777

I'm baffled.

JB
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
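
For readers who want to reproduce the measurement described above, here is a minimal sketch of querying page placement with move_pages. It is not JB's code, which is not shown in the thread. Passing a NULL node array asks the kernel only to report, in the status array, the node each page currently resides on (or a negative errno value). The helper name query_page_nodes and its parameters are assumptions made for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes the syscall() declaration in glibc */
#endif
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fill status[i] with the NUMA node holding page i of
 * the buffer starting at base, or a negative errno value (for example
 * -ENOENT if the page has not been touched yet).
 * Returns 0 on success, -1 on failure (errno is set by the kernel). */
static long query_page_nodes(void *base, size_t npages, size_t page_size,
                             int *status)
{
    void **pages = malloc(npages * sizeof *pages);
    if (pages == NULL)
        return -1;

    /* One address per page; any address inside the page identifies it. */
    for (size_t i = 0; i < npages; ++i)
        pages[i] = (char *)base + i * page_size;

    /* pid 0 = calling process; nodes == NULL = report only, move nothing. */
    long rc = syscall(SYS_move_pages, 0L, (unsigned long)npages,
                      pages, NULL, status, 0L);
    free(pages);
    return rc;
}
```

Printing status[0..npages-1] after the threads have touched their pages would give a listing comparable to the "Numa page binding" output above.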
Re: [hwloc-users] unusual memory binding results
Crazy idea: 512 pages could be replaced with a single 2MB huge page. You're not requesting huge pages in your allocation, but some systems have transparent huge pages enabled by default (e.g. RHEL https://access.redhat.com/solutions/46111).

This could explain why 512 pages get allocated on the same node, but it wouldn't explain the crazy patterns you've seen in the past.

Brice

On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
> but as soon as I increase to 512 pages, it breaks.
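
The arithmetic behind this hypothesis: assuming 4 KiB base pages (the thread does not state the page size), 512 x 4096 bytes = 2 MiB, exactly one huge page, while 511 pages fall one page short. The sketch below illustrates only that size threshold. The helper name is hypothetical, and the kernel's real decision is made per virtual memory area and depends on the THP settings, so this is not a model of what the kernel does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the byte range [addr, addr + len)
 * contains at least one naturally aligned region of huge_size bytes
 * (huge_size must be a power of two), otherwise 0. */
static int contains_aligned_huge_region(uintptr_t addr, size_t len,
                                        size_t huge_size)
{
    uintptr_t mask = (uintptr_t)huge_size - 1;
    uintptr_t first = (addr + mask) & ~mask;   /* round addr up to alignment */
    if (first < addr)                          /* wrapped around while rounding */
        return 0;

    /* Bytes lost before the first aligned boundary inside the range. */
    size_t skipped = (size_t)(first - addr);
    return len >= skipped && len - skipped >= huge_size;
}
```

With huge_size set to 2 MiB, a 2 MiB-aligned range of 511 pages gives 0 and one of 512 pages gives 1, matching the point at which the binding pattern changed.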
Re: [hwloc-users] unusual memory binding results
I wondered something similar. The crazy patterns usually happen on columns of the 2D matrix and, as it is column major, it does loosely fit the idea (most of the time).

I will play some more (though I'm fed up with it now).

JB

-----Original Message-----
From: Brice Goglin
Sent: 29 January 2019 10:35
To: Biddiscombe, John A.; Hardware locality user list
Subject: Re: [hwloc-users] unusual memory binding results

Crazy idea: 512 pages could be replaced with a single 2MB huge page.
Re: [hwloc-users] unusual memory binding results
Brice

madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)

seems to make things behave much more sensibly. I had no idea it was a thing, but one of my colleagues pointed me to it.

Problem seems to be solved for now. Thank you very much for your insights and suggestions/help.

JB
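
A minimal sketch of how the madvise call above might be wrapped, assuming Linux with glibc. The helper name alloc_no_thp and the choice of mmap for the allocation are illustrative assumptions; the thread does not show how JB's buffer is allocated. madvise needs a page-aligned start address, which mmap guarantees.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and MADV_NOHUGEPAGE in glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map at least `bytes` bytes of zeroed anonymous memory
 * and ask the kernel not to back it with transparent huge pages.
 * On success returns the mapping and stores its length in *mapped_len
 * (needed later for munmap); returns NULL if mmap fails.
 * *advice_rc receives madvise()'s return value: 0 on success, -1 if the
 * advice was rejected (for example on a kernel built without THP). */
static void *alloc_no_thp(size_t bytes, size_t *mapped_len, int *advice_rc)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (bytes + page - 1) / page * page;   /* round up to whole pages */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Give the advice before any page is touched, so that the first-touch
     * faults are served with base pages. */
    *advice_rc = madvise(p, len, MADV_NOHUGEPAGE);
    *mapped_len = len;
    return p;
}
```

The buffer is released with munmap(p, mapped_len). A failed madvise is reported rather than treated as fatal, since the memory is still usable without the advice.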
Re: [hwloc-users] unusual memory binding results
Oh, that's very good to know. I guess lots of people using first touch will be affected by this issue. We may want to add a hwloc memory flag doing something similar.

Do you have root access to verify that writing "never" or "madvise" in /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?

Brice

On 29/01/2019 at 14:02, Biddiscombe, John A. wrote:
> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>
> seems to make things behave much more sensibly.
Re: [hwloc-users] unusual memory binding results
On the 8 numa node machine

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

is set already, so I'm not really sure what should go in there to disable it.

JB

-----Original Message-----
From: Brice Goglin
Sent: 29 January 2019 15:29
To: Biddiscombe, John A.; Hardware locality user list
Subject: Re: [hwloc-users] unusual memory binding results

Do you have root access to verify that writing "never" or "madvise" in /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?
Re: [hwloc-users] unusual memory binding results
Only the one in brackets is set; the others are unset alternatives.

If you write "madvise" in that file, it'll become "always [madvise] never".

Brice

On 29/01/2019 at 15:36, Biddiscombe, John A. wrote:
> $ cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
>
> is set already, so I'm not really sure what should go in there to disable it.
Re: [hwloc-users] unusual memory binding results
The answer is "no", I don't have root access, but I suspect that would be the right fix if it is currently set to [always]; either madvise or never would be good options. If it is of interest, I'll ask someone to try it and report back on what happens.

-Original Message-
From: Brice Goglin
Sent: 29 January 2019 15:39
To: Biddiscombe, John A.; Hardware locality user list
Subject: Re: [hwloc-users] unusual memory binding results

Only the one in brackets is set, others are unset alternatives.

If you write "madvise" in that file, it'll become "always [madvise] never".

Brice

On 29/01/2019 at 15:36, Biddiscombe, John A. wrote:
> On the 8 numa node machine
>
> $ cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
>
> is set already, so I'm not really sure what should go in there to disable it.
>
> JB
>
> -Original Message-
> From: Brice Goglin
> Sent: 29 January 2019 15:29
> To: Biddiscombe, John A.; Hardware locality user list
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Oh, that's very good to know. I guess lots of people using first touch will
> be affected by this issue. We may want to add a hwloc memory flag doing
> something similar.
>
> Do you have root access to verify that writing "never" or "madvise" in
> /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?
>
> Brice
>
> On 29/01/2019 at 14:02, Biddiscombe, John A. wrote:
>> Brice
>>
>> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>>
>> seems to make things behave much more sensibly. I had no idea it was a
>> thing, but one of my colleagues pointed me to it.
>>
>> Problem seems to be solved for now. Thank you very much for your insights
>> and suggestions/help.
>>
>> JB
>>
>> -Original Message-
>> From: Brice Goglin
>> Sent: 29 January 2019 10:35
>> To: Biddiscombe, John A.; Hardware locality user list
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Crazy idea: 512 pages could be replaced with a single 2MB huge page.
>> You're not requesting huge pages in your allocation but some systems
>> have transparent huge pages enabled by default (e.g. RHEL
>> https://access.redhat.com/solutions/46111)
>>
>> This could explain why 512 pages get allocated on the same node, but it
>> wouldn't explain crazy patterns you've seen in the past.
>>
>> Brice
>>
>> On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
>>> I simplified things and instead of writing to a 2D array, I allocate a 1D
>>> array of bytes and touch pages in a linear fashion.
>>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for
>>> each page in the data.
>>>
>>> When I allocate 511 pages and touch alternate pages on alternate
>>> numa nodes
>>>
>>> Numa page binding 511
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>>
>>> but as soon as I increase to 512 pages, it breaks.
>>> >>> Numa page binding 512 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0 >>> 0