Re: [hwloc-users] unusual memory binding results
The answer is "no", I don't have root access, but I suspect that that would be the right fix if it is currently set to [always] and either madvise or never would be good options. If it is of interest, I'll ask someone to try it and report back on what happens.

-----Original Message-----
From: Brice Goglin
Sent: 29 January 2019 15:39
To: Biddiscombe, John A.; Hardware locality user list
Subject: Re: [hwloc-users] unusual memory binding results

Only the one in brackets is set, others are unset alternatives. If you write "madvise" in that file, it'll become "always [madvise] never".

Brice
Re: [hwloc-users] unusual memory binding results
Only the one in brackets is set, others are unset alternatives. If you write "madvise" in that file, it'll become "always [madvise] never".

Brice


On 29/01/2019 at 15:36, Biddiscombe, John A. wrote:
> On the 8 numa node machine
>
> $ cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
>
> is set already, so I'm not really sure what should go in there to disable it.
>
> JB
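As a minimal sketch of the rule Brice describes (the bracketed word is the active setting), the C helpers below pick that word out of a line such as "always [madvise] never". The function names thp_active_mode and thp_read_active_mode are made up for illustration; only the sysfs path comes from the thread.

```c
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (active) word of a line such as
 * "always [madvise] never" into out.
 * Returns 0 on success, -1 if there is no bracketed word or it does not fit. */
int thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;
    size_t len = (size_t)(close - open - 1);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the sysfs file named in the thread and report the active mode.
 * Returns -1 if the file cannot be read (for example on a kernel
 * built without transparent huge page support). */
int thp_read_active_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? thp_active_mode(line, out, outlen) : -1;
}
```

On the machine John describes, thp_read_active_mode would be expected to yield "always"; after writing "madvise" to the file it would yield "madvise".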
Re: [hwloc-users] unusual memory binding results
On the 8 numa node machine

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

is set already, so I'm not really sure what should go in there to disable it.

JB

-----Original Message-----
From: Brice Goglin
Sent: 29 January 2019 15:29
To: Biddiscombe, John A.; Hardware locality user list
Subject: Re: [hwloc-users] unusual memory binding results

Oh, that's very good to know. I guess lots of people using first touch will be affected by this issue. We may want to add a hwloc memory flag doing something similar.

Do you have root access to verify that writing "never" or "madvise" in /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?

Brice
Re: [hwloc-users] unusual memory binding results
Oh, that's very good to know. I guess lots of people using first touch will be affected by this issue. We may want to add a hwloc memory flag doing something similar.

Do you have root access to verify that writing "never" or "madvise" in /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?

Brice


On 29/01/2019 at 14:02, Biddiscombe, John A. wrote:
> Brice
>
> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>
> seems to make things behave much more sensibly. I had no idea it was a thing,
> but one of my colleagues pointed me to it.
>
> Problem seems to be solved for now. Thank you very much for your insights and
> suggestions/help.
>
> JB
Re: [hwloc-users] unusual memory binding results
Brice

madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)

seems to make things behave much more sensibly. I had no idea it was a thing, but one of my colleagues pointed me to it.

Problem seems to be solved for now. Thank you very much for your insights and suggestions/help.

JB

-----Original Message-----
From: Brice Goglin
Sent: 29 January 2019 10:35
To: Biddiscombe, John A.; Hardware locality user list
Subject: Re: [hwloc-users] unusual memory binding results

Crazy idea: 512 pages could be replaced with a single 2MB huge page. You're not requesting huge pages in your allocation but some systems have transparent huge pages enabled by default (e.g. RHEL https://access.redhat.com/solutions/46111)

This could explain why 512 pages get allocated on the same node, but it wouldn't explain crazy patterns you've seen in the past.

Brice
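For illustration, here is a minimal sketch of how the madvise(..., MADV_NOHUGEPAGE) call above might be applied to a fresh allocation. It assumes Linux and an anonymous mmap allocation; the helper name alloc_no_thp is hypothetical and is not part of hwloc or of the code discussed in the thread.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `bytes` of anonymous memory and ask the kernel not to back it with
 * transparent huge pages, so that first touch places each small page
 * individually.
 * Returns NULL if the mapping fails. The madvise() result is stored in
 * *advised (0 = hint accepted, -1 = rejected, e.g. on a kernel without
 * transparent huge page support); a rejected hint is not treated as fatal. */
void *alloc_no_thp(size_t bytes, int *advised)
{
    void *addr = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return NULL;

    /* mmap() returns a page-aligned address, as madvise() requires. */
    int rc = madvise(addr, bytes, MADV_NOHUGEPAGE);
    if (advised != NULL)
        *advised = rc;
    return addr;
}
```

The hint has to be given before the pages are first touched; memory obtained this way is released with munmap(addr, bytes).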
Re: [hwloc-users] unusual memory binding results
I wondered something similar. The crazy patterns usually happen on columns of the 2D matrix and as it is column major, it does loosely fit the idea (most of the time).

I will play some more (though I'm fed up with it now).

JB

-----Original Message-----
From: Brice Goglin
Sent: 29 January 2019 10:35
To: Biddiscombe, John A.; Hardware locality user list
Subject: Re: [hwloc-users] unusual memory binding results

Crazy idea: 512 pages could be replaced with a single 2MB huge page. You're not requesting huge pages in your allocation but some systems have transparent huge pages enabled by default (e.g. RHEL https://access.redhat.com/solutions/46111)

This could explain why 512 pages get allocated on the same node, but it wouldn't explain crazy patterns you've seen in the past.

Brice
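The 511 versus 512 page threshold is consistent with simple arithmetic, assuming 4 KiB base pages: 512 x 4096 bytes = 2 MiB, exactly one huge page, while 511 pages fall one page short. The sketch below additionally assumes that a huge page can only back a naturally aligned 2 MiB block, which is not stated in the thread; under that assumption, whether a 512-page buffer contains such a block depends on where it starts. aligned_huge_blocks is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of complete, naturally aligned blocks of `huge` bytes that fit
 * inside the address range [start, start + len).
 * `huge` must be a power of two (e.g. 2 MiB). */
size_t aligned_huge_blocks(uintptr_t start, size_t len, size_t huge)
{
    /* Round the start address up to the next multiple of `huge`. */
    uintptr_t first = (start + huge - 1) & ~(uintptr_t)(huge - 1);
    uintptr_t end = start + len;
    if (first >= end)
        return 0;
    return (size_t)(end - first) / huge;
}
```

With 4 KiB pages and 2 MiB huge pages, a 511-page buffer never contains a full block, and a 512-page buffer contains one only when it happens to start on a 2 MiB boundary.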
Re: [hwloc-users] unusual memory binding results
Crazy idea: 512 pages could be replaced with a single 2MB huge page. You're not requesting huge pages in your allocation, but some systems have transparent huge pages enabled by default (e.g. RHEL, see https://access.redhat.com/solutions/46111). This could explain why 512 pages get allocated on the same node, but it wouldn't explain the crazy patterns you've seen in the past.

Brice

On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
> I simplified things and instead of writing to a 2D array, I allocate a 1D
> array of bytes and touch pages in a linear fashion.
> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each
> page in the data.
>
> When I allocate 511 pages and touch alternate pages on alternate numa nodes
>
> Numa page binding 511
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
> ... (0 and 1 keep alternating through all 511 entries, ending on 0)
>
> but as soon as I increase to 512 pages, it breaks.
>
> Numa page binding 512
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> ... (all 512 entries are 0)
>
> On the 8 numa node machine it sometimes gives the right answer even with 512
> pages.
>
> Still baffled
>
> JB
>
> -----Original Message-----
> From: hwloc-users On Behalf Of Biddiscombe, John A.
> Sent: 28 January 2019 16:14
> To: Brice Goglin
> Cc: Hardware locality user list
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Brice
>
>> Can you print the pattern before and after thread 1 touched its pages, or
>> even in the middle?
>> It looks like somebody is touching too many pages here.
>
> Experimenting with different threads touching one or more pages, I get
> unpredictable results.
>
> Here on the 8 numa node device, the result is perfect. I am only allowing
> threads 3 and 7 to write a single memory location
>
> get_numa_domain() 8 Domain Numa pattern
>
>
>
> 3---
>
>
>
> 7---
>
> Contents of memory locations
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 26 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 63 0 0 0 0 0 0 0
>
> You can see that core 26 (numa domain 3) wrote to memory, and so did core 63
> (numa domain 7).
>
> Now I run it a second time and look, it's rubbish
>
> get_numa_domain() 8 Domain Numa pattern
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
>
> Contents of memory locations
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 26 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 63 0 0 0 0 0 0 0
>
> after allowing the data to be read by a random thread
>
> 3777
> 3777
> 3777
> 3777
> 3777
> 3777
> 3777
> 3777
>
> I'm baffled.
>
> JB
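To make the huge-page hypothesis above concrete, here is a toy model in Python. It is a sketch, not a measurement and not hwloc code. It assumes 4 KiB base pages, 2 MiB huge pages, a buffer that starts on a 2 MiB boundary, and that a whole huge page lands on the node of whichever thread touches it first. The name `first_touch_nodes` and its parameters are hypothetical.

```python
BASE_PAGE = 4096                         # assumed base page size in bytes
HUGE_PAGE = 2 * 1024 * 1024              # assumed transparent huge page size
PAGES_PER_HUGE = HUGE_PAGE // BASE_PAGE  # 512 base pages per huge page


def first_touch_nodes(n_pages, n_nodes=2, thp=True):
    """Predict the NUMA node of each base page under first touch.

    Page i is touched, in increasing order, by a thread on node i % n_nodes.
    With thp=True, every complete run of 512 pages is modelled as a single
    huge page placed entirely on the node of its first toucher (assumption).
    """
    nodes = [i % n_nodes for i in range(n_pages)]
    if thp:
        for h in range(n_pages // PAGES_PER_HUGE):
            start = h * PAGES_PER_HUGE
            owner = nodes[start]  # node of the thread that touched it first
            nodes[start:start + PAGES_PER_HUGE] = [owner] * PAGES_PER_HUGE
    return nodes
```

Under these assumptions `first_touch_nodes(511)` alternates between 0 and 1, while `first_touch_nodes(512)` is all zeros, which matches the two dumps quoted above. The model makes no attempt to explain the runs on the 8-node machine that come out right even with 512 pages.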
Re: [hwloc-users] unusual memory binding results
I simplified things and instead of writing to a 2D array, I allocate a 1D array of bytes and touch pages in a linear fashion.
Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each page in the data.

When I allocate 511 pages and touch alternate pages on alternate numa nodes

Numa page binding 511
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
... (0 and 1 keep alternating through all 511 entries, ending on 0)

but as soon as I increase to 512 pages, it breaks.

Numa page binding 512
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... (all 512 entries are 0)

On the 8 numa node machine it sometimes gives the right answer even with 512 pages.

Still baffled

JB

-----Original Message-----
From: hwloc-users On Behalf Of Biddiscombe, John A.
Sent: 28 January 2019 16:14
To: Brice Goglin
Cc: Hardware locality user list
Subject: Re: [hwloc-users] unusual memory binding results

Brice

> Can you print the pattern before and after thread 1 touched its pages, or even
> in the middle?
> It looks like somebody is touching too many pages here.

Experimenting with different threads touching one or more pages, I get unpredictable results.

Here on the 8 numa node device, the result is perfect. I am only allowing threads 3 and 7 to write a single memory location

get_numa_domain() 8 Domain Numa pattern



3---



7---

Contents of memory locations
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
63 0 0 0 0 0 0 0

You can see that core 26 (numa domain 3) wrote to memory, and so did core 63 (numa domain 7).

Now I run it a second time and look, it's rubbish

get_numa_domain() 8 Domain Numa pattern
3---
3---
3---
3---
3---
3---
3---
3---

Contents of memory locations
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
63 0 0 0 0 0 0 0

after allowing the data to be read by a random thread

3777
3777
3777
3777
3777
3777
3777
3777

I'm baffled.

JB
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
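Status arrays like the ones in this message are easier to compare once they are collapsed into runs. The following Python sketch shows one way to do that. `summarize_status` and `is_alternating` are hypothetical helpers that work on a plain list of node numbers; they do not call move_pages themselves.

```python
from itertools import groupby


def summarize_status(status):
    """Collapse a per-page node list into (node, run_length) pairs."""
    return [(node, sum(1 for _ in run)) for node, run in groupby(status)]


def is_alternating(status, n_nodes=2):
    """Return True if page i sits on node i % n_nodes for every page."""
    return all(node == i % n_nodes for i, node in enumerate(status))
```

For the 512-page case the summary is the single pair `(0, 512)`, and `is_alternating` returns False, which makes the difference from the 511-page run visible at a glance.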
Re: [OMPI users] OpenMPI 3 without network connection
Thanks Gilles for this workaround. And thanks to the Open MPI developers for their responsiveness in quickly correcting the problem too. I'll build and deploy this new version for the users as soon as I'm back at the laboratory.

Patrick

On 29/01/2019 at 06:48, Gilles Gouaillardet wrote:
> Patrick,
>
> I double checked the code, and indeed, mpirun should have automatically
> fallen back on the loopback interface (and mpirun should have worked).
>
> The virbr0 interface prevented that, and this is a bug I fixed in
> https://github.com/open-mpi/ompi/pull/6315
>
> Future releases of Open MPI will include this fix. Meanwhile, you can
> either remove the virbr0 interface or use the workaround I previously
> described.
>
> Cheers,
>
> Gilles
>
> On 1/29/2019 1:56 PM, Gilles Gouaillardet wrote:
>> Patrick,
>>
>> The root cause is that we do not include the localhost interface by
>> default for OOB communications.
>>
>> You should be able to run with
>>
>> mpirun --mca oob_tcp_if_include lo -np 4 hostname
>>
>> Cheers,
>>
>> Gilles
>>
>> On 1/28/2019 11:02 PM, Patrick Bégou wrote:
>>>
>>> Hi,
>>>
>>> I've run into a strange problem with OpenMPI 3.1 installed on a CentOS7
>>> laptop. If no network is available I cannot launch a local MPI job
>>> on the laptop:
>>>
>>> bash-4.2$ mpirun -np 4 hostname
>>> --
>>> No network interfaces were found for out-of-band communications. We
>>> require at least one available network for out-of-band messaging.
>>> --
>>>
>>> OpenMPI is built locally with
>>>
>>> Open MPI: 3.1.3rc1
>>> Open MPI repo revision: v3.1.2-78-gc8e9819
>>> Configure command line: '--prefix=/opt/GCC73/openmpi31x'
>>>                         '--enable-mpirun-prefix-by-default'
>>>                         '--disable-dlopen'
>>>                         '--enable-mca-no-build=openib'
>>>                         '--without-verbs' '--enable-mpi-cxx'
>>>                         '--without-slurm'
>>>                         '--enable-mpi-thread-multiple'
>>>
>>> I've tested some btl setups found with Google but none solves the
>>> problem.
>>>
>>> bash-4.2$ mpirun -np 4 -mca btl ^tcp hostname
>>>
>>> or
>>>
>>> bash-4.2$ mpirun -np 4 -mca btl vader,self hostname
>>>
>>> Starting a wifi connection (when it is available):
>>>
>>> bash-4.2$ mpirun -np 4 hostname
>>> localhost.localdomain
>>> localhost.localdomain
>>> localhost.localdomain
>>> localhost.localdomain
>>>
>>> Any suggestion is welcome
>>>
>>> Patrick

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
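For scripts that have to launch jobs on a laptop with no network, the workaround quoted above can be applied conditionally. This Python sketch is only an illustration. `mpirun_command` is a hypothetical helper, the only Open MPI option it uses is the `oob_tcp_if_include` MCA parameter from Gilles's message, and it assumes the loopback interface is named `lo`.

```python
def mpirun_command(n_procs, program, oob_interface=None):
    """Build an mpirun argument list, optionally pinning the OOB interface."""
    cmd = ["mpirun"]
    if oob_interface is not None:
        # Workaround from the thread: restrict out-of-band (OOB) traffic
        # to the named interface, e.g. the loopback interface "lo".
        cmd += ["--mca", "oob_tcp_if_include", oob_interface]
    cmd += ["-np", str(n_procs), program]
    return cmd
```

Calling `mpirun_command(4, "hostname", "lo")` yields the same arguments as the command in the quoted message, and the list can be passed to `subprocess.run`. Once a release containing the fix from pull request 6315 is installed, the third argument should no longer be needed.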