Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Biddiscombe, John A.
The answer is "no", I don't have root access, but I suspect that would be the 
right fix since it is currently set to [always]; either madvise or never would 
be a good option. If it is of interest, I'll ask someone to try it and report 
back on what happens.

-Original Message-
From: Brice Goglin  
Sent: 29 January 2019 15:39
To: Biddiscombe, John A. ; Hardware locality user list 

Subject: Re: [hwloc-users] unusual memory binding results

Only the one in brackets is set; the others are the unset alternatives.

If you write "madvise" in that file, it'll become "always [madvise] never".

Brice


Le 29/01/2019 à 15:36, Biddiscombe, John A. a écrit :
> On the 8 numa node machine
>
> $cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
>
> is set already, so I'm not really sure what should go in there to disable it.
>
> JB
>
> -Original Message-
> From: Brice Goglin 
> Sent: 29 January 2019 15:29
> To: Biddiscombe, John A. ; Hardware locality user 
> list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Oh, that's very good to know. I guess lots of people using first touch will 
> be affected by this issue. We may want to add a hwloc memory flag doing 
> something similar.
>
> Do you have root access to verify that writing "never" or "madvise" in 
> /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?
>
> Brice
>
>
>
> Le 29/01/2019 à 14:02, Biddiscombe, John A. a écrit :
>> Brice
>>
>> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>>
>> seems to make things behave much more sensibly. I had no idea it was a 
>> thing, but one of my colleagues pointed me to it.
>>
>> Problem seems to be solved for now. Thank you very much for your insights 
>> and suggestions/help.
>>
>> JB
>>
>> -Original Message-
>> From: Brice Goglin 
>> Sent: 29 January 2019 10:35
>> To: Biddiscombe, John A. ; Hardware locality user 
>> list 
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Crazy idea: 512 pages could be replaced with a single 2MB huge page.
>> You're not requesting huge pages in your allocation but some systems 
>> have transparent huge pages enabled by default (e.g. RHEL
>> https://access.redhat.com/solutions/46111)
>>
>> This could explain why 512 pages get allocated on the same node, but it 
>> wouldn't explain crazy patterns you've seen in the past.
>>
>> Brice
>>
>>
>>
>>
>> Le 29/01/2019 à 10:23, Biddiscombe, John A. a écrit :
>>> I simplified things and instead of writing to a 2D array, I allocate a 1D 
>>> array of bytes and touch pages in a linear fashion.
>>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for 
>>> each page in the data.
>>>
>>> When I allocate 511 pages and touch alternate pages on alternate 
>>> numa nodes
>>>
>>> Numa page binding 511
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>>
>>> but as soon as I increase to 512 pages, it breaks.
>>>
>>> Numa page binding 512
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Brice Goglin
Only the one in brackets is set; the others are the unset alternatives.

If you write "madvise" in that file, it'll become "always [madvise] never".

Brice
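
For reference, a minimal C sketch (an illustration, not code from this thread) that 
reads the sysfs file discussed above and, when given a "--set-madvise" argument (a 
name made up for this example) and run with root privileges, writes "madvise" into it; 
the bracketed word is the mode currently in effect:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *path = "/sys/kernel/mm/transparent_hugepage/enabled";
    char line[256] = "";

    /* The kernel reports the active mode as the bracketed word,
     * e.g. "always [madvise] never". */
    FILE *f = fopen(path, "r");
    if (!f || !fgets(line, sizeof(line), f)) { perror(path); return 1; }
    fclose(f);
    printf("current THP setting: %s", line);

    if (argc > 1 && !strcmp(argv[1], "--set-madvise")) {
        f = fopen(path, "w");                 /* requires root */
        if (!f || fputs("madvise\n", f) == EOF) { perror("write"); return 1; }
        fclose(f);
        printf("THP set to madvise\n");
    }
    return 0;
}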


Le 29/01/2019 à 15:36, Biddiscombe, John A. a écrit :
> On the 8 numa node machine
>
> $cat /sys/kernel/mm/transparent_hugepage/enabled 
> [always] madvise never
>
> is set already, so I'm not really sure what should go in there to disable it.
>
> JB
>
> -Original Message-
> From: Brice Goglin  
> Sent: 29 January 2019 15:29
> To: Biddiscombe, John A. ; Hardware locality user list 
> 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Oh, that's very good to know. I guess lots of people using first touch will 
> be affected by this issue. We may want to add a hwloc memory flag doing 
> something similar.
>
> Do you have root access to verify that writing "never" or "madvise" in 
> /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?
>
> Brice
>
>
>
> Le 29/01/2019 à 14:02, Biddiscombe, John A. a écrit :
>> Brice
>>
>> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>>
>> seems to make things behave much more sensibly. I had no idea it was a 
>> thing, but one of my colleagues pointed me to it.
>>
>> Problem seems to be solved for now. Thank you very much for your insights 
>> and suggestions/help.
>>
>> JB
>>
>> -Original Message-
>> From: Brice Goglin 
>> Sent: 29 January 2019 10:35
>> To: Biddiscombe, John A. ; Hardware locality user 
>> list 
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Crazy idea: 512 pages could be replaced with a single 2MB huge page.
>> You're not requesting huge pages in your allocation but some systems 
>> have transparent huge pages enabled by default (e.g. RHEL
>> https://access.redhat.com/solutions/46111)
>>
>> This could explain why 512 pages get allocated on the same node, but it 
>> wouldn't explain crazy patterns you've seen in the past.
>>
>> Brice
>>
>>
>>
>>
>> Le 29/01/2019 à 10:23, Biddiscombe, John A. a écrit :
>>> I simplified things and instead of writing to a 2D array, I allocate a 1D 
>>> array of bytes and touch pages in a linear fashion.
>>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for 
>>> each page in the data.
>>>
>>> When I allocate 511 pages and touch alternate pages on alternate numa 
>>> nodes
>>>
>>> Numa page binding 511
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>>
>>> but as soon as I increase to 512 pages, it breaks.
>>>
>>> Numa page binding 512
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>
>>> On the 8 numa node machine it sometimes gives the right answer even with 
>>> 512 pages.
>>>
>>> Still baffled
>>>
>>> JB
>>>
>>> -Original Message-
>>> From: hwloc-users  On Behalf Of 
>>> Biddiscombe, John A.
>>> Sent: 28 January 2019 16:14
>>> To: Brice Goglin 
>>> Cc: Hardware locality user list 
>>> Subject: Re: [hwloc-users] unusual memory binding results
>>>
>>> Brice

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Biddiscombe, John A.
On the 8 numa node machine

$cat /sys/kernel/mm/transparent_hugepage/enabled 
[always] madvise never

is set already, so I'm not really sure what should go in there to disable it.

JB

-Original Message-
From: Brice Goglin  
Sent: 29 January 2019 15:29
To: Biddiscombe, John A. ; Hardware locality user list 

Subject: Re: [hwloc-users] unusual memory binding results

Oh, that's very good to know. I guess lots of people using first touch will be 
affected by this issue. We may want to add a hwloc memory flag doing something 
similar.

Do you have root access to verify that writing "never" or "madvise" in 
/sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?

Brice



Le 29/01/2019 à 14:02, Biddiscombe, John A. a écrit :
> Brice
>
> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>
> seems to make things behave much more sensibly. I had no idea it was a thing, 
> but one of my colleagues pointed me to it.
>
> Problem seems to be solved for now. Thank you very much for your insights and 
> suggestions/help.
>
> JB
>
> -Original Message-
> From: Brice Goglin 
> Sent: 29 January 2019 10:35
> To: Biddiscombe, John A. ; Hardware locality user 
> list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Crazy idea: 512 pages could be replaced with a single 2MB huge page.
> You're not requesting huge pages in your allocation but some systems 
> have transparent huge pages enabled by default (e.g. RHEL
> https://access.redhat.com/solutions/46111)
>
> This could explain why 512 pages get allocated on the same node, but it 
> wouldn't explain crazy patterns you've seen in the past.
>
> Brice
>
>
>
>
> Le 29/01/2019 à 10:23, Biddiscombe, John A. a écrit :
>> I simplified things and instead of writing to a 2D array, I allocate a 1D 
>> array of bytes and touch pages in a linear fashion.
>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for 
>> each page in the data.
>>
>> When I allocate 511 pages and touch alternate pages on alternate numa 
>> nodes
>>
>> Numa page binding 511
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>
>> but as soon as I increase to 512 pages, it breaks.
>>
>> Numa page binding 512
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> On the 8 numa node machine it sometimes gives the right answer even with 512 
>> pages.
>>
>> Still baffled
>>
>> JB
>>
>> -Original Message-
>> From: hwloc-users  On Behalf Of 
>> Biddiscombe, John A.
>> Sent: 28 January 2019 16:14
>> To: Brice Goglin 
>> Cc: Hardware locality user list 
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Brice
>>
>>> Can you print the pattern before and after thread 1 touched its pages, or 
>>> even in the middle ?
>>> It looks like somebody is touching too many pages here.
>> Experimenting with different threads touching one or more pages, I 
>> get unpredictable results
>>
>> here on the 8 numa node device, the result is perfect. I am only 
>> 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Brice Goglin
Oh, that's very good to know. I guess lots of people using first touch
will be affected by this issue. We may want to add a hwloc memory flag
doing something similar.

Do you have root access to verify that writing "never" or "madvise" in
/sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?

Brice



Le 29/01/2019 à 14:02, Biddiscombe, John A. a écrit :
> Brice
>
> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>
> seems to make things behave much more sensibly. I had no idea it was a thing, 
> but one of my colleagues pointed me to it.
>
> Problem seems to be solved for now. Thank you very much for your insights and 
> suggestions/help.
>
> JB
>
> -Original Message-
> From: Brice Goglin  
> Sent: 29 January 2019 10:35
> To: Biddiscombe, John A. ; Hardware locality user list 
> 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Crazy idea: 512 pages could be replaced with a single 2MB huge page.
> You're not requesting huge pages in your allocation but some systems have 
> transparent huge pages enabled by default (e.g. RHEL
> https://access.redhat.com/solutions/46111)
>
> This could explain why 512 pages get allocated on the same node, but it 
> wouldn't explain crazy patterns you've seen in the past.
>
> Brice
>
>
>
>
> Le 29/01/2019 à 10:23, Biddiscombe, John A. a écrit :
>> I simplified things and instead of writing to a 2D array, I allocate a 1D 
>> array of bytes and touch pages in a linear fashion.
>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for 
>> each page in the data.
>>
>> When I allocate 511 pages and touch alternate pages on alternate numa 
>> nodes
>>
>> Numa page binding 511
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>
>> but as soon as I increase to 512 pages, it breaks.
>>
>> Numa page binding 512
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> On the 8 numa node machine it sometimes gives the right answer even with 512 
>> pages.
>>
>> Still baffled
>>
>> JB
>>
>> -Original Message-
>> From: hwloc-users  On Behalf Of 
>> Biddiscombe, John A.
>> Sent: 28 January 2019 16:14
>> To: Brice Goglin 
>> Cc: Hardware locality user list 
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Brice
>>
>>> Can you print the pattern before and after thread 1 touched its pages, or 
>>> even in the middle ?
>>> It looks like somebody is touching too many pages here.
>> Experimenting with different threads touching one or more pages, I get 
>> unpredictable results
>>
>> here on the 8 numa node device, the result is perfect. I am only 
>> allowing thread 3 and 7 to write a single memory location
>>
>> get_numa_domain() 8 Domain Numa pattern
>> 
>> 
>> 
>> 3---
>> 
>> 
>> 
>> 7---
>> 
>>
>> 
>> Contents of memory locations
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Biddiscombe, John A.
Brice

madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)

seems to make things behave much more sensibly. I had no idea it was a thing, 
but one of my colleagues pointed me to it.
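
A minimal sketch of that workaround (an illustration, not the actual code from this 
thread): the key point is that the madvise call has to happen before the region is 
first touched, otherwise the kernel may already have backed it with a 2MB huge page.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t n_pages = 512;                     /* the size where THP kicked in */
    size_t bytes = (size_t)page * n_pages;
    void *addr = NULL;

    if (posix_memalign(&addr, (size_t)page, bytes)) { perror("posix_memalign"); return 1; }

    /* Opt this range out of transparent huge pages before any first touch,
     * so each small page is placed individually by the first-touch policy. */
    if (madvise(addr, bytes, MADV_NOHUGEPAGE)) { perror("madvise"); return 1; }

    memset(addr, 0, bytes);                   /* first touch (normally done per thread) */
    free(addr);
    return 0;
}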

Problem seems to be solved for now. Thank you very much for your insights and 
suggestions/help.

JB

-Original Message-
From: Brice Goglin  
Sent: 29 January 2019 10:35
To: Biddiscombe, John A. ; Hardware locality user list 

Subject: Re: [hwloc-users] unusual memory binding results

Crazy idea: 512 pages could be replaced with a single 2MB huge page.
You're not requesting huge pages in your allocation but some systems have 
transparent huge pages enabled by default (e.g. RHEL
https://access.redhat.com/solutions/46111)

This could explain why 512 pages get allocated on the same node, but it 
wouldn't explain crazy patterns you've seen in the past.

Brice




Le 29/01/2019 à 10:23, Biddiscombe, John A. a écrit :
> I simplified things and instead of writing to a 2D array, I allocate a 1D 
> array of bytes and touch pages in a linear fashion.
> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each 
> page in the data.
>
> When I allocate 511 pages and touch alternate pages on alternate numa 
> nodes
>
> Numa page binding 511
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>
> but as soon as I increase to 512 pages, it breaks.
>
> Numa page binding 512
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> On the 8 numa node machine it sometimes gives the right answer even with 512 
> pages.
>
> Still baffled
>
> JB
>
> -Original Message-
> From: hwloc-users  On Behalf Of 
> Biddiscombe, John A.
> Sent: 28 January 2019 16:14
> To: Brice Goglin 
> Cc: Hardware locality user list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Brice
>
>> Can you print the pattern before and after thread 1 touched its pages, or 
>> even in the middle ?
>> It looks like somebody is touching too many pages here.
> Experimenting with different threads touching one or more pages, I get 
> unpredictable results
>
> here on the 8 numa node device, the result is perfect. I am only 
> allowing thread 3 and 7 to write a single memory location
>
> get_numa_domain() 8 Domain Numa pattern
> 
> 
> 
> 3---
> 
> 
> 
> 7---
> 
>
> 
> Contents of memory locations
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 26 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 63 0 0 0 0 0 0 0
> 
>
> you can see that core 26 (numa domain 3) wrote to memory, and so did 
> core 63 (domain 7)
>
> Now I run it a second time and look, it's rubbish
>
> get_numa_domain() 8 Domain Numa pattern
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 
>
> 
> Contents of memory locations
> 0 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Biddiscombe, John A.
I wondered something similar. The crazy patterns usually happen on columns of 
the 2D matrix, and as it is column-major, it does loosely fit the idea (most of 
the time).

I will play some more (though I'm fed up with it now).

JB

-Original Message-
From: Brice Goglin  
Sent: 29 January 2019 10:35
To: Biddiscombe, John A. ; Hardware locality user list 

Subject: Re: [hwloc-users] unusual memory binding results

Crazy idea: 512 pages could be replaced with a single 2MB huge page.
You're not requesting huge pages in your allocation but some systems have 
transparent huge pages enabled by default (e.g. RHEL
https://access.redhat.com/solutions/46111)

This could explain why 512 pages get allocated on the same node, but it 
wouldn't explain crazy patterns you've seen in the past.

Brice




Le 29/01/2019 à 10:23, Biddiscombe, John A. a écrit :
> I simplified things and instead of writing to a 2D array, I allocate a 1D 
> array of bytes and touch pages in a linear fashion.
> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each 
> page in the data.
>
> When I allocate 511 pages and touch alternate pages on alternate numa 
> nodes
>
> Numa page binding 511
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>
> but as soon as I increase to 512 pages, it breaks.
>
> Numa page binding 512
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> On the 8 numa node machine it sometimes gives the right answer even with 512 
> pages.
>
> Still baffled
>
> JB
>
> -Original Message-
> From: hwloc-users  On Behalf Of 
> Biddiscombe, John A.
> Sent: 28 January 2019 16:14
> To: Brice Goglin 
> Cc: Hardware locality user list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Brice
>
>> Can you print the pattern before and after thread 1 touched its pages, or 
>> even in the middle ?
>> It looks like somebody is touching too many pages here.
> Experimenting with different threads touching one or more pages, I get 
> unpredictable results
>
> here on the 8 numa node device, the result is perfect. I am only 
> allowing thread 3 and 7 to write a single memory location
>
> get_numa_domain() 8 Domain Numa pattern
> 
> 
> 
> 3---
> 
> 
> 
> 7---
> 
>
> 
> Contents of memory locations
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 26 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 63 0 0 0 0 0 0 0
> 
>
> you can see that core 26 (numa domain 3) wrote to memory, and so did 
> core 63 (domain 7)
>
> Now I run it a second time and look, it's rubbish
>
> get_numa_domain() 8 Domain Numa pattern
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 
>
> 
> Contents of memory locations
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Brice Goglin
Crazy idea: 512 pages could be replaced with a single 2MB huge page.
You're not requesting huge pages in your allocation but some systems
have transparent huge pages enabled by default (e.g. RHEL
https://access.redhat.com/solutions/46111)

This could explain why 512 pages get allocated on the same node, but it
wouldn't explain crazy patterns you've seen in the past.

Brice




Le 29/01/2019 à 10:23, Biddiscombe, John A. a écrit :
> I simplified things and instead of writing to a 2D array, I allocate a 1D 
> array of bytes and touch pages in a linear fashion.
> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each 
> page in the data.
>
> When I allocate 511 pages and touch alternate pages on alternate numa nodes
>
> Numa page binding 511
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 
> 1 0 1 0
>
> but as soon as I increase to 512 pages, it breaks.
>
> Numa page binding 512
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0
>
> On the 8 numa node machine it sometimes gives the right answer even with 512 
> pages.
>
> Still baffled
>
> JB
>
> -Original Message-
> From: hwloc-users  On Behalf Of 
> Biddiscombe, John A.
> Sent: 28 January 2019 16:14
> To: Brice Goglin 
> Cc: Hardware locality user list 
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Brice
>
>> Can you print the pattern before and after thread 1 touched its pages, or 
>> even in the middle ?
>> It looks like somebody is touching too many pages here.
> Experimenting with different threads touching one or more pages, I get 
> unpredictable results
>
> here on the 8 numa node device, the result is perfect. I am only allowing 
> thread 3 and 7 to write a single memory location
>
> get_numa_domain() 8 Domain Numa pattern
> 
> 
> 
> 3---
> 
> 
> 
> 7---
> 
>
> 
> Contents of memory locations
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 26 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 63 0 0 0 0 0 0 0 
> 
>
> you can see that core 26 (numa domain 3) wrote to memory, and so did core 63 
> (domain 7)
>
> Now I run it a second time and look, it's rubbish
>
> get_numa_domain() 8 Domain Numa pattern
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 3---
> 
>
> 
> Contents of memory locations
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 26 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 
> 63 0 0 0 0 0 0 0 
> 
>
> after allowing the data to be read by a random thread
>
> 3777
> 3777
> 3777
> 3777
> 3777
> 3777
> 3777
> 3777
>
> I'm baffled.
>
> JB
>
> ___
> hwloc-users mailing list
> hwloc-users@lists.open-mpi.org
> 

Re: [hwloc-users] unusual memory binding results

2019-01-29 Thread Biddiscombe, John A.
I simplified things and instead of writing to a 2D array, I allocate a 1D array 
of bytes and touch pages in a linear fashion.
Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each 
page in the data.
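
As an illustration of that query (not the code used here), the raw move_pages 
syscall can be given a NULL "nodes" array, in which case it only fills the status 
array with the NUMA node each touched page currently resides on:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

static void print_page_nodes(void *base, size_t n_pages)
{
    long page = sysconf(_SC_PAGESIZE);
    void **pages = malloc(n_pages * sizeof(void *));
    int *status = malloc(n_pages * sizeof(int));

    for (size_t i = 0; i < n_pages; i++)
        pages[i] = (char *)base + i * (size_t)page;

    /* pid 0 = this process, nodes = NULL = query only */
    if (syscall(SYS_move_pages, 0, n_pages, pages, NULL, status, 0) == 0) {
        for (size_t i = 0; i < n_pages; i++)
            printf("%d ", status[i]);         /* node id, or -errno for an untouched page */
        printf("\n");
    } else {
        perror("move_pages");
    }
    free(pages);
    free(status);
}

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t n_pages = 8;
    char *buf = NULL;

    if (posix_memalign((void **)&buf, (size_t)page, n_pages * (size_t)page)) return 1;
    for (size_t i = 0; i < n_pages; i++)
        buf[i * (size_t)page] = 1;            /* first touch from this thread */
    print_page_nodes(buf, n_pages);
    free(buf);
    return 0;
}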

When I allocate 511 pages and touch alternate pages on alternate numa nodes

Numa page binding 511
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0

but as soon as I increase to 512 pages, it breaks.

Numa page binding 512
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

On the 8 numa node machine it sometimes gives the right answer even with 512 
pages.

Still baffled

JB

-Original Message-
From: hwloc-users  On Behalf Of 
Biddiscombe, John A.
Sent: 28 January 2019 16:14
To: Brice Goglin 
Cc: Hardware locality user list 
Subject: Re: [hwloc-users] unusual memory binding results

Brice

>Can you print the pattern before and after thread 1 touched its pages, or even 
>in the middle ?
>It looks like somebody is touching too many pages here.

Experimenting with different threads touching one or more pages, I get 
unpredictable results

here on the 8 numa node device, the result is perfect. I am only allowing 
thread 3 and 7 to write a single memory location

get_numa_domain() 8 Domain Numa pattern



3---



7---



Contents of memory locations
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
26 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
63 0 0 0 0 0 0 0 


you can see that core 26 (numa domain 3) wrote to memory, and so did core 63 
(domain 7)
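
For reference, a rough sketch of this kind of bound-thread first-touch test (an 
illustration only; the get_numa_domain() helper above is the poster's own code and 
is not reproduced here). Each worker binds itself to the cpuset of one NUMA node 
with hwloc and then touches a single page of a shared buffer (link with 
-lhwloc -lpthread):

#define _GNU_SOURCE
#include <hwloc.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static hwloc_topology_t topo;
static char *buffer;
static long page_size;

static void *touch_one_page(void *arg)
{
    int i = (int)(long)arg;
    hwloc_obj_t node = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NUMANODE, i);

    /* Bind this thread to the CPUs local to NUMA node i, then first-touch
     * "its" page; that page should end up on node i. */
    hwloc_set_cpubind(topo, node->cpuset, HWLOC_CPUBIND_THREAD);
    buffer[(size_t)i * (size_t)page_size] = (char)(i + 1);
    return NULL;
}

int main(void)
{
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);
    int nnodes = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NUMANODE);

    page_size = sysconf(_SC_PAGESIZE);
    if (posix_memalign((void **)&buffer, (size_t)page_size,
                       (size_t)nnodes * (size_t)page_size)) return 1;
    madvise(buffer, (size_t)nnodes * (size_t)page_size, MADV_NOHUGEPAGE); /* see above */

    pthread_t *tids = malloc((size_t)nnodes * sizeof(pthread_t));
    for (long i = 0; i < nnodes; i++)
        pthread_create(&tids[i], NULL, touch_one_page, (void *)i);
    for (int i = 0; i < nnodes; i++)
        pthread_join(tids[i], NULL);

    free(tids);
    free(buffer);
    hwloc_topology_destroy(topo);
    return 0;
}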

Now I run it a second time and look, it's rubbish

get_numa_domain() 8 Domain Numa pattern
3---
3---
3---
3---
3---
3---
3---
3---



Contents of memory locations
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
26 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
63 0 0 0 0 0 0 0 


after allowing the data to be read by a random thread

3777
3777
3777
3777
3777
3777
3777
3777

I'm baffled.

JB

___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users


Re: [OMPI users] OpenMPI 3 without network connection

2019-01-29 Thread Patrick Bégou
Thanks Gilles for this workaround, and thanks to the OpenMPI developers
for their responsiveness in correcting the problem so quickly.
I'll build and deploy this new version for the users as soon as I'm back
at the laboratory.

Patrick

Le 29/01/2019 à 06:48, Gilles Gouaillardet a écrit :
> Patrick,
>
>
> I double-checked the code, and indeed, mpirun should have
> automatically fallen back
>
> on the loopback interface (and mpirun should have worked)
>
> The virbr0 interface prevented that and this is a bug I fixed in
> https://github.com/open-mpi/ompi/pull/6315
>
>
> Future releases of Open MPI will include this fix, meanwhile, you can
> either remove the virbr0 interface
>
> or use the workaround I previously described
>
>
> Cheers,
>
>
> Gilles
>
> On 1/29/2019 1:56 PM, Gilles Gouaillardet wrote:
>> Patrick,
>>
>> The root cause is we do not include the localhost interface by
>> default for OOB communications.
>>
>>
>> You should be able to run with
>>
>> mpirun --mca oob_tcp_if_include lo -np 4 hostname
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On 1/28/2019 11:02 PM, Patrick Bégou wrote:
>>>
>>> Hi,
>>>
>>> I ran into a strange problem with OpenMPI 3.1 installed on a CentOS7
>>> laptop. If no network is available I cannot launch a local MPI job
>>> on the laptop:
>>>
>>> bash-4.2$ mpirun -np 4 hostname
>>> --
>>>
>>> No network interfaces were found for out-of-band communications. We
>>> require
>>> at least one available network for out-of-band messaging.
>>> --
>>>
>>>
>>> OpenMPI is built locally with
>>>
>>>     Open MPI: 3.1.3rc1
>>>   Open MPI repo revision: v3.1.2-78-gc8e9819
>>>   Configure command line: '--prefix=/opt/GCC73/openmpi31x'
>>> '--enable-mpirun-prefix-by-default'
>>>   '--disable-dlopen'
>>> '--enable-mca-no-build=openib'
>>>   '--without-verbs' '--enable-mpi-cxx'
>>>   '--without-slurm'
>>> '--enable-mpi-thread-multiple'
>>>
>>> I've tested some btl setups found via Google, but none solve the
>>> problem.
>>>
>>> bash-4.2$ mpirun -np 4 -mca btl ^tcp hostname
>>>
>>> or
>>>
>>> bash-4.2$ mpirun -np 4 -mca btl vader,self hostname
>>>
>>> Starting a wifi connection (when it is available):
>>>
>>> bash-4.2$ mpirun -np 4 hostname
>>> localhost.localdomain
>>> localhost.localdomain
>>> localhost.localdomain
>>> localhost.localdomain
>>>
>>> Any suggestion is welcome
>>>
>>> Patrick
>>>
>>>
>>> ___
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/users
>>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users