On 3/6/15 12:29 PM, Mike Galbraith wrote:
On Fri, 2015-03-06 at 11:37 -0700, David Ahern wrote:
But, I do not understand how the wrong topology is causing the NMI
watchdog to trigger. In the end there are still N domains, M groups per
domain and P cpus per group. Doesn't the balancing walk over all of them
irrespective of physical
On Fri, Mar 06, 2015 at 11:37:11AM -0700, David Ahern wrote:
> On 3/6/15 11:11 AM, Mike Galbraith wrote:
> In responding earlier today I realized that the topology is all wrong as you
> were pointing out. There should be 16 NUMA domains (4 memory controllers per
> socket and 4 sockets). There
On Fri, 2015-03-06 at 11:37 -0700, David Ahern wrote:
> But, I do not understand how the wrong topology is causing the NMI
> watchdog to trigger. In the end there are still N domains, M groups per
> domain and P cpus per group. Doesn't the balancing walk over all of them
> irrespective of physical
On 3/6/15 11:11 AM, Mike Galbraith wrote:
That was the question, _do_ you have any control, because that topology
is toxic. I guess your reply means 'nope'.
The system has 4 physical cpus (sockets). Each cpu has 32 cores with 8
threads per core and each cpu has 4 memory controllers.
Thank
On Fri, 2015-03-06 at 08:01 -0700, David Ahern wrote:
> On 3/5/15 9:52 PM, Mike Galbraith wrote:
> >> CPU970 attaching sched-domain:
> >> domain 0: span 968-975 level SIBLING
> >> groups: 8 single CPU groups
> >> domain 1: span 968-975 level MC
> >> groups: 1 group with 8 cpus
> >>
On 3/6/15 2:12 AM, Peter Zijlstra wrote:
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
Socket(s): 32
NUMA node(s): 4
Urgh, with 32 'cpus' per socket, you still do _8_ sockets per node, for
a total of 256 cpus per node.
Per the response to Mike, the system
On 3/6/15 2:07 AM, Peter Zijlstra wrote:
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
Since each domain is a superset of the lower one each pass through
load_balance regularly repeats the processing of the previous domain (e.g.,
NODE domain repeats the cpus in the CPU domain).
On 3/6/15 1:51 AM, Peter Zijlstra wrote:
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
Hi Peter/Mike/Ingo:
Does that make sense or am I off in the weeds?
How much of your story pertains to 3.18? I'm not particularly interested
in anything much older than that.
No. All of
On 3/5/15 9:52 PM, Mike Galbraith wrote:
CPU970 attaching sched-domain:
domain 0: span 968-975 level SIBLING
groups: 8 single CPU groups
domain 1: span 968-975 level MC
groups: 1 group with 8 cpus
domain 2: span 768-1023 level CPU
groups: 4 groups with 256 cpus per
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
> Socket(s): 32
> NUMA node(s): 4
Urgh, with 32 'cpus' per socket, you still do _8_ sockets per node, for
a total of 256 cpus per node.
That's painful. I don't suppose you can really change the hardware, but
that's
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
> Since each domain is a superset of the lower one each pass through
> load_balance regularly repeats the processing of the previous domain (e.g.,
> NODE domain repeats the cpus in the CPU domain). Then multiplying that
> across 1024 cpus
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
> Hi Peter/Mike/Ingo:
>
> Does that make sense or am I off in the weeds?
How much of your story pertains to 3.18? I'm not particularly interested
in anything much older than that.
On Thu, 2015-03-05 at 21:05 -0700, David Ahern wrote:
> Hi Peter/Mike/Ingo:
>
> I've been banging my head against this wall for a week now and hoping you or
> someone could shed some light on the problem.
>
> On larger systems (256 to 1024 cpus) there are several use cases (e.g.,
>
Hi Peter/Mike/Ingo:
I've been banging my head against this wall for a week now and hoping you or
someone could shed some light on the problem.
On larger systems (256 to 1024 cpus) there are several use cases (e.g.,
http://www.cs.virginia.edu/stream/) that regularly trigger the NMI
watchdog with