Re: [Regression] sched: division by zero in find_busiest_group()

2013-12-18 Thread Peter Zijlstra
On Wed, Dec 18, 2013 at 04:28:35AM +, Hedi Berriche wrote:
> On Mon, Dec 09, 2013 at 18:10 Hedi Berriche wrote:
> | Folks,
> | 
> | The following panic occurs *early* at boot time on high *enough* CPU count
> | machines:
> | 
> | divide error:  [#1] SMP 
> | Modules linked in:
> | CPU: 22 PID: 1146 Comm: kworker/22:0 Not tainted 3.13.0-rc2-00122-gdea4f48 #8
> | Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20
> | task: 8827d49f31c0 ti: 8827d4a18000 task.ti: 8827d4a18000
> | RIP: 0010:[]  [] find_busiest_group+0x26b/0x890
> | RSP: :8827d4a19b68  EFLAGS: 00010006
> | RAX: 7fff RBX: 8000 RCX: 0200
> | RDX:  RSI: 8000 RDI: 0020
> | RBP: 8827d4a19cc0 R08:  R09: 
> | R10:  R11:  R12: 
> | R13: 8827d4a19d28 R14: 8827d4a19b98 R15: 
> | FS:  () GS:8827dfd8() knlGS:
> | CS:  0010 DS:  ES:  CR0: 8005003b
> | CR2: 00b8 CR3: 018da000 CR4: 07e0
> | Stack:
> | 8827d4b35800  00014600 00014600
> |  8827d4b35818  
> |   8000 
> | Call Trace:
> | [] load_balance+0x166/0x7f0
> | [] idle_balance+0x10e/0x1b0
> | [] __schedule+0x723/0x780
> | [] schedule+0x29/0x70
> | [] worker_thread+0x1c9/0x400
> | [] ? rescuer_thread+0x3e0/0x3e0
> | [] kthread+0xd2/0xf0
> | [] ? kthread_create_on_node+0x180/0x180
> | [] ret_from_fork+0x7c/0xb0
> | [] ? kthread_create_on_node+0x180/0x180
> 
> Hmm...had time to dig into this a bit deeper and looking at
> build_overlap_sched_groups(), specifically this bit of code:
> 
> kernel/sched/core.c:
> 
> static int
> build_overlap_sched_groups(struct sched_domain *sd, int cpu)
> {
> ...
> 	/*
> 	 * Initialize sgp->power such that even if we mess up the
> 	 * domains and no possible iteration will get us here, we won't
> 	 * die on a /0 trap.
> 	 */
> 	sg->sgp->power = SCHED_POWER_SCALE * cpumask_weight(sg_span);
> 
> I'm wondering whether the same precaution should be used when it comes to 
> sg->sgp->power_orig.

http://marc.info/?l=linux-kernel&m=138684195315258
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Regression] sched: division by zero in find_busiest_group()

2013-12-17 Thread Hedi Berriche
On Mon, Dec 09, 2013 at 18:10 Hedi Berriche wrote:
| Folks,
| 
| The following panic occurs *early* at boot time on high *enough* CPU count
| machines:
| 
| divide error:  [#1] SMP 
| Modules linked in:
| CPU: 22 PID: 1146 Comm: kworker/22:0 Not tainted 3.13.0-rc2-00122-gdea4f48 #8
| Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20
| task: 8827d49f31c0 ti: 8827d4a18000 task.ti: 8827d4a18000
| RIP: 0010:[]  [] find_busiest_group+0x26b/0x890
| RSP: :8827d4a19b68  EFLAGS: 00010006
| RAX: 7fff RBX: 8000 RCX: 0200
| RDX:  RSI: 8000 RDI: 0020
| RBP: 8827d4a19cc0 R08:  R09: 
| R10:  R11:  R12: 
| R13: 8827d4a19d28 R14: 8827d4a19b98 R15: 
| FS:  () GS:8827dfd8() knlGS:
| CS:  0010 DS:  ES:  CR0: 8005003b
| CR2: 00b8 CR3: 018da000 CR4: 07e0
| Stack:
| 8827d4b35800  00014600 00014600
|  8827d4b35818  
|   8000 
| Call Trace:
| [] load_balance+0x166/0x7f0
| [] idle_balance+0x10e/0x1b0
| [] __schedule+0x723/0x780
| [] schedule+0x29/0x70
| [] worker_thread+0x1c9/0x400
| [] ? rescuer_thread+0x3e0/0x3e0
| [] kthread+0xd2/0xf0
| [] ? kthread_create_on_node+0x180/0x180
| [] ret_from_fork+0x7c/0xb0
| [] ? kthread_create_on_node+0x180/0x180

Hmm...had time to dig into this a bit deeper and looking at
build_overlap_sched_groups(), specifically this bit of code:

kernel/sched/core.c:

static int
build_overlap_sched_groups(struct sched_domain *sd, int cpu)
{
...
	/*
	 * Initialize sgp->power such that even if we mess up the
	 * domains and no possible iteration will get us here, we won't
	 * die on a /0 trap.
	 */
	sg->sgp->power = SCHED_POWER_SCALE * cpumask_weight(sg_span);

I'm wondering whether the same precaution should be used when it comes to 
sg->sgp->power_orig.

Cheers,
Hedi.
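
For illustration only, the precaution asked about above could look roughly
like the user-space sketch below. It is not the kernel code: struct
group_power, POWER_SCALE, group_power_init() and capacity_ratio() are
hypothetical stand-ins, and the value 1024 for the scale is an assumption.
The point is only that seeding power_orig alongside power keeps a later
division by either field from trapping.

```c
#include <stdint.h>

/* Hypothetical stand-in for SCHED_POWER_SCALE; the value is an assumption. */
#define POWER_SCALE 1024u

/* Hypothetical stand-in for the per-group power bookkeeping. */
struct group_power {
	uint32_t power;
	uint32_t power_orig;
};

/*
 * Seed both fields with a non-zero default derived from the number of
 * CPUs in the group, so neither can be used as a zero divisor before a
 * proper update has run.
 */
static void group_power_init(struct group_power *gp, unsigned int nr_cpus)
{
	gp->power = POWER_SCALE * nr_cpus;
	gp->power_orig = gp->power;
}

/*
 * Illustrative ratio with power_orig as the divisor. It would raise a
 * divide error if power_orig had been left at zero. The multiplication
 * is done in 64 bits to avoid overflow on large CPU counts.
 */
static uint32_t capacity_ratio(const struct group_power *gp)
{
	return (uint32_t)(((uint64_t)gp->power * POWER_SCALE) / gp->power_orig);
}
```

With four CPUs in the group, both fields start at 4096 and the ratio comes
out as POWER_SCALE, since power and power_orig are equal.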

