Re: [Regression] sched: division by zero in find_busiest_group()
On Wed, Dec 18, 2013 at 04:28:35AM +, Hedi Berriche wrote:
> On Mon, Dec 09, 2013 at 18:10 Hedi Berriche wrote:
> | Folks,
> |
> | The following panic occurs *early* at boot time on high *enough* CPU count
> | machines:
> |
> | divide error: [#1] SMP
> | Modules linked in:
> | CPU: 22 PID: 1146 Comm: kworker/22:0 Not tainted 3.13.0-rc2-00122-gdea4f48 #8
> | Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20
> | task: 8827d49f31c0 ti: 8827d4a18000 task.ti: 8827d4a18000
> | RIP: 0010:[810a345b] [810a345b] find_busiest_group+0x26b/0x890
> | RSP: :8827d4a19b68 EFLAGS: 00010006
> | RAX: 7fff RBX: 8000 RCX: 0200
> | RDX: RSI: 8000 RDI: 0020
> | RBP: 8827d4a19cc0 R08: R09:
> | R10: R11: R12:
> | R13: 8827d4a19d28 R14: 8827d4a19b98 R15:
> | FS: () GS:8827dfd8() knlGS:
> | CS: 0010 DS: ES: CR0: 8005003b
> | CR2: 00b8 CR3: 018da000 CR4: 07e0
> | Stack:
> | 8827d4b35800 00014600 00014600
> | 8827d4b35818
> | 8000
> | Call Trace:
> | [810a3be6] load_balance+0x166/0x7f0
> | [810a477e] idle_balance+0x10e/0x1b0
> | [815d83d3] __schedule+0x723/0x780
> | [815d8459] schedule+0x29/0x70
> | [810818b9] worker_thread+0x1c9/0x400
> | [810816f0] ? rescuer_thread+0x3e0/0x3e0
> | [81088562] kthread+0xd2/0xf0
> | [81088490] ? kthread_create_on_node+0x180/0x180
> | [815e437c] ret_from_fork+0x7c/0xb0
> | [81088490] ? kthread_create_on_node+0x180/0x180
>
> Hmm...had time to dig into this a bit deeper and looking at
> build_overlap_sched_groups(), specifically this bit of code:
>
> kernel/sched/core.c:
>
> 5066 static int
> 5067 build_overlap_sched_groups(struct sched_domain *sd, int cpu)
> 5068 {
> ...
> 5109         /*
> 5110          * Initialize sgp->power such that even if we mess up the
> 5111          * domains and no possible iteration will get us here, we won't
> 5112          * die on a /0 trap.
> 5113          */
> 5114         sg->sgp->power = SCHED_POWER_SCALE * cpumask_weight(sg_span);
>
> I'm wondering whether the same precaution should be used when it comes to
> sg->sgp->power_orig.
http://marc.info/?l=linux-kernel&m=138684195315258

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Regression] sched: division by zero in find_busiest_group()
On Mon, Dec 09, 2013 at 18:10 Hedi Berriche wrote:
| Folks,
|
| The following panic occurs *early* at boot time on high *enough* CPU count
| machines:
|
| divide error: [#1] SMP
| Modules linked in:
| CPU: 22 PID: 1146 Comm: kworker/22:0 Not tainted 3.13.0-rc2-00122-gdea4f48 #8
| Hardware name: Intel Corp. Stoutland Platform, BIOS 2.20 UEFI2.10 PI1.0 X64 2013-09-20
| task: 8827d49f31c0 ti: 8827d4a18000 task.ti: 8827d4a18000
| RIP: 0010:[810a345b] [810a345b] find_busiest_group+0x26b/0x890
| RSP: :8827d4a19b68 EFLAGS: 00010006
| RAX: 7fff RBX: 8000 RCX: 0200
| RDX: RSI: 8000 RDI: 0020
| RBP: 8827d4a19cc0 R08: R09:
| R10: R11: R12:
| R13: 8827d4a19d28 R14: 8827d4a19b98 R15:
| FS: () GS:8827dfd8() knlGS:
| CS: 0010 DS: ES: CR0: 8005003b
| CR2: 00b8 CR3: 018da000 CR4: 07e0
| Stack:
| 8827d4b35800 00014600 00014600
| 8827d4b35818
| 8000
| Call Trace:
| [810a3be6] load_balance+0x166/0x7f0
| [810a477e] idle_balance+0x10e/0x1b0
| [815d83d3] __schedule+0x723/0x780
| [815d8459] schedule+0x29/0x70
| [810818b9] worker_thread+0x1c9/0x400
| [810816f0] ? rescuer_thread+0x3e0/0x3e0
| [81088562] kthread+0xd2/0xf0
| [81088490] ? kthread_create_on_node+0x180/0x180
| [815e437c] ret_from_fork+0x7c/0xb0
| [81088490] ? kthread_create_on_node+0x180/0x180

Hmm...had time to dig into this a bit deeper and looking at
build_overlap_sched_groups(), specifically this bit of code:

kernel/sched/core.c:

5066 static int
5067 build_overlap_sched_groups(struct sched_domain *sd, int cpu)
5068 {
...
5109         /*
5110          * Initialize sgp->power such that even if we mess up the
5111          * domains and no possible iteration will get us here, we won't
5112          * die on a /0 trap.
5113          */
5114         sg->sgp->power = SCHED_POWER_SCALE * cpumask_weight(sg_span);

I'm wondering whether the same precaution should be used when it comes to
sg->sgp->power_orig.

Cheers,
Hedi.
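
For readers following along, here is a minimal userspace sketch of the precaution being asked about. It is not kernel code and not a proposed patch. `struct group_power`, `init_group_power()` and `scale_by_power_orig()` are hypothetical names made up for illustration, and `SCHED_POWER_SCALE` is assumed to be 1024. The sketch also assumes that the zero divisor is power_orig, which the message above raises only as a question.

```c
#include <assert.h>

/* Assumption: 1 << 10, standing in for the kernel's SCHED_POWER_SCALE. */
#define SCHED_POWER_SCALE 1024UL

/* Hypothetical, simplified stand-in for the data behind sg->sgp. */
struct group_power {
	unsigned long power;      /* current compute power of the group */
	unsigned long power_orig; /* original, unscaled power of the group */
};

/*
 * Give both fields a non-zero default derived from the number of CPUs
 * in the group, so that a later division by either field cannot trap
 * even if no other code path ever updates them.
 */
static void init_group_power(struct group_power *sgp, unsigned int nr_cpus)
{
	assert(nr_cpus > 0); /* a group spanning zero CPUs would yield 0 again */
	sgp->power = SCHED_POWER_SCALE * nr_cpus;
	sgp->power_orig = sgp->power; /* the "same precaution" for power_orig */
}

/*
 * Hypothetical consumer: scales a value by power_orig. With power_orig
 * left at 0 this division is what raises a divide error on x86.
 */
static unsigned long scale_by_power_orig(unsigned long value,
					 const struct group_power *sgp)
{
	return value * SCHED_POWER_SCALE / sgp->power_orig;
}
```

With a group of 4 CPUs both fields start out at 4096, so `scale_by_power_orig(4096, &gp)` returns 1024 instead of faulting. Whether such a default merely hides a deeper problem in how the domains are built is a separate question that the sketch does not address.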