Re: [RFC] Start SMP subsystem earlier
On 1/6/15 10:55 AM, Ian Lepore wrote:
> On Tue, 2015-01-06 at 09:37 -0500, John Baldwin wrote:
>> [...]
>
> Just an FYI, the ARM world is now using the multipass newbus stuff. It
> works well, with some quirks...
>
> The predefined pass names don't always make sense for the ARM world.
> There aren't enough predefined pass names, and even though the number
> space for them is 4 billion wide, all the predefined names are in the
> range < 100 and separated by only 10, so it's tricky to wedge things
> between the existing names.
>
> The strangest bit is when you have interdependent drivers at different
> early pass numbers. Sometimes it's necessary to do almost nothing in
> the attach() routine and do all the real attach-time work in a
> bus_new_pass() routine after the pass number becomes high enough that
> your co-dependent driver peers are available.

Yes, I almost want another downcall through the tree, something like
'bus_pass_completed', though the original design was to override
bus_new_pass as you have done. And yes, in many cases the logic needs to
move out of attach. For example, once things are fleshed out more, the
PCI bus will only do enumeration, and no resource assignment, in its
attach routine. However, for now I've found that even on x86 I've had to
add a new pass level for ACPI and some other things like
acpi_sysresource. :(

It almost wants more of a provides-requires setup than hardcoded pass
levels, but that's more complicated to implement.

--
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [RFC] Start SMP subsystem earlier
On Tue, 2015-01-06 at 07:57 -0800, Adrian Chadd wrote:
> On 6 January 2015 at 07:55, Ian Lepore wrote:
> > On Tue, 2015-01-06 at 09:37 -0500, John Baldwin wrote:
> >> [...]
> >
> > Just an FYI, the ARM world is now using the multipass newbus stuff. It
> > works well, with some quirks...
> >
> > The predefined pass names don't always make sense for the ARM world.
> > There aren't enough predefined pass names, and even though the number
> > space for them is 4 billion wide, all the predefined names are in the
> > range < 100 and separated by only 10, so it's tricky to wedge things
> > between the existing names.
>
> Maybe we need a RENUM script? :)
>

I wanted to renumber them, but it was pointed out to me that the existing
names and numbers are part of the ABI and we're not free to change them
except on -current, and that would make all related work going forward
ineligible for MFC. (Personally, I'm a bit skeptical that there's any big
out-of-tree use of the existing names/numbers.)

-- Ian
Re: [RFC] Start SMP subsystem earlier
On 6 January 2015 at 07:55, Ian Lepore wrote:
> On Tue, 2015-01-06 at 09:37 -0500, John Baldwin wrote:
>> [...]
>
> Just an FYI, the ARM world is now using the multipass newbus stuff. It
> works well, with some quirks...
>
> The predefined pass names don't always make sense for the ARM world.
> There aren't enough predefined pass names, and even though the number
> space for them is 4 billion wide, all the predefined names are in the
> range < 100 and separated by only 10, so it's tricky to wedge things
> between the existing names.

Maybe we need a RENUM script? :)

-adrian

> [...]
Re: [RFC] Start SMP subsystem earlier
On Tue, 2015-01-06 at 09:37 -0500, John Baldwin wrote:
> On 1/5/15 8:18 AM, Hans Petter Selasky wrote:
> > [...]
>
> We need a lot more work before this is ready. This is one of the goals
> of the multipass new-bus stuff. [...]
>

Just an FYI, the ARM world is now using the multipass newbus stuff. It
works well, with some quirks...

The predefined pass names don't always make sense for the ARM world.
There aren't enough predefined pass names, and even though the number
space for them is 4 billion wide, all the predefined names are in the
range < 100 and separated by only 10, so it's tricky to wedge things
between the existing names.

The strangest bit is when you have interdependent drivers at different
early pass numbers. Sometimes it's necessary to do almost nothing in the
attach() routine and do all the real attach-time work in a bus_new_pass()
routine after the pass number becomes high enough that your co-dependent
driver peers are available.

-- Ian
Re: [RFC] Start SMP subsystem earlier
On Tue, Jan 06, 2015 at 09:37:30AM -0500, John Baldwin wrote:
> On 1/5/15 8:18 AM, Hans Petter Selasky wrote:
> > [...]
>
> We need a lot more work before this is ready. This is one of the goals
> of the multipass new-bus stuff. In particular, we have to enumerate
> enough devices to bring event timer hardware up so that timer interrupts
> work and tsleep() will actually sleep. In addition, we also need idle
> threads created and working before the APs are started, as otherwise
> they will have no thread to run initially. This is certainly a desired
> feature, but it is not as simple as moving the sysinit up, I'm afraid.
>

I believe that idle threads are still created before the APs start with
the patch posted; this was the first thing I checked. It is
SUB_SCHED_IDLE, which is done long before even drivers are configured.
Re: [RFC] Start SMP subsystem earlier
On 01/06/15 15:37, John Baldwin wrote:
> We need a lot more work before this is ready. This is one of the goals
> of the multipass new-bus stuff. In particular, we have to enumerate
> enough devices to bring event timer hardware up so that timer interrupts
> work and tsleep() will actually sleep. In addition, we also need idle
> threads created and working before the APs are started, as otherwise
> they will have no thread to run initially. This is certainly a desired
> feature, but it is not as simple as moving the sysinit up, I'm afraid.

Got it. Thank you!

--HPS
Re: [RFC] Start SMP subsystem earlier
On 1/5/15 8:18 AM, Hans Petter Selasky wrote:
> Hi,
>
> There is a limitation on the number of interrupt vectors available when
> only a single processor is running. To have more interrupts available, we
> need to start SMP earlier when building a monolithic kernel and not
> loading drivers as modules. The driver in question is a network driver,
> and because it cannot be started after SI_SUB_ROOT_CONF due to PXE
> support, I see no other option than to move SI_SUB_SMP earlier.
>
> Suggested patch:
>
> [...]
>
> This fixes a problem for Mellanox drivers in the OFED layer. Possibly we
> need to move SMP even earlier so as not to miss the generic FreeBSD PCI
> device enumeration, or maybe this is not possible. Does anyone know how
> early we can start SMP?

We need a lot more work before this is ready. This is one of the goals
of the multipass new-bus stuff. In particular, we have to enumerate
enough devices to bring event timer hardware up so that timer interrupts
work and tsleep() will actually sleep. In addition, we also need idle
threads created and working before the APs are started, as otherwise
they will have no thread to run initially. This is certainly a desired
feature, but it is not as simple as moving the sysinit up, I'm afraid.

--
John Baldwin
Re: [RFC] Start SMP subsystem earlier
On 5 January 2015 at 06:08, Hans Petter Selasky wrote:
> On 01/05/15 14:43, Konstantin Belousov wrote:
>> [...]
>
> [...]
>
> The other issue is that the IRQs should be functional too, so that PXE
> boot can work.
>

I'm also starting to see increasing amounts of wifi hardware that
expects interrupts to be up and working during probe/attach. (I think
i915kms has the same issue too, no?)

-adrian
Re: [RFC] Start SMP subsystem earlier
On 01/05/15 14:43, Konstantin Belousov wrote:
> On Mon, Jan 05, 2015 at 02:18:17PM +0100, Hans Petter Selasky wrote:
>> [...]
>
> Did you inspect all the reordered sysinit routines and ensure that the
> reordering is safe? At the very least, SUB_SMP starts event timers,
> while KTHREAD_IDLE is about configuring some hardware which might be
> required/not ready for that.

Hi,

I have not inspected everything myself yet regarding this change. That's
why I'm sending this e-mail out. The problem is simply that the total
number of interrupts appears to be limited by "APIC_NUM_IOINTS" and
"NUM_IO_INTS", which is a per-CPU limit from what I understand. Until SMP
is activated, the newbus code simply distributes the IRQ vectors across
the available IRQs; then, when SMP is up, it reshuffles them all.

I was initially thinking that a hack might be possible, like using
RF_SHARED for the IRQ resource, but then noticed that we were using MSI
interrupts, which are not allocated in the same manner.

The other issue is that the IRQs should be functional too, so that PXE
boot can work.

--HPS
Re: [RFC] Start SMP subsystem earlier
On Mon, Jan 05, 2015 at 02:18:17PM +0100, Hans Petter Selasky wrote:
> Hi,
>
> There is a limitation on the number of interrupt vectors available when
> only a single processor is running. To have more interrupts available, we
> need to start SMP earlier when building a monolithic kernel and not
> loading drivers as modules. The driver in question is a network driver,
> and because it cannot be started after SI_SUB_ROOT_CONF due to PXE
> support, I see no other option than to move SI_SUB_SMP earlier.
>
> Suggested patch:
>
> > Index: sys/kernel.h
> > ===
> > --- sys/kernel.h	(revision 276691)
> > +++ sys/kernel.h	(working copy)
> > @@ -152,6 +152,7 @@
> >  	SI_SUB_KPROF		= 0x900,	/* kernel profiling*/
> >  	SI_SUB_KICK_SCHEDULER	= 0xa00,	/* start the timeout events*/
> >  	SI_SUB_INT_CONFIG_HOOKS	= 0xa80,	/* Interrupts enabled config */
> > +	SI_SUB_SMP		= 0xa85,	/* start the APs*/
> >  	SI_SUB_ROOT_CONF	= 0xb00,	/* Find root devices */
> >  	SI_SUB_DUMP_CONF	= 0xb20,	/* Find dump devices */
> >  	SI_SUB_RAID		= 0xb38,	/* Configure GEOM classes */
> > @@ -165,7 +166,6 @@
> >  	SI_SUB_KTHREAD_BUF	= 0xea0,	/* buffer daemon*/
> >  	SI_SUB_KTHREAD_UPDATE	= 0xec0,	/* update daemon*/
> >  	SI_SUB_KTHREAD_IDLE	= 0xee0,	/* idle procs*/
> > -	SI_SUB_SMP		= 0xf00,	/* start the APs*/
> >  	SI_SUB_RACCTD		= 0xf10,	/* start racctd*/
> >  	SI_SUB_LAST		= 0xfff		/* final initialization */
> >  };

Did you inspect all the reordered sysinit routines and ensure that the
reordering is safe? At the very least, SUB_SMP starts event timers, while
KTHREAD_IDLE is about configuring some hardware which might be
required/not ready for that.

>
> This fixes a problem for Mellanox drivers in the OFED layer. Possibly we
> need to move SMP even earlier so as not to miss the generic FreeBSD PCI
> device enumeration, or maybe this is not possible. Does anyone know how
> early we can start SMP?