Fix GNUism in bsd.dep.mk
Hi, Here at genua, trying to build libpcap sometimes breaks in libpcap with the following error message: | Using $< in a non-suffix rule context is a GNUmake idiom \ |(/data/git/ehrhardt/genuos/os/mk/bsd.dep.mk:47) The bug is in bsd.dep.mk where ${.IMPSRC} (aka $<) is used in a context where it is not neccessarily defined by OpenBSD make. Below is a diff to Use ${.ALLSRC:M*.y} instead of ${.IMPSRC} as the input file for yacc. The issue is with the rule for the grammar.h file that is generated by yacc from grammar.c. You can easily reproduce the bug with the following steps: - build libpcap from scratch: cd .../src/lib/libpcap && make clean all - remove the generated grammar.h file: rm obj*/grammar.h - build libpcap again (incremental build): make In normal builds this does not trigger as grammar.h is implicitly generated by the rule for grammar.c and when make checks for dependencies it simply finds grammar.h uptodate. However, incremental or parallel builds might decide to make grammar.h from grammar.y. Now, why is this only a problem for grammar.h but not for grammar.c? The answer to this question is burried deeply in OpenBSD's mk files. The snippet in bsd.dep.mk that triggers the error is a single rule statement that generates foo.c and foo.h from foo.y with a call to yacc -d. The rule is generated with a loop, i.e. it is not a prefix rule. However, a prefix rule context is required for the use of ${.IMPSRC} aka $<. For the .c file such a prefix rule is provided by bsd.sys.mk and this rule is in scope when make evaluates the yacc rule. However, for .h file generation from a .y file there is no such prefix rule defined in any of the Makefiles. Even if it were the .h suffix is missing from .SUFFIXES and the rule would not be considered. NOTE: The obvious way to fix this would be to use $f instead of ${.IMPSRC}. However, this does not work as $f is then missing the path prefix and yacc won't find it if an obj directory is used. This is probably the reason for the use of ${.IMPSRC} in the first place. regards Christian diff --git a/os/mk/bsd.dep.mk b/os/mk/bsd.dep.mk index 4bd8bf7778..77ae4f 100644 --- a/os/mk/bsd.dep.mk +++ b/os/mk/bsd.dep.mk @@ -44,7 +44,7 @@ ${i:R:S/$/.o/} ${i:R:S/$/.po/} ${i:R:S/$/.so/} ${i:R:S/$/.do/}: $i # loop may not trigger . for f in ${SRCS:M*.y} ${f:.y=.c} ${f:.y=.h}: $f - ${YACC.y} -o ${f:.y=.c} ${.IMPSRC} + ${YACC.y} -o ${f:.y=.c} ${.ALLSRC:M*.y} . endfor CLEANFILES += ${SRCS:M*.y:.y=.h} .endif smime.p7s Description: S/MIME cryptographic signature
Re: Intel i219 network card support
Hi, before I post the next iteration of the patch, I'd like some more advice on some of the points you commented (inline): On Sun, Feb 14, 2016 at 03:34:19PM +1100, Jonathan Gray wrote: > On Mon, Feb 08, 2016 at 01:53:20PM +0100, Christian Ehrhardt wrote: > > retrieving revision 1.329 > > diff -u -p -r1.329 if_em.c > > --- dev/pci/if_em.c 12 Jan 2016 00:05:21 - 1.329 > > +++ dev/pci/if_em.c 4 Feb 2016 20:33:26 - > > @@ -145,6 +145,10 @@ const struct pci_matchid em_devices[] = > > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V }, > > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V_2 }, > > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V_3 }, > > + { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_LM }, > > + { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_V }, > > + { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_LM2 }, > > + { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_V2 }, > > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_COPPER }, > > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_FIBER }, > > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_SERDES }, > > @@ -253,6 +257,9 @@ int em_dma_malloc(struct em_softc *, bu > > void em_dma_free(struct em_softc *, struct em_dma_alloc *); > > u_int32_t em_fill_descriptors(u_int64_t address, u_int32_t length, > > PDESC_ARRAY desc_array); > > +void em_flush_tx_ring(struct em_softc *); > > +void em_flush_rx_ring(struct em_softc *); > > +void em_flush_desc_rings(struct em_softc *); > > > > /* > > * OpenBSD Device Interface Entry Points > > @@ -435,6 +442,7 @@ em_attach(struct device *parent, struct > > case em_ich10lan: > > case em_pch2lan: > > case em_pch_lpt: > > + case em_pch_spt: > > case em_80003es2lan: > > /* 9K Jumbo Frame size */ > > sc->hw.max_frame_size = 9234; > > @@ -844,6 +852,7 @@ em_init(void *arg) > > case em_pchlan: > > case em_pch2lan: > > case em_pch_lpt: > > + case em_pch_spt: > > pba = E1000_PBA_26K; > > break; > > default: > > @@ -1506,10 +1515,12 @@ em_stop(void *arg, int softonly) > > timeout_del(>timer_handle); > > timeout_del(>tx_fifo_timer_handle); > > > > - if (!softonly) { > > + if (!softonly) > > em_disable_intr(sc); > > + if (sc->hw.mac_type == em_pch_spt) > > + em_flush_desc_rings(sc); > > + if (!softonly) > > em_reset_hw(>hw); > > - } > > > > intr_barrier(sc->sc_intrhand); > > ifq_barrier(>if_snd); > > @@ -1563,6 +1574,27 @@ em_identify_hardware(struct em_softc *sc > > sc->hw.phy_init_script = TRUE; > > } > > > > +void > > +em_legacy_irq_quirk_spt(struct em_softc *sc) > > +{ > > + uint32_treg; > > + > > + /* Legacy interrupt: SPT needs a quirk. */ > > + if (sc->hw.mac_type != em_pch_spt) > > + return; > > + if (sc->legacy_irq == 0) > > + return; > > + > > + reg = EM_READ_REG(>hw, E1000_FEXTNVM7); > > + reg |= E1000_FEXTNVM7_SIDE_CLK_UNGATE; > > + EM_WRITE_REG(>hw, E1000_FEXTNVM7, reg); > > + > > + reg = EM_READ_REG(>hw, E1000_FEXTNVM9); > > + reg |= E1000_FEXTNVM9_IOSFSB_CLKGATE_DIS | > > + E1000_FEXTNVM9_IOSFSB_CLKREQ_DIS; > > + EM_WRITE_REG(>hw, E1000_FEXTNVM9, reg); > > +} > > + > > int > > em_allocate_pci_resources(struct em_softc *sc) > > { > > @@ -1617,8 +1649,15 @@ em_allocate_pci_resources(struct em_soft > > break; > > } > > > > + sc->osdep.em_flashoffset = 0; > > /* for ICH8 and family we need to find the flash memory */ > > - if (IS_ICH8(sc->hw.mac_type)) { > > + if (sc->hw.mac_type == em_pch_spt) { > > + sc->osdep.flash_bus_space_tag = sc->osdep.mem_bus_space_tag; > > + sc->osdep.flash_bus_space_handle = > > sc->osdep.mem_bus_space_handle; > > + sc->osdep.em_flashbase = 0; > > + sc->osdep.em_flashsize = 0; > > + sc->osdep.em_flashoffset = 0xe000; > > + } else if (IS_ICH8(sc->hw.mac_type)) { > > val = pci_conf_read(pa->pa_pc, pa->pa_tag, EM_FLASH); > > if (PCI_MAPREG_TYPE(val) != PCI_MAPREG_TYPE_MEM) { > > printf(": f
Re: Intel i219 network card support
Hi, On Mon, Feb 08, 2016 at 02:35:32PM +, Stuart Henderson wrote: > On 2016/02/08 13:53, Christian Ehrhardt wrote: > > If you own a skylake based onboard NIC, please give this diff a try. > > The most important bit, especially at this point in the release cycle, > is whether it makes any changes that cause problems with other chips. > I don't have time to look closely now but the question is "does > the diff change anything for existing chips, or does it *only* change > things for new chips"? It's not supposed to change anything for existing chips. If it does I'd consider this a bug in the diff. > > Index: dev/pci/pcidevs > > === > > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/pcidevs,v > > retrieving revision 1.1786 > > diff -u -p -r1.1786 pcidevs > > --- dev/pci/pcidevs 30 Jan 2016 01:02:04 - 1.1786 > > +++ dev/pci/pcidevs 4 Feb 2016 16:04:31 - > > @@ -3353,6 +3353,10 @@ product INTEL I218_LM_2 0x15a0 I218-LM > > product INTEL I218_V_2 0x15a1 I218-V > > product INTEL I218_LM_30x15a2 I218-LM > > product INTEL I218_V_3 0x15a3 I218-V > > +product INTEL I219_LM 0x156F I219_LM > > +product INTEL I219_V 0x1570 I219_V > > +product INTEL I219_LM2 0x15B7 I219_LM2 > > +product INTEL I219_V2 0x15B8 I219_V2 > > product INTEL X557_AT2 0x15ad X557-AT2 > > product INTEL CORE5G_H_PCIE_X160x1601 Core 5G PCIE > > product INTEL CORE5G_M_GT1_1 0x1602 HD Graphics > > Please maintain numerical order of pid (0x156f/0x1570 are out of place), > and like the other entries use lowercase for pid and s/_/-/ in the text > version of the device name on the right-hand column. Ok.regards Christian signature.asc Description: Digital signature
Intel i219 network card support
Hi everyone, based on the latest release of the Intel driver for FreeBSD (em-7.5.2), I've adapted the em(4) driver to support the skylake based i219 onboard ethernet chips. I do not have access to relevant specs, everything below is based on reading the Intel code referenced above. The diff was initially done for 5.7, bluhm@ did the forward port to current. The resulting diff is below. The major differences are: - The NVRAM access registers are no longer in a separate BAR. Instead they are at offset 0xe000 in the MMIO bar. As a consequence the BAR access registers must be accessed with 32-Bit memory operations. - There are several quirks where the i219 chips need special handling. The most suspicious one of the quirks is where we need to flush the tx desc ring. We use the pointer list of the tx ring as a send buffer for a dummy packet that must be sent to flush the tx ring. Here's a list of functions that have special handling for the SPT based cards in the uptream intel code that I did not port for various reasons: - The link check code in OpenBSD differs significantly from upstream code. Special cases for SPT based chips in em_check_for_link have been ignored. - e1000_check_for_copper_link_ich8lan has been ignored. - e1000_suspend_workaroudns_ich8lan has been ignored. Suspend does seem to work on my I219_V chip, though. - NVRAM write is not supported for I217 based chips in OpenBSD and the code is missing for I219 chips, too. The patch works nicely on my I219_V chip, the other chips are untested. I may be able to do tests with I219_LM withhin the next few days, though. If you own a skylake based onboard NIC, please give this diff a try. Who should I ask for oks? regardsChristian Index: dev/pci/if_em.c === RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_em.c,v retrieving revision 1.329 diff -u -p -r1.329 if_em.c --- dev/pci/if_em.c 12 Jan 2016 00:05:21 - 1.329 +++ dev/pci/if_em.c 4 Feb 2016 20:33:26 - @@ -145,6 +145,10 @@ const struct pci_matchid em_devices[] = { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V }, { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V_2 }, { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V_3 }, + { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_LM }, + { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_V }, + { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_LM2 }, + { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_V2 }, { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_COPPER }, { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_FIBER }, { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_SERDES }, @@ -253,6 +257,9 @@ int em_dma_malloc(struct em_softc *, bu void em_dma_free(struct em_softc *, struct em_dma_alloc *); u_int32_t em_fill_descriptors(u_int64_t address, u_int32_t length, PDESC_ARRAY desc_array); +void em_flush_tx_ring(struct em_softc *); +void em_flush_rx_ring(struct em_softc *); +void em_flush_desc_rings(struct em_softc *); /* * OpenBSD Device Interface Entry Points @@ -435,6 +442,7 @@ em_attach(struct device *parent, struct case em_ich10lan: case em_pch2lan: case em_pch_lpt: + case em_pch_spt: case em_80003es2lan: /* 9K Jumbo Frame size */ sc->hw.max_frame_size = 9234; @@ -844,6 +852,7 @@ em_init(void *arg) case em_pchlan: case em_pch2lan: case em_pch_lpt: + case em_pch_spt: pba = E1000_PBA_26K; break; default: @@ -1506,10 +1515,12 @@ em_stop(void *arg, int softonly) timeout_del(>timer_handle); timeout_del(>tx_fifo_timer_handle); - if (!softonly) { + if (!softonly) em_disable_intr(sc); + if (sc->hw.mac_type == em_pch_spt) + em_flush_desc_rings(sc); + if (!softonly) em_reset_hw(>hw); - } intr_barrier(sc->sc_intrhand); ifq_barrier(>if_snd); @@ -1563,6 +1574,27 @@ em_identify_hardware(struct em_softc *sc sc->hw.phy_init_script = TRUE; } +void +em_legacy_irq_quirk_spt(struct em_softc *sc) +{ + uint32_treg; + + /* Legacy interrupt: SPT needs a quirk. */ + if (sc->hw.mac_type != em_pch_spt) + return; + if (sc->legacy_irq == 0) + return; + + reg = EM_READ_REG(>hw, E1000_FEXTNVM7); + reg |= E1000_FEXTNVM7_SIDE_CLK_UNGATE; + EM_WRITE_REG(>hw, E1000_FEXTNVM7, reg); + + reg = EM_READ_REG(>hw, E1000_FEXTNVM9); + reg |= E1000_FEXTNVM9_IOSFSB_CLKGATE_DIS | + E1000_FEXTNVM9_IOSFSB_CLKREQ_DIS; + EM_WRITE_REG(>hw, E1000_FEXTNVM9, reg); +} + int em_allocate_pci_resources(struct em_softc *sc) { @@ -1617,8 +1649,15 @@
Re: Consistent Kernel Panic-Hardware-Related?
Hi, On Wed, Jul 24, 2013 at 11:52:38AM +0200, Mark Kettenis wrote: Taking this to tech@ in the hope some more people will look into this. Ok. And thanks for picking this up. On Thu, Jul 04, 2013 at 09:56:56AM -0700, Scott Vanderbilt wrote: I've been trying to build userland repeatedly over the past few days on a particular machine and consistently get kernel panics, though never at exactly the same point in the process. The latest occurred midway through 'make obj'. Attempts to build userland on another i386 machine from code pulled via cvs at more or less the same time works fine, so it seems the issue is isolated to this hardware. I initially suspected my SSD had gone bad, so I replaced it with a brand new drive. However, the issue persists, so I no longer suspect the drive. A ps, trace, and dmesg are provided below. This is my first reporting a bug of this nature. I hope I've followed procedure. If not, please do let me know. I'm trying to be useful. :-) - panic: pmap_remove_ptes: managed page without PG_PVLIST for 0x3c001000 Stopped at Debugger+0x4: popl%ebp RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC! DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION! ddb show panic pmap_remove_ptes: managed page without PG_PVLIST for 0x3c001000 ddb trace Debugger(d0963718,f6269e38,d0966be4,f6269e38,d1cf1040) at Debugger+0x4 panic(d0966be4,3c001000,d1ceb16c,f6269e4c,0) at panic+0x5d pmap_remove_ptes(d9e39798,d1cf1040,ffcf,3c00,3c003000) at pmap_remove_p tes+0x142 pmap_do_remove(d9e39798,3c00,3c003000,0,d0ad7820) at pmap_do_remove+0xeb pmap_remove(d9e39798,3c00,3c003000,d056c4e9,d9c68e1c) at pmap_remove+0x27 uvm_unmap_kill_entry(d9e3ad80,d9c68e1c,f6269f2c,d043a597,0) at uvm_unmap_kill_e ntry+0xf8 uvm_map_teardown(d9e3ad80,1,4,d093e66e,d9cc2700) at uvm_map_teardown+0xac uvmspace_free(d9e3ad80,1,1,f6269f6c,d0203009) at uvmspace_free+0x2e uvm_exit(d9cc3ba4,d0a4e0a8,4,d093e66e,0) at uvm_exit+0x15 reaper(d9e33004) at reaper+0x8a Bad frame pointer: 0xd0c3ce68 Can you try to see if the following patch helps? It did for me, when I was debugging a similar panic back in December. However, my explanation why the patch would fix this bug, turned out to be invalid. Still the bug went away. If the same happens for you, some more people should have a look at the patch: --- /mount/blink/aegis/project/gg/history/os/src/sys/arch/i386/i386/pmap.c 2012/10/16 18:31:28 1.117 +++ /mount/blink/aegis/project/gg/history/os/src/sys/arch/i386/i386/pmap.c 2013/01/24 17:20:06 1.118 @@ -495,7 +495,7 @@ pmap_map_ptes(struct pmap *pmap) /* need to load a new alternate pt space into curpmap? */ opde = *APDP_PDE; -#if defined(MULTIPROCESSOR) defined(DIAGNOSTIC) +#if defined(DIAGNOSTIC) if (pmap_valid_entry(opde)) panic(pmap_map_ptes: APTE valid); #endif @@ -521,10 +521,8 @@ pmap_unmap_ptes(struct pmap *pmap) if (pmap_is_curpmap(pmap)) { simple_unlock(pmap-pm_obj.vmobjlock); } else { -#if defined(MULTIPROCESSOR) *APDP_PDE = 0; pmap_apte_flush(); -#endif simple_unlock(pmap-pm_obj.vmobjlock); simple_unlock(curpcb-pcb_pmap-pm_obj.vmobjlock); } Wish somebody with more in-depth knowledge about the i386 pmap implementation would respond :(. I think it is some kind of caching issue. IIRC I've seen at least one case where the condition that is checked by the assertionx turned out to be _not true_ when re-evaluated manually in the debugger. Your diff basically disables an optimization where the alternate pmap is kept around in case we need it again. Not sure how important this optimization is. I guess the primary user of the alternate pmap is the reaper, and keeping the alternate pmap around there could be beneficial if the address space of the process we're reaping is heavily fragmented. fork() also makes use of the alternate pmap. And ptrace based memory access, I guess. There is something fishy with this optimization. *APDP_PDE is never cleared, which means that it becomes stale after the process exits. Presumably we'd notice the next time we try to map an alternate pmap, but if the physical pages for the pmap get recycled, we might not. Yes. Additionally, the reaper is particularly problematic as it is a kernel thread and switching to/from kernel threads omits the TLB-flush that is inherent to a normal process switch. Not quite seeing how this leads to that panic, Exactly. This is the problem with the patch. It most likely fixes a bug. However, the bug being fixed is not sufficient to explain the symptoms seen by the origial poster. but perhaps we should clear *APDP_PDE in pmap_switch()?
Re: Use ACPI to detect secondary PCI root segments on x86
Hi, On Fri, Sep 21, 2012 at 08:34:56PM +0200, Mark Kettenis wrote: Any chance of having a diff with the amd64 bits as well? Included below. Also: + if (pba-pba_busex == NULL) { + printf(cannot attach ACPI discovered busses\n); + return; + } Instead of doing a printf here, I think you should just do a KASSERT(pba-pba_busex != NULL); Done as wall. Updated diff below. regardsChristian Index: arch/amd64/amd64/mainbus.c === RCS file: /cvs/src/sys/arch/amd64/amd64/mainbus.c,v retrieving revision 1.25 diff -u -r1.25 mainbus.c --- arch/amd64/amd64/mainbus.c 19 Sep 2012 23:23:50 - 1.25 +++ arch/amd64/amd64/mainbus.c 24 Sep 2012 11:15:41 - @@ -218,6 +218,9 @@ mba.mba_pba.pba_domain = pci_ndomains++; mba.mba_pba.pba_bus = 0; config_found(self, mba.mba_pba, mainbus_print); +#if NACPI 0 + acpi_pciroots_attach(self, mba.mba_pba, mainbus_print); +#endif } #endif Index: arch/i386/i386/mainbus.c === RCS file: /cvs/src/sys/arch/i386/i386/mainbus.c,v retrieving revision 1.49 diff -u -r1.49 mainbus.c --- arch/i386/i386/mainbus.c19 Sep 2012 23:03:12 - 1.49 +++ arch/i386/i386/mainbus.c24 Sep 2012 11:15:42 - @@ -245,6 +245,9 @@ mba.mba_pba.pba_domain = pci_ndomains++; mba.mba_pba.pba_bus = 0; config_found(self, mba.mba_pba, mainbus_print); +#if NACPI 0 + acpi_pciroots_attach(self, mba.mba_pba, mainbus_print); +#endif } #endif Index: dev/acpi/acpi.c === RCS file: /cvs/src/sys/dev/acpi/acpi.c,v retrieving revision 1.239 diff -u -r1.239 acpi.c --- dev/acpi/acpi.c 7 Sep 2012 19:19:59 - 1.239 +++ dev/acpi/acpi.c 24 Sep 2012 11:15:43 - @@ -394,6 +394,8 @@ TAILQ_HEAD(, acpi_pci) acpi_pcidevs = TAILQ_HEAD_INITIALIZER(acpi_pcidevs); +TAILQ_HEAD(, acpi_pci) acpi_pcirootdevs = +TAILQ_HEAD_INITIALIZER(acpi_pcirootdevs); int acpi_getpci(struct aml_node *node, void *arg); int acpi_getminbus(union acpi_resource *crs, void *arg); @@ -482,6 +484,7 @@ node-pci = pci; dnprintf(10, found PCI root: %s %d\n, aml_nodename(node), pci-bus); + TAILQ_INSERT_TAIL(acpi_pcirootdevs, pci, next); } aml_freevalue(res); return 0; @@ -600,6 +603,23 @@ } return PCI_PMCSR_STATE_D3; +} + +void +acpi_pciroots_attach(struct device *dev, void *aux, cfprint_t pr) +{ + struct acpi_pci *pdev; + struct pcibus_attach_args *pba = aux; + + KASSERT(pba-pba_busex != NULL); + + TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) { + if (extent_alloc_region(pba-pba_busex, pdev-bus, + 1, EX_NOWAIT) != 0) + continue; + pba-pba_bus = pdev-bus; + config_found(dev, pba, pr); + } } void Index: dev/acpi/acpivar.h === RCS file: /cvs/src/sys/dev/acpi/acpivar.h,v retrieving revision 1.72 diff -u -r1.72 acpivar.h --- dev/acpi/acpivar.h 7 Sep 2012 19:19:59 - 1.72 +++ dev/acpi/acpivar.h 24 Sep 2012 11:15:43 - @@ -323,6 +323,8 @@ void acpi_powerdown_task(void *, int); void acpi_sleep_task(void *, int); +void acpi_pciroots_attach(struct device *, void *, cfprint_t); + #endif #endif /* !_ACPI_WAKECODE */
Re: Use ACPI to detect secondary PCI root segments on x86
Hi, thanks to Mark, we have enough PCI bus number accounting, now to allow us to attach ACPI bus numbers via ACPI. A patch to do this is below. regards Christian Index: arch/i386/i386/mainbus.c === RCS file: /cvs/src/sys/arch/i386/i386/mainbus.c,v retrieving revision 1.49 diff -u -r1.49 mainbus.c --- arch/i386/i386/mainbus.c19 Sep 2012 23:03:12 - 1.49 +++ arch/i386/i386/mainbus.c21 Sep 2012 11:36:59 - @@ -245,6 +245,9 @@ mba.mba_pba.pba_domain = pci_ndomains++; mba.mba_pba.pba_bus = 0; config_found(self, mba.mba_pba, mainbus_print); +#if NACPI 0 + acpi_pciroots_attach(self, mba.mba_pba, mainbus_print); +#endif } #endif Index: dev/acpi/acpi.c === RCS file: /cvs/src/sys/dev/acpi/acpi.c,v retrieving revision 1.239 diff -u -r1.239 acpi.c --- dev/acpi/acpi.c 7 Sep 2012 19:19:59 - 1.239 +++ dev/acpi/acpi.c 21 Sep 2012 11:37:01 - @@ -394,6 +394,8 @@ TAILQ_HEAD(, acpi_pci) acpi_pcidevs = TAILQ_HEAD_INITIALIZER(acpi_pcidevs); +TAILQ_HEAD(, acpi_pci) acpi_pcirootdevs = +TAILQ_HEAD_INITIALIZER(acpi_pcirootdevs); int acpi_getpci(struct aml_node *node, void *arg); int acpi_getminbus(union acpi_resource *crs, void *arg); @@ -482,6 +484,7 @@ node-pci = pci; dnprintf(10, found PCI root: %s %d\n, aml_nodename(node), pci-bus); + TAILQ_INSERT_TAIL(acpi_pcirootdevs, pci, next); } aml_freevalue(res); return 0; @@ -600,6 +603,25 @@ } return PCI_PMCSR_STATE_D3; +} + +void +acpi_pciroots_attach(struct device *dev, void *aux, cfprint_t pr) +{ + struct acpi_pci *pdev; + struct pcibus_attach_args *pba = aux; + + if (pba-pba_busex == NULL) { + printf(cannot attach ACPI discovered busses\n); + return; + } + TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) { + if (extent_alloc_region(pba-pba_busex, pdev-bus, + 1, EX_NOWAIT) != 0) + continue; + pba-pba_bus = pdev-bus; + config_found(dev, pba, pr); + } } void Index: dev/acpi/acpivar.h === RCS file: /cvs/src/sys/dev/acpi/acpivar.h,v retrieving revision 1.72 diff -u -r1.72 acpivar.h --- dev/acpi/acpivar.h 7 Sep 2012 19:19:59 - 1.72 +++ dev/acpi/acpivar.h 21 Sep 2012 11:37:01 - @@ -323,6 +323,8 @@ void acpi_powerdown_task(void *, int); void acpi_sleep_task(void *, int); +void acpi_pciroots_attach(struct device *, void *, cfprint_t); + #endif #endif /* !_ACPI_WAKECODE */
Re: Use ACPI to detect secondary PCI root segments on x86
Hi, On Thu, Sep 13, 2012 at 11:00:12PM +0200, Mark Kettenis wrote: However, there might be a different solution: We could track attached busses (not just host bridge busses but all busses) in the pci code (or via MD code and hooks but I think that would be a generic thing) and do nothing in pciattach if a bus is discovered for the second time. With this MD code could use whatever methods it wants to detect otherwise undiscovered host bridges. Possible methods might be magic numbers, hints from ACPI or even a linear scan of all possible bus numbers. Does this sound like a possible solution? It actually got me thinking about proper bus number resource accounting again. Something that's needed for properly hot-plugging pci busses. Below is a diff that does this. With that code, your acpi code can just try to grab the bus number using extent_alloc_region() and only attach a bus if that succeeds. Nice idea! However, I have some doubts about the implementation. In particular you don't pass sc_busex down through bridge devices. This is ok as long as all busses that we _discover_ downstream of that bridge are in fact _physically_ dowstream of that bridge, too. In this case the bus range of the bridge (PPB_BUSINFO_SECONDARY..PPB_BUSINFO_SUBORDINATE) covers all busses. However, host bridges that are handled e.g. in arch/i386/pci/pchb.c can appear downstream of another bridge but use bus numbers that are outside of the upstream bridge's bus number range. The easiest solution would probably be, to make the pcibus_ex an MI thing, i.e. define and allocate it globally in dev/pci/pci.c. regardsChristian
Re: Use ACPI to detect secondary PCI root segments on x86
Hi, On Fri, Sep 14, 2012 at 10:31:05AM +0200, Christian Ehrhardt wrote: On Thu, Sep 13, 2012 at 11:00:12PM +0200, Mark Kettenis wrote: However, there might be a different solution: We could track attached busses (not just host bridge busses but all busses) in the pci code (or via MD code and hooks but I think that would be a generic thing) and do nothing in pciattach if a bus is discovered for the second time. With this MD code could use whatever methods it wants to detect otherwise undiscovered host bridges. Possible methods might be magic numbers, hints from ACPI or even a linear scan of all possible bus numbers. Does this sound like a possible solution? It actually got me thinking about proper bus number resource accounting again. Something that's needed for properly hot-plugging pci busses. Below is a diff that does this. With that code, your acpi code can just try to grab the bus number using extent_alloc_region() and only attach a bus if that succeeds. Nice idea! However, I have some doubts about the implementation. In particular you don't pass sc_busex down through bridge devices. This is ok as long as all busses that we _discover_ downstream of that bridge are in fact _physically_ dowstream of that bridge, too. In this case the bus range of the bridge (PPB_BUSINFO_SECONDARY..PPB_BUSINFO_SUBORDINATE) covers all busses. However, host bridges that are handled e.g. in arch/i386/pci/pchb.c can appear downstream of another bridge but use bus numbers that are outside of the upstream bridge's bus number range. The easiest solution would probably be, to make the pcibus_ex an MI thing, i.e. define and allocate it globally in dev/pci/pci.c. Below is the suggestion for an updated patch that follows this strategy. I'm not entirely confident about the i386/pchb.c part of the patch, though. People that own such hardware would have to verify that all their hardware is discovered properly. This code tries to catch theoretical cases where we have two ACPI discovered host bridges and the one listed _first_ in ACPI is discoverable from the second one via PCI mechanisms (but not vice versa). In this case the host bridge code and not the ACPI code must detect that the second host bridge has already been attached. However, this assumes that host bridges never use a bus numbers that are within the bus range of their upstream bridge. regards Christian Index: arch/i386/i386/mainbus.c === RCS file: /cvs/src/sys/arch/i386/i386/mainbus.c,v retrieving revision 1.48 diff -u -r1.48 mainbus.c --- arch/i386/i386/mainbus.c3 Nov 2010 10:15:23 - 1.48 +++ arch/i386/i386/mainbus.c14 Sep 2012 09:48:34 - @@ -244,6 +244,9 @@ mba.mba_pba.pba_domain = pci_ndomains++; mba.mba_pba.pba_bus = 0; config_found(self, mba.mba_pba, mainbus_print); +#if NACPI 0 + acpi_pciroots_attach(self, mba.mba_pba, mainbus_print); +#endif } #endif Index: arch/i386/pci/pchb.c === RCS file: /cvs/src/sys/arch/i386/pci/pchb.c,v retrieving revision 1.85 diff -u -r1.85 pchb.c --- arch/i386/pci/pchb.c31 Aug 2010 17:13:46 - 1.85 +++ arch/i386/pci/pchb.c14 Sep 2012 09:48:34 - @@ -403,6 +421,8 @@ if (doattach == 0) return; + if (extent_alloc_region(pcibus_ex, pbnum, 1, EX_NOWAIT) != 0) + return; bzero(pba, sizeof(pba)); pba.pba_busname = pci; @@ -505,6 +525,8 @@ pba.pba_domain = pa-pa_domain; pba.pba_bus = AMD64HT_LDT_SEC_BUS_NUM(bus); pba.pba_pc = pa-pa_pc; - config_found(self, pba, pchb_print); + if (extent_alloc_region(pcibus_ex, pba.pba_bus, 1, EX_NOWAIT) + == 0) + config_found(self, pba, pchb_print); } } Index: arch/i386/pci/pci_machdep.c === RCS file: /cvs/src/sys/arch/i386/pci/pci_machdep.c,v retrieving revision 1.68 diff -u -r1.68 pci_machdep.c --- arch/i386/pci/pci_machdep.c 4 Dec 2011 20:08:09 - 1.68 +++ arch/i386/pci/pci_machdep.c 14 Sep 2012 09:48:34 - @@ -921,6 +921,11 @@ extent_alloc_region(pcimem_ex, IOM_BEGIN, IOM_SIZE, EX_CONFLICTOK | EX_NOWAIT); } + + if (pcibus_ex == NULL) { + pcibus_ex = extent_create(pcibus, 0, 0xff, M_DEVBUF, + NULL, 0, EX_NOWAIT); + } } #include acpi.h Index: arch/i386/pci/pci_machdep.h === RCS file: /cvs/src/sys/arch/i386/pci/pci_machdep.h,v retrieving revision 1.23 diff -u -r1.23 pci_machdep.h --- arch/i386/pci/pci_machdep.h 10 Oct 2011 19:42:36 - 1.23 +++ arch/i386/pci
Re: Use ACPI to detect secondary PCI root segments on x86
Hi, On Mon, Sep 10, 2012 at 01:22:35PM +0200, Christian Ehrhardt wrote: Yup. Since we have code to detect additional host bridges some of them may already have been attached. And it would be a good way to defend against ACPI lying to us. However, please keep the ACPI hooks out of the MI PCI code. It'd be better if you used pci_attach_hook(), which lives in arch/pci/pci_machdep.c. Ok, will do that. Updated version of the patch is below. regards Christian Index: arch/i386/i386/mainbus.c === RCS file: /cvs/src/sys/arch/i386/i386/mainbus.c,v retrieving revision 1.48 diff -u -r1.48 mainbus.c --- arch/i386/i386/mainbus.c3 Nov 2010 10:15:23 - 1.48 +++ arch/i386/i386/mainbus.c12 Sep 2012 13:39:19 - @@ -244,6 +244,9 @@ mba.mba_pba.pba_domain = pci_ndomains++; mba.mba_pba.pba_bus = 0; config_found(self, mba.mba_pba, mainbus_print); +#if NACPI 0 + acpi_pciroots_attach(self, mba.mba_pba, mainbus_print); +#endif } #endif Index: arch/i386/pci/pci_machdep.c === RCS file: /cvs/src/sys/arch/i386/pci/pci_machdep.c,v retrieving revision 1.68 diff -u -r1.68 pci_machdep.c --- arch/i386/pci/pci_machdep.c 4 Dec 2011 20:08:09 - 1.68 +++ arch/i386/pci/pci_machdep.c 12 Sep 2012 13:39:19 - @@ -208,6 +208,9 @@ printf(: configuration mode %d, pci_mode); #endif +#if NACPI 0 + acpi_pciroot_match(self, pba-pba_bus); +#endif if (pba-pba_bus != 0) return; Index: dev/acpi/acpi.c === RCS file: /cvs/src/sys/dev/acpi/acpi.c,v retrieving revision 1.238 diff -u -r1.238 acpi.c --- dev/acpi/acpi.c 13 Jul 2012 11:51:41 - 1.238 +++ dev/acpi/acpi.c 12 Sep 2012 13:39:20 - @@ -392,6 +392,8 @@ TAILQ_HEAD(, acpi_pci) acpi_pcidevs = TAILQ_HEAD_INITIALIZER(acpi_pcidevs); +TAILQ_HEAD(, acpi_pci) acpi_pcirootdevs = +TAILQ_HEAD_INITIALIZER(acpi_pcirootdevs); int acpi_getpci(struct aml_node *node, void *arg); int acpi_getminbus(union acpi_resource *crs, void *arg); @@ -480,6 +482,7 @@ node-pci = pci; dnprintf(10, found PCI root: %s %d\n, aml_nodename(node), pci-bus); + TAILQ_INSERT_TAIL(acpi_pcirootdevs, pci, next); } aml_freevalue(res); return 0; @@ -548,6 +551,31 @@ dev-dv_xname, aml_nodename(pdev-node)); pdev-device = dev; } + } +} + +void +acpi_pciroot_match(struct device *dev, int bus) +{ + struct acpi_pci *pdev; + + TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) { + if (pdev-bus == bus) + pdev-device = dev; + } +} + +void +acpi_pciroots_attach(struct device *dev, void *aux, cfprint_t pr) +{ + struct acpi_pci *pdev; + struct pcibus_attach_args *pba = aux; + + TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) { + if (pdev-device) /* Already attached */ + continue; + pba-pba_bus = pdev-bus; + config_found(dev, pba, pr); } } Index: dev/acpi/acpivar.h === RCS file: /cvs/src/sys/dev/acpi/acpivar.h,v retrieving revision 1.71 diff -u -r1.71 acpivar.h --- dev/acpi/acpivar.h 15 Apr 2011 17:34:51 - 1.71 +++ dev/acpi/acpivar.h 12 Sep 2012 13:39:20 - @@ -322,6 +322,9 @@ void acpi_powerdown_task(void *, int); void acpi_sleep_task(void *, int); +void acpi_pciroot_match(struct device *, int); +void acpi_pciroots_attach(struct device *, void *, cfprint_t); + #endif #endif /* !_ACPI_WAKECODE */
Re: Use ACPI to detect secondary PCI root segments on x86
Hi, first of all: Thanks for your reply! On Sat, Sep 08, 2012 at 04:04:41PM +0200, Mark Kettenis wrote: most modern x86 hardware includes more than one PCI root segement. E.g. a hardware that I have here has three PCI root segemnts with bus numbers 0, 0x7f, 0x80 and 0xff respectively. (0x7f and 0xff host the uncore devices of each processor). Well, machines with multiple host bridges have been around for a while. Some of them even pre-date ACPI. These segments are already detected by APCI but this information is not used when attaching PCI busses to the mainbus. Below is a patch that should solve the PCI bus detection problem in a robust way. I hate ACPI! It's full of lies; I don't trust it. Agreed. It is much better to detect these additional PCI busses by looking at actual hardware registers. This approach (unfortunately) means that device detection breaks with each new chipset and I hate that :-) So before we go down this path, can you investigate that? It seems the bus number for the additional Uncore devices of the Xeon E5 1600/2600 CPUs is available in config register 0x108 of device 0:5:0. Yes, I already figured that. It works and I can cook a patch if you want. The patch is not completely straight forward, tough, because the device is not just a bridge but a multi-purpose devices. But I haven't figured how to find the bus number for the second socket. Looking at the AML for this machine and determining how it detects these additional busses might provide some clues. Feel free to send me the acpidump output for this box. I will send you ACPI data in private. However, the second PCI bus is simply hardwired in DSDT, i.e. there does not seem to be code in ACPI that would try to detect its presence. I tend to believe that there is no bridge device that exposes the second PCI bus. There's a small problem though: Some of the secondary PCI segments can show up as busses behind some kind of host bridge device dowstream of pci bus 0, too. These PCI busses would be attached twice, now. Thus we keep track of attached root segemnents in the ACPI code. Yup. Since we have code to detect additional host bridges some of them may already have been attached. And it would be a good way to defend against ACPI lying to us. However, please keep the ACPI hooks out of the MI PCI code. It'd be better if you used pci_attach_hook(), which lives in arch/pci/pci_machdep.c. Ok, will do that. However, there might be a different solution: We could track attached busses (not just host bridge busses but all busses) in the pci code (or via MD code and hooks but I think that would be a generic thing) and do nothing in pciattach if a bus is discovered for the second time. With this MD code could use whatever methods it wants to detect otherwise undiscovered host bridges. Possible methods might be magic numbers, hints from ACPI or even a linear scan of all possible bus numbers. Does this sound like a possible solution? regardsChristian
Re: Use ACPI to detect secondary PCI root segments on x86
Hi again, On Thu, Sep 06, 2012 at 11:24:01AM +0200, Christian Ehrhardt wrote: most modern x86 hardware includes more than one PCI root segement. E.g. a hardware that I have here has four PCI root segemnts with bus numbers 0, 0x7f, 0x80 and 0xff respectively. (0x7f and 0xff host the uncore devices of each processor). These segments are already detected by APCI but this information is not used when attaching PCI busses to the mainbus. Below is a patch that should solve the PCI bus detection problem in a robust way. Below is a diff of the dmesg output before and after the patch. Comments welcome. regardsChristian --- dmesg.before2012-09-07 10:06:49.0 +0200 +++ dmesg.after 2012-09-07 10:10:21.0 +0200 @@ -1,186 +1,300 @@ Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. Copyright (c) 1995-2012 OpenBSD. All rights reserved. http://www.OpenBSD.org OpenBSD 5.2-current (GENERIC.MP) ... cpu0: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 GHz cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF real mem = 2118721536 (2020MB) avail mem = 2073182208 (1977MB) mainbus0 at root bios0 at mainbus0: AT/286+ BIOS, date 10/11/11, SMBIOS rev. 2.7 @ 0xeb2f0 (65 entries) bios0: vendor American Megatrends Inc. version 1.0a date 06/05/2012 bios0: Supermicro X9DRH-7TF/7F/iTF/iF acpi0 at bios0: rev 2 acpi0: sleep states S0 S1 S4 S5 acpi0: tables DSDT FACP APIC MCFG PRAD SRAT SLIT HPET SPMI SSDT SPCR BGRT DMAR acpi0: wakeup devices BR20(S1) EUSB(S4) USBE(S4) NPE1(S4) NPE2(S4) NPE3(S4) NPE4(S4) NPE5(S4) NPE6(S4) NPE7(S4) NPE8(S4) NPE9(S4) I35P(S4) I35I(S4) X54P(S4) X54I(S4) NPEA(S4) SLPB(S0) NPE1(S4) NPE2(S4) NPE3(S4) NPE4(S4) NPE5(S4) NPE6(S4) NPE7(S4) NPE8(S4) NPE9(S4) NPEA(S4) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: apic clock running at 100MHz cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 GHz cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF cpu2 at mainbus0: apid 4 (application processor) cpu2: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 GHz cpu2: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF cpu3 at mainbus0: apid 6 (application processor) cpu3: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 GHz cpu3: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF cpu4 at mainbus0: apid 8 (application processor) cpu4: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 GHz cpu4: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF cpu5 at mainbus0: apid 10 (application processor) cpu5: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 GHz cpu5: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF cpu6 at mainbus0: apid 32 (application processor) cpu6: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 GHz cpu6: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF cpu7 at mainbus0: apid 34 (application processor) cpu7: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 GHz cpu7: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA
Use ACPI to detect secondary PCI root segments on x86
Hi, most modern x86 hardware includes more than one PCI root segement. E.g. a hardware that I have here has three PCI root segemnts with bus numbers 0, 0x7f, 0x80 and 0xff respectively. (0x7f and 0xff host the uncore devices of each processor). These segments are already detected by APCI but this information is not used when attaching PCI busses to the mainbus. Below is a patch that should solve the PCI bus detection problem in a robust way. There's a small problem though: Some of the secondary PCI segments can show up as busses behind some kind of host bridge device dowstream of pci bus 0, too. These PCI busses would be attached twice, now. Thus we keep track of attached root segemnents in the ACPI code. NOTE: Patch is currently i386 only but almost everything is MI. regards Christian Index: arch/i386/i386/mainbus.c === RCS file: /cvs/src/sys/arch/i386/i386/mainbus.c,v retrieving revision 1.48 diff -u -r1.48 mainbus.c --- arch/i386/i386/mainbus.c3 Nov 2010 10:15:23 - 1.48 +++ arch/i386/i386/mainbus.c6 Sep 2012 08:58:38 - @@ -244,6 +244,9 @@ mba.mba_pba.pba_domain = pci_ndomains++; mba.mba_pba.pba_bus = 0; config_found(self, mba.mba_pba, mainbus_print); +#if NACPI 0 + acpi_pciroots_attach(self, mba.mba_pba, mainbus_print); +#endif } #endif Index: dev/acpi/acpi.c === RCS file: /cvs/src/sys/dev/acpi/acpi.c,v retrieving revision 1.238 diff -u -r1.238 acpi.c --- dev/acpi/acpi.c 13 Jul 2012 11:51:41 - 1.238 +++ dev/acpi/acpi.c 6 Sep 2012 08:58:39 - @@ -392,6 +392,8 @@ TAILQ_HEAD(, acpi_pci) acpi_pcidevs = TAILQ_HEAD_INITIALIZER(acpi_pcidevs); +TAILQ_HEAD(, acpi_pci) acpi_pcirootdevs = +TAILQ_HEAD_INITIALIZER(acpi_pcirootdevs); int acpi_getpci(struct aml_node *node, void *arg); int acpi_getminbus(union acpi_resource *crs, void *arg); @@ -480,6 +482,7 @@ node-pci = pci; dnprintf(10, found PCI root: %s %d\n, aml_nodename(node), pci-bus); + TAILQ_INSERT_TAIL(acpi_pcirootdevs, pci, next); } aml_freevalue(res); return 0; @@ -548,6 +551,31 @@ dev-dv_xname, aml_nodename(pdev-node)); pdev-device = dev; } + } +} + +void +acpi_pciroot_match(struct device *dev, int bus) +{ + struct acpi_pci *pdev; + + TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) { + if (pdev-bus == bus) + pdev-device = dev; + } +} + +void +acpi_pciroots_attach(struct device *dev, void *aux, cfprint_t pr) +{ + struct acpi_pci *pdev; + struct pcibus_attach_args *pba = aux; + + TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) { + if (pdev-device) /* Already attached */ + continue; + pba-pba_bus = pdev-bus; + config_found(dev, pba, pr); } } Index: dev/acpi/acpivar.h === RCS file: /cvs/src/sys/dev/acpi/acpivar.h,v retrieving revision 1.71 diff -u -r1.71 acpivar.h --- dev/acpi/acpivar.h 15 Apr 2011 17:34:51 - 1.71 +++ dev/acpi/acpivar.h 6 Sep 2012 08:58:39 - @@ -322,6 +322,9 @@ void acpi_powerdown_task(void *, int); void acpi_sleep_task(void *, int); +void acpi_pciroot_match(struct device *, int); +void acpi_pciroots_attach(struct device *, void *, cfprint_t); + #endif #endif /* !_ACPI_WAKECODE */ Index: dev/pci/pci.c === RCS file: /cvs/src/sys/dev/pci/pci.c,v retrieving revision 1.94 diff -u -r1.94 pci.c --- dev/pci/pci.c 10 Oct 2011 19:42:37 - 1.94 +++ dev/pci/pci.c 6 Sep 2012 08:58:39 - @@ -46,6 +46,12 @@ #include dev/pci/pcidevs.h #include dev/pci/ppbreg.h +#include acpi.h + +#if NACPI 0 +#include dev/acpi/acpivar.h +#endif + int pcimatch(struct device *, void *, void *); void pciattach(struct device *, struct device *, void *); int pcidetach(struct device *, int); @@ -187,6 +193,9 @@ if (pci_enumerate_bus(sc, pci_primary_vga, NULL)) pci_vga_pci = sc; pci_enumerate_bus(sc, NULL, NULL); +#if NACPI 0 + acpi_pciroot_match(self, pba-pba_bus); +#endif } int
ix(4) driver Bugfix
Hi! the ix(4)-Driver stores a pointer to the pci_attach_args structure (as passed to ixgbe_attach) in its softc. However, the pci_attach_args structure lives on the stack somewhere in autoconf, i.e. its contents are undefined after ixgbe_attach returns. However, the pointer to that data remains in ix_softc and is used at least in ixgbe_detach. The diff below changes this by storing a copy of the pci_attach_args in ix_softc. Tested (with a slightly modified source tree) on x86. regardsChristian Index: if_ix.c === RCS file: /cvs/src/sys/dev/pci/if_ix.c,v retrieving revision 1.59 diff -u -p -r1.59 if_ix.c --- if_ix.c 13 Jan 2012 09:38:23 - 1.59 +++ if_ix.c 20 Jan 2012 14:02:04 - @@ -209,7 +209,7 @@ ixgbe_attach(struct device *parent, stru INIT_DEBUGOUT(ixgbe_attach: begin); sc-osdep.os_sc = sc; - sc-osdep.os_pa = pa; + sc-osdep.os_pa = *pa; /* Core Lock Init*/ mtx_init(sc-core_mtx, IPL_NET); @@ -1400,7 +1400,7 @@ void ixgbe_identify_hardware(struct ix_softc *sc) { struct ixgbe_osdep *os = sc-osdep; - struct pci_attach_args *pa = os-os_pa; + struct pci_attach_args *pa = os-os_pa; uint32_t reg; /* Save off the information about this board */ @@ -1534,7 +1534,7 @@ ixgbe_allocate_legacy(struct ix_softc *s { struct ifnet*ifp = sc-arpcom.ac_if; struct ixgbe_osdep *os = sc-osdep; - struct pci_attach_args *pa = os-os_pa; + struct pci_attach_args *pa = os-os_pa; const char *intrstr = NULL; pci_chipset_tag_t pc = pa-pa_pc; pci_intr_handle_t ih; @@ -1576,7 +1576,7 @@ int ixgbe_allocate_pci_resources(struct ix_softc *sc) { struct ixgbe_osdep *os = sc-osdep; - struct pci_attach_args *pa = os-os_pa; + struct pci_attach_args *pa = os-os_pa; int val; val = pci_conf_read(pa-pa_pc, pa-pa_tag, PCIR_BAR(0)); @@ -1609,7 +1609,7 @@ void ixgbe_free_pci_resources(struct ix_softc * sc) { struct ixgbe_osdep *os = sc-osdep; - struct pci_attach_args *pa = os-os_pa; + struct pci_attach_args *pa = os-os_pa; struct ix_queue *que = sc-queues; int i; @@ -1749,7 +1749,7 @@ ixgbe_dma_malloc(struct ix_softc *sc, bu struct ixgbe_osdep *os = sc-osdep; int r; - dma-dma_tag = os-os_pa-pa_dmat; + dma-dma_tag = os-os_pa.pa_dmat; r = bus_dmamap_create(dma-dma_tag, size, 1, size, 0, BUS_DMA_NOWAIT, dma-dma_map); if (r != 0) { @@ -3330,7 +3330,7 @@ ixgbe_read_pci_cfg(struct ixgbe_hw *hw, high = 1; reg = ~0x2; } - pa = ((struct ixgbe_osdep *)hw-back)-os_pa; + pa = ((struct ixgbe_osdep *)hw-back)-os_pa; value = pci_conf_read(pa-pa_pc, pa-pa_tag, reg); if (high) @@ -3351,7 +3351,7 @@ ixgbe_write_pci_cfg(struct ixgbe_hw *hw, high = 1; reg = ~0x2; } - pa = ((struct ixgbe_osdep *)hw-back)-os_pa; + pa = ((struct ixgbe_osdep *)hw-back)-os_pa; rv = pci_conf_read(pa-pa_pc, pa-pa_tag, reg); if (!high) rv = (rv 0x) | value; Index: ixgbe.h === RCS file: /cvs/src/sys/dev/pci/ixgbe.h,v retrieving revision 1.7 diff -u -p -r1.7 ixgbe.h --- ixgbe.h 10 Jun 2011 12:46:35 - 1.7 +++ ixgbe.h 20 Jan 2012 14:02:04 - @@ -127,7 +127,7 @@ struct ixgbe_osdep { bus_addr_t os_membase; void*os_sc; - struct pci_attach_args *os_pa; + struct pci_attach_args os_pa; }; extern uint16_t ixgbe_read_pci_cfg(struct ixgbe_hw *, uint32_t);
Fix page deactivation in uvm_vnode.c when clustering
Hi, while reading through uvm code I stubled accross a piece of code that appears to be buggy. Here's the proposed patch, rational follows: Index: uvm_vnode.c === RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v retrieving revision 1.71 diff -u -r1.71 uvm_vnode.c --- uvm_vnode.c 18 May 2010 04:41:14 - 1.71 +++ uvm_vnode.c 7 Jun 2011 16:42:15 - @@ -924,8 +924,8 @@ */ if (flags PGO_DEACTIVATE) { - if ((pp-pg_flags PQ_INACTIVE) == 0 - pp-wire_count == 0) { + if ((ptmp-pg_flags PQ_INACTIVE) == 0 + ptmp-wire_count == 0) { pmap_page_protect(ptmp, VM_PROT_NONE); uvm_pagedeactivate(ptmp); } This code handles (among other things) write back of dirty pages to disk. It calls uvm_pager_put to do the actual IO (one call per page). To improve performance, uvm_pager_put is allowed to do IO for adjacent pages as well and returns an array of all pages that must be unbusied by the caller (uvn_flush). The code above is part of the loop that does this. pp is the initial page passed to uvm_pager_put and ptmp is the page that we are about to unbusy. Thus we should IMHO check the PQ_INACTIVE flag and the wire_count of ptmp instead of pp. Coments appreciated! regardsChristian
Re: Remove useless sti in x86 interrupt return path
Hi, On Mon, Apr 18, 2011 at 10:00:02PM -0700, Philip Guenther wrote: The sti was introduced in revision 1.97 of locore.s in March 2006 by mickey@. Commit message: | prevent the faults on iret to run w/ disabled intrs and cause | deadlocks; niklas toby tom ok Maybe mickey or one of the people giving oks back then want to comment? That predates my involvement with OpenBSD, but I think get the gist. iret can fault if, for example, the user corrupts certain members of the sigcontext passed to a signal handler. If that happens the the CPU generates an exception on the iret and you end up in trap() where it peeks at the instruction that triggered the trap and, seeing it was iret, arranges to 'return' to the resume_iret label to generate what looks like a normal fault against the process. The goal of that sti is to keep that 'normal' fault from being processed with interrupts blocked. Point taken, I missed the sigreturn path. However, note that iret can only fault if the values of the segment registers stored on the kernel stack are bogus. In particular iret will not fault on the new EIP address. (The address can page fault but the fault will be on the instruction pointed to by the new CS:EIP and not in iret). With that in mind, the question is: Is it really a problem if the trap handler runs with interrupts disabled? See below. So, possibility one: sti's effect is delayed to the end of next instruction. This is true, the effect of sti is delayed by one instruction. This is documented in the intel manual. What happens if you put it immediately before an iret that faults? If the flags pushed on the stack by the fault have interrupts unblocked, then simply moving the sti down a line to between the addl and the iret would be enough. If you can reproduce this situation easily than I would ask that you try that. This solves the stack overflow that I see, but it results in inconsistent code. See below. Possibility two would be to try to handle this from, uh, inside resume_iret I guess. I'm not 100% sure whether it can be unilateral there though. This would be my preferred solution. In particular, I see two possibilities: * Maybe it is not even a problem is irqs are sometimes disabled during trap. * It is a problem if trap runs with irqs disabled. In this case, userland can easily trigger this by trapping on one of the pop instructions for ds, es, fs or gs. Those instructions in the interrupt return path run with irqs disabled. Thus moving the sti in the interupt return path down one instruction immediately before the iret does not really help. The proper solution would probably be to add explicit calls to sti in all of the resume paths (or even explicitly before calling trap). regardsChristian
Remove useless sti in x86 interrupt return path
Hi, we are seeing kernel hangs (hard reset required) under high interrupt load (triggered by ix(4) cards). I've traced this back to an overflow of the kernel stack. This is how the stack looks at the time of the overflow (manual backtrace, as ddb won't show the final part): EBP (%EBP) 4(%EBP) 0xe13b825c: 0xe13b827c 0xd02d8b1d panic 0xe13b827c: 0xe13b82ac 0xd02d1a42 timeout_add 0xe13b82ac: 0xe13b82dc 0xd02bfd6e hardclock 0xe13b82dc: 0xe13b82fc 0xd04dddca lapic_clockintr 0xe13b82fc: 0xe13b8304 0xd0202475 Xintrltimer /* OLD EBP at ebp+0x20(9 words) OLD EIP at ebp+0x3c (15words) */ EBP=0xe13b8384 EIP=0xd02d7c31 pool_do_put 0xe13b8384: 0xe13b83a4 0xd02d7b6f pool_put 0xe13b83a4: 0xe13b83d4 0xd02e7f26 m_free 0xe13b83d4: 0xe13b83f4 0xd02e7fa9 m_freem 0xe13b83f4: 0xe13b8434 0xd030a492 ether_input 0xe13b8434: 0xe13b8474 0xd043f2cf ixgbe_rxeof 0xe13b8474: 0xe13b84b4 0xd043c7ed ixgbe_legacy_irq 0xe13b84b4: 0xe13b84bc 0xd02027ee Xintr_ioapic1 EBP=0xe13b9e10 EIP=d0202128Xdoreti (before iret) 0xe13b9e10: 0xe13b9e40 0xd028bd27 pf_pull_hdr 0xe13b9e40: 0xe13b9f40 0xd028d570 pf_test 0xe13b9f40: 0xe13b9f80 0xd03478b4 ipv4_input 0xe13b9f80: 0xe13b9fa0 0xd0347765 ipintr 0xe13b9fa0: 0xe13b9fa0 0xd0202182 Xsoftnet Note that the size of the stack frame between Xintr_ioapic1 and pf_pull_hdr is huge (~6k). This area is filled with a 12 byte pattern that looks like a code address, the kernel code segment and a pushed eflags register. The code address points to the interrupt return path in this piece of code from i386/locore.s: #define INTRFASTEXIT \ popl%fs ; \ popl%gs ; \ popl%es ; \ popl%ds ; \ popl%edi; \ popl%esi; \ popl%ebp; \ popl%ebx; \ popl%edx; \ popl%ecx; \ popl%eax; \ sti ; \= (1) addl$8,%esp ; \ iret = (2) The return address points to the iret marked with (2). I.e. we get hit by an interrupt immediately before the iret and this happens repeatedly until the kernel stack overflows. This is only possible due to the sti instruction at the point marked (1). As iret will restore the pushed eflags value anyway, it should be safe to remove the sti altogether. Discussed with bluhm@ and hshoexer@. Patch follows: Index: locore.s === RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v retrieving revision 1.130 diff -u -r1.130 locore.s --- locore.s3 Jul 2010 04:54:32 - 1.130 +++ locore.s18 Apr 2011 13:52:16 - @@ -128,7 +128,6 @@ popl%edx; \ popl%ecx; \ popl%eax; \ - sti ; \ addl$8,%esp ; \ iret The sti was introduced in revision 1.97 of locore.s in March 2006 by mickey@. Commit message: | prevent the faults on iret to run w/ disabled intrs and cause | deadlocks; niklas toby tom ok Maybe mickey or one of the people giving oks back then want to comment? regards Christian