Fix GNUism in bsd.dep.mk

2021-12-31 Thread Christian Ehrhardt
   
Hi,

Here at genua, trying to build libpcap sometimes breaks in
libpcap with the following error message:

| Using $< in a non-suffix rule context is a GNUmake idiom \
|(/data/git/ehrhardt/genuos/os/mk/bsd.dep.mk:47)

The bug is in bsd.dep.mk where ${.IMPSRC} (aka $<) is used
in a context where it is not neccessarily defined by OpenBSD make.
 
Below is a diff to Use ${.ALLSRC:M*.y} instead of ${.IMPSRC} as
the input file for yacc.

The issue is with the rule for the grammar.h file that is generated
by yacc from grammar.c. You can easily reproduce the bug with the
following steps:
- build libpcap from scratch: cd .../src/lib/libpcap && make clean all
- remove the generated grammar.h file: rm obj*/grammar.h
- build libpcap again (incremental build): make
In normal builds this does not trigger as grammar.h is implicitly
generated by the rule for grammar.c and when make checks for
dependencies it simply finds grammar.h uptodate. However,
incremental or parallel builds might decide to make grammar.h
from grammar.y.

Now, why is this only a problem for grammar.h but not for
grammar.c? The answer to this question is burried deeply in
OpenBSD's mk files.

The snippet in bsd.dep.mk that triggers the error is a single
rule statement that generates foo.c and foo.h from foo.y with a
call to yacc -d. The rule is generated with a loop, i.e. it is not
a prefix rule. However, a prefix rule context is required for
the use of ${.IMPSRC} aka $<. For the .c file such a prefix
rule is provided by bsd.sys.mk and this rule is in scope when
make evaluates the yacc rule. However, for .h file generation
from a .y file there is no such prefix rule defined in any of
the Makefiles. Even if it were the .h suffix is missing from
.SUFFIXES and the rule would not be considered.

NOTE: The obvious way to fix this would be to use $f instead of
${.IMPSRC}. However, this does not work as $f is then missing
the path prefix and yacc won't find it if an obj directory is
used. This is probably the reason for the use of ${.IMPSRC} in
the first place.

 regards   Christian

diff --git a/os/mk/bsd.dep.mk b/os/mk/bsd.dep.mk
index 4bd8bf7778..77ae4f 100644
--- a/os/mk/bsd.dep.mk
+++ b/os/mk/bsd.dep.mk
@@ -44,7 +44,7 @@ ${i:R:S/$/.o/} ${i:R:S/$/.po/} ${i:R:S/$/.so/} 
${i:R:S/$/.do/}: $i
 # loop may not trigger
 .  for f in ${SRCS:M*.y}   
 ${f:.y=.c} ${f:.y=.h}: $f
-   ${YACC.y} -o ${f:.y=.c} ${.IMPSRC}
+   ${YACC.y} -o ${f:.y=.c} ${.ALLSRC:M*.y}
 .  endfor
 CLEANFILES += ${SRCS:M*.y:.y=.h}
 .endif


smime.p7s
Description: S/MIME cryptographic signature


Re: Intel i219 network card support

2016-02-15 Thread Christian Ehrhardt

Hi,

before I post the next iteration of the patch, I'd like
some more advice on some of the points you commented (inline):

On Sun, Feb 14, 2016 at 03:34:19PM +1100, Jonathan Gray wrote:
> On Mon, Feb 08, 2016 at 01:53:20PM +0100, Christian Ehrhardt wrote:
> > retrieving revision 1.329
> > diff -u -p -r1.329 if_em.c
> > --- dev/pci/if_em.c 12 Jan 2016 00:05:21 -  1.329
> > +++ dev/pci/if_em.c 4 Feb 2016 20:33:26 -
> > @@ -145,6 +145,10 @@ const struct pci_matchid em_devices[] = 
> > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V },
> > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V_2 },
> > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V_3 },
> > +   { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_LM },
> > +   { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_V },
> > +   { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_LM2 },
> > +   { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_V2 },
> > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_COPPER },
> > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_FIBER },
> > { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_SERDES },
> > @@ -253,6 +257,9 @@ int  em_dma_malloc(struct em_softc *, bu
> >  void em_dma_free(struct em_softc *, struct em_dma_alloc *);
> >  u_int32_t em_fill_descriptors(u_int64_t address, u_int32_t length,
> >   PDESC_ARRAY desc_array);
> > +void em_flush_tx_ring(struct em_softc *);
> > +void em_flush_rx_ring(struct em_softc *);
> > +void em_flush_desc_rings(struct em_softc *);
> >  
> >  /*
> >   *  OpenBSD Device Interface Entry Points
> > @@ -435,6 +442,7 @@ em_attach(struct device *parent, struct 
> > case em_ich10lan:
> > case em_pch2lan:
> > case em_pch_lpt:
> > +   case em_pch_spt:
> > case em_80003es2lan:
> > /* 9K Jumbo Frame size */
> > sc->hw.max_frame_size = 9234;
> > @@ -844,6 +852,7 @@ em_init(void *arg)
> > case em_pchlan:
> > case em_pch2lan:
> > case em_pch_lpt:
> > +   case em_pch_spt:
> > pba = E1000_PBA_26K;
> > break;
> > default:
> > @@ -1506,10 +1515,12 @@ em_stop(void *arg, int softonly)
> > timeout_del(>timer_handle);
> > timeout_del(>tx_fifo_timer_handle);
> >  
> > -   if (!softonly) {
> > +   if (!softonly)
> > em_disable_intr(sc);
> > +   if (sc->hw.mac_type == em_pch_spt)
> > +   em_flush_desc_rings(sc);
> > +   if (!softonly)
> > em_reset_hw(>hw);
> > -   }
> >  
> > intr_barrier(sc->sc_intrhand);
> > ifq_barrier(>if_snd);
> > @@ -1563,6 +1574,27 @@ em_identify_hardware(struct em_softc *sc
> > sc->hw.phy_init_script = TRUE;
> >  }
> >  
> > +void
> > +em_legacy_irq_quirk_spt(struct em_softc *sc)
> > +{
> > +   uint32_treg;
> > +
> > +   /* Legacy interrupt: SPT needs a quirk. */
> > +   if (sc->hw.mac_type != em_pch_spt)
> > +   return;
> > +   if (sc->legacy_irq == 0)
> > +   return;
> > +
> > +   reg = EM_READ_REG(>hw, E1000_FEXTNVM7);
> > +   reg |= E1000_FEXTNVM7_SIDE_CLK_UNGATE;
> > +   EM_WRITE_REG(>hw, E1000_FEXTNVM7, reg);
> > +
> > +   reg = EM_READ_REG(>hw, E1000_FEXTNVM9);
> > +   reg |= E1000_FEXTNVM9_IOSFSB_CLKGATE_DIS |
> > +   E1000_FEXTNVM9_IOSFSB_CLKREQ_DIS;
> > +   EM_WRITE_REG(>hw, E1000_FEXTNVM9, reg);
> > +}
> > +
> >  int
> >  em_allocate_pci_resources(struct em_softc *sc)
> >  {
> > @@ -1617,8 +1649,15 @@ em_allocate_pci_resources(struct em_soft
> > break;
> > }
> >  
> > +   sc->osdep.em_flashoffset = 0;
> > /* for ICH8 and family we need to find the flash memory */
> > -   if (IS_ICH8(sc->hw.mac_type)) {
> > +   if (sc->hw.mac_type == em_pch_spt) {
> > +   sc->osdep.flash_bus_space_tag = sc->osdep.mem_bus_space_tag;
> > +   sc->osdep.flash_bus_space_handle = 
> > sc->osdep.mem_bus_space_handle;
> > +   sc->osdep.em_flashbase = 0;
> > +   sc->osdep.em_flashsize = 0;
> > +   sc->osdep.em_flashoffset = 0xe000;
> > +   } else if (IS_ICH8(sc->hw.mac_type)) {
> > val = pci_conf_read(pa->pa_pc, pa->pa_tag, EM_FLASH);
> > if (PCI_MAPREG_TYPE(val) != PCI_MAPREG_TYPE_MEM) {
> > printf(": f

Re: Intel i219 network card support

2016-02-08 Thread Christian Ehrhardt

Hi,

On Mon, Feb 08, 2016 at 02:35:32PM +, Stuart Henderson wrote:
> On 2016/02/08 13:53, Christian Ehrhardt wrote:
> > If you own a skylake based onboard NIC, please give this diff a try.
> 
> The most important bit, especially at this point in the release cycle,
> is whether it makes any changes that cause problems with other chips.
> I don't have time to look closely now but the question is "does
> the diff change anything for existing chips, or does it *only* change
> things for new chips"?

It's not supposed to change anything for existing chips. If it does
I'd consider this a bug in the diff.

> > Index: dev/pci/pcidevs
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/pcidevs,v
> > retrieving revision 1.1786
> > diff -u -p -r1.1786 pcidevs
> > --- dev/pci/pcidevs 30 Jan 2016 01:02:04 -  1.1786
> > +++ dev/pci/pcidevs 4 Feb 2016 16:04:31 -
> > @@ -3353,6 +3353,10 @@ product INTEL I218_LM_2  0x15a0  I218-LM
> >  product INTEL I218_V_2 0x15a1  I218-V
> >  product INTEL I218_LM_30x15a2  I218-LM
> >  product INTEL I218_V_3 0x15a3  I218-V
> > +product INTEL I219_LM  0x156F  I219_LM
> > +product INTEL I219_V   0x1570  I219_V
> > +product INTEL I219_LM2 0x15B7  I219_LM2
> > +product INTEL I219_V2  0x15B8  I219_V2
> >  product INTEL X557_AT2 0x15ad  X557-AT2
> >  product INTEL CORE5G_H_PCIE_X160x1601  Core 5G PCIE
> >  product INTEL CORE5G_M_GT1_1   0x1602  HD Graphics
> 
> Please maintain numerical order of pid (0x156f/0x1570 are out of place),
> and like the other entries use lowercase for pid and s/_/-/ in the text
> version of the device name on the right-hand column.

Ok.regards   Christian



signature.asc
Description: Digital signature


Intel i219 network card support

2016-02-08 Thread Christian Ehrhardt

Hi everyone,

based on the latest release of the Intel driver for FreeBSD (em-7.5.2),
I've adapted the em(4) driver to support the skylake based i219
onboard ethernet chips.

I do not have access to relevant specs, everything below is
based on reading the Intel code referenced above.

The diff was initially done for 5.7, bluhm@ did the forward port
to current. The resulting diff is below.

The major differences are:
- The NVRAM access registers are no longer in a separate BAR. Instead
  they are at offset 0xe000 in the MMIO bar. As a consequence the
  BAR access registers must be accessed with 32-Bit memory operations.
- There are several quirks where the i219 chips need special handling.

The most suspicious one of the quirks is where we need to flush the
tx desc ring. We use the pointer list of the tx ring as a send buffer
for a dummy packet that must be sent to flush the tx ring.

Here's a list of functions that have special handling for the
SPT based cards in the uptream intel code that I did not port
for various reasons:
- The link check code in OpenBSD differs significantly from upstream
  code. Special cases for SPT based chips in em_check_for_link have
  been ignored.
- e1000_check_for_copper_link_ich8lan has been ignored.
- e1000_suspend_workaroudns_ich8lan has been ignored. Suspend does
  seem to work on my I219_V chip, though.
- NVRAM write is not supported for I217 based chips in OpenBSD and
  the code is missing for I219 chips, too.

The patch works nicely on my I219_V chip, the other chips are
untested. I may be able to do tests with I219_LM withhin the next
few days, though.

If you own a skylake based onboard NIC, please give this diff a try.

Who should I ask for oks?

 regardsChristian

Index: dev/pci/if_em.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_em.c,v
retrieving revision 1.329
diff -u -p -r1.329 if_em.c
--- dev/pci/if_em.c 12 Jan 2016 00:05:21 -  1.329
+++ dev/pci/if_em.c 4 Feb 2016 20:33:26 -
@@ -145,6 +145,10 @@ const struct pci_matchid em_devices[] = 
{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V },
{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V_2 },
{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I218_V_3 },
+   { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_LM },
+   { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_V },
+   { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_LM2 },
+   { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_I219_V2 },
{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_COPPER },
{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_FIBER },
{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_82580_SERDES },
@@ -253,6 +257,9 @@ int  em_dma_malloc(struct em_softc *, bu
 void em_dma_free(struct em_softc *, struct em_dma_alloc *);
 u_int32_t em_fill_descriptors(u_int64_t address, u_int32_t length,
  PDESC_ARRAY desc_array);
+void em_flush_tx_ring(struct em_softc *);
+void em_flush_rx_ring(struct em_softc *);
+void em_flush_desc_rings(struct em_softc *);
 
 /*
  *  OpenBSD Device Interface Entry Points
@@ -435,6 +442,7 @@ em_attach(struct device *parent, struct 
case em_ich10lan:
case em_pch2lan:
case em_pch_lpt:
+   case em_pch_spt:
case em_80003es2lan:
/* 9K Jumbo Frame size */
sc->hw.max_frame_size = 9234;
@@ -844,6 +852,7 @@ em_init(void *arg)
case em_pchlan:
case em_pch2lan:
case em_pch_lpt:
+   case em_pch_spt:
pba = E1000_PBA_26K;
break;
default:
@@ -1506,10 +1515,12 @@ em_stop(void *arg, int softonly)
timeout_del(>timer_handle);
timeout_del(>tx_fifo_timer_handle);
 
-   if (!softonly) {
+   if (!softonly)
em_disable_intr(sc);
+   if (sc->hw.mac_type == em_pch_spt)
+   em_flush_desc_rings(sc);
+   if (!softonly)
em_reset_hw(>hw);
-   }
 
intr_barrier(sc->sc_intrhand);
ifq_barrier(>if_snd);
@@ -1563,6 +1574,27 @@ em_identify_hardware(struct em_softc *sc
sc->hw.phy_init_script = TRUE;
 }
 
+void
+em_legacy_irq_quirk_spt(struct em_softc *sc)
+{
+   uint32_treg;
+
+   /* Legacy interrupt: SPT needs a quirk. */
+   if (sc->hw.mac_type != em_pch_spt)
+   return;
+   if (sc->legacy_irq == 0)
+   return;
+
+   reg = EM_READ_REG(>hw, E1000_FEXTNVM7);
+   reg |= E1000_FEXTNVM7_SIDE_CLK_UNGATE;
+   EM_WRITE_REG(>hw, E1000_FEXTNVM7, reg);
+
+   reg = EM_READ_REG(>hw, E1000_FEXTNVM9);
+   reg |= E1000_FEXTNVM9_IOSFSB_CLKGATE_DIS |
+   E1000_FEXTNVM9_IOSFSB_CLKREQ_DIS;
+   EM_WRITE_REG(>hw, E1000_FEXTNVM9, reg);
+}
+
 int
 em_allocate_pci_resources(struct em_softc *sc)
 {
@@ -1617,8 +1649,15 @@ 

Re: Consistent Kernel Panic-Hardware-Related?

2013-07-24 Thread Christian Ehrhardt

Hi,

On Wed, Jul 24, 2013 at 11:52:38AM +0200, Mark Kettenis wrote:
 Taking this to tech@ in the hope some more people will look into this.

Ok. And thanks for picking this up.

  On Thu, Jul 04, 2013 at 09:56:56AM -0700, Scott Vanderbilt wrote:
   I've been trying to build userland repeatedly over the past few days
   on a particular machine and consistently get kernel panics, though
   never at exactly the same point in the process. The latest occurred
   midway through 'make obj'. Attempts to build userland on another
   i386 machine from code pulled via cvs at more or less the same time
   works fine, so it seems the issue is isolated to this hardware.
  
   I initially suspected my SSD had gone bad, so I replaced it with a
   brand new drive. However, the issue persists, so I no longer suspect
   the drive.
  
   A ps, trace, and dmesg are provided below. This is my first
   reporting a bug of this nature. I hope I've followed procedure. If
   not, please do let me know. I'm trying to be useful. :-)
  
   -
  
   panic: pmap_remove_ptes: managed page without PG_PVLIST for 0x3c001000
   Stopped at  Debugger+0x4:   popl%ebp
   RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS 
   PANIC!
   DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
  
   ddb show panic
   pmap_remove_ptes: managed page without PG_PVLIST for 0x3c001000
  
   ddb trace
   Debugger(d0963718,f6269e38,d0966be4,f6269e38,d1cf1040) at Debugger+0x4
   panic(d0966be4,3c001000,d1ceb16c,f6269e4c,0) at panic+0x5d
   pmap_remove_ptes(d9e39798,d1cf1040,ffcf,3c00,3c003000) at
   pmap_remove_p
   tes+0x142
   pmap_do_remove(d9e39798,3c00,3c003000,0,d0ad7820) at
  pmap_do_remove+0xeb
   pmap_remove(d9e39798,3c00,3c003000,d056c4e9,d9c68e1c) at
   pmap_remove+0x27
   uvm_unmap_kill_entry(d9e3ad80,d9c68e1c,f6269f2c,d043a597,0) at
   uvm_unmap_kill_e
   ntry+0xf8
   uvm_map_teardown(d9e3ad80,1,4,d093e66e,d9cc2700) at uvm_map_teardown+0xac
   uvmspace_free(d9e3ad80,1,1,f6269f6c,d0203009) at uvmspace_free+0x2e
   uvm_exit(d9cc3ba4,d0a4e0a8,4,d093e66e,0) at uvm_exit+0x15
   reaper(d9e33004) at reaper+0x8a
   Bad frame pointer: 0xd0c3ce68
  
  Can you try to see if the following patch helps? It did for me, when
  I was debugging a similar panic back in December. However, my
  explanation why the patch would fix this bug, turned out to be invalid.
  Still the bug went away. If the same happens for you, some more people
  should have a look at the patch:
  
  --- /mount/blink/aegis/project/gg/history/os/src/sys/arch/i386/i386/pmap.c
  2012/10/16 18:31:28 1.117
  +++ /mount/blink/aegis/project/gg/history/os/src/sys/arch/i386/i386/pmap.c
  2013/01/24 17:20:06 1.118
  @@ -495,7 +495,7 @@ pmap_map_ptes(struct pmap *pmap)
  
  /* need to load a new alternate pt space into curpmap? */
  opde = *APDP_PDE;
  -#if defined(MULTIPROCESSOR)  defined(DIAGNOSTIC)
  +#if defined(DIAGNOSTIC)
  if (pmap_valid_entry(opde))
  panic(pmap_map_ptes: APTE valid);
   #endif
  @@ -521,10 +521,8 @@ pmap_unmap_ptes(struct pmap *pmap)
  if (pmap_is_curpmap(pmap)) {
  simple_unlock(pmap-pm_obj.vmobjlock);
  } else {
  -#if defined(MULTIPROCESSOR)
  *APDP_PDE = 0;
  pmap_apte_flush();
  -#endif
  simple_unlock(pmap-pm_obj.vmobjlock);
  simple_unlock(curpcb-pcb_pmap-pm_obj.vmobjlock);
  }
 
 Wish somebody with more in-depth knowledge about the i386 pmap
 implementation would respond :(.

I think it is some kind of caching issue. IIRC I've seen at least
one case where the condition that is checked by the assertionx
turned out to be _not true_ when re-evaluated manually in the debugger.

 Your diff basically disables an optimization where the alternate pmap
 is kept around in case we need it again.  Not sure how important this
 optimization is.  I guess the primary user of the alternate pmap is
 the reaper, and keeping the alternate pmap around there could be
 beneficial if the address space of the process we're reaping is
 heavily fragmented.

fork() also makes use of the alternate pmap. And ptrace based memory
access, I guess.

 There is something fishy with this optimization.  *APDP_PDE is never
 cleared, which means that it becomes stale after the process exits.
 Presumably we'd notice the next time we try to map an alternate pmap,
 but if the physical pages for the pmap get recycled, we might not.

Yes. Additionally, the reaper is particularly problematic as
it is a kernel thread and switching to/from kernel threads omits
the TLB-flush that is inherent to a normal process switch.

 Not quite seeing how this leads to that panic,

Exactly. This is the problem with the patch. It most likely fixes a bug.
However, the bug being fixed is not sufficient to explain the symptoms
seen by the origial poster.

 but perhaps we should clear *APDP_PDE in pmap_switch()?


Re: Use ACPI to detect secondary PCI root segments on x86

2012-09-24 Thread Christian Ehrhardt
Hi,

On Fri, Sep 21, 2012 at 08:34:56PM +0200, Mark Kettenis wrote:
 Any chance of having a diff with the amd64 bits as well?

Included below.

 Also:
 
  +   if (pba-pba_busex == NULL) {
  +   printf(cannot attach ACPI discovered busses\n);
  +   return;
  +   }
 
 Instead of doing a printf here, I think you should just do a
 
 KASSERT(pba-pba_busex != NULL);

Done as wall. Updated diff below.

   regardsChristian


Index: arch/amd64/amd64/mainbus.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/mainbus.c,v
retrieving revision 1.25
diff -u -r1.25 mainbus.c
--- arch/amd64/amd64/mainbus.c  19 Sep 2012 23:23:50 -  1.25
+++ arch/amd64/amd64/mainbus.c  24 Sep 2012 11:15:41 -
@@ -218,6 +218,9 @@
mba.mba_pba.pba_domain = pci_ndomains++;
mba.mba_pba.pba_bus = 0;
config_found(self, mba.mba_pba, mainbus_print);
+#if NACPI  0
+   acpi_pciroots_attach(self, mba.mba_pba, mainbus_print);
+#endif
}
 #endif
 
Index: arch/i386/i386/mainbus.c
===
RCS file: /cvs/src/sys/arch/i386/i386/mainbus.c,v
retrieving revision 1.49
diff -u -r1.49 mainbus.c
--- arch/i386/i386/mainbus.c19 Sep 2012 23:03:12 -  1.49
+++ arch/i386/i386/mainbus.c24 Sep 2012 11:15:42 -
@@ -245,6 +245,9 @@
mba.mba_pba.pba_domain = pci_ndomains++;
mba.mba_pba.pba_bus = 0;
config_found(self, mba.mba_pba, mainbus_print);
+#if NACPI  0
+   acpi_pciroots_attach(self, mba.mba_pba, mainbus_print);
+#endif
}
 #endif
 
Index: dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.239
diff -u -r1.239 acpi.c
--- dev/acpi/acpi.c 7 Sep 2012 19:19:59 -   1.239
+++ dev/acpi/acpi.c 24 Sep 2012 11:15:43 -
@@ -394,6 +394,8 @@
 
 TAILQ_HEAD(, acpi_pci) acpi_pcidevs =
 TAILQ_HEAD_INITIALIZER(acpi_pcidevs);
+TAILQ_HEAD(, acpi_pci) acpi_pcirootdevs = 
+TAILQ_HEAD_INITIALIZER(acpi_pcirootdevs);
 
 int acpi_getpci(struct aml_node *node, void *arg);
 int acpi_getminbus(union acpi_resource *crs, void *arg);
@@ -482,6 +484,7 @@
node-pci = pci;
dnprintf(10, found PCI root: %s %d\n,
aml_nodename(node), pci-bus);
+   TAILQ_INSERT_TAIL(acpi_pcirootdevs, pci, next);
}
aml_freevalue(res);
return 0;
@@ -600,6 +603,23 @@
}
 
return PCI_PMCSR_STATE_D3;
+}
+
+void
+acpi_pciroots_attach(struct device *dev, void *aux, cfprint_t pr)
+{
+   struct acpi_pci *pdev;
+   struct pcibus_attach_args   *pba = aux;
+
+   KASSERT(pba-pba_busex != NULL);
+
+   TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) {
+   if (extent_alloc_region(pba-pba_busex, pdev-bus,
+   1, EX_NOWAIT) != 0)
+   continue;
+   pba-pba_bus = pdev-bus;
+   config_found(dev, pba, pr);
+   }
 }
 
 void
Index: dev/acpi/acpivar.h
===
RCS file: /cvs/src/sys/dev/acpi/acpivar.h,v
retrieving revision 1.72
diff -u -r1.72 acpivar.h
--- dev/acpi/acpivar.h  7 Sep 2012 19:19:59 -   1.72
+++ dev/acpi/acpivar.h  24 Sep 2012 11:15:43 -
@@ -323,6 +323,8 @@
 void   acpi_powerdown_task(void *, int);
 void   acpi_sleep_task(void *, int);
 
+void   acpi_pciroots_attach(struct device *, void *, cfprint_t);
+
 #endif
 
 #endif /* !_ACPI_WAKECODE */



Re: Use ACPI to detect secondary PCI root segments on x86

2012-09-21 Thread Christian Ehrhardt
Hi,

thanks to Mark, we have enough PCI bus number accounting, now to allow us
to attach ACPI bus numbers via ACPI. A patch to do this is below.

regards   Christian


Index: arch/i386/i386/mainbus.c
===
RCS file: /cvs/src/sys/arch/i386/i386/mainbus.c,v
retrieving revision 1.49
diff -u -r1.49 mainbus.c
--- arch/i386/i386/mainbus.c19 Sep 2012 23:03:12 -  1.49
+++ arch/i386/i386/mainbus.c21 Sep 2012 11:36:59 -
@@ -245,6 +245,9 @@
mba.mba_pba.pba_domain = pci_ndomains++;
mba.mba_pba.pba_bus = 0;
config_found(self, mba.mba_pba, mainbus_print);
+#if NACPI  0
+   acpi_pciroots_attach(self, mba.mba_pba, mainbus_print);
+#endif
}
 #endif
 
Index: dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.239
diff -u -r1.239 acpi.c
--- dev/acpi/acpi.c 7 Sep 2012 19:19:59 -   1.239
+++ dev/acpi/acpi.c 21 Sep 2012 11:37:01 -
@@ -394,6 +394,8 @@
 
 TAILQ_HEAD(, acpi_pci) acpi_pcidevs =
 TAILQ_HEAD_INITIALIZER(acpi_pcidevs);
+TAILQ_HEAD(, acpi_pci) acpi_pcirootdevs = 
+TAILQ_HEAD_INITIALIZER(acpi_pcirootdevs);
 
 int acpi_getpci(struct aml_node *node, void *arg);
 int acpi_getminbus(union acpi_resource *crs, void *arg);
@@ -482,6 +484,7 @@
node-pci = pci;
dnprintf(10, found PCI root: %s %d\n,
aml_nodename(node), pci-bus);
+   TAILQ_INSERT_TAIL(acpi_pcirootdevs, pci, next);
}
aml_freevalue(res);
return 0;
@@ -600,6 +603,25 @@
}
 
return PCI_PMCSR_STATE_D3;
+}
+
+void
+acpi_pciroots_attach(struct device *dev, void *aux, cfprint_t pr)
+{
+   struct acpi_pci *pdev;
+   struct pcibus_attach_args   *pba = aux;
+
+   if (pba-pba_busex == NULL) {
+   printf(cannot attach ACPI discovered busses\n);
+   return;
+   }
+   TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) {
+   if (extent_alloc_region(pba-pba_busex, pdev-bus,
+   1, EX_NOWAIT) != 0)
+   continue;
+   pba-pba_bus = pdev-bus;
+   config_found(dev, pba, pr);
+   }
 }
 
 void
Index: dev/acpi/acpivar.h
===
RCS file: /cvs/src/sys/dev/acpi/acpivar.h,v
retrieving revision 1.72
diff -u -r1.72 acpivar.h
--- dev/acpi/acpivar.h  7 Sep 2012 19:19:59 -   1.72
+++ dev/acpi/acpivar.h  21 Sep 2012 11:37:01 -
@@ -323,6 +323,8 @@
 void   acpi_powerdown_task(void *, int);
 void   acpi_sleep_task(void *, int);
 
+void   acpi_pciroots_attach(struct device *, void *, cfprint_t);
+
 #endif
 
 #endif /* !_ACPI_WAKECODE */



Re: Use ACPI to detect secondary PCI root segments on x86

2012-09-14 Thread Christian Ehrhardt
Hi,

On Thu, Sep 13, 2012 at 11:00:12PM +0200, Mark Kettenis wrote:
  However, there might be a different solution:
  We could track attached busses (not just host bridge busses but all busses)
  in the pci code (or via MD code and hooks but I think that would be a
  generic thing) and do nothing in pciattach if a bus is discovered for
  the second time.
  
  With this MD code could use whatever methods it wants to detect otherwise
  undiscovered host bridges. Possible methods might be magic numbers, hints
  from ACPI or even a linear scan of all possible bus numbers.
  
  Does this sound like a possible solution?
 
 It actually got me thinking about proper bus number resource
 accounting again.  Something that's needed for properly hot-plugging
 pci busses.  Below is a diff that does this.
 
 With that code, your acpi code can just try to grab the bus number
 using extent_alloc_region() and only attach a bus if that succeeds.

Nice idea!

However, I have some doubts about the implementation. In particular
you don't pass sc_busex down through bridge devices. This is ok as long
as all busses that we _discover_ downstream of that bridge are in fact
_physically_ dowstream of that bridge, too. In this case the bus range
of the bridge (PPB_BUSINFO_SECONDARY..PPB_BUSINFO_SUBORDINATE) covers
all busses.

However, host bridges that are handled e.g. in arch/i386/pci/pchb.c
can appear downstream of another bridge but use bus numbers that are
outside of the upstream bridge's bus number range.

The easiest solution would probably be, to make the pcibus_ex an MI
thing, i.e. define and allocate it globally in dev/pci/pci.c.

 regardsChristian



Re: Use ACPI to detect secondary PCI root segments on x86

2012-09-14 Thread Christian Ehrhardt
Hi,

On Fri, Sep 14, 2012 at 10:31:05AM +0200, Christian Ehrhardt wrote:
 On Thu, Sep 13, 2012 at 11:00:12PM +0200, Mark Kettenis wrote:
   However, there might be a different solution:
   We could track attached busses (not just host bridge busses but all 
   busses)
   in the pci code (or via MD code and hooks but I think that would be a
   generic thing) and do nothing in pciattach if a bus is discovered for
   the second time.
   
   With this MD code could use whatever methods it wants to detect otherwise
   undiscovered host bridges. Possible methods might be magic numbers, hints
   from ACPI or even a linear scan of all possible bus numbers.
   
   Does this sound like a possible solution?
  
  It actually got me thinking about proper bus number resource
  accounting again.  Something that's needed for properly hot-plugging
  pci busses.  Below is a diff that does this.
  
  With that code, your acpi code can just try to grab the bus number
  using extent_alloc_region() and only attach a bus if that succeeds.
 
 Nice idea!
 
 However, I have some doubts about the implementation. In particular
 you don't pass sc_busex down through bridge devices. This is ok as long
 as all busses that we _discover_ downstream of that bridge are in fact
 _physically_ dowstream of that bridge, too. In this case the bus range
 of the bridge (PPB_BUSINFO_SECONDARY..PPB_BUSINFO_SUBORDINATE) covers
 all busses.
 
 However, host bridges that are handled e.g. in arch/i386/pci/pchb.c
 can appear downstream of another bridge but use bus numbers that are
 outside of the upstream bridge's bus number range.
 
 The easiest solution would probably be, to make the pcibus_ex an MI
 thing, i.e. define and allocate it globally in dev/pci/pci.c.

Below is the suggestion for an updated patch that follows this strategy.

I'm not entirely confident about the i386/pchb.c part of the patch, though.
People that own such hardware would have to verify that all their hardware
is discovered properly.

This code tries to catch theoretical cases where we have two ACPI
discovered host bridges and the one listed _first_ in ACPI is discoverable
from the second one via PCI mechanisms (but not vice versa). In this case
the host bridge code and not the ACPI code must detect that the second
host bridge has already been attached. However, this assumes that host
bridges never use a bus numbers that are within the bus range of their
upstream bridge.

regards   Christian

Index: arch/i386/i386/mainbus.c
===
RCS file: /cvs/src/sys/arch/i386/i386/mainbus.c,v
retrieving revision 1.48
diff -u -r1.48 mainbus.c
--- arch/i386/i386/mainbus.c3 Nov 2010 10:15:23 -   1.48
+++ arch/i386/i386/mainbus.c14 Sep 2012 09:48:34 -
@@ -244,6 +244,9 @@
mba.mba_pba.pba_domain = pci_ndomains++;
mba.mba_pba.pba_bus = 0;
config_found(self, mba.mba_pba, mainbus_print);
+#if NACPI  0
+   acpi_pciroots_attach(self, mba.mba_pba, mainbus_print);
+#endif
}
 #endif
 
Index: arch/i386/pci/pchb.c
===
RCS file: /cvs/src/sys/arch/i386/pci/pchb.c,v
retrieving revision 1.85
diff -u -r1.85 pchb.c
--- arch/i386/pci/pchb.c31 Aug 2010 17:13:46 -  1.85
+++ arch/i386/pci/pchb.c14 Sep 2012 09:48:34 -
@@ -403,6 +421,8 @@
 
if (doattach == 0)
return;
+   if (extent_alloc_region(pcibus_ex, pbnum, 1, EX_NOWAIT) != 0)
+   return;
 
bzero(pba, sizeof(pba));
pba.pba_busname = pci;
@@ -505,6 +525,8 @@
pba.pba_domain = pa-pa_domain;
pba.pba_bus = AMD64HT_LDT_SEC_BUS_NUM(bus);
pba.pba_pc = pa-pa_pc;
-   config_found(self, pba, pchb_print);
+   if (extent_alloc_region(pcibus_ex, pba.pba_bus, 1, EX_NOWAIT)
+   == 0)
+   config_found(self, pba, pchb_print);
}
 }
Index: arch/i386/pci/pci_machdep.c
===
RCS file: /cvs/src/sys/arch/i386/pci/pci_machdep.c,v
retrieving revision 1.68
diff -u -r1.68 pci_machdep.c
--- arch/i386/pci/pci_machdep.c 4 Dec 2011 20:08:09 -   1.68
+++ arch/i386/pci/pci_machdep.c 14 Sep 2012 09:48:34 -
@@ -921,6 +921,11 @@
extent_alloc_region(pcimem_ex, IOM_BEGIN, IOM_SIZE,
EX_CONFLICTOK | EX_NOWAIT);
}
+
+   if (pcibus_ex == NULL) {
+   pcibus_ex = extent_create(pcibus, 0, 0xff, M_DEVBUF,
+   NULL, 0, EX_NOWAIT);
+   }
 }
 
 #include acpi.h
Index: arch/i386/pci/pci_machdep.h
===
RCS file: /cvs/src/sys/arch/i386/pci/pci_machdep.h,v
retrieving revision 1.23
diff -u -r1.23 pci_machdep.h
--- arch/i386/pci/pci_machdep.h 10 Oct 2011 19:42:36 -  1.23
+++ arch/i386/pci

Re: Use ACPI to detect secondary PCI root segments on x86

2012-09-12 Thread Christian Ehrhardt
Hi,

On Mon, Sep 10, 2012 at 01:22:35PM +0200, Christian Ehrhardt wrote:
  Yup.  Since we have code to detect additional host bridges some of
  them may already have been attached.  And it would be a good way to
  defend against ACPI lying to us.  However, please keep the ACPI hooks
  out of the MI PCI code.  It'd be better if you used pci_attach_hook(),
  which lives in arch/pci/pci_machdep.c.
 
 Ok, will do that.

Updated version of the patch is below.  regards   Christian


Index: arch/i386/i386/mainbus.c
===
RCS file: /cvs/src/sys/arch/i386/i386/mainbus.c,v
retrieving revision 1.48
diff -u -r1.48 mainbus.c
--- arch/i386/i386/mainbus.c3 Nov 2010 10:15:23 -   1.48
+++ arch/i386/i386/mainbus.c12 Sep 2012 13:39:19 -
@@ -244,6 +244,9 @@
mba.mba_pba.pba_domain = pci_ndomains++;
mba.mba_pba.pba_bus = 0;
config_found(self, mba.mba_pba, mainbus_print);
+#if NACPI  0
+   acpi_pciroots_attach(self, mba.mba_pba, mainbus_print);
+#endif
}
 #endif
 
Index: arch/i386/pci/pci_machdep.c
===
RCS file: /cvs/src/sys/arch/i386/pci/pci_machdep.c,v
retrieving revision 1.68
diff -u -r1.68 pci_machdep.c
--- arch/i386/pci/pci_machdep.c 4 Dec 2011 20:08:09 -   1.68
+++ arch/i386/pci/pci_machdep.c 12 Sep 2012 13:39:19 -
@@ -208,6 +208,9 @@
printf(: configuration mode %d, pci_mode);
 #endif
 
+#if NACPI  0
+   acpi_pciroot_match(self, pba-pba_bus);
+#endif
if (pba-pba_bus != 0)
return;
 
Index: dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.238
diff -u -r1.238 acpi.c
--- dev/acpi/acpi.c 13 Jul 2012 11:51:41 -  1.238
+++ dev/acpi/acpi.c 12 Sep 2012 13:39:20 -
@@ -392,6 +392,8 @@
 
 TAILQ_HEAD(, acpi_pci) acpi_pcidevs =
 TAILQ_HEAD_INITIALIZER(acpi_pcidevs);
+TAILQ_HEAD(, acpi_pci) acpi_pcirootdevs = 
+TAILQ_HEAD_INITIALIZER(acpi_pcirootdevs);
 
 int acpi_getpci(struct aml_node *node, void *arg);
 int acpi_getminbus(union acpi_resource *crs, void *arg);
@@ -480,6 +482,7 @@
node-pci = pci;
dnprintf(10, found PCI root: %s %d\n,
aml_nodename(node), pci-bus);
+   TAILQ_INSERT_TAIL(acpi_pcirootdevs, pci, next);
}
aml_freevalue(res);
return 0;
@@ -548,6 +551,31 @@
dev-dv_xname, aml_nodename(pdev-node));
pdev-device = dev;
}
+   }
+}
+
+void
+acpi_pciroot_match(struct device *dev, int bus)
+{
+   struct acpi_pci *pdev;
+
+   TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) {
+   if (pdev-bus == bus)
+   pdev-device = dev;
+   }
+}
+
+void
+acpi_pciroots_attach(struct device *dev, void *aux, cfprint_t pr)
+{
+   struct acpi_pci *pdev;
+   struct pcibus_attach_args   *pba = aux;
+
+   TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) {
+   if (pdev-device)   /* Already attached */
+   continue;
+   pba-pba_bus = pdev-bus;
+   config_found(dev, pba, pr);
}
 }
 
Index: dev/acpi/acpivar.h
===
RCS file: /cvs/src/sys/dev/acpi/acpivar.h,v
retrieving revision 1.71
diff -u -r1.71 acpivar.h
--- dev/acpi/acpivar.h  15 Apr 2011 17:34:51 -  1.71
+++ dev/acpi/acpivar.h  12 Sep 2012 13:39:20 -
@@ -322,6 +322,9 @@
 void   acpi_powerdown_task(void *, int);
 void   acpi_sleep_task(void *, int);
 
+void   acpi_pciroot_match(struct device *, int);
+void   acpi_pciroots_attach(struct device *, void *, cfprint_t);
+
 #endif
 
 #endif /* !_ACPI_WAKECODE */



Re: Use ACPI to detect secondary PCI root segments on x86

2012-09-10 Thread Christian Ehrhardt
Hi,

first of all: Thanks for your reply!

On Sat, Sep 08, 2012 at 04:04:41PM +0200, Mark Kettenis wrote:
  most modern x86 hardware includes more than one PCI root segement.
  E.g. a hardware that I have here has three PCI root segemnts with bus
  numbers 0, 0x7f, 0x80 and 0xff respectively. (0x7f and 0xff host the
  uncore devices of each processor).
 
 Well, machines with multiple host bridges have been around for a
 while.  Some of them even pre-date ACPI.
 
  
  These segments are already detected by APCI but this information is not
  used when attaching PCI busses to the mainbus. Below is a patch that
  should solve the PCI bus detection problem in a robust way.
 
 I hate ACPI!  It's full of lies; I don't trust it.

Agreed.

 It is much better
 to detect these additional PCI busses by looking at actual hardware
 registers.

This approach (unfortunately) means that device detection breaks
with each new chipset and I hate that :-)

 So before we go down this path, can you investigate that?
 It seems the bus number for the additional Uncore devices of the
 Xeon E5 1600/2600 CPUs is available in config register 0x108 of device
 0:5:0.

Yes, I already figured that. It works and I can cook a patch if you want.
The patch is not completely straight forward, tough, because the device is
not just a bridge but a multi-purpose devices.

 But I haven't figured how to find the bus number for the
 second socket.  Looking at the AML for this machine and determining
 how it detects these additional busses might provide some clues.  Feel
 free to send me the acpidump output for this box.

I will send you ACPI data in private. However, the second PCI bus is
simply hardwired in DSDT, i.e. there does not seem to be code in ACPI that
would try to detect its presence.

I tend to believe that there is no bridge device that exposes the
second PCI bus.

  There's a small problem though: Some of the secondary PCI segments
  can show up as busses behind some kind of host bridge device dowstream
  of pci bus 0, too. These PCI busses would be attached twice, now. Thus we
  keep track of attached root segemnents in the ACPI code.
 
 Yup.  Since we have code to detect additional host bridges some of
 them may already have been attached.  And it would be a good way to
 defend against ACPI lying to us.  However, please keep the ACPI hooks
 out of the MI PCI code.  It'd be better if you used pci_attach_hook(),
 which lives in arch/pci/pci_machdep.c.

Ok, will do that.

However, there might be a different solution:
We could track attached busses (not just host bridge busses but all busses)
in the pci code (or via MD code and hooks but I think that would be a
generic thing) and do nothing in pciattach if a bus is discovered for
the second time.

With this MD code could use whatever methods it wants to detect otherwise
undiscovered host bridges. Possible methods might be magic numbers, hints
from ACPI or even a linear scan of all possible bus numbers.

Does this sound like a possible solution?

regardsChristian



Re: Use ACPI to detect secondary PCI root segments on x86

2012-09-07 Thread Christian Ehrhardt
Hi again,

On Thu, Sep 06, 2012 at 11:24:01AM +0200, Christian Ehrhardt wrote:
 most modern x86 hardware includes more than one PCI root segement.
 E.g. a hardware that I have here has four PCI root segemnts with bus
 numbers 0, 0x7f, 0x80 and 0xff respectively. (0x7f and 0xff host the
 uncore devices of each processor).
 
 These segments are already detected by APCI but this information is not
 used when attaching PCI busses to the mainbus. Below is a patch that
 should solve the PCI bus detection problem in a robust way.

Below is a diff of the dmesg output before and after the
patch. Comments welcome.

regardsChristian

--- dmesg.before2012-09-07 10:06:49.0 +0200
+++ dmesg.after 2012-09-07 10:10:21.0 +0200
@@ -1,186 +1,300 @@
 Copyright (c) 1982, 1986, 1989, 1991, 1993
 The Regents of the University of California.  All rights reserved.
 Copyright (c) 1995-2012 OpenBSD. All rights reserved.  http://www.OpenBSD.org
 
 OpenBSD 5.2-current (GENERIC.MP) ...
 cpu0: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 
GHz
 cpu0: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF
 real mem  = 2118721536 (2020MB)
 avail mem = 2073182208 (1977MB)
 mainbus0 at root
 bios0 at mainbus0: AT/286+ BIOS, date 10/11/11, SMBIOS rev. 2.7 @ 0xeb2f0 (65 
entries)
 bios0: vendor American Megatrends Inc. version 1.0a date 06/05/2012
 bios0: Supermicro X9DRH-7TF/7F/iTF/iF
 acpi0 at bios0: rev 2
 acpi0: sleep states S0 S1 S4 S5
 acpi0: tables DSDT FACP APIC MCFG PRAD SRAT SLIT HPET SPMI SSDT SPCR BGRT DMAR
 acpi0: wakeup devices BR20(S1) EUSB(S4) USBE(S4) NPE1(S4) NPE2(S4) NPE3(S4) 
NPE4(S4) NPE5(S4) NPE6(S4) NPE7(S4) NPE8(S4) NPE9(S4) I35P(S4) I35I(S4) 
X54P(S4) X54I(S4) NPEA(S4) SLPB(S0) NPE1(S4) NPE2(S4) NPE3(S4) NPE4(S4) 
NPE5(S4) NPE6(S4) NPE7(S4) NPE8(S4) NPE9(S4) NPEA(S4)
 acpitimer0 at acpi0: 3579545 Hz, 24 bits
 acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
 cpu0: apic clock running at 100MHz
 cpu1 at mainbus0: apid 2 (application processor)
 cpu1: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 
GHz
 cpu1: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF
 cpu2 at mainbus0: apid 4 (application processor)
 cpu2: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 
GHz
 cpu2: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF
 cpu3 at mainbus0: apid 6 (application processor)
 cpu3: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 
GHz
 cpu3: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF
 cpu4 at mainbus0: apid 8 (application processor)
 cpu4: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 
GHz
 cpu4: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF
 cpu5 at mainbus0: apid 10 (application processor)
 cpu5: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 
GHz
 cpu5: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF
 cpu6 at mainbus0: apid 32 (application processor)
 cpu6: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 
GHz
 cpu6: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,LAHF
 cpu7 at mainbus0: apid 34 (application processor)
 cpu7: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz (GenuineIntel 686-class) 2.91 
GHz
 cpu7: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA

Use ACPI to detect secondary PCI root segments on x86

2012-09-06 Thread Christian Ehrhardt
Hi,

most modern x86 hardware includes more than one PCI root segement.
E.g. a hardware that I have here has three PCI root segemnts with bus
numbers 0, 0x7f, 0x80 and 0xff respectively. (0x7f and 0xff host the
uncore devices of each processor).

These segments are already detected by APCI but this information is not
used when attaching PCI busses to the mainbus. Below is a patch that
should solve the PCI bus detection problem in a robust way.

There's a small problem though: Some of the secondary PCI segments
can show up as busses behind some kind of host bridge device dowstream
of pci bus 0, too. These PCI busses would be attached twice, now. Thus we
keep track of attached root segemnents in the ACPI code.

NOTE: Patch is currently i386 only but almost everything is MI.

regards   Christian


Index: arch/i386/i386/mainbus.c
===
RCS file: /cvs/src/sys/arch/i386/i386/mainbus.c,v
retrieving revision 1.48
diff -u -r1.48 mainbus.c
--- arch/i386/i386/mainbus.c3 Nov 2010 10:15:23 -   1.48
+++ arch/i386/i386/mainbus.c6 Sep 2012 08:58:38 -
@@ -244,6 +244,9 @@
mba.mba_pba.pba_domain = pci_ndomains++;
mba.mba_pba.pba_bus = 0;
config_found(self, mba.mba_pba, mainbus_print);
+#if NACPI  0
+   acpi_pciroots_attach(self, mba.mba_pba, mainbus_print);
+#endif
}
 #endif
 
Index: dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.238
diff -u -r1.238 acpi.c
--- dev/acpi/acpi.c 13 Jul 2012 11:51:41 -  1.238
+++ dev/acpi/acpi.c 6 Sep 2012 08:58:39 -
@@ -392,6 +392,8 @@
 
 TAILQ_HEAD(, acpi_pci) acpi_pcidevs =
 TAILQ_HEAD_INITIALIZER(acpi_pcidevs);
+TAILQ_HEAD(, acpi_pci) acpi_pcirootdevs = 
+TAILQ_HEAD_INITIALIZER(acpi_pcirootdevs);
 
 int acpi_getpci(struct aml_node *node, void *arg);
 int acpi_getminbus(union acpi_resource *crs, void *arg);
@@ -480,6 +482,7 @@
node-pci = pci;
dnprintf(10, found PCI root: %s %d\n,
aml_nodename(node), pci-bus);
+   TAILQ_INSERT_TAIL(acpi_pcirootdevs, pci, next);
}
aml_freevalue(res);
return 0;
@@ -548,6 +551,31 @@
dev-dv_xname, aml_nodename(pdev-node));
pdev-device = dev;
}
+   }
+}
+
+void
+acpi_pciroot_match(struct device *dev, int bus)
+{
+   struct acpi_pci *pdev;
+
+   TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) {
+   if (pdev-bus == bus)
+   pdev-device = dev;
+   }
+}
+
+void
+acpi_pciroots_attach(struct device *dev, void *aux, cfprint_t pr)
+{
+   struct acpi_pci *pdev;
+   struct pcibus_attach_args   *pba = aux;
+
+   TAILQ_FOREACH(pdev, acpi_pcirootdevs, next) {
+   if (pdev-device)   /* Already attached */
+   continue;
+   pba-pba_bus = pdev-bus;
+   config_found(dev, pba, pr);
}
 }
 
Index: dev/acpi/acpivar.h
===
RCS file: /cvs/src/sys/dev/acpi/acpivar.h,v
retrieving revision 1.71
diff -u -r1.71 acpivar.h
--- dev/acpi/acpivar.h  15 Apr 2011 17:34:51 -  1.71
+++ dev/acpi/acpivar.h  6 Sep 2012 08:58:39 -
@@ -322,6 +322,9 @@
 void   acpi_powerdown_task(void *, int);
 void   acpi_sleep_task(void *, int);
 
+void   acpi_pciroot_match(struct device *, int);
+void   acpi_pciroots_attach(struct device *, void *, cfprint_t);
+
 #endif
 
 #endif /* !_ACPI_WAKECODE */
Index: dev/pci/pci.c
===
RCS file: /cvs/src/sys/dev/pci/pci.c,v
retrieving revision 1.94
diff -u -r1.94 pci.c
--- dev/pci/pci.c   10 Oct 2011 19:42:37 -  1.94
+++ dev/pci/pci.c   6 Sep 2012 08:58:39 -
@@ -46,6 +46,12 @@
 #include dev/pci/pcidevs.h
 #include dev/pci/ppbreg.h
 
+#include acpi.h
+
+#if NACPI  0
+#include dev/acpi/acpivar.h
+#endif
+
 int pcimatch(struct device *, void *, void *);
 void pciattach(struct device *, struct device *, void *);
 int pcidetach(struct device *, int);
@@ -187,6 +193,9 @@
if (pci_enumerate_bus(sc, pci_primary_vga, NULL))
pci_vga_pci = sc;
pci_enumerate_bus(sc, NULL, NULL);
+#if NACPI  0
+   acpi_pciroot_match(self, pba-pba_bus);
+#endif
 }
 
 int



ix(4) driver Bugfix

2012-01-20 Thread Christian Ehrhardt
Hi!

the ix(4)-Driver stores a pointer to the pci_attach_args structure (as
passed to ixgbe_attach) in its softc. However, the pci_attach_args
structure lives on the stack somewhere in autoconf, i.e. its contents are
undefined after ixgbe_attach returns.

However, the pointer to that data remains in ix_softc and is used at least
in ixgbe_detach. The diff below changes this by storing a copy of the
pci_attach_args in ix_softc.

Tested (with a slightly modified source tree) on x86.

regardsChristian

Index: if_ix.c
===
RCS file: /cvs/src/sys/dev/pci/if_ix.c,v
retrieving revision 1.59
diff -u -p -r1.59 if_ix.c
--- if_ix.c 13 Jan 2012 09:38:23 -  1.59
+++ if_ix.c 20 Jan 2012 14:02:04 -
@@ -209,7 +209,7 @@ ixgbe_attach(struct device *parent, stru
INIT_DEBUGOUT(ixgbe_attach: begin);
 
sc-osdep.os_sc = sc;
-   sc-osdep.os_pa = pa;
+   sc-osdep.os_pa = *pa;
 
/* Core Lock Init*/
mtx_init(sc-core_mtx, IPL_NET);
@@ -1400,7 +1400,7 @@ void
 ixgbe_identify_hardware(struct ix_softc *sc)
 {
struct ixgbe_osdep  *os = sc-osdep;
-   struct pci_attach_args  *pa = os-os_pa;
+   struct pci_attach_args  *pa = os-os_pa;
uint32_t reg;
 
/* Save off the information about this board */
@@ -1534,7 +1534,7 @@ ixgbe_allocate_legacy(struct ix_softc *s
 {
struct ifnet*ifp = sc-arpcom.ac_if;
struct ixgbe_osdep  *os = sc-osdep;
-   struct pci_attach_args  *pa = os-os_pa;
+   struct pci_attach_args  *pa = os-os_pa;
const char  *intrstr = NULL;
pci_chipset_tag_t   pc = pa-pa_pc;
pci_intr_handle_t   ih;
@@ -1576,7 +1576,7 @@ int
 ixgbe_allocate_pci_resources(struct ix_softc *sc)
 {
struct ixgbe_osdep  *os = sc-osdep;
-   struct pci_attach_args  *pa = os-os_pa;
+   struct pci_attach_args  *pa = os-os_pa;
int  val;
 
val = pci_conf_read(pa-pa_pc, pa-pa_tag, PCIR_BAR(0));
@@ -1609,7 +1609,7 @@ void
 ixgbe_free_pci_resources(struct ix_softc * sc)
 {
struct ixgbe_osdep  *os = sc-osdep;
-   struct pci_attach_args  *pa = os-os_pa;
+   struct pci_attach_args  *pa = os-os_pa;
struct ix_queue *que = sc-queues;
int i;
 
@@ -1749,7 +1749,7 @@ ixgbe_dma_malloc(struct ix_softc *sc, bu
struct ixgbe_osdep  *os = sc-osdep;
int  r;
 
-   dma-dma_tag = os-os_pa-pa_dmat;
+   dma-dma_tag = os-os_pa.pa_dmat;
r = bus_dmamap_create(dma-dma_tag, size, 1,
size, 0, BUS_DMA_NOWAIT, dma-dma_map);
if (r != 0) {
@@ -3330,7 +3330,7 @@ ixgbe_read_pci_cfg(struct ixgbe_hw *hw, 
high = 1;
reg = ~0x2;
}
-   pa = ((struct ixgbe_osdep *)hw-back)-os_pa;
+   pa = ((struct ixgbe_osdep *)hw-back)-os_pa;
value = pci_conf_read(pa-pa_pc, pa-pa_tag, reg);
 
if (high)
@@ -3351,7 +3351,7 @@ ixgbe_write_pci_cfg(struct ixgbe_hw *hw,
high = 1;
reg = ~0x2;
}
-   pa = ((struct ixgbe_osdep *)hw-back)-os_pa;
+   pa = ((struct ixgbe_osdep *)hw-back)-os_pa;
rv = pci_conf_read(pa-pa_pc, pa-pa_tag, reg);
if (!high)
rv = (rv  0x) | value;
Index: ixgbe.h
===
RCS file: /cvs/src/sys/dev/pci/ixgbe.h,v
retrieving revision 1.7
diff -u -p -r1.7 ixgbe.h
--- ixgbe.h 10 Jun 2011 12:46:35 -  1.7
+++ ixgbe.h 20 Jan 2012 14:02:04 -
@@ -127,7 +127,7 @@ struct ixgbe_osdep {
bus_addr_t   os_membase;
 
void*os_sc;
-   struct pci_attach_args  *os_pa;
+   struct pci_attach_args   os_pa;
 };
 
 extern uint16_t ixgbe_read_pci_cfg(struct ixgbe_hw *, uint32_t);



Fix page deactivation in uvm_vnode.c when clustering

2011-06-07 Thread Christian Ehrhardt
Hi,

while reading through uvm code I stubled accross a piece of code that
appears to be buggy. Here's the proposed patch, rational follows:

Index: uvm_vnode.c
===
RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v
retrieving revision 1.71
diff -u -r1.71 uvm_vnode.c
--- uvm_vnode.c 18 May 2010 04:41:14 -  1.71
+++ uvm_vnode.c 7 Jun 2011 16:42:15 -
@@ -924,8 +924,8 @@
 */
 
if (flags  PGO_DEACTIVATE) {
-   if ((pp-pg_flags  PQ_INACTIVE) == 0 
-   pp-wire_count == 0) {
+   if ((ptmp-pg_flags  PQ_INACTIVE) == 0 
+   ptmp-wire_count == 0) {
pmap_page_protect(ptmp, VM_PROT_NONE);
uvm_pagedeactivate(ptmp);
}

This code handles (among other things) write back of dirty pages to disk.
It calls uvm_pager_put to do the actual IO (one call per page). To improve
performance, uvm_pager_put is allowed to do IO for adjacent pages as well
and returns an array of all pages that must be unbusied by the caller
(uvn_flush).

The code above is part of the loop that does this. pp is the initial page
passed to uvm_pager_put and ptmp is the page that we are about to unbusy.
Thus we should IMHO check the PQ_INACTIVE flag and the wire_count of ptmp
instead of pp.

Coments appreciated!

   regardsChristian



Re: Remove useless sti in x86 interrupt return path

2011-04-19 Thread Christian Ehrhardt
Hi,

On Mon, Apr 18, 2011 at 10:00:02PM -0700, Philip Guenther wrote:
  The sti was introduced in revision 1.97 of locore.s in March 2006 by
  mickey@. Commit message:
 
  | prevent the faults on iret to run w/ disabled intrs and cause
  | deadlocks; niklas toby tom ok
 
  Maybe mickey or one of the people giving oks back then want to comment?
 
 That predates my involvement with OpenBSD, but I think get the gist.
 iret can fault if, for example, the user corrupts certain members of
 the sigcontext passed to a signal handler.  If that happens the the
 CPU generates an exception on the iret and you end up in trap() where
 it peeks at the instruction that triggered the trap and, seeing it was
 iret, arranges to 'return' to the resume_iret label to generate what
 looks like a normal fault against the process.  The goal of that sti
 is to keep that 'normal' fault from being processed with interrupts
 blocked.

Point taken, I missed the sigreturn path. However, note that iret can
only fault if the values of the segment registers stored on the kernel
stack are bogus. In particular iret will not fault on the new EIP address.
(The address can page fault but the fault will be on the instruction
pointed to by the new CS:EIP and not in iret).

With that in mind, the question is: Is it really a problem if the trap
handler runs with interrupts disabled? See below.

 So, possibility one: sti's effect is delayed to the end of next
 instruction.

This is true, the effect of sti is delayed by one instruction. This is
documented in the intel manual.

 What happens if you put it immediately before an iret
 that faults?  If the flags pushed on the stack by the fault have
 interrupts unblocked, then simply moving the sti down a line to
 between the addl and the iret would be enough.  If you can reproduce
 this situation easily than I would ask that you try that.

This solves the stack overflow that I see, but it results in
inconsistent code. See below.

 Possibility two would be to try to handle this from, uh, inside
 resume_iret I guess.  I'm not 100% sure whether it can be unilateral
 there though.

This would be my preferred solution. In particular, I see two possibilities:

* Maybe it is not even a problem is irqs are sometimes disabled during
  trap.
* It is a problem if trap runs with irqs disabled. In this case, userland
  can easily trigger this by trapping on one of the pop instructions for
  ds, es, fs or gs. Those instructions in the interrupt return path run
  with irqs disabled. Thus moving the sti in the interupt return path down
  one instruction immediately before the iret does not really help.
  The proper solution would probably be to add explicit calls to sti in all
  of the resume paths (or even explicitly before calling trap).

regardsChristian



Remove useless sti in x86 interrupt return path

2011-04-18 Thread Christian Ehrhardt
Hi,

we are seeing kernel hangs (hard reset required) under high interrupt
load (triggered by ix(4) cards). I've traced this back to an overflow
of the kernel stack. This is how the stack looks at the time of the
overflow (manual backtrace, as ddb won't show the final part):

EBP (%EBP)  4(%EBP)
0xe13b825c: 0xe13b827c  0xd02d8b1d  panic
0xe13b827c: 0xe13b82ac  0xd02d1a42  timeout_add
0xe13b82ac: 0xe13b82dc  0xd02bfd6e  hardclock
0xe13b82dc: 0xe13b82fc  0xd04dddca  lapic_clockintr
0xe13b82fc: 0xe13b8304  0xd0202475  Xintrltimer
/* OLD EBP at ebp+0x20(9 words) OLD EIP at ebp+0x3c (15words) */
EBP=0xe13b8384  EIP=0xd02d7c31  pool_do_put
0xe13b8384: 0xe13b83a4  0xd02d7b6f  pool_put
0xe13b83a4: 0xe13b83d4  0xd02e7f26  m_free
0xe13b83d4: 0xe13b83f4  0xd02e7fa9  m_freem
0xe13b83f4: 0xe13b8434  0xd030a492  ether_input
0xe13b8434: 0xe13b8474  0xd043f2cf  ixgbe_rxeof
0xe13b8474: 0xe13b84b4  0xd043c7ed  ixgbe_legacy_irq
0xe13b84b4: 0xe13b84bc  0xd02027ee  Xintr_ioapic1
EBP=0xe13b9e10  EIP=d0202128Xdoreti (before iret)
0xe13b9e10: 0xe13b9e40  0xd028bd27  pf_pull_hdr
0xe13b9e40: 0xe13b9f40  0xd028d570  pf_test
0xe13b9f40: 0xe13b9f80  0xd03478b4  ipv4_input
0xe13b9f80: 0xe13b9fa0  0xd0347765  ipintr
0xe13b9fa0: 0xe13b9fa0  0xd0202182  Xsoftnet

Note that the size of the stack frame between Xintr_ioapic1 and pf_pull_hdr
is huge (~6k). This area is filled with a 12 byte pattern that looks like
a code address, the kernel code segment and a pushed eflags register. The
code address points to the interrupt return path in this piece of code from
i386/locore.s:

#define INTRFASTEXIT \
popl%fs ; \
popl%gs ; \
popl%es ; \
popl%ds ; \
popl%edi; \
popl%esi; \
popl%ebp; \
popl%ebx; \
popl%edx; \
popl%ecx; \
popl%eax; \
sti ; \= (1)
addl$8,%esp ; \
iret   = (2)

The return address points to the iret marked with (2). I.e. we get hit by
an interrupt immediately before the iret and this happens repeatedly until
the kernel stack overflows. This is only possible due to the sti instruction
at the point marked (1). As iret will restore the pushed eflags value anyway,
it should be safe to remove the sti altogether. Discussed with bluhm@ and
hshoexer@. Patch follows:

Index: locore.s
===
RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
retrieving revision 1.130
diff -u -r1.130 locore.s
--- locore.s3 Jul 2010 04:54:32 -   1.130
+++ locore.s18 Apr 2011 13:52:16 -
@@ -128,7 +128,6 @@
popl%edx; \
popl%ecx; \
popl%eax; \
-   sti ; \
addl$8,%esp ; \
iret
 

The sti was introduced in revision 1.97 of locore.s in March 2006 by
mickey@. Commit message:

| prevent the faults on iret to run w/ disabled intrs and cause
| deadlocks; niklas toby tom ok

Maybe mickey or one of the people giving oks back then want to comment?

 regards Christian