Re: SiFive Unmatched if_cad fix

2021-06-26 Thread Mark Kettenis
> Date: Sat, 26 Jun 2021 11:24:57 +
> From: Visa Hankala 
> 
> On Fri, Jun 25, 2021 at 04:15:43PM +0200, Mark Kettenis wrote:
> > > Date: Fri, 25 Jun 2021 13:27:28 +
> > > From: Visa Hankala 
> > > 
> > > On Thu, Jun 24, 2021 at 07:02:11PM +, Mickael Torres wrote:
> > > > Hello,
> > > > 
> > > > On the risc-v SiFive Unmatched the internal cad0 ethernet interface
> > > > stops working randomly after some packets are sent/received. It looks
> > > > like it's because the bus_dmamap used isn't restricted to lower than
> > > > 4GB physical addresses, and the interface itself is.
> > > 
> > > I am surprised that this has not been raised before. I also wonder if
> > > riscv64's DMA constraints are fully sane.
> > 
> > There is no DMA constraint on riscv64 yet.  We try to avoid having
> > such a constraint on platforms that don't have a long history, hoping
> > those platforms are (and remain) 64-bit "clean".  And on some modern
> > platforms (e.g. arm64) there is no memory below 4GB, so we can't have
> > a DMA constraint on those platforms.  The jury is still out where
> > riscv64 will end up.
> > 
> > There is infrastructure to have the bootloader set the DMA constraint
> > based on the device tree.
> > 
> > > > Configuring the interface for 64 bits DMA fixes the problem, and
> > > > the machine is now useable with its internal ethernet port.
> > > > 
> > > > I didn't test very extensively, but it was very easy to run into
> > > > "cad0: hresp error, interface stopped" before the patch. After the
> > > > patch it survived a couple hours of tests and ping -f from and to
> > > > it.
> > > > 
> > > > It is now depending on being compiled for __riscv64__ or not, would
> > > > it be better to do it dynamically when matching
> > > > "sifive,fu740-c000-gem" ?
> > > 
> > > Hopefully all 64-bit platforms have a 64-bit capable revision of the
> > > controller.
> > 
> > So far that seems to be true.  The controller on the PolarFire SoC is
> > also 64-bit capable.
> > 
> > > However, I would avoid #ifdef'ing and make the selecting of
> > > the DMA mode happen at runtime.
> > > 
> > > Below is how I had envisioned how the driver should work.
> > > 
> > > I have not tested the 64-bit side of the patch.
> > 
> > Seems to work fine.  The diff looks good to me.  Your diff does not
> > set a 4GB boundary in the bus_dmamap_create() call for the rings.  It
> > works without that, but if there really is a hardware constraint in
> > crossing a 4GB boundary we may need to add this in.
> 
> So far I have not spotted such a restriction in the documentation
> of Zynq UltraScale+ and PolarFire SoCs. These SoCs have 64-bit capable
> GEM controllers.
> 
> The standalone GEM driver for Zynq in Xilinx embeddedsw library does
> have comments about not crossing the 0x boundary. However, those
> comments predate and seem inconsistent with 64-bit code.

Good.

> > The hardware has a register that indicates whether 64-bit DMA is
> > supported.  Maybe we should look at that instead of checking the
> > compatible string.  But let's get this in and tweak it later.
> 
> The register is undocumented on the Zynq-7000. Reading the register
> returns a constant (?) value (0x200), but the value probably means
> something different.
> 
> However, making the register access conditional to GEM version might
> work. Xilinx Zynq UltraScale+ and SiFive HiFive Unmatched have GEM
> version 0x7, whereas MicroSemi PolarFire and Xilinx Versal appear to
> have GEM version 0x107.
> 
> In addition, as cad(4) is now able to use 64-bit DMA, the DMA maps could
> be created with the 64-bit capability turned on.
> 
> The diff is untested on 64-bit hardware.

Looks good to me and works on the hifive unmatched.

ok kettenis@

> Index: dev/fdt/if_cad.c
> ===
> RCS file: src/sys/dev/fdt/if_cad.c,v
> retrieving revision 1.4
> diff -u -p -r1.4 if_cad.c
> --- dev/fdt/if_cad.c  26 Jun 2021 10:47:59 -  1.4
> +++ dev/fdt/if_cad.c  26 Jun 2021 11:08:36 -
> @@ -126,6 +126,8 @@
>  #define GEM_LADDRH(i)          (0x008c + (i) * 8)
>  #define GEM_LADDRNUM           4
>  #define GEM_MID                0x00fc
> +#define  GEM_MID_VERSION_MASK  (0xfff << 16)
> +#define  GEM_MID_VERSION_SHIFT 16
>  #define GEM_OCTTXL             0x0100
>  #define GEM_OCTTXH             0x0104
>  #define GEM_TXCNT              0x0108
> @@ -169,6 +171,8 @@
>  #define GEM_RXIPCCNT           0x01a8
>  #define GEM_RXTCPCCNT          0x01ac
>  #define GEM_RXUDPCCNT          0x01b0
> +#define GEM_CFG6               0x0294
> +#define  GEM_CFG6_DMA64        (1 << 23)
>  #define GEM_TXQBASEHI          0x04c8
>  #define GEM_RXQBASEHI          0x04d4
>  
> @@ -364,6 +368,7 @@ cad_attach(struct device *parent, struct
> 

Re: SiFive Unmatched if_cad fix

2021-06-26 Thread Visa Hankala
On Fri, Jun 25, 2021 at 04:15:43PM +0200, Mark Kettenis wrote:
> > Date: Fri, 25 Jun 2021 13:27:28 +
> > From: Visa Hankala 
> > 
> > On Thu, Jun 24, 2021 at 07:02:11PM +, Mickael Torres wrote:
> > > Hello,
> > > 
> > > On the risc-v SiFive Unmatched the internal cad0 ethernet interface stops 
> > > working randomly after some packets are sent/received. It looks like it's 
> > > because the bus_dmamap used isn't restricted to lower than 4GB physical 
> > > addresses, and the interface itself is.
> > 
> > I am surprised that this has not been raised before. I also wonder if
> > riscv64's DMA constraints are fully sane.
> 
> There is no DMA constraint on riscv64 yet.  We try to avoid having
> such a constraint on platforms that don't have a long history, hoping
> those platforms are (and remain) 64-bit "clean".  And on some modern
> platforms (e.g. arm64) there is no memory below 4GB, so we can't have
> a DMA constraint on those platforms.  The jury is still out where
> riscv64 will end up.
> 
> There is infrastructure to have the bootloader set the DMA constraint
> based on the device tree.
> 
> > > Configuring the interface for 64 bits DMA fixes the problem, and
> > > the machine is now useable with its internal ethernet port.
> > > 
> > > I didn't test very extensively, but it was very easy to run into
> > > "cad0: hresp error, interface stopped" before the patch. After the
> > > patch it survived a couple hours of tests and ping -f from and to
> > > it.
> > > 
> > > It is now depending on being compiled for __riscv64__ or not, would it be 
> > > better to do it dynamically when matching "sifive,fu740-c000-gem" ?
> > 
> > Hopefully all 64-bit platforms have a 64-bit capable revision of the
> > controller.
> 
> So far that seems to be true.  The controller on the PolarFire SoC is
> also 64-bit capable.
> 
> > However, I would avoid #ifdef'ing and make the selecting of
> > the DMA mode happen at runtime.
> > 
> > Below is how I had envisioned how the driver should work.
> > 
> > I have not tested the 64-bit side of the patch.
> 
> Seems to work fine.  The diff looks good to me.  Your diff does not
> set a 4GB boundary in the bus_dmamap_create() call for the rings.  It
> works without that, but if there really is a hardware constraint in
> crossing a 4GB boundary we may need to add this in.

So far I have not spotted such a restriction in the documentation
of Zynq UltraScale+ and PolarFire SoCs. These SoCs have 64-bit capable
GEM controllers.

The standalone GEM driver for Zynq in Xilinx embeddedsw library does
have comments about not crossing the 0x boundary. However, those
comments predate and seem inconsistent with 64-bit code.

> The hardware has a register that indicates whether 64-bit DMA is
> supported.  Maybe we should look at that instead of checking the
> compatible string.  But let's get this in and tweak it later.

The register is undocumented on the Zynq-7000. Reading the register
returns a constant (?) value (0x200), but the value probably means
something different.

However, making the register access conditional to GEM version might
work. Xilinx Zynq UltraScale+ and SiFive HiFive Unmatched have GEM
version 0x7, whereas MicroSemi PolarFire and Xilinx Versal appear to
have GEM version 0x107.

In addition, as cad(4) is now able to use 64-bit DMA, the DMA maps could
be created with the 64-bit capability turned on.

The diff is untested on 64-bit hardware.
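
To illustrate the version-gated check described above, here is a minimal
standalone sketch that works on raw register values. The masks and the
CFG6 bit are the ones from the diff; gem_version(), gem_has_dma64() and
the GEM_VERSION_CFG6_MIN threshold of 0x7 are assumptions made for this
illustration, picked from the versions listed above, and in the driver
the two values would come from HREAD4().

```c
#include <stdbool.h>
#include <stdint.h>

/* Field definitions as in the diff in this message. */
#define GEM_MID_VERSION_MASK	(0xfffU << 16)
#define GEM_MID_VERSION_SHIFT	16
#define GEM_CFG6_DMA64		(1U << 23)

/* Assumption: lowest GEM version on which CFG6 is consulted. */
#define GEM_VERSION_CFG6_MIN	0x7U

/* Extract the GEM version field from a raw GEM_MID value. */
static uint32_t
gem_version(uint32_t mid)
{
	return (mid & GEM_MID_VERSION_MASK) >> GEM_MID_VERSION_SHIFT;
}

/*
 * Decide on 64-bit DMA from raw register values.  The CFG6 value is
 * ignored on versions below the threshold, where the register may be
 * undocumented or mean something else.
 */
static bool
gem_has_dma64(uint32_t mid, uint32_t cfg6)
{
	if (gem_version(mid) < GEM_VERSION_CFG6_MIN)
		return false;
	return (cfg6 & GEM_CFG6_DMA64) != 0;
}
```

Gating on the version first means the CFG6 value is never interpreted on
a controller where its meaning is unknown.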

Index: dev/fdt/if_cad.c
===
RCS file: src/sys/dev/fdt/if_cad.c,v
retrieving revision 1.4
diff -u -p -r1.4 if_cad.c
--- dev/fdt/if_cad.c26 Jun 2021 10:47:59 -  1.4
+++ dev/fdt/if_cad.c26 Jun 2021 11:08:36 -
@@ -126,6 +126,8 @@
 #define GEM_LADDRH(i)  (0x008c + (i) * 8)
 #define GEM_LADDRNUM   4
 #define GEM_MID    0x00fc
+#define  GEM_MID_VERSION_MASK  (0xfff << 16)
+#define  GEM_MID_VERSION_SHIFT 16
 #define GEM_OCTTXL 0x0100
 #define GEM_OCTTXH 0x0104
 #define GEM_TXCNT  0x0108
@@ -169,6 +171,8 @@
 #define GEM_RXIPCCNT   0x01a8
 #define GEM_RXTCPCCNT  0x01ac
 #define GEM_RXUDPCCNT  0x01b0
+#define GEM_CFG6   0x0294
+#define  GEM_CFG6_DMA64    (1 << 23)
 #define GEM_TXQBASEHI  0x04c8
 #define GEM_RXQBASEHI  0x04d4
 
@@ -364,6 +368,7 @@ cad_attach(struct device *parent, struct
    struct cad_softc *sc = (struct cad_softc *)self;
    struct ifnet *ifp = &sc->sc_ac.ac_if;
    uint32_t hi, lo;
+   uint32_t rev, ver;
    unsigned int i;
    int node, phy;
 
@@ -416,8 +421,12 @@ cad_attach(struct device *parent, struct
        }
    }
 
+   rev = HREAD4(sc, GEM_MID);
+   ver = (rev & GEM_MID_VERSION_MASK) >> GEM_MID_VERSION_SHIFT;
+

Re: SiFive Unmatched if_cad fix

2021-06-25 Thread Theo de Raadt
Mark Kettenis  wrote:

> > I am surprised that this has not been raised before. I also wonder if
> > riscv64's DMA constraints are fully sane.
> 
> There is no DMA constraint on riscv64 yet.  We try to avoid having
> such a constraint on platforms that don't have a long history, hoping
> those platforms are (and remain) 64-bit "clean".  And on some modern
> platforms (e.g. arm64) there is no memory below 4GB, so we can't have
> a DMA constraint on those platforms.  The jury is still out where
> riscv64 will end up.
> 
> There is infrastructure to have the bootloader set the DMA constraint
> based on the device tree.

The bootloader mechanism is a little weird.

32-bit dma problems are going to come from pci devices which are not in
the device tree.

mbufs are the worst, because they are received on one device and
'routed' for output on a different device.  So if you give 64-bit
reachable memory to an input device, but the output device is dma
constrained, that turns into a nasty crash.

For non-network subsystems, this problem is easier because the memory
objects stay inside a single driver.
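
To make the failure mode concrete, here is a small standalone sketch (not
kernel code). struct dma_limit and dma_reachable() are made-up names for
this illustration; the point is only that a buffer handed to an input
device with a 64-bit reach can fail the same check on a 32-bit output
device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of how far a device's DMA engine reaches. */
struct dma_limit {
	uint64_t dl_maxaddr;	/* highest reachable physical address */
};

/*
 * True if every byte of [pa, pa + len) is reachable by the device.
 * Written to avoid overflow when pa + len would wrap.
 */
static bool
dma_reachable(const struct dma_limit *dl, uint64_t pa, uint64_t len)
{
	if (len == 0)
		return true;
	if (pa > dl->dl_maxaddr)
		return false;
	return len - 1 <= dl->dl_maxaddr - pa;
}
```

An mbuf cluster at physical address 0x100000000 passes this check for a
device with dl_maxaddr = UINT64_MAX and fails it for one with
dl_maxaddr = 0xffffffff, which is the routed-packet case described above.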



Re: SiFive Unmatched if_cad fix

2021-06-25 Thread Mark Kettenis
> Date: Fri, 25 Jun 2021 13:27:28 +
> From: Visa Hankala 
> 
> On Thu, Jun 24, 2021 at 07:02:11PM +, Mickael Torres wrote:
> > Hello,
> > 
> > On the risc-v SiFive Unmatched the internal cad0 ethernet interface stops 
> > working randomly after some packets are sent/received. It looks like it's 
> > because the bus_dmamap used isn't restricted to lower than 4GB physical 
> > addresses, and the interface itself is.
> 
> I am surprised that this has not been raised before. I also wonder if
> riscv64's DMA constraints are fully sane.

There is no DMA constraint on riscv64 yet.  We try to avoid having
such a constraint on platforms that don't have a long history, hoping
those platforms are (and remain) 64-bit "clean".  And on some modern
platforms (e.g. arm64) there is no memory below 4GB, so we can't have
a DMA constraint on those platforms.  The jury is still out where
riscv64 will end up.

There is infrastructure to have the bootloader set the DMA constraint
based on the device tree.

> > Configuring the interface for 64 bits DMA fixes the problem, and
> > the machine is now useable with its internal ethernet port.
> > 
> > I didn't test very extensively, but it was very easy to run into
> > "cad0: hresp error, interface stopped" before the patch. After the
> > patch it survived a couple hours of tests and ping -f from and to
> > it.
> > 
> > It is now depending on being compiled for __riscv64__ or not, would it be 
> > better to do it dynamically when matching "sifive,fu740-c000-gem" ?
> 
> Hopefully all 64-bit platforms have a 64-bit capable revision of the
> controller.

So far that seems to be true.  The controller on the PolarFire SoC is
also 64-bit capable.

> However, I would avoid #ifdef'ing and make the selecting of
> the DMA mode happen at runtime.
> 
> Below is how I had envisioned how the driver should work.
> 
> I have not tested the 64-bit side of the patch.

Seems to work fine.  The diff looks good to me.  Your diff does not
set a 4GB boundary in the bus_dmamap_create() call for the rings.  It
works without that, but if there really is a hardware constraint in
crossing a 4GB boundary we may need to add this in.
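
For reference, the condition such a boundary would rule out can be
written as a small standalone check. This is only a sketch:
crosses_boundary() and BOUNDARY_4GB are names made up here, and in the
driver the constraint would instead be expressed through the boundary
argument of bus_dmamap_create().

```c
#include <stdbool.h>
#include <stdint.h>

#define BOUNDARY_4GB	(1ULL << 32)

/*
 * True if [pa, pa + len) spans more than one boundary-sized window.
 * boundary must be a power of two; pa + len is assumed not to wrap.
 */
static bool
crosses_boundary(uint64_t pa, uint64_t len, uint64_t boundary)
{
	uint64_t mask = ~(boundary - 1);

	if (len == 0)
		return false;
	return (pa & mask) != ((pa + len - 1) & mask);
}
```

A ring that starts at 0xfffff000 and is 0x2000 bytes long crosses the
boundary, so its descriptors would not all share the same upper 32
address bits.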

The hardware has a register that indicates whether 64-bit DMA is
supported.  Maybe we should look at that instead of checking the
compatible string.  But let's get this in and tweak it later.

ok kettenis@


> Index: dev/fdt/if_cad.c
> ===
> RCS file: src/sys/dev/fdt/if_cad.c,v
> retrieving revision 1.2
> diff -u -p -r1.2 if_cad.c
> --- dev/fdt/if_cad.c  13 Jun 2021 02:56:48 -  1.2
> +++ dev/fdt/if_cad.c  25 Jun 2021 13:18:22 -
> @@ -81,6 +81,7 @@
>  #define GEM_NETSR                0x0008
>  #define  GEM_NETSR_PHY_MGMT_IDLE (1 << 2)
>  #define GEM_DMACR                0x0010
> +#define  GEM_DMACR_DMA64         (1 << 30)
>  #define  GEM_DMACR_AHBDISC       (1 << 24)
>  #define  GEM_DMACR_RXBUF_MASK    (0xff << 16)
>  #define  GEM_DMACR_RXBUF_SHIFT   16
> @@ -168,6 +169,8 @@
>  #define GEM_RXIPCCNT             0x01a8
>  #define GEM_RXTCPCCNT            0x01ac
>  #define GEM_RXUDPCCNT            0x01b0
> +#define GEM_TXQBASEHI            0x04c8
> +#define GEM_RXQBASEHI            0x04d4
>  
>  #define GEM_CLK_TX               "tx_clk"
>  
> @@ -183,11 +186,18 @@ struct cad_dmamem {
>     caddr_t cdm_kva;
>  };
>  
> -struct cad_desc {
> +struct cad_desc32 {
>     uint32_t d_addr;
>     uint32_t d_status;
>  };
>  
> +struct cad_desc64 {
> +   uint32_t d_addrlo;
> +   uint32_t d_status;
> +   uint32_t d_addrhi;
> +   uint32_t d_unused;
> +};
> +
>  #define GEM_RXD_ADDR_WRAP        (1 << 1)
>  #define GEM_RXD_ADDR_USED        (1 << 0)
>  
> @@ -250,6 +260,8 @@ struct cad_softc {
>     enum cad_phy_mode   sc_phy_mode;
>     unsigned char       sc_rxhang_erratum;
>     unsigned char       sc_rxdone;
> +   unsigned char       sc_dma64;
> +   size_t              sc_descsize;
>  
>     struct mii_data     sc_mii;
>  #define sc_media sc_mii.mii_media
> @@ -257,14 +269,14 @@ struct cad_softc {
>  
>     struct cad_dmamem   *sc_txring;
>     struct cad_buf      *sc_txbuf;
> -   struct cad_desc     *sc_txdesc;
> +   caddr_t             sc_txdesc;
>     unsigned int        sc_tx_prod;
>     unsigned int        sc_tx_cons;
>  
>     struct if_rxring    sc_rx_ring;
>     struct cad_dmamem   *sc_rxring;
>     struct cad_buf      *sc_rxbuf;
> -   struct cad_desc     *sc_rxdesc;
> +   caddr_t             sc_rxdesc;
>     unsigned int        sc_rx_prod;
>     unsigned int        sc_rx_cons;
>     uint32_t            sc_netctl;
> @@ -409,6 +421,12 @@ 

Re: SiFive Unmatched if_cad fix

2021-06-25 Thread Theo de Raadt
Visa Hankala  wrote:

> On Thu, Jun 24, 2021 at 07:02:11PM +, Mickael Torres wrote:
> > Hello,
> > 
> > On the risc-v SiFive Unmatched the internal cad0 ethernet interface stops 
> > working randomly after some packets are sent/received. It looks like it's 
> > because the bus_dmamap used isn't restricted to lower than 4GB physical 
> > addresses, and the interface itself is.
> 
> I am surprised that this has not been raised before. I also wonder if
> riscv64's DMA constraints are fully sane.

This has been discussed elsewhere.

I believe as a rule, dma constraints should be safe, so that it works in
all realistic cases, including plugging 32-bit dma-restricted devices into
a machine's 64-bit dma-unrestricted busses.  Limiting physical memory which
is submitted to the dma subsystems into the lower 32-bit range is safer.
When this isn't satisfied, the dma subsystems tend to crash the machine.



Re: SiFive Unmatched if_cad fix

2021-06-25 Thread Visa Hankala
On Thu, Jun 24, 2021 at 07:02:11PM +, Mickael Torres wrote:
> Hello,
> 
> On the risc-v SiFive Unmatched the internal cad0 ethernet interface stops 
> working randomly after some packets are sent/received. It looks like it's 
> because the bus_dmamap used isn't restricted to lower than 4GB physical 
> addresses, and the interface itself is.

I am surprised that this has not been raised before. I also wonder if
riscv64's DMA constraints are fully sane.

> Configuring the interface for 64 bits DMA fixes the problem, and the machine 
> is now useable with its internal ethernet port.
> 
> I didn't test very extensively, but it was very easy to run into 
> "cad0: hresp error, interface stopped" before the patch. After the patch it 
> survived a couple hours of tests and ping -f from and to it.
> 
> It is now depending on being compiled for __riscv64__ or not, would it be 
> better to do it dynamically when matching "sifive,fu740-c000-gem" ?

Hopefully all 64-bit platforms have a 64-bit capable revision of the
controller. However, I would avoid #ifdef'ing and make the selecting of
the DMA mode happen at runtime.

Below is how I had envisioned how the driver should work.

I have not tested the 64-bit side of the patch.
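
As a standalone illustration of the runtime approach (separate from the
diff that follows), the sketch below keeps the ring as raw bytes and
picks the descriptor layout through a stored descriptor size and flag.
The two descriptor structs match the ones in the diff; struct ring and
ring_set_addr() are hypothetical names, and the DMA syncs and ordering
the real driver needs are left out.

```c
#include <stddef.h>
#include <stdint.h>

struct cad_desc32 {
	uint32_t d_addr;
	uint32_t d_status;
};

struct cad_desc64 {
	uint32_t d_addrlo;
	uint32_t d_status;
	uint32_t d_addrhi;
	uint32_t d_unused;
};

/* Hypothetical ring state; the driver keeps this in its softc. */
struct ring {
	unsigned char	*r_base;	/* start of descriptor memory */
	size_t		 r_descsize;	/* sizeof the chosen descriptor */
	int		 r_dma64;	/* nonzero: 64-bit layout */
};

/* Store a buffer address into descriptor idx, using the runtime layout. */
static void
ring_set_addr(struct ring *r, unsigned int idx, uint64_t addr)
{
	unsigned char *p = r->r_base + (size_t)idx * r->r_descsize;

	if (r->r_dma64) {
		struct cad_desc64 *d = (struct cad_desc64 *)p;

		d->d_addrhi = (uint32_t)(addr >> 32);
		d->d_addrlo = (uint32_t)addr;
	} else {
		struct cad_desc32 *d = (struct cad_desc32 *)p;

		d->d_addr = (uint32_t)addr;
	}
}
```

Indexing by byte offset rather than by array element is what lets one
compiled driver serve both descriptor sizes.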

Index: dev/fdt/if_cad.c
===
RCS file: src/sys/dev/fdt/if_cad.c,v
retrieving revision 1.2
diff -u -p -r1.2 if_cad.c
--- dev/fdt/if_cad.c13 Jun 2021 02:56:48 -  1.2
+++ dev/fdt/if_cad.c25 Jun 2021 13:18:22 -
@@ -81,6 +81,7 @@
 #define GEM_NETSR  0x0008
 #define  GEM_NETSR_PHY_MGMT_IDLE   (1 << 2)
 #define GEM_DMACR  0x0010
+#define  GEM_DMACR_DMA64   (1 << 30)
 #define  GEM_DMACR_AHBDISC (1 << 24)
 #define  GEM_DMACR_RXBUF_MASK  (0xff << 16)
 #define  GEM_DMACR_RXBUF_SHIFT 16
@@ -168,6 +169,8 @@
 #define GEM_RXIPCCNT   0x01a8
 #define GEM_RXTCPCCNT  0x01ac
 #define GEM_RXUDPCCNT  0x01b0
+#define GEM_TXQBASEHI  0x04c8
+#define GEM_RXQBASEHI  0x04d4
 
 #define GEM_CLK_TX "tx_clk"
 
@@ -183,11 +186,18 @@ struct cad_dmamem {
    caddr_t cdm_kva;
 };
 
-struct cad_desc {
+struct cad_desc32 {
    uint32_t d_addr;
    uint32_t d_status;
 };
 
+struct cad_desc64 {
+   uint32_t d_addrlo;
+   uint32_t d_status;
+   uint32_t d_addrhi;
+   uint32_t d_unused;
+};
+
 #define GEM_RXD_ADDR_WRAP  (1 << 1)
 #define GEM_RXD_ADDR_USED  (1 << 0)
 
@@ -250,6 +260,8 @@ struct cad_softc {
    enum cad_phy_mode   sc_phy_mode;
    unsigned char       sc_rxhang_erratum;
    unsigned char       sc_rxdone;
+   unsigned char       sc_dma64;
+   size_t              sc_descsize;
 
    struct mii_data     sc_mii;
 #define sc_media   sc_mii.mii_media
@@ -257,14 +269,14 @@ struct cad_softc {
 
    struct cad_dmamem   *sc_txring;
    struct cad_buf      *sc_txbuf;
-   struct cad_desc     *sc_txdesc;
+   caddr_t             sc_txdesc;
    unsigned int        sc_tx_prod;
    unsigned int        sc_tx_cons;
 
    struct if_rxring    sc_rx_ring;
    struct cad_dmamem   *sc_rxring;
    struct cad_buf      *sc_rxbuf;
-   struct cad_desc     *sc_rxdesc;
+   caddr_t             sc_rxdesc;
    unsigned int        sc_rx_prod;
    unsigned int        sc_rx_cons;
    uint32_t            sc_netctl;
@@ -409,6 +421,12 @@ cad_attach(struct device *parent, struct
        }
    }
 
+   sc->sc_descsize = sizeof(struct cad_desc32);
+   if (OF_is_compatible(faa->fa_node, "sifive,fu740-c000-gem")) {
+       sc->sc_descsize = sizeof(struct cad_desc64);
+       sc->sc_dma64 = 1;
+   }
+
    if (OF_is_compatible(faa->fa_node, "cdns,zynq-gem"))
        sc->sc_rxhang_erratum = 1;
 
@@ -547,6 +565,10 @@ cad_reset(struct cad_softc *sc)
    HWRITE4(sc, GEM_IDR, ~0U);
    HWRITE4(sc, GEM_RXSR, 0);
    HWRITE4(sc, GEM_TXSR, 0);
+   if (sc->sc_dma64) {
+       HWRITE4(sc, GEM_RXQBASEHI, 0);
+       HWRITE4(sc, GEM_TXQBASEHI, 0);
+   }
    HWRITE4(sc, GEM_RXQBASE, 0);
    HWRITE4(sc, GEM_TXQBASE, 0);
 
@@ -573,7 +595,9 @@ cad_up(struct cad_softc *sc)
 {
    struct ifnet *ifp = &sc->sc_ac.ac_if;
    struct cad_buf *rxb, *txb;
-   struct cad_desc *rxd, *txd;
+   struct cad_desc32 *desc32;
+   struct cad_desc64 *desc64;
+   uint64_t addr;
    unsigned int i;
    uint32_t val;
 
@@ -582,8 +606,11 @@ cad_up(struct cad_softc *sc)
 */
 
    sc->sc_txring = cad_dmamem_alloc(sc,
-   CAD_NTXDESC * sizeof(struct cad_desc), sizeof(struct 

SiFive Unmatched if_cad fix

2021-06-24 Thread Mickael Torres
Hello,

On the risc-v SiFive Unmatched the internal cad0 ethernet interface stops 
working randomly after some packets are sent/received. It looks like it's 
because the bus_dmamap used isn't restricted to lower than 4GB physical 
addresses, and the interface itself is.

Configuring the interface for 64 bits DMA fixes the problem, and the machine 
is now useable with its internal ethernet port.

I didn't test very extensively, but it was very easy to run into 
"cad0: hresp error, interface stopped" before the patch. After the patch it 
survived a couple hours of tests and ping -f from and to it.

It is now depending on being compiled for __riscv64__ or not, would it be 
better to do it dynamically when matching "sifive,fu740-c000-gem" ?

Best,
Mickael



Index: sys/dev/fdt/if_cad.c
===
RCS file: /cvs/src/sys/dev/fdt/if_cad.c,v
retrieving revision 1.2
diff -u -p -u -r1.2 if_cad.c
--- sys/dev/fdt/if_cad.c13 Jun 2021 02:56:48 -  1.2
+++ sys/dev/fdt/if_cad.c24 Jun 2021 18:56:07 -
@@ -54,6 +54,10 @@
 #include 
 #include 
 
+#ifdef __riscv64__
+#define GEM_DMA64
+#endif
+
 #define GEM_NETCTL 0x
 #define  GEM_NETCTL_DPRAM  (1 << 18)
 #define  GEM_NETCTL_STARTTX    (1 << 9)
@@ -92,6 +96,7 @@
 #define  GEM_DMACR_ES_DESCR    (1 << 6)
 #define  GEM_DMACR_BLEN_MASK   (0x1f << 0)
 #define  GEM_DMACR_BLEN_16 (0x10 << 0)
+#define  GEM_DMACR_DMA64   (1 << 30)
 #define GEM_TXSR   0x0014
 #define  GEM_TXSR_TXGO (1 << 3)
 #define GEM_RXQBASE    0x0018
@@ -168,6 +173,8 @@
 #define GEM_RXIPCCNT   0x01a8
 #define GEM_RXTCPCCNT  0x01ac
 #define GEM_RXUDPCCNT  0x01b0
+#define GEM_TXQBASE_HI 0x04c8
+#define GEM_RXQBASE_HI 0x04d4
 
 #define GEM_CLK_TX "tx_clk"
 
@@ -186,6 +193,10 @@ struct cad_dmamem {
 struct cad_desc {
    uint32_t d_addr;
    uint32_t d_status;
+#ifdef GEM_DMA64
+   uint32_t d_addrh;
+   uint32_t d_pad;
+#endif
 };
 
 #define GEM_RXD_ADDR_WRAP  (1 << 1)
@@ -211,6 +222,10 @@ struct cad_desc {
 struct cad_txdesc {
    uint32_t txd_addr;
    uint32_t txd_status;
+#ifdef GEM_DMA64
+   uint32_t txd_addrh;
+   uint32_t txd_pad;
+#endif
 };
 
 #define GEM_TXD_USED   (1 << 31)
@@ -549,6 +564,10 @@ cad_reset(struct cad_softc *sc)
    HWRITE4(sc, GEM_TXSR, 0);
    HWRITE4(sc, GEM_RXQBASE, 0);
    HWRITE4(sc, GEM_TXQBASE, 0);
+#ifdef GEM_DMA64
+   HWRITE4(sc, GEM_RXQBASE_HI, 0);
+   HWRITE4(sc, GEM_TXQBASE_HI, 0);
+#endif
 
    /* MDIO clock rate must not exceed 2.5 MHz. */
    freq = clock_get_frequency(sc->sc_node, "pclk");
@@ -607,6 +626,10 @@ cad_up(struct cad_softc *sc)
        0, sc->sc_txring->cdm_size,
        BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
+#ifdef GEM_DMA64
+   HWRITE4(sc, GEM_TXQBASE_HI,
+       sc->sc_txring->cdm_map->dm_segs[0].ds_addr >> 32);
+#endif
    HWRITE4(sc, GEM_TXQBASE, sc->sc_txring->cdm_map->dm_segs[0].ds_addr);
 
    /*
@@ -642,6 +665,10 @@ cad_up(struct cad_softc *sc)
        0, sc->sc_rxring->cdm_size,
        BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
+#ifdef GEM_DMA64
+   HWRITE4(sc, GEM_RXQBASE_HI,
+       sc->sc_rxring->cdm_map->dm_segs[0].ds_addr >> 32);
+#endif
    HWRITE4(sc, GEM_RXQBASE, sc->sc_rxring->cdm_map->dm_segs[0].ds_addr);
 
    /*
@@ -697,6 +724,9 @@ cad_up(struct cad_softc *sc)
    val |= (MCLBYTES / 64) << GEM_DMACR_RXBUF_SHIFT;
    val &= ~GEM_DMACR_BLEN_MASK;
    val |= GEM_DMACR_BLEN_16;
+#ifdef GEM_DMA64
+   val |= GEM_DMACR_DMA64;
+#endif
 
    if (ifp->if_capabilities & IFCAP_CSUM_IPv4)
        val |= GEM_DMACR_TXCSUMEN;
@@ -987,6 +1017,9 @@ cad_encap(struct cad_softc *sc, struct m
 
        txd = &sc->sc_txdesc[idx];
        txd->d_addr = map->dm_segs[i].ds_addr;
+#ifdef GEM_DMA64
+       txd->d_addrh = map->dm_segs[i].ds_addr >> 32;
+#endif
 
        /* Make d_addr visible before GEM_TXD_USED is cleared
         * in d_status. */
@@ -1151,6 +1184,10 @@ cad_rxfill(struct cad_softc *sc)
        rxd = &sc->sc_rxdesc[idx];
        rxd->d_status = 0;
 
+#ifdef GEM_DMA64
+       rxd->d_addrh = rxb->bf_map->dm_segs[0].ds_addr >> 32;
+#endif
+
        /* Make d_status visible before clearing GEM_RXD_ADDR_USED
         * in d_addr. */
        bus_dmamap_sync(sc->sc_dmat, sc->sc_rxring->cdm_map,
@@ -1413,9 +1450,16 @@ cad_dmamem_alloc(struct cad_softc *sc, b
    cdm = malloc(sizeof(*cdm), M_DEVBUF, M_WAITOK | M_ZERO);