Re: AMD64 buffer cache 4GB cap anything new, multiqueueing plans? ("64bit DMA on amd64" cont)
On Tue, Nov 6, 2018 at 9:51 PM Joseph Mayer wrote:
> Previously there was a years-long thread about a 4GB (32bit) buffer cache constraint on AMD64, ref https://marc.info/?t=14682443664=1=2 .
>
> What I gather is,
>
> * The problem is that on AMD64, DMA is limited to 32bit addressing, I guess because unlike AMD64 arch CPUs which all have 64bit DMA support, popular PCI accessories and supporting hardware out there like bridges, have DMA functionality limited to 32bit addressing.
>

My read of that thread, particularly Theo's comments, is that no one actually demonstrated a case where lack of 64bit DMA caused any problems or limitations. If you have a system and use case where lack of 64bit DMA creates a performance limitation, then describe it and, *more importantly*, *why* you think the DMA limit is involved.

Philip Guenther
AMD64 buffer cache 4GB cap anything new, multiqueueing plans? ("64bit DMA on amd64" cont)
Hi,

Previously there was a years-long thread about a 4GB (32bit) buffer cache constraint on AMD64, ref https://marc.info/?t=14682443664=1=2 .

What I gather is:

* The problem is that on AMD64, DMA is limited to 32bit addressing. I guess this is because, unlike AMD64 arch CPUs, which all have 64bit DMA support, popular PCI accessories and supporting hardware out there, like bridges, have DMA functionality limited to 32bit addressing.

(Is this a feature of lower-quality hardware, or of very old PCI devices, or is it systemic to the whole AMD64 ecosystem today? Could a system be configured to use 64bit DMA on AMD64 and be expected to work, presuming recent or higher-quality / well-selected hardware?)

* The OS asks the disk hardware to load disk data into given memory locations via DMA, and then userland fread() and mmap() are fed with that data, with no need for further data moving or mapping. These are the dynamics leading to the 4GB cap.

And the 4GB cap is quite constraining for any computer with much RAM and lots of disk reading: many reads that wouldn't need to hit the disk (because the data could be cached in all this free memory) aren't cached and go to disk anyhow, which takes a lot of time, yes?

* This was recognized a long time ago, and Bob wrote a solution in the form of a "buffer cache flipper" that would push buffer cache data out of the 32bit area (to "high memory", as in >32bit), hence lifting the limit, via a "(generic) backpressure" mechanism that as a bonus used the DMA engine to do the memory moving. I guess this means the buffer cache would be pretty much zero-cost to the CPU, which sounds incredibly neat!

And then it didn't really work: it malfunctioned and irritated people (it was "busted", for unknown reasons; actually, why was it?), and Theo wrote it will be fixed in the future. Has it been fixed since?
Also, once it's fixed, fread() and mmap() reads of data that's in the buffer cache will be incredibly fast, right? In optimal conditions the mmap'ed addresses will already be mapped to the buffer cache data, so mmap'ed buffer cache reads will have the speed of any memory access, right?

(The ML thread also mentioned an undeadly.org post discussing this topic; however, searching and browsing, I can't find it. The closest I find is five words here: https://undeadly.org/cgi?action=article;sid=20170815171854 . Do you have a URL?)

Last, OpenBSD's biggest limit as an OS seems to be that the disk/file subsystem is sequential. A modern SSD can read at 2.8GB/sec, but that requires parallelism. Without multiqueueing and with small reads, e.g. 4KB or smaller, speeds stay around 70-120MB/sec, i.e. only about 2.5-4.3% of the hardware's potential performance.

This would be a really worthy goal to donate to, for instance, in particular as OpenBSD leads the way in many other areas. Are there any thoughts about implementing this in the future?

Thanks,
Joseph
Re: 64bit DMA on amd64
On Sat, Nov 11, 2017 at 4:22 PM, wrote:
> Theo 2016-07-11 15:09:48, https://marc.info/?l=openbsd-tech=146824981122013=2 , https://marc.info/?l=openbsd-tech=146825098022380=2 :
> > And bufs don't need it either. Have you actually cranked your buffer cache that high? I have tested this, on sparc64 which has unlimited DMA reach due to the iommu. The system comes to a crawl when there are too many mbufs or bufs, probably due to management structures unable to handle the pressure.
>
> Theo 2016-07-11 16:16:13, https://marc.info/?l=openbsd-tech=146825379723312=2 :
> > I was simply pointing out that massive (well above 4GB) buffer cache on a 64-bit DMA-reachable machine worked poorly. Likely due to data structures managing the memory with rather large O...
>
> What algorithms drive the buffer cache structure now?

If I recall Bob and Ted's undeadly posts correctly, the buffers are both in per-vnode red-black trees and in a global 2Q structure that manages the total set of buffers. (How useful those names will be, I don't know.)

Philip Guenther
Re: 64bit DMA on amd64
Theo 2016-07-11 15:09:48, https://marc.info/?l=openbsd-tech=146824981122013=2 , https://marc.info/?l=openbsd-tech=146825098022380=2 :

> And bufs don't need it either. Have you actually cranked your buffer cache that high? I have tested this, on sparc64 which has unlimited DMA reach due to the iommu. The system comes to a crawl when there are too many mbufs or bufs, probably due to management structures unable to handle the pressure.

Theo 2016-07-11 16:16:13, https://marc.info/?l=openbsd-tech=146825379723312=2 :

> I was simply pointing out that massive (well above 4GB) buffer cache on a 64-bit DMA-reachable machine worked poorly. Likely due to data structures managing the memory with rather large O...

What algorithms drive the buffer cache structure now?
Re: 64bit DMA on amd64
(Reply to misc@ I presume.)

Hi Theo / list,

Some humble follow-up questions regarding the previous buffer cache conversation. In particular, I'm curious what the crawl was that you saw in the very large buffer cache test you made on Sparc64.

On 2016-07-12 00:16, Theo de Raadt wrote:
> [...] The buffer cache flipper was going to give us very large buffer cache compared to other systems. Until it is finished, we are still doing fine.

What do you mean by very large compared to other systems? Do other OSes have any limit to it within their software architecture? Just to get the idea.

> [...] I was simply pointing out that massive (well above 4GB) buffer cache on a 64-bit DMA-reachable machine worked poorly. Likely due to data structures managing the memory with rather large O...

(What did you mean by "O..."?)

On 2016-07-11 23:09, Theo de Raadt wrote:
> [...] And bufs don't need it either. Have you actually cranked your buffer cache that high? I have tested this, on sparc64 which has unlimited DMA reach due to the iommu. The system comes to a crawl when there are too many mbufs or bufs, probably due to management structures unable to handle the pressure.

At what kind of sizes does it start to crawl? How is the crawling experienced at the user level? What causes the crawling, i.e. what kind of pressure on management structures are we talking about, and how can it be CPU-expensive?

(On 2016-07-11 23:29, Theo de Raadt wrote:
> [...] BTW, my tests were on a 128GB sun4v machine. Sun T5140. They are actually fairly cheap used these days.

Not sure how that affects the benchmarks, as I don't understand the performance characteristics of the Sun T2 CPU, http://johnjmclaughlin.blogspot.hk/2007/10/utrasparc-t2-server-benchmark-results.html )

On 2016-07-12 00:07, Mark Kettenis wrote:
> .. Except that the flipper isn't enabled yet and that the backpressure mechanism is busted somehow. At least that is what the recent experiment with cranking up the buffer cache limit showed us. People screamed and we backed the change out again. And there were problems on amd64 and sparc64 alike.

What function does (or would) the backpressure mechanism serve on Sparc64?

Also, last and very much secondarily: if you have any guess whether ARM64 and Power8 would have 64bit DMA (and hence, like Sparc64, no buffer cache size limit) or not (and hence be like AMD64, with a 32bit buffer cache size limit), that would be interesting to learn.

Thanks!
Tinker
Re: 64bit DMA on amd64
> Except that the flipper isn't enabled yet and that the backpressure mechanism is busted somehow. At least that is what the recent experiment with cranking up the buffer cache limit showed us. People screamed and we backed the change out again. And there were problems on amd64 and sparc64 alike.

Which means the generic backpressure mechanism is busted. As a result, we currently rely on the 4GB dma limit as a forwardpressure subsystem, and tunables which keep the buffer cache small.

The buffer cache flipper was going to give us very large buffer cache compared to other systems. Until it is finished, we are still doing fine.

> What we probably need is help fixing the buffer cache. Then we can enable the flipper. And then we see if 64-bit DMA is still a requirement.

I was simply pointing out that massive (well above 4GB) buffer cache on a 64-bit DMA-reachable machine worked poorly. Likely due to data structures managing the memory with rather large O...

Chasing DMA-reachability on a theory that it helps some subsystem... some substantiation is required. In my experience (and I think yours), there are other hurdles.
Re: 64bit DMA on amd64
> From: "Theo de Raadt"
> Date: Mon, 11 Jul 2016 09:29:16 -0600
>
> > > And bufs don't need it either. Have you actually cranked your buffer cache that high? I have tested this, on sparc64 which has unlimited DMA reach due to the iommu. The system comes to a crawl when there are too many mbufs or bufs, probably due to management structures unable to handle the pressure.
> >
> > No, I didn't know that. I assumed that having a few more GBs of bufcache would help the performance. Until that is the case, 64bit dma does not make much sense.
>
> BTW, my tests were on a 128GB sun4v machine. Sun T5140. They are actually fairly cheap used these days.
>
> A maximum sized buffer cache should be fast. However there is no need for it to be dma-reachable. Bob's buffer cache flipper can bounce it to high memory easily after it is read the first time, and preserve it in otherwise unused memory. A buffer cache object of that sort is never written back to the io path. Also, it can be discarded in any memory shortage condition without cost.

Except that the flipper isn't enabled yet and that the backpressure mechanism is busted somehow. At least that is what the recent experiment with cranking up the buffer cache limit showed us. People screamed and we backed the change out again. And there were problems on amd64 and sparc64 alike.

What we probably need is help fixing the buffer cache. Then we can enable the flipper. And then we see if 64-bit DMA is still a requirement.
Re: 64bit DMA on amd64
> On Mon, 11 Jul 2016, Theo de Raadt wrote:
> > > No, I didn't know that. I assumed that having a few more GBs of bufcache would help the performance. Until that is the case, 64bit dma does not make much sense.
> >
> > BTW, my tests were on a 128GB sun4v machine. Sun T5140. They are actually fairly cheap used these days.
> >
> > A maximum sized buffer cache should be fast. However there is no need for it to be dma-reachable. Bob's buffer cache flipper can bounce it to high memory easily after it is read the first time, and preserve it in otherwise unused memory. A buffer cache object of that sort is never written back to the io path. Also, it can be discarded in any memory shortage condition without cost.
>
> But flipping buffers is not without cost, especially for an SSD at rates of >200 MB/s (or even >500 MB/s). With 64bit DMA, one could have a large buffer cache without this cost. But actual benchmarks would be required to see how relevant this is.

Stefan -- you don't understand the system.

Buffers are not flipped at the moment of read or write. They are read into available dma memory. They are used by the process immediately, without latency. At a later time, when they are about to be thrown away (to conserve dma memory), they are not thrown away but asynchronously / low-cost flipped to high memory, and conserved. Then future reads can find that the on-disk blocks are still cached in (high) memory. DMA reachability is not required to copy that memory to processes.

You are suggesting that buf storage is latency sensitive. That is not the case.
Re: 64bit DMA on amd64
On Mon, 11 Jul 2016, Theo de Raadt wrote:
> > No, I didn't know that. I assumed that having a few more GBs of bufcache would help the performance. Until that is the case, 64bit dma does not make much sense.
>
> BTW, my tests were on a 128GB sun4v machine. Sun T5140. They are actually fairly cheap used these days.
>
> A maximum sized buffer cache should be fast. However there is no need for it to be dma-reachable. Bob's buffer cache flipper can bounce it to high memory easily after it is read the first time, and preserve it in otherwise unused memory. A buffer cache object of that sort is never written back to the io path. Also, it can be discarded in any memory shortage condition without cost.

But flipping buffers is not without cost, especially for an SSD at rates of >200 MB/s (or even >500 MB/s). With 64bit DMA, one could have a large buffer cache without this cost. But actual benchmarks would be required to see how relevant this is.
Re: 64bit DMA on amd64
> > And bufs don't need it either. Have you actually cranked your buffer cache that high? I have tested this, on sparc64 which has unlimited DMA reach due to the iommu. The system comes to a crawl when there are too many mbufs or bufs, probably due to management structures unable to handle the pressure.
>
> No, I didn't know that. I assumed that having a few more GBs of bufcache would help the performance. Until that is the case, 64bit dma does not make much sense.

BTW, my tests were on a 128GB sun4v machine. Sun T5140. They are actually fairly cheap used these days.

A maximum sized buffer cache should be fast. However there is no need for it to be dma-reachable. Bob's buffer cache flipper can bounce it to high memory easily after it is read the first time, and preserve it in otherwise unused memory. A buffer cache object of that sort is never written back to the io path. Also, it can be discarded in any memory shortage condition without cost.
Re: 64bit DMA on amd64
On Mon, 11 Jul 2016, Theo de Raadt wrote:
> > Openbsd on amd64 assumes that DMA is only possible to the lower 4GB.
>
> Not exactly. On an architecture-by-architecture basis, OpenBSD is capable of insisting DMA reachable memory only lands in a smaller zone of memory -- because it makes the other layers of code easier.
>
> > More interesting would be bufs and mbufs.
>
> Why is it interesting for mbufs? Please describe the environment where anywhere near that many mbufs make sense.
>
> And bufs don't need it either. Have you actually cranked your buffer cache that high? I have tested this, on sparc64 which has unlimited DMA reach due to the iommu. The system comes to a crawl when there are too many mbufs or bufs, probably due to management structures unable to handle the pressure.

No, I didn't know that. I assumed that having a few more GBs of bufcache would help the performance. Until that is the case, 64bit dma does not make much sense.

> What is the usage case for this diff, if it cannot be enabled?
Re: 64bit DMA on amd64
> BTW, for usb devices, it probably depends on the host controller if 64bit dma is possible or not. I guess most xhci controllers will be able to do it.

The 4GB limitation is a simple solution to a wide variety of problems. Please describe a situation where 4GB of dma memory is a limitation.

> > That said, I'm not 100% convinced the fear of bounce buffers is justified. If a USB device requires bouncing, it's already pretty slow. What are we optimizing for again?
>
> True for spinning disks or usb storage sticks. But an SSD attached via USB 3.x is not slow.

The buffer cache is capable of doing flipping at the right point.

I still cannot identify a need for 64 bit dma. Why?
Re: 64bit DMA on amd64
> Openbsd on amd64 assumes that DMA is only possible to the lower 4GB.

Not exactly. On an architecture-by-architecture basis, OpenBSD is capable of insisting DMA reachable memory only lands in a smaller zone of memory -- because it makes the other layers of code easier.

> More interesting would be bufs and mbufs.

Why is it interesting for mbufs? Please describe the environment where anywhere near that many mbufs make sense.

And bufs don't need it either. Have you actually cranked your buffer cache that high? I have tested this, on sparc64 which has unlimited DMA reach due to the iommu. The system comes to a crawl when there are too many mbufs or bufs, probably due to management structures unable to handle the pressure.

What is the usage case for this diff, if it cannot be enabled?
Re: 64bit DMA on amd64
On Mon, 11 Jul 2016, Ted Unangst wrote:
> Stefan Fritsch wrote:
> > On Mon, 11 Jul 2016, Reyk Floeter wrote:
> > > The intentional 4GB limit is for forwarding: what if you forward mbufs from a 64bit-capable interface to another one that doesn't support 64bit DMA? And even if you would only enable it if all interfaces are 64bit-capable, what if you plug in a 32bit USB/hotplug interface? We did not want to support bounce buffers in OpenBSD.
> >
> > Yes, I have understood that. My mail was more about non-mbuf DMA: Does it make sense to allow 64bit DMA in other cases while keeping the 4GB limitation for mbufs?
>
> every kind of device can be attached via usb now. for the code that supports flipping, like bufcache, this is still tricky to handle dynamic limit changes. what happens to buffers marked DMA that suddenly aren't?

I guess the flipping would have to be done just before the device driver is called, but after it is clear which device driver will be called. Not sure if that is feasible or worth the effort. That's what I wanted to find out with my mail ;)

BTW, for usb devices, it probably depends on the host controller if 64bit dma is possible or not. I guess most xhci controllers will be able to do it.

> That said, I'm not 100% convinced the fear of bounce buffers is justified. If a USB device requires bouncing, it's already pretty slow. What are we optimizing for again?

True for spinning disks or usb storage sticks. But an SSD attached via USB 3.x is not slow.

> Or something could be done to bring iommu to life.

The problem is that there are many systems that don't have any. Or for openbsd in VMs, it may be expensive for the host system to emulate the iommu.

Cheers,
Stefan
Re: 64bit DMA on amd64
> From: "Ted Unangst"
> Date: Mon, 11 Jul 2016 10:45:19 -0400
>
> Stefan Fritsch wrote:
> > On Mon, 11 Jul 2016, Reyk Floeter wrote:
> > > The intentional 4GB limit is for forwarding: what if you forward mbufs from a 64bit-capable interface to another one that doesn't support 64bit DMA? And even if you would only enable it if all interfaces are 64bit-capable, what if you plug in a 32bit USB/hotplug interface? We did not want to support bounce buffers in OpenBSD.
> >
> > Yes, I have understood that. My mail was more about non-mbuf DMA: Does it make sense to allow 64bit DMA in other cases while keeping the 4GB limitation for mbufs?
>
> every kind of device can be attached via usb now. for the code that supports flipping, like bufcache, this is still tricky to handle dynamic limit changes. what happens to buffers marked DMA that suddenly aren't?

Actually, as long as the usb controller implements 64-bit DMA, all these devices should work just fine. It's just that not all USB controllers support this and that our uhci(4), ohci(4) and ehci(4) drivers don't support this.

> That said, I'm not 100% convinced the fear of bounce buffers is justified. If a USB device requires bouncing, it's already pretty slow. What are we optimizing for again?

Right. At some point the vast majority of the amd64 hardware we run on will be 64-bit "clean". The major issue here is that we don't really trust all the legacy drivers to do the proper bus_dmamap_sync() operations that are needed for bounce buffers to work. But perhaps that's an argument to do this sooner rather than later, such that we can fix things while the hardware is still around.
Re: 64bit DMA on amd64
Stefan Fritsch wrote:
> On Mon, 11 Jul 2016, Reyk Floeter wrote:
> > The intentional 4GB limit is for forwarding: what if you forward mbufs from a 64bit-capable interface to another one that doesn't support 64bit DMA? And even if you would only enable it if all interfaces are 64bit-capable, what if you plug in a 32bit USB/hotplug interface? We did not want to support bounce buffers in OpenBSD.
>
> Yes, I have understood that. My mail was more about non-mbuf DMA: Does it make sense to allow 64bit DMA in other cases while keeping the 4GB limitation for mbufs?

every kind of device can be attached via usb now. for the code that supports flipping, like bufcache, this is still tricky to handle dynamic limit changes. what happens to buffers marked DMA that suddenly aren't?

That said, I'm not 100% convinced the fear of bounce buffers is justified. If a USB device requires bouncing, it's already pretty slow. What are we optimizing for again?

Or something could be done to bring iommu to life.
Re: 64bit DMA on amd64
> Date: Mon, 11 Jul 2016 16:10:04 +0200 (CEST)
> From: Stefan Fritsch
>
> On Mon, 11 Jul 2016, Reyk Floeter wrote:
> > The intentional 4GB limit is for forwarding: what if you forward mbufs from a 64bit-capable interface to another one that doesn't support 64bit DMA? And even if you would only enable it if all interfaces are 64bit-capable, what if you plug in a 32bit USB/hotplug interface? We did not want to support bounce buffers in OpenBSD.
>
> Yes, I have understood that. My mail was more about non-mbuf DMA: Does it make sense to allow 64bit DMA in other cases while keeping the 4GB limitation for mbufs?

My guess is: not really. I have a hard time coming up with a driver that will allocate significant amounts of DMA memory that isn't a disk or a network driver.
Re: 64bit DMA on amd64
On Mon, 11 Jul 2016, Reyk Floeter wrote:
> The intentional 4GB limit is for forwarding: what if you forward mbufs from a 64bit-capable interface to another one that doesn't support 64bit DMA? And even if you would only enable it if all interfaces are 64bit-capable, what if you plug in a 32bit USB/hotplug interface? We did not want to support bounce buffers in OpenBSD.

Yes, I have understood that. My mail was more about non-mbuf DMA: Does it make sense to allow 64bit DMA in other cases while keeping the 4GB limitation for mbufs?

Cheers,
Stefan

>
> Reyk
>
> > On 11.07.2016, at 15:37, Stefan Fritsch wrote:
> >
> > Hi,
> >
> > following the discussion about mbufs, I have some questions about 64bit DMA in general.
> >
> > Openbsd on amd64 assumes that DMA is only possible to the lower 4GB. But there are many devices (PCIe, virtio, ...) that can do DMA to the whole memory. Is it feasible to have known good devices opt-in into 64bit DMA?
> >
> > I have done a patch that allows virtio to do 64bit DMA. This works insofar as the queues used by the device will now be allocated above 4GB. But this is only a small amount of memory. More interesting would be bufs and mbufs.
> >
> > For bufs, Bob added some code for copying bufs above/below 4GB. But this code only has a single flag B_DMA to denote if DMA is possible into a buf or not. Would it make sense to replace that by a mechanism that is device specific, so that we can use devices efficiently that allow 64bit DMA? Maybe a flag in the device vnode?
> >
> > Does it make sense to commit something like the diff below (not tested much), even if it saves at most a few MB below 4GB right now?
> >
> > Cheers,
> > Stefan
> >
> > [...]
Re: 64bit DMA on amd64
Hi,

The intentional 4GB limit is for forwarding: what if you forward mbufs from a 64bit-capable interface to another one that doesn't support 64bit DMA? And even if you would only enable it if all interfaces are 64bit-capable, what if you plug in a 32bit USB/hotplug interface? We did not want to support bounce buffers in OpenBSD.

Reyk

> On 11.07.2016, at 15:37, Stefan Fritsch wrote:
>
> Hi,
>
> following the discussion about mbufs, I have some questions about 64bit DMA in general.
>
> Openbsd on amd64 assumes that DMA is only possible to the lower 4GB. But there are many devices (PCIe, virtio, ...) that can do DMA to the whole memory. Is it feasible to have known good devices opt-in into 64bit DMA?
>
> I have done a patch that allows virtio to do 64bit DMA. This works insofar as the queues used by the device will now be allocated above 4GB. But this is only a small amount of memory. More interesting would be bufs and mbufs.
>
> For bufs, Bob added some code for copying bufs above/below 4GB. But this code only has a single flag B_DMA to denote if DMA is possible into a buf or not. Would it make sense to replace that by a mechanism that is device specific, so that we can use devices efficiently that allow 64bit DMA? Maybe a flag in the device vnode?
>
> Does it make sense to commit something like the diff below (not tested much), even if it saves at most a few MB below 4GB right now?
>
> Cheers,
> Stefan
>
> [...]
64bit DMA on amd64
Hi,

following the discussion about mbufs, I have some questions about 64bit DMA in general.

Openbsd on amd64 assumes that DMA is only possible to the lower 4GB. But there are many devices (PCIe, virtio, ...) that can do DMA to the whole memory. Is it feasible to have known good devices opt-in into 64bit DMA?

I have done a patch that allows virtio to do 64bit DMA. This works insofar as the queues used by the device will now be allocated above 4GB. But this is only a small amount of memory. More interesting would be bufs and mbufs.

For bufs, Bob added some code for copying bufs above/below 4GB. But this code only has a single flag B_DMA to denote if DMA is possible into a buf or not. Would it make sense to replace that by a mechanism that is device specific, so that we can use devices efficiently that allow 64bit DMA? Maybe a flag in the device vnode?

Does it make sense to commit something like the diff below (not tested much), even if it saves at most a few MB below 4GB right now?

Cheers,
Stefan

diff --git sys/arch/amd64/amd64/bus_dma.c sys/arch/amd64/amd64/bus_dma.c
index 8eaa2e7..1aba7c0 100644
--- sys/arch/amd64/amd64/bus_dma.c
+++ sys/arch/amd64/amd64/bus_dma.c
@@ -293,6 +293,7 @@ _bus_dmamap_load_raw(bus_dma_tag_t t, bus_dmamap_t map, bus_dma_segment_t *segs,
 {
 	bus_addr_t paddr, baddr, bmask, lastaddr = 0;
 	bus_size_t plen, sgsize, mapsize;
+	struct uvm_constraint_range *constraint = t->_cookie;
 	int first = 1;
 	int i, seg = 0;
 
@@ -320,7 +321,7 @@ _bus_dmamap_load_raw(bus_dma_tag_t t, bus_dmamap_t map, bus_dma_segment_t *segs,
 			if (plen < sgsize)
 				sgsize = plen;
 
-			if (paddr > dma_constraint.ucr_high)
+			if (paddr > constraint->ucr_high)
 				panic("Non dma-reachable buffer at paddr %#lx(raw)",
 				    paddr);
 
@@ -405,15 +406,11 @@ _bus_dmamem_alloc(bus_dma_tag_t t, bus_size_t size, bus_size_t alignment,
     bus_size_t boundary, bus_dma_segment_t *segs, int nsegs, int *rsegs,
     int flags)
 {
+	struct uvm_constraint_range *constraint = t->_cookie;
 
-	/*
-	 * XXX in the presence of decent (working) iommus and bouncebuffers
-	 * we can then fallback this allocation to a range of { 0, -1 }.
-	 * However for now we err on the side of caution and allocate dma
-	 * memory under the 4gig boundary.
-	 */
 	return (_bus_dmamem_alloc_range(t, size, alignment, boundary,
-	    segs, nsegs, rsegs, flags, (bus_addr_t)0, (bus_addr_t)0x));
+	    segs, nsegs, rsegs, flags, (bus_addr_t)constraint->ucr_low,
+	    (bus_addr_t)constraint->ucr_high));
 }
 
 /*
@@ -567,6 +564,7 @@ _bus_dmamap_load_buffer(bus_dma_tag_t t, bus_dmamap_t map, void *buf,
 	bus_size_t sgsize;
 	bus_addr_t curaddr, lastaddr, baddr, bmask;
 	vaddr_t vaddr = (vaddr_t)buf;
+	struct uvm_constraint_range *constraint = t->_cookie;
 	int seg;
 	pmap_t pmap;
 
@@ -584,7 +582,7 @@ _bus_dmamap_load_buffer(bus_dma_tag_t t, bus_dmamap_t map, void *buf,
 		 */
 		pmap_extract(pmap, vaddr, (paddr_t *));
 
-		if (curaddr > dma_constraint.ucr_high)
+		if (curaddr > constraint->ucr_high)
 			panic("Non dma-reachable buffer at curaddr %#lx(raw)",
 			    curaddr);
 
diff --git sys/arch/amd64/amd64/machdep.c sys/arch/amd64/amd64/machdep.c
index de9f481..7640532 100644
--- sys/arch/amd64/amd64/machdep.c
+++ sys/arch/amd64/amd64/machdep.c
@@ -201,6 +201,12 @@ struct vm_map *phys_map = NULL;
 
 /* UVM constraint ranges. */
 struct uvm_constraint_range  isa_constraint = { 0x0, 0x00ffUL };
+	/*
+	 * XXX in the presence of decent (working) iommus and bouncebuffers
+	 * we can then fallback this allocation to a range of { 0, -1 }.
+	 * However for now we err on the side of caution and allocate dma
+	 * memory under the 4gig boundary.
+	 */
 struct uvm_constraint_range  dma_constraint = { 0x0, 0xUL };
 struct uvm_constraint_range *uvm_md_constraints[] = {
     _constraint,
diff --git sys/arch/amd64/include/pci_machdep.h sys/arch/amd64/include/pci_machdep.h
index 27b833b..bf54f31 100644
--- sys/arch/amd64/include/pci_machdep.h
+++ sys/arch/amd64/include/pci_machdep.h
@@ -41,6 +41,7 @@
  */
 
 extern struct bus_dma_tag pci_bus_dma_tag;
+extern struct bus_dma_tag virtio_pci_bus_dma_tag;
 
 /*
  * Types provided to machine-independent PCI code
diff --git sys/arch/amd64/isa/isa_machdep.c sys/arch/amd64/isa/isa_machdep.c
index 74dc907..ec35edead 100644
--- sys/arch/amd64/isa/isa_machdep.c
+++ sys/arch/amd64/isa/isa_machdep.c
@@ -140,7 +140,7 @@ void _isa_dma_free_bouncebuf(bus_dma_tag_t, bus_dmamap_t);
  * buffers, if necessary.
  */
 struct bus_dma_tag isa_bus_dma_tag = {
-	NULL,