Following up on this, since it turns out to be related to what I'm working on. It isn't a serious problem, but it did require a bit of thinking and planning ahead.
I'm working on a driver for a board that needs its DMA S/G lists in coherent memory (as many boards do). These S/G lists are filled in from the DMA cookies you get back when mapping a command. If the underlying mapping mechanism is an IOMMU, then the number of S/G list elements to allocate equals (roughly) the maximum command load for each board. If the underlying mapping mechanism is DMA S/G, then the number of S/G list elements to allocate equals (roughly) the maximum command load for each board times the expected average length, in pages, of each command. Yes, a DMA S/G mechanism /may/ have coalesced pages, but more often than not it doesn't.

It turns out to be prudent to build up the allocation of card S/G list memory a page at a time anyway, as load increases. This way you avoid trying to allocate a large block of contiguous consistent memory at load time, only to have it fail because this memory, at least on x86, tends to fragment over time. I'm fortunate in that I can chain these chunks together for this card. Other cards aren't so lucky.

It might be helpful to know the properties of the underlying DMA mapper so that different strategies could be implemented at load time. Just a thought....
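
To make the sizing rule and the page-at-a-time chaining concrete, here is a rough userspace sketch. It is not the actual driver code: every name in it (sg_elem, sg_chunk, sg_pool, sg_elems_needed, and so on) is made up for illustration, the 4 KiB chunk size and the element layout are assumptions, and calloc() stands in for whatever one-page coherent allocation the platform really provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: one chunk is one 4 KiB page. */
#define SG_CHUNK_BYTES 4096u

/* Hypothetical card S/G element; a real card defines its own layout. */
struct sg_elem {
    uint64_t addr;   /* bus address taken from a DMA cookie */
    uint32_t len;    /* segment length in bytes */
    uint32_t flags;
};

/* Elements that fit in a chunk after the two pointer-sized header words. */
#define SG_ELEMS_PER_CHUNK \
    ((SG_CHUNK_BYTES - 2 * sizeof(void *)) / sizeof(struct sg_elem))

struct sg_chunk {
    struct sg_chunk *next;   /* chain link to the next chunk */
    size_t used;             /* elements already handed out */
    struct sg_elem elems[SG_ELEMS_PER_CHUNK];
};

struct sg_pool {
    struct sg_chunk *head;
    struct sg_chunk *tail;
    size_t nchunks;
};

/* Rough estimate from the text: one element per command behind an IOMMU,
 * otherwise commands times the average pages per command. */
static size_t sg_elems_needed(size_t max_cmds, size_t avg_pages_per_cmd,
                              int has_iommu)
{
    return has_iommu ? max_cmds : max_cmds * avg_pages_per_cmd;
}

/* Hand out one element, growing the chain by one chunk only when the
 * current tail is full. Returns NULL if the allocation fails. */
static struct sg_elem *sg_pool_get(struct sg_pool *p)
{
    struct sg_chunk *c;

    assert(p != NULL);
    c = p->tail;
    if (c == NULL || c->used == SG_ELEMS_PER_CHUNK) {
        /* Stand-in for a single-page coherent DMA allocation. */
        c = calloc(1, sizeof(*c));
        if (c == NULL)
            return NULL;
        if (p->tail != NULL)
            p->tail->next = c;
        else
            p->head = c;
        p->tail = c;
        p->nchunks++;
    }
    return &c->elems[c->used++];
}

/* Release every chunk in the chain and reset the pool. */
static void sg_pool_free(struct sg_pool *p)
{
    struct sg_chunk *c = p->head;

    while (c != NULL) {
        struct sg_chunk *next = c->next;
        free(c);
        c = next;
    }
    p->head = p->tail = NULL;
    p->nchunks = 0;
}
```

The point of the sketch is only that no single allocation is ever larger than one chunk, so growth under load never depends on finding a large contiguous region. A card that can't chain would need a different strategy, which is where knowing the mapper's properties up front would help.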