On Tue, 11 Feb 2014, David Laight wrote:

> On Tue, Feb 11, 2014 at 04:19:26PM +0000, Eduardo Horvath wrote:
> >
> > We really should enhance the bus_dma framework to add bus_space-like
> > accessor routines so we can implement something like this. Using bswap is
> > a lousy way to implement byte swapping. Yes, on x86 you have byte swap
> > instructions that allow you to work on register contents. But most RISC
> > CPUs do the byte swapping in the load/store path. That really doesn't
> > map well to the bswap API. Instead of one load or store operation to
> > swap a 64-bit value, you need a load/store plus another dozen shift and
> > mask operations.
> >
> > I proposed such an extension years ago. Someone might want to resurrect
> > it.
>
> What you don't want to have is an API that swaps data in memory
> (unless that is really what you want to do).
>
> IIRC modern gcc detects uses of its internal byteswap function
> that are related to memory read/write and uses the appropriate
> byte-swapping memory access.
>
> I can see the advantage of being able to do byteswap in the load/store
> path, but sometimes that can't be arranged and a byteswap instruction
> is very useful.

When do you ever really want to byte swap the contents of one register to
another register? Byte swapping almost always involves I/O, which means
reading or writing memory or a device register. In this case we are
specifically talking about DMA, in which case there is always a load or
store operation involved.

The current API we have using the bswap routines is a real pain in the
neck for DMA. You really want the byte swaps to happen when needed. They
should be controlled by the DMA attributes of the device you're talking to
along with the characteristics of the CPU and page in question.

A big-endian CPU talking to a device that runs only little-endian needs to
do byte swapping when accessing DMA structures. But what if the device can
also support big-endian DMA? So each driver needs to determine at setup
time whether it needs to do byte swapping, and have code to conditionally
byte swap data for each access to a structure that needs DMA.

> I really can't imagine implementing it being a big problem!

Yes, it is a big problem. For a 2-byte swap you need to do 2 shift
operations, one mask operation (if you're lucky) and one OR operation.
Double that for a 4-byte swap. And even if you argue that a dozen CPU
cycles here or there don't make much difference, the byte swap code is
replicated all over the place since the routines are macros, so you're
paying for it with your I$ bandwidth.

Eduardo
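
For illustration, the cost argument and the accessor idea above can be
sketched in C roughly as follows. This is only a sketch under stated
assumptions: struct dma_attr, need_swap, dma_read_4() and dma_write_4() are
hypothetical names invented for this example, not part of the existing
bus_dma or bswap APIs, and swap16()/swap32() are generic open-coded swaps
rather than the actual macros under discussion.

```c
#include <stdint.h>
#include <string.h>

/* Open-coded 16-bit swap: two shifts, one mask and one OR. */
static inline uint16_t swap16(uint16_t x)
{
    return (uint16_t)(((x << 8) & 0xff00u) | (x >> 8));
}

/* Open-coded 32-bit swap: roughly double the work of the 16-bit case. */
static inline uint32_t swap32(uint32_t x)
{
    return ((x << 24) & 0xff000000u) | ((x << 8) & 0x00ff0000u) |
           ((x >> 8) & 0x0000ff00u) | (x >> 24);
}

/*
 * Hypothetical per-device DMA attributes, filled in once at setup time
 * from the device's and the CPU's byte order.
 */
struct dma_attr {
    int need_swap;   /* nonzero if device and CPU byte order differ */
};

/*
 * Hypothetical accessors: the swap decision lives in one place instead of
 * being repeated at every access in every driver.
 */
static inline uint32_t dma_read_4(const struct dma_attr *a, const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));            /* the load from DMA memory */
    return a->need_swap ? swap32(v) : v;
}

static inline void dma_write_4(const struct dma_attr *a, void *p, uint32_t v)
{
    if (a->need_swap)
        v = swap32(v);
    memcpy(p, &v, sizeof(v));            /* the store to DMA memory */
}
```

Because the load or store sits inside the accessor, a port to a CPU with
byte-swapping loads and stores could presumably implement the swapping case
as a single memory instruction instead of the shift and mask sequence,
without any change to the drivers that call it.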
