Dear Scott Wood,

> On 06/25/2012 08:33 PM, Marek Vasut wrote:
> > Dear Scott Wood,
> >
> >> On 06/25/2012 06:37 PM, Marek Vasut wrote:
> >>> Dear Scott Wood,
> >>>
> >>>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
> >>>>> but that involves a lot of copying and therefore degrades performance
> >>>>> rapidly. Therefore disallow this possibility of unaligned load
> >>>>> address altogether if data cache is on.
> >>>>
> >>>> How about use the bounce buffer only if the address is misaligned?
> >>>
> >>> Not happening, bounce buffer is bullshit,
> >>
> >> Hacking up the common frontend with a new limitation because you can't
> >> be bothered to fix your drivers is bullshit.
> >
> > The drivers are not broken, they have hardware limitations.
>
> They're broken because they ignore those limitations.
>
> > And checking for
> > those has to be done as early as possible.
>
> Why?
Why keep an overhead?

> > And it's not a new common frontend!
>
> No, it's a compatibility-breaking change to the existing common frontend.

Well, those are corner cases, it's not like people will start hitting it
en-masse. I agree it should be somehow platform or even CPU specific.

> >>> It's like driving a car in the wrong lane. Sure, you can do it, but
> >>> it'll eventually have some consequences. And using a bounce buffer is
> >>> like driving a tank in the wrong lane ...
> >>
> >> Using a bounce buffer is like parking your car before going into the
> >> building, rather than insisting the building's hallways be paved.
> >
> > The other is obviously faster, more comfortable and lets you carry more
> > stuff at once.
>
> Then you end up needing buildings to be many times as large to give
> every cubicle an adjacent parking spot, maneuvering room, etc. You'll
> be breathing fumes all day, and it'll be a lot less comfortable to get
> even across the hallway without using a car, etc. Communication between
> coworkers would be limited to horns and obscene gestures. :-)

Ok, this has gone waaay out of hand here :-)

> > And if you drive a truck, you can dump a lot of payload instead of
> > carrying it back and forth from the building. That's why there's a
> > special garage for trucks possibly with cargo elevators etc.
>
> Yes, it's called targeted optimization rather than premature optimization.
>
> >>>> The
> >>>> corrective action a user has to take is the same as with this patch,
> >>>> except for an additional option of living with the slight performance
> >>>> penalty.
> >>>
> >>> Slight is a very weak word here.
> >>
> >> Prove me wrong with benchmarks.
> >
> > Well, copying data back and forth is tremendous overhead. You don't need
> > a benchmark to calculate something like this:
> >
> > 133MHz SDRAM (pumped) gives you what ...
> > 133 Mb/s throughput
>
> You're saying you get only a little more bandwidth from memory than
> you'd get from a 100 Mb/s ethernet port? Come on. Data buses are not
> one bit wide.

Good point, it was too late and I forgot to count that in.

> And how fast can you pull data out of a NAND chip, even with DMA?
>
> > (now if it's DDR, dual/quad pumped, that doesn't give you any more
> > advantage
>
> So such things were implemented for fun?
>
> > since you have to: send address, read the data, send address, write the
> > data ...
>
> What about bursts? I'm pretty sure you don't have to send the address
> separately for every single byte.

If you do memcpy? You only have registers, sure, you can optimize it, but
on Intel for example, you don't have many of those.

> > this is expensive ... without data cache on, even more so)
>
> Why do we care about "without data cache"? You don't need the bounce
> buffer in that case.

Correct, it's expensive in both cases though.

> > Now consider you do it via a really dumb memcpy, what happens:
>
> It looks like ARM U-Boot has an optimized memcpy.
>
> > 1) You need to read the data into a register
> > 1a) Send address
> > 1b) Read the data into the register
> > 2) You need to write the data to a new location
> > 2a) Send address
> > 2b) Write the data into the memory
> >
> > In the meantime, you get some refresh cycles etc. Now, if you take read
> > and write as 1 time unit and "send address" as 0.5 time unit (this gives
> > a total of 3 time units per one loop) and consider you're not doing
> > sustained read/write, you should see you'll be able to copy at a speed
> > of about 133/3 ~= 40Mb/s
> >
> > If you want to load a 3MB kernel at 40Mb/s onto an unaligned address via
> > DMA, the DMA will deploy it via sustained write, that'll be at 10MB/s,
> > therefore in 300ms. But the subsequent copy will take another 600ms.
>
> On a p5020ds, using NAND hardware that doesn't do DMA at all, I'm able
> to load a 3MiB image from NAND in around 300-400 ms.
> This is with using
> memcpy_fromio() on an uncached hardware buffer.

Blazing, almost half a second ... but everyone these days wants a faster
boot; without the memcpy, it can go down to 100ms or even less! And this
kind of limitation is not something that'd inconvenience anyone.

> Again, I'm not saying that bounce buffers are always negligible overhead
> -- just that I doubt NAND is fast enough that it makes a huge difference
> in this specific case.

It does make a difference. Correct, thinking about it -- implementing a
generic bounce buffer for cache-impotent hardware might be a better way
to go.

> >>>> How often does this actually happen? How much does it
> >>>> actually slow things down compared to the speed of the NAND chip?
> >>>
> >>> If the user is dumb, always. But if you tell the user how to milk the
> >>> most of the hardware, he'll be happier.
> >>
> >> So, if you use bounce buffers conditionally (based on whether the
> >> address is misaligned), there's no impact except to "dumb" users, and
> >> for those users they would merely get a performance degradation rather
> >> than breakage. How is this "bullshit"?
> >
> > Correct, but users will complain if they get subpar performance.
>
> If you expend the minimal effort required to make the bounce buffer
> usage conditional on the address actually being misaligned, the only
> users that will see subpar performance are those who would see breakage
> with your approach. Users will complain if they see breakage even more
> than when they see subpar performance.
>
> If the issue is educating the user to avoid the performance hit
> (regardless of magnitude), and you care enough, have the driver print a
> warning (not error) message the first time it needs to use a bounce buffer.

Ok, on second thought, you're right. Let's do it the same way we did it
with mmc.
> -Scott

Best regards,
Marek Vasut

_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot