David Gibson <da...@gibson.dropbear.id.au> writes:

> On Wed, Sep 28, 2016 at 11:01:22AM +0530, Nikunj A Dadhania wrote:
>> Load 8byte at a time and manipulate.
>>
>> Big-Endian Storage
>> +-------------+-------------+-------------+-------------+
>> | 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
>> +-------------+-------------+-------------+-------------+
>>
>> Little-Endian Storage
>> +-------------+-------------+-------------+-------------+
>> | 33 22 11 00 | 77 66 55 44 | BB AA 99 88 | FF EE DD CC |
>> +-------------+-------------+-------------+-------------+
>>
>> Vector load results in:
>> +-------------+-------------+-------------+-------------+
>> | 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
>> +-------------+-------------+-------------+-------------+
>
> Ok.  I'm guessing from this that implementing those GPR<->VSR
> instructions showed that the earlier versions were endian-incorrect as
> I suspected.
>
> Have you verified that this new implementation is actually faster (or
> at least no slower) on LE than the original implementation with
> individual 32-bit stores?
I haven't yet. I will check and get back to you.

Regards
Nikunj
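For reference, the byte layout in the quoted diagrams can be modelled in plain C. This is only a sketch of the idea "load 8 bytes at a time and manipulate", not the TCG code from the patch: the helper names (load_le64, swap_words64, load_two_words_le) are hypothetical, and it assumes that the manipulation amounts to swapping the two 32-bit words of each little-endian doubleword.

```c
#include <stdint.h>

/* Hypothetical helper: read 8 bytes as a little-endian 64-bit value.
 * Built byte by byte, so it does not depend on host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Exchange the upper and lower 32-bit words of a 64-bit value. */
static uint64_t swap_words64(uint64_t v)
{
    return (v >> 32) | (v << 32);
}

/* Assumed model of one half of the vector load on little-endian
 * storage: one 8-byte load followed by a word swap, in place of
 * two separate 32-bit accesses. */
static uint64_t load_two_words_le(const uint8_t *p)
{
    return swap_words64(load_le64(p));
}
```

With the "Little-Endian Storage" bytes from the diagram (33 22 11 00 77 66 55 44 ...), the first doubleword loads as 0x4455667700112233 and becomes 0x0011223344556677 after the word swap, which matches the first two words of the "Vector load results in" row.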