> It looks like you have used an automated process to convert the AArch32
> NEON code to AArch64. Will you be able to repeat that process for other
> code, or at least assist others to repeat your steps?
Sorry, but as I wrote before, all of the patches were converted by hand.
The "converter script" didn't work correctly.
# But the script was very helpful for understanding the differences
# between AArch32 and AArch64 :)

> The reason I ask is that I have a large number of outstanding patches to
> the ARM NEON support.

Hmm... How should we proceed with the implementation?

I've seen a comment that the current pixman-arm-neon-asm*.S files (on which
I based my work) were optimized for the older Cortex-A8, whereas your new
patches seem to work well on the latest Cortex chips.

If so, we should first apply your latest patches to master, and then someone
(or I?) should redo the conversion to AArch64. That would benefit both the
AArch32 and AArch64 worlds.

# FYI: It took me 1 week to convert all of the code,
# and 2 more weeks to get it to pass all the tests.

On 5 April 2016 at 03:53, Ben Avison <[email protected]> wrote:
> On Sat, 02 Apr 2016 13:30:58 +0100, Mizuki Asakura <[email protected]>
> wrote:
>>
>> This patch only contains STD_FAST_PATH code, not scaling (nearest,
>> bilinear) code.
>
>
> Hi Mizuki,
>
> It looks like you have used an automated process to convert the AArch32
> NEON code to AArch64. Will you be able to repeat that process for other
> code, or at least assist others to repeat your steps?
>
> The reason I ask is that I have a large number of outstanding patches to
> the ARM NEON support. The process of getting them merged into the
> FreeDesktop git repository has been very slow because there aren't many
> people on this list with the time and ability to review them; however, my
> versions are in many cases up to twice the speed of the FreeDesktop
> versions, and it would be a shame if AArch64 couldn't benefit from them.
> If your AArch64 conversion is a one-time thing, it will make it
> extremely difficult to merge my changes in.
>
>> After completing optimization of this patch, the scaling-related code
>> should be done.
>
>
> One of my aims was to implement missing "iter" routines so as to accelerate
> scaled plots for a much wider combination of pixel formats and Porter-Duff
> combiner rules than the existing limited selection of fast paths could
> cover. If you look towards the end of my patch series here:
>
> https://github.com/bavison/pixman/commits/arm-neon-release1
>
> you'll see that I discovered that I was actually outperforming Pixman's
> existing bilinear plotters so consistently that I'm advocating removing
> them entirely, with the additional advantage that it simplifies the code
> base a lot. So you might want to consider whether it's worth bothering
> to convert those to AArch64 in the first place.
>
> I would maybe go so far as to suggest that you try converting all the iters
> first and only add fast paths if you find they do better than the iters.
> One of the drawbacks of using iters is that the prefetch code can't be as
> sophisticated: it can't easily prefetch the start of the next row while
> it is still working on the end of the current one. But since hardware
> prefetchers are better now and conditional execution is hard in AArch64,
> this will be less of a drawback with AArch64 CPUs.
>
> I'll also repeat what has been said: it's very neat the way the existing
> prefetch code sneaks calculations into pipeline stalls, but it was only
> ever really ideal for Cortex-A8. With Cortex-A7 (despite the number,
> actually a much more recent 32-bit core) I noted that it was impossible to
> schedule such complex prefetch code without adding to the cycle count, at
> least when the images were already in the cache.
>
> Ben

_______________________________________________
Pixman mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/pixman
