[x265] [PATCH] asm: Optimized sad_64xN for better cache performance. Reduced lea instruction by half. Performance gain is average +5x w.r.t. previous asm code

2013-10-31 Thread dnyaneshwar
# HG changeset patch
# User Dnyaneshwar Gorade dnyanesh...@multicorewareinc.com
# Date 1383216695 -19800
#      Thu Oct 31 16:21:35 2013 +0530
# Node ID 86ff1a3ec89720a73325148e8ac01ec1dbdab3c2
# Parent  5d6ed411995acd674b838f989385c61039760780
asm: Optimized sad_64xN for better cache performance.

Re: [x265] [PATCH] asm: Optimized sad_64xN for better cache performance. Reduced lea instruction by half. Performance gain is average +5x w.r.t. previous asm code

2013-10-31 Thread chen
Right, except for pixel_sad_64x32: it loops only 2 times. I am not sure which is better there, looping 4 times or unrolling it completely.
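
For readers following the loop-count question, here is a minimal scalar sketch in C of the trade-off being discussed. It is not the x265 assembly from the patch. The function names (sad_row64, sad_64xN_simple, sad_64xN_unroll4) and the choice of four rows per iteration are assumptions made for illustration only. The idea is that handling several rows per iteration lets the pointer advance (the lea in the asm) happen once per group of rows instead of once per row.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference, NOT the x265 asm: SAD of one 64-pixel row. */
static int sad_row64(const uint8_t *a, const uint8_t *b)
{
    int sum = 0;
    for (int x = 0; x < 64; x++)
        sum += abs(a[x] - b[x]);
    return sum;
}

/* One row per iteration: both pointers are advanced after every row. */
static int sad_64xN_simple(const uint8_t *a, intptr_t stride_a,
                           const uint8_t *b, intptr_t stride_b, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++) {
        sum += sad_row64(a, b);
        a += stride_a;
        b += stride_b;
    }
    return sum;
}

/* Four rows per iteration (assumed unroll factor): the pointers are
 * advanced once per four rows. height must be a multiple of 4. */
static int sad_64xN_unroll4(const uint8_t *a, intptr_t stride_a,
                            const uint8_t *b, intptr_t stride_b, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y += 4) {
        sum += sad_row64(a, b);
        sum += sad_row64(a + stride_a, b + stride_b);
        sum += sad_row64(a + 2 * stride_a, b + 2 * stride_b);
        sum += sad_row64(a + 3 * stride_a, b + 3 * stride_b);
        a += 4 * stride_a;
        b += 4 * stride_b;
    }
    return sum;
}
```

Both versions return the same sum. Which loop count is faster for a given block height (a short loop versus a full unroll) depends on code size and the target CPU, so it has to be measured rather than assumed.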

Re: [x265] [PATCH] asm: Optimized sad_64xN for better cache performance. Reduced lea instruction by half. Performance gain is average +5x w.r.t. previous asm code

2013-10-31 Thread Steve Borho
On Thu, Oct 31, 2013 at 5:53 AM, dnyanesh...@multicorewareinc.com wrote:
# HG changeset patch
# User Dnyaneshwar Gorade dnyanesh...@multicorewareinc.com
# Date 1383216695 -19800
#      Thu Oct 31 16:21:35 2013 +0530
# Node ID 86ff1a3ec89720a73325148e8ac01ec1dbdab3c2
# Parent