*Highlights*

*Details*
I designed and implemented an ARM NEON algorithm for DCT16x16. Since ARM
registers are very limited, the algorithm processes a 16x4 strip at a time
and loops 4 times to cover all of the DCT-1D rows. The DCT-2D pass is similar
but works on 32-bit intermediates. The 32-bit multiplication is the
bottleneck here: it takes 4 cycles, compared to a single cycle for 16-bit
multiplication.
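
The strip-wise row pass described above can be sketched in plain C as below.
This is only an illustration of the loop structure, not the NEON code itself:
the names transform_strip and transform16_rows are hypothetical, the
coefficient table is assumed to be supplied by the caller as a flat 16x16
array, and the scalar inner loop stands in for what the vector kernel would do
with one 16x4 strip held in registers.

```c
#include <assert.h>
#include <stdint.h>

enum { N = 16, STRIP_ROWS = 4 };

/* Hypothetical scalar stand-in for the NEON kernel: applies a 16-point
 * transform to each of the STRIP_ROWS rows of one 16x4 strip.
 * coef is a caller-supplied N*N table (assumption), indexed coef[k * N + n].
 * Products are accumulated in 32 bits. */
static void transform_strip(const int16_t *src, int32_t *dst, int stride,
                            const int16_t *coef)
{
    for (int r = 0; r < STRIP_ROWS; r++) {
        for (int k = 0; k < N; k++) {
            int32_t acc = 0;
            for (int n = 0; n < N; n++)
                acc += (int32_t)coef[k * N + n] * src[r * stride + n];
            dst[r * stride + k] = acc;
        }
    }
}

/* Row pass over a 16x16 block: four iterations, one 16x4 strip each. */
static void transform16_rows(const int16_t *src, int32_t *dst, int stride,
                             const int16_t *coef)
{
    assert(stride >= N);
    for (int strip = 0; strip < N / STRIP_ROWS; strip++) {
        int offset = strip * STRIP_ROWS * stride;
        transform_strip(src + offset, dst + offset, stride, coef);
    }
}
```

Calling transform16_rows(src, dst, 16, coef) with an identity table copies the
input unchanged, which is a quick way to check the strip indexing before
plugging in real coefficients.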

*Plans*
Write an example for psyCost_pp<2> (psyCost_pp_4x4).
I need about 2 more weeks to finish DCT16x16. The function is large and
complex, so I need more time to debug and adjust my algorithm and code. Each
run of the debug top (modified from our Testbench) takes about 20 minutes on
average in the simulation environment.

Thank you
Regards
Ramya
_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel
