>+pand m0, [pw_00ff]
>+pand m2, [pw_00ff]
>+pand m4, [pw_00ff]
>+pand m6, [pw_00ff]
>+
>+packuswb m0, m1
>+packuswb m2, m3
>+packuswb m4, m5
>+packuswb m6, m7

1. If you don't load [pw_00ff] into a register, you can merge pand+packuswb into a single pshufb. In most cases, though, keeping the constant in a register is faster than reading it from memory on every use.
2. packuswb m0, m0 is better, since it depends on only one register.
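To illustrate point 1, here is a rough scalar C model (not x265 code) of why pand+packuswb and a single pshufb can produce the same bytes. The function names and the even-index shuffle mask are hypothetical, chosen only for this sketch, and it assumes the usual little-endian byte layout of an XMM register. It models the one-register form from point 2 (packuswb m0, m0); combining two different source registers would still need an extra instruction after the shuffles.

```c
#include <stdint.h>
#include <string.h>

/* PACKUSWB saturation: the word is read as signed and clamped to 0..255. */
static uint8_t sat_u8(uint16_t w)
{
    if (w & 0x8000) return 0;      /* negative as int16 -> 0 */
    if (w > 255)    return 255;    /* too large -> 255 */
    return (uint8_t)w;
}

/* Scalar model of PACKUSWB dst, src: low 8 bytes from a, high 8 from b. */
static void model_packuswb(uint8_t out[16], const uint16_t a[8],
                           const uint16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        out[i]     = sat_u8(a[i]);
        out[8 + i] = sat_u8(b[i]);
    }
}

/* Scalar model of PSHUFB: bit 7 of a control byte zeroes the output byte,
   otherwise the low 4 bits select a source byte. */
static void model_pshufb(uint8_t out[16], const uint8_t src[16],
                         const uint8_t ctrl[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0f];
}

/* Path A: pand m0, [pw_00ff] followed by packuswb m0, m0. */
void low_bytes_pand_pack(uint8_t out[16], const uint16_t w[8])
{
    uint16_t t[8];
    for (int i = 0; i < 8; i++)
        t[i] = w[i] & 0x00ff;      /* the mask makes saturation a no-op */
    model_packuswb(out, t, t);
}

/* Path B: one pshufb with a (hypothetical) even-byte mask. */
void low_bytes_pshufb(uint8_t out[16], const uint16_t w[8])
{
    static const uint8_t even_mask[16] = {
        0, 2, 4, 6, 8, 10, 12, 14,  0, 2, 4, 6, 8, 10, 12, 14
    };
    uint8_t bytes[16];
    for (int i = 0; i < 8; i++) {  /* little-endian register layout */
        bytes[2 * i]     = (uint8_t)(w[i] & 0xff);
        bytes[2 * i + 1] = (uint8_t)(w[i] >> 8);
    }
    model_pshufb(out, bytes, even_mask);
}

/* Returns 1 when both paths yield the same 16 bytes for this input. */
int paths_agree(const uint16_t w[8])
{
    uint8_t a[16], b[16];
    low_bytes_pand_pack(a, w);
    low_bytes_pshufb(b, w);
    return memcmp(a, b, 16) == 0;
}
```

Calling paths_agree() on any eight words should return 1. The pshufb form drops one instruction but still needs its own 16-byte mask constant, so whether it wins over a register-resident pw_00ff would have to be measured.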
_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel
