>+pand        m0,      [pw_00ff]
>+pand        m2,      [pw_00ff]
>+pand        m4,      [pw_00ff]
>+pand        m6,      [pw_00ff]
>+
>+packuswb    m0,      m1
>+packuswb    m2,      m3
>+packuswb    m4,      m5
>+packuswb    m6,      m7
1. If you are not going to load [pw_00ff] into a register, you can merge
pand+packuswb into a single pshufb. Otherwise, keeping the constant in a
register is faster most of the time.
2. packuswb m0, m0 is better, since it depends on only one register.
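
To illustrate point 1, here is a rough scalar model in C of the instructions involved. It is only a sketch, not the x265 asm: it assumes that only the low 8 bytes of the result are needed (as point 2 implies), and the names xmm_t, model_* and low_halves_match are hypothetical helpers made up for this example. The shuffle mask values are likewise an assumption about what a merged pshufb constant could look like.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit XMM register as 16 little-endian bytes. */
typedef struct { uint8_t b[16]; } xmm_t;

/* pand reg, [pw_00ff]: clear the high byte of every 16-bit word. */
static xmm_t model_pand_00ff(xmm_t a)
{
    for (int i = 1; i < 16; i += 2)
        a.b[i] = 0;
    return a;
}

/* Saturate a signed 16-bit word to an unsigned byte, as packuswb does. */
static uint8_t sat_u8(int16_t w)
{
    return w < 0 ? 0 : (w > 255 ? 255 : (uint8_t)w);
}

/* packuswb dst, src: words of dst fill bytes 0..7, words of src fill 8..15. */
static xmm_t model_packuswb(xmm_t dst, xmm_t src)
{
    xmm_t r;
    for (int i = 0; i < 8; i++) {
        uint16_t wd = (uint16_t)(dst.b[2 * i] | (dst.b[2 * i + 1] << 8));
        uint16_t ws = (uint16_t)(src.b[2 * i] | (src.b[2 * i + 1] << 8));
        r.b[i]     = sat_u8((int16_t)wd);
        r.b[i + 8] = sat_u8((int16_t)ws);
    }
    return r;
}

/* pshufb a, mask: mask byte with bit 7 set yields 0, else selects a.b[mask & 15]. */
static xmm_t model_pshufb(xmm_t a, xmm_t mask)
{
    xmm_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (mask.b[i] & 0x80) ? 0 : a.b[mask.b[i] & 0x0f];
    return r;
}

/* Returns 1 if pand+packuswb and a single pshufb agree on the low 8 bytes. */
int low_halves_match(const uint8_t src[16])
{
    /* Hypothetical mask: gather even bytes into the low half, zero the rest. */
    static const xmm_t even_bytes = {{ 0, 2, 4, 6, 8, 10, 12, 14,
                                       0x80, 0x80, 0x80, 0x80,
                                       0x80, 0x80, 0x80, 0x80 }};
    xmm_t m0;
    memcpy(m0.b, src, 16);

    xmm_t masked = model_pand_00ff(m0);
    xmm_t packed = model_packuswb(masked, masked);   /* packuswb m0, m0 */
    xmm_t shuffled = model_pshufb(m0, even_bytes);   /* pshufb m0, [mask] */

    return memcmp(packed.b, shuffled.b, 8) == 0;
}
```

Calling low_halves_match() on any 16-byte input should return 1, because the pand leaves every word in the 0..255 range and packuswb therefore never saturates. Whether the single pshufb is actually faster still depends on the target CPU and on whether the mask can stay in a register.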
 