While playing a video with mplayer, I noticed with oprofile that half of the time is spent in xf86XVCopyPacked() or xf86XVCopyYUV12ToPacked().
Looking at the former, I wonder why a plain memcpy was not used instead of "manually" copying each word. glibc's memcpy is usually optimized for the target architecture, while there is little the compiler can do to optimize the given code. For the planar-to-packed version, much better performance can be achieved with vector instructions, although that is harder to do portably.

So I suppose there is a good reason why these functions are so slow. Maybe video drivers are expected to provide better ones? Or maybe the plan is to use an external library such as pixman for this kind of job in the future?

More to the point, what I'm trying to find out is whether I'm supposed to optimize my video driver so that it does not use these functions, or whether it's OK to optimize the functions themselves, and which path I should follow.
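To make the question concrete, here is a minimal sketch in C of the two operations being discussed. It is not the X server code: the function names, the parameter lists, and the choice of YUY2 (bytes ordered Y0 U Y1 V) as the packed target are assumptions made for illustration. The first function copies rows with memcpy rather than word by word; the second is a scalar planar 4:2:0 to packed reference, which is the kind of loop a vectorized version would have to match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the X server function): copy h rows of
 * row_bytes bytes each between buffers that may have different pitches. */
void copy_packed_rows(const uint8_t *src, uint8_t *dst,
                      size_t src_pitch, size_t dst_pitch,
                      size_t row_bytes, size_t h)
{
    assert(row_bytes <= src_pitch && row_bytes <= dst_pitch);

    if (src_pitch == row_bytes && dst_pitch == row_bytes) {
        /* Rows are contiguous in both buffers: a single copy suffices. */
        memcpy(dst, src, row_bytes * h);
        return;
    }

    for (size_t row = 0; row < h; row++) {
        /* Let the C library pick the best copy routine for each row. */
        memcpy(dst, src, row_bytes);
        src += src_pitch;
        dst += dst_pitch;
    }
}

/* Hypothetical scalar reference: pack planar 4:2:0 data into YUY2.
 * Assumes w and h are even; each chroma row is shared by two luma rows. */
void pack_yuv420_to_yuy2(const uint8_t *y, const uint8_t *u,
                         const uint8_t *v, uint8_t *dst,
                         size_t y_pitch, size_t uv_pitch, size_t dst_pitch,
                         size_t w, size_t h)
{
    for (size_t row = 0; row < h; row++) {
        const uint8_t *yr = y + row * y_pitch;
        const uint8_t *ur = u + (row / 2) * uv_pitch;
        const uint8_t *vr = v + (row / 2) * uv_pitch;
        uint8_t *d = dst + row * dst_pitch;

        /* Each pair of pixels becomes four bytes: Y0 U Y1 V. */
        for (size_t x = 0; x < w / 2; x++) {
            d[4 * x + 0] = yr[2 * x];
            d[4 * x + 1] = ur[x];
            d[4 * x + 2] = yr[2 * x + 1];
            d[4 * x + 3] = vr[x];
        }
    }
}
```

Whether memcpy actually wins here would have to be measured, since the destination is often video memory and the result may depend on the hardware and on how that memory is mapped.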
