On Thu, Jul 21, 2011 at 14:15, Matt Turner <[email protected]> wrote:
> On Thu, Feb 4, 2010 at 7:45 PM, <[email protected]> wrote:
>> When playing some video with mplayer I noticed with oprofile that
>> half the time is spent in xf86XVCopyPacked() or xf86XVCopyYUV12ToPacked().
>>
>> Looking at the former, I wonder why a mere memcpy was not used instead
>> of "manually" copying each word. glibc's memcpy is usually optimized
>> for the target architecture while there is little the compiler can do
>> to optimize the given code.
>> Also, for the planar to packed version, you can achieve much better
>> performance using vector instructions, but it's less easy to do it
>> portably.
>>
>> So I suppose there is a good reason why these functions are so slow.
>> Maybe because the video drivers are supposed to provide better ones?
>> Or maybe because it's planned to use an external library like pixman
>> to do this kind of job in the future?
>>
>> More to the point, what I'm trying to know is whether I'm supposed to
>> optimize my video driver to not use these functions, or if it's OK to
>> optimize them instead, and what path I should follow?
>
> I was digging through some old patches and came across a
> Loongson-optimized xf86XVCopyYUV12ToPacked function (attached). Do you
> know who wrote it?
>
> Did we ever come to some conclusion as to how this was supposed to be
> handled? Would optimized implementations be acceptable to put in
> hw/xfree86/common/xf86xv.c?
>
> Also, I see no reason why xf86XVCopyPacked can't be simplified by
> using memcpy (or maybe memmove?). Any reason why not?
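
For reference, the memcpy-based simplification asked about above might look
roughly like the sketch below. This is not the code in xf86xv.c:
copy_packed_rows() and its parameters are hypothetical names, the row length
is given directly in bytes, and it assumes both buffers are ordinary,
non-overlapping memory (otherwise memmove would be the safer call).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (an illustration, not the X server's function):
 * copy h rows of packed pixel data, row_bytes bytes per row, from a
 * source with pitch src_pitch to a destination with pitch dst_pitch.
 * Pitches are in bytes. Assumes src and dst do not overlap and that
 * dst is plain memory that memcpy may write with any access width.
 */
static void
copy_packed_rows(const void *src, void *dst,
                 size_t src_pitch, size_t dst_pitch,
                 size_t h, size_t row_bytes)
{
    const uint8_t *s = src;
    uint8_t *d = dst;

    while (h-- > 0) {
        /* One library call per row instead of a word-by-word loop. */
        memcpy(d, s, row_bytes);
        s += src_pitch;
        d += dst_pitch;
    }
}
```

Copying row by row rather than in one call keeps any padding between
dst_pitch and row_bytes untouched when the two pitches differ.
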

I suspect that on some combinations of arches+drivers we'll be copying
to an I/O area. I suppose we could have our own memset_toio functions
and handle that.

Stéphane
_______________________________________________
[email protected]: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: http://lists.x.org/mailman/listinfo/xorg-devel
