Based on debugging and experiments, the geode graphics processor (GP) will be
used to render the middle dwords of the PictOpAdd operation. Because there is
no writemask in the geode HW, the destination offset must be a multiple of 4
(offset % 4 == 0) before this method can be used. With the improved code, if
the destination start offset is not a multiple of 4, the driver first does the
"*dest = *dest + *src" calculation in SW for the leading pixels. Likewise, the
driver does the SW rendering for the remaining pixels at the end of each line.
That is to say, we use the GP of geode for as much of the rendering as
possible. A picture to demonstrate:

%4           1   2   3   0   1   2   3   0   1   2   3   0   1   2
destOffset:  x   x   x   x   x   x   x   x   x   x   x   x   x   x
HW                       x   x   x   x  (x   x   x   x)
SW           x   x   x                                   x   x   x
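
As a rough illustration of the split in the picture above, here is a minimal sketch. It is not the actual driver code: span_split, split_span and sw_add_a8 are made-up names, and I assume one byte per a8 pixel and that the SW "+" saturates at 255.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not taken from the geode driver. */
typedef struct {
    size_t head;   /* leading bytes rendered in SW until offset % 4 == 0 */
    size_t middle; /* dword-aligned bytes handed to the GP (multiple of 4) */
    size_t tail;   /* trailing bytes rendered in SW */
} span_split;

/* Split one line of `width` bytes starting at byte offset `dest_offset`. */
static span_split split_span(size_t dest_offset, size_t width)
{
    span_split s;

    s.head = (4 - dest_offset % 4) % 4;
    if (s.head > width)
        s.head = width;
    s.middle = (width - s.head) / 4 * 4;
    s.tail = width - s.head - s.middle;
    assert(s.head + s.middle + s.tail == width);
    return s;
}

/* SW "*dest = *dest + *src" on n a8 bytes; saturation is an assumption. */
static void sw_add_a8(uint8_t *dest, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)dest[i] + src[i];
        dest[i] = sum > 255 ? 255 : (uint8_t)sum;
    }
}
```

For the 14-byte line drawn above (start offset 1), split_span(1, 14) gives head 3, middle 8 and tail 3.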

But in my debugging, the last dword (the content in brackets) was rendered
incorrectly, so I only render the (dest%4 -1) dwords in HW and leave the last
dword to SW. The next picture shows what is actually rendered:

%4           1   2   3   0   1   2   3   0   1   2   3   0   1   2
destOffset:  x   x   x   x   x   x   x   x   x   x   x   x   x   x
HW                       x   x   x   x        
SW           x   x   x                   x   x   x   x   x   x   x

That is to say, the wider the glyph, the better our rendering performance,
because a larger share of each line goes through HW.
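
To make the effect of the width concrete, here is a small sketch of how many bytes per line go to the GP when the last dword is left to SW. Again this is only an illustration under my own assumptions (one byte per pixel, hw_bytes_workaround is a made-up name), not the driver code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: bytes per line handed to the GP when the last
 * aligned dword is moved to SW, as in the second picture.
 */
static size_t hw_bytes_workaround(size_t dest_offset, size_t width)
{
    size_t head = (4 - dest_offset % 4) % 4; /* SW bytes before alignment */
    size_t middle;

    if (head > width)
        head = width;
    middle = (width - head) / 4 * 4;         /* all whole aligned dwords */
    return middle >= 4 ? middle - 4 : 0;     /* drop the last dword */
}
```

With start offset 1, a 14-byte line gives 4 HW bytes out of 14, while a 30-byte line gives 20 out of 30.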
In my test:
"x11perf -aa10text"    :       46400/s
"x11perf -aa24text"    :       18300/s
Both are about 10 times faster than before.



Thanks,
Frank           

> -----Original Message-----
> From: Michel Dänzer [mailto:[email protected]]
> Sent: July 21, 2010 16:24
> To: Huang, FrankR
> Cc: Mart Raudsepp; Torres, Rigo; Writer, Tim; [email protected];
> [email protected]; Cui, Hunk; Deucher, Alexander
> Subject: RE: Glyph rendering
> 
> 
> [ Fixed your quoting, please consider using a better e-mail client ]
> 
> On Wed, 2010-07-21 at 16:11 +0800, Huang, FrankR wrote:
> >
> > From: Michel Dänzer [mailto:[email protected]]
> >
> > > On Wed, 2010-07-21 at 15:30 +0800, Huang, FrankR wrote:
> > > >
> > > > But as you know, for the PICT_a8r8g8b8 method, the width and height
> > > > of the source sometimes cannot be divided by 4 (such as 5...), so
> > > > the PictOpAdd for the remaining pixels should be done by SW code.
> > >
> > > The height doesn't matter, and if there's a writemask it should be
> > > possible to use that to mask out source/destination pixels that don't
> > > align to an ARGB pixel.
> >
> > Yes. My description was not accurate, we only care about the width for
> > this condition. Do you mean the writemask implemented in HW? From what
> > I found, it is not in the geode GP. :(
> 
> Yes, that's what I meant.
> 
> 
> > > > For the mixed way (HW+SW as I described above), the speed can be
> > > > 50000/s; unfortunately the result is still not correct (it seems
> > > > correct when debugging, I'm still checking it).
> > >
> > > Sounds like maybe you're not properly synchronizing between GPU and
> > > CPU access.
> >
> > Michael,
> 
> Ahem.
> 
> > maybe you misunderstood. By "SW" I mean that our driver still uses
> > a formula to do the "+" operation in video memory instead of falling
> > back to server handling (maybe this is what you meant). We don't fall
> > back anymore.
> 
> I figured that, which means the driver is responsible for making sure
> the GPU and CPU properly see each other's rendering results.
> 
> 
> --
> Earthling Michel Dänzer           |                http://www.vmware.com
> Libre software enthusiast         |          Debian, X and DRI developer

_______________________________________________
[email protected]: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: http://lists.x.org/mailman/listinfo/xorg-devel
