On Sun, 2017-05-14 at 16:16 -0700, Keith Packard wrote:
> Clemens Eisserer <[email protected]> writes:
>
> > I have observed extremely low x11perf-putimagexy10 results when using
> > glamor on top of the i965 driver - while shmput10 results are quite
> > ok.
>
> Why do you care? xy format images are not something you should be using
> at all; they were designed for 1980s era Apollo workstations.

Indeed. But for the morbidly curious...

Glamor mostly doesn't accelerate xy image handling because GL mostly
doesn't believe in planar bitmaps as, like, a thing. The fallback path
creates a pbo, copies out from the drawable's fbo to the pbo,
glMapBuffer's the pbo, runs the op against the pbo with fb, and then
copies the data back to the fbo. So that's bad enough: now every
operation is at least two more blits.

Underneath, fbPutXYImage works by walking the bit planes in order and
merging them into the destination. This is just about pathologically
bad, because it means a read/modify/write cycle of the entire
destination _for each plane_. So you're doing many more operations per
pixel, and you're fighting the cache to do it.

You're also not comparing apples to apples in your test. shmput10 is a
10x10 ZPixmap; you mean to compare to shmputxy10. Interestingly, at
least on my glamor machine, shmputxy10 is _slower_ than non-shm:

  84000 trep @ 0.1108 msec ( 9030.0/sec): PutImage XY 10x10 square
  72000 trep @ 0.1302 msec ( 7680.0/sec): ShmPutImage XY 10x10 square

I think that's just a funny interaction with the MIT-SHM code, which
when faced with an xy image will blast it into a (presumably z image)
scratch pixmap first and then CopyArea from that to the destination. If
glamor creates that pixmap on the GPU we're still going to do the same
fallback logic for the xy putimage phase, and then yet another blit
from that to the real destination. Oops.

Forcing that pixmap's usage to be GLAMOR_CREATE_PIXMAP_CPU brings
shmputxy10 to 117kops/sec, which is quite a bit nicer; probably we
should formalize that usage for ShmPutImage, and maybe do the
equivalent trick for wire PutImage too.

But compare this all with leaving your pixels in a sensible format:

  6000000 trep @ 0.0019 msec (525000.0/sec): PutImage 10x10 square
  4800000 trep @ 0.0022 msec (454000.0/sec): ShmPutImage 10x10 square

XY images are just losers, don't bother.

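To make the per-plane cost described above a bit more concrete, here is
a toy model in Python. It is only a sketch and not the fb code: the
function names are made up for illustration, pixels are plain ints in a
list rather than packed scanlines, and it assumes planes arrive most
significant first. It merges an XY image plane by plane and counts how
often the destination gets touched, then does the same for a single
Z-format pass.

```python
def z_to_xy(pixels, depth):
    """Split Z-format pixels into `depth` bit planes, most significant first.

    Hypothetical helper, used only to build test data for the sketch.
    """
    return [[(v >> (depth - 1 - i)) & 1 for v in pixels] for i in range(depth)]


def put_xy_planewise(dst, planes, depth):
    """Merge bit planes into dst one plane at a time.

    Every plane walks the whole destination and does a
    read/modify/write on each pixel. Returns the number of
    destination pixel touches.
    """
    touches = 0
    for i, plane in enumerate(planes):
        bit = 1 << (depth - 1 - i)
        for p, b in enumerate(plane):
            # read the old pixel, replace one bit, write it back
            dst[p] = (dst[p] & ~bit) | (bit if b else 0)
            touches += 1
    return touches


def xy_to_z(planes, depth):
    """Assemble whole pixels from the planes without touching any destination."""
    npix = len(planes[0])
    out = [0] * npix
    for p in range(npix):
        v = 0
        for i in range(depth):
            v = (v << 1) | planes[i][p]
        out[p] = v
    return out


def put_z(dst, pixels):
    """Write complete pixels into dst in one pass; returns pixel touches."""
    touches = 0
    for p, v in enumerate(pixels):
        dst[p] = v
        touches += 1
    return touches
```

For a 10x10 image at depth 8 the plane-wise merge touches the
destination 800 times against 100 for the Z path, and the gap grows
linearly with depth. The model says nothing about cache behaviour or
the extra glamor blits, which come on top of this.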
(ShmPutImage being slower is curious, and it's slower for larger
request sizes too, so there's definitely something amiss there we
should dig into.)

- ajax

_______________________________________________
[email protected]: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: https://lists.x.org/mailman/listinfo/xorg-devel
