On Fri, 17 Apr 2020 16:02:12 GMT, Nir Lisker <nlis...@openjdk.org> wrote:
>> Conclusion: The new shaders that support attenuation don't seem to have
>> much of a performance impact on machines with an Intel HD, but on systems
>> with a graphics accelerator, it is a significant slowdown.
>>
>> So we are left with two choices: doubling the number of shaders (that is,
>> a set of shaders with attenuation and a set without) or living with the
>> performance hit (which will only be a problem on machines with a dedicated
>> graphics accelerator for highly fill-limited scenes). The only way we can
>> justify a 2x drop in performance is if we are fairly certain that this is
>> a corner case, and thus unlikely to hit real applications. If we do end up
>> deciding to replicate the shaders, I don't think it is all that much work.
>> I'm more worried about how well it would scale to subsequent improvements,
>> although we could easily decide that, for, say, spotlights, attenuation is
>> so common that you wouldn't create a version without it. In the D3D HLSL
>> shaders, ifdefs are used, so the work would be to restore the original
>> code and add the new code under an ifdef. Then double the number of gradle
>> lines (at that point, I'd do it in a for-each loop), and modify the logic
>> that loads the shaders to pick the right one. For GLSL, the different
>> parts of the shader are in different files, so it's a matter of creating
>> new versions of each of the three lighting shaders that handle attenuation
>> and choosing the right one at runtime.
>
> I discussed this with a graphics engineer. He said that a couple of
> branches do not have any real performance impact even on modern mobile
> devices, and that, for example, on iOS 7 using half floats instead of
> floats improved shader execution dramatically. Desktops with modern NVIDIA,
> AMD, and even Intel cards can process dozens of branches with no
> significant performance degradation. He actually suggested having all the
> light types in a single shader file (looking ahead here). He also suggested
> not permuting the shaders based on the number of lights, but instead
> passing in a uniform for that number and looping over it. The permutations
> on the bump, specular, and self-illumination components are correct (I'm
> not sure whether we are also doing that for the diffuse component). If we
> later add shadows, which is not on my near-term to-do list, then we should
> permute there. It also depends on our target hardware. If we take into
> account hardware from, say, 2005, then maybe branching will cause a
> significant performance loss, but that hinders our ability to increase
> performance on newer hardware. What is the policy here? I have a Win10
> laptop with a GeForce 610M that I will test this weekend to see whether
> the mobile NVIDIA cards have some issue.

I think most of those are good suggestions going forward. As for the
performance drop, the only place we've seen it so far is on graphics
accelerators that are a few years old by now. Integrated graphics chipsets
(such as Intel HD), whether old or new, seem largely unaffected by the shader
changes. What we are missing is performance data from newer graphics
accelerators on Mac and Windows.

Even with the performance drop on older graphics devices, I'm leaning towards
not doubling the shaders, since this is an artificial stress test with huge
quads. If we could get performance data from a couple of more recent graphics
accelerators, that would be best.

-------------

PR: https://git.openjdk.java.net/jfx/pull/43
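
As a concrete illustration of the ifdef approach described above for the D3D
HLSL shaders, here is a minimal, hypothetical sketch (not the actual Prism
shader source; all uniform and parameter names are made up) showing how the
attenuation code could sit behind a define so that the with- and
without-attenuation variants compile from a single file:

    // Hypothetical HLSL sketch, not the actual Prism shader source.
    // Compile once with /D ATTENUATION and once without to produce the two
    // shader variants from a single file.

    float3 lightColor;     // illustrative uniforms, not the real Prism names
    float3 lightPosition;
    float3 attenTerms;     // x = constant, y = linear, z = quadratic

    float4 main(float3 worldPos : TEXCOORD0,
                float3 normal   : TEXCOORD1) : COLOR
    {
        float3 toLight = lightPosition - worldPos;
        float3 contrib = saturate(dot(normalize(normal), normalize(toLight)))
                         * lightColor;
    #ifdef ATTENUATION
        // distance falloff: 1 / (constant + linear*d + quadratic*d*d)
        float d = length(toLight);
        contrib /= (attenTerms.x + attenTerms.y * d + attenTerms.z * d * d);
    #endif
        return float4(contrib, 1.0);
    }

The GLSL preprocessor supports the same #ifdef pattern, although, as noted
above, the GLSL lighting code is split across files, so separate variants of
those files are the more likely route there.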
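Similarly, a hedged sketch of the engineer's "single shader, loop over a
light-count uniform" suggestion; again the names (MAX_LIGHTS, numLights,
lightsPos, lightsColor) are illustrative, not the real Prism uniforms:

    // Hypothetical HLSL sketch of looping over a light-count uniform so one
    // compiled shader covers 1..MAX_LIGHTS lights instead of one permutation
    // per light count.

    #define MAX_LIGHTS 3

    int    numLights;                // how many entries of the arrays are live
    float4 lightsPos[MAX_LIGHTS];    // xyz = light position
    float4 lightsColor[MAX_LIGHTS];  // rgb = light color

    float4 main(float3 worldPos : TEXCOORD0,
                float3 normal   : TEXCOORD1) : COLOR
    {
        float3 n = normalize(normal);
        float3 total = float3(0.0, 0.0, 0.0);
        // dynamic loop bound: the branch count depends on the uniform value
        [loop]
        for (int i = 0; i < numLights; i++)
        {
            float3 toLight = normalize(lightsPos[i].xyz - worldPos);
            total += saturate(dot(n, toLight)) * lightsColor[i].rgb;
        }
        return float4(total, 1.0);
    }

Whether a dynamic loop like this is cheap enough on older GPUs is exactly the
open question above, which the GeForce 610M test should help answer.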