Let's please try to keep the conversation civil. Accusing people of trolling isn't helpful.
There are a few people on this mailing list who have a lot of knowledge about GPU rendering, and Ian is definitely one of them. I think he was genuinely trying to be helpful. His claim isn't even controversial - GPU, ASIC, and CPU rendering all have different trade-offs. As do game libraries like pygame.

On Tue, Mar 21, 2017 at 12:18 AM, Leif Theden <leif.the...@gmail.com> wrote:

> I'm not really sure how to respond to this wall of text. Ian, are you really trying to make the case that a software renderer drawing simple shapes on the screen is better than a GPU? Why then are basically all games these days using a GPU? Please, don't answer it, because I'm not sure if you are trolling or not, and I don't want to risk derailing the thread with this... honestly quite ludicrous assertion you've made. The proof is in the pudding, so to speak, and the pudding is certainly not software rendering anything.
>
> Anyway, I don't think the typical use case for games is drawing lines. People want sprites, and they want them to rotate, scale, and translate quickly. Pygame1 cannot do this, but SDL2 can.
>
> I would say maybe there is a niche for pygame1: a toy for drawing lines and breaking students' dreams of building a game using Python. There is a real desire for a fast Python library that doesn't have the magic of, say, Kivy, or even pyglet, but isn't slow and cumbersome either.
>
> On Mon, Mar 20, 2017 at 5:07 PM, Ian Mallett <i...@geometrian.com> wrote:
>
>> On Mon, Mar 20, 2017 at 3:52 PM, Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
>>
>>> Ian Mallett wrote:
>>>
>>>> Per-pixel drawing operations, if they must be individually managed by the CPU, will always be much faster to do on the CPU. This means things like Surface.set_at(...) and likely draw.circle(...), as well as potentially things like draw.rect(...) and draw.line(...).
>>>
>>> This is undoubtedly true in the case of drawing one pixel at a time, but are you sure it's true for lines and rectangles?
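For concreteness, these are pygame's CPU-side drawing calls being weighed here; a minimal sketch, assuming pygame is installed and a window can be opened:

    import pygame

    pygame.init()
    screen = pygame.display.set_mode((320, 240))   # plain software surface

    screen.fill((0, 0, 0))
    screen.set_at((10, 10), (255, 255, 255))                   # one pixel, managed by the CPU
    pygame.draw.line(screen, (255, 0, 0), (0, 0), (319, 239))  # CPU-rasterized line
    pygame.draw.rect(screen, (0, 255, 0), pygame.Rect(50, 50, 60, 40), 1)
    pygame.draw.circle(screen, (0, 0, 255), (160, 120), 30)
    pygame.display.flip()

    pygame.quit()

Every call here runs on the CPU and writes directly into surface memory; the question in the quoted exchange is whether routing such calls through a GPU would actually be faster.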
>>
>> On Mon, Mar 20, 2017 at 12:25 PM, Leif Theden <leif.the...@gmail.com> wrote:
>>
>>> Good points, Ian, but I don't see why we need to support software drawing when OpenGL supports drawing primitives. Is there a compelling reason that drawing lines with the CPU is better than doing it on the GPU?
>>
>> Oh yes!
>>
>> Basically, it's because a GPU runs well when it has a big, parallelizable workload, and terribly when it doesn't. Flexible, small workloads, such as you see in a typical indie game or small project, are basically exactly this worst case. They are small (rendering dozens to hundreds of objects), and they are dynamic in that the objects change positions and shading according to CPU-hosted logic. Heuristic: if you've got a branch deciding where/whether to render your object or what color it should be, then the GPU hates it and you.*
>>
>> If that made sense to you, you can skip this elaboration:
>> ----
>>
>> The GPU is basically a bunch of workers (thousands, nowadays) sitting in a room. When you tell the GPU to do something, you tell everyone in the room to do that same thing. Configuring the GPU to do something else (saliently: changing the shader) is slow (for technical reasons).
>>
>> I have a GTX 980 sitting on my desk right now, and it has 2048 thread processors clocked at 1126 MHz. That's ****ing *insane*. I can throw millions and millions of triangles at it, and it laughs right back at me because it's rendering them (essentially) 2048 at a time. The fragments (≈ pixels) generated from those triangles are also rendered 2048 at a time. This is awesome, but only if you're drawing lots of triangles or shading lots of pixels in the same way (the same shader).
>>
>> But I *cannot* change the way I'm drawing those triangles individually. Say I alternate between a red shader and a blue shader for each of a million triangles. NVIDIA guidelines tell me I'm at about 3 *seconds per frame*, not even counting the rendering. This is what I mean by overhead. (To work around this problem, you *double* the amount of work and send a color along with each vertex as data. That's just more data, and the GPU can handle it easily. But reconfigure? No good.) And this is in C-like languages. In Python, you have a huge amount of software overhead for those state changes, even before you get to the PCIe bus.
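A minimal sketch of that per-vertex-color workaround, written against the moderngl wrapper purely for convenience (moderngl is not part of this discussion, it is just one Python OpenGL binding; assume an OpenGL 3.3 context is available). Instead of switching between a red program and a blue program per triangle, the color rides along with each vertex and everything goes out in a single draw call:

    import struct
    import moderngl

    ctx = moderngl.create_context(standalone=True)   # headless GL context for the sketch
    fbo = ctx.simple_framebuffer((256, 256))
    fbo.use()

    prog = ctx.program(
        vertex_shader="""
            #version 330
            in vec2 in_pos;
            in vec3 in_color;
            out vec3 v_color;
            void main() {
                v_color = in_color;
                gl_Position = vec4(in_pos, 0.0, 1.0);
            }
        """,
        fragment_shader="""
            #version 330
            in vec3 v_color;
            out vec4 f_color;
            void main() { f_color = vec4(v_color, 1.0); }
        """,
    )

    # (x, y, r, g, b) per vertex: one red and one blue triangle in the same buffer.
    vertices = struct.pack(
        "30f",
        -0.9, -0.8,  1.0, 0.0, 0.0,
        -0.1, -0.8,  1.0, 0.0, 0.0,
        -0.5,  0.0,  1.0, 0.0, 0.0,
         0.1, -0.8,  0.0, 0.0, 1.0,
         0.9, -0.8,  0.0, 0.0, 1.0,
         0.5,  0.0,  0.0, 0.0, 1.0,
    )
    vbo = ctx.buffer(vertices)
    vao = ctx.vertex_array(prog, [(vbo, "2f 3f", "in_pos", "in_color")])

    vao.render(moderngl.TRIANGLES)   # one draw call, no shader switches

The point stands either way: the per-object variation has to be baked into data (or a bigger shader) up front, rather than decided by Python-side branches at draw time.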
>>
>> And in a typical pygame project or indie game, this is basically exactly what we're trying to do. We've got sprites with individual location data and different ways of being rendered--different textures, different blend modes, etc. Only a few objects, but decent complexity in how to draw them. With a bunch of cleverness, you could conceivably write some complex code to work around this (generate work batches, abstract to an übershader, etc.), but I doubt you could (or would want to) fully abstract this away from the user--particularly in such a flexible API as pygame.
>>
>> The second issue is that the PCIe bus, which is how the CPU talks to the GPU, is *really slow* compared to the CPU's memory subsystem--both in terms of latency and bandwidth. My lab computer has ~64 GB/s of DDR4 bandwidth (my computer at home has quadruple that) at 50-500 ns latency. By contrast, the PCIe bus tops out at 2 GB/s at ~20,000 ns latency. My CPU also has 15 MB of L3 cache, while my 980 has no L3 cache and only 2 MiB of L2 (because streaming workloads need less caching and caching is expensive).
>>
>> So when you draw something on the CPU, you're drawing using a fast processor (my machine: 3.5 GHz, wide vectors, long pipe) into very close DRAM at a really low latency--but it's probably cached in L3 or lower anyway. When you draw something on the GPU, you're drawing (slowly (~1 GHz, narrow vectors, short pipe), but in parallel) into different-DRAM-which-is-optimized-more-for-streaming and which may or may not be cached at all. Similar, maybe, but you *also* have to wait for the command to go over the PCIe bus, take any driver sync hit, spool up the GPU pipeline in the right configuration, and so on. The overhead is worth it if you're drawing a million triangles, but not if you're calling Surface.set_at(...).
>>
>> The point is, GPUs have great parallelism, but you pay for it in latency and usability. It's a tradeoff, and when you consider all the rest you need to do on the CPU, it's not always a clear one. But, as a heuristic, lots of geometry or fill rate or math-intensive shading calls for a GPU. Anything less calls for a CPU. My argument is that the typical use case of pygame falls, *easily*, into the latter.
>>
>> ----
>>
>> *(Of course, you *can* make this fast at a tremendous programmer cost by emulating all that logic on the GPU using e.g. compute shaders, which is what all the cool kids are doing, or by amortizing state changes with e.g. Vulkan's new command lists. But it requires (1) being competent at GPU architecture and (2) being willing to invest the time. I still use pygame mainly because of 2.)
>>
>>> Also, I'm a bit tired of the "python is slow so you may as well make everything slow and not expect it to work quickly" attitude.
>>
>> I was worried someone might take it that way; this isn't my point at all. What I want is for people to remember what's important.
>>
>> Clearly, one should not aspire to make things slow. I'm just saying that if a game developer tries to use Python+pygame to write some crazy graphics-intensive mega-AAA game, when it fails it's really on them for picking the wrong tool. At least for now--this is what I mean when I say we need to figure out if we like our niche.
>>
>>> A pygame app burns through the CPU not because of the interpreter, but because it is flipping bits in RAM when a GPU could do it.
>>
>> It's both of these and more. SDL's core blitting routines are in C, occasionally vectorized, IIRC, whereas, as I mentioned above, you have to figure in the cost of command transfers and overhead when you do operations on the GPU.
>>
>> Ian
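To make the "typical pygame workload" above concrete, here is a hedged sketch of the kind of loop being described: a modest number of objects, each with CPU-computed positions, per-object branches, and different blend modes. The object fields and numbers are made up for illustration; only the pygame calls are real API.

    import random
    import pygame

    pygame.init()
    screen = pygame.display.set_mode((640, 480))
    clock = pygame.time.Clock()

    # A hundred objects, each with its own position, image, and draw mode:
    # a small, branchy, CPU-driven workload rather than one big uniform batch.
    sprite_img = pygame.Surface((16, 16))
    sprite_img.fill((200, 200, 50))
    glow_img = pygame.Surface((16, 16))
    glow_img.fill((30, 60, 120))

    objects = [{"pos": [random.randint(0, 624), random.randint(0, 464)],
                "vel": [random.choice((-2, 2)), random.choice((-2, 2))],
                "glowing": random.random() < 0.3}
               for _ in range(100)]

    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False

        screen.fill((0, 0, 0))
        for obj in objects:
            # Per-object logic decided in Python every frame.
            obj["pos"][0] = (obj["pos"][0] + obj["vel"][0]) % 640
            obj["pos"][1] = (obj["pos"][1] + obj["vel"][1]) % 480
            if obj["glowing"]:
                screen.blit(glow_img, obj["pos"], special_flags=pygame.BLEND_ADD)
            else:
                screen.blit(sprite_img, obj["pos"])
        pygame.display.flip()
        clock.tick(60)

    pygame.quit()

Each blit here is a separate, differently configured operation driven by Python-side branches; reorganizing that into a few uniform GPU draw calls is exactly the batching work Ian argues is non-trivial to hide behind a pygame-style API.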