Let's please try to keep the conversation civil. Accusing people of trolling
isn't helpful.

There are a few people on this mailing list who have a lot of knowledge
about GPU rendering, and Ian is definitely one of them. I think he was
genuinely trying to be helpful. His claim isn't even controversial - GPU,
ASIC, and CPU rendering all have different trade-offs. As do game libraries
like pygame.



On Tue, Mar 21, 2017 at 12:18 AM, Leif Theden <leif.the...@gmail.com> wrote:

> I'm not really sure how to respond to this wall of text.  Ian, you are
> really trying to make the case that a software renderer moving simple
> shapes around on the screen is better than a GPU?  Why then are
> basically all games these days using a GPU?  Please, don't answer it,
> because I'm not sure if you are trolling or not, and don't want to risk
> derailing the thread with this...honestly quite ludicrous assertion you've
> made.  The proof is in the pudding, so to speak, and the pudding is
> certainly not software rendering anything.
>
> Anyway, I don't think the typical use case for games is drawing lines.
> People want sprites, and they want them to rotate, scale, and translate
> quickly.  Pygame1 cannot do this, but SDL2 can.
>
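To make the sprite point concrete: in pygame 1.x, rotating or scaling a sprite
means re-rendering it in software every frame. A rough sketch of that path,
for illustration only ("sprite.png" is a placeholder asset and the speed and
scale values are arbitrary):

    import pygame

    pygame.init()
    screen = pygame.display.set_mode((640, 480))
    clock = pygame.time.Clock()
    sprite = pygame.image.load("sprite.png").convert_alpha()  # placeholder asset

    angle = 0.0
    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
        angle = (angle + 2.0) % 360.0
        # rotozoom builds a brand-new rotated/scaled surface on the CPU
        # every call, which is why this path gets expensive per frame.
        frame = pygame.transform.rotozoom(sprite, angle, 1.5)
        screen.fill((0, 0, 0))
        screen.blit(frame, frame.get_rect(center=(320, 240)))
        pygame.display.flip()
        clock.tick(60)
    pygame.quit()

With SDL2's accelerated renderer, that same rotate/scale becomes a textured
draw handled by the GPU instead of a per-frame CPU re-render.
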
> I would say maybe there is a niche for pygame1, a toy for drawing lines
> and breaking students' dreams of building a game using python.  There is a
> real desire for a fast python library that doesn't have the magic of, say,
> Kivy, or even pyglet, but isn't slow and cumbersome either.
>
> On Mon, Mar 20, 2017 at 5:07 PM, Ian Mallett <i...@geometrian.com> wrote:
>
>> On Mon, Mar 20, 2017 at 3:52 PM, Greg Ewing <greg.ew...@canterbury.ac.nz>
>> wrote:
>>
>>> Ian Mallett wrote:
>>>
>>>> Per-pixel drawing operations, if they must be individually managed by
>>>> the CPU, will always be much faster to do on the CPU. This means things
>>>> like Surface.set_at(...) and likely draw.circle(...) as well as potentially
>>>> things like draw.rect(...) and draw.line(...).
>>>>
>>>
>>> This is undoubtedly true in the case of drawing one pixel at a time,
>>> but are you sure it's true for lines and rectangles?
>>
>>
>> On Mon, Mar 20, 2017 at 12:25 PM, Leif Theden <leif.the...@gmail.com>
>> wrote:
>>
>>> Good points Ian, but I don't see why we need to support software drawing
>>> when OpenGL supports drawing primitives?  Is there a compelling reason
>>> that drawing lines with the CPU is better than doing it on the GPU?
>>>
>> Oh yes!
>>
>> Basically, it's because a GPU runs well when it has a big,
>> parallelizable workload, and terribly when not. Flexible, small workloads,
>> such as you see in a typical indie game or small project, are basically
>> exactly this worst case. They are small (rendering dozens to hundreds of
>> objects), and they are dynamic in that the objects change positions and
>> shading according to CPU-hosted logic. Heuristic: if you've got a branch
>> deciding where/whether to render your object or what color it should be,
>> then the GPU hates it and you.*
>>
>> If that made sense to you, you can skip this elaboration:
>> ----
>>
>> The GPU is basically a bunch of workers (thousands, nowadays) sitting in
>> a room. When you tell the GPU to do something, you tell everyone in the
>> room to do that same thing. Configuring the GPU to do something else
>> (saliently: changing the shader) is slow (for technical reasons).
>>
>> I have a GTX 980 sitting on my desk right now, and it has 2048 thread
>> processors clocked at 1126 MHz. That's ****ing *insane*. I can throw
>> millions and millions of triangles at it, and it laughs right back at me
>> because it's rendering them (essentially) 2048 at a time. The fragments (≈
>> pixels) generated from those triangles are also rendered 2048 at a time.
>> This is awesome, but only if you're drawing lots of triangles or shading
>> lots of pixels in the same way (the same shader).
>>
>> But I *cannot* change the way I'm drawing those triangles individually.
>> Say I alternate between a red shader and a blue shader for each of a
>> million triangles. NVIDIA guidelines tell me I'm at about 3 *seconds per
>> frame*, not even counting the rendering. This is what I mean by
>> overhead. (To work around this problem, you *double* the amount of work
>> and send a color along with each vertex as data. That's just more data and
>> the GPU can handle it easily. But reconfigure? No good.) And this is in
>> C-like languages. In Python, you have a huge amount of software overhead
>> for those state changes, even before you get to the PCIe bus.
>>
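To put the state-change point in code: here is a rough PyOpenGL-flavored
sketch (not from Ian's mail; it assumes an existing GL context, and
red_program, blue_program, color_program, and the VAOs are hypothetical,
already-built objects):

    from OpenGL.GL import (glUseProgram, glBindVertexArray, glDrawArrays,
                           GL_TRIANGLES)

    def draw_slow(red_program, blue_program, triangle_vaos):
        # One glUseProgram per triangle: the pipeline reconfiguration
        # Ian describes dominates the frame time.
        for i, vao in enumerate(triangle_vaos):
            glUseProgram(red_program if i % 2 == 0 else blue_program)
            glBindVertexArray(vao)
            glDrawArrays(GL_TRIANGLES, 0, 3)

    def draw_fast(color_program, batched_vao, triangle_count):
        # One state change total: the color rides along as a per-vertex
        # attribute in the same buffer, so everything goes in one call.
        glUseProgram(color_program)
        glBindVertexArray(batched_vao)
        glDrawArrays(GL_TRIANGLES, 0, 3 * triangle_count)

The second version is the "send the color along with each vertex" trick: more
data, but only one reconfiguration for the whole batch.
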
>> And in a typical pygame project or indie game, this is basically exactly
>> what we're trying to do. We've got sprites with individual location data
>> and different ways of being rendered--different textures, different blend
>> modes, etc. Only a few objects, but decent complexity in how to draw them.
>> With a bunch of cleverness, you could conceivably write some complex code
>> to work around this (generate work batches, abstract to an übershader,
>> etc.), but I doubt you could (or would want to) fully abstract this away
>> from the user--particularly in such a flexible API as pygame.
>>
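That "few objects, lots of per-object decisions" shape is easy to write down
in pygame terms. A minimal sketch (illustrative only; the object count, the
color split, and the solid/outline choice are all made up):

    import random
    import pygame

    frame = pygame.Surface((640, 480))  # a plain software surface, no window needed

    sprites = [(random.randrange(640), random.randrange(480),
                random.choice(["solid", "outline"])) for _ in range(300)]

    frame.fill((0, 0, 0))
    for x, y, style in sprites:
        # Per-object branches pick the color and the draw style -- trivial
        # on the CPU, but exactly the per-object reconfiguration a GPU hates.
        color = (255, 80, 80) if x < 320 else (80, 80, 255)
        if style == "solid":
            pygame.draw.circle(frame, color, (x, y), 4)
        else:
            pygame.draw.circle(frame, color, (x, y), 4, 1)

A few hundred tiny, branch-dependent draws like this is the workload in
question, not millions of identical triangles.
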
>> The second issue is that the PCIe bus, which is how the CPU talks to the
>> GPU, is *really slow* compared to the CPU's memory subsystem--both in
>> terms of latency and bandwidth. My lab computer has ~64 GB/s DDR4 bandwidth
>> (my computer at home has quadruple that) at 50ns-500ns latency. By
>> contrast, the PCIe bus tops out at 2 GB/s at ~20000ns latency. My CPU also
>> has 15MB of L3 cache, while my 980 has no L3 cache and only 2MiB of L2
>> (because streaming workloads need less caching and caching is expensive).
>>
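Plugging those numbers in gives a feel for the gap (back-of-envelope only;
real buses, drivers, and transfer patterns vary a lot):

    frame_bytes = 1920 * 1080 * 4        # one RGBA frame, ~8.3 MB
    pcie_bw, ram_bw = 2e9, 64e9          # bytes/sec, the figures quoted above
    pcie_latency = 20e-6                 # ~20 microseconds per round trip

    print(frame_bytes / pcie_bw * 1e3)   # ~4.1 ms to push one frame over PCIe
    print(frame_bytes / ram_bw * 1e3)    # ~0.13 ms for the same copy in system RAM
    print((1.0 / 60) / pcie_latency)     # ~830 latency-bound round trips per 60 fps frame

So even before bandwidth matters, a chatty stream of per-object round trips
eats a meaningful slice of a 16.7 ms frame budget.
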
>> So when you draw something on the CPU, you're drawing using a fast
>> processor (my machine: 3.5 GHz, wide vectors, long pipe) into very close
>> DRAM at a really low latency, but it's probably cached in L3 or lower
>> anyway. When you draw something on the GPU, you're drawing (slowly (~1 GHz,
>> narrow vectors, short pipe), but in-parallel) into
>> different-DRAM-which-is-optimized-more-for-streaming and which may or
>> may not be cached at all. Similar maybe, but you *also* have to wait for
>> the command to go over the PCIe bus, take any driver sync hit, spool up the
>> GPU pipeline in the right configuration, and so on. The overhead is worth
>> it if you're drawing a million triangles, but not if you're calling
>> Surface.set_at(...).
>>
>> The point is, GPUs have great parallelism, but you pay for it in latency
>> and usability. It's a tradeoff, and when you consider all the rest you need
>> to do on the CPU, it's not always a clear one. But, as a heuristic, lots of
>> geometry or fillrate or math-intensive shading calls for a GPU. Anything
>> less calls for a CPU. My argument is that the typical use-case of pygame
>> falls, *easily*, into the latter.
>>
>>
>> ----
>>
>> *(Of course, you *can* make this fast at a tremendous programmer cost by
>> emulating all that logic on the GPU using e.g. compute shaders, which is
>> what all the cool kids are doing, or amortizing state changes with e.g.
>> Vulkan's new command lists. But it requires (1) being competent at GPU
>> architecture and (2) being willing to invest the time. I still use pygame
>> mainly because of 2.)
>>
>>
>>> Also, I'm a bit tired of the "python is slow so you may as well make
>>> everything slow and not expect it to work quickly" attitude.
>>>
>> I was worried someone might take it that way; this isn't my point at
>> all. What I want is for people to remember what's important.
>>
>> Clearly, one should not aspire to make things slow. I'm just saying that
>> if a game developer tries to use Python+pygame to write some crazy
>> graphics-intensive mega-AAA game, when it fails it's really on them for
>> picking the wrong tool. At least for now--this is what I mean when I say we
>> need to figure out if we like our niche.
>>
>>
>>> A pygame app burns through the CPU not because of the interpreter, but
>>> because it is flipping bits in RAM when a GPU could do it.
>>>
>> It's both of these and more. SDL's core blitting routines are in C,
>> occasionally vectorized, IIRC, whereas, as I mentioned above, you have to
>> figure in the cost of command transfers and overhead when you do operations
>> on the GPU.
>>
>> Ian
>>
>
>
