On Sun, Dec 25, 2016 at 12:53 PM, Mikhail V <mikhail...@gmail.com> wrote:
> On Sat, Dec 24, 2016 at 5:12 PM, Mikhail V <mikhail...@gmail.com> wrote:
>>
>>> Probably there are more criteria here that I am not aware of
>>> and objective arguments to prefer "FORTRAN" order, apart
>>> from having more traditional [x,y] notation?
>>>
>> The argument I think comes from building/slicing matrices out of
>> (column) vectors. You see this a lot in numerical work. If the row is of
>> pointers, you can build sparse systems that reference underlying vectors
>> without doing any copying (you can do this with row data instead, but then
>> you need row vectors, and that would be morally wrong). This is important
>> since building sparse systems can be very slow if you're not careful.
>>
>> I still avoid FORTRAN order because it's not mathy. E.g., the matrix
>> element "a_{0,2}" should be accessed as "a[0][2]". For an objective
>> argument, I'll note that graphics hardware--in particular VGA/VBE hardware,
>> which influenced later standards, e.g. HDMI--is row-major, top-to-bottom
>> raster order. This has been hugely influential, and is more-or-less
>> expected today by graphics programmers. It explains everything from most
>> windowing systems today having GUI controls at the top and left, to why GL
>> takes padded scanlines as texture input.
>>
>> One way or another, at this point, changing the order in PyGame is
>> probably a bad idea (backwards compatibility and suchlike). At the very
>> least, it would need to be deferred to a major update with breaking API
>> changes.
>>
>
> So you kind of agree, that surfarray/pixelcopy should better deal with C
> order?
>

Definitely.

> I am curious, if it is worth proposing adding methods which do so.
> I agree, one should not touch the existing API.
>
> Now I have tested the performance one more time, namely
> comparing 3 variants to copy data from array to surface:
> 1. buf = Dest.get_buffer()
>    buf.write(Src.tostring(), 0)
> 2. pygame.pixelcopy.array_to_surface(Dest, Src)
> 3. pygame.pixelcopy.array_to_surface(Dest, Src.T)
>
> And it turned out that I was wrong about transpose being expensive.
> Actually transpose itself does not add significant overhead. First time
> I was testing it, I did something wrong.
>
> For method 2. if I define order="FORTRAN" for original array,
> there is no difference in comparison to 3. But if I leave default (C)
> order then the performance degrades with bigger arrays
> (ca. 20% slower for an 800x600 8-bit array).
> So it is indeed an important thing.
>

Makes sense. For bigger arrays, caching becomes more important in the
copying, and implicit transposes of the order mean you thrash on reading.

> Most interesting that 1. method with buffer write seems to be always
> faster
> than others, by ca. 5%. Not a big win, but still interesting...
> And if I try it with FORTRAN order, it becomes 2 times slower!
>

I'm not sure I fully parse what you're doing here. As long as it's safe,
copying buffers should be slightly faster since it's 1D--maybe the buffer
API is smart enough to step in larger chunks that might potentially
straddle a scanline, and you also have one fewer loop variable. When you
try it with FORTRAN order, producing a buffer of the same format would
require an allocation and then a copy, so that's probably why it's slower.

The NumPy internals documentation
<https://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues>
has salient things to say on this issue.

> So I would still look forward to having methods dealing with C order,
> just to avoid writing extra transposing and full compliance
> with default numpy notation.
>
> Any comments or opinions about it?
> It would be good to know first, which of those things
> people use more often and make some use case examples.
>

Personally, I would like C order just because it's "expected" in
graphics*. Under this assumption, I wrote all my code e.g. looping over
"y" first, using the buffer API for GL interop, etc. This is optimal in
the C order every graphics programmer would expect, but in FORTRAN order,
it's *exactly* wrong. I never profiled both options because it's a nearly
fundamental assumption.

I mean, it's not terribly important. Python is not a fast language. One
writes stuff in Python because your program running 5x/50x slower is a
non-issue and you want the expressivity. But free perf is free, so it's a
bit annoying.

*In the interest of fairness, it should be noted that there is an offshoot
of image processing (a subset of graphics) that might disagree. They're
very FORTRAN-y, using langs with 1-based indexing and both array orders.
They also tend to be non-CS/non-math types who work in industry,
generating appalling code.

> Mikhail
>

Ian
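
For illustration, here is a minimal sketch of the stride arithmetic behind the points above: why a transpose can be free, and why walking an array in the "wrong" order jumps around in memory. It is plain Python and uses neither NumPy nor PyGame. The helper names (c_strides, f_strides, offset, transpose, walk) are made up for this example and are not part of either library, and the real libraries may well do things differently internally.

```python
# Hypothetical helpers for illustration only; strides are counted in
# elements, not bytes, and arrays are always 2D.

def c_strides(rows, cols):
    """Row-major (C) order: the elements of one row are adjacent."""
    return (cols, 1)

def f_strides(rows, cols):
    """Column-major (FORTRAN) order: the elements of one column are adjacent."""
    return (1, rows)

def offset(i, j, strides):
    """Flat memory offset of element [i][j] under the given strides."""
    return i * strides[0] + j * strides[1]

def transpose(shape, strides):
    """A transposed view just swaps shape and strides; no element moves."""
    return shape[::-1], strides[::-1]

def walk(shape, strides):
    """Offsets touched by a nested loop whose last index varies fastest."""
    return [offset(i, j, strides)
            for i in range(shape[0])
            for j in range(shape[1])]

shape = (2, 3)

# C order: the loop touches memory sequentially.
print(walk(shape, c_strides(*shape)))   # [0, 1, 2, 3, 4, 5]

# FORTRAN order, same loop: the accesses jump back and forth.
print(walk(shape, f_strides(*shape)))   # [0, 2, 4, 1, 3, 5]

# Transposing a C-ordered (2, 3) array yields exactly the layout of a
# FORTRAN-ordered (3, 2) array, without copying anything.
print(transpose(shape, c_strides(*shape)))  # ((3, 2), (1, 3))
```

On a 2x3 example the jumps are harmless, but for something like an 800x600 surface the strided walk touches a different cache line on nearly every access, which fits the slowdown seen with the default C order in method 2. Producing a flat buffer in the other order, by contrast, cannot be done by swapping strides, so it needs a real allocation and copy.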