[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 Timothy Arceri changed: What|Removed |Added Status|NEEDINFO|RESOLVED Resolution|---

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #20 from Axel Davy --- To clarify what I said, based on our source code and the calls made by the game trace, the only "upload" that could occur every frame is buffer data upload. The game uses two types of D3D vertex buffers.

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #19 from Michel Dänzer --- Axel, I'm not sure what you're saying. Anyway, if the problem was that the source of the memcpy is uncacheable, surely it would always be slow, regardless of which memcpy implementation is used? > So

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #18 from Axel Davy --- I double-checked that it is indeed likely to be a GTT WC read issue by looking at the mentioned trace. Some vertex buffers are in GTT WC (but with no memcpy inside Mesa) and some buffers are in VRAM, with the

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-04 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #17 from i...@yahoo.com --- (In reply to Michel Dänzer from comment #16) > (In reply to iive from comment #15) > > Aka, I do expect that the whole 512MB buffer is mapped at once. > > It's not (if it was, one process could access the

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-04 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #16 from Michel Dänzer --- (In reply to iive from comment #15) > Aka, I do expect that the whole 512MB buffer is mapped at once. It's not (if it was, one process could access the buffer object memory of another process, bypassing

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-04 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #15 from i...@yahoo.com --- (In reply to Michel Dänzer from comment #14) > (In reply to iive from comment #13) > > It looks to me like the data is first moved ram->vram using dma, then > > vram->vram using CPU... > > No.

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-04 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #14 from Michel Dänzer --- (In reply to iive from comment #13) > Of course, reading from PCI is slow, not cached; and in this exact case also > completely unnecessary. Right, reading from uncacheable memory can certainly explain

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-04 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #13 from i...@yahoo.com --- As I've said, I'm still investigating the issue. Here are some of the things I've found so far: 1. Slackware32, i586 and glibc. Slackware tries to support as many machines as possible, since i586 is

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #12 from Emil Velikov --- Why are we even discussing a potential optimisation where the user is _unknown_? It contradicts the principles that we've been using in Mesa for years.

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #11 from Eero Tamminen --- Libc memcpy() obviously won't be optimized for PCI bus transfers; that is far too rare a use-case for it. E.g. libpciaccess would seem a more suitable place for a memory copy function optimized for PCI bus transfers,

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-09-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #10 from Emil Velikov --- My personal train of thought: Details such as WC are left to the kernel module. Even in the case where userspace can provide hints, it's ultimately up to the kernel to manage it. Optimising w/o saying the

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-08-31 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #9 from i...@yahoo.com --- (In reply to Timothy Arceri from comment #8) > "Using SSE2 memcpy seems to avoid this problem" > > glibc should select the SSE2 (or better) version of memcpy. If Slackware > doesn't ship any SSE2 support for

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-08-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 Timothy Arceri changed: What|Removed |Added Status|NEW |NEEDINFO --- Comment #8 from Timothy

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-08-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #7 from i...@yahoo.com --- (In reply to Grazvydas Ignotas from comment #4) > What game/benchmark do you see this with? > > Can you try calling _mesa_streaming_load_memcpy() there? It's for reading > uncached memory, but by the looks

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-08-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #6 from Eero Tamminen --- (In reply to Timothy Arceri from comment #1) > There already is asm optimized version of memcpy() in glibc. Why would we > want to reinvent that in Mesa? > > glibc should pick the right implementation for

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-08-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #5 from i...@yahoo.com --- (In reply to Roland Scheidegger from comment #3) > Isn't this mapped as WC? > In this case I'd expect the direction to make little difference, since write > combine of any decent cpu should be able to

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-08-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #4 from Grazvydas Ignotas --- What game/benchmark do you see this with? Can you try calling _mesa_streaming_load_memcpy() there? It's for reading uncached memory, but by the looks of it it might be suitable for writing too.

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-08-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #3 from Roland Scheidegger --- Isn't this mapped as WC? In this case I'd expect the direction to make little difference, since the write combining of any decent CPU should be able to combine the writes regardless of the order? Although if

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-08-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #2 from i...@yahoo.com --- (In reply to Timothy Arceri from comment #1) > There already is asm optimized version of memcpy() in glibc. Why would we > want to reinvent that in Mesa? > > glibc should pick the right implementation for

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-08-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 --- Comment #1 from Timothy Arceri --- There is already an asm-optimized version of memcpy() in glibc. Why would we want to reinvent that in Mesa? glibc should pick the right implementation for your system.

[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).

2018-08-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107670 Bug ID: 107670 Summary: Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy). Product: Mesa Version: unspecified Hardware: x86 (IA32)