https://bugs.freedesktop.org/show_bug.cgi?id=107670
Timothy Arceri changed:
What|Removed |Added
Status|NEEDINFO|RESOLVED
Resolution|---
--- Comment #20 from Axel Davy ---
To clarify what I said: based on our source code and the calls the game makes
in the trace, the only "upload" that could occur every frame is a buffer data
upload.
The game uses two types of d3d vertex buffers.
--- Comment #19 from Michel Dänzer ---
Axel, I'm not sure what you're saying. Anyway, if the problem was that the
source of the memcpy is uncacheable, surely it would always be slow, regardless
of which memcpy implementation is used?
> So
--- Comment #18 from Axel Davy ---
I double-checked that it is indeed likely to be a GTT WC read issue by looking
at the mentioned trace. Some vertex buffers are in GTT WC (but with no memcpy
inside Mesa) and some buffers are in VRAM, with the
--- Comment #17 from i...@yahoo.com ---
(In reply to Michel Dänzer from comment #16)
> (In reply to iive from comment #15)
> > Aka, I do expect that the whole 512MB buffer is mapped at once.
>
> It's not (if it was, one process could access the
--- Comment #16 from Michel Dänzer ---
(In reply to iive from comment #15)
> Aka, I do expect that the whole 512MB buffer is mapped at once.
It's not (if it was, one process could access the buffer object memory of
another process, bypassing
--- Comment #15 from i...@yahoo.com ---
(In reply to Michel Dänzer from comment #14)
> (In reply to iive from comment #13)
> > It looks to me like the data is first moved ram->vram using dma, then
> > vram->vram using CPU...
>
> No.
--- Comment #14 from Michel Dänzer ---
(In reply to iive from comment #13)
> Of course, reading from PCI is slow, not cached; and in this exact case also
> completely unnecessary.
Right, reading from uncacheable memory can certainly explain
--- Comment #13 from i...@yahoo.com ---
As I've said, I'm still investigating the issue.
Here are some of the things I've found so far:
1. Slackware32, i586 and glibc.
Slackware tries to support as many machines as possible, since i586 is
--- Comment #12 from Emil Velikov ---
Why are we even discussing a potential optimisation when the user is
_unknown_?
It contradicts the principles that we've been following in Mesa for years.
--- Comment #11 from Eero Tamminen ---
Libc memcpy() obviously won't be optimized for PCI bus transfers; it's far too
rare a use case for it.
E.g. libpciaccess would seem a more suitable place for a memory copy function
optimized for PCI bus transfers,
--- Comment #10 from Emil Velikov ---
My personal train of thought:
Details such as WC are left to the kernel module. Even in the case where
userspace can provide hints, it's ultimately up to the kernel to manage it.
Optimising w/o saying the
--- Comment #9 from i...@yahoo.com ---
(In reply to Timothy Arceri from comment #8)
> Using SSE2 memcpy seems to avoid this problem
>
> Glibc should select the SSE2 (or better) version of memcpy. If Slackware
> doesn't ship SSE2 support for
Timothy Arceri changed:
What|Removed |Added
Status|NEW |NEEDINFO
--- Comment #8 from Timothy
--- Comment #7 from i...@yahoo.com ---
(In reply to Grazvydas Ignotas from comment #4)
> What game/benchmark do you see this with?
>
> Can you try calling _mesa_streaming_load_memcpy() there? It's for reading
> uncached memory, but by the looks
--- Comment #6 from Eero Tamminen ---
(In reply to Timothy Arceri from comment #1)
> There is already an asm-optimized version of memcpy() in glibc. Why would we
> want to reinvent that in Mesa?
>
> glibc should pick the right implementation for
--- Comment #5 from i...@yahoo.com ---
(In reply to Roland Scheidegger from comment #3)
> Isn't this mapped as WC?
> In this case I'd expect the direction to make little difference, since write
> combine of any decent cpu should be able to
--- Comment #4 from Grazvydas Ignotas ---
What game/benchmark do you see this with?
Can you try calling _mesa_streaming_load_memcpy() there? It's for reading
uncached memory, but by the looks of it it might be suitable for writing too.
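For readers unfamiliar with the idea, here is a rough sketch of a streaming-load copy, assuming the usual pattern for reading write-combined (WC) memory: copy a misaligned head with memcpy(), move 64-byte blocks using SSE4.1 non-temporal loads (MOVNTDQA, exposed as _mm_stream_load_si128), then copy the tail. This is not Mesa's code: streaming_copy_sketch() and copy_block64() are hypothetical names, and when the compiler is not targeting SSE4.1 the block copy falls back to plain memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE4_1__)
#include <smmintrin.h> /* _mm_stream_load_si128 (MOVNTDQA), _mm_store_si128 */
#endif

/* Hypothetical helper: copy one 64-byte block; both pointers must be
 * 16-byte aligned when the SSE4.1 path is compiled in. */
static void copy_block64(char *d, char *s)
{
#if defined(__SSE4_1__)
    __m128i *src = (__m128i *)s;
    __m128i *dst = (__m128i *)d;
    /* Non-temporal loads are intended for reading WC memory. */
    __m128i t0 = _mm_stream_load_si128(src + 0);
    __m128i t1 = _mm_stream_load_si128(src + 1);
    __m128i t2 = _mm_stream_load_si128(src + 2);
    __m128i t3 = _mm_stream_load_si128(src + 3);
    _mm_store_si128(dst + 0, t0);
    _mm_store_si128(dst + 1, t1);
    _mm_store_si128(dst + 2, t2);
    _mm_store_si128(dst + 3, t3);
#else
    memcpy(d, s, 64); /* portable fallback when SSE4.1 is not enabled */
#endif
}

/* Hypothetical name: sketch of a streaming-load copy (head, blocks, tail). */
void streaming_copy_sketch(void *dst, void *src, size_t len)
{
    char *d = dst;
    char *s = src;

    /* The block loop needs src and dst at the same offset within a 16-byte
     * boundary; otherwise fall back to plain memcpy(). */
    if (((uintptr_t)d & 15) != ((uintptr_t)s & 15)) {
        memcpy(d, s, len);
        return;
    }

    /* Copy the misaligned head so both pointers reach a 16-byte boundary. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head;
    s += head;
    len -= head;

    /* Main loop: 64 bytes (a typical cache line) per iteration. */
    while (len >= 64) {
        copy_block64(d, s);
        d += 64;
        s += 64;
        len -= 64;
    }

    /* Copy the remaining tail. */
    memcpy(d, s, len);
}
```

The co-alignment check matters because the aligned 16-byte store requires the destination to be aligned once the source is.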
--- Comment #3 from Roland Scheidegger ---
Isn't this mapped as WC?
In this case I'd expect the direction to make little difference, since the
write-combining logic of any decent CPU should be able to combine the writes
regardless of the order?
Although if
--- Comment #2 from i...@yahoo.com ---
(In reply to Timothy Arceri from comment #1)
> There is already an asm-optimized version of memcpy() in glibc. Why would we
> want to reinvent that in Mesa?
>
> glibc should pick the right implementation for
--- Comment #1 from Timothy Arceri ---
There is already an asm-optimized version of memcpy() in glibc. Why would we
want to reinvent that in Mesa?
glibc should pick the right implementation for your system.
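glibc makes this choice with GNU indirect functions (IFUNC), resolved when the library is loaded. The sketch below only imitates the idea with a function pointer picked on first call, assuming GCC or Clang on x86 for __builtin_cpu_supports(). copy_bytes(), copy_fast() and dispatched_copy() are hypothetical names, and copy_fast() is a stand-in that simply calls memcpy() rather than a real SIMD routine.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *restrict, const void *restrict, size_t);

/* Hypothetical baseline: byte-at-a-time copy, usable on any CPU. */
static void *copy_bytes(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Stand-in for an SSE2-optimised routine; here it just calls memcpy(). */
static void *copy_fast(void *restrict dst, const void *restrict src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Pick an implementation based on what the running CPU supports. */
static copy_fn pick_copy(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    if (__builtin_cpu_supports("sse2"))
        return copy_fast;
#endif
    return copy_bytes;
}

/* Resolve once, then call through the pointer (not thread-safe as written). */
void *dispatched_copy(void *restrict dst, const void *restrict src, size_t n)
{
    static copy_fn impl;
    if (!impl)
        impl = pick_copy();
    return impl(dst, src, n);
}
```

Which implementation gets picked therefore depends on both the CPU and what the C library was built to offer, which is why a distribution's build choices can matter here.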
Bug ID: 107670
Summary: Massive slowdown under specific memcpy implementations
(32bit, no-SIMD, backward copy).
Product: Mesa
Version: unspecified
Hardware: x86 (IA32)
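The summary names three properties of the slow memcpy() implementations: 32-bit, no SIMD, and copying backward. As a rough illustration (hypothetical function names, not taken from glibc or Mesa), a plain 32-bit, non-SIMD word copy can run in either direction and produce identical results. The only difference is the order in which destination addresses are written, which is what the thread debates for write-combined (WC) mappings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: copy n 32-bit words from the lowest to the highest address. */
void copy_words_forward(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical: copy n 32-bit words from the highest to the lowest address,
 * i.e. the "backward copy" order named in the bug summary. */
void copy_words_backward(uint32_t *dst, const uint32_t *src, size_t n)
{
    while (n > 0) {
        n--;
        dst[n] = src[n];
    }
}
```

Backward order is also what a memmove() typically uses when the destination overlaps the source at a higher address, so some implementations may use it for plain memcpy() as well.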