Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-06 Thread Marek Olšák
On Thu, Feb 6, 2014 at 2:40 AM, Roland Scheidegger srol...@vmware.com wrote:
 I don't think that would work. The reason for this stuff to exist is
 because new hw makes that possible on the hw level directly.

I don't think this has anything to do with new hardware. This stuff
has always been possible and it's a shame it wasn't exposed by DX9 and
GL1.5 or even earlier versions.

Marek


Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-06 Thread Roland Scheidegger
Am 06.02.2014 12:42, schrieb Marek Olšák:
 On Thu, Feb 6, 2014 at 2:40 AM, Roland Scheidegger srol...@vmware.com wrote:
 I don't think that would work. The reason for this stuff to exist is
 because new hw makes that possible on the hw level directly.
 
 I don't think this has anything to do with new hardware. This stuff
 has always been possible and it's a shame it wasn't exposed by DX9 and
 GL1.5 or even earlier versions.
 
 Marek
 

Yes, you are quite right. With new APUs you could potentially use cached
memory, though that is probably not useful for streaming vertex data...
Might not be useful outside of compute.

Roland


Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread neobrain

Hello Marek,

Nice to hear the extension is being tackled! Took me a while to get mesa 
building again, and I did a quick test with your patches - unfortunately 
they outright crash Dolphin at the moment. I'm not quite sure if you 
have sent an updated patch series yet, so I used the one you sent on Jan 
29, applied on current git mesa (5c975966) from today.


Most importantly, our buffer streaming code uses persistent (and 
coherent) mapping in the buffer_storage code path, so maybe that's an 
issue? If you like, I can do some additional debugging, but I'm not very 
familiar with mesa debugging so I'd need some help. For what it's worth, 
I'm usually hanging around in #dri-devel under the nick neobrain.


Regards,
Tony


Am 29.01.2014 01:49, schrieb Marek Olšák:

On Wed, Jan 29, 2014 at 1:42 AM, Ian Romanick i...@freedesktop.org wrote:

On 01/28/2014 05:35 PM, Marek Olšák wrote:

Yes, GL_ARB_buffer_storage is being worked on. We'll support it on all
Radeon cards R300 and up.

Are you guys working on that?  Have an ETA? :)

It's done. I'm writing piglit tests at the moment. I'll send my
patches tomorrow.

Marek




Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Jose Fonseca
I honestly hope that GL_AMD_pinned_memory doesn't become popular. It would have 
been alright if it wasn't for this bit in 
http://www.opengl.org/registry/specs/AMD/pinned_memory.txt which says:

2) Can the application still use the buffer using the CPU address?

RESOLVED: YES. However, this access would be completely
non synchronized to the OpenGL pipeline, unless explicit
synchronization is being used (for example, through glFinish or by using
sync objects).

And I'm imagining apps which are streaming vertex data doing precisely just 
that...
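
A minimal sketch (not from the original thread) of the pattern being
described -- the app keeps writing through its own pinned CPU pointer,
with only a fence keeping it off data the GPU is still reading.
Page-aligned allocation, error handling and the BUF_SIZE constant are
assumed:

    #include <GL/gl.h>
    #include <GL/glext.h>
    #include <string.h>

    #define BUF_SIZE (16 * 1024 * 1024)

    static GLuint buf;
    static char *pinned;  /* page-aligned client memory, allocated elsewhere */
    static GLsync fence;

    void init_pinned(void)
    {
        glGenBuffers(1, &buf);
        glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, buf);
        /* Adopt the client allocation as the buffer's storage. */
        glBufferData(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, BUF_SIZE,
                     pinned, GL_STREAM_DRAW);
    }

    void stream(const void *verts, size_t n, size_t offset)
    {
        /* Wait until the GPU is done with the region being reused. */
        if (fence) {
            glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                             ~(GLuint64)0);
            glDeleteSync(fence);
        }
        /* This write happens behind the driver's back: no glMapBufferRange,
         * no glFlushMappedBufferRange -- nothing for a tracer to see. */
        memcpy(pinned + offset, verts, n);
        glBindBuffer(GL_ARRAY_BUFFER, buf);
        /* ... glVertexAttribPointer / glDrawArrays ... */
        fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    }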

This means that, in order to obtain traces of applications that use 
AMD_pinned_memory like that with Apitrace, we'll need to use heuristics to 
determine when applications touch the memory behind the scenes and emit fake 
memcpies, which means slow tracing and/or bloated trace files... Just like user 
memory pointer arrays...  :(

Instead of that ugliness, maybe Apitrace should just mask out 
GL_AMD_pinned_memory support, so that I don't have to worry about it, and let 
apps and OpenGL drivers support/use it at their own peril.

Jose


- Original Message -
 Yes, GL_ARB_buffer_storage is being worked on. We'll support it on all
 Radeon cards R300 and up.
 
 Anyway, GL_STREAM_DRAW should give you the same behavior as
 GL_CLIENT_STORAGE_BIT on open source Radeon drivers.
 
 Marek
 
  On Sun, Nov 24, 2013 at 1:19 PM, Tony Wasserka neobra...@googlemail.com
  wrote:
   [...]

Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Grigori Goronzy

On 05.02.2014 18:08, Jose Fonseca wrote:

I honestly hope that GL_AMD_pinned_memory doesn't become popular. It would have 
been alright if it wasn't for this bit in 
http://www.opengl.org/registry/specs/AMD/pinned_memory.txt which says:

 2) Can the application still use the buffer using the CPU address?

 RESOLVED: YES. However, this access would be completely
 non synchronized to the OpenGL pipeline, unless explicit
 synchronization is being used (for example, through glFinish or by 
using
 sync objects).

And I'm imagining apps which are streaming vertex data doing precisely just 
that...



I don't understand your concern; this is exactly the same behavior 
GL_MAP_UNSYNCHRONIZED_BIT has, and apps are supposedly using that 
properly. How does apitrace handle it?


Grigori


Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Patrick Baggett
My understanding is that this is like having MAP_UNSYNCHRONIZED on at all
times, even when it isn't mapped, because it is always mapped (into
memory). Is that correct, Jose?

Patrick


On Wed, Feb 5, 2014 at 11:53 AM, Grigori Goronzy g...@chown.ath.cx wrote:

 On 05.02.2014 18:08, Jose Fonseca wrote:
  [...]

 I don't understand your concern; this is exactly the same behavior
 GL_MAP_UNSYNCHRONIZED_BIT has, and apps are supposedly using that properly.
 How does apitrace handle it?

 Grigori



Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Jose Fonseca


- Original Message -
 On 05.02.2014 18:08, Jose Fonseca wrote:
  [...]

 I don't understand your concern; this is exactly the same behavior
 GL_MAP_UNSYNCHRONIZED_BIT has, and apps are supposedly using that
 properly. How does apitrace handle it?

GL_AMD_pinned_memory is nothing like GL_ARB_map_buffer_range's 
GL_MAP_UNSYNCHRONIZED_BIT:

- When an app touches memory returned by 
glMapBufferRange(GL_MAP_UNSYNCHRONIZED_BIT), it communicates back to the 
OpenGL driver which bytes it actually touched via glFlushMappedBufferRange 
(unless the app doesn't care about performance and doesn't call 
glFlushMappedBufferRange at all, which is silly as it will force the OpenGL 
driver to assume the whole range changed).

  In this case, the OpenGL driver (hence apitrace) gets all the 
information it needs about which bytes were updated between glMap/glUnmap.

- When an app touches memory bound via GL_AMD_pinned_memory outside 
glMap/glUnmap, there are _no_ hints whatsoever.  The OpenGL driver might not 
care, as the memory is shared between CPU and GPU, so all is good as far as 
it is concerned, but all the changes the app does are invisible at the API 
level, hence apitrace will not be able to catch them unless it does onerous 
heuristics.


So while both extensions allow unsynchronized access, lack of 
synchronization is not my concern. My concern is that GL_AMD_pinned_memory 
allows *hidden* access to GPU memory.
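
A short sketch of the first point (verts and the offsets are assumed):
with GL_MAP_UNSYNCHRONIZED_BIT plus GL_MAP_FLUSH_EXPLICIT_BIT (required
for the explicit flush), the app tells the driver -- and hence apitrace --
exactly which bytes it touched:

    void *p = glMapBufferRange(GL_ARRAY_BUFFER, 0, 65536,
                               GL_MAP_WRITE_BIT |
                               GL_MAP_UNSYNCHRONIZED_BIT |
                               GL_MAP_FLUSH_EXPLICIT_BIT);
    memcpy((char *)p + 1024, verts, 4096);           /* touch a sub-range */
    glFlushMappedBufferRange(GL_ARRAY_BUFFER, 1024, 4096); /* ...and say so */
    glUnmapBuffer(GL_ARRAY_BUFFER);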


Jose


Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Jose Fonseca
Yes, precisely.

Jose

- Original Message -
 My understanding is that this is like having MAP_UNSYNCHRONIZED on at all
 times, even when it isn't mapped, because it is always mapped (into
 memory). Is that correct, Jose?

 Patrick
 


Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Jose Fonseca


- Original Message -
 [...]

 So while both extensions allow unsynchronized access, lack of
 synchronization is not my concern. My concern is that GL_AMD_pinned_memory
 allows *hidden* access to GPU memory.

Just for the record, the challenges GL_AMD_pinned_memory presents to Apitrace 
are very similar to old-fashioned OpenGL user array pointers: an app is free 
to change the contents of memory pointed to by user array pointers at any 
point in time, except during a draw call.  This means that before every draw 
call, Apitrace needs to scavenge all the user memory pointers and write their 
contents to the trace file, just in case the app changed them.

In order to support GL_AMD_pinned_memory, for every draw call Apitrace would 
also need to walk over all bound GL_AMD_pinned_memory ranges (and nowadays 
there are loads of binding points!), check whether the data changed, and 
serialize it into the trace file if it did...
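
A hypothetical sketch (invented names, not actual apitrace code) of that
per-draw walk: hash every bound pinned region and emit a fake memcpy
record into the trace whenever it changed; hash64() and
trace_emit_memcpy() are assumed helpers:

    #include <stddef.h>
    #include <stdint.h>

    struct pinned_region { const char *ptr; size_t size; uint64_t last_hash; };

    static void scan_pinned_regions(struct pinned_region *r, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            uint64_t h = hash64(r[i].ptr, r[i].size);    /* assumed helper */
            if (h != r[i].last_hash) {
                /* Pretend the app did an explicit upload of the region. */
                trace_emit_memcpy(r[i].ptr, r[i].size);  /* assumed helper */
                r[i].last_hash = h;
            }
        }
    }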


I never cared much about the performance of Apitrace with user array 
pointers: it is an old paradigm; only old apps use it, or programmers who 
don't particularly care about performance -- either way, a 
performance-conscious app developer would use VBOs and hence never hit the 
problem at all.  My displeasure with GL_AMD_pinned_memory is that it 
essentially flips everything on its head -- it encourages a paradigm which 
apitrace will never be able to handle properly.


People often complain that OpenGL development tools are poor compared with 
Direct3D's.  An important fact they often miss is that the Direct3D API is 
several orders of magnitude more tool-friendly: it's clear that the Direct3D 
API cares about things like allowing all state to be queried back, whereas 
OpenGL is more fire-and-forget, never looking back -- the main concern in 
OpenGL is ensuring that state can go from app to driver fast, but little 
thought is given to ensuring that one can read the whole state back, or that 
one can intercept all state as it goes between the app and the driver...


In this particular case, if the answer to "Can the application still use the 
buffer using the CPU address?" had been NO, the world would be a much better 
place.


Jose


Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Marek Olšák
However, GL_ARB_buffer_storage (OpenGL 4.4) with GL_MAP_PERSISTENT_BIT
isn't much different. The only difference I see between
ARB_buffer_storage and AMD_pinned_memory is that AMD_pinned_memory
allows mapping CPU memory into the GPU address space permanently, while
ARB_buffer_storage allows mapping GPU memory into the CPU address space
permanently. At the end of the day, both the GPU and the CPU can read
and modify the same buffer, and all they need for synchronization is
fences.
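
A minimal sketch of the ARB_buffer_storage side of that comparison
(BUF_SIZE is assumed): storage mapped once into the CPU address space and
kept mapped across draw calls, with fences as the only synchronization:

    GLuint buf;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferStorage(GL_ARRAY_BUFFER, BUF_SIZE, NULL,
                    GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                    GL_MAP_COHERENT_BIT);
    char *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, BUF_SIZE,
                                 GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                                 GL_MAP_COHERENT_BIT);
    /* The mapping stays valid across draw calls; the CPU writes, the GPU
     * reads, and a glFenceSync/glClientWaitSync pair keeps them apart. */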

Marek

On Wed, Feb 5, 2014 at 8:10 PM, Jose Fonseca jfons...@vmware.com wrote:
 [...]

Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Jose Fonseca
I hadn't looked at GL_ARB_buffer_storage. I need to read it more closely, but 
at a glance it looks like GL_MAP_PERSISTENT_BIT alone is okay (the app must 
call glFlushMappedBufferRange to guarantee coherence), but if 
GL_MAP_COHERENT_BIT is set we are indeed facing the same issue... :-(

Even worse, being part of GL 4.4, with no way for the implementation to fail 
GL_MAP_COHERENT_BIT mappings, there is no way to avoid supporting it...

Jose

Note to self: my time would be better spent reviewing extensions before they 
are ratified than ranting after the fact...


- Original Message -
 However, GL_ARB_buffer_storage (OpenGL 4.4) with GL_MAP_PERSISTENT_BIT
 isn't much different. The only difference I see between
 ARB_buffer_storage and AMD_pinned_memory is that AMD_pinned_memory
 allows mapping CPU memory into the GPU address space permanently, while
 ARB_buffer_storage allows mapping GPU memory into the CPU address space
 permanently. At the end of the day, both the GPU and the CPU can read
 and modify the same buffer, and all they need for synchronization is
 fences.

 Marek

 On Wed, Feb 5, 2014 at 8:10 PM, Jose Fonseca jfons...@vmware.com wrote:
  [...]

Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Marek Olšák
The synchronization for non-coherent persistent mappings can also be done using:

glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);

in which case you don't know the range either. However, I fully support
the addition of coherent persistent mappings to GL. It's perfect for
uploading data without the GL API overhead.
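
A sketch of that pattern (BUF_SIZE, offset, size and verts are assumed):
a persistent but non-coherent mapping where one barrier, rather than
per-range flushes, makes the client's writes visible -- so no range
information for the driver or a tracer either:

    char *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, BUF_SIZE,
                                 GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT);
    memcpy(ptr + offset, verts, size);   /* write anywhere in the mapping */
    glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);
    /* ... draw; the driver only knows "something may have changed". */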

Marek

On Thu, Feb 6, 2014 at 12:49 AM, Jose Fonseca jfons...@vmware.com wrote:
 I hadn't looked at GL_ARB_buffer_storage. I need to read it more closely,
 but at a glance it looks like GL_MAP_PERSISTENT_BIT alone is okay (the app
 must call glFlushMappedBufferRange to guarantee coherence), but if
 GL_MAP_COHERENT_BIT is set we are indeed facing the same issue... :-(

 Even worse, being part of GL 4.4, with no way for the implementation to
 fail GL_MAP_COHERENT_BIT mappings, there is no way to avoid supporting
 it...

 Jose

 [...]

Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Ian Romanick
On 02/05/2014 11:10 AM, Jose Fonseca wrote:
 [...]

 In this particular case, if the answer to "Can the application still use
 the buffer using the CPU address?" had been NO, the world would be a much
 better place.

I suspect the reason that they didn't do that is it would imply a very
expensive validation step at draw time.  There are a whole bunch of
technologies in newer GL implementations that will make tracing a
miserable prospect. :(



Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-02-05 Thread Roland Scheidegger
Am 06.02.2014 00:49, schrieb Jose Fonseca:
 I hadn't looked at GL_ARB_buffer_storage. I need to read it more closely,
 but at a glance it looks like GL_MAP_PERSISTENT_BIT alone is okay (the app
 must call glFlushMappedBufferRange to guarantee coherence), but if
 GL_MAP_COHERENT_BIT is set we are indeed facing the same issue... :-(

 Even worse, being part of GL 4.4, with no way for the implementation to
 fail GL_MAP_COHERENT_BIT mappings, there is no way to avoid supporting
 it...

 Jose

 Note to self: my time would be better spent reviewing extensions before
 they are ratified than ranting after the fact...

I don't think that would work. The reason for this stuff to exist is
because new hw makes that possible on the hw level directly. Some APUs
might even be able to share such buffers in the LLC (I don't know if
Haswell can do that, and AMD APUs lack a common cache level, but they can
actually do fully coherent memory access from the CPU and GPU side). Now
with discrete chips it's not that easy, but everybody is doing unified
memory these days.
I don't know how to solve this for tracing, though; it indeed seems
impossible...

Roland



 
 - Original Message -
  [...]

Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-01-28 Thread Marek Olšák
Yes, GL_ARB_buffer_storage is being worked on. We'll support it on all
Radeon cards R300 and up.

Anyway, GL_STREAM_DRAW should give you the same behavior as
GL_CLIENT_STORAGE_BIT on open source Radeon drivers.

Marek

On Sun, Nov 24, 2013 at 1:19 PM, Tony Wasserka neobra...@googlemail.com wrote:
 [...]


Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-01-28 Thread Ian Romanick
On 01/28/2014 05:35 PM, Marek Olšák wrote:
 Yes, GL_ARB_buffer_storage is being worked on. We'll support it on all
 Radeon cards R300 and up.

Are you guys working on that?  Have an ETA? :)

 Anyway, GL_STREAM_DRAW should give you the same behavior as
 GL_CLIENT_STORAGE_BIT on open source Radeon drivers.

I think a big piece of functionality that Tony wants is the ability to
have CPU pointers that persist for the lifetime of the context.  Without
GL_ARB_buffer_storage or GL_AMD_pinned_memory the application has to
MapBuffer and UnmapBuffer around draw calls.
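
A sketch of that map/unmap-per-draw pattern (offset, size, verts, first
and count are assumed), where the pointer is only valid between the two
calls:

    void *p = glMapBufferRange(GL_ARRAY_BUFFER, offset, size,
                               GL_MAP_WRITE_BIT);
    memcpy(p, verts, size);
    glUnmapBuffer(GL_ARRAY_BUFFER);       /* pointer is dead after this */
    glDrawArrays(GL_TRIANGLES, first, count);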

 Marek
 
 On Sun, Nov 24, 2013 at 1:19 PM, Tony Wasserka neobra...@googlemail.com
 wrote:
  [...]



Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2014-01-28 Thread Marek Olšák
On Wed, Jan 29, 2014 at 1:42 AM, Ian Romanick i...@freedesktop.org wrote:
 On 01/28/2014 05:35 PM, Marek Olšák wrote:
 Yes, GL_ARB_buffer_storage is being worked on. We'll support it on all
 Radeon cards R300 and up.

 Are you guys working on that?  Have an ETA? :)

It's done. I'm writing piglit tests at the moment. I'll send my
patches tomorrow.

Marek


Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2013-11-28 Thread Tony Wasserka

Hey Matt,
The speedup was only observed on discrete GPUs so far; I have no data 
about APUs.


Best regards,
Tony


Am 26.11.2013 04:50, schrieb Matt Harvey:

Hi Tony,

I guess the lack of response means that neither of those extensions is 
on anyone's road map for right now.


I have a quick question. Were you seeing those speedups only on the 
AMD APUs, or also on the discrete cards?


Thanks,
Matt


On Sun, Nov 24, 2013 at 7:19 AM, Tony Wasserka neobra...@googlemail.com wrote:

 [...]






Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2013-11-28 Thread Tony Wasserka

Hi Tim,
I've given your suggestion some thought - and while it looks like 
something that would work, my time schedule is currently too tight 
(regardless of the money involved) to implement support for the extension. I 
might think about it again in three months, if the extension hasn't been 
implemented by someone else by then. Thanks for bringing this up though; 
it definitely would've been an option that I hadn't thought of, if only I 
had any time to spend on it :)


Best regards,
Tony



Am 25.11.2013 22:45, schrieb Timothy Arceri:

Hi Tony,

I'm not one of the main Mesa devs, just an independent developer who 
works on Mesa in his spare time. All I can suggest is that you have a go at 
implementing the features yourself. You obviously have a lot of talent, 
and I'm sure you would be able to accomplish the task. If time is an 
issue, there is a lot of interest in the Linux community in improving 
Mesa, and I myself have run two successful crowdfunding campaigns to 
support some full-time work on Mesa. See: 
http://www.indiegogo.com/projects/improve-opengl-support-for-the-linux-graphics-drivers-mesa/x/2053460
Maybe you could do something similar. If you do decide to do this, I 
find it's useful to start working on the extension (showing work on 
github etc.) before running the campaign, as people like to be sure you 
can accomplish what you are promising.

Anyway this is just an option for you to think about.

Tim


On Sunday, 24 November 2013 11:57 PM, Tony Wasserka 
neobra...@googlemail.com wrote:

[...]






[Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

2013-11-24 Thread Tony Wasserka

Hello everyone,
I was told on IRC that my question would get the most attention around here 
- so bear with me if this is the wrong place to ask.


I'm one of the developers of the GC/Wii emulator Dolphin. We recently 
rewrote our OpenGL renderer to use modern OpenGL 3 features; however, one 
thing that we stumbled upon is the lack of efficient (vertex/index) 
buffer data streaming mechanisms in OpenGL. Basically, most of our 
vertex data is used once and never again after that (we have to do this 
for accurate emulation) - so all vertex data gets streamed into one huge 
ring buffer (and analogously for index data, which uses its own huge 
ring buffer). For buffer streaming, we have multiple code paths using a 
combination of glMapBufferRange, glBufferSubData, fences and buffer 
orphaning, yet none of these comes anywhere close to the performance of 
(legacy) rendering from a vertex array stored in RAM.
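
A minimal sketch of one such streaming path (not Dolphin's actual code;
vbo, BUF_SIZE and write_offset are assumed): buffer orphaning, where
glBufferData with a NULL pointer gives the driver a fresh backing store
so new data never has to wait on in-flight draws:

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    if (write_offset + size > BUF_SIZE) {
        /* Orphan: the driver detaches the old storage still in use. */
        glBufferData(GL_ARRAY_BUFFER, BUF_SIZE, NULL, GL_STREAM_DRAW);
        write_offset = 0;
    }
    glBufferSubData(GL_ARRAY_BUFFER, write_offset, size, verts);
    write_offset += size;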


There are two OpenGL extensions which greatly help us in this situation: 
AMD's pinned memory [1], and buffer storage [2] in GL 4.4. We currently 
have no buffer storage code path, but usage of pinned memory gave us a 
speedup of up to 60% under heavy workloads when working with AMD's 
Catalyst driver under Windows. We expect the same speedup when using 
buffer storage (specifically, we need CLIENT_STORAGE_BIT, if I recall 
correctly).


So the natural question that arises is: Is either of these two 
extensions going to be supported in mesa anytime soon or is it of lower 
priority than other extensions? Also, is the pinned memory extension AMD 
hardware specific or would it be possible to support it for other 
hardware, too? I'm not sure if buffer storage (being a GL 4.4 extension, 
and I read that it might actually depend on some other GL 4.3 extension) 
is possible to implement on older hardware, yet it would be very useful 
for us to have efficient streaming methods for old GPUs, too.


I hope this mail doesn't sound too commanding or anything; it's just 
supposed to be a friendly question on improving the emulator experience 
for our user base.

Thanks in advance!

Best regards,
Tony

[1] http://www.opengl.org/registry/specs/AMD/pinned_memory.txt
[2] http://www.opengl.org/registry/specs/ARB/buffer_storage.txt
