Re: [PATCH v3 3/4] drm/ttm: convert to unified vma offset manager

2013-07-22 Thread David Herrmann
Sorry, I forgot to CC correctly.

On Mon, Jul 22, 2013 at 12:53 PM, David Herrmann dh.herrm...@gmail.com wrote:
 Hi

 On Fri, Jul 19, 2013 at 11:13 AM, Thomas Hellstrom
 thellst...@vmware.com wrote:
 On 07/18/2013 10:54 PM, David Herrmann wrote:

 Hi

 On Thu, Jul 18, 2013 at 1:24 PM, Thomas Hellstrom thellst...@vmware.com
 wrote:


 ...



 I think that if there are good reasons to keep locking internal, I'm fine
 with that (and also, of course, with
 Daniel's proposal). Currently the add / remove / lookup paths are mostly
 used by TTM during object creation and
 destruction.

 However, if the lookup path is ever used by pread / pwrite, that
 situation might change and we would then like to minimize the locking.

 I tried to keep the change as minimal as I could. Follow-up patches
 are welcome. I just thought pushing the lock into drm_vma_* would
 simplify things. If there are benchmarks that prove me wrong, I'll
 gladly spend some time optimizing that.


 In the general case, one reason for designing the locking outside of
 utilities like this is that different callers may have
 different requirements. For example, the call path is known not to be
 multithreaded at all, or
 the caller may prefer a mutex over a spinlock for various reasons. It might
 also be that some callers will want to use
 RCU locking in the future if the lookup path becomes busy, and that would
 require *all* users to adapt to RCU object destruction...

 I haven't looked at the code closely enough to say that any of this applies
 in this particular case, though.

 Some notes why I went with local locking:
  - mmap offset creation is done once and is independent of any other
 operations you might do on the BO in parallel
  - mmap lookup is also only done once in most cases. User-space rarely
 maps a buffer twice (setup/teardown of mmaps is expensive, but keeping
 them around is very cheap). And for shared buffers only the writer
 maps it as the reader normally passes it to the kernel without
 mapping/touching it. Only with software rendering do we have two or more
 mappings of the same object.
  - creating/mapping/destroying buffer objects is expensive in its
 current form and buffers tend to stay around for a long time

 I couldn't find a situation where the vma-manager was part of a
 performance-critical path. But on the other hand, the paths where it is
 used are quite heavy. That's why I don't want to lock the whole buffer
 or even device. Instead, we need some management lock which is used
 for mmap-setup or similar things that don't affect other operations on
 the buffer or device. We don't have such a lock, so I introduced local
 locking. If we end up with more use-cases, I volunteer to refactor
 this. But I am no big fan of overgeneralizing it before more uses are
 known.

 Anyway, I will resend with vm_lock removed (+_unlocked() helpers) so
 we keep the status quo regarding locks. Thanks for the comments on TTM
 buffer transfer.
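
 For the fault path, the pattern I have in mind looks roughly like this
 (just a sketch; the helper names are not final, and bdev->vma_manager
 is the offset manager this series embeds in ttm_bo_device):

     struct drm_vma_offset_node *node;
     struct ttm_buffer_object *bo = NULL;

     drm_vma_offset_lock_lookup(&bdev->vma_manager);
     node = drm_vma_offset_lookup_unlocked(&bdev->vma_manager,
                                           vma->vm_pgoff,
                                           vma_pages(vma));
     if (node) {
         bo = container_of(node, struct ttm_buffer_object, vma_node);
         if (!kref_get_unless_zero(&bo->kref))
             bo = NULL;
     }
     drm_vma_offset_unlock_lookup(&bdev->vma_manager);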

 Thanks
 David


Re: [PATCH v3 3/4] drm/ttm: convert to unified vma offset manager

2013-07-22 Thread Thomas Hellstrom

On 07/22/2013 12:55 PM, David Herrmann wrote:

Sorry, I forgot to CC correctly.

On Mon, Jul 22, 2013 at 12:53 PM, David Herrmann dh.herrm...@gmail.com wrote:

Hi

On Fri, Jul 19, 2013 at 11:13 AM, Thomas Hellstrom
thellst...@vmware.com wrote:

On 07/18/2013 10:54 PM, David Herrmann wrote:

Hi

On Thu, Jul 18, 2013 at 1:24 PM, Thomas Hellstrom thellst...@vmware.com
wrote:


...



I think that if there are good reasons to keep locking internal, I'm fine
with that (and also, of course, with
Daniel's proposal). Currently the add / remove / lookup paths are mostly
used by TTM during object creation and
destruction.

However, if the lookup path is ever used by pread / pwrite, that
situation might change and we would then like to minimize the locking.

I tried to keep the change as minimal as I could. Follow-up patches
are welcome. I just thought pushing the lock into drm_vma_* would
simplify things. If there are benchmarks that prove me wrong, I'll
gladly spend some time optimizing that.


In the general case, one reason for designing the locking outside of
utilities like this is that different callers may have
different requirements. For example, the call path is known not to be
multithreaded at all, or
the caller may prefer a mutex over a spinlock for various reasons. It might
also be that some callers will want to use
RCU locking in the future if the lookup path becomes busy, and that would
require *all* users to adapt to RCU object destruction...

I haven't looked at the code closely enough to say that any of this applies
in this particular case, though.

Some notes why I went with local locking:
  - mmap offset creation is done once and is independent of any other
operations you might do on the BO in parallel
  - mmap lookup is also only done once in most cases. User-space rarely
maps a buffer twice (setup/teardown of mmaps is expensive, but keeping
them around is very cheap). And for shared buffers only the writer
maps it as the reader normally passes it to the kernel without
mapping/touching it. Only with software rendering do we have two or more
mappings of the same object.
  - creating/mapping/destroying buffer objects is expensive in its
current form and buffers tend to stay around for a long time

I couldn't find a situation where the vma-manager was part of a
performance-critical path. But on the other hand, the paths where it is
used are quite heavy. That's why I don't want to lock the whole buffer
or even device. Instead, we need some management lock which is used
for mmap-setup or similar things that don't affect other operations on
the buffer or device. We don't have such a lock, so I introduced local
locking. If we end up with more use-cases, I volunteer to refactor
this. But I am no big fan of overgeneralizing it before more uses are
known.


I think we are discussing two different things here:

1) Having a separate lock for vma management vs
2) building that lock into the vma manager utility.

You're arguing for 1) and I completely agree with you. And although I
still think one generally should avoid building locks into utilities
like this (i.e., avoid 2), I'm fine with the current approach.

Thanks,
Thomas


Re: [PATCH v3 3/4] drm/ttm: convert to unified vma offset manager

2013-07-19 Thread Maarten Lankhorst
On 18-07-13 22:54, David Herrmann wrote:
 Hi

 On Thu, Jul 18, 2013 at 1:24 PM, Thomas Hellstrom thellst...@vmware.com 
 wrote:
 On 07/18/2013 01:07 PM, David Herrmann wrote:
 Hi

 On Thu, Jul 18, 2013 at 10:53 AM, Thomas Hellstrom
 thellst...@vmware.com wrote:
 A quick look, but not a full review:

 Looks mostly good, but it looks like the TTM vm lock isn't needed at all
 anymore (provided the vma offset manager is properly protected), since
 kref_get_unless_zero() is used when a reference after lookup is taken.
 (please see the kref_get_unless_zero documentation examples). When
 drm_vma_offset_remove() is called, the kref value is unconditionally
 zero, and that should block any successful lookup.
 If we drop vm_lock without modifying TTM, this will not work. Even
 kref_get_unless_zero() needs some guarantee that the object is still
 valid. Assume this scenario:

 drm_vma_offset_lookup() returns a valid node, however, before we call
 kref_get_*(), another thread destroys the last reference to the bo so
 it gets kfree()d. kref_get_unless_zero() will thus reference memory
 which can be _anything_ and is not guaranteed to stay 0.
 (Documentation/kref.txt mentions this, too, and I guess it was you who
 wrote that).

 I cannot see any locking around the mmap call that could guarantee
 this except vm_lock.

 You're right. My apologies. Parental leave has taken its toll.


 Otherwise, if the vma offset manager always needs external locking to
 make lookup + referencing atomic, then perhaps locking should be
 completely left to the caller?
 I would actually prefer to keep it in the VMA manager. It allows some
 fast-paths which otherwise would need special semantics for the caller
 (for instance see the access-management patches which are based on
 this patchset or the reduced locking during setup/release). We'd also
 require the caller to use rwsems for good performance, which is not
 the case for gem.

 So how about Daniel's proposal to add an _unlocked() version and
 provide _lock_lookup() and _unlock_lookup() helpers which just wrap
 the read_lock() and read_unlock()?

 Thanks!
 David

 I think that if there are good reasons to keep locking internal, I'm fine
 with that (and also, of course, with
 Daniel's proposal). Currently the add / remove / lookup paths are mostly
 used by TTM during object creation and
 destruction.

 However, if the lookup path is ever used by pread / pwrite, that situation
 might change and we would then like to
 minimize the locking.
 I tried to keep the change as minimal as I could. Follow-up patches
 are welcome. I just thought pushing the lock into drm_vma_* would
 simplify things. If there are benchmarks that prove me wrong, I'll
 gladly spend some time optimizing that.

 Apart from this, I'd like to see someone ack the
 ttm_buffer_object_transfer() changes. I am not very confident about
 that. Everything else should be trivial.

The transfer object only exists to kill off the memory backing during
asynchronous eviction, by using the delayed destroy mechanism. The
re-initialization there looks correct to me.
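
For reference, the hunk in question essentially just gives the ghost
object a fresh, empty node instead of letting it inherit the original
bo's offset (going from memory, so the exact helper name may differ):

    /* in ttm_buffer_object_transfer(): the ghost bo starts with an
     * empty node; the original bo keeps its vma_node, and with it
     * its mmap offset, until its own final unref */
    drm_vma_node_reset(&fbo->vma_node);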

~Maarten


Re: [PATCH v3 3/4] drm/ttm: convert to unified vma offset manager

2013-07-19 Thread Thomas Hellstrom

On 07/18/2013 10:54 PM, David Herrmann wrote:

Hi

On Thu, Jul 18, 2013 at 1:24 PM, Thomas Hellstrom thellst...@vmware.com wrote:


...



I think that if there are good reasons to keep locking internal, I'm fine
with that (and also, of course, with
Daniel's proposal). Currently the add / remove / lookup paths are mostly
used by TTM during object creation and
destruction.

However, if the lookup path is ever used by pread / pwrite, that situation
might change and we would then like to
minimize the locking.

I tried to keep the change as minimal as I could. Follow-up patches
are welcome. I just thought pushing the lock into drm_vma_* would
simplify things. If there are benchmarks that prove me wrong, I'll
gladly spend some time optimizing that.


In the general case, one reason for designing the locking outside of
utilities like this is that different callers may have
different requirements. For example, the call path is known not to be 
multithreaded at all, or
the caller may prefer a mutex over a spinlock for various reasons. It 
might also be that some callers will want to use
RCU locking in the future if the lookup path becomes busy, and that 
would require *all* users to adapt to RCU object destruction...
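
To illustrate: an RCU lookup would do the rbtree walk under
rcu_read_lock() only, so every user's release path would have to keep
the memory valid across a grace period, something like the sketch
below (which assumes the bo embeds a struct rcu_head, which it
currently doesn't):

static void bo_release(struct kref *kref)
{
    struct ttm_buffer_object *bo =
        container_of(kref, struct ttm_buffer_object, kref);

    drm_vma_offset_remove(&bo->bdev->vma_manager, &bo->vma_node);
    kfree_rcu(bo, rcu_head);    /* instead of a plain kfree(bo) */
}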


I haven't looked at the code closely enough to say that any of this 
applies in this particular case, though.



Thanks,
Thomas



Thanks
David

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v3 3/4] drm/ttm: convert to unified vma offset manager

2013-07-18 Thread Thomas Hellstrom

A quick look, but not a full review:

Looks mostly good, but it looks like the TTM vm lock isn't needed at all 
anymore (provided the vma offset manager is properly protected), since
kref_get_unless_zero() is used when a reference after lookup is taken. 
(please see the kref_get_unless_zero documentation examples). When 
drm_vma_offset_remove() is called, the kref value is unconditionally 
zero, and that should block any successful lookup.
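
For reference, the documented pattern keeps the lookup and the ref
under one lock, which the offset manager's internal lock would provide
here (paraphrased from Documentation/kref.txt; the lookup function and
lock name are only illustrative):

    mutex_lock(&lookup_lock);
    entry = do_lookup(key);            /* find the object */
    if (entry && !kref_get_unless_zero(&entry->kref))
        entry = NULL;                  /* already on its way out */
    mutex_unlock(&lookup_lock);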


Otherwise, if the vma offset manager always needs external locking to
make lookup + referencing atomic, then perhaps locking should be
completely left to the caller?

Thanks,
Thomas



On 07/17/2013 08:14 PM, David Herrmann wrote:

Use the new vma-manager infrastructure. This doesn't change any
implementation details as the vma-offset-manager is nearly copied 1-to-1
from TTM.

Even though the vma-manager uses its own locks, we still need bo->vm_lock
to prevent bos from being destroyed before we can get a reference during
lookup. However, this lock is not needed during vm-setup as we still
hold a reference there.

This also drops the addr_space_offset member as it is a copy of vm_start
in vma_node objects. Use the accessor functions instead.

Cc: Dave Airlie airl...@redhat.com
Cc: Ben Skeggs bske...@redhat.com
Cc: Maarten Lankhorst maarten.lankho...@canonical.com
Cc: Martin Peres martin.pe...@labri.fr
Cc: Alex Deucher alexander.deuc...@amd.com
Cc: Thomas Hellstrom thellst...@vmware.com
Signed-off-by: David Herrmann dh.herrm...@gmail.com
---
  drivers/gpu/drm/ast/ast_main.c|  2 +-
  drivers/gpu/drm/cirrus/cirrus_main.c  |  2 +-
  drivers/gpu/drm/mgag200/mgag200_main.c|  2 +-
  drivers/gpu/drm/nouveau/nouveau_display.c |  2 +-
  drivers/gpu/drm/nouveau/nouveau_gem.c |  2 +-
  drivers/gpu/drm/qxl/qxl_object.h  |  2 +-
  drivers/gpu/drm/qxl/qxl_release.c |  2 +-
  drivers/gpu/drm/radeon/radeon_object.h|  5 +-
  drivers/gpu/drm/ttm/ttm_bo.c  | 84 ++-
  drivers/gpu/drm/ttm/ttm_bo_util.c |  3 +-
  drivers/gpu/drm/ttm/ttm_bo_vm.c   | 81 -
  drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |  4 +-
  include/drm/ttm/ttm_bo_api.h  | 15 ++
  include/drm/ttm/ttm_bo_driver.h   |  7 +--
  14 files changed, 65 insertions(+), 148 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_main.c b/drivers/gpu/drm/ast/ast_main.c
index f60fd7b..c195dc2 100644
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -487,7 +487,7 @@ void ast_gem_free_object(struct drm_gem_object *obj)
  
  static inline u64 ast_bo_mmap_offset(struct ast_bo *bo)
  {
-   return bo->bo.addr_space_offset;
+   return drm_vma_node_offset_addr(&bo->bo.vma_node);
  }
  int
  ast_dumb_mmap_offset(struct drm_file *file,
diff --git a/drivers/gpu/drm/cirrus/cirrus_main.c 
b/drivers/gpu/drm/cirrus/cirrus_main.c
index 35cbae8..3a7a0ef 100644
--- a/drivers/gpu/drm/cirrus/cirrus_main.c
+++ b/drivers/gpu/drm/cirrus/cirrus_main.c
@@ -294,7 +294,7 @@ void cirrus_gem_free_object(struct drm_gem_object *obj)
  
  static inline u64 cirrus_bo_mmap_offset(struct cirrus_bo *bo)
  {
-   return bo->bo.addr_space_offset;
+   return drm_vma_node_offset_addr(&bo->bo.vma_node);
  }
  
  int

diff --git a/drivers/gpu/drm/mgag200/mgag200_main.c 
b/drivers/gpu/drm/mgag200/mgag200_main.c
index 9fa5685..1a75ea3 100644
--- a/drivers/gpu/drm/mgag200/mgag200_main.c
+++ b/drivers/gpu/drm/mgag200/mgag200_main.c
@@ -349,7 +349,7 @@ void mgag200_gem_free_object(struct drm_gem_object *obj)
  
  static inline u64 mgag200_bo_mmap_offset(struct mgag200_bo *bo)
  {
-   return bo->bo.addr_space_offset;
+   return drm_vma_node_offset_addr(&bo->bo.vma_node);
  }
  
  int

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 708b2d1..7a8caa1 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -697,7 +697,7 @@ nouveau_display_dumb_map_offset(struct drm_file *file_priv,
gem = drm_gem_object_lookup(dev, file_priv, handle);
if (gem) {
struct nouveau_bo *bo = gem->driver_private;
-   *poffset = bo->bo.addr_space_offset;
+   *poffset = drm_vma_node_offset_addr(&bo->bo.vma_node);
drm_gem_object_unreference_unlocked(gem);
return 0;
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index e72d09c..86597eb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -226,7 +226,7 @@ nouveau_gem_info(struct drm_file *file_priv, struct 
drm_gem_object *gem,
}
  
  	rep->size = nvbo->bo.mem.num_pages << PAGE_SHIFT;
-   rep->map_handle = nvbo->bo.addr_space_offset;
+   rep->map_handle = drm_vma_node_offset_addr(&nvbo->bo.vma_node);
rep->tile_mode = nvbo->tile_mode;
rep->tile_flags = nvbo->tile_flags;
 

Re: [PATCH v3 3/4] drm/ttm: convert to unified vma offset manager

2013-07-18 Thread Daniel Vetter
On Thu, Jul 18, 2013 at 10:53 AM, Thomas Hellstrom
thellst...@vmware.com wrote:
 A quick look, but not a full review:

 Looks mostly good, but it looks like the TTM vm lock isn't needed at all
 anymore (provided the vma offset manager is properly protected), since
 kref_get_unless_zero() is used when a reference after lookup is taken.
 (please see the kref_get_unless_zero documentation examples). When
 drm_vma_offset_remove() is called, the kref value is unconditionally zero,
 and that should block any successful lookup.

 Otherwise, if the vma offset manager always needs external locking to make
 lookup + referencing atomic, then perhaps locking should be completely
 left to the caller?

Somehow I thought plain gem drivers had this fixed, but it looks like
I've never gotten around to it. So we need to rework things anyway.

What about a drm_vma_offset_lookup_unlocked which just checks that the
caller is holding the offset manager's lock? That way everyone can
follow up with the get_unless_zero dance correctly. And ttm could drop
its own vm lock.
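
Something like this, I mean (sketch only; this assumes the manager
keeps an rwlock called vm_lock internally, and none of the names are
final):

static inline void
drm_vma_offset_lock_lookup(struct drm_vma_offset_manager *mgr)
{
    read_lock(&mgr->vm_lock);
}

static inline void
drm_vma_offset_unlock_lookup(struct drm_vma_offset_manager *mgr)
{
    read_unlock(&mgr->vm_lock);
}

drm_vma_offset_lookup_unlocked() would then be the plain rbtree walk,
with the caller responsible for holding the lock around both the
lookup and the following kref_get_unless_zero().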

I've considered whether we should just move the vma node into struct
drm_gem_object and let the offset manager do the dance, but that's
much more invasive. And I'm not sure it's the right thing to do yet.
So I think we should consider this only as a follow-up series.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [PATCH v3 3/4] drm/ttm: convert to unified vma offset manager

2013-07-18 Thread David Herrmann
Hi

On Thu, Jul 18, 2013 at 10:53 AM, Thomas Hellstrom
thellst...@vmware.com wrote:
 A quick look, but not a full review:

 Looks mostly good, but it looks like the TTM vm lock isn't needed at all
 anymore (provided the vma offset manager is properly protected), since
 kref_get_unless_zero() is used when a reference after lookup is taken.
 (please see the kref_get_unless_zero documentation examples). When
 drm_vma_offset_remove() is called, the kref value is unconditionally zero,
 and that should block any successful lookup.

If we drop vm_lock without modifying TTM, this will not work. Even
kref_get_unless_zero() needs some guarantee that the object is still
valid. Assume this scenario:

drm_vma_offset_lookup() returns a valid node, however, before we call
kref_get_*(), another thread destroys the last reference to the bo so
it gets kfree()d. kref_get_unless_zero() will thus reference memory
which can be _anything_ and is not guaranteed to stay 0.
(Documentation/kref.txt mentions this, too, and I guess it was you who
wrote that).

I cannot see any locking around the mmap call that could guarantee
this except vm_lock.
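
Which is exactly why the current code holds vm_lock across both steps,
roughly (paraphrasing today's ttm_bo_vm.c):

    read_lock(&bdev->vm_lock);
    bo = ttm_bo_vm_lookup_rb(bdev, vma->vm_pgoff, vma_pages(vma));
    if (bo && !kref_get_unless_zero(&bo->kref))
        bo = NULL;    /* final unref already in progress */
    read_unlock(&bdev->vm_lock);
    /* only after the unlock may a concurrent final unref free the bo */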

 Otherwise, if the vma offset manager always needs external locking to make
 lookup + referencing atomic, then perhaps locking should be completely
 left to the caller?

I would actually prefer to keep it in the VMA manager. It allows some
fast-paths which otherwise would need special semantics for the caller
(for instance see the access-management patches which are based on
this patchset or the reduced locking during setup/release). We'd also
require the caller to use rwsems for good performance, which is not
the case for gem.

So how about Daniel's proposal to add an _unlocked() version and
provide _lock_lookup() and _unlock_lookup() helpers which just wrap
the read_lock() and read_unlock()?

Thanks!
David

 Thanks,
 Thomas




 On 07/17/2013 08:14 PM, David Herrmann wrote:

 Use the new vma-manager infrastructure. This doesn't change any
 implementation details as the vma-offset-manager is nearly copied 1-to-1
 from TTM.

 Even though the vma-manager uses its own locks, we still need bo->vm_lock
 to prevent bos from being destroyed before we can get a reference during
 lookup. However, this lock is not needed during vm-setup as we still
 hold a reference there.

 This also drops the addr_space_offset member as it is a copy of vm_start
 in vma_node objects. Use the accessor functions instead.

 Cc: Dave Airlie airl...@redhat.com
 Cc: Ben Skeggs bske...@redhat.com
 Cc: Maarten Lankhorst maarten.lankho...@canonical.com
 Cc: Martin Peres martin.pe...@labri.fr
 Cc: Alex Deucher alexander.deuc...@amd.com
 Cc: Thomas Hellstrom thellst...@vmware.com
 Signed-off-by: David Herrmann dh.herrm...@gmail.com
 ---
   drivers/gpu/drm/ast/ast_main.c|  2 +-
   drivers/gpu/drm/cirrus/cirrus_main.c  |  2 +-
   drivers/gpu/drm/mgag200/mgag200_main.c|  2 +-
   drivers/gpu/drm/nouveau/nouveau_display.c |  2 +-
   drivers/gpu/drm/nouveau/nouveau_gem.c |  2 +-
   drivers/gpu/drm/qxl/qxl_object.h  |  2 +-
   drivers/gpu/drm/qxl/qxl_release.c |  2 +-
   drivers/gpu/drm/radeon/radeon_object.h|  5 +-
   drivers/gpu/drm/ttm/ttm_bo.c  | 84
 ++-
   drivers/gpu/drm/ttm/ttm_bo_util.c |  3 +-
   drivers/gpu/drm/ttm/ttm_bo_vm.c   | 81
 -
   drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |  4 +-
   include/drm/ttm/ttm_bo_api.h  | 15 ++
   include/drm/ttm/ttm_bo_driver.h   |  7 +--
   14 files changed, 65 insertions(+), 148 deletions(-)

 diff --git a/drivers/gpu/drm/ast/ast_main.c
 b/drivers/gpu/drm/ast/ast_main.c
 index f60fd7b..c195dc2 100644
 --- a/drivers/gpu/drm/ast/ast_main.c
 +++ b/drivers/gpu/drm/ast/ast_main.c
 @@ -487,7 +487,7 @@ void ast_gem_free_object(struct drm_gem_object *obj)
 static inline u64 ast_bo_mmap_offset(struct ast_bo *bo)
   {
  -   return bo->bo.addr_space_offset;
  +   return drm_vma_node_offset_addr(&bo->bo.vma_node);
   }
   int
   ast_dumb_mmap_offset(struct drm_file *file,
 diff --git a/drivers/gpu/drm/cirrus/cirrus_main.c
 b/drivers/gpu/drm/cirrus/cirrus_main.c
 index 35cbae8..3a7a0ef 100644
 --- a/drivers/gpu/drm/cirrus/cirrus_main.c
 +++ b/drivers/gpu/drm/cirrus/cirrus_main.c
 @@ -294,7 +294,7 @@ void cirrus_gem_free_object(struct drm_gem_object
 *obj)
 static inline u64 cirrus_bo_mmap_offset(struct cirrus_bo *bo)
   {
  -   return bo->bo.addr_space_offset;
  +   return drm_vma_node_offset_addr(&bo->bo.vma_node);
   }
 int
 diff --git a/drivers/gpu/drm/mgag200/mgag200_main.c
 b/drivers/gpu/drm/mgag200/mgag200_main.c
 index 9fa5685..1a75ea3 100644
 --- a/drivers/gpu/drm/mgag200/mgag200_main.c
 +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
 @@ -349,7 +349,7 @@ void mgag200_gem_free_object(struct drm_gem_object
 *obj)
 static inline u64 mgag200_bo_mmap_offset(struct mgag200_bo *bo)
   {
  -   return bo->bo.addr_space_offset;
 +

Re: [PATCH v3 3/4] drm/ttm: convert to unified vma offset manager

2013-07-18 Thread Thomas Hellstrom

On 07/18/2013 01:07 PM, David Herrmann wrote:

Hi

On Thu, Jul 18, 2013 at 10:53 AM, Thomas Hellstrom
thellst...@vmware.com wrote:

A quick look, but not a full review:

Looks mostly good, but it looks like the TTM vm lock isn't needed at all
anymore (provided the vma offset manager is properly protected), since
kref_get_unless_zero() is used when a reference after lookup is taken.
(please see the kref_get_unless_zero documentation examples). When
drm_vma_offset_remove() is called, the kref value is unconditionally zero,
and that should block any successful lookup.

If we drop vm_lock without modifying TTM, this will not work. Even
kref_get_unless_zero() needs some guarantee that the object is still
valid. Assume this scenario:

drm_vma_offset_lookup() returns a valid node, however, before we call
kref_get_*(), another thread destroys the last reference to the bo so
it gets kfree()d. kref_get_unless_zero() will thus reference memory
which can be _anything_ and is not guaranteed to stay 0.
(Documentation/kref.txt mentions this, too, and I guess it was you who
wrote that).

I cannot see any locking around the mmap call that could guarantee
this except vm_lock.


You're right. My apologies. Parental leave has taken its toll.




Otherwise, if the vma offset manager always needs external locking to make
lookup + referencing atomic, then perhaps locking should be completely
left to the caller?

I would actually prefer to keep it in the VMA manager. It allows some
fast-paths which otherwise would need special semantics for the caller
(for instance see the access-management patches which are based on
this patchset or the reduced locking during setup/release). We'd also
require the caller to use rwsems for good performance, which is not
the case for gem.

So how about Daniel's proposal to add an _unlocked() version and
provide _lock_lookup() and _unlock_lookup() helpers which just wrap
the read_lock() and read_unlock()?

Thanks!
David


I think that if there are good reasons to keep locking internal, I'm
fine with that (and also, of course, with Daniel's proposal). Currently
the add / remove / lookup paths are mostly used by TTM during object
creation and destruction.

However, if the lookup path is ever used by pread / pwrite, that
situation might change and we would then like to minimize the locking.

Thanks,
Thomas




Thanks,
Thomas




On 07/17/2013 08:14 PM, David Herrmann wrote:

Use the new vma-manager infrastructure. This doesn't change any
implementation details as the vma-offset-manager is nearly copied 1-to-1
from TTM.

Even though the vma-manager uses its own locks, we still need bo->vm_lock
to prevent bos from being destroyed before we can get a reference during
lookup. However, this lock is not needed during vm-setup as we still
hold a reference there.

This also drops the addr_space_offset member as it is a copy of vm_start
in vma_node objects. Use the accessor functions instead.

Cc: Dave Airlie airl...@redhat.com
Cc: Ben Skeggs bske...@redhat.com
Cc: Maarten Lankhorst maarten.lankho...@canonical.com
Cc: Martin Peres martin.pe...@labri.fr
Cc: Alex Deucher alexander.deuc...@amd.com
Cc: Thomas Hellstrom thellst...@vmware.com
Signed-off-by: David Herrmann dh.herrm...@gmail.com
---
   drivers/gpu/drm/ast/ast_main.c|  2 +-
   drivers/gpu/drm/cirrus/cirrus_main.c  |  2 +-
   drivers/gpu/drm/mgag200/mgag200_main.c|  2 +-
   drivers/gpu/drm/nouveau/nouveau_display.c |  2 +-
   drivers/gpu/drm/nouveau/nouveau_gem.c |  2 +-
   drivers/gpu/drm/qxl/qxl_object.h  |  2 +-
   drivers/gpu/drm/qxl/qxl_release.c |  2 +-
   drivers/gpu/drm/radeon/radeon_object.h|  5 +-
   drivers/gpu/drm/ttm/ttm_bo.c  | 84
++-
   drivers/gpu/drm/ttm/ttm_bo_util.c |  3 +-
   drivers/gpu/drm/ttm/ttm_bo_vm.c   | 81
-
   drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |  4 +-
   include/drm/ttm/ttm_bo_api.h  | 15 ++
   include/drm/ttm/ttm_bo_driver.h   |  7 +--
   14 files changed, 65 insertions(+), 148 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_main.c
b/drivers/gpu/drm/ast/ast_main.c
index f60fd7b..c195dc2 100644
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -487,7 +487,7 @@ void ast_gem_free_object(struct drm_gem_object *obj)
 static inline u64 ast_bo_mmap_offset(struct ast_bo *bo)
   {
-   return bo->bo.addr_space_offset;
+   return drm_vma_node_offset_addr(&bo->bo.vma_node);
   }
   int
   ast_dumb_mmap_offset(struct drm_file *file,
diff --git a/drivers/gpu/drm/cirrus/cirrus_main.c
b/drivers/gpu/drm/cirrus/cirrus_main.c
index 35cbae8..3a7a0ef 100644
--- a/drivers/gpu/drm/cirrus/cirrus_main.c
+++ b/drivers/gpu/drm/cirrus/cirrus_main.c
@@ -294,7 +294,7 @@ void cirrus_gem_free_object(struct drm_gem_object
*obj)
 static inline u64 cirrus_bo_mmap_offset(struct cirrus_bo *bo)
   {
-   return bo->bo.addr_space_offset;
+   return 

Re: [PATCH v3 3/4] drm/ttm: convert to unified vma offset manager

2013-07-18 Thread David Herrmann
Hi

On Thu, Jul 18, 2013 at 1:24 PM, Thomas Hellstrom thellst...@vmware.com wrote:
 On 07/18/2013 01:07 PM, David Herrmann wrote:

 Hi

 On Thu, Jul 18, 2013 at 10:53 AM, Thomas Hellstrom
 thellst...@vmware.com wrote:

 A quick look, but not a full review:

 Looks mostly good, but it looks like the TTM vm lock isn't needed at all
 anymore (provided the vma offset manager is properly protected), since
 kref_get_unless_zero() is used when a reference after lookup is taken.
 (please see the kref_get_unless_zero documentation examples). When
 drm_vma_offset_remove() is called, the kref value is unconditionally
 zero, and that should block any successful lookup.

 If we drop vm_lock without modifying TTM, this will not work. Even
 kref_get_unless_zero() needs some guarantee that the object is still
 valid. Assume this scenario:

 drm_vma_offset_lookup() returns a valid node, however, before we call
 kref_get_*(), another thread destroys the last reference to the bo so
 it gets kfree()d. kref_get_unless_zero() will thus reference memory
 which can be _anything_ and is not guaranteed to stay 0.
 (Documentation/kref.txt mentions this, too, and I guess it was you who
 wrote that).

 I cannot see any locking around the mmap call that could guarantee
 this except vm_lock.


 You're right. My apologies. Parental leave has taken its toll.



 Otherwise, if the vma offset manager always needs external locking to
 make lookup + referencing atomic, then perhaps locking should be
 completely left to the caller?

 I would actually prefer to keep it in the VMA manager. It allows some
 fast-paths which otherwise would need special semantics for the caller
 (for instance see the access-management patches which are based on
 this patchset or the reduced locking during setup/release). We'd also
 require the caller to use rwsems for good performance, which is not
 the case for gem.

 So how about Daniel's proposal to add an _unlocked() version and
 provide _lock_lookup() and _unlock_lookup() helpers which just wrap
 the read_lock() and read_unlock()?

 Thanks!
 David


 I think that if there are good reasons to keep locking internal, I'm fine
 with that (and also, of course, with
 Daniel's proposal). Currently the add / remove / lookup paths are mostly
 used by TTM during object creation and
 destruction.

 However, if the lookup path is ever used by pread / pwrite, that situation
 might change and we would then like to
 minimize the locking.

I tried to keep the change as minimal as I could. Follow-up patches
are welcome. I just thought pushing the lock into drm_vma_* would
simplify things. If there are benchmarks that prove me wrong, I'll
gladly spend some time optimizing that.

Apart from this, I'd like to see someone ack the
ttm_buffer_object_transfer() changes. I am not very confident about
that. Everything else should be trivial.

Thanks
David


Re: [PATCH v3 3/4] drm/ttm: convert to unified vma offset manager

2013-07-18 Thread Jerome Glisse
On Thu, Jul 18, 2013 at 4:54 PM, David Herrmann dh.herrm...@gmail.com wrote:
 Hi

 On Thu, Jul 18, 2013 at 1:24 PM, Thomas Hellstrom thellst...@vmware.com 
 wrote:
 On 07/18/2013 01:07 PM, David Herrmann wrote:

 Hi

 On Thu, Jul 18, 2013 at 10:53 AM, Thomas Hellstrom
 thellst...@vmware.com wrote:

 A quick look, but not a full review:

 Looks mostly good, but it looks like the TTM vm lock isn't needed at all
 anymore (provided the vma offset manager is properly protected), since
 kref_get_unless_zero() is used when a reference after lookup is taken.
 (please see the kref_get_unless_zero documentation examples). When
 drm_vma_offset_remove() is called, the kref value is unconditionally
 zero, and that should block any successful lookup.

 If we drop vm_lock without modifying TTM, this will not work. Even
 kref_get_unless_zero() needs some guarantee that the object is still
 valid. Assume this scenario:

 drm_vma_offset_lookup() returns a valid node, however, before we call
 kref_get_*(), another thread destroys the last reference to the bo so
 it gets kfree()d. kref_get_unless_zero() will thus reference memory
 which can be _anything_ and is not guaranteed to stay 0.
 (Documentation/kref.txt mentions this, too, and I guess it was you who
 wrote that).

 I cannot see any locking around the mmap call that could guarantee
 this except vm_lock.


 You're right. My apologies. Parental leave has taken its toll.



 Otherwise, if the vma offset manager always needs external locking to
 make lookup + referencing atomic, then perhaps locking should be
 completely left to the caller?

 I would actually prefer to keep it in the VMA manager. It allows some
 fast-paths which otherwise would need special semantics for the caller
 (for instance see the access-management patches which are based on
 this patchset or the reduced locking during setup/release). We'd also
 require the caller to use rwsems for good performance, which is not
 the case for gem.

 So how about Daniel's proposal to add an _unlocked() version and
 provide _lock_lookup() and _unlock_lookup() helpers which just wrap
 the read_lock() and read_unlock()?

 Thanks!
 David


 I think that if there are good reasons to keep locking internal, I'm fine
 with that (and also, of course, with
 Daniel's proposal). Currently the add / remove / lookup paths are mostly
 used by TTM during object creation and
 destruction.

 However, if the lookup path is ever used by pread / pwrite, that situation
 might change and we would then like to
 minimize the locking.

 I tried to keep the change as minimal as I could. Follow-up patches
 are welcome. I just thought pushing the lock into drm_vma_* would
 simplify things. If there are benchmarks that prove me wrong, I'll
 gladly spend some time optimizing that.

 Apart from this, I'd like to see someone ack the
 ttm_buffer_object_transfer() changes. I am not very confident about
 that. Everything else should be trivial.

 Thanks
 David

Looks good to me. The transfer object must have an empty
vm_node/vma_node, as we are only interested in tying the system RAM or
VRAM to the ghost object so that delayed delete frees the VRAM or
system RAM while the original buffer is still valid; thus the original
vm_node/vma_node must not be freed.

Cheers,
Jerome