Re: [PATCH] drm/amdgpu: set sched_hw_submission higher for KIQ

2017-08-22 Thread Xie, AlexBin
Hi Alex,


See below comment about coding style of 80 characters per line.


Thanks,

Alex Bin Xie


From: amd-gfx  on behalf of Alex Deucher 

Sent: Tuesday, August 22, 2017 4:49 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: [PATCH] drm/amdgpu: set sched_hw_submission higher for KIQ

KIQ doesn't really use the GPU scheduler.  The base
drivers generally use the KIQ ring directly rather than
submitting IBs.  However, amdgpu_sched_hw_submission
(which defaults to 2) limits the number of outstanding
fences to 2.  KFD uses the KIQ for TLB flushes and the
2 fence limit hurts performance when there are several KFD
processes running.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 6c5646b..f39b851 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -170,6 +170,16 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
  unsigned irq_type)
 {
 int r;
+   int sched_hw_submission = amdgpu_sched_hw_submission;
+
+   /* Set the hw submission limit higher for KIQ because
+* it's used for a number of gfx/compute tasks by both
+* KFD and KGD which may have outstanding fences and
+* it doesn't really use the gpu scheduler anyway;
+* KIQ tasks get submitted directly to the ring.
+*/
+   if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
+   sched_hw_submission *= 2;

 if (ring->adev == NULL) {
 if (adev->num_rings >= AMDGPU_MAX_RINGS)
@@ -179,7 +189,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
 ring->idx = adev->num_rings++;
 adev->rings[ring->idx] = ring;
 r = amdgpu_fence_driver_init_ring(ring,
-   amdgpu_sched_hw_submission);
+ sched_hw_submission);

Alex Bin Xie: With shorter variable name, there is no need to wrap into a new 
line of code.

 if (r)
 return r;
 }
@@ -219,7 +229,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
 }

 ring->ring_size = roundup_pow_of_two(max_dw * 4 *
-amdgpu_sched_hw_submission);
+sched_hw_submission);

Alex Bin Xie: With shorter variable name, there is no need to wrap into a new 
line of code.


 ring->buf_mask = (ring->ring_size / 4) - 1;
 ring->ptr_mask = ring->funcs->support_64bit_ptrs ?
--
2.5.5

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
amd-gfx Info Page - 
freedesktop.org
lists.freedesktop.org
Subscribing to amd-gfx: Subscribe to amd-gfx by filling out the following form. 
Use of all freedesktop.org lists is subject to our Code of ...


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix a bogus warning

2017-08-17 Thread Xie, AlexBin
Reviewed-by: Alex Xie <alexbin@amd.com>



From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Alex Deucher 
<alexdeuc...@gmail.com>
Sent: Thursday, August 17, 2017 4:44 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: [PATCH] drm/amdgpu: fix a bogus warning

Don't validate the default value.  Prevents a needless
warning. Also fix a spelling typo in the warning message.

Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2554ddf..1a459ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1077,8 +1077,9 @@ static void amdgpu_check_arguments(struct amdgpu_device 
*adev)
 }

 /* valid range is between 4 and 9 inclusive */
-   if (amdgpu_vm_fragment_size > 9 || amdgpu_vm_fragment_size < 4) {
-   dev_warn(adev->dev, "valid rang is between 4 and 9\n");
+   if (amdgpu_vm_fragment_size != -1 &&
+   (amdgpu_vm_fragment_size > 9 || amdgpu_vm_fragment_size < 4)) {
+   dev_warn(adev->dev, "valid range is between 4 and 9\n");
 amdgpu_vm_fragment_size = -1;
 }

--
2.5.5

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
amd-gfx Info Page - 
freedesktop.org<https://lists.freedesktop.org/mailman/listinfo/amd-gfx>
lists.freedesktop.org
Subscribing to amd-gfx: Subscribe to amd-gfx by filling out the following form. 
Use of all freedesktop.org lists is subject to our Code of ...



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-10 Thread Xie, AlexBin
I understand this discussion from closes source driver terminology.

If a process is killed before it sends out the signaling command, will some 
part of the GPU be in a waiting situation forever?

Alex Bin Xie
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Jason 
Ekstrand
Sent: Monday, July 10, 2017 11:53 AM
To: Christian König 
Cc: Dave Airlie ; Maling list - DRI developers 
; amd-gfx mailing list 

Subject: Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

On Mon, Jul 10, 2017 at 8:45 AM, Christian König 
> wrote:
Am 10.07.2017 um 17:28 schrieb Jason Ekstrand:
On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie 
> wrote:
From: Dave Airlie >

This interface will allow sync object to be used to back
Vulkan fences. This API is pretty much the vulkan fence waiting
API, and I've ported the code from amdgpu.

v2: accept relative timeout, pass remaining time back
to userspace.
v3: return to absolute timeouts.
v4: absolute zero = poll,
rewrite any/all code to have same operation for arrays
return -EINVAL for 0 fences.
v4.1: fixup fences allocation check, use u64_to_user_ptr
v5: move to sec/nsec, and use timespec64 for calcs.
v6: use -ETIME and drop the out status flag. (-ETIME
is suggested by ickle, I can feel a shed painting)

Signed-off-by: Dave Airlie >
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 142 +
 include/uapi/drm/drm.h |  13 
 4 files changed, 159 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 5cecc97..d71b50d 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -157,3 +157,5 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
   struct drm_file *file_private);
 int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private);
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f1e5681..385ce74 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -657,6 +657,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, 
drm_syncobj_fd_to_handle_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };

 #define DRM_CORE_IOCTL_COUNT   ARRAY_SIZE( drm_ioctls )
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 89441bc..2d5a7a1 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1,5 +1,7 @@
 /*
  * Copyright 2017 Red Hat
+ * Parts ported from amdgpu (fence wait code).
+ * Copyright 2016 Advanced Micro Devices, Inc.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the "Software"),
@@ -31,6 +33,9 @@
  * that contain an optional fence. The fence can be updated with a new
  * fence, or be NULL.
  *
+ * syncobj's can be waited upon, where it will wait for the underlying
+ * fence.
+ *
  * syncobj's can be export to fd's and back, these fd's are opaque and
  * have no other use case, except passing the syncobj between processes.
  *
@@ -451,3 +456,140 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, 
void *data,
return drm_syncobj_fd_to_handle(file_private, args->fd,
>handle);
 }
+
+/**
+ * drm_timeout_abs_to_jiffies - calculate jiffies timeout from absolute value
+ *
+ * @timeout_sec: timeout sec component, 0 for poll
+ * @timeout_nsec: timeout nsec component in ns, 0 for poll
+ * both must be 0 for poll.
+ *
+ * Calculate the timeout in jiffies from an absolute time in sec/nsec.
+ */
+static unsigned long drm_timeout_abs_to_jiffies(int64_t timeout_sec, uint64_t 
timeout_nsec)
+{
+   struct timespec64 abs_timeout, timeout, max_jiffy_timespec;
+   unsigned long timeout_jiffies;
+
+   /* make 0 timeout means poll - absolute 0 doesn't seem valid */
+   if (timeout_sec == 0 && timeout_nsec == 0)
+   return 0;
+
+   abs_timeout.tv_sec = timeout_sec;
+   abs_timeout.tv_nsec = timeout_nsec;
+
+   /* clamp timeout if it's to large */
+   if (!timespec64_valid_strict(_timeout))
+   return 

RE: [PATCH 3/3] drm/amdgpu: Add kernel parameter to control use of ECC/EDC.

2017-06-26 Thread Xie, AlexBin
Hi,

I have not checked the background of this discussion very closely yet. And you 
might have known the following.

Customers may not want the default setting to change meaning. This is like an 
API.
Example: The application and its environment is already set up and tested. Then 
if customer updates driver, suddenly driver has some new behavior?
Certain serious application definitely does not accept this.

IMHO, it is better to avoid vague concepts like "best". It will become rather 
difficult to define what is best when there are multiple customers with 
different requirements. Driver is to provide a feature or mechanism. "Best" 
sounds like a policy or a preference from driver side.

In my pass work, I generally use default for two cases:
1. The default is the most conservative option, which must work. Then the 
application can choose advanced features by choosing other parameter 
value/option.
2. The default parameter is the compatible behavior before introducing this 
parameter/option.

Regards,
Alex Bin


-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
Panariti, David
Sent: Monday, June 26, 2017 12:06 PM
To: Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 3/3] drm/amdgpu: Add kernel parameter to control use of 
ECC/EDC.

>> > I'd suggest setting amdgpu_ecc_flags to AMD_ECC_SUPPORT_BEST by 
>> > default.  That can be our asic specific default setting.  In the 
>> > case of CZ, that will be disabled until we decide to enable EDC by default.
>> [davep] I'm confused.  ECC...BEST will cause EDC to be enabled.
>> I used ECC as the generic term for ECC and EDC, since ECC seems more 
>> basic (EDC is built on top of ECC).
>> If I understand you, we can't do what you want with the current setup.

>I'm saying we make ECC_BEST the default config (feel free to re-name if 
>ECC_DEFAULT).  Each asic can have a different default depending 
>on what features are ready.  So for CZ, we'd make ECC_BEST equivalent to 
>disabling ECC for now.  If a user wants to force it on, they can 
>set ECC_EDC.  Once EDC is stable on CZ, we can make ECC_BEST be equivalent to 
>ECC_EDC.  The way the default (ECC_BEST) always maps 
>to the best available combination in that version of the driver.

That's not how I meant it to work WRT BEST.
Each asic will have a DEFAULT, but that isn't what BEST means.
CZ is a good example (when fully implemented).  DEFAULT for CZ is everything 
except HALT, since, IMO opinion, most people do not want to hang or reboot.
BEST for CZ would be everything a person most interested in reliability would 
want, which IMO, includes HALT/reboot.
Similar is if something like performance degradation is really bad, DEFAULT 
would be OFF. BEST would be ON, e.g., if the user's app doesn't trigger the 
performance problem.
The BEST bit is in a fixed position, so that customers don't need to worry what 
bits are needed for the most reliable performance (in our opinion) on a given 
asic.
And if a customer (or developer) wants some arbitrary set of features, they can 
set bits as they want.

I think DEFAULT will make most people happy.
BEST allows people who are interested in everything they can get, regardless of 
any issues that brings with it. It is requested simply by using a fixed param 
value (0x01) for any asic.
This probably should not include features that have any kind of fatal flaw such 
as the Vega10 HBM ECC issue.  When fixed, it can be added to DEFAULT.
And allowing per-feature control allows anyone to do precisely what they want.
"Effort" increases as the number of interested users decreases.

Using defines in the init code will be a problem if there is more than one kind 
of asic involved or a single asic that the user wants to use with different 
parameters.  However, this doesn't seem to be a high priority.
If we do want to worry about it, then we'll need to put the values into the 
amdgpu_gfx struct.

regards,
davep

> -Original Message-
> From: Deucher, Alexander
> Sent: Tuesday, June 06, 2017 6:16 PM
> To: Panariti, David ; amd-
> g...@lists.freedesktop.org
> Subject: RE: [PATCH 3/3] drm/amdgpu: Add kernel parameter to control use
> of ECC/EDC.
> 
> > -Original Message-
> > From: Panariti, David
> > Sent: Tuesday, June 06, 2017 5:50 PM
> > To: Deucher, Alexander; amd-gfx@lists.freedesktop.org
> > Subject: RE: [PATCH 3/3] drm/amdgpu: Add kernel parameter to control
> > use of ECC/EDC.
> >
> > Rather than inlining this in a number of places, Re verbosity:
> > I've worked in embedded environments and when dealing with
> > intermittent problems it's nice to have all of the information ASAP
> > rather than waiting for a problem to reoccur, especially if it's very
> intermittent.
> > I would've preferred more.
> > Since it only shows up happens on CZ, it adds little to the output.
> > I like to show the reasons why EDC didn't happen, hence the backwards
> > 

Re: [PATCH 1/3] drm/amdgpu: fix a typo

2017-06-22 Thread Xie, AlexBin
Hi Christian,


In fact, the change from spinlock to atomic is quite painful. When I started, I 
thought it was easy but later I found there might be race condition here and 
there. Now I think the change looks more robust. In kernel source, there are 
several other drivers used the same trick.


On the other hand, I think the logic itself might be optimized considering the 
locking. But I had spent quite some effort to maintain original logic.


Thanks,

Alex Bin


From: Christian König <deathsim...@vodafone.de>
Sent: Thursday, June 22, 2017 3:35 AM
To: Xie, AlexBin; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] drm/amdgpu: fix a typo

Am 22.06.2017 um 04:42 schrieb Alex Xie:
> Signed-off-by: Alex Xie <alexbin@amd.com>

With the commit message fixed as Michel suggested patches #1 and #2 are
Reviewed-by: Christian König <christian.koe...@amd.com> as well.

On patch #3 Marek needs to take a look, cause I don't know the logic
behind that.

Regards,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 7635f38..94c27fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -494,7 +494,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
> >user_invalidated) && e->user_pages) {
>
>/* We acquired a page array, but somebody
> -  * invalidated it. Free it an try again
> +  * invalidated it. Free it and try again
> */
>release_pages(e->user_pages,
>  e->robj->tbo.ttm->num_pages,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH umr] Don't read SGPR if wave isn't halted.

2017-06-15 Thread Xie, AlexBin
Reviewed-by: Alex Xie <alexbin@amd.com><mailto:alexbin@amd.com>



From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Tom St Denis 
<tom.stde...@amd.com>
Sent: Thursday, June 15, 2017 8:14:51 AM
To: amd-gfx@lists.freedesktop.org
Cc: StDenis, Tom
Subject: [PATCH umr] Don't read SGPR if wave isn't halted.

On previous generations this was allowed but on Vega10 it will result
in the occasional system hang.

Also like others wave status reading is only reliable if

1) the waves are halted/hung (core is active)
2) or, you disable CG/PG with cg_mask=pg_mask=0

Even with this patch hangs are possible if PG/CG are still enabled on
Vega10.

Signed-off-by: Tom St Denis <tom.stde...@amd.com>
---
 src/app/print_waves.c | 40 ++--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/src/app/print_waves.c b/src/app/print_waves.c
index e3662983d8d1..e157db9f9386 100644
--- a/src/app/print_waves.c
+++ b/src/app/print_waves.c
@@ -55,7 +55,8 @@ void umr_print_waves(struct umr_asic *asic)
 umr_get_wave_status(asic, se, sh, cu, simd, 
wave, );
 if (ws.wave_status.halt || 
ws.wave_status.valid) {
 // grab sgprs..
-   umr_read_sgprs(asic, , [0]);
+   if (ws.wave_status.halt)
+   umr_read_sgprs(asic, , 
[0]);

 if (!options.bitfields && first) {
 first = 0;
@@ -75,14 +76,15 @@ void umr_print_waves(struct umr_asic *asic)
 (unsigned long)ws.hw_id.value, (unsigned long)ws.gpr_alloc.value, (unsigned 
long)ws.lds_alloc.value, (unsigned long)ws.trapsts.value, (unsigned 
long)ws.ib_sts.value,
 (unsigned long)ws.tba_hi, (unsigned long)ws.tba_lo, (unsigned long)ws.tma_hi, 
(unsigned long)ws.tma_lo, (unsigned long)ws.ib_dbg0, (unsigned long)ws.m0
 );
-   for (x = 0; x < 
((ws.gpr_alloc.sgpr_size + 1) << shift); x += 4)
-   printf(">SGPRS[%u..%u] 
= { %08lx, %08lx, %08lx, %08lx }\n",
-   
(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x),
-   
(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x + 3),
-   (unsigned 
long)sgprs[x],
-   (unsigned 
long)sgprs[x+1],
-   (unsigned 
long)sgprs[x+2],
-   (unsigned 
long)sgprs[x+3]);
+   if (ws.wave_status.halt)
+   for (x = 0; x < 
((ws.gpr_alloc.sgpr_size + 1) << shift); x += 4)
+   
printf(">SGPRS[%u..%u] = { %08lx, %08lx, %08lx, %08lx }\n",
+   
(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x),
+   
(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x + 3),
+   
(unsigned long)sgprs[x],
+   
(unsigned long)sgprs[x+1],
+   
(unsigned long)sgprs[x+2],
+   
(unsigned long)sgprs[x+3]);

 pgm_addr = 
(((uint64_t)ws.pc_hi << 32) | ws.pc_lo) - (sizeof(opcodes)/2);
 umr_read_vram(asic, 
ws.hw_id.vm_id, pgm_addr, sizeof(opcodes), opcodes);
@@ -154,15 +156,17 @@ void umr_print_waves(struct umr_asic *asic)
 PP(gpr_alloc, sgpr_base);
 PP(gpr_alloc, sgpr_size);

-   printf("\n\nSGPRS:\n");
-   for (x = 0; x < 
((ws.gpr_alloc.sgpr_size + 1) << shift); x += 4)
-   printf("\t[%4u..%4u] = 
{ %08lx, %08lx, %08lx, %08lx }\n",
-   
(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x),
-   

RE: . [PATCH] drm/amd/amdgpu: Fix ring initialization for GFX9

2017-06-05 Thread Xie, AlexBin
Hi Andres,

I think the original patch was written by you. Would you comment? Is it a bug 
or intentional?

Thank you.

Alex Bin Xie


Message: 1
Date: Mon, 5 Jun 2017 10:36:59 -0400
From: Tom St Denis <tom.stde...@amd.com>
To: amd-gfx@lists.freedesktop.org
Subject: Re: amd-gfx Digest, Vol 13, Issue 29
Message-ID: <58773ac0-93a7-e494-d447-88e93ded9...@amd.com>
Content-Type: text/plain; charset=utf-8; format=flowed

On 05/06/17 10:24 AM, Xie, AlexBin wrote:
> Hi, Tom,
> 
> You have found a bug.
> 
> Your patch looks fine for me.
> 
> Have you confirmed the deleted part is older version? Perhaps search email 
> list or git history to confirm?

It looks like the edits to the older GFX files (7/8) simply changed that 
block of code whereas the gfx9 version they pasted in the fixed block 
moving the old block down.

Tom


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: amd-gfx Digest, Vol 13, Issue 29

2017-06-05 Thread Xie, AlexBin
Hi Tom,

I have found the original patch which introduce the duplicate code.

The patch is in:
amd-gfx Digest, Vol 11, Issue 301

Alex Bin Xie

-Original Message-
From: Xie, AlexBin 
Sent: Monday, June 5, 2017 10:25 AM
To: amd-gfx@lists.freedesktop.org
Subject: RE: amd-gfx Digest, Vol 13, Issue 29

Hi, Tom,

You have found a bug.

Your patch looks fine for me.

Have you confirmed the deleted part is older version? Perhaps search email list 
or git history to confirm? 

Alex Bin Xie

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
amd-gfx-requ...@lists.freedesktop.org
Sent: Monday, June 5, 2017 8:00 AM
To: amd-gfx@lists.freedesktop.org
Subject: amd-gfx Digest, Vol 13, Issue 29

Send amd-gfx mailing list submissions to
amd-gfx@lists.freedesktop.org

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
or, via email, send a message with subject or body 'help' to
amd-gfx-requ...@lists.freedesktop.org

You can reach the person managing the list at
amd-gfx-ow...@lists.freedesktop.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of amd-gfx digest..."


Today's Topics:

   1. [PATCH] drm/amd/amdgpu:  Fix ring initialization for GFX9
  (Tom St Denis)


--

Message: 1
Date: Mon,  5 Jun 2017 07:46:01 -0400
From: Tom St Denis <tom.stde...@amd.com>
To: amd-gfx@lists.freedesktop.org
Cc: Tom St Denis <tom.stde...@amd.com>
Subject: [PATCH] drm/amd/amdgpu:  Fix ring initialization for GFX9
Message-ID: <20170605114601.11995-1-tom.stde...@amd.com>
Content-Type: text/plain

The commit 83866f0fc72017d55f40cbd4160cd1e42a2cc3a8 erroneously included the
old ring init sequence along with the new one which uses shared header 
definitions.

The fix which works on my vega10 seems to be to drop the old init sequence.

Signed-off-by: Tom St Denis <tom.stde...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 26 --
 1 file changed, 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 9502353ec325..8388893e0b11 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1586,32 +1586,6 @@ static int gfx_v9_0_sw_init(void *handle)
ring_id++;
}
 
-   /* set up the compute queues */
-   for (i = 0, ring_id = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; i++) {
-   unsigned irq_type;
-
-   /* max 32 queues per MEC */
-   if ((i >= 32) || (i >= AMDGPU_MAX_COMPUTE_RINGS)) {
-   DRM_ERROR("Too many (%d) compute rings!\n", i);
-   break;
-   }
-   ring = >gfx.compute_ring[i];
-   ring->ring_obj = NULL;
-   ring->use_doorbell = true;
-   ring->doorbell_index = (AMDGPU_DOORBELL64_MEC_RING0 + i) << 1;
-   ring->me = 1; /* first MEC */
-   ring->pipe = i / 8;
-   ring->queue = i % 8;
-   ring->eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr + (i * 
GFX9_MEC_HPD_SIZE);
-   sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, 
ring->queue);
-   irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP + ring->pipe;
-   /* type-2 packets are deprecated on MEC, use type-3 instead */
-   r = amdgpu_ring_init(adev, ring, 1024,
->gfx.eop_irq, irq_type);
-   if (r)
-   return r;
-   }
-
r = gfx_v9_0_kiq_init(adev);
if (r) {
DRM_ERROR("Failed to init KIQ BOs!\n");
-- 
2.12.0



--

Subject: Digest Footer

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


--

End of amd-gfx Digest, Vol 13, Issue 29
***
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Move compute vm bug logic to amdgpu_vm.c

2017-06-01 Thread Xie, AlexBin
Ok. I will add the v2 next time. I did not add the v2 in case this email list 
does not put the two patches under same thread...


-Alex Bin


From: Christian König <deathsim...@vodafone.de>
Sent: Thursday, June 1, 2017 9:26 AM
To: Xie, AlexBin; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Move compute vm bug logic to amdgpu_vm.c

Am 01.06.2017 um 01:08 schrieb Alex Xie:
>In review, Christian would like to keep the logic
>inside amdgpu_vm.c with a cost of slightly slower.
>The loop is still optimized out with this patch.
>
> v2: remove the if statement. Now it is not slower.
>
> Signed-off-by: Alex Xie <alexbin@amd.com>

When you create a v2 of a patch please note that in the subject line as
well.

Either by using the "-v2" option with "git send-email" or by adding the
v2 manually on the subject line.

I sometimes get mails twice because they are send to multiple lists I'm
registered on and that helps allot seeing that a patch changed and is
not just a duplicated mail.

Anyway this version is Reviewed-by: Christian König
<christian.ko...@amd.com> as well.

Regards,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   | 32 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  5 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 38 
> --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  1 +
>   5 files changed, 39 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 89bc34a..2f1a4e9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2227,6 +2227,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>
>adev->accel_working = true;
>
> + amdgpu_vm_check_compute_bug(adev);
> +
>/* Initialize the buffer migration limit. */
>if (amdgpu_moverate >= 0)
>max_MBps = amdgpu_moverate;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 7d95435..31aa51d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -153,36 +153,6 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
>   }
>
>   /**
> - * amdgpu_ring_check_compute_vm_bug - check whether this ring has compute vm 
> bug
> - *
> - * @adev: amdgpu_device pointer
> - * @ring: amdgpu_ring structure holding ring information
> - */
> -static void amdgpu_ring_check_compute_vm_bug(struct amdgpu_device *adev,
> - struct amdgpu_ring *ring)
> -{
> - const struct amdgpu_ip_block *ip_block;
> -
> - ring->has_compute_vm_bug = false;
> -
> - if (ring->funcs->type != AMDGPU_RING_TYPE_COMPUTE)
> - /* only compute rings */
> - return;
> -
> - ip_block = amdgpu_get_ip_block(adev, AMD_IP_BLOCK_TYPE_GFX);
> - if (!ip_block)
> - return;
> -
> - /* Compute ring has a VM bug for GFX version < 7.
> -   And compute ring has a VM bug for GFX 8 MEC firmware version < 
> 673.*/
> - if (ip_block->version->major <= 7) {
> - ring->has_compute_vm_bug = true;
> - } else if (ip_block->version->major == 8)
> - if (adev->gfx.mec_fw_version < 673)
> - ring->has_compute_vm_bug = true;
> -}
> -
> -/**
>* amdgpu_ring_init - init driver ring struct.
>*
>* @adev: amdgpu_device pointer
> @@ -288,8 +258,6 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
> amdgpu_ring *ring,
>DRM_ERROR("Failed to register debugfs file for rings !\n");
>}
>
> - amdgpu_ring_check_compute_vm_bug(adev, ring);
> -
>return 0;
>   }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 334307e..ad399c8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -208,9 +208,4 @@ static inline void amdgpu_ring_clear_ring(struct 
> amdgpu_ring *ring)
>
>   }
>
> -static inline bool amdgpu_ring_has_compute_vm_bug(struct amdgpu_ring *ring)
> -{
> - return ring->has_compute_vm_bug;
> -}
> -
>   #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index d4d05a8..6e32748 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdg

Re: [PATCH] drm/amdgpu: Move compute vm bug logic to amdgpu_vm.c

2017-06-01 Thread Xie, AlexBin
Hi Christian,


Please review this v2 patch. It is slightly faster.


Thanks,

Alex Bin


From: Xie, AlexBin
Sent: Wednesday, May 31, 2017 7:08 PM
To: amd-gfx@lists.freedesktop.org
Cc: Xie, AlexBin
Subject: [PATCH] drm/amdgpu: Move compute vm bug logic to amdgpu_vm.c

  In review, Christian would like to keep the logic
  inside amdgpu_vm.c with a cost of slightly slower.
  The loop is still optimized out with this patch.

v2: remove the if statement. Now it is not slower.

Signed-off-by: Alex Xie <alexbin@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   | 32 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  5 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 38 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  1 +
 5 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 89bc34a..2f1a4e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2227,6 +2227,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,

 adev->accel_working = true;

+   amdgpu_vm_check_compute_bug(adev);
+
 /* Initialize the buffer migration limit. */
 if (amdgpu_moverate >= 0)
 max_MBps = amdgpu_moverate;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 7d95435..31aa51d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -153,36 +153,6 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
 }

 /**
- * amdgpu_ring_check_compute_vm_bug - check whether this ring has compute vm 
bug
- *
- * @adev: amdgpu_device pointer
- * @ring: amdgpu_ring structure holding ring information
- */
-static void amdgpu_ring_check_compute_vm_bug(struct amdgpu_device *adev,
-   struct amdgpu_ring *ring)
-{
-   const struct amdgpu_ip_block *ip_block;
-
-   ring->has_compute_vm_bug = false;
-
-   if (ring->funcs->type != AMDGPU_RING_TYPE_COMPUTE)
-   /* only compute rings */
-   return;
-
-   ip_block = amdgpu_get_ip_block(adev, AMD_IP_BLOCK_TYPE_GFX);
-   if (!ip_block)
-   return;
-
-   /* Compute ring has a VM bug for GFX version < 7.
-   And compute ring has a VM bug for GFX 8 MEC firmware version < 
673.*/
-   if (ip_block->version->major <= 7) {
-   ring->has_compute_vm_bug = true;
-   } else if (ip_block->version->major == 8)
-   if (adev->gfx.mec_fw_version < 673)
-   ring->has_compute_vm_bug = true;
-}
-
-/**
  * amdgpu_ring_init - init driver ring struct.
  *
  * @adev: amdgpu_device pointer
@@ -288,8 +258,6 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
 DRM_ERROR("Failed to register debugfs file for rings !\n");
 }

-   amdgpu_ring_check_compute_vm_bug(adev, ring);
-
 return 0;
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 334307e..ad399c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -208,9 +208,4 @@ static inline void amdgpu_ring_clear_ring(struct 
amdgpu_ring *ring)

 }

-static inline bool amdgpu_ring_has_compute_vm_bug(struct amdgpu_ring *ring)
-{
-   return ring->has_compute_vm_bug;
-}
-
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index d4d05a8..6e32748 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -656,6 +656,41 @@ static int amdgpu_vm_alloc_reserved_vmid(struct 
amdgpu_device *adev,
 return r;
 }

+/**
+ * amdgpu_vm_check_compute_bug - check whether asic has compute vm bug
+ *
+ * @adev: amdgpu_device pointer
+ */
+void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev)
+{
+   const struct amdgpu_ip_block *ip_block;
+   bool has_compute_vm_bug;
+   struct amdgpu_ring *ring;
+   int i;
+
+   has_compute_vm_bug = false;
+
+   ip_block = amdgpu_get_ip_block(adev, AMD_IP_BLOCK_TYPE_GFX);
+   if (ip_block) {
+   /* Compute has a VM bug for GFX version < 7.
+  Compute has a VM bug for GFX 8 MEC firmware version < 673.*/
+   if (ip_block->version->major <= 7)
+   has_compute_vm_bug = true;
+   else if (ip_block->version->major == 8)
+   if (adev->gfx.mec_fw_version < 673)
+   has_compute_vm_bug = true;
+   }
+
+   for (i = 0; i < adev->num_rings; i++) {
+   

Re: [PATCH 1/3] drm/amdgpu: Optimize a function called by every IB sheduling

2017-05-31 Thread Xie, AlexBin
Hi Christian,


I have asked for code review of a new patch.


It turns out that the index operation is not needed. But there is one extra if 
statement in every IB submission, which will be a slight performance hit.


Function amdgpu_vm_check_compute_bug is called in amdgpu_device_init.

Function amdgpu_vm_check_compute_bug cannot be called in 
amdgpu_vm_manager_init, in which GFX is not initialized yet. MEC firmware 
version information is not available yet.


Thanks,

Alex Bin Xie


From: Christian König <deathsim...@vodafone.de>
Sent: Wednesday, May 31, 2017 9:07 AM
To: Xie, AlexBin; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] drm/amdgpu: Optimize a function called by every IB 
sheduling


I don't have strong opinion about where the code should be. But if we put this 
code in VM, there will be one extra array index operation because the VM bug is 
ring related. VM manager need to maintain an array to manage this information.

Yeah, but array index suck when you access cache cold stuff as well.

My concern is not so much where to put the field, but rather where to put the 
code to detect this condition. That is a bug very deeply related to how the CP 
manages VMs and VMIDs and actually not specific to the ring.

Just send out a patch which uses the ring type again to check if that 
workaround applies or not. The heavy stuff was calling amdgpu_get_ip_block() on 
every command submission, one additional if shouldn't hurt us here.

Regards,
Christian.

Am 31.05.2017 um 14:53 schrieb Xie, AlexBin:

HI Christian,


Too late. The code has been committed.


I don't have strong opinion about where the code should be. But if we put this 
code in VM, there will be one extra array index operation because the VM bug is 
ring related. VM manager need to maintain an array to manage this information.


In the amdgpu_ring structure, there is already information like vm_inv_eng and 
vmhub. Those are VM related information too. So this one extra information is 
not new.


Thanks,

Alex Bin


From: Christian König <deathsim...@vodafone.de><mailto:deathsim...@vodafone.de>
Sent: Wednesday, May 31, 2017 2:57 AM
To: Xie, AlexBin; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 1/3] drm/amdgpu: Optimize a function called by every IB 
sheduling

Am 30.05.2017 um 23:47 schrieb Alex Xie:
>Move several if statements and a loop statment from
>run time to initialization time.

Yeah, that's exactly what I've suggested before as well.

Just keep the code inside amdgpu_vm.c (and the variable inside
amdgpu_vm_manager), since this isn't related to ring management at all.

Regards,
Christian.

>
> Signed-off-by: Alex Xie <alexbin@amd.com><mailto:alexbin@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 33 
> 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  6 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 28 +--
>   3 files changed, 40 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 6a85db0..7d95435 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -153,6 +153,36 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
>   }
>
>   /**
> + * amdgpu_ring_check_compute_vm_bug - check whether this ring has compute vm 
> bug
> + *
> + * @adev: amdgpu_device pointer
> + * @ring: amdgpu_ring structure holding ring information
> + */
> +static void amdgpu_ring_check_compute_vm_bug(struct amdgpu_device *adev,
> + struct amdgpu_ring *ring)
> +{
> + const struct amdgpu_ip_block *ip_block;
> +
> + ring->has_compute_vm_bug = false;
> +
> + if (ring->funcs->type != AMDGPU_RING_TYPE_COMPUTE)
> + /* only compute rings */
> + return;
> +
> + ip_block = amdgpu_get_ip_block(adev, AMD_IP_BLOCK_TYPE_GFX);
> + if (!ip_block)
> + return;
> +
> + /* Compute ring has a VM bug for GFX version < 7.
> +   And compute ring has a VM bug for GFX 8 MEC firmware version < 
> 673.*/
> + if (ip_block->version->major <= 7) {
> + ring->has_compute_vm_bug = true;
> + } else if (ip_block->version->major == 8)
> + if (adev->gfx.mec_fw_version < 673)
> + ring->has_compute_vm_bug = true;
> +}
> +
> +/**
>* amdgpu_ring_init - init driver ring struct.
>*
>* @adev: amdgpu_device pointer
> @@ -257,6 +287,9 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
> amdgpu_ring *ring,
>if (amdgpu_debugfs_ring_

Re: [PATCH 1/3] drm/amdgpu: Optimize a function called by every IB sheduling

2017-05-31 Thread Xie, AlexBin
HI Christian,


Too late. The code has been committed.


I don't have strong opinion about where the code should be. But if we put this 
code in VM, there will be one extra array index operation because the VM bug is 
ring related. VM manager need to maintain an array to manage this information.


In the amdgpu_ring structure, there is already information like vm_inv_eng and 
vmhub. Those are VM related information too. So this one extra information is 
not new.


Thanks,

Alex Bin


From: Christian König <deathsim...@vodafone.de>
Sent: Wednesday, May 31, 2017 2:57 AM
To: Xie, AlexBin; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] drm/amdgpu: Optimize a function called by every IB 
sheduling

Am 30.05.2017 um 23:47 schrieb Alex Xie:
>Move several if statements and a loop statment from
>run time to initialization time.

Yeah, that's exactly what I've suggested before as well.

Just keep the code inside amdgpu_vm.c (and the variable inside
amdgpu_vm_manager), since this isn't related to ring management at all.

Regards,
Christian.

>
> Signed-off-by: Alex Xie <alexbin@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 33 
> 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  6 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 28 +--
>   3 files changed, 40 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 6a85db0..7d95435 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -153,6 +153,36 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
>   }
>
>   /**
> + * amdgpu_ring_check_compute_vm_bug - check whether this ring has compute vm 
> bug
> + *
> + * @adev: amdgpu_device pointer
> + * @ring: amdgpu_ring structure holding ring information
> + */
> +static void amdgpu_ring_check_compute_vm_bug(struct amdgpu_device *adev,
> + struct amdgpu_ring *ring)
> +{
> + const struct amdgpu_ip_block *ip_block;
> +
> + ring->has_compute_vm_bug = false;
> +
> + if (ring->funcs->type != AMDGPU_RING_TYPE_COMPUTE)
> + /* only compute rings */
> + return;
> +
> + ip_block = amdgpu_get_ip_block(adev, AMD_IP_BLOCK_TYPE_GFX);
> + if (!ip_block)
> + return;
> +
> + /* Compute ring has a VM bug for GFX version < 7.
> +   And compute ring has a VM bug for GFX 8 MEC firmware version < 
> 673.*/
> + if (ip_block->version->major <= 7) {
> + ring->has_compute_vm_bug = true;
> + } else if (ip_block->version->major == 8)
> + if (adev->gfx.mec_fw_version < 673)
> + ring->has_compute_vm_bug = true;
> +}
> +
> +/**
>* amdgpu_ring_init - init driver ring struct.
>*
>* @adev: amdgpu_device pointer
> @@ -257,6 +287,9 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
> amdgpu_ring *ring,
>if (amdgpu_debugfs_ring_init(adev, ring)) {
>DRM_ERROR("Failed to register debugfs file for rings !\n");
>}
> +
> + amdgpu_ring_check_compute_vm_bug(adev, ring);
> +
>return 0;
>   }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index a9223a8..334307e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -185,6 +185,7 @@ struct amdgpu_ring {
>u64 cond_exe_gpu_addr;
>volatile u32*cond_exe_cpu_addr;
>unsignedvm_inv_eng;
> + boolhas_compute_vm_bug;
>   #if defined(CONFIG_DEBUG_FS)
>struct dentry *ent;
>   #endif
> @@ -207,4 +208,9 @@ static inline void amdgpu_ring_clear_ring(struct 
> amdgpu_ring *ring)
>
>   }
>
> +static inline bool amdgpu_ring_has_compute_vm_bug(struct amdgpu_ring *ring)
> +{
> + return ring->has_compute_vm_bug;
> +}
> +
>   #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index b2384b8..7a323f9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -656,32 +656,6 @@ static int amdgpu_vm_alloc_reserved_vmid(struct 
> amdgpu_device *adev,
>return r;
>   }
>
> -static bool amdgpu_vm_ring_has_compute_vm_bug(struct amdgpu_ring *ring)
> -{
> - struct amdgpu_device *adev = ring->adev;
> - const struct amdgpu_ip_block *ip_block;
> -
> -  

Re: [PATCH libdrm] amdgpu: add missing extern "C" headers

2017-05-14 Thread Xie, AlexBin
Hi Michel,


This patch was from Nicolai.


Though there are two amdgpu.h, one header is in kernel and the other header is 
in libdrm. But their content are irrelevant. They are not synchronized.


I think include/drm/README is not applicable to this case.


Thanks,

Alex Bin Xie


From: Michel Dänzer <mic...@daenzer.net>
Sent: Sunday, May 14, 2017 10:09 PM
To: Xie, AlexBin; Haehnle, Nicolai
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH libdrm] amdgpu: add missing extern "C" headers

On 15/05/17 10:10 AM, Xie, AlexBin wrote:
>  amdgpu/amdgpu.h | 8 
>  1 file changed, 8 insertions(+)

This header is synchronized with the kernel, see include/drm/README.


BTW, don't send patches for review as HTML. Ideally use git send-email.


--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add missing extern "C" headers

2017-05-14 Thread Xie, AlexBin
Hi Michel,


It is fine. There is a glitch in my email client. The same email was sent 
twice, probably because I pressed a special key.


I am new to open source review method. Perhaps I did not make thing clear 
enough.



-Alex Bin Xie


From: Michel Dänzer <mic...@daenzer.net>
Sent: Sunday, May 14, 2017 10:11 PM
To: Xie, AlexBin; Haehnle, Nicolai
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH libdrm] amdgpu: add missing extern "C" headers

On 15/05/17 11:09 AM, Michel Dänzer wrote:
>
> BTW, don't send patches for review as HTML. Ideally use git send-email.

Sorry Alex, I was confused and thought you sent the patch.


--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add missing extern "C" headers

2017-05-14 Thread Xie, AlexBin
 amdgpu/amdgpu.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index fdea905..1901fa8 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -30,20 +30,24 @@
  * User wanted to use libdrm_amdgpu functionality must include
  * this file.
  *
  */
 #ifndef _AMDGPU_H_
 #define _AMDGPU_H_

 #include 
 #include 

+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct drm_amdgpu_info_hw_ip;

 /*--*/
 /* --- Defines  */
 /*--*/

 /**
  * Define max. number of Command Buffers (IB) which could be sent to the single
  * hardware IP to accommodate CE/DE requirements
  *
@@ -1317,11 +1321,15 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle 
sem);
 /**
  *  Get the ASIC marketing name
  *
  * \param   dev - \c [in] Device handle. See 
#amdgpu_device_initialize()
  *
  * \return  the constant string of the marketing name
  *  "NULL" means the ASIC is not found
 */
 const char *amdgpu_get_marketing_name(amdgpu_device_handle dev);

+#ifdef __cplusplus
+}
+#endif
+
 #endif /* #ifdef _AMDGPU_H_ */
--
2.9.3


Reviewed by: Alex Xie <alexbin@amd.com>


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add missing extern "C" headers

2017-05-14 Thread Xie, AlexBin
 amdgpu/amdgpu.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index fdea905..1901fa8 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -30,20 +30,24 @@
  * User wanted to use libdrm_amdgpu functionality must include
  * this file.
  *
  */
 #ifndef _AMDGPU_H_
 #define _AMDGPU_H_

 #include 
 #include 

+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct drm_amdgpu_info_hw_ip;

 /*--*/
 /* --- Defines  */
 /*--*/

 /**
  * Define max. number of Command Buffers (IB) which could be sent to the single
  * hardware IP to accommodate CE/DE requirements
  *
@@ -1317,11 +1321,15 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle 
sem);
 /**
  *  Get the ASIC marketing name
  *
  * \param   dev - \c [in] Device handle. See 
#amdgpu_device_initialize()
  *
  * \return  the constant string of the marketing name
  *  "NULL" means the ASIC is not found
 */
 const char *amdgpu_get_marketing_name(amdgpu_device_handle dev);

+#ifdef __cplusplus
+}
+#endif
+
 #endif /* #ifdef _AMDGPU_H_ */
--
2.9.3


Reviewed by: Alex Xie <alexbin@amd.com>


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Make amdgpu_bo_reserve use uninterruptible waits for cleanup

2017-05-02 Thread Xie, AlexBin
I just realised that there are some stress test that sends huge amount of 
signals. I used those tests in the pass 2 years...


-Alex Bin


From: Christian König <deathsim...@vodafone.de>
Sent: Monday, May 1, 2017 10:28 AM
To: Xie, AlexBin; Michel Dänzer
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Make amdgpu_bo_reserve use uninterruptible 
waits for cleanup

Am 01.05.2017 um 02:13 schrieb Xie, AlexBin:

On 28/04/17 11:12 PM, Xie, AlexBin wrote:

>> Am 28.04.2017 um 10:47 schrieb Michel Dänzer:
>>> From: Michel Dänzer <michel.daen...@amd.com><mailto:michel.daen...@amd.com>
>>>
>>> Some of these paths probably cannot be interrupted by a signal anyway.
>>> Those that can would fail to clean up things if they actually got
>>> interrupted.
>>>
>>> Signed-off-by: Michel Dänzer 
>>> <michel.daen...@amd.com><mailto:michel.daen...@amd.com>
>>
>> Reviewed-by: Christian König 
>> <christian.koe...@amd.com><mailto:christian.koe...@amd.com>
>
> Alex X: Just a reminder: amdgpu_unpin_work_func is called by work queue.
> Signal is blocked already. un-interruptible waiting might slow thing
> down very slightly.

How so?

Alex X: I said "might". If you think it is not slower, it is fine for me.
My real concern is that the signals are blocked for work queue already. We 
don't need this change to avoid unnecessary risk.
In theory, I am not against this change.

I have to agree with that. For waits from work queues it is usually best to 
implicitly use un-interruptible waits for documentation purposes.

In other words even when we are sure that the code can't receive a signal we 
should write it conservatively.

Christian.


--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Make amdgpu_bo_reserve use uninterruptible waits for cleanup

2017-04-30 Thread Xie, AlexBin
On 28/04/17 11:12 PM, Xie, AlexBin wrote:

>> Am 28.04.2017 um 10:47 schrieb Michel Dänzer:
>>> From: Michel Dänzer <michel.daen...@amd.com>
>>>
>>> Some of these paths probably cannot be interrupted by a signal anyway.
>>> Those that can would fail to clean up things if they actually got
>>> interrupted.
>>>
>>> Signed-off-by: Michel Dänzer <michel.daen...@amd.com>
>>
>> Reviewed-by: Christian König <christian.koe...@amd.com>
>
> Alex X: Just a reminder: amdgpu_unpin_work_func is called by work queue.
> Signal is blocked already. un-interruptible waiting might slow thing
> down very slightly.

How so?

Alex X: I said "might". If you think it is not slower, it is fine for me.
My real concern is that the signals are blocked for work queue already. We 
don't need this change to avoid unnecessary risk.
In theory, I am not against this change.

--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Make amdgpu_bo_reserve use uninterruptible waits for cleanup

2017-04-28 Thread Xie, AlexBin
Am 28.04.2017 um 10:47 schrieb Michel Dänzer:
> From: Michel Dänzer 
>
> Some of these paths probably cannot be interrupted by a signal anyway.
> Those that can would fail to clean up things if they actually got
> interrupted.
>
> Signed-off-by: Michel Dänzer 

Reviewed-by: Christian König 

Alex X: Just a reminder: amdgpu_unpin_work_func is called by work queue.
Signal is blocked already. un-interruptible waiting might slow thing down very 
slightly.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_display.c |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c  |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c|  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c   |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c  |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  2 +-
>   drivers/gpu/drm/amd/amdgpu/dce_v10_0.c  |  6 +++---
>   drivers/gpu/drm/amd/amdgpu/dce_v11_0.c  |  6 +++---
>   drivers/gpu/drm/amd/amdgpu/dce_v6_0.c   |  6 +++---
>   drivers/gpu/drm/amd/amdgpu/dce_v8_0.c   |  6 +++---
>   drivers/gpu/drm/amd/amdgpu/dce_virtual.c|  2 +-
>   drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c   |  6 +++---
>   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 10 +-
>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   |  8 
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   |  8 
>   16 files changed, 36 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index 771a6aae58d6..af64448a565c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -123,7 +123,7 @@ static void amdgpu_unpin_work_func(struct work_struct 
> *__work)
>int r;
>
>/* unpin of the old buffer */
> - r = amdgpu_bo_reserve(work->old_abo, false);
> + r = amdgpu_bo_reserve(work->old_abo, true);
>if (likely(r == 0)) {
>r = amdgpu_bo_unpin(work->old_abo);
>if (unlikely(r != 0)) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> index 9dea2f661f1d..e869e60994ce 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> @@ -113,7 +113,7 @@ static void amdgpufb_destroy_pinned_object(struct 
> drm_gem_object *gobj)
>struct amdgpu_bo *abo = gem_to_amdgpu_bo(gobj);
>int ret;
>
> - ret = amdgpu_bo_reserve(abo, false);
> + ret = amdgpu_bo_reserve(abo, true);
>if (likely(ret == 0)) {
>amdgpu_bo_kunmap(abo);
>amdgpu_bo_unpin(abo);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index 6d691abe889c..e7406ce7093c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -183,7 +183,7 @@ void amdgpu_gart_table_vram_unpin(struct amdgpu_device 
> *adev)
>if (adev->gart.robj == NULL) {
>return;
>}
> - r = amdgpu_bo_reserve(adev->gart.robj, false);
> + r = amdgpu_bo_reserve(adev->gart.robj, true);
>if (likely(r == 0)) {
>amdgpu_bo_kunmap(adev->gart.robj);
>amdgpu_bo_unpin(adev->gart.robj);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index ec5b1bc0e428..d40b8ac745cd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -819,7 +819,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>
>if (amdgpu_sriov_vf(adev)) {
>/* TODO: how to handle reserve failure */
> - BUG_ON(amdgpu_bo_reserve(adev->virt.csa_obj, false));
> + BUG_ON(amdgpu_bo_reserve(adev->virt.csa_obj, true));
>amdgpu_vm_bo_rmv(adev, fpriv->vm.csa_bo_va);
>fpriv->vm.csa_bo_va = NULL;
>amdgpu_bo_unreserve(adev->virt.csa_obj);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> index 3826d5aea0a6..6bdc866570ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> @@ -113,7 +113,7 @@ void amdgpu_gem_prime_unpin(struct drm_gem_object *obj)
>struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>int ret = 0;
>
> - ret = amdgpu_bo_reserve(bo, false);
> + ret = amdgpu_bo_reserve(bo, true);
>if (unlikely(ret != 0))
>return;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> index 7b56d9988aba..de4ebcf4ac2c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> @@ -130,7 +130,7 @@ int amdgpu_sa_bo_manager_suspend(struct amdgpu_device 

Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO

2017-04-20 Thread Xie, AlexBin
Hi Christian,


I read amdgpu_ttm_io_mem_reserve() and amdgpu_ttm_io_mem_free() and relevant 
codes from amdgpu_vram_scratch_init and amdgpu_vram_scratch_fini.


No. For the current source code, I think the premap and no-op is not working.


I add printk to prove. You can see bo_kmap_type is an invalid value. ioremap_wc 
is really called to map the IO range into vmalloc space.


...

Apr 20 16:31:18 axie-System-Product-Name kernel: [  106.759623] entering 
amdgpu_vram_scratch_init.
Apr 20 16:31:18 axie-System-Product-Name kernel: [  106.759631] scratch 
ioremap_wc
Apr 20 16:31:18 axie-System-Product-Name kernel: [  106.759631] bo_kmap_type = 
129
Apr 20 16:31:18 axie-System-Product-Name kernel: [  106.759632] bus address =   
(null)
Apr 20 16:31:18 axie-System-Product-Name kernel: [  106.759632] is_iomem = 1
Apr 20 16:31:18 axie-System-Product-Name kernel: [  106.759633] leaving 
amdgpu_vram_scratch_init.
...

I don't have log on rmmod AMDGPU yet. There are quite some settings to make 
that happen in my computer.
I think they are symemtric. Both mapping and unmapping are not no-op.

Here is the printk patch for your reference.

From 25f95239c23225008e4b59f30b9b5363fb321f94 Mon Sep 17 00:00:00 2001
From: Alex Xie <alexbin@amd.com>
Date: Thu, 20 Apr 2017 17:48:49 -0400
Subject: [PATCH] A hack to trace premap and noop.

Change-Id: I61fbbdbd82f62433e229b2e7e97be7a780ea8afa
Signed-off-by: Alex Xie <alexbin@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 ++
 drivers/gpu/drm/ttm/ttm_bo.c   |  1 +
 drivers/gpu/drm/ttm/ttm_bo_util.c  | 29 ++---
 include/drm/ttm/ttm_bo_api.h   |  1 +
 4 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fbb4afb..a537ea1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -313,6 +313,7 @@ static void amdgpu_block_invalid_wreg(struct amdgpu_device 
*adev,
 static int amdgpu_vram_scratch_init(struct amdgpu_device *adev)
 {
  int r;
+ printk("entering amdgpu_vram_scratch_init.");

  if (adev->vram_scratch.robj == NULL) {
  r = amdgpu_bo_create(adev, AMDGPU_GPU_PAGE_SIZE,
@@ -340,16 +341,36 @@ static int amdgpu_vram_scratch_init(struct amdgpu_device 
*adev)
  amdgpu_bo_unpin(adev->vram_scratch.robj);
  amdgpu_bo_unreserve(adev->vram_scratch.robj);

+ /* Would like a printk to see if map / unmap is noop*/
+ adev->vram_scratch.robj->tbo.mem.bus.printk = true;
+
+ if (adev->vram_scratch.robj->kmap.bo_kmap_type == ttm_bo_map_premapped)
+ printk("amdgpu_vram_scratch premapped!");
+
+ printk("bo_kmap_type = %d", adev->vram_scratch.robj->kmap.bo_kmap_type);
+ printk("bus address = %p", adev->vram_scratch.robj->tbo.mem.bus.addr);
+ printk("is_iomem = %d", adev->vram_scratch.robj->tbo.mem.bus.is_iomem);
+ printk("leaving amdgpu_vram_scratch_init.");
+
  return r;
 }

 static void amdgpu_vram_scratch_fini(struct amdgpu_device *adev)
 {
  int r;
+ printk("entering amdgpu_vram_scratch_fini.");

  if (adev->vram_scratch.robj == NULL) {
  return;
  }
+
+ if (adev->vram_scratch.robj->kmap.bo_kmap_type == ttm_bo_map_premapped)
+ printk("amdgpu_vram_scratch premapped!");
+
+ printk("bo_kmap_type = %d", adev->vram_scratch.robj->kmap.bo_kmap_type);
+ printk("bus address = %p", adev->vram_scratch.robj->tbo.mem.bus.addr);
+ printk("is_iomem = %d", adev->vram_scratch.robj->tbo.mem.bus.is_iomem);
+
  r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
  if (likely(r == 0)) {
  amdgpu_bo_kunmap(adev->vram_scratch.robj);
@@ -357,6 +378,7 @@ static void amdgpu_vram_scratch_fini(struct amdgpu_device 
*adev)
  amdgpu_bo_unreserve(adev->vram_scratch.robj);
  }
  amdgpu_bo_unref(>vram_scratch.robj);
+ printk("leaving amdgpu_vram_scratch_fini.");
 }

 /**
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 989b98b..9b05d54 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1178,6 +1178,7 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev,
  bo->mem.page_alignment = page_alignment;
  bo->mem.bus.io_reserved_vm = false;
  bo->mem.bus.io_reserved_count = 0;
+ bo->mem.bus.printk = false;
  bo->moving = NULL;
  bo->mem.placement = (TTM_PL_FLAG_SYSTEM | TTM_PL_FLAG_CACHED);
  bo->persistent_swap_storage = persistent_swap_storage;
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index bf6e216..9d06952 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -526,14 +526,24 @@ static int ttm_bo_ioremap(struct ttm_buffer_object *bo,
  if (bo->mem.bus.addr) {
  map->bo_kmap_type = ttm_bo_map_premapp

Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO

2017-04-19 Thread Xie, AlexBin
Hi Christian,


Without correctly kunmap, page table is corrupted. Page entries point to wrong 
memory locations. You might call it completely harmless. But I think this is a 
severe bug. Leaking memory is better than a corrupted page table. Think 
security.


Would you provide any document and reference by saying" It is impossible to 
receive a signal during module load/unload"? For example, if the unload stuck 
in a lock, can CTRL+C stop the unload?


If "It is impossible to receive a signal during module load/unload", 
interruptible waiting is fine too, because function amdgpu_bo_reserve will 
return successfully.


What about there is some other return error? What about in future somebody 
improve amdgpu_bo_reserve to return other errors, then function 
amdgpu_vram_scratch_fini becomes buggy?


While I am thinking whether there is a better way for the current situation, 
would you give a real world example that my patch really not working? Then we 
can address it.


Thanks,

Alex Bin


From: Christian König <deathsim...@vodafone.de>
Sent: Wednesday, April 19, 2017 2:35 AM
To: Xie, AlexBin; Zhou, David(ChunMing); amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO

Hi AlexBin,

the answer is ttm_bo_kunmap isn't called at all and yes in the case of an iomap 
we leak the address space reserved.

But that is completely harmless on a 64bit system compared to leaking the 
memory backing the address space.

Using amdgpu_bo_free_kernel() instead of openly coding it here is probably a 
good idea.

Additional to that it's probably a good idea to set the no_intr flag when 
reserving kernel BOs. It is impossible to receive a signal during module 
load/unload, but it's probably better to document that in the code as well.

Regards,
Christian.

Am 18.04.2017 um 20:54 schrieb Xie, AlexBin:
Hi Christian,

Have you found how/where/when? When you said "mapping will just be released a 
bit later on", you must know the answer.

It is difficult to prove something does not exist. Anyway, I will give it a try 
to prove such "later on" does not exist.

Function ttm_bo_kunmap is the only function to unmap. To prove this, search 
ttm_bo_map_iomap, only ttm_bo_kunmap use this enum to correctly kunmap.

Function ttm_bo_kunmap is not called by ttm itself. This is a hint that all TTM 
delay delete mechanism or unref mechanism will NOT kunmap BO later on.

Function ttm_bo_kunmap is called by AMDGPU function amdgpu_bo_kunmap and 
amdgpu_gem_prime_vunmap.

Search AMDGPU for amdgpu_bo_kunmap. All matches do not kunmap for scratch VRAM 
BO.  amdgpu_bo_free_kernel is a suspect but the answer is still NO.

So all possibilities are searched. Did I miss anything?

Thanks,
Alex Bin Xie

____
From: Xie, AlexBin
Sent: Tuesday, April 18, 2017 2:04:33 PM
To: Christian König; Zhou, David(ChunMing); 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO


Hi Christian,


Would you point out where/when will kunmap happen for this BO when release? It 
must be somewhere in some function calls.


I checked before I asked for review. But I did not see such obvious kunmap 
function call.


If so, there should be a comment in function amdgpu_vram_scratch_fini to avoid 
future confusion.

Thanks,
Alex Bin Xie

From: Christian König <deathsim...@vodafone.de><mailto:deathsim...@vodafone.de>
Sent: Tuesday, April 18, 2017 1:46 PM
To: Xie, AlexBin; Zhou, David(ChunMing); 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO

Hi AlexBin,

No, David is right. This is a very common coding pattern in the kernel module.

Freeing up a BO while there still exists a kernel mapping is perfectly ok, the 
mapping will just be released a bit later on.

So this code is actually perfectly ok and just an optimization, but your patch 
breaks it and creates a memory leak.

Regards,
Christian.

Am 18.04.2017 um 17:17 schrieb Xie, AlexBin:

Hi David,


When amdgpu_bo_reserve return errors, we cannot release the BO. This is not a 
memory leak. General speaking, memory leak is unnoticed and unintentional.


The caller of function amdgpu_vram_scratch_fini ignores the return error 
value...


The "memory leak" is not caused by my patch. It is caused because reserving BO 
fails.


This patch only aim to make function amdgpu_vram_scratch_fini behave correctly.


To follow up, we can add a warning message when amdgpu_bo_reserve error happens 
in a different patch.


If function call amdgpu_bo_reserve is changed to uninterruptible, this changes 
driver behaviour. Without a substantial issue, I would be cautious for such a 
change.


Thanks,

Alex Bin Xie

________
From: Zhou, David(ChunMing)
Sent: Monday, April 1

Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO

2017-04-18 Thread Xie, AlexBin
Hi Christian,

Have you found how/where/when? When you said "mapping will just be released a 
bit later on", you must know the answer.

It is difficult to prove something does not exist. Anyway, I will give it a try 
to prove such "later on" does not exist.

Function ttm_bo_kunmap is the only function to unmap. To prove this, search 
ttm_bo_map_iomap, only ttm_bo_kunmap use this enum to correctly kunmap.

Function ttm_bo_kunmap is not called by ttm itself. This is a hint that all TTM 
delay delete mechanism or unref mechanism will NOT kunmap BO later on.

Function ttm_bo_kunmap is called by AMDGPU function amdgpu_bo_kunmap and 
amdgpu_gem_prime_vunmap.

Search AMDGPU for amdgpu_bo_kunmap. All matches do not kunmap for scratch VRAM 
BO.  amdgpu_bo_free_kernel is a suspect but the answer is still NO.

So all possibilities are searched. Did I miss anything?

Thanks,
Alex Bin Xie

____________
From: Xie, AlexBin
Sent: Tuesday, April 18, 2017 2:04:33 PM
To: Christian König; Zhou, David(ChunMing); amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO


Hi Christian,


Would you point out where/when will kunmap happen for this BO when release? It 
must be somewhere in some function calls.


I checked before I asked for review. But I did not see such obvious kunmap 
function call.


If so, there should be a comment in function amdgpu_vram_scratch_fini to avoid 
future confusion.

Thanks,
Alex Bin Xie

From: Christian König <deathsim...@vodafone.de>
Sent: Tuesday, April 18, 2017 1:46 PM
To: Xie, AlexBin; Zhou, David(ChunMing); amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO

Hi AlexBin,

No, David is right. This is a very common coding pattern in the kernel module.

Freeing up a BO while there still exists a kernel mapping is perfectly ok, the 
mapping will just be released a bit later on.

So this code is actually perfectly ok and just an optimization, but your patch 
breaks it and creates a memory leak.

Regards,
Christian.

Am 18.04.2017 um 17:17 schrieb Xie, AlexBin:

Hi David,


When amdgpu_bo_reserve return errors, we cannot release the BO. This is not a 
memory leak. General speaking, memory leak is unnoticed and unintentional.


The caller of function amdgpu_vram_scratch_fini ignores the return error 
value...


The "memory leak" is not caused by my patch. It is caused because reserving BO 
fails.


This patch only aim to make function amdgpu_vram_scratch_fini behave correctly.


To follow up, we can add a warning message when amdgpu_bo_reserve error happens 
in a different patch.


If function call amdgpu_bo_reserve is changed to uninterruptible, this changes 
driver behaviour. Without a substantial issue, I would be cautious for such a 
change.


Thanks,

Alex Bin Xie


From: Zhou, David(ChunMing)
Sent: Monday, April 17, 2017 10:38 PM
To: Xie, AlexBin; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO



On 2017年04月17日 22:54, Xie, AlexBin wrote:

Hi David,


Thanks for the comments. However, please have look at amdgpu_bo_reserve 
definition.

static inline int amdgpu_bo_reserve(struct amdgpu_bo *bo, bool no_intr)

Ah, this is a wired wrapper for ttm_bo_reserve.



When we call this function like the following:

 r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
The false means interruptible.


On the other hand,  when amdgpu_bo_reserve function return error, why do we 
unref BO without kunmap and unpin the BO? This is wrong implementation when 
amdgpu_bo_reserve return any error.

Yeah, I see your mean, it's in driver un-loading, How about changing to no 
interruptible? Your patch will make a memleak if bo_reserve fails, but it seems 
not matter. I have no strong preference.

Regards,
David Zhou


Thanks,
Alex Bin Xie


From: Zhou, David(ChunMing)
Sent: Friday, April 14, 2017 12:00 AM
To: Xie, AlexBin; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO



On 2017年04月14日 05:34, Alex Xie wrote:
> According to comment of amdgpu_bo_reserve, amdgpu_bo_reserve
> can return with -ERESTARTSYS. When this function was interrupted
> by a signal, BO should not be unref. Otherwise the BO might be
> released while is kmapped and pinned, or BO MIGHT be deref
> multiple times, etc.
 r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
we have specified interruptible to false, so -ERESTARTSYS isn't possible
here.

Thanks,
David Zhou
>
> Change-Id: If76071a768950a0d3ad9d5da7fcae04881807621
> Signed-off-by: Alex Xie <alexbin@amd.com><mailto:alexbin@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>   1 file changed, 1 insertion(+), 1 dele

Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO

2017-04-18 Thread Xie, AlexBin
Hi Christian,


Would you point out where/when will kunmap happen for this BO when release? It 
must be somewhere in some function calls.


I checked before I asked for review. But I did not see such obvious kunmap 
function call.


If so, there should be a comment in function amdgpu_vram_scratch_fini to avoid 
future confusion.

Thanks,
Alex Bin Xie

From: Christian König <deathsim...@vodafone.de>
Sent: Tuesday, April 18, 2017 1:46 PM
To: Xie, AlexBin; Zhou, David(ChunMing); amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO

Hi AlexBin,

No, David is right. This is a very common coding pattern in the kernel module.

Freeing up a BO while there still exists a kernel mapping is perfectly ok, the 
mapping will just be released a bit later on.

So this code is actually perfectly ok and just an optimization, but your patch 
breaks it and creates a memory leak.

Regards,
Christian.

Am 18.04.2017 um 17:17 schrieb Xie, AlexBin:

Hi David,


When amdgpu_bo_reserve return errors, we cannot release the BO. This is not a 
memory leak. General speaking, memory leak is unnoticed and unintentional.


The caller of function amdgpu_vram_scratch_fini ignores the return error 
value...


The "memory leak" is not caused by my patch. It is caused because reserving BO 
fails.


This patch only aim to make function amdgpu_vram_scratch_fini behave correctly.


To follow up, we can add a warning message when amdgpu_bo_reserve error happens 
in a different patch.


If function call amdgpu_bo_reserve is changed to uninterruptible, this changes 
driver behaviour. Without a substantial issue, I would be cautious for such a 
change.


Thanks,

Alex Bin Xie


From: Zhou, David(ChunMing)
Sent: Monday, April 17, 2017 10:38 PM
To: Xie, AlexBin; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO



On 2017年04月17日 22:54, Xie, AlexBin wrote:

Hi David,


Thanks for the comments. However, please have look at amdgpu_bo_reserve 
definition.

static inline int amdgpu_bo_reserve(struct amdgpu_bo *bo, bool no_intr)

Ah, this is a wired wrapper for ttm_bo_reserve.



When we call this function like the following:

 r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
The false means interruptible.


On the other hand,  when amdgpu_bo_reserve function return error, why do we 
unref BO without kunmap and unpin the BO? This is wrong implementation when 
amdgpu_bo_reserve return any error.

Yeah, I see your mean, it's in driver un-loading, How about changing to no 
interruptible? Your patch will make a memleak if bo_reserve fails, but it seems 
not matter. I have no strong preference.

Regards,
David Zhou


Thanks,
Alex Bin Xie


From: Zhou, David(ChunMing)
Sent: Friday, April 14, 2017 12:00 AM
To: Xie, AlexBin; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO



On 2017年04月14日 05:34, Alex Xie wrote:
> According to comment of amdgpu_bo_reserve, amdgpu_bo_reserve
> can return with -ERESTARTSYS. When this function was interrupted
> by a signal, BO should not be unref. Otherwise the BO might be
> released while is kmapped and pinned, or BO MIGHT be deref
> multiple times, etc.
 r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
we have specified interruptible to false, so -ERESTARTSYS isn't possible
here.

Thanks,
David Zhou
>
> Change-Id: If76071a768950a0d3ad9d5da7fcae04881807621
> Signed-off-by: Alex Xie <alexbin@amd.com><mailto:alexbin@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 53996e3..1dcc2d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -355,8 +355,8 @@ static void amdgpu_vram_scratch_fini(struct amdgpu_device 
> *adev)
>amdgpu_bo_kunmap(adev->vram_scratch.robj);
>amdgpu_bo_unpin(adev->vram_scratch.robj);
>amdgpu_bo_unreserve(adev->vram_scratch.robj);
> + amdgpu_bo_unref(>vram_scratch.robj);
>}
> - amdgpu_bo_unref(>vram_scratch.robj);
>   }
>
>   /**





___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO

2017-04-18 Thread Xie, AlexBin
Hi David,


When amdgpu_bo_reserve return errors, we cannot release the BO. This is not a 
memory leak. General speaking, memory leak is unnoticed and unintentional.


The caller of function amdgpu_vram_scratch_fini ignores the return error 
value...


The "memory leak" is not caused by my patch. It is caused because reserving BO 
fails.


This patch only aim to make function amdgpu_vram_scratch_fini behave correctly.


To follow up, we can add a warning message when amdgpu_bo_reserve error happens 
in a different patch.


If function call amdgpu_bo_reserve is changed to uninterruptible, this changes 
driver behaviour. Without a substantial issue, I would be cautious for such a 
change.


Thanks,

Alex Bin Xie


From: Zhou, David(ChunMing)
Sent: Monday, April 17, 2017 10:38 PM
To: Xie, AlexBin; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO



On 2017年04月17日 22:54, Xie, AlexBin wrote:

Hi David,


Thanks for the comments. However, please have look at amdgpu_bo_reserve 
definition.

static inline int amdgpu_bo_reserve(struct amdgpu_bo *bo, bool no_intr)

Ah, this is a wired wrapper for ttm_bo_reserve.



When we call this function like the following:

 r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
The false means interruptible.


On the other hand,  when amdgpu_bo_reserve function return error, why do we 
unref BO without kunmap and unpin the BO? This is wrong implementation when 
amdgpu_bo_reserve return any error.

Yeah, I see your mean, it's in driver un-loading, How about changing to no 
interruptible? Your patch will make a memleak if bo_reserve fails, but it seems 
not matter. I have no strong preference.

Regards,
David Zhou


Thanks,
Alex Bin Xie


From: Zhou, David(ChunMing)
Sent: Friday, April 14, 2017 12:00 AM
To: Xie, AlexBin; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO



On 2017年04月14日 05:34, Alex Xie wrote:
> According to comment of amdgpu_bo_reserve, amdgpu_bo_reserve
> can return with -ERESTARTSYS. When this function was interrupted
> by a signal, BO should not be unref. Otherwise the BO might be
> released while is kmapped and pinned, or BO MIGHT be deref
> multiple times, etc.
 r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
we have specified interruptible to false, so -ERESTARTSYS isn't possible
here.

Thanks,
David Zhou
>
> Change-Id: If76071a768950a0d3ad9d5da7fcae04881807621
> Signed-off-by: Alex Xie <alexbin@amd.com><mailto:alexbin@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 53996e3..1dcc2d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -355,8 +355,8 @@ static void amdgpu_vram_scratch_fini(struct amdgpu_device 
> *adev)
>amdgpu_bo_kunmap(adev->vram_scratch.robj);
>amdgpu_bo_unpin(adev->vram_scratch.robj);
>amdgpu_bo_unreserve(adev->vram_scratch.robj);
> + amdgpu_bo_unref(>vram_scratch.robj);
>}
> - amdgpu_bo_unref(>vram_scratch.robj);
>   }
>
>   /**


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] amdgpu:fix incorrect use on the remap_mutex

2017-04-17 Thread Xie, AlexBin
Hi Monk,


Correction:


Please remove the following line in the patch:

+#include "stdio.h"


This patch does not need to use stdio.h.

With that fixed, Reviewed by:  Alex Xie <alexbin@amd.com>

Thanks,
Alex Bin Xie

________
From: Xie, AlexBin
Sent: Monday, April 17, 2017 2:58 PM
To: Liu, Monk; 'amd-gfx@lists.freedesktop.org'
Cc: brahma_sw_dev
Subject: Re: [PATCH] amdgpu:fix incorrect use on the remap_mutex


Reviewed by:  Alex Xie <alexbin@amd.com>


Please note that this is patch is not for all open libdrm.


There is no remap_mutex in master branch of all open libdrm.


Thanks,

Alex Bin Xie


From: Liu, Monk
Sent: Monday, April 17, 2017 11:12 AM
To: 'amd-gfx@lists.freedesktop.org'
Cc: brahma_sw_dev
Subject: 答复: [PATCH] amdgpu:fix incorrect use on the remap_mutex

Anyone review it  ?

-邮件原件-
发件人: Liu, Monk
发送时间: 2017年4月17日 14:16
收件人: amd-gfx@lists.freedesktop.org
主题: [PATCH] amdgpu:fix incorrect use on the remap_mutex

[PATCH] amdgpu:fix incorrect use on the remap_mutex

this patch fixed a multi-thread race issue by expand the protection range of 
remap_mutex to the whole LIST walk through, by this fix the CTS test:
"dEQP-VK.api.object_management.multithreaded_per_thread_device.device_memory_smal"
can always pass now,
Previously it has 1/15 chance to fail with a high performance I7 CPU under 
virtualization envrionment.

Change-Id: If88b9d35e79fe1cb6eb0e32c680bc4996c7915bc
Signed-off-by: Monk Liu <monk@amd.com>
---
 amdgpu/amdgpu_bo.c| 29 ++---
 amdgpu/amdgpu_vamgr.c |  5 +++--
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c index 33cb705..8960728 
100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -989,6 +989,8 @@ int amdgpu_bo_va_op_refcounted(amdgpu_device_handle dev,
 return -EINVAL;
 }

+   pthread_mutex_lock(>remap_mutex);
+
 /* find the previous mapped va object and its bo and unmap it*/
 if (ops == AMDGPU_VA_OP_MAP) {
 LIST_FOR_EACH_ENTRY(vahandle, >remap_list, list) { @@ 
-998,15 +1000,15 @@ int amdgpu_bo_va_op_refcounted(amdgpu_device_handle dev,
 /* the overlap va mapping which need to be 
unmapped first */
 vao = vahandle;
 r = amdgpu_bo_va_op(vao->bo, vao->offset, 
vao->size, vao->address, flags, AMDGPU_VA_OP_UNMAP);
-   if (r)
+   if (r) {
+   pthread_mutex_unlock(>remap_mutex);
 return -EINVAL;
+   }

 /* Just drop the reference. */
 amdgpu_bo_reference(>bo, NULL);
 /* remove the remap from list */
-   pthread_mutex_lock(>remap_mutex);
 list_del(>list);
-   pthread_mutex_unlock(>remap_mutex);
 free(vao);
 }
 }
@@ -1021,12 +1023,18 @@ int amdgpu_bo_va_op_refcounted(amdgpu_device_handle dev,
 }
 }
 if (vao) {
-   if (bo && (bo != vao->bo))
+   if (bo && (bo != vao->bo)) {
+   pthread_mutex_unlock(>remap_mutex);
 return -EINVAL;
-   } else
+   }
+   } else {
+   pthread_mutex_unlock(>remap_mutex);
 return -EINVAL;
-   } else
+   }
+   } else {
+   pthread_mutex_unlock(>remap_mutex);
 return -EINVAL;
+   }

 /* we only allow null bo for unmap operation */
 if (!bo)
@@ -1039,26 +1047,25 @@ int amdgpu_bo_va_op_refcounted(amdgpu_device_handle dev,
 /* Just drop the reference. */
 amdgpu_bo_reference(, NULL);
 /* remove the remap from list */
-   pthread_mutex_lock(>remap_mutex);
 list_del(>list);
-   pthread_mutex_unlock(>remap_mutex);
 free(vao);
 } else if (ops == AMDGPU_VA_OP_MAP) {
 /* bump the refcount of bo! */
 atomic_inc(>refcount);
 /* add the remap to list and vao should be NULL for 
map */
 vao = (struct amdgpu_va_remap*)calloc(1, sizeof(struct 
amdgpu_va_remap));
-   if (!vao)
+   if (!vao) {
+

Re: [PATCH] amdgpu:fix incorrect use on the remap_mutex

2017-04-17 Thread Xie, AlexBin
Reviewed by:  Alex Xie <alexbin@amd.com>


Please note that this is patch is not for all open libdrm.


There is no remap_mutex in master branch of all open libdrm.


Thanks,

Alex Bin Xie


From: Liu, Monk
Sent: Monday, April 17, 2017 11:12 AM
To: 'amd-gfx@lists.freedesktop.org'
Cc: brahma_sw_dev
Subject: 答复: [PATCH] amdgpu:fix incorrect use on the remap_mutex

Anyone review it  ?

-邮件原件-
发件人: Liu, Monk
发送时间: 2017年4月17日 14:16
收件人: amd-gfx@lists.freedesktop.org
主题: [PATCH] amdgpu:fix incorrect use on the remap_mutex

[PATCH] amdgpu:fix incorrect use on the remap_mutex

this patch fixed a multi-thread race issue by expand the protection range of 
remap_mutex to the whole LIST walk through, by this fix the CTS test:
"dEQP-VK.api.object_management.multithreaded_per_thread_device.device_memory_smal"
can always pass now,
Previously it has 1/15 chance to fail with a high performance I7 CPU under 
virtualization envrionment.

Change-Id: If88b9d35e79fe1cb6eb0e32c680bc4996c7915bc
Signed-off-by: Monk Liu <monk@amd.com>
---
 amdgpu/amdgpu_bo.c| 29 ++---
 amdgpu/amdgpu_vamgr.c |  5 +++--
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c index 33cb705..8960728 
100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -989,6 +989,8 @@ int amdgpu_bo_va_op_refcounted(amdgpu_device_handle dev,
 return -EINVAL;
 }

+   pthread_mutex_lock(>remap_mutex);
+
 /* find the previous mapped va object and its bo and unmap it*/
 if (ops == AMDGPU_VA_OP_MAP) {
 LIST_FOR_EACH_ENTRY(vahandle, >remap_list, list) { @@ 
-998,15 +1000,15 @@ int amdgpu_bo_va_op_refcounted(amdgpu_device_handle dev,
 /* the overlap va mapping which need to be 
unmapped first */
 vao = vahandle;
 r = amdgpu_bo_va_op(vao->bo, vao->offset, 
vao->size, vao->address, flags, AMDGPU_VA_OP_UNMAP);
-   if (r)
+   if (r) {
+   pthread_mutex_unlock(>remap_mutex);
 return -EINVAL;
+   }

 /* Just drop the reference. */
 amdgpu_bo_reference(>bo, NULL);
 /* remove the remap from list */
-   pthread_mutex_lock(>remap_mutex);
 list_del(>list);
-   pthread_mutex_unlock(>remap_mutex);
 free(vao);
 }
 }
@@ -1021,12 +1023,18 @@ int amdgpu_bo_va_op_refcounted(amdgpu_device_handle dev,
 }
 }
 if (vao) {
-   if (bo && (bo != vao->bo))
+   if (bo && (bo != vao->bo)) {
+   pthread_mutex_unlock(>remap_mutex);
 return -EINVAL;
-   } else
+   }
+   } else {
+   pthread_mutex_unlock(>remap_mutex);
 return -EINVAL;
-   } else
+   }
+   } else {
+   pthread_mutex_unlock(>remap_mutex);
 return -EINVAL;
+   }

 /* we only allow null bo for unmap operation */
 if (!bo)
@@ -1039,26 +1047,25 @@ int amdgpu_bo_va_op_refcounted(amdgpu_device_handle dev,
 /* Just drop the reference. */
 amdgpu_bo_reference(, NULL);
 /* remove the remap from list */
-   pthread_mutex_lock(>remap_mutex);
 list_del(>list);
-   pthread_mutex_unlock(>remap_mutex);
 free(vao);
 } else if (ops == AMDGPU_VA_OP_MAP) {
 /* bump the refcount of bo! */
 atomic_inc(>refcount);
 /* add the remap to list and vao should be NULL for 
map */
 vao = (struct amdgpu_va_remap*)calloc(1, sizeof(struct 
amdgpu_va_remap));
-   if (!vao)
+   if (!vao) {
+   pthread_mutex_unlock(>remap_mutex);
 return -ENOMEM;
+   }
 vao->address = addr;
 vao->size = size;
 vao->offset = offset;
 vao->bo = bo;
-   pthread_mutex_lock(>remap_mutex);
 list_add(>list, >remap_list);
-   pthre

Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO

2017-04-17 Thread Xie, AlexBin
Hi David,


Thanks for the comments. However, please have look at amdgpu_bo_reserve 
definition.

static inline int amdgpu_bo_reserve(struct amdgpu_bo *bo, bool no_intr)


When we call this function like the following:

 r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
The false means interruptible.


On the other hand,  when amdgpu_bo_reserve function return error, why do we 
unref BO without kunmap and unpin the BO? This is wrong implementation when 
amdgpu_bo_reserve return any error.


Thanks,
Alex Bin Xie


From: Zhou, David(ChunMing)
Sent: Friday, April 14, 2017 12:00 AM
To: Xie, AlexBin; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO



On 2017年04月14日 05:34, Alex Xie wrote:
> According to comment of amdgpu_bo_reserve, amdgpu_bo_reserve
> can return with -ERESTARTSYS. When this function was interrupted
> by a signal, BO should not be unref. Otherwise the BO might be
> released while is kmapped and pinned, or BO MIGHT be deref
> multiple times, etc.
 r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
we have specified interruptible to false, so -ERESTARTSYS isn't possible
here.

Thanks,
David Zhou
>
> Change-Id: If76071a768950a0d3ad9d5da7fcae04881807621
> Signed-off-by: Alex Xie <alexbin@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 53996e3..1dcc2d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -355,8 +355,8 @@ static void amdgpu_vram_scratch_fini(struct amdgpu_device 
> *adev)
>amdgpu_bo_kunmap(adev->vram_scratch.robj);
>amdgpu_bo_unpin(adev->vram_scratch.robj);
>amdgpu_bo_unreserve(adev->vram_scratch.robj);
> + amdgpu_bo_unref(>vram_scratch.robj);
>}
> - amdgpu_bo_unref(>vram_scratch.robj);
>   }
>
>   /**

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH libdrm 3/3] amdgpu: A new option to run tests on render node

2017-01-24 Thread Xie, AlexBin
Hi Emil,

Point 1 will be left for future patch.

Current error message is following.
Error: Permission denied. Hint:Try to run this test program as root.

I am thinking change it. Error message will be:
Error: Permission denied. Hint:Try to run this test program as root or in TTY.

Thanks,
Alex Bin Xie

-Original Message-
From: Emil Velikov [mailto:emil.l.veli...@gmail.com] 
Sent: Friday, January 20, 2017 8:31 AM
To: Xie, AlexBin <alexbin@amd.com>
Cc: amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH libdrm 3/3] amdgpu: A new option to run tests on render node

HI Alex,

A couple of small idea(s) for future work (?).

On 19 January 2017 at 22:53, Alex Xie <alexbin@amd.com> wrote:
> Tested:
> 1. As root, tests passed on primary.
Add auth mechanism and request run outside of X environment (switching
to TTY should work).
Then adjust the suggestion s/run as root/run in TTY/ ?

> 2. As root, BO export/import failed on render node as expected.
Afaict those can never succeed, so might as well change the test to
expect failure [when using the render node], or at least print a
message "the following failure is expected" ?

Thanks the series !
Emil
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 2/3] amdgpu: A new option to choose which device to run most tests

2017-01-20 Thread Xie, AlexBin
HI Emil,


See below.


Thanks,

Alex



From: Emil Velikov <emil.l.veli...@gmail.com>
Sent: Friday, January 20, 2017 8:24 AM
To: Xie, AlexBin
Cc: amd-gfx mailing list
Subject: Re: [PATCH libdrm 2/3] amdgpu: A new option to choose which device to 
run most tests

On 19 January 2017 at 22:53, Alex Xie <alexbin@amd.com> wrote:
> This can be used to test multiple GPUs
>
> Signed-off-by: Alex Xie <alexbin@amd.com>
> ---
>  tests/amdgpu/amdgpu_test.c | 25 -
>  1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/tests/amdgpu/amdgpu_test.c b/tests/amdgpu/amdgpu_test.c
> index e42ef9d..2437db4 100644
> --- a/tests/amdgpu/amdgpu_test.c
> +++ b/tests/amdgpu/amdgpu_test.c
> @@ -108,12 +108,14 @@ static void display_test_suites(void)
>
>
>  /** Help string for command line parameters */
> -static const char usage[] = "Usage: %s [-hl] [<-s > [-t  id>]]\n"
> -   "where:\n"
> -   "   l - Display all suites and their 
> tests\n"
> -   "   h - Display this help\n";
> +static const char usage[] =
> +   "Usage: %s [-hl] [<-s > [-t ]] [-d ]\n"
> +   "where:\n"
> +   "   l - Display all suites and their tests\n"
> +   "   d - Choose which device to run tests\n"
You want to elaborate on what you mean with "device" here.
Even if currently "card0 is my Kabini, card1 Hawaii, card2 other" that
may change upon reboot.
So although very nice to say "0,1, 2..." things are non-deterministic.

[Alex Bin Xie] Originally I just want to give a simple solution for people to 
swap to test another GPU device.
Every time amdgpu_test runs, it lists the device id and its information. I was 
thinking to improve this but such as adding an option to list the devices. So 
that the 0, 1 ,2 does not change if computer is not reboot. If 0, 1, 2 is not 
deterministic, I have to give up this human friendly interface...

How about choose the test device by bus ID?  I will add an option to list the 
devices with BUS ID (if people run drmdevice program, they know the information 
too). Then people can choose  device on which BUS ID to run the test. I will 
provide a new patch if there is no object to this.


-Emil
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 1/3] amdgpu: verify the tested device

2017-01-20 Thread Xie, AlexBin
Hi Emil,


Thanks for the comments.


Please see below.


Regards,

Alex Bin Xie



From: Emil Velikov <emil.l.veli...@gmail.com>
Sent: Friday, January 20, 2017 8:18 AM
To: Xie, AlexBin
Cc: amd-gfx mailing list
Subject: Re: [PATCH libdrm 1/3] amdgpu: verify the tested device

Hi Alex,

Thanks for doing this. There's a few nitpicks on top of what David and
Christian has spotted.

On 19 January 2017 at 22:53, Alex Xie <alexbin@amd.com> wrote:
> Verify the vender ID and driver name.
> Open all AMDGPU devices.
> Provide an option to open render node.
>
> Tested as root: PASS
> Tested as non-privileged user:
> All tests failed as expected
>
> Signed-off-by: Alex Xie <alexbin@amd.com>
> ---
>  tests/amdgpu/amdgpu_test.c | 144 
> +
>  1 file changed, 121 insertions(+), 23 deletions(-)
>
> diff --git a/tests/amdgpu/amdgpu_test.c b/tests/amdgpu/amdgpu_test.c
> index 71f357c..e42ef9d 100644
> --- a/tests/amdgpu/amdgpu_test.c
> +++ b/tests/amdgpu/amdgpu_test.c
> @@ -115,6 +115,119 @@ static const char usage[] = "Usage: %s [-hl] [<-s 
> > [-t ]]\n"
>  /** Specified options strings for getopt */
>  static const char options[]   = "hls:t:";
>
> +/* Open AMD devices.
> + * Return the number of AMD device openned.
> + */
> +static int amdgpu_open_devices(int open_render_node)
> +{
> +   drmDevicePtr devices[MAX_CARDS_SUPPORTED];
> +   int ret;
> +   int i;
> +   int j;
> +   int amd_index = 0;
> +   int drm_count;
> +   int fd;
> +   char *device_name;
> +   drmVersionPtr version;
> +
> +   drm_count = drmGetDevices2(0, devices, MAX_CARDS_SUPPORTED);
> +
> +   if (drm_count < 0) {
> +   fprintf(stderr,
> +   "drmGetDevices2() returned an error %d\n",
> +   drm_count);
> +   return 0;
> +   }
> +
> +   for (i = 0; i < drm_count; i++) {
> +   /* If this is not AMD GPU vender ID, skip*/
> +   if (devices[i]->bustype == DRM_BUS_PCI)
> +   if (devices[i]->deviceinfo.pci->vendor_id != 0x1002)
> +   continue;
> +
> +   for (j = 0; j < DRM_NODE_MAX; j++) {
> +   if (devices[i]->available_nodes & 1 << j) {
> +   fd = open(
> +   devices[i]->nodes[j],
> +   O_RDONLY | O_CLOEXEC,
> +   0);
> +   if (fd < 0) continue;
> +   }
You don't need to iterate over all the available nodes. Just fetch the
PRIMARY or RENDER based on open_render_node.
Note that a device can be missing some node types (say RENDER) so make
sure the available_nodes bitmask is set.

[Alex Bin Xie]:
That was my original design, but later I changed.
I knew this work for the current xf86drm.c. But is the nodes[0]  always primary?
I was afraid that this may be changed in future. I searched all drm tests. I 
did not see an example.

So amdgpu_test will be the first to access nodes like: 
open(devices[i]->nodes[DRM_NODE_PRIMARY]), for example.

> +   if (open_render_node)
> +   device_name = 
> drmGetRenderDeviceNameFromFd(fd);
> +   else
> +   device_name = 
> drmGetPrimaryDeviceNameFromFd(fd);
> +
> +   close(fd);
> +
> +   drm_amdgpu[amd_index] = open(device_name,
> +   O_RDWR | O_CLOEXEC);
> +
> +   if (drm_amdgpu[amd_index] >= 0)
> +   amd_index++;
> +
> +   free(device_name);
> +
With the above comment this becomes redundant.

> +   /* We have open this device. Go to next device.*/
> +   break;
> +   }
> +   }
> +
Here you want to initialise the remainder of drm_amdgpu[] (since
drm_count can be less than MAX_CARDS_SUPPORTED) with -1.
Otherwise you'll have fun experiences during close/print (below).

[Alex Bin Xie]: This initialization is done in existing main() function (not 
introduced this patch).

-Emil
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: More specific libdrm error message

2017-01-16 Thread Xie, AlexBin
Hi Emil,


Thanks for the advice. I noticed that you have changed some tests using 
drmGetDevices2.

I mark this to my to do list.


Thanks,

Alex Bin Xie


From: Emil Velikov <emil.l.veli...@gmail.com>
Sent: Thursday, January 12, 2017 2:26 PM
To: Edward O'Callaghan
Cc: Xie, AlexBin; amd-gfx@lists.freedesktop.org
Subject: Re: More specific libdrm error message

On 12 January 2017 at 01:29, Edward O'Callaghan
<funfunc...@folklore1984.net> wrote:
> Hi Xie,
>
> Perhaps you want to use `fprintf(stderr, "...")` over `printf("..")` and
> lose the space before the start parenthesis. Also, line wrap
> your commit message.
>
> Side note, use git send-email so that the patch is inline and not a HTML
> email for easy review and application of the patch.
>
Thanks Edward.

Xie, suggesting to run almost anything as root is a bad idea ;-)
Instead one could auth, in order to have the correct permissions.
Alternatively can use the renderD node all together.

Idea for future work:
Would be even better to use drmGetDevices2 to fetch all the devices
and use the correct node since card0/renderD128 is not guaranteed to
be a amdgpu one. Or maybe even run all the amdgpu devices through the
tests ?

Please correctly wrap commit messages and code.

Thanks
Emil
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


More specific libdrm error message

2017-01-11 Thread Xie, AlexBin
Hi,

Provide more specific error message if non-privileged user runs amdgpu_test

Before this change, the error message is:" WARNING - Suite initialization 
failed..." People might think this is a driver problem.


Tested with non-privileged user. Now the error message is like.

...
Error:Permission denied. Hint: Try to run this test program as root.
WARNING - Suite initialization failed for 'Basic Tests'.

...



Tested as root with no regression.



Please review.


Thanks,

Alex Bin Xie
From 3cb44de3c4baa3db514dbcb180c4fdae128d4864 Mon Sep 17 00:00:00 2001
From: Alex Xie <alexbin@amd.com>
Date: Wed, 11 Jan 2017 16:00:47 -0500
Subject: [PATCH libdrm] amdgpu: Provide more specific error message if
 non-privileged user runs amdgpu_test

Before this change, the error message is:" WARNING - Suite initialization failed..." People might think this is a driver problem.

Change-Id: Ib891c40ec812053f49ce5a99909455ac3137e32c
Signed-off-by: Alex Xie <alexbin@amd.com>
---
 tests/amdgpu/basic_tests.c | 5 -
 tests/amdgpu/bo_tests.c| 6 +-
 tests/amdgpu/cs_tests.c| 6 +-
 tests/amdgpu/vce_tests.c   | 6 +-
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c
index 11f6a63..b95f3f4 100644
--- a/tests/amdgpu/basic_tests.c
+++ b/tests/amdgpu/basic_tests.c
@@ -206,8 +206,11 @@ int suite_basic_tests_init(void)
 
 	if (r == 0)
 		return CUE_SUCCESS;
-	else
+	else {
+		if ((r == -EACCES) && (errno == EACCES))
+			printf ("\n\nError:%m. Hint: Try to run this test program as root.");
 		return CUE_SINIT_FAILED;
+	}
 }
 
 int suite_basic_tests_clean(void)
diff --git a/tests/amdgpu/bo_tests.c b/tests/amdgpu/bo_tests.c
index 993895d..7f4550d 100644
--- a/tests/amdgpu/bo_tests.c
+++ b/tests/amdgpu/bo_tests.c
@@ -65,8 +65,12 @@ int suite_bo_tests_init(void)
 
 	r = amdgpu_device_initialize(drm_amdgpu[0], _version,
   _version, _handle);
-	if (r)
+	if (r) {
+		if ((r == -EACCES) && (errno == EACCES))
+			printf ("\n\nError:%m. Hint: Try to run this test program as root.");
+
 		return CUE_SINIT_FAILED;
+	}
 
 	req.alloc_size = BUFFER_SIZE;
 	req.phys_alignment = BUFFER_ALIGN;
diff --git a/tests/amdgpu/cs_tests.c b/tests/amdgpu/cs_tests.c
index a01ee48..0432332 100644
--- a/tests/amdgpu/cs_tests.c
+++ b/tests/amdgpu/cs_tests.c
@@ -76,8 +76,12 @@ int suite_cs_tests_init(void)
 
 	r = amdgpu_device_initialize(drm_amdgpu[0], _version,
  _version, _handle);
-	if (r)
+	if (r) {
+		if ((r == -EACCES) && (errno == EACCES))
+			printf ("\n\nError:%m. Hint: Try to run this test program as root.");
+
 		return CUE_SINIT_FAILED;
+	}
 
 	family_id = device_handle->info.family_id;
 	/* VI asic POLARIS10/11 have specific external_rev_id */
diff --git a/tests/amdgpu/vce_tests.c b/tests/amdgpu/vce_tests.c
index 4915170..8c7c8c6 100644
--- a/tests/amdgpu/vce_tests.c
+++ b/tests/amdgpu/vce_tests.c
@@ -94,8 +94,12 @@ int suite_vce_tests_init(void)
 
 	r = amdgpu_device_initialize(drm_amdgpu[0], _version,
  _version, _handle);
-	if (r)
+	if (r) {
+		if ((r == -EACCES) && (errno == EACCES))
+			printf ("\n\nError:%m. Hint: Try to run this test program as root.");
+
 		return CUE_SINIT_FAILED;
+	}
 
 	family_id = device_handle->info.family_id;
 	vce_harvest_config = device_handle->info.vce_harvest_config;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: More specific libdrm error message

2017-01-11 Thread Xie, AlexBin
Hi,


v2: Use strerror instead of %m. %m is a GNU C Library extension.

Thanks,

Alex Bin Xie


From: Xie, AlexBin
Sent: Wednesday, January 11, 2017 4:14 PM
To: amd-gfx@lists.freedesktop.org
Subject: More specific libdrm error message

Hi,

Provide more specific error message if non-privileged user runs amdgpu_test

Before this change, the error message is:" WARNING - Suite initialization 
failed..." People might think this is a driver problem.


Tested with non-privileged user. Now the error message is like.

...
Error:Permission denied. Hint: Try to run this test program as root.
WARNING - Suite initialization failed for 'Basic Tests'.

...



Tested as root with no regression.



Please review.


Thanks,

Alex Bin Xie
From 02fe2acc7a1aec34a9f1ae687fb6210a848b308e Mon Sep 17 00:00:00 2001
From: Alex Xie <alexbin@amd.com>
Date: Wed, 11 Jan 2017 16:43:21 -0500
Subject: [PATCH libdrm] amdgpu: Provide more specific error message if
 non-privileged user runs amdgpu_test

Before this change, the error message is:" WARNING - Suite initialization failed..." People might think this is a driver problem.

v2: Use strerror instead of %m. %m is a GNU C Library extension.

Change-Id: Ib891c40ec812053f49ce5a99909455ac3137e32c
Signed-off-by: Alex Xie <alexbin@amd.com>
---
 tests/amdgpu/basic_tests.c | 5 -
 tests/amdgpu/bo_tests.c| 6 +-
 tests/amdgpu/cs_tests.c| 6 +-
 tests/amdgpu/vce_tests.c   | 6 +-
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c
index 11f6a63..7a47274 100644
--- a/tests/amdgpu/basic_tests.c
+++ b/tests/amdgpu/basic_tests.c
@@ -206,8 +206,11 @@ int suite_basic_tests_init(void)
 
 	if (r == 0)
 		return CUE_SUCCESS;
-	else
+	else {
+		if ((r == -EACCES) && (errno == EACCES))
+			printf ("\n\nError:%s. Hint: Try to run this test program as root.", strerror(errno));
 		return CUE_SINIT_FAILED;
+	}
 }
 
 int suite_basic_tests_clean(void)
diff --git a/tests/amdgpu/bo_tests.c b/tests/amdgpu/bo_tests.c
index 993895d..37ee6d1 100644
--- a/tests/amdgpu/bo_tests.c
+++ b/tests/amdgpu/bo_tests.c
@@ -65,8 +65,12 @@ int suite_bo_tests_init(void)
 
 	r = amdgpu_device_initialize(drm_amdgpu[0], _version,
   _version, _handle);
-	if (r)
+	if (r) {
+		if ((r == -EACCES) && (errno == EACCES))
+			printf ("\n\nError:%s. Hint: Try to run this test program as root.", strerror(errno));
+
 		return CUE_SINIT_FAILED;
+	}
 
 	req.alloc_size = BUFFER_SIZE;
 	req.phys_alignment = BUFFER_ALIGN;
diff --git a/tests/amdgpu/cs_tests.c b/tests/amdgpu/cs_tests.c
index a01ee48..ce723a9 100644
--- a/tests/amdgpu/cs_tests.c
+++ b/tests/amdgpu/cs_tests.c
@@ -76,8 +76,12 @@ int suite_cs_tests_init(void)
 
 	r = amdgpu_device_initialize(drm_amdgpu[0], _version,
  _version, _handle);
-	if (r)
+	if (r) {
+		if ((r == -EACCES) && (errno == EACCES))
+			printf ("\n\nError:%s. Hint: Try to run this test program as root.", strerror(errno));
+
 		return CUE_SINIT_FAILED;
+	}
 
 	family_id = device_handle->info.family_id;
 	/* VI asic POLARIS10/11 have specific external_rev_id */
diff --git a/tests/amdgpu/vce_tests.c b/tests/amdgpu/vce_tests.c
index 4915170..2936ef5 100644
--- a/tests/amdgpu/vce_tests.c
+++ b/tests/amdgpu/vce_tests.c
@@ -94,8 +94,12 @@ int suite_vce_tests_init(void)
 
 	r = amdgpu_device_initialize(drm_amdgpu[0], _version,
  _version, _handle);
-	if (r)
+	if (r) {
+		if ((r == -EACCES) && (errno == EACCES))
+			printf ("\n\nError:%s. Hint: Try to run this test program as root.", strerror(errno));
+
 		return CUE_SINIT_FAILED;
+	}
 
 	family_id = device_handle->info.family_id;
 	vce_harvest_config = device_handle->info.vce_harvest_config;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx