Re: [PATCH] drm/amdgpu: clear seq64 memory on free

2024-04-16 Thread Paneer Selvam, Arunpravin

Hi Christian,

On 4/16/2024 6:24 PM, Christian König wrote:

Am 16.04.24 um 14:34 schrieb Paneer Selvam, Arunpravin:

Hi Christian,

On 4/16/2024 5:47 PM, Christian König wrote:

Am 16.04.24 um 14:16 schrieb Paneer Selvam, Arunpravin:

Hi Christian,

On 4/16/2024 2:35 PM, Christian König wrote:

Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam:

We should clear the memory on free. Otherwise,
there is a chance that we will access the previous
application data and this would leads to an abnormal
behaviour in the current application.


Mhm, I would strongly expect that we initialize the seq number 
after allocation.


It could be that we also have situations were the correct start 
value is 0x or something like that instead.


Why does this matter?
when the user queue A's u64 address (fence address) is allocated to 
the newly created user queue B, we see a problem.
User queue B calls the signal IOCTL which creates a new fence 
having the wptr as the seq number, in
amdgpu_userq_fence_create function we have a check where we are 
comparing the rptr and wptr value (rptr >= wptr).
since the rptr value is read from the u64 address which has the 
user queue A's wptr data, this rptr >= wptr condition
gets satisfied and we are dropping the reference before the actual 
command gets processed in the hardware.

If we clear this u64 value on free, we dont see this problem.


Yeah, but that doesn't belongs into the seq64 handling.

Instead the code which allocates the seq64 during userqueue created 
needs to clear it to 0.

sure, got it.


Yeah, but fixing that aside. We should probably initialize the seq64 
array to something like 0xdeadbeef or a similar pattern to find issues 
were we forget to initialize the allocated slots.

yes, I will prepare a patch and send for the review.

Thanks,
Arun.


Regards,
Christian.



Thanks,
Arun.


Regards,
Christian.



Thanks,
Arun.


Regards,
Christian.



Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 4b9afc4df031..9613992c9804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device 
*adev, u64 *va, u64 **cpu_addr)

  void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
  {
  unsigned long bit_pos;
+    u64 *cpu_addr;
    bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / 
sizeof(u64);

-    if (bit_pos < adev->seq64.num_sem)
+    if (bit_pos < adev->seq64.num_sem) {
+    cpu_addr = bit_pos + adev->seq64.cpu_base_addr;
+    memset(cpu_addr, 0, sizeof(u64));
  __clear_bit(bit_pos, adev->seq64.used);
+    }
  }
    /**














Re: [PATCH] drm/amdgpu: clear seq64 memory on free

2024-04-16 Thread Christian König

Am 16.04.24 um 14:34 schrieb Paneer Selvam, Arunpravin:

Hi Christian,

On 4/16/2024 5:47 PM, Christian König wrote:

Am 16.04.24 um 14:16 schrieb Paneer Selvam, Arunpravin:

Hi Christian,

On 4/16/2024 2:35 PM, Christian König wrote:

Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam:

We should clear the memory on free. Otherwise,
there is a chance that we will access the previous
application data and this would leads to an abnormal
behaviour in the current application.


Mhm, I would strongly expect that we initialize the seq number 
after allocation.


It could be that we also have situations were the correct start 
value is 0x or something like that instead.


Why does this matter?
when the user queue A's u64 address (fence address) is allocated to 
the newly created user queue B, we see a problem.
User queue B calls the signal IOCTL which creates a new fence having 
the wptr as the seq number, in
amdgpu_userq_fence_create function we have a check where we are 
comparing the rptr and wptr value (rptr >= wptr).
since the rptr value is read from the u64 address which has the user 
queue A's wptr data, this rptr >= wptr condition
gets satisfied and we are dropping the reference before the actual 
command gets processed in the hardware.

If we clear this u64 value on free, we dont see this problem.


Yeah, but that doesn't belongs into the seq64 handling.

Instead the code which allocates the seq64 during userqueue created 
needs to clear it to 0.

sure, got it.


Yeah, but fixing that aside. We should probably initialize the seq64 
array to something like 0xdeadbeef or a similar pattern to find issues 
were we forget to initialize the allocated slots.


Regards,
Christian.



Thanks,
Arun.


Regards,
Christian.



Thanks,
Arun.


Regards,
Christian.



Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 4b9afc4df031..9613992c9804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device 
*adev, u64 *va, u64 **cpu_addr)

  void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
  {
  unsigned long bit_pos;
+    u64 *cpu_addr;
    bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / 
sizeof(u64);

-    if (bit_pos < adev->seq64.num_sem)
+    if (bit_pos < adev->seq64.num_sem) {
+    cpu_addr = bit_pos + adev->seq64.cpu_base_addr;
+    memset(cpu_addr, 0, sizeof(u64));
  __clear_bit(bit_pos, adev->seq64.used);
+    }
  }
    /**












Re: [PATCH] drm/amdgpu: clear seq64 memory on free

2024-04-16 Thread Paneer Selvam, Arunpravin

Hi Christian,

On 4/16/2024 5:47 PM, Christian König wrote:

Am 16.04.24 um 14:16 schrieb Paneer Selvam, Arunpravin:

Hi Christian,

On 4/16/2024 2:35 PM, Christian König wrote:

Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam:

We should clear the memory on free. Otherwise,
there is a chance that we will access the previous
application data and this would leads to an abnormal
behaviour in the current application.


Mhm, I would strongly expect that we initialize the seq number after 
allocation.


It could be that we also have situations were the correct start 
value is 0x or something like that instead.


Why does this matter?
when the user queue A's u64 address (fence address) is allocated to 
the newly created user queue B, we see a problem.
User queue B calls the signal IOCTL which creates a new fence having 
the wptr as the seq number, in
amdgpu_userq_fence_create function we have a check where we are 
comparing the rptr and wptr value (rptr >= wptr).
since the rptr value is read from the u64 address which has the user 
queue A's wptr data, this rptr >= wptr condition
gets satisfied and we are dropping the reference before the actual 
command gets processed in the hardware.

If we clear this u64 value on free, we dont see this problem.


Yeah, but that doesn't belongs into the seq64 handling.

Instead the code which allocates the seq64 during userqueue created 
needs to clear it to 0.

sure, got it.

Thanks,
Arun.


Regards,
Christian.



Thanks,
Arun.


Regards,
Christian.



Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 4b9afc4df031..9613992c9804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device 
*adev, u64 *va, u64 **cpu_addr)

  void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
  {
  unsigned long bit_pos;
+    u64 *cpu_addr;
    bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / sizeof(u64);
-    if (bit_pos < adev->seq64.num_sem)
+    if (bit_pos < adev->seq64.num_sem) {
+    cpu_addr = bit_pos + adev->seq64.cpu_base_addr;
+    memset(cpu_addr, 0, sizeof(u64));
  __clear_bit(bit_pos, adev->seq64.used);
+    }
  }
    /**










Re: [PATCH] drm/amdgpu: clear seq64 memory on free

2024-04-16 Thread Christian König

Am 16.04.24 um 14:16 schrieb Paneer Selvam, Arunpravin:

Hi Christian,

On 4/16/2024 2:35 PM, Christian König wrote:

Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam:

We should clear the memory on free. Otherwise,
there is a chance that we will access the previous
application data and this would leads to an abnormal
behaviour in the current application.


Mhm, I would strongly expect that we initialize the seq number after 
allocation.


It could be that we also have situations were the correct start value 
is 0x or something like that instead.


Why does this matter?
when the user queue A's u64 address (fence address) is allocated to 
the newly created user queue B, we see a problem.
User queue B calls the signal IOCTL which creates a new fence having 
the wptr as the seq number, in
amdgpu_userq_fence_create function we have a check where we are 
comparing the rptr and wptr value (rptr >= wptr).
since the rptr value is read from the u64 address which has the user 
queue A's wptr data, this rptr >= wptr condition
gets satisfied and we are dropping the reference before the actual 
command gets processed in the hardware.

If we clear this u64 value on free, we dont see this problem.


Yeah, but that doesn't belongs into the seq64 handling.

Instead the code which allocates the seq64 during userqueue created 
needs to clear it to 0.


Regards,
Christian.



Thanks,
Arun.


Regards,
Christian.



Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 4b9afc4df031..9613992c9804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device 
*adev, u64 *va, u64 **cpu_addr)

  void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
  {
  unsigned long bit_pos;
+    u64 *cpu_addr;
    bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / sizeof(u64);
-    if (bit_pos < adev->seq64.num_sem)
+    if (bit_pos < adev->seq64.num_sem) {
+    cpu_addr = bit_pos + adev->seq64.cpu_base_addr;
+    memset(cpu_addr, 0, sizeof(u64));
  __clear_bit(bit_pos, adev->seq64.used);
+    }
  }
    /**








Re: [PATCH] drm/amdgpu: clear seq64 memory on free

2024-04-16 Thread Paneer Selvam, Arunpravin

Hi Christian,

On 4/16/2024 2:35 PM, Christian König wrote:

Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam:

We should clear the memory on free. Otherwise,
there is a chance that we will access the previous
application data and this would leads to an abnormal
behaviour in the current application.


Mhm, I would strongly expect that we initialize the seq number after 
allocation.


It could be that we also have situations were the correct start value 
is 0x or something like that instead.


Why does this matter?
when the user queue A's u64 address (fence address) is allocated to the 
newly created user queue B, we see a problem.
User queue B calls the signal IOCTL which creates a new fence having the 
wptr as the seq number, in
amdgpu_userq_fence_create function we have a check where we are 
comparing the rptr and wptr value (rptr >= wptr).
since the rptr value is read from the u64 address which has the user 
queue A's wptr data, this rptr >= wptr condition
gets satisfied and we are dropping the reference before the actual 
command gets processed in the hardware.

If we clear this u64 value on free, we dont see this problem.

Thanks,
Arun.


Regards,
Christian.



Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c

index 4b9afc4df031..9613992c9804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device 
*adev, u64 *va, u64 **cpu_addr)

  void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
  {
  unsigned long bit_pos;
+    u64 *cpu_addr;
    bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / sizeof(u64);
-    if (bit_pos < adev->seq64.num_sem)
+    if (bit_pos < adev->seq64.num_sem) {
+    cpu_addr = bit_pos + adev->seq64.cpu_base_addr;
+    memset(cpu_addr, 0, sizeof(u64));
  __clear_bit(bit_pos, adev->seq64.used);
+    }
  }
    /**






Re: [PATCH] drm/amdgpu: clear seq64 memory on free

2024-04-16 Thread Christian König

Am 15.04.24 um 20:48 schrieb Arunpravin Paneer Selvam:

We should clear the memory on free. Otherwise,
there is a chance that we will access the previous
application data and this would leads to an abnormal
behaviour in the current application.


Mhm, I would strongly expect that we initialize the seq number after 
allocation.


It could be that we also have situations were the correct start value is 
0x or something like that instead.


Why does this matter?

Regards,
Christian.



Signed-off-by: Arunpravin Paneer Selvam 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 4b9afc4df031..9613992c9804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device *adev, u64 
*va, u64 **cpu_addr)
  void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
  {
unsigned long bit_pos;
+   u64 *cpu_addr;
  
  	bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / sizeof(u64);

-   if (bit_pos < adev->seq64.num_sem)
+   if (bit_pos < adev->seq64.num_sem) {
+   cpu_addr = bit_pos + adev->seq64.cpu_base_addr;
+   memset(cpu_addr, 0, sizeof(u64));
__clear_bit(bit_pos, adev->seq64.used);
+   }
  }
  
  /**




[PATCH] drm/amdgpu: clear seq64 memory on free

2024-04-15 Thread Arunpravin Paneer Selvam
We should clear the memory on free. Otherwise,
there is a chance that we will access the previous
application data and this would leads to an abnormal
behaviour in the current application.

Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 4b9afc4df031..9613992c9804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -189,10 +189,14 @@ int amdgpu_seq64_alloc(struct amdgpu_device *adev, u64 
*va, u64 **cpu_addr)
 void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
 {
unsigned long bit_pos;
+   u64 *cpu_addr;
 
bit_pos = (va - amdgpu_seq64_get_va_base(adev)) / sizeof(u64);
-   if (bit_pos < adev->seq64.num_sem)
+   if (bit_pos < adev->seq64.num_sem) {
+   cpu_addr = bit_pos + adev->seq64.cpu_base_addr;
+   memset(cpu_addr, 0, sizeof(u64));
__clear_bit(bit_pos, adev->seq64.used);
+   }
 }
 
 /**
-- 
2.25.1