Re: [Nouveau] [PATCH v3] nv110/exa: update sched codes

2017-06-12 Thread Samuel Pitoiset



On 06/10/2017 09:14 AM, Aaryaman Vasishta wrote:
See the 'wt' on the first fmul in exacanv110.fp, exacmnv110.fp and 
exasanv110.fp. Any ideas on what could be causing the first fmul to 
require $r0 and/or $r1?


'tex nodep $r4 $r2 0x0 0x1 t2d 0xf'

is actually:

'tex nodep $r4:$r7 $r2 0x0 0x1 t2d 0xf'

Very confusing, I know.



Cheers,
Aaryaman

On Sat, Jun 10, 2017 at 4:10 PM, Aaryaman Vasishta wrote:


This patch adds proper delays to maxwell exa shaders. rendercheck tests
seem consistent with/without this patch. I haven't extensively tested
them though.

Trello:
https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays



Signed-off-by: Aaryaman Vasishta
---
  src/shader/exac8nv110.fp  | 10 +-
  src/shader/exac8nv110.fpc | 18 +-
  src/shader/exacanv110.fp  | 10 +-
  src/shader/exacanv110.fpc | 18 +-
  src/shader/exacmnv110.fp  | 10 +-
  src/shader/exacmnv110.fpc | 18 +-
  src/shader/exas8nv110.fp  |  6 +++---
  src/shader/exas8nv110.fpc | 12 ++--
  src/shader/exasanv110.fp  | 10 +-
  src/shader/exasanv110.fpc | 18 +-
  src/shader/exascnv110.fp  |  6 +++---
  src/shader/exascnv110.fpc | 10 +-
  src/shader/videonv110.fp  | 14 +++---
  src/shader/videonv110.fpc | 26 +-
  14 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
index ce78036..101b67f 100644
--- a/src/shader/exac8nv110.fp
+++ b/src/shader/exac8nv110.fp
@@ -25,23 +25,23 @@ NV110FP_Composite_A8[] = {
  };
  #else

-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
  ipa pass $r0 a[0x7c] 0x0 0x0 0x1
  mufu rcp $r0 $r0
  ipa $r3 a[0x94] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x1 wt 0x2)
  ipa $r2 a[0x90] $r0 0x0 0x1
  tex nodep $r1 $r2 0x0 0x1 t2d 0x8
  ipa $r3 a[0x84] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf)
  ipa $r2 a[0x80] $r0 0x0 0x1
  tex nodep $r0 $r2 0x0 0x0 t2d 0x8
  depbar le 0x5 0x0 0x0
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6 wt 0x3) (st 0x1) (st 0x1)
  fmul ftz $r3 $r0 $r1
  mov $r2 $r3 0xf
  mov $r1 $r3 0xf
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x1) (st 0xf) (st 0x0)
  mov $r0 $r3 0xf
  exit
  #endif
diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc
index 4aa1368..1f7d649 100644
--- a/src/shader/exac8nv110.fpc
+++ b/src/shader/exac8nv110.fpc
@@ -1,36 +1,36 @@
-0xfc0007e0,
-0x001f8000,
+0xe1a0070f,
+0x003c3c01,
  0xcff7ff00,
  0xe003ff87,
  0x0047,
  0x5080,
  0x4007ff03,
  0xe043ff89,
-0xfc0007e0,
-0x001f8000,
+0x21e0072f,
+0x005cbc03,
  0x0007ff02,
  0xe043ff89,
  0x2ff70201,
  0xc03a0014,
  0x4007ff03,
  0xe043ff88,
-0xfc0007e0,
-0x001f8000,
+0xe5e0074f,
+0x001fbc06,
  0x0007ff02,
  0xe043ff88,
  0x2ff70200,
  0xc03a0004,
  0x3407,
  0xf0f0,
-0xfc0007e0,
-0x001f8000,
+0xfc201fe6,
+0x001f8400,
  0x00170003,
  0x5c681000,
  0x00370002,
  0x5c980780,
  0x00370001,
  0x5c980780,
-0xfc0007e0,
+0xfde007e1,
  0x001f8000,
  0x0037,
  0x5c980780,
diff --git a/src/shader/exacanv110.fp b/src/shader/exacanv110.fp
index a70d5c5..8a9bd43 100644
--- a/src/shader/exacanv110.fp
+++ b/src/shader/exacanv110.fp
@@ -25,23 +25,23 @@ NV110FP_CAComposite[] = {
  };
  #else

-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
  ipa pass $r0 a[0x7c] 0x0 0x0 0x1
  mufu rcp $r0 $r0
  ipa $r3 a[0x94] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x2)
  ipa $r2 a[0x90] $r0 0x0 0x1
  tex nodep $r4 $r2 0x0 0x1 t2d 0xf
  ipa $r1 a[0x84] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x2 wt 0x4) (st 0xf wr 0x2 wt 0x4) (st 0xf)
  ipa $r0 a[0x80] $r0 0x0 0x1
  tex nodep $r0 $r0 0x0 0x0 t2d 0xf
  depbar le 0x5 0x0 0x0
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x1 wt 0x4) (st 0x1) (st 0x1)
  fmul ftz $r3 $r3 $r7
  fmul ftz $r2 $r2 $r6
  fmul ftz $r1 $r1 $r5
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x1 wt 0x1) (st 0xf) (st 0x0)
  fmul ftz 

Re: [Nouveau] [PATCH v2] nv110/exa: update sched codes

2017-06-09 Thread Samuel Pitoiset



On 06/08/2017 05:19 PM, Aaryaman Vasishta wrote:



On Thu, Jun 8, 2017 at 5:01 AM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:




On 06/07/2017 06:58 PM, Aaryaman Vasishta wrote:



On Tue, Jun 6, 2017 at 7:15 AM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:

 Nice work!

 See my comments below, and double-check if some of them can be
 applied to the shaders I didn't review yet.

 I recommend that you test your work, because if one sched code is
 wrong, you are likely to kill your card and have to reboot
 your box. :-)


 On 06/03/2017 04:16 PM, Aaryaman Vasishta wrote:

 v2: Add missing delays

 This patch adds proper delays to maxwell exa shaders.
 rendercheck tests
 seem consistent with/without this patch. I haven't
extensively
 tested
 them though.

 Trello:

https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays


 Signed-off-by: Aaryaman Vasishta <jem456.vasis...@gmail.com>

 ---
src/shader/exac8nv110.fp  | 10 +-
src/shader/exac8nv110.fpc | 18 +-
src/shader/exacanv110.fp  | 10 +-
src/shader/exacanv110.fpc | 18 +-
src/shader/exacmnv110.fp  | 10 +-
src/shader/exacmnv110.fpc | 18 +-
src/shader/exas8nv110.fp  |  6 +++---
src/shader/exas8nv110.fpc | 12 ++--
src/shader/exasanv110.fp  | 10 +-
src/shader/exasanv110.fpc | 18 +-
src/shader/exascnv110.fp  |  6 +++---
src/shader/exascnv110.fpc | 10 +-
src/shader/videonv110.fp  | 14 +++---
src/shader/videonv110.fpc | 26 +-
14 files changed, 93 insertions(+), 93 deletions(-)

 diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
 index ce78036..1c4a4f1 100644
 --- a/src/shader/exac8nv110.fp
 +++ b/src/shader/exac8nv110.fp
 @@ -25,23 +25,23 @@ NV110FP_Composite_A8[] = {
};
#else
-sched (st 0x0) (st 0x0) (st 0x0)
 +sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
ipa pass $r0 a[0x7c] 0x0 0x0 0x1
mufu rcp $r0 $r0
ipa $r3 a[0x94] $r0 0x0 0x1
 -sched (st 0x0) (st 0x0) (st 0x0)
 +sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x1 wt 0x2)
ipa $r2 a[0x90] $r0 0x0 0x1
tex nodep $r1 $r2 0x0 0x1 t2d 0x8
ipa $r3 a[0x84] $r0 0x0 0x1
 -sched (st 0x0) (st 0x0) (st 0x0)
 +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf)
ipa $r2 a[0x80] $r0 0x0 0x1
tex nodep $r0 $r2 0x0 0x0 t2d 0x8


 Out of curiosity, why didn't you add a read-dep-bar on
 $r2:$r3 here?

Missed it, thanks for pointing it out.


You don't have to. 'tex' reads two sources ($r2:$r3) and writes into
$r0, but as $r2:$r3 are NOT re-used before $r0 is read, you can
assume that $r0 will be ready and don't need any read-dep-bar.

Ah, so $r2:$r3, which are written by the two 'ipa' above it, have
already been waited on in this tex, and both of them read $r0, so we can
safely assume that since the two 'ipa' instructions are already waited
on, $r0 will be ready?


No.

It's because the next 'fmul' waits for $r0 (the output of 'tex'). So, once
$r0 is "ready", you can assume that $r2:$r3 can be re-used. This is a
particular situation that doesn't require any read-dep-bars; you can add
them if you want, but they would be useless.









depbar le 0x5 0x0 0x0
 -sched (st 0x0) (st 0x0) (st 0x0)
  

Re: [Nouveau] [PATCH v2] nv110/exa: update sched codes

2017-06-07 Thread Samuel Pitoiset



On 06/07/2017 06:58 PM, Aaryaman Vasishta wrote:



On Tue, Jun 6, 2017 at 7:15 AM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:


Nice work!

See my comments below, and double-check if some of them can be
applied to the shaders I didn't review yet.

I recommend that you test your work, because if one sched code is
wrong, you are likely to kill your card and have to reboot your box. :-)


On 06/03/2017 04:16 PM, Aaryaman Vasishta wrote:

v2: Add missing delays

This patch adds proper delays to maxwell exa shaders.
rendercheck tests
seem consistent with/without this patch. I haven't extensively
tested
them though.

Trello:

https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays


Signed-off-by: Aaryaman Vasishta <jem456.vasis...@gmail.com>
---
   src/shader/exac8nv110.fp  | 10 +-
   src/shader/exac8nv110.fpc | 18 +-
   src/shader/exacanv110.fp  | 10 +-
   src/shader/exacanv110.fpc | 18 +-
   src/shader/exacmnv110.fp  | 10 +-
   src/shader/exacmnv110.fpc | 18 +-
   src/shader/exas8nv110.fp  |  6 +++---
   src/shader/exas8nv110.fpc | 12 ++--
   src/shader/exasanv110.fp  | 10 +-
   src/shader/exasanv110.fpc | 18 +-
   src/shader/exascnv110.fp  |  6 +++---
   src/shader/exascnv110.fpc | 10 +-
   src/shader/videonv110.fp  | 14 +++---
   src/shader/videonv110.fpc | 26 +-
   14 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
index ce78036..1c4a4f1 100644
--- a/src/shader/exac8nv110.fp
+++ b/src/shader/exac8nv110.fp
@@ -25,23 +25,23 @@ NV110FP_Composite_A8[] = {
   };
   #else
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
   ipa pass $r0 a[0x7c] 0x0 0x0 0x1
   mufu rcp $r0 $r0
   ipa $r3 a[0x94] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x1 wt 0x2)
   ipa $r2 a[0x90] $r0 0x0 0x1
   tex nodep $r1 $r2 0x0 0x1 t2d 0x8
   ipa $r3 a[0x84] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf)
   ipa $r2 a[0x80] $r0 0x0 0x1
   tex nodep $r0 $r2 0x0 0x0 t2d 0x8


Out of curiosity, why didn't you add a read-dep-bar on $r2:$r3 here?

Missed it, thanks for pointing it out.


You don't have to. 'tex' reads two sources ($r2:$r3) and writes into 
$r0, but as $r2:$r3 are NOT re-used before $r0 is read, you can assume 
that $r0 will be ready and don't need any read-dep-bar.






   depbar le 0x5 0x0 0x0
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6 wt 0x3) (st 0x6) (st 0x1)
   fmul ftz $r3 $r0 $r1
   mov $r2 $r3 0xf


You can stall for only one cycle here, but the 6 cycles on fmul are
needed.

   mov $r1 $r3 0xf
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6) (st 0xf) (st 0x0)
   mov $r0 $r3 0xf


Same here. 




   exit
   #endif
diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc
index 4aa1368..46943b7 100644
--- a/src/shader/exac8nv110.fpc
+++ b/src/shader/exac8nv110.fpc
@@ -1,36 +1,36 @@
-0xfc0007e0,
-0x001f8000,
+0xe1a0070f,
+0x003c3c01,
   0xcff7ff00,
   0xe003ff87,
   0x0047,
   0x5080,
   0x4007ff03,
   0xe043ff89,
-0xfc0007e0,
-0x001f8000,
+0x21e0072f,
+0x005cbc03,
   0x0007ff02,
   0xe043ff89,
   0x2ff70201,
   0xc03a0014,
   0x4007ff03,
   0xe043ff88,
-0xfc0007e0,
-0x001f8000,
+0xe5e0074f,
+0x001fbc06,
   0x0007ff02,
   0xe043ff88,
   0x2ff70200,
   0xc03a0004,
   0x3407,
   0xf0f0,
-0xfc0007e0,
-0x001f8000,
+0xfcc01fe6,
+0x001f8400,
   0x00170003,
   0x5c681000,
   0x00370002,
   0x5c980780,
   0x00370001,
   0x5c980780,
-0xfc0007e0,
+0xfde007e6,
   0x001f8000,
   0x0037,
   0x5c980780,
diff --git a/src/shader/exacanv110.fp b/src/shader/exacanv110.fp
index a70d5c5..d7c2867 1

Re: [Nouveau] [PATCH v2] nv110/exa: update sched codes

2017-06-05 Thread Samuel Pitoiset

Nice work!

See my comments below, and double-check if some of them can be applied 
to the shaders I didn't review yet.


I recommend that you test your work, because if one sched code is wrong,
you are likely to kill your card and have to reboot your box. :-)


On 06/03/2017 04:16 PM, Aaryaman Vasishta wrote:

v2: Add missing delays

This patch adds proper delays to maxwell exa shaders. rendercheck tests
seem consistent with/without this patch. I haven't extensively tested
them though.

Trello:
https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays

Signed-off-by: Aaryaman Vasishta 
---
  src/shader/exac8nv110.fp  | 10 +-
  src/shader/exac8nv110.fpc | 18 +-
  src/shader/exacanv110.fp  | 10 +-
  src/shader/exacanv110.fpc | 18 +-
  src/shader/exacmnv110.fp  | 10 +-
  src/shader/exacmnv110.fpc | 18 +-
  src/shader/exas8nv110.fp  |  6 +++---
  src/shader/exas8nv110.fpc | 12 ++--
  src/shader/exasanv110.fp  | 10 +-
  src/shader/exasanv110.fpc | 18 +-
  src/shader/exascnv110.fp  |  6 +++---
  src/shader/exascnv110.fpc | 10 +-
  src/shader/videonv110.fp  | 14 +++---
  src/shader/videonv110.fpc | 26 +-
  14 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
index ce78036..1c4a4f1 100644
--- a/src/shader/exac8nv110.fp
+++ b/src/shader/exac8nv110.fp
@@ -25,23 +25,23 @@ NV110FP_Composite_A8[] = {
  };
  #else
  
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
  ipa pass $r0 a[0x7c] 0x0 0x0 0x1
  mufu rcp $r0 $r0
  ipa $r3 a[0x94] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x1 wt 0x2)
  ipa $r2 a[0x90] $r0 0x0 0x1
  tex nodep $r1 $r2 0x0 0x1 t2d 0x8
  ipa $r3 a[0x84] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf)
  ipa $r2 a[0x80] $r0 0x0 0x1
  tex nodep $r0 $r2 0x0 0x0 t2d 0x8


Out of curiosity, why didn't you add a read-dep-bar on $r2:$r3 here?


  depbar le 0x5 0x0 0x0
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6 wt 0x3) (st 0x6) (st 0x1)
  fmul ftz $r3 $r0 $r1
  mov $r2 $r3 0xf


You can stall for only one cycle here, but the 6 cycles on fmul are needed.


  mov $r1 $r3 0xf
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6) (st 0xf) (st 0x0)
  mov $r0 $r3 0xf


Same here.


  exit
  #endif
diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc
index 4aa1368..46943b7 100644
--- a/src/shader/exac8nv110.fpc
+++ b/src/shader/exac8nv110.fpc
@@ -1,36 +1,36 @@
-0xfc0007e0,
-0x001f8000,
+0xe1a0070f,
+0x003c3c01,
  0xcff7ff00,
  0xe003ff87,
  0x0047,
  0x5080,
  0x4007ff03,
  0xe043ff89,
-0xfc0007e0,
-0x001f8000,
+0x21e0072f,
+0x005cbc03,
  0x0007ff02,
  0xe043ff89,
  0x2ff70201,
  0xc03a0014,
  0x4007ff03,
  0xe043ff88,
-0xfc0007e0,
-0x001f8000,
+0xe5e0074f,
+0x001fbc06,
  0x0007ff02,
  0xe043ff88,
  0x2ff70200,
  0xc03a0004,
  0x3407,
  0xf0f0,
-0xfc0007e0,
-0x001f8000,
+0xfcc01fe6,
+0x001f8400,
  0x00170003,
  0x5c681000,
  0x00370002,
  0x5c980780,
  0x00370001,
  0x5c980780,
-0xfc0007e0,
+0xfde007e6,
  0x001f8000,
  0x0037,
  0x5c980780,
diff --git a/src/shader/exacanv110.fp b/src/shader/exacanv110.fp
index a70d5c5..d7c2867 100644
--- a/src/shader/exacanv110.fp
+++ b/src/shader/exacanv110.fp
@@ -25,23 +25,23 @@ NV110FP_CAComposite[] = {
  };
  #else
  
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
  ipa pass $r0 a[0x7c] 0x0 0x0 0x1
  mufu rcp $r0 $r0
  ipa $r3 a[0x94] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x1) (st 0xf wr 0x0 wt 0x3) (st 0xf wr 0x1 rd 0x2)
  ipa $r2 a[0x90] $r0 0x0 0x1
  tex nodep $r4 $r2 0x0 0x1 t2d 0xf


Please add a read-dep-bar and wait for it on the first fmul, because $r2:$r3
are re-used before $r4. That should be safer.



  ipa $r1 a[0x84] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x2 wt 0x4) (st 0xf wr 0x1 wt 0x6) (st 0xf)
  ipa $r0 a[0x80] $r0 0x0 0x1
  tex nodep $r0 $r0 0x0 0x0 t2d 0xf
  depbar le 0x5 0x0 0x0
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x1 wt 0x3f) (st 0x1) (st 0x1)
  fmul ftz $r3 $r3 $r7


Why are you waiting on all barriers? Only $r3 is needed here.


  fmul ftz $r2 $r2 $r6
  fmul ftz $r1 $r1 $r5
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x1 wt 0x3) (st 0xf) (st 0x0)
  fmul ftz $r0 $r0 $r4
  exit
  #endif
diff --git a/src/shader/exacanv110.fpc b/src/shader/exacanv110.fpc
index 7c0ca5e..9cad139 100644
--- a/src/shader/exacanv110.fpc
+++ b/src/shader/exacanv110.fpc
@@ -1,36 +1,36 @@
-0xfc0007e0,
-0x001f8000,
+0xe1a0070f,
+0x003c3c01,
  0xcff7ff00,
  0xe003ff87,
  0x0047,
  0x5080,
  0x4007ff03,
  0xe043ff89,
-0xfc0007e0,
-0x001f8000,
+0xe1e0072f,
+0x0008bc03,
  0x0007ff02,
  0xe043ff89,
  0xaff70204,
  

Re: [Nouveau] [PATCH] nv50/ir: optimmize shl(a, 0) to a

2017-04-29 Thread Samuel Pitoiset

"optimmize" ? No need to resend just for that though.

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 04/29/2017 06:46 PM, Karol Herbst wrote:

helps two alien isolation shaders

shader-db:
total instructions in shared programs : 4251497 -> 4251494 (-0.00%)
total gprs used in shared programs: 513962 -> 513962 (0.00%)
total local used in shared programs   : 29797 -> 29797 (0.00%)
total bytes used in shared programs   : 38960264 -> 38960232 (-0.00%)

          local   gpr   inst  bytes
 helped       0     0      2      2
   hurt       0     0      0      0

Signed-off-by: Karol Herbst <karolher...@gmail.com>
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 5 +
  1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 015def0391..a2446e4df8 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1284,6 +1284,11 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
, int s)
  
 case OP_SHL:

 {
+  if (s == 1 && imm0.isInteger(0)) {
+ i->op = OP_MOV;
+ i->setSrc(1, NULL);
+ break;
+  }
if (s != 1 || i->src(0).mod != Modifier(0))
   break;
// try to concatenate shifts


___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] nvc0: support for GP10B

2017-03-30 Thread Samuel Pitoiset

How about piglit? :)

Acked-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 03/30/2017 12:05 PM, Alexandre Courbot wrote:

GP10B uses the same 3D class as GP100.

Signed-off-by: Alexandre Courbot <acour...@nvidia.com>
---
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 3e4c4f44ba92..c9042fc00447 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -903,6 +903,7 @@ nvc0_screen_create(struct nouveau_device *dev)
case 0x130:
   switch (dev->chipset) {
   case 0x130:
+  case 0x13b:
  obj_class = GP100_3D_CLASS;
  break;
   default:




Re: [Nouveau] [PATCH v2 1/7] exa: add GM10x acceleration support

2016-10-27 Thread Samuel Pitoiset

Two minor nitpicks below.

I didn't read all the shaders carefully because it's a pain, but I didn't
see any obvious problems. :-)


Thanks for the great work, Ilia!

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 10/27/2016 04:02 PM, Ilia Mirkin wrote:

rendercheck -f a8r8g8b8 passes as much as on a GK208, and xv appears to
work. Very lightly tested.

Instead of sticking coordinates into pushbufs, the vertex shader is
modified to read them from a constbuf, indexed by vertex id. This
approach could be used for all nvc0 generations, but I didn't want to
rock the boat.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/Makefile.am   |  16 
 src/nouveau_copy.c|   1 +
 src/nouveau_exa.c |   2 +-
 src/nouveau_xv.c  |   2 +-
 src/nv_accel_common.c |   1 +
 src/nv_driver.c   |   1 +
 src/nvc0_accel.c  |  37 ++---
 src/nvc0_exa.c|  48 --
 src/nvc0_xv.c |  48 --
 src/shader/Makefile   |  23 ---
 src/shader/exac8nv110.fp  |  47 +
 src/shader/exac8nv110.fpc |  38 +
 src/shader/exacanv110.fp  |  47 +
 src/shader/exacanv110.fpc |  38 +
 src/shader/exacmnv110.fp  |  47 +
 src/shader/exacmnv110.fpc |  38 +
 src/shader/exas8nv110.fp  |  42 +++
 src/shader/exas8nv110.fpc |  28 +
 src/shader/exasanv110.fp  |  47 +
 src/shader/exasanv110.fpc |  38 +
 src/shader/exascnv110.fp  |  38 +
 src/shader/exascnv110.fpc |  20 +
 src/shader/videonv110.fp  |  54 
 src/shader/videonv110.fpc |  52 +++
 src/shader/xfrm2nv110.vp  |  82 +
 src/shader/xfrm2nv110.vpc | 102 ++
 26 files changed, 918 insertions(+), 19 deletions(-)
 create mode 100644 src/shader/exac8nv110.fp
 create mode 100644 src/shader/exac8nv110.fpc
 create mode 100644 src/shader/exacanv110.fp
 create mode 100644 src/shader/exacanv110.fpc
 create mode 100644 src/shader/exacmnv110.fp
 create mode 100644 src/shader/exacmnv110.fpc
 create mode 100644 src/shader/exas8nv110.fp
 create mode 100644 src/shader/exas8nv110.fpc
 create mode 100644 src/shader/exasanv110.fp
 create mode 100644 src/shader/exasanv110.fpc
 create mode 100644 src/shader/exascnv110.fp
 create mode 100644 src/shader/exascnv110.fpc
 create mode 100644 src/shader/videonv110.fp
 create mode 100644 src/shader/videonv110.fpc
 create mode 100644 src/shader/xfrm2nv110.vp
 create mode 100644 src/shader/xfrm2nv110.vpc

diff --git a/src/Makefile.am b/src/Makefile.am
index 1e04ddf..6ba8d87 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -77,48 +77,64 @@ EXTRA_DIST = hwdefs/nv_3ddefs.xml.h \
 shader/exac8nve0.fpc \
 shader/exac8nvf0.fp \
 shader/exac8nvf0.fpc \
+shader/exac8nv110.fp \
+shader/exac8nv110.fpc \
 shader/exacanvc0.fp \
 shader/exacanvc0.fpc \
 shader/exacanve0.fp \
 shader/exacanve0.fpc \
 shader/exacanvf0.fp \
 shader/exacanvf0.fpc \
+shader/exacanv110.fp \
+shader/exacanv110.fpc \
 shader/exacmnvc0.fp \
 shader/exacmnvc0.fpc \
 shader/exacmnve0.fp \
 shader/exacmnve0.fpc \
 shader/exacmnvf0.fp \
 shader/exacmnvf0.fpc \
+shader/exacmnv110.fp \
+shader/exacmnv110.fpc \
 shader/exas8nvc0.fp \
 shader/exas8nvc0.fpc \
 shader/exas8nve0.fp \
 shader/exas8nve0.fpc \
 shader/exas8nvf0.fp \
 shader/exas8nvf0.fpc \
+shader/exas8nv110.fp \
+shader/exas8nv110.fpc \
 shader/exasanvc0.fp \
 shader/exasanvc0.fpc \
 shader/exasanve0.fp \
 shader/exasanve0.fpc \
 shader/exasanvf0.fp \
 shader/exasanvf0.fpc \
+shader/exasanv110.fp \
+shader/exasanv110.fpc \
 shader/exascnvc0.fp \
 shader/exascnvc0.fpc \
 shader/exascnve0.fp \
 shader/exascnve0.fpc \
 shader/exascnvf0.fp \
 shader/exascnvf0.fpc \
+shader/exascnv110.fp \
+shader/exascnv110.fpc \
 shader/videonvc0.fp \
 shader/videonvc0.fpc \
 shader/videonve0.fp \
 shader/videonve0.fpc \
 shader/videonvf0.fp \
 shader/videonvf0.fpc \
+shader/videonv110.fp \
+shader/videonv110.fpc \
 shader/xfrm2nvc0.vp \
 shader/xfrm2nvc0.vpc \
 shader/xfrm2nve0.vp \
 shader/xfrm2nve0.vpc \
 shader/

Re: [Nouveau] [PATCH v2 5/7] nvc0: refactor TIC uploads to allow different specifics per generation

2016-10-27 Thread Samuel Pitoiset



On 10/27/2016 07:28 PM, Ilia Mirkin wrote:

On Thu, Oct 27, 2016 at 1:19 PM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:

Are you sure this refactoring doesn't break anything?

Few comments inline.


On 10/27/2016 04:02 PM, Ilia Mirkin wrote:


This flips GM10x to using the updated format, which is what I tested
with. However GM20x and GP10x also use this TIC format.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/nvc0_accel.c | 11 ++
 src/nvc0_accel.h | 56 ++
 src/nvc0_exa.c   | 23 ---
 src/nvc0_xv.c| 67 +++-
 4 files changed, 93 insertions(+), 64 deletions(-)

diff --git a/src/nvc0_accel.c b/src/nvc0_accel.c
index 0682806..8da5051 100644
--- a/src/nvc0_accel.c
+++ b/src/nvc0_accel.c
@@ -322,6 +322,17 @@ NVAccelInit3D_NVC0(ScrnInfoPtr pScrn)
PUSH_DATA (push, (bo->offset + MISC_OFFSET) >> 32);
PUSH_DATA (push, (bo->offset + MISC_OFFSET));
PUSH_DATA (push, 1);
+   } else {
+   /* Use new TIC format. Not strictly necessary for GM20x+ */
+   IMMED_NVC0(push, SUBC_3D(0x0f10), 1);
+   if (pNv->dev->chipset >= 0x120) {
+   /* Use center sample locations. */
+   BEGIN_NVC0(push, SUBC_3D(0x11e0), 4);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   }
}

BEGIN_NVC0(push, NVC0_3D(CODE_ADDRESS_HIGH), 2);
diff --git a/src/nvc0_accel.h b/src/nvc0_accel.h
index 607e97b..959f67f 100644
--- a/src/nvc0_accel.h
+++ b/src/nvc0_accel.h
@@ -7,6 +7,7 @@
 #include "hwdefs/nvc0_m2mf.xml.h"
 #include "hwdefs/nv50_defs.xml.h"
 #include "hwdefs/nv50_texture.h"
+#include "hwdefs/gm107_texture.xml.h"
 #include "hwdefs/nv_3ddefs.xml.h"

 /* subchannel assignments, compatible with kepler's fixed layout  */
@@ -108,4 +109,59 @@ PUSH_DATAu(struct nouveau_pushbuf *push, struct nouveau_bo *bo,
}
 }

+static __inline__ void
+PUSH_TIC(struct nouveau_pushbuf *push, struct nouveau_bo *bo, unsigned offset,
+unsigned width, unsigned height, unsigned pitch, unsigned format)
+{
+   if (push->client->device->chipset < 0x110) {
+   unsigned tic2 = 0xd0001000;
+   if (pitch == 0)
+   tic2 |= 0x4000;
+   else
+   tic2 |= 0x0005c000;
+   PUSH_DATA(push, format);
+   PUSH_DATA(push, bo->offset + offset);
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+   (bo->config.nvc0.tile_mode << 18) |
+   tic2);
+   PUSH_DATA(push, 0x0030);
+   PUSH_DATA(push, 0x8000 | width);
+   PUSH_DATA(push, 0x0001 | height);
+   PUSH_DATA(push, 0x0300);
+   PUSH_DATA(push, 0x);
+   } else {
+   unsigned tile_mode = bo->config.nvc0.tile_mode;
+   PUSH_DATA(push, (format & 0x3f) | ((format & ~0x3f) << 1));
+   PUSH_DATA(push, bo->offset + offset);
+   if (pitch == 0) {
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+ GM107_TIC2_2_HEADER_VERSION_BLOCKLINEAR);
+   PUSH_DATA(push, GM107_TIC2_3_LOD_ANISO_QUALITY_2 |
+ ((tile_mode & 0x007)) |
+ ((tile_mode & 0x070) >> (4 - 3)) |
+ ((tile_mode & 0x700) >> (8 - 6)));
+   PUSH_DATA(push, GM107_TIC2_4_SECTOR_PROMOTION_PROMOTE_TO_2_V |
+ GM107_TIC2_4_BORDER_SIZE_SAMPLER_COLOR |
+ GM107_TIC2_4_TEXTURE_TYPE_TWO_D |
+ (width - 1));
+   PUSH_DATA(push, GM107_TIC2_5_NORMALIZED_COORDS |
+   ((height - 1) & 0x));
+   PUSH_DATA(push, GM107_TIC2_6_ANISO_FINE_SPREAD_FUNC_TWO |
+   GM107_TIC2_6_ANISO_COARSE_SPREAD_FUNC_ONE);
+   PUSH_DATA(push, 0x);
+   } else {
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+   GM107_TIC2_2_HEADER_VERSION_PITCH);
+   PUSH_DATA(push, GM107_TIC2_3_LOD_ANISO_QUALITY_2 |
+   (pitch >> 5));
+   PUSH_DATA(push, GM107_TIC2_4_BORDER_SIZE_SAMPLER_COLOR |
+ GM107_TIC2_4_TEXTURE_TYPE_TWO_D_NO_MIPMAP |
+ (width - 1));
+

Re: [Nouveau] [PATCH v2 5/7] nvc0: refactor TIC uploads to allow different specifics per generation

2016-10-27 Thread Samuel Pitoiset

Are you sure this refactoring doesn't break anything?

Few comments inline.

On 10/27/2016 04:02 PM, Ilia Mirkin wrote:

This flips GM10x to using the updated format, which is what I tested
with. However GM20x and GP10x also use this TIC format.

Signed-off-by: Ilia Mirkin 
---
 src/nvc0_accel.c | 11 ++
 src/nvc0_accel.h | 56 ++
 src/nvc0_exa.c   | 23 ---
 src/nvc0_xv.c| 67 +++-
 4 files changed, 93 insertions(+), 64 deletions(-)

diff --git a/src/nvc0_accel.c b/src/nvc0_accel.c
index 0682806..8da5051 100644
--- a/src/nvc0_accel.c
+++ b/src/nvc0_accel.c
@@ -322,6 +322,17 @@ NVAccelInit3D_NVC0(ScrnInfoPtr pScrn)
PUSH_DATA (push, (bo->offset + MISC_OFFSET) >> 32);
PUSH_DATA (push, (bo->offset + MISC_OFFSET));
PUSH_DATA (push, 1);
+   } else {
+   /* Use new TIC format. Not strictly necessary for GM20x+ */
+   IMMED_NVC0(push, SUBC_3D(0x0f10), 1);
+   if (pNv->dev->chipset >= 0x120) {
+   /* Use center sample locations. */
+   BEGIN_NVC0(push, SUBC_3D(0x11e0), 4);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   }
}

BEGIN_NVC0(push, NVC0_3D(CODE_ADDRESS_HIGH), 2);
diff --git a/src/nvc0_accel.h b/src/nvc0_accel.h
index 607e97b..959f67f 100644
--- a/src/nvc0_accel.h
+++ b/src/nvc0_accel.h
@@ -7,6 +7,7 @@
 #include "hwdefs/nvc0_m2mf.xml.h"
 #include "hwdefs/nv50_defs.xml.h"
 #include "hwdefs/nv50_texture.h"
+#include "hwdefs/gm107_texture.xml.h"
 #include "hwdefs/nv_3ddefs.xml.h"

 /* subchannel assignments, compatible with kepler's fixed layout  */
@@ -108,4 +109,59 @@ PUSH_DATAu(struct nouveau_pushbuf *push, struct nouveau_bo *bo,
}
 }

+static __inline__ void
+PUSH_TIC(struct nouveau_pushbuf *push, struct nouveau_bo *bo, unsigned offset,
+unsigned width, unsigned height, unsigned pitch, unsigned format)
+{
+   if (push->client->device->chipset < 0x110) {
+   unsigned tic2 = 0xd0001000;
+   if (pitch == 0)
+   tic2 |= 0x4000;
+   else
+   tic2 |= 0x0005c000;
+   PUSH_DATA(push, format);
+   PUSH_DATA(push, bo->offset + offset);
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+   (bo->config.nvc0.tile_mode << 18) |
+   tic2);
+   PUSH_DATA(push, 0x0030);
+   PUSH_DATA(push, 0x8000 | width);
+   PUSH_DATA(push, 0x0001 | height);
+   PUSH_DATA(push, 0x0300);
+   PUSH_DATA(push, 0x);
+   } else {
+   unsigned tile_mode = bo->config.nvc0.tile_mode;
+   PUSH_DATA(push, (format & 0x3f) | ((format & ~0x3f) << 1));
+   PUSH_DATA(push, bo->offset + offset);
+   if (pitch == 0) {
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+ GM107_TIC2_2_HEADER_VERSION_BLOCKLINEAR);
+   PUSH_DATA(push, GM107_TIC2_3_LOD_ANISO_QUALITY_2 |
+ ((tile_mode & 0x007)) |
+ ((tile_mode & 0x070) >> (4 - 3)) |
+ ((tile_mode & 0x700) >> (8 - 6)));
+   PUSH_DATA(push, GM107_TIC2_4_SECTOR_PROMOTION_PROMOTE_TO_2_V |
+ GM107_TIC2_4_BORDER_SIZE_SAMPLER_COLOR |
+ GM107_TIC2_4_TEXTURE_TYPE_TWO_D |
+ (width - 1));
+   PUSH_DATA(push, GM107_TIC2_5_NORMALIZED_COORDS |
+   ((height - 1) & 0x));
+   PUSH_DATA(push, GM107_TIC2_6_ANISO_FINE_SPREAD_FUNC_TWO |
+   GM107_TIC2_6_ANISO_COARSE_SPREAD_FUNC_ONE);
+   PUSH_DATA(push, 0x);
+   } else {
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+   GM107_TIC2_2_HEADER_VERSION_PITCH);
+   PUSH_DATA(push, GM107_TIC2_3_LOD_ANISO_QUALITY_2 |
+   (pitch >> 5));
+   PUSH_DATA(push, GM107_TIC2_4_BORDER_SIZE_SAMPLER_COLOR |
+ GM107_TIC2_4_TEXTURE_TYPE_TWO_D_NO_MIPMAP |
+ (width - 1));
+   PUSH_DATA(push, GM107_TIC2_5_NORMALIZED_COORDS | (height - 1));
+   PUSH_DATA(push, 0x0);
+   PUSH_DATA(push, 0x0);
+   }

Re: [Nouveau] [PATCH v2 7/7] recognize and accelerate GM20x

2016-10-27 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 10/27/2016 04:03 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/nv_driver.c  |  2 ++
 src/nvc0_accel.c | 10 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/nv_driver.c b/src/nv_driver.c
index fff83f8..61940a8 100644
--- a/src/nv_driver.c
+++ b/src/nv_driver.c
@@ -390,6 +390,7 @@ NVHasKMS(struct pci_device *pci_dev, struct xf86_platform_device *platform_dev)
case 0xf0:
case 0x100:
case 0x110:
+   case 0x120:
break;
default:
xf86DrvMsg(-1, X_ERROR, "Unknown chipset: NV%02X\n", chipset);
@@ -941,6 +942,7 @@ NVPreInit(ScrnInfoPtr pScrn, int flags)
pNv->Architecture = NV_KEPLER;
break;
case 0x110:
+   case 0x120:
pNv->Architecture = NV_MAXWELL;
break;
default:
diff --git a/src/nvc0_accel.c b/src/nvc0_accel.c
index d0a835e..6c2bae8 100644
--- a/src/nvc0_accel.c
+++ b/src/nvc0_accel.c
@@ -244,9 +244,17 @@ NVAccelInit3D_NVC0(ScrnInfoPtr pScrn)
} else if (pNv->dev->chipset < 0x110) {
class  = 0xa197;
handle = 0x906e;
-   } else {
+   } else if (pNv->dev->chipset < 0x120) {
class  = 0xb097;
handle = 0x906e;
+   } else if (pNv->dev->chipset < 0x130) {
+   class  = 0xb197;
+   handle = 0x906e;
+   } else {
+   xf86DrvMsg(pScrn->scrnIndex, X_INFO,
+  "No 3D acceleration support for NV%X\n",
+  pNv->dev->chipset);
+   return FALSE;
}

ret = nouveau_object_new(pNv->channel, class, class,




Re: [Nouveau] [PATCH v2 6/7] copy: add maxwell/pascal copy engine classes

2016-10-27 Thread Samuel Pitoiset

0xc0b5 is not in rnndb, I guess it should be GP100_COPY, right?

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 10/27/2016 04:02 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/nouveau_copy.c |  2 ++
 src/nvc0_accel.c   | 10 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/nouveau_copy.c b/src/nouveau_copy.c
index c139de6..7118a7a 100644
--- a/src/nouveau_copy.c
+++ b/src/nouveau_copy.c
@@ -42,6 +42,8 @@ nouveau_copy_init(ScreenPtr pScreen)
int engine;
Bool (*init)(NVPtr);
} methods[] = {
+   { 0xc0b5, 0, nouveau_copya0b5_init },
+   { 0xb0b5, 0, nouveau_copya0b5_init },
{ 0xa0b5, 0, nouveau_copya0b5_init },
{ 0x90b8, 5, nouveau_copy90b5_init },
{ 0x90b5, 4, nouveau_copy90b5_init },
diff --git a/src/nvc0_accel.c b/src/nvc0_accel.c
index 8da5051..d0a835e 100644
--- a/src/nvc0_accel.c
+++ b/src/nvc0_accel.c
@@ -156,9 +156,17 @@ NVAccelInitCOPY_NVE0(ScrnInfoPtr pScrn)
 {
NVPtr pNv = NVPTR(pScrn);
struct nouveau_pushbuf *push = pNv->pushbuf;
+   uint32_t class;
int ret;

-   ret = nouveau_object_new(pNv->channel, 0xa0b5, 0xa0b5,
+   if (pNv->dev->chipset < 0x110)
+   class = 0xa0b5;
+   else if (pNv->dev->chipset < 0x130)
+   class = 0xb0b5;
+   else
+   class = 0xc0b5;
+
+   ret = nouveau_object_new(pNv->channel, class, class,
 NULL, 0, &pNv->NvCOPY);
if (ret)
return FALSE;
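
The copy-engine class selection added above boils down to a three-way threshold check. A minimal sketch of the same ladder as a pure function follows; `pick_copy_class` is a hypothetical name used only for illustration, and the class values are the ones from the hunk.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the ladder added to NVAccelInitCOPY_NVE0.
 * It is not part of the driver; it only restates the thresholds.
 */
static uint32_t pick_copy_class(uint32_t chipset)
{
    if (chipset < 0x110)
        return 0xa0b5;  /* Kepler copy class */
    else if (chipset < 0x130)
        return 0xb0b5;  /* Maxwell copy class */
    return 0xc0b5;      /* Pascal copy class (the one noted as missing from rnndb) */
}
```

Note that 0x110 and 0x120 share 0xb0b5 here, unlike the 3D class, which differs between the two.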




Re: [Nouveau] [PATCH 4/5] nvc0: refactor TIC uploads to allow different specifies per generation

2016-10-17 Thread Samuel Pitoiset



On 10/17/2016 02:24 PM, Ilia Mirkin wrote:

On Mon, Oct 17, 2016 at 5:46 AM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:

A few comments below.

On 10/16/2016 09:14 PM, Ilia Mirkin wrote:


This flips GM10x to using the updated format, which is what I tested
with. However GM20x and GP10x also use this TIC format.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/nvc0_accel.c | 11 ++
 src/nvc0_accel.h | 56 ++
 src/nvc0_exa.c   | 22 ---
 src/nvc0_xv.c| 67
+++-
 4 files changed, 93 insertions(+), 63 deletions(-)

diff --git a/src/nvc0_accel.c b/src/nvc0_accel.c
index 0682806..8da5051 100644
--- a/src/nvc0_accel.c
+++ b/src/nvc0_accel.c
@@ -322,6 +322,17 @@ NVAccelInit3D_NVC0(ScrnInfoPtr pScrn)
PUSH_DATA (push, (bo->offset + MISC_OFFSET) >> 32);
PUSH_DATA (push, (bo->offset + MISC_OFFSET));
PUSH_DATA (push, 1);
+   } else {
+   /* Use new TIC format. Not strictly necessary for GM20x+
*/



Yes, but it's also enabled by default in mesa, looks fine.



+   IMMED_NVC0(push, SUBC_3D(0x0f10), 1);
+   if (pNv->dev->chipset >= 0x120) {
+   /* Use center sample locations. */
+   BEGIN_NVC0(push, SUBC_3D(0x11e0), 4);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   }
}

BEGIN_NVC0(push, NVC0_3D(CODE_ADDRESS_HIGH), 2);
diff --git a/src/nvc0_accel.h b/src/nvc0_accel.h
index 607e97b..9378236 100644
--- a/src/nvc0_accel.h
+++ b/src/nvc0_accel.h
@@ -7,6 +7,7 @@
 #include "hwdefs/nvc0_m2mf.xml.h"
 #include "hwdefs/nv50_defs.xml.h"
 #include "hwdefs/nv50_texture.h"
+#include "hwdefs/gm107_texture.xml.h"
 #include "hwdefs/nv_3ddefs.xml.h"

 /* subchannel assignments, compatible with kepler's fixed layout  */
@@ -108,4 +109,59 @@ PUSH_DATAu(struct nouveau_pushbuf *push, struct
nouveau_bo *bo,
}
 }

+static __inline__ void
+PUSH_TIC(struct nouveau_pushbuf *push, struct nouveau_bo *bo, unsigned
offset,
+unsigned width, unsigned height, unsigned pitch, unsigned format)
+{
+   if (push->client->device->chipset < 0x110) {
+   unsigned tic2 = 0xd0001000;
+   if (pitch == 0)
+   tic2 |= 0x4000;
+   else
+   tic2 |= 0x0005c000;
+   PUSH_DATA(push, format);
+   PUSH_DATA(push, bo->offset + offset);
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+   (bo->config.nvc0.tile_mode << 18) |
+   tic2);
+   PUSH_DATA(push, 0x0030);
+   PUSH_DATA(push, 0x8000 | width);
+   PUSH_DATA(push, 0x0001 | height);
+   PUSH_DATA (push, 0x0300);
+   PUSH_DATA (push, 0x);



Cosmetic.


Oops, will fix.





+   } else {
+   unsigned tile_mode = bo->config.nvc0.tile_mode;
+   PUSH_DATA(push, (format & 0x3f) | ((format & ~0x3f) <<
1));
+   PUSH_DATA(push, bo->offset + offset);
+   if (pitch == 0) {
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+
GM107_TIC2_2_HEADER_VERSION_BLOCKLINEAR);
+   PUSH_DATA(push, GM107_TIC2_3_LOD_ANISO_QUALITY_2 |
+ (tile_mode & 0x007) |
+ ((tile_mode & 0x070) >> (4 - 3)) |
+ ((tile_mode & 0x700) >> (8 - 6)));
+   PUSH_DATA(push,
GM107_TIC2_4_SECTOR_PROMOTION_PROMOTE_TO_2_V |
+ GM107_TIC2_4_BORDER_SIZE_SAMPLER_COLOR |
+ GM107_TIC2_4_TEXTURE_TYPE_TWO_D |
+ (width - 1));
+   PUSH_DATA(push, GM107_TIC2_5_NORMALIZED_COORDS |
+   ((height - 1) & 0x));
+   PUSH_DATA(push,
GM107_TIC2_6_ANISO_FINE_SPREAD_FUNC_TWO |
+
GM107_TIC2_6_ANISO_COARSE_SPREAD_FUNC_ONE);
+   PUSH_DATA(push, 0x);
+   } else {
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+
GM107_TIC2_2_HEADER_VERSION_PITCH);
+   PUSH_DATA(push, GM107_TIC2_3_LOD_ANISO_QUALITY_2 |
+   (pitch >> 5));
+   PUSH_DATA(push,
GM107_TIC2_4_BORDER_SIZE_SAMPLER_COLOR |
+
GM107_TIC2_4_TEXTURE_TYPE_TWO_D_NO_MIPMAP |
+ (width

Re: [Nouveau] [PATCH] exa: add GM10x acceleration support

2016-10-17 Thread Samuel Pitoiset



On 10/17/2016 02:27 PM, Ilia Mirkin wrote:

On Mon, Oct 17, 2016 at 5:28 AM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:

Looks reasonable, some minor comments below.


On 10/16/2016 02:06 AM, Ilia Mirkin wrote:

diff --git a/src/nvc0_exa.c b/src/nvc0_exa.c
index 6add60b..a53dfe6 100644
--- a/src/nvc0_exa.c
+++ b/src/nvc0_exa.c
@@ -914,14 +914,56 @@ NVC0EXAComposite(PixmapPtr pdpix,
if (!PUSH_SPACE(push, 64))
return;

+   if (pNv->dev->chipset >= 0x110) {
+   BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
+   PUSH_DATA (push, 256);
+   PUSH_DATA (push, (pNv->scratch->offset + PVP_DATA) >> 32);



No PUSH_DATAh in the DDX?


Nope. Didn't feel the burning need to add a helper either.


Fine by me.
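
For readers who do not know the helper being discussed: in mesa, PUSH_DATAh pushes the upper 32 bits of a 64-bit address, which is what the `>> 32` in the hunk does by hand. A minimal sketch of that split is below; `addr_high` and `addr_low` are hypothetical names for illustration and are not proposed additions to the DDX.

```c
#include <stdint.h>

/* Hypothetical helper: upper 32 bits of a 64-bit GPU address. */
static uint32_t addr_high(uint64_t addr)
{
    return (uint32_t)(addr >> 32);
}

/* Hypothetical helper: lower 32 bits of a 64-bit GPU address. */
static uint32_t addr_low(uint64_t addr)
{
    return (uint32_t)addr;
}
```

The two words are pushed high first, then low, as in the CB_SIZE sequence quoted above.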




 $(filter %nvc0.vpc,$(SHADERS)): %.vpc: %.vp
-   cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m nvc0 -o $@
+   cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m gf100 -V gf100
-o $@
 $(filter %nvc0.fpc,$(SHADERS)): %.fpc: %.fp
-   cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m nvc0 -o $@
+   cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m gf100 -V gf100
-o $@

 $(filter %nve0.vpc,$(SHADERS)): %.vpc: %.vp
-   cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m nvc0 -V nve4 -o
$@
+   cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m gf100 -V gk104
-o $@
 $(filter %nve0.fpc,$(SHADERS)): %.fpc: %.fp
-   cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m nvc0 -V nve4 -o
$@
+   cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m gf100 -V gk104
-o $@



This is unrelated to your main change, but, well, it should be *exactly* the same
thing. :)


You mean the bit about me adding -V gf100? Figured I'd fix it up while
I was at it. The machine/variant names changed though.


Yeah, I won't ask for a separate patch anyways. :-)








 $(filter %nvf0.vpc,$(SHADERS)): %.vpc: %.vp
cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m gk110 -o $@
 $(filter %nvf0.fpc,$(SHADERS)): %.fpc: %.fp
cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m gk110 -o $@
+
+$(filter %nv110.vpc,$(SHADERS)): %.vpc: %.vp
+   cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m gm107 -o $@
+$(filter %nv110.fpc,$(SHADERS)): %.fpc: %.fp
+   cpp -DENVYAS $< | sed -e '/^#/d' | $(ENVYAS) -w -m gm107 -o $@
diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
new file mode 100644
index 0000000..ce78036
--- /dev/null
+++ b/src/shader/exac8nv110.fp
@@ -0,0 +1,47 @@
+#ifndef ENVYAS
+static uint32_t
+NV110FP_Composite_A8[] = {
+   0x1462,
+   0x,
+   0x,
+   0x,
+   0x,
+   0x8000,
+   0x0a0a,
+   0x,
+   0x,
+   0x,
+   0x,
+   0x,
+   0x,
+   0x,
+   0x,
+   0x,
+   0x,
+   0x,
+   0x000f,
+   0x,
+#include "exac8nv110.fpc"
+};
+#else
+
+sched (st 0x0) (st 0x0) (st 0x0)



Those sched codes are definitely bad, but let's keep them as is for now. I
might have a look at some point to improve them.


Yeah, way wrong. However it's what our compiler would produce. You can
use this as a proving ground for your various theories. All simple
shaders though, no control flow. Only complex thing is textures.

  -ilia



--
-Samuel


Re: [Nouveau] [PATCH 5/5] recognize and accelerate GM20x

2016-10-17 Thread Samuel Pitoiset

This requires at least a quick test. :-)

Acked-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 10/16/2016 09:14 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

Untested. I don't have the hardware.

 src/nv_driver.c  |  2 ++
 src/nvc0_accel.c | 10 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/nv_driver.c b/src/nv_driver.c
index fff83f8..61940a8 100644
--- a/src/nv_driver.c
+++ b/src/nv_driver.c
@@ -390,6 +390,7 @@ NVHasKMS(struct pci_device *pci_dev, struct xf86_platform_device *platform_dev)
case 0xf0:
case 0x100:
case 0x110:
+   case 0x120:
break;
default:
xf86DrvMsg(-1, X_ERROR, "Unknown chipset: NV%02X\n", chipset);
@@ -941,6 +942,7 @@ NVPreInit(ScrnInfoPtr pScrn, int flags)
pNv->Architecture = NV_KEPLER;
break;
case 0x110:
+   case 0x120:
pNv->Architecture = NV_MAXWELL;
break;
default:
diff --git a/src/nvc0_accel.c b/src/nvc0_accel.c
index 8da5051..996fb88 100644
--- a/src/nvc0_accel.c
+++ b/src/nvc0_accel.c
@@ -236,9 +236,17 @@ NVAccelInit3D_NVC0(ScrnInfoPtr pScrn)
} else if (pNv->dev->chipset < 0x110) {
class  = 0xa197;
handle = 0x906e;
-   } else {
+   } else if (pNv->dev->chipset < 0x120) {
class  = 0xb097;
handle = 0x906e;
+   } else if (pNv->dev->chipset < 0x130) {
+   class  = 0xb197;
+   handle = 0x906e;
+   } else {
+   xf86DrvMsg(pScrn->scrnIndex, X_INFO,
+  "No 3D acceleration support for NV%X\n",
+  pNv->dev->chipset);
+   return FALSE;
}

ret = nouveau_object_new(pNv->channel, class, class,



--
-Samuel


Re: [Nouveau] [PATCH 4/5] nvc0: refactor TIC uploads to allow different specifies per generation

2016-10-17 Thread Samuel Pitoiset

A few comments below.

On 10/16/2016 09:14 PM, Ilia Mirkin wrote:

This flips GM10x to using the updated format, which is what I tested
with. However GM20x and GP10x also use this TIC format.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/nvc0_accel.c | 11 ++
 src/nvc0_accel.h | 56 ++
 src/nvc0_exa.c   | 22 ---
 src/nvc0_xv.c| 67 +++-
 4 files changed, 93 insertions(+), 63 deletions(-)

diff --git a/src/nvc0_accel.c b/src/nvc0_accel.c
index 0682806..8da5051 100644
--- a/src/nvc0_accel.c
+++ b/src/nvc0_accel.c
@@ -322,6 +322,17 @@ NVAccelInit3D_NVC0(ScrnInfoPtr pScrn)
PUSH_DATA (push, (bo->offset + MISC_OFFSET) >> 32);
PUSH_DATA (push, (bo->offset + MISC_OFFSET));
PUSH_DATA (push, 1);
+   } else {
+   /* Use new TIC format. Not strictly necessary for GM20x+ */


Yes, but it's also enabled by default in mesa, looks fine.


+   IMMED_NVC0(push, SUBC_3D(0x0f10), 1);
+   if (pNv->dev->chipset >= 0x120) {
+   /* Use center sample locations. */
+   BEGIN_NVC0(push, SUBC_3D(0x11e0), 4);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   PUSH_DATA (push, 0x);
+   }
}

BEGIN_NVC0(push, NVC0_3D(CODE_ADDRESS_HIGH), 2);
diff --git a/src/nvc0_accel.h b/src/nvc0_accel.h
index 607e97b..9378236 100644
--- a/src/nvc0_accel.h
+++ b/src/nvc0_accel.h
@@ -7,6 +7,7 @@
 #include "hwdefs/nvc0_m2mf.xml.h"
 #include "hwdefs/nv50_defs.xml.h"
 #include "hwdefs/nv50_texture.h"
+#include "hwdefs/gm107_texture.xml.h"
 #include "hwdefs/nv_3ddefs.xml.h"

 /* subchannel assignments, compatible with kepler's fixed layout  */
@@ -108,4 +109,59 @@ PUSH_DATAu(struct nouveau_pushbuf *push, struct nouveau_bo *bo,
}
 }

+static __inline__ void
+PUSH_TIC(struct nouveau_pushbuf *push, struct nouveau_bo *bo, unsigned offset,
+unsigned width, unsigned height, unsigned pitch, unsigned format)
+{
+   if (push->client->device->chipset < 0x110) {
+   unsigned tic2 = 0xd0001000;
+   if (pitch == 0)
+   tic2 |= 0x4000;
+   else
+   tic2 |= 0x0005c000;
+   PUSH_DATA(push, format);
+   PUSH_DATA(push, bo->offset + offset);
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+   (bo->config.nvc0.tile_mode << 18) |
+   tic2);
+   PUSH_DATA(push, 0x0030);
+   PUSH_DATA(push, 0x8000 | width);
+   PUSH_DATA(push, 0x0001 | height);
+   PUSH_DATA (push, 0x0300);
+   PUSH_DATA (push, 0x);


Cosmetic.


+   } else {
+   unsigned tile_mode = bo->config.nvc0.tile_mode;
+   PUSH_DATA(push, (format & 0x3f) | ((format & ~0x3f) << 1));
+   PUSH_DATA(push, bo->offset + offset);
+   if (pitch == 0) {
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+ GM107_TIC2_2_HEADER_VERSION_BLOCKLINEAR);
+   PUSH_DATA(push, GM107_TIC2_3_LOD_ANISO_QUALITY_2 |
+ (tile_mode & 0x007) |
+ ((tile_mode & 0x070) >> (4 - 3)) |
+ ((tile_mode & 0x700) >> (8 - 6)));
+   PUSH_DATA(push, GM107_TIC2_4_SECTOR_PROMOTION_PROMOTE_TO_2_V |
+ GM107_TIC2_4_BORDER_SIZE_SAMPLER_COLOR |
+ GM107_TIC2_4_TEXTURE_TYPE_TWO_D |
+ (width - 1));
+   PUSH_DATA(push, GM107_TIC2_5_NORMALIZED_COORDS |
+   ((height - 1) & 0x));
+   PUSH_DATA(push, GM107_TIC2_6_ANISO_FINE_SPREAD_FUNC_TWO |
+   GM107_TIC2_6_ANISO_COARSE_SPREAD_FUNC_ONE);
+   PUSH_DATA(push, 0x);
+   } else {
+   PUSH_DATA(push, ((bo->offset + offset) >> 32) |
+   GM107_TIC2_2_HEADER_VERSION_PITCH);
+   PUSH_DATA(push, GM107_TIC2_3_LOD_ANISO_QUALITY_2 |
+   (pitch >> 5));
+   PUSH_DATA(push, GM107_TIC2_4_BORDER_SIZE_SAMPLER_COLOR |
+ GM107_TIC2_4_TEXTURE_TYPE_TWO_D_NO_MIPMAP |
+ (width - 1));
+   PUSH_DATA(push, GM107_TIC2_5_NORMALIZED_COORDS | (height - 1));
+   PUSH_DATA(push, 0x0);
+   PUSH_DATA(push, 0x0);
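
The block-linear branch above repacks `tile_mode` from one nibble per axis into three adjacent 3-bit fields. Since `>>` binds more tightly than `&` in C, each masked term needs its own parentheses for the shift to apply to the masked value. The sketch below shows that packing in isolation; `pack_gobs_per_block` is a hypothetical name, and the bit layout is inferred from the masks and shifts in the hunk rather than taken from the hardware headers.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only. Takes the low 3 bits of each
 * nibble of a Fermi-style tile_mode and packs them into bits 0-2, 3-5 and
 * 6-8 of the result.
 */
static uint32_t pack_gobs_per_block(uint32_t tile_mode)
{
    return (tile_mode & 0x007) |
           ((tile_mode & 0x070) >> (4 - 3)) |
           ((tile_mode & 0x700) >> (8 - 6));
}
```

For example, a `tile_mode` of 0x040 maps to 0x20. Without the inner parentheses, the middle term would evaluate as `tile_mode & 0x38` and give 0 for the same input.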

Re: [Nouveau] [PATCH 3/5] nvc0: rename BEGIN_IMC0 to IMMED_NVC0

2016-10-17 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 10/16/2016 09:14 PM, Ilia Mirkin wrote:

For consistency with mesa. It wasn't used anywhere previously.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/nouveau_local.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/nouveau_local.h b/src/nouveau_local.h
index 3de69a2..dd49395 100644
--- a/src/nouveau_local.h
+++ b/src/nouveau_local.h
@@ -237,7 +237,7 @@ BEGIN_NIC0(struct nouveau_pushbuf *push, int subc, int mthd, int size)
 }

 static inline void
-BEGIN_IMC0(struct nouveau_pushbuf *push, int subc, int mthd, int data)
+IMMED_NVC0(struct nouveau_pushbuf *push, int subc, int mthd, int data)
 {
PUSH_DATA (push, 0x80000000 | (data << 16) | (subc << 13) | (mthd / 4));
 }
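
The header word that IMMED_NVC0 pushes can be looked at in isolation. The sketch below assumes the immediate-data flag is bit 31 (0x80000000); `immed_header` is a hypothetical name, and the function only computes the word instead of writing it to a pushbuf.

```c
#include <stdint.h>

/*
 * Hypothetical standalone version of the word built by IMMED_NVC0.
 * Assumption: the immediate-data flag is 0x80000000 (bit 31).
 *   bits 16 and up : inline data
 *   bits 13..15    : subchannel
 *   low bits       : method offset in 32-bit words (byte offset / 4)
 */
static uint32_t immed_header(uint32_t subc, uint32_t mthd, uint32_t data)
{
    return 0x80000000u | (data << 16) | (subc << 13) | (mthd / 4);
}
```

For the call used elsewhere in this series, `IMMED_NVC0(push, SUBC_3D(0x0f10), 1)`, this would give 0x800103c4 under the stated assumption.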



--
-Samuel


Re: [Nouveau] [PATCH 1/5] hwdefs: update nvc0_3d, add gm107_texture for new TIC format

2016-10-17 Thread Samuel Pitoiset

Acked-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 10/16/2016 09:14 PM, Ilia Mirkin wrote:

These are copied directly from the mesa repository.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/hwdefs/gm107_texture.xml.h | 365 +
 src/hwdefs/nvc0_3d.xml.h   | 867 +
 2 files changed, 892 insertions(+), 340 deletions(-)
 create mode 100644 src/hwdefs/gm107_texture.xml.h

diff --git a/src/hwdefs/gm107_texture.xml.h b/src/hwdefs/gm107_texture.xml.h
new file mode 100644
index 0000000..a4bc380
--- /dev/null
+++ b/src/hwdefs/gm107_texture.xml.h
@@ -0,0 +1,365 @@
+#ifndef GM107_TEXTURE_XML
+#define GM107_TEXTURE_XML
+
+/* Autogenerated file, DO NOT EDIT manually!
+
+This file was generated by the rules-ng-ng headergen tool in this git repository:
+http://github.com/envytools/envytools/
+git clone https://github.com/envytools/envytools.git
+
+The rules-ng-ng source files this header was generated from are:
+- /home/skeggsb/git/envytools/rnndb/../rnndb/graph/gm107_texture.xml (  22057 bytes, from 2016-02-12 03:01:43)
+- /home/skeggsb/git/envytools/rnndb/copyright.xml(   6456 bytes, from 2015-09-10 02:57:40)
+- /home/skeggsb/git/envytools/rnndb/nvchipsets.xml   (   2908 bytes, from 2016-02-04 22:19:11)
+- /home/skeggsb/git/envytools/rnndb/g80_defs.xml (  21739 bytes, from 2016-02-04 00:29:42)
+
+Copyright (C) 2006-2016 by the following authors:
+- Artur Huillet <arthur.huil...@free.fr> (ahuillet)
+- Ben Skeggs (darktama, darktama_)
+- B. R. <koala...@users.sourceforge.net> (koala_br)
+- Carlos Martin <carlo...@users.sf.net> (carlosmn)
+- Christoph Bumiller <e0425...@student.tuwien.ac.at> (calim, chrisbmr)
+- Dawid Gajownik <gajow...@users.sf.net> (gajownik)
+- Dmitry Baryshkov
+- Dmitry Eremin-Solenikov <lu...@users.sf.net> (lumag)
+- EdB <e...@users.sf.net> (edb_)
+- Erik Waling <erikwail...@users.sf.net> (erikwaling)
+- Francisco Jerez <curroje...@riseup.net> (curro)
+- Ilia Mirkin <imir...@alum.mit.edu> (imirkin)
+- jb17bsome <jb17bs...@bellsouth.net> (jb17bsome)
+- Jeremy Kolb <kjer...@users.sf.net> (kjeremy)
+- Laurent Carlier <lordhea...@gmail.com> (lordheavy)
+- Luca Barbieri <l...@luca-barbieri.com> (lb, lb1)
+- Maarten Maathuis <madman2...@gmail.com> (stillunknown)
+- Marcin Kościelnicki <koria...@0x04.net> (mwk, koriakin)
+- Mark Carey <mark.ca...@gmail.com> (careym)
+- Matthieu Castet <matthieu.cas...@parrot.com> (mat-c)
+- nvidiaman <nvidia...@users.sf.net> (nvidiaman)
+- Patrice Mandin <patman...@gmail.com> (pmandin, pmdata)
+- Pekka Paalanen <p...@iki.fi> (pq, ppaalanen)
+- Peter Popov <ironpe...@users.sf.net> (ironpeter)
+- Richard Hughes <hughsi...@users.sf.net> (hughsient)
+- Rudi Cilibrasi <cilib...@users.sf.net> (cilibrar)
+- Serge Martin
+- Simon Raffeiner
+- Stephane Loeuillet <lerout...@users.sf.net> (leroutier)
+- Stephane Marchesin <stephane.marche...@gmail.com> (marcheu)
+- sturmflut <sturmf...@users.sf.net> (sturmflut)
+- Sylvain Munaut <t...@246tnt.com>
+- Victor Stinner <victor.stin...@haypocalc.com> (haypo)
+- Wladmir van der Laan <laa...@gmail.com> (miathan6)
+- Younes Manton <youne...@gmail.com> (ymanton)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice (including the
+next paragraph) shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+
+#define GM107_TIC2__SIZE   0x0020
+#define GM107_TIC2_0   0x
+#define GM107_TIC2_0_COMPONENTS_SIZES__MASK0x007f
+#define GM107_TIC2_0_COMPONENTS_SIZES__SHIFT   0
+#define GM107_TIC2_0_COMPONENTS_SIZES_R32_G32_B32_A32  0x0001
+#define GM107_TIC2_0_COMPONENTS_SIZES_R32_G32_B32  0x0002
+#define GM107_TIC2_0_CO

Re: [Nouveau] [PATCH 2/5] nvc0: make use of the new hwdefs for TEX_CB_INDEX

2016-10-17 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 10/16/2016 09:14 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/nvc0_accel.c | 2 +-
 src/nvc0_accel.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/nvc0_accel.c b/src/nvc0_accel.c
index 52a17db..0682806 100644
--- a/src/nvc0_accel.c
+++ b/src/nvc0_accel.c
@@ -313,7 +313,7 @@ NVAccelInit3D_NVC0(ScrnInfoPtr pScrn)
PUSH_DATA (push, 0x0001);
BEGIN_NVC0(push, NVC0_3D(CB_BIND(4)), 1);
PUSH_DATA (push, 0x11);
-   BEGIN_NVC0(push, SUBC_3D(0x2608), 1);
+   BEGIN_NVC0(push, NVE4_3D(TEX_CB_INDEX), 1);
PUSH_DATA (push, 1);
}

diff --git a/src/nvc0_accel.h b/src/nvc0_accel.h
index 4c3bb0f..607e97b 100644
--- a/src/nvc0_accel.h
+++ b/src/nvc0_accel.h
@@ -12,6 +12,7 @@
 /* subchannel assignments, compatible with kepler's fixed layout  */
 #define SUBC_3D(mthd)0, (mthd)
 #define NVC0_3D(mthd)SUBC_3D(NVC0_3D_##mthd)
+#define NVE4_3D(mthd)SUBC_3D(NVE4_3D_##mthd)
 #define SUBC_M2MF(mthd)  2, (mthd)
 #define SUBC_P2MF(mthd)  2, (mthd)
 #define NVC0_M2MF(mthd)  SUBC_M2MF(NVC0_M2MF_##mthd)
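
The `SUBC_3D`/`NVE4_3D` macros work because each one expands to two comma-separated arguments (subchannel, method). The sketch below shows the expansion in isolation. `record_begin` and `struct begin_args` are hypothetical stand-ins for BEGIN_NVC0, and the value of `NVE4_3D_TEX_CB_INDEX` is taken from the literal 0x2608 that the patch replaces.

```c
#include <stdint.h>

/* Value taken from the hunk above, where SUBC_3D(0x2608) was replaced. */
#define NVE4_3D_TEX_CB_INDEX 0x2608

#define SUBC_3D(mthd) 0, (mthd)
#define NVE4_3D(mthd) SUBC_3D(NVE4_3D_##mthd)

/* Hypothetical stand-in for BEGIN_NVC0: it just records its arguments. */
struct begin_args {
    int subc;
    int mthd;
    int size;
};

static struct begin_args record_begin(int subc, int mthd, int size)
{
    struct begin_args a = { subc, mthd, size };
    return a;
}
```

With these definitions, `record_begin(NVE4_3D(TEX_CB_INDEX), 1)` receives subchannel 0, method 0x2608 and size 1. This relies on the callee being a function: a function-like macro would count `NVE4_3D(...)` as a single argument before it is expanded.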



--
-Samuel


Re: [Nouveau] [PATCH] exa: add GM10x acceleration support

2016-10-17 Thread Samuel Pitoiset

Looks reasonable, some minor comments below.

On 10/16/2016 02:06 AM, Ilia Mirkin wrote:

rendercheck -f a8r8g8b8 passes as much as on a GK208, and xv appears to
work. Very lightly tested.

Instead of sticking coordinates into pushbufs, the vertex shader is
modified to read them from a constbuf, indexed by vertex id. This
approach could be used for all nvc0 generations, but I didn't want to
rock the boat.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

Note: this won't work for GM20x - we need to allow TIC format to be updated
for that to work. But this is a step in that direction.

 src/Makefile.am   |  16 
 src/nouveau_copy.c|   1 +
 src/nouveau_exa.c |   2 +-
 src/nouveau_xv.c  |   2 +-
 src/nv_accel_common.c |   1 +
 src/nv_driver.c   |   1 +
 src/nvc0_accel.c  |  37 ++---
 src/nvc0_exa.c|  48 --
 src/nvc0_xv.c |  48 --
 src/shader/Makefile   |  23 ---
 src/shader/exac8nv110.fp  |  47 +
 src/shader/exac8nv110.fpc |  38 +
 src/shader/exacanv110.fp  |  47 +
 src/shader/exacanv110.fpc |  38 +
 src/shader/exacmnv110.fp  |  47 +
 src/shader/exacmnv110.fpc |  38 +
 src/shader/exas8nv110.fp  |  42 +++
 src/shader/exas8nv110.fpc |  28 +
 src/shader/exasanv110.fp  |  47 +
 src/shader/exasanv110.fpc |  38 +
 src/shader/exascnv110.fp  |  38 +
 src/shader/exascnv110.fpc |  20 +
 src/shader/videonv110.fp  |  54 
 src/shader/videonv110.fpc |  52 +++
 src/shader/xfrm2nv110.vp  |  82 +
 src/shader/xfrm2nv110.vpc | 102 ++
 26 files changed, 918 insertions(+), 19 deletions(-)
 create mode 100644 src/shader/exac8nv110.fp
 create mode 100644 src/shader/exac8nv110.fpc
 create mode 100644 src/shader/exacanv110.fp
 create mode 100644 src/shader/exacanv110.fpc
 create mode 100644 src/shader/exacmnv110.fp
 create mode 100644 src/shader/exacmnv110.fpc
 create mode 100644 src/shader/exas8nv110.fp
 create mode 100644 src/shader/exas8nv110.fpc
 create mode 100644 src/shader/exasanv110.fp
 create mode 100644 src/shader/exasanv110.fpc
 create mode 100644 src/shader/exascnv110.fp
 create mode 100644 src/shader/exascnv110.fpc
 create mode 100644 src/shader/videonv110.fp
 create mode 100644 src/shader/videonv110.fpc
 create mode 100644 src/shader/xfrm2nv110.vp
 create mode 100644 src/shader/xfrm2nv110.vpc

diff --git a/src/Makefile.am b/src/Makefile.am
index 1e04ddf..6ba8d87 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -77,48 +77,64 @@ EXTRA_DIST = hwdefs/nv_3ddefs.xml.h \
 shader/exac8nve0.fpc \
 shader/exac8nvf0.fp \
 shader/exac8nvf0.fpc \
+shader/exac8nv110.fp \
+shader/exac8nv110.fpc \
 shader/exacanvc0.fp \
 shader/exacanvc0.fpc \
 shader/exacanve0.fp \
 shader/exacanve0.fpc \
 shader/exacanvf0.fp \
 shader/exacanvf0.fpc \
+shader/exacanv110.fp \
+shader/exacanv110.fpc \
 shader/exacmnvc0.fp \
 shader/exacmnvc0.fpc \
 shader/exacmnve0.fp \
 shader/exacmnve0.fpc \
 shader/exacmnvf0.fp \
 shader/exacmnvf0.fpc \
+shader/exacmnv110.fp \
+shader/exacmnv110.fpc \
 shader/exas8nvc0.fp \
 shader/exas8nvc0.fpc \
 shader/exas8nve0.fp \
 shader/exas8nve0.fpc \
 shader/exas8nvf0.fp \
 shader/exas8nvf0.fpc \
+shader/exas8nv110.fp \
+shader/exas8nv110.fpc \
 shader/exasanvc0.fp \
 shader/exasanvc0.fpc \
 shader/exasanve0.fp \
 shader/exasanve0.fpc \
 shader/exasanvf0.fp \
 shader/exasanvf0.fpc \
+shader/exasanv110.fp \
+shader/exasanv110.fpc \
 shader/exascnvc0.fp \
 shader/exascnvc0.fpc \
 shader/exascnve0.fp \
 shader/exascnve0.fpc \
 shader/exascnvf0.fp \
 shader/exascnvf0.fpc \
+shader/exascnv110.fp \
+shader/exascnv110.fpc \
 shader/videonvc0.fp \
 shader/videonvc0.fpc \
 shader/videonve0.fp \
 shader/videonve0.fpc \
 shader/videonvf0.fp \
 shader/videonvf0.fpc \
+shader/videonv110.fp \
+shader/videonv110.fpc \
 shader/xfrm2nvc0.vp \
 shader/xfrm2nvc0.vpc \
 shader/xfrm2nve0.vp \
 shader/xfrm2nve0.vpc \
 shader/xfrm2nvf0.vp \
 shader/xfrm2nvf0.vpc \
+

Re: [Nouveau] NVidia Hardware Donation possible

2016-10-12 Thread Samuel Pitoiset



On 10/12/2016 05:54 PM, Gediminas Jakutis wrote:

On 2016.10.11 17:46, Samuel Pitoiset wrote:



On 10/11/2016 02:18 PM, Martin Vorbach wrote:

Samuel,

the HP GT630 is unfortunately the GK107. Given we find the other GT630
model I will check it and come back to you.

Are you interested in any of the other two cards?


I'm (just) interested in Kepler cards actually. :-)

Because I already have a bunch of Tesla, Fermi and one Maxwell.



Hello,
Isn't a GK107 a Kepler? AFAIK the 'K' in "GK" stands for "Kepler"


It is.



Regards,
Gediminas Jakutis



Regards,

Martin


On 2016-10-10 13:45, Samuel Pitoiset wrote:



On 10/10/2016 01:44 PM, Martin Vorbach wrote:

Hi,

I talked to our IT guy over lunch. He thinks there is an old GT630 with
384 shaders somewhere. The 384 shader GPU is the GK208.


Sure, the GK208 would be nice for me. :-)




Given we can find it and it is obsolete, I probably can donate this one
too.

Let me know if you are looking for this one.

Regards,

Martin


On 2016-10-10 11:09, Martin Vorbach wrote:

Hi,

the GT630 is this one:

http://www8.hp.com/emea_africa/en/products/oas/product-detail.html?oid=5275291



Regards,

Martin


On 2016-10-07 18:16, Samuel Pitoiset wrote:



On 10/07/2016 12:24 PM, Martin Vorbach wrote:

Hi,


Hi,

I'm interested. :)

Is the GeForce GT 630 a GK208?

Thanks!



I saw the donation requests and could offer the following cards:

  * Geforce N460GTX
  * Geforce GT 630
  * Quadro K4200

Feel free to contact me in case of interest.

Best regards,

Martin





















Re: [Nouveau] NVidia Hardware Donation possible

2016-10-12 Thread Samuel Pitoiset

Hi Martin,

Thanks for all your efforts about that. :-)

If you have a Kepler (whatever the chipset is) I'm interested.

Let's talk in private in order to avoid flooding the list.

Thanks.

On 10/12/2016 04:53 PM, Martin Vorbach wrote:

Samuel,

in the meantime this matter is driving me crazy.

I found a Quadro K610M, this should have the GK208 GPU. Don't know if it
works, it was in our store-room. Can check on Friday in what state the
card is and what lspci says. But, on a first look, it appears quite OK.
Also I am not sure I can actually get the card, will have to check this
too on Friday.

If this attempt to get a GK208 fails, I will have to give up. The guys
here are about to kill me ;-)

Regards,

Martin


On 2016-10-11 16:46, Samuel Pitoiset wrote:



On 10/11/2016 02:18 PM, Martin Vorbach wrote:

Samuel,

the HP GT630 is unfortunately the GK107. Given we find the other GT630
model I will check it and come back to you.

Are you interested in any of the other two cards?


I'm (just) interested in Kepler cards actually. :-)

Because I already have a bunch of Tesla, Fermi and one Maxwell.



Regards,

Martin


On 2016-10-10 13:45, Samuel Pitoiset wrote:



On 10/10/2016 01:44 PM, Martin Vorbach wrote:

Hi,

I talked to our IT guy over lunch. He thinks there is an old GT630
with
384 shaders somewhere. The 384 shader GPU is the GK208.


Sure, the GK208 would be nice for me. :-)




Given we can find it and it is obsolete, I probably can donate this
one
too.

Let me know if you are looking for this one.

Regards,

Martin


On 2016-10-10 11:09, Martin Vorbach wrote:

Hi,

the GT630 is this one:

http://www8.hp.com/emea_africa/en/products/oas/product-detail.html?oid=5275291




Regards,

Martin


On 2016-10-07 18:16, Samuel Pitoiset wrote:



On 10/07/2016 12:24 PM, Martin Vorbach wrote:

Hi,


Hi,

I'm interested. :)

Is the GeForce GT 630 a GK208?

Thanks!



I saw the donation requests and could offer the following cards:

  * Geforce N460GTX
  * Geforce GT 630
  * Quadro K4200

Feel free to contact me in case of interest.

Best regards,

Martin




















--
-Samuel


Re: [Nouveau] NVidia Hardware Donation possible

2016-10-11 Thread Samuel Pitoiset



On 10/11/2016 02:18 PM, Martin Vorbach wrote:

Samuel,

the HP GT630 is unfortunately the GK107. Given we find the other GT630
model I will check it and come back to you.

Are you interested in any of the other two cards?


I'm (just) interested in Kepler cards actually. :-)

Because I already have a bunch of Tesla, Fermi and one Maxwell.



Regards,

Martin


On 2016-10-10 13:45, Samuel Pitoiset wrote:



On 10/10/2016 01:44 PM, Martin Vorbach wrote:

Hi,

I talked to our IT guy over lunch. He thinks there is an old GT630 with
384 shaders somewhere. The 384 shader GPU is the GK208.


Sure, the GK208 would be nice for me. :-)




Given we can find it and it is obsolete, I probably can donate this one
too.

Let me know if you are looking for this one.

Regards,

Martin


On 2016-10-10 11:09, Martin Vorbach wrote:

Hi,

the GT630 is this one:

http://www8.hp.com/emea_africa/en/products/oas/product-detail.html?oid=5275291



Regards,

Martin


On 2016-10-07 18:16, Samuel Pitoiset wrote:



On 10/07/2016 12:24 PM, Martin Vorbach wrote:

Hi,


Hi,

I'm interested. :)

Is the GeForce GT 630 a GK208?

Thanks!



I saw the donation requests and could offer the following cards:

  * Geforce N460GTX
  * Geforce GT 630
  * Quadro K4200

Feel free to contact me in case of interest.

Best regards,

Martin
















Re: [Nouveau] NVidia Hardware Donation possible

2016-10-10 Thread Samuel Pitoiset



On 10/10/2016 01:39 PM, Karol Herbst wrote:

2016-10-10 11:09 GMT+02:00 Martin Vorbach <mar...@vorbach.name>:

Hi,

the GT630 is this one:

http://www8.hp.com/emea_africa/en/products/oas/product-detail.html?oid=5275291

Regards,

Martin



I am not quite sure, but I think this might be a Fermi one. Would
it be possible to plug it into a computer and check for sure? lspci
reports that information.


Yes, that would be nice to check the real chipset. :-)

If it's a Kepler I'm definitely interested (because I don't have one).

Thanks!





On 2016-10-07 18:16, Samuel Pitoiset wrote:




On 10/07/2016 12:24 PM, Martin Vorbach wrote:


Hi,



Hi,

I'm interested. :)

Is the GeForce GT 630 a GK208?

Thanks!



I saw the donation requests and could offer the following cards:

  * Geforce N460GTX
  * Geforce GT 630
  * Quadro K4200

Feel free to contact me in case of interest.

Best regards,

Martin










--
-Samuel


Re: [Nouveau] NVidia Hardware Donation possible

2016-10-07 Thread Samuel Pitoiset



On 10/07/2016 12:24 PM, Martin Vorbach wrote:

Hi,


Hi,

I'm interested. :)

Is the GeForce GT 630 a GK208?

Thanks!



I saw the donation requests and could offer the following cards:

  * Geforce N460GTX
  * Geforce GT 630
  * Quadro K4200

Feel free to contact me in case of interest.

Best regards,

Martin







Re: [Nouveau] [PATCH] nouveau: Add missing PIPE_SHADER_CAP_INTEGERS to get_shader_param()

2016-04-11 Thread Samuel Pitoiset

The prefix should be "nv30:" instead of "nouveau:" I guess.

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 04/11/2016 02:13 PM, Hans de Goede wrote:

Add missing PIPE_SHADER_CAP_INTEGERS for frag shaders to
nv30_screen_get_shader_param().

Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
  src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index db7c2d1..ece8af7 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -324,6 +324,7 @@ nv30_screen_get_shader_param(struct pipe_screen *pscreen, 
unsigned shader,
case PIPE_SHADER_CAP_INDIRECT_TEMP_ADDR:
case PIPE_SHADER_CAP_INDIRECT_CONST_ADDR:
case PIPE_SHADER_CAP_SUBROUTINES:
+  case PIPE_SHADER_CAP_INTEGERS:
case PIPE_SHADER_CAP_DOUBLES:
case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:



--
-Samuel


Re: [Nouveau] "unknown fragment shader param 17" error on NV46 when running glxgears

2016-04-11 Thread Samuel Pitoiset



On 04/11/2016 12:10 PM, Hans de Goede wrote:

Hi,


Hi,



While trying to reproduce:
https://bugzilla.redhat.com/show_bug.cgi?id=1325667

I also gave glxgears a quick test with mesa master, this
resulted in the following errors being printed to the
terminal from which glxgears was started :

unknown fragment shader param 17
unknown fragment shader param 17
unknown fragment shader param 17
unknown fragment shader param 17
unknown fragment shader param 17

I would like to fix these, any hints where I need to look ?



It seems that PIPE_SHADER_CAP_INTEGERS for frag shaders is missing
in nv30_screen_get_shader_param(). Adding it won't fix your issue, but it
will silence this debug message.
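
A minimal sketch of why the message shows up, assuming a simplified capability query. The enum, the function name and the integers_case_present switch are hypothetical stand-ins for illustration, not the real Gallium or nv30 definitions: a capability without its own case label ends up on the "unknown" path and gets logged, even though the answer (0, unsupported) is the same either way.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for a few shader capability tokens; the real
// PIPE_SHADER_CAP_* enum is larger and numbered differently.
enum ShaderCap { CAP_SUBROUTINES, CAP_INTEGERS, CAP_DOUBLES };

// Simplified model of a per-driver capability query.
// integers_case_present models whether the driver has a case label for
// CAP_INTEGERS; *warned reports whether the debug message was printed.
static int query_fragment_cap(ShaderCap cap, bool integers_case_present,
                              bool *warned)
{
   *warned = false;
   switch (cap) {
   case CAP_SUBROUTINES:
   case CAP_DOUBLES:
      return 0; // known capability, explicitly unsupported, no message
   case CAP_INTEGERS:
      if (integers_case_present)
         return 0; // same answer as before, but silently
      break;       // no case label: handled as unknown below
   }
   *warned = true;
   std::fprintf(stderr, "unknown fragment shader param %d\n",
                static_cast<int>(cap));
   return 0;
}
```

In this model the returned value does not change when the case label is added; only the logging does, which matches the "silences the message but does not fix the issue" remark.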



Regards,

Hans




--
-Samuel


Re: [Nouveau] [PATCH mesa 6/6] nouveau: codegen: Disable more old resource handling code

2016-03-19 Thread Samuel Pitoiset



On 03/16/2016 11:49 AM, Hans de Goede wrote:

Hi,

On 16-03-16 11:45, Samuel Pitoiset wrote:



On 03/16/2016 10:23 AM, Hans de Goede wrote:

Commit c3083c7082 ("nv50/ir: add support for BUFFER accesses") disabled /
commented out some of the old resource handling code, but not all of it.

Effectively all of it is dead already: if we ever enter the old code
paths in handleLOAD / handleSTORE / handleATOM, we will get an exception
due to trying to access the now always zero-sized resources vector.

Make it more explicit that accesses to files other than buffer / memory
are not supported in these functions, and comment out a whole bunch of
dead code.

Also remove the magic file-index defines from the old resource code
from include/pipe/p_shader_tokens.h, as those are no longer used
(which is a good thing).

Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 42
+++---
  src/gallium/include/pipe/p_shader_tokens.h |  9 -
  2 files changed, 30 insertions(+), 21 deletions(-)

diff --git
a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index c167c4a..115d0bb 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -856,12 +856,14 @@ public:
 };
 std::vector<TextureView> textureViews;

+   /*
 struct Resource {
uint8_t target; // TGSI_TEXTURE_*
bool raw;
uint8_t slot; // $surface index
 };
 std::vector<Resource> resources;
+   */

 struct MemoryFile {
uint8_t mem_type; // TGSI_MEMORY_TYPE_*
@@ -1423,8 +1425,8 @@ private:
 void handleLIT(Value *dst0[4]);
 void handleUserClipPlanes();

-   Symbol *getResourceBase(int r);
-   void getResourceCoords(std::vector<Value *>&, int r, int s);
+   // Symbol *getResourceBase(int r);
+   // void getResourceCoords(std::vector<Value *>&, int r, int s);

 void handleLOAD(Value *dst0[4]);
 void handleSTORE();
@@ -2169,6 +2171,7 @@ Converter::handleLIT(Value *dst0[4])
 }
  }

+/* Keep this around for now as reference when adding img support
  static inline bool
  isResourceSpecial(const int r)
  {
@@ -2264,6 +2267,7 @@ partitionLoadStore(uint8_t comp[2], uint8_t
size[2], uint8_t mask)
 }
 return n + 1;
  }
+*/

  // For raw loads, granularity is 4 byte.
  // Usage of the texture read mask on OP_SULDP is not allowed.
@@ -2274,8 +2278,9 @@ Converter::handleLOAD(Value *dst0[4])
 int c;
 std::vector<Value *> off, src, ldv, def;

-   if (tgsi.getSrc(0).getFile() == TGSI_FILE_BUFFER ||
-   tgsi.getSrc(0).getFile() == TGSI_FILE_MEMORY) {
+   switch (tgsi.getSrc(0).getFile()) {
+   case TGSI_FILE_BUFFER:
+   case TGSI_FILE_MEMORY:
for (c = 0; c < 4; ++c) {
   if (!dst0[c])
  continue;
@@ -2295,9 +2300,12 @@ Converter::handleLOAD(Value *dst0[4])
   if (tgsi.getSrc(0).isIndirect(0))
  ld->setIndirect(0, 1,
fetchSrc(tgsi.getSrc(0).getIndirect(0), 0, 0));
}
-  return;
+  break;
+   default:
+  assert(!"Unsupported srcFile for LOAD");
 }

+/* Keep this around for now as reference when adding img support
 getResourceCoords(off, r, 1);

 if (isResourceRaw(code, r)) {
@@ -2363,6 +2371,7 @@ Converter::handleLOAD(Value *dst0[4])
 FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
if (dst0[c] != def[c])
   mkMov(dst0[c], def[tgsi.getSrc(0).getSwizzle(c)]);
+*/
  }

  // For formatted stores, the write mask on OP_SUSTP can be used.
@@ -2374,8 +2383,9 @@ Converter::handleSTORE()
 int c;
 std::vector<Value *> off, src, dummy;

-   if (tgsi.getDst(0).getFile() == TGSI_FILE_BUFFER ||
-   tgsi.getDst(0).getFile() == TGSI_FILE_MEMORY) {
+   switch (tgsi.getDst(0).getFile()) {
+   case TGSI_FILE_BUFFER:
+   case TGSI_FILE_MEMORY:
for (c = 0; c < 4; ++c) {
   if (!(tgsi.getDst(0).getMask() & (1 << c)))
  continue;
@@ -2396,9 +2406,12 @@ Converter::handleSTORE()
   if (tgsi.getDst(0).isIndirect(0))
  st->setIndirect(0, 1,
fetchSrc(tgsi.getDst(0).getIndirect(0), 0, 0));
}
-  return;
+  break;
+   default:
+  assert(!"Unsupported dstFile for STORE");
 }

+/* Keep this around for now as reference when adding img support
 getResourceCoords(off, r, 0);
 src = off;
 const int s = src.size();
@@ -2446,6 +2459,7 @@ Converter::handleSTORE()
mkTex(OP_SUSTP, getResourceTarget(code, r),
code->resources[r].slot, 0,
  dummy, src)->tex.mask = tgsi.getDst(0).getMask();
 }
+*/
  }

  // XXX: These only work on resources with the single-component
u32/s32 formats.
@@ -2460,8 +2474,9 @@ Converter::handleATOM(Value *dst0[4], DataType
ty, uint16_t subOp)
 std::vector<Value *> defv;
 LValue *dst = getScratch();

-   if (tgsi.getSrc(0).getFile() == TGSI_FILE_BUFFER ||
-   tgsi.getSrc(0).getFile()

Re: [Nouveau] [RFC mesa] nouveau: Add support for OpenCL global memory buffers

2016-03-19 Thread Samuel Pitoiset



On 03/17/2016 05:07 PM, Hans de Goede wrote:

Hi,

On 14-03-16 21:50, Samuel Pitoiset wrote:




Btw, do you need someone with commit access to push your previous
series (the tgsi thing)? I can do this for you.


Thanks for the offer. IIRC Ilia wanted some minor fixes there, so I'll do
a v2 tomorrow. Talking about commit rights, I guess it would be
convenient for all if I got commit rights myself? I promise I won't
push anything without acks.


Yes sure, I trust you, no worries. :-)



I already have a freedesktop.org account, my username is jwrdegoede.


Please open a ticket on bugs.freedesktop to ask for commit rights.


Done:

https://bugs.freedesktop.org/show_bug.cgi?id=94594

Can you or Ilia please ack this ?



Done.


Thanks,

Hans



Re: [Nouveau] [PATCH mesa v2 1/3] nouveau: codegen: Disable more old resource handling code

2016-03-19 Thread Samuel Pitoiset

Series is:

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 03/17/2016 10:13 AM, Hans de Goede wrote:

Commit c3083c7082 ("nv50/ir: add support for BUFFER accesses") disabled /
commented out some of the old resource handling code, but not all of it.

Effectively all of it is dead already: if we ever enter the old code
paths in handleLOAD / handleSTORE / handleATOM, we will get an exception
due to trying to access the now always zero-sized resources vector.

Disable all the dead code.

Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
Changes in v2:
-Split out assert() on getFile() != BUFFER/MEMORY into a separate patch
-Split out removal of TGSI_RESOURCE_* defines into a separate patch
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 15 ---
  1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 1e91ad3..41eb4e3 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -856,12 +856,14 @@ public:
 };
 std::vector<TextureView> textureViews;

+   /*
 struct Resource {
uint8_t target; // TGSI_TEXTURE_*
bool raw;
uint8_t slot; // $surface index
 };
 std::vector<Resource> resources;
+   */

 struct MemoryFile {
uint8_t mem_type; // TGSI_MEMORY_TYPE_*
@@ -1419,8 +1421,8 @@ private:
 void handleLIT(Value *dst0[4]);
 void handleUserClipPlanes();

-   Symbol *getResourceBase(int r);
-   void getResourceCoords(std::vector<Value *>&, int r, int s);
+   // Symbol *getResourceBase(int r);
+   // void getResourceCoords(std::vector<Value *>&, int r, int s);

 void handleLOAD(Value *dst0[4]);
 void handleSTORE();
@@ -2161,6 +2163,7 @@ Converter::handleLIT(Value *dst0[4])
 }
  }

+/* Keep this around for now as reference when adding img support
  static inline bool
  isResourceSpecial(const int r)
  {
@@ -2256,6 +2259,7 @@ partitionLoadStore(uint8_t comp[2], uint8_t size[2], 
uint8_t mask)
 }
 return n + 1;
  }
+*/

  // For raw loads, granularity is 4 byte.
  // Usage of the texture read mask on OP_SULDP is not allowed.
@@ -2290,6 +2294,7 @@ Converter::handleLOAD(Value *dst0[4])
return;
 }

+/* Keep this around for now as reference when adding img support
 getResourceCoords(off, r, 1);

 if (isResourceRaw(code, r)) {
@@ -2355,6 +2360,7 @@ Converter::handleLOAD(Value *dst0[4])
 FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
if (dst0[c] != def[c])
   mkMov(dst0[c], def[tgsi.getSrc(0).getSwizzle(c)]);
+*/
  }

  // For formatted stores, the write mask on OP_SUSTP can be used.
@@ -2391,6 +2397,7 @@ Converter::handleSTORE()
return;
 }

+/* Keep this around for now as reference when adding img support
 getResourceCoords(off, r, 0);
 src = off;
 const int s = src.size();
@@ -2438,6 +2445,7 @@ Converter::handleSTORE()
mkTex(OP_SUSTP, getResourceTarget(code, r), code->resources[r].slot, 0,
  dummy, src)->tex.mask = tgsi.getDst(0).getMask();
 }
+*/
  }

  // XXX: These only work on resources with the single-component u32/s32 
formats.
@@ -2484,7 +2492,7 @@ Converter::handleATOM(Value *dst0[4], DataType ty, 
uint16_t subOp)
return;
 }

-
+/* Keep this around for now as reference when adding img support
 getResourceCoords(srcv, r, 1);

 if (isResourceSpecial(r)) {
@@ -2512,6 +2520,7 @@ Converter::handleATOM(Value *dst0[4], DataType ty, 
uint16_t subOp)
 for (int c = 0; c < 4; ++c)
if (dst0[c])
   dst0[c] = dst; // not equal to rDst so handleInstruction will do 
mkMov
+*/
  }

  void




Re: [Nouveau] [PATCH mesa 6/6] nouveau: codegen: Disable more old resource handling code

2016-03-16 Thread Samuel Pitoiset



On 03/16/2016 10:23 AM, Hans de Goede wrote:

Commit c3083c7082 ("nv50/ir: add support for BUFFER accesses") disabled /
commented out some of the old resource handling code, but not all of it.

Effectively all of it is dead already: if we ever enter the old code
paths in handleLOAD / handleSTORE / handleATOM, we will get an exception
due to trying to access the now always zero-sized resources vector.

Make it more explicit that accesses to files other than buffer / memory
are not supported in these functions, and comment out a whole bunch of
dead code.

Also remove the magic file-index defines from the old resource code
from include/pipe/p_shader_tokens.h, as those are no longer used
(which is a good thing).

Signed-off-by: Hans de Goede 
---
  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 42 +++---
  src/gallium/include/pipe/p_shader_tokens.h |  9 -
  2 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index c167c4a..115d0bb 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -856,12 +856,14 @@ public:
 };
 std::vector<TextureView> textureViews;

+   /*
 struct Resource {
uint8_t target; // TGSI_TEXTURE_*
bool raw;
uint8_t slot; // $surface index
 };
 std::vector<Resource> resources;
+   */

 struct MemoryFile {
uint8_t mem_type; // TGSI_MEMORY_TYPE_*
@@ -1423,8 +1425,8 @@ private:
 void handleLIT(Value *dst0[4]);
 void handleUserClipPlanes();

-   Symbol *getResourceBase(int r);
-   void getResourceCoords(std::vector<Value *>&, int r, int s);
+   // Symbol *getResourceBase(int r);
+   // void getResourceCoords(std::vector<Value *>&, int r, int s);

 void handleLOAD(Value *dst0[4]);
 void handleSTORE();
@@ -2169,6 +2171,7 @@ Converter::handleLIT(Value *dst0[4])
 }
  }

+/* Keep this around for now as reference when adding img support
  static inline bool
  isResourceSpecial(const int r)
  {
@@ -2264,6 +2267,7 @@ partitionLoadStore(uint8_t comp[2], uint8_t size[2], 
uint8_t mask)
 }
 return n + 1;
  }
+*/

  // For raw loads, granularity is 4 byte.
  // Usage of the texture read mask on OP_SULDP is not allowed.
@@ -2274,8 +2278,9 @@ Converter::handleLOAD(Value *dst0[4])
 int c;
 std::vector<Value *> off, src, ldv, def;

-   if (tgsi.getSrc(0).getFile() == TGSI_FILE_BUFFER ||
-   tgsi.getSrc(0).getFile() == TGSI_FILE_MEMORY) {
+   switch (tgsi.getSrc(0).getFile()) {
+   case TGSI_FILE_BUFFER:
+   case TGSI_FILE_MEMORY:
for (c = 0; c < 4; ++c) {
   if (!dst0[c])
  continue;
@@ -2295,9 +2300,12 @@ Converter::handleLOAD(Value *dst0[4])
   if (tgsi.getSrc(0).isIndirect(0))
  ld->setIndirect(0, 1, fetchSrc(tgsi.getSrc(0).getIndirect(0), 0, 
0));
}
-  return;
+  break;
+   default:
+  assert(!"Unsupported srcFile for LOAD");
 }

+/* Keep this around for now as reference when adding img support
 getResourceCoords(off, r, 1);

 if (isResourceRaw(code, r)) {
@@ -2363,6 +2371,7 @@ Converter::handleLOAD(Value *dst0[4])
 FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
if (dst0[c] != def[c])
   mkMov(dst0[c], def[tgsi.getSrc(0).getSwizzle(c)]);
+*/
  }

  // For formatted stores, the write mask on OP_SUSTP can be used.
@@ -2374,8 +2383,9 @@ Converter::handleSTORE()
 int c;
 std::vector<Value *> off, src, dummy;

-   if (tgsi.getDst(0).getFile() == TGSI_FILE_BUFFER ||
-   tgsi.getDst(0).getFile() == TGSI_FILE_MEMORY) {
+   switch (tgsi.getDst(0).getFile()) {
+   case TGSI_FILE_BUFFER:
+   case TGSI_FILE_MEMORY:
for (c = 0; c < 4; ++c) {
   if (!(tgsi.getDst(0).getMask() & (1 << c)))
  continue;
@@ -2396,9 +2406,12 @@ Converter::handleSTORE()
   if (tgsi.getDst(0).isIndirect(0))
  st->setIndirect(0, 1, fetchSrc(tgsi.getDst(0).getIndirect(0), 0, 
0));
}
-  return;
+  break;
+   default:
+  assert(!"Unsupported dstFile for STORE");
 }

+/* Keep this around for now as reference when adding img support
 getResourceCoords(off, r, 0);
 src = off;
 const int s = src.size();
@@ -2446,6 +2459,7 @@ Converter::handleSTORE()
mkTex(OP_SUSTP, getResourceTarget(code, r), code->resources[r].slot, 0,
  dummy, src)->tex.mask = tgsi.getDst(0).getMask();
 }
+*/
  }

  // XXX: These only work on resources with the single-component u32/s32 
formats.
@@ -2460,8 +2474,9 @@ Converter::handleATOM(Value *dst0[4], DataType ty, 
uint16_t subOp)
 std::vector<Value *> defv;
 LValue *dst = getScratch();

-   if (tgsi.getSrc(0).getFile() == TGSI_FILE_BUFFER ||
-   tgsi.getSrc(0).getFile() == TGSI_FILE_MEMORY) {
+   switch (tgsi.getSrc(0).getFile()) {
+   case TGSI_FILE_BUFFER:
+   case TGSI_FILE_MEMORY:
for (int c = 0; c < 4; ++c) {
 

Re: [Nouveau] [PATCH mesa v2 3/3] nouveau: codegen: Add support for clover / OpenCL kernel input parameters

2016-03-16 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 03/16/2016 09:55 AM, Hans de Goede wrote:

Add support for clover / OpenCL kernel input parameters.

Signed-off-by: Hans de Goede <hdego...@redhat.com>
Reviewed-by: Ilia Mirkin <imir...@alum.mit.edu>
---
Changes in v2:
-s/local/private/
-Add: Reviewed-by: Ilia Mirkin <imir...@alum.mit.edu>
---
  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 18 +++---
  1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index fb7caca..8a1a426 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1527,9 +1527,21 @@ Converter::makeSym(uint tgsiFile, int fileIdx, int idx, 
int c, uint32_t address)

 sym->reg.fileIndex = fileIdx;

-   if (tgsiFile == TGSI_FILE_MEMORY &&
-   code->memoryFiles[fileIdx].mem_type == TGSI_MEMORY_TYPE_SHARED)
-  sym->setFile(FILE_MEMORY_SHARED);
+   if (tgsiFile == TGSI_FILE_MEMORY) {
+  switch (code->memoryFiles[fileIdx].mem_type) {
+  case TGSI_MEMORY_TYPE_SHARED:
+ sym->setFile(FILE_MEMORY_SHARED);
+ break;
+  case TGSI_MEMORY_TYPE_INPUT:
+ assert(prog->getType() == Program::TYPE_COMPUTE);
+ assert(idx == -1);
+ sym->setFile(FILE_SHADER_INPUT);
+ address += info->prop.cp.inputOffset;
+ break;
+  default:
+ assert(0); /* TODO: Add support for global and private memory */
+  }
+   }

 if (idx >= 0) {
if (sym->reg.file == FILE_SHADER_INPUT)




Re: [Nouveau] [PATCH mesa v2 2/3] tgsi: Add support for global / private / input MEMORY

2016-03-16 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

Btw, usually when someone has reviewed the v1 we add (v1) after the Rb 
tag, like:


Reviewed-by: XXX (v1)

On 03/16/2016 09:55 AM, Hans de Goede wrote:

Extend the MEMORY file support to differentiate between global, private
and shared memory, as well as "input" memory.

"MEMORY[x], INPUT" is intended to access OpenCL kernel parameters, a
special memory type is added for this, since the actual storage of these
(e.g. UBO-s) may differ per implementation. The uploading of kernel
parameters is handled by launch_grid, "MEMORY[x], INPUT" allows drivers
to use an access mechanism for parameter reads which matches with the
upload method.
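
As a rough illustration of the four memory types described above, here is a small sketch of mapping a memory type to its textual suffix and back. The enum and both helpers are hypothetical and only mirror the names used in this patch; the numeric values and the real tgsi_dump / tgsi_text code may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical mirror of the memory types named in the patch; the actual
// values in p_shader_tokens.h are an assumption here.
enum MemoryType { MEM_GLOBAL, MEM_SHARED, MEM_PRIVATE, MEM_INPUT };

// Suffix printed after a MEMORY declaration when dumping text.
static const char *memory_type_suffix(MemoryType t)
{
   switch (t) {
   case MEM_GLOBAL:  return ", GLOBAL";
   case MEM_SHARED:  return ", SHARED";
   case MEM_PRIVATE: return ", PRIVATE";
   case MEM_INPUT:   return ", INPUT";
   }
   return "";
}

// Case-insensitive comparison of two whole NUL-terminated words.
static bool match_nocase(const char *a, const char *b)
{
   while (*a && *b) {
      if (std::tolower(static_cast<unsigned char>(*a)) !=
          std::tolower(static_cast<unsigned char>(*b)))
         return false;
      ++a;
      ++b;
   }
   return *a == *b; // true only if both strings ended together
}

// Parse a memory type keyword; returns false for unknown words.
static bool parse_memory_type(const char *word, MemoryType *out)
{
   static const struct { const char *name; MemoryType type; } table[] = {
      { "GLOBAL", MEM_GLOBAL },   { "SHARED", MEM_SHARED },
      { "PRIVATE", MEM_PRIVATE }, { "INPUT", MEM_INPUT },
   };
   for (const auto &e : table) {
      if (match_nocase(word, e.name)) {
         *out = e.type;
         return true;
      }
   }
   return false;
}
```

Keeping the keyword table in one place is only a design choice for the sketch; the patch itself spells out each keyword in an if / else chain.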

Signed-off-by: Hans de Goede <hdego...@redhat.com>
Reviewed-by: Ilia Mirkin <imir...@alum.mit.edu>
---
Changes in v2:
-Drop mention of GLSL global / GLSL local from comments
-Change TGSI_MEMORY_TYPE_LOCAL to TGSI_MEMORY_TYPE_PRIVATE
-Add Reviewed-by: Ilia Mirkin <imir...@alum.mit.edu>
---
  src/gallium/auxiliary/tgsi/tgsi_build.c|  8 +++
  src/gallium/auxiliary/tgsi/tgsi_dump.c |  9 ++--
  src/gallium/auxiliary/tgsi/tgsi_text.c | 14 ++--
  src/gallium/auxiliary/tgsi/tgsi_ureg.c | 25 --
  src/gallium/auxiliary/tgsi/tgsi_ureg.h |  2 +-
  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  7 +++---
  src/gallium/include/pipe/p_shader_tokens.h | 10 +++--
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  2 +-
  8 files changed, 51 insertions(+), 26 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c 
b/src/gallium/auxiliary/tgsi/tgsi_build.c
index 1cb95b9..a3e659b 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -111,7 +111,7 @@ tgsi_default_declaration( void )
 declaration.Local = 0;
 declaration.Array = 0;
 declaration.Atomic = 0;
-   declaration.Shared = 0;
+   declaration.MemType = TGSI_MEMORY_TYPE_GLOBAL;
 declaration.Padding = 0;

 return declaration;
@@ -128,7 +128,7 @@ tgsi_build_declaration(
 unsigned local,
 unsigned array,
 unsigned atomic,
-   unsigned shared,
+   unsigned mem_type,
 struct tgsi_header *header )
  {
 struct tgsi_declaration declaration;
@@ -146,7 +146,7 @@ tgsi_build_declaration(
 declaration.Local = local;
 declaration.Array = array;
 declaration.Atomic = atomic;
-   declaration.Shared = shared;
+   declaration.MemType = mem_type;
 header_bodysize_grow( header );

 return declaration;
@@ -406,7 +406,7 @@ tgsi_build_full_declaration(
full_decl->Declaration.Local,
full_decl->Declaration.Array,
full_decl->Declaration.Atomic,
-  full_decl->Declaration.Shared,
+  full_decl->Declaration.MemType,
header );

 if (maxsize <= size)
diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c 
b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index c8b91bb..6d39ef2 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -365,8 +365,13 @@ iter_declaration(
 }

 if (decl->Declaration.File == TGSI_FILE_MEMORY) {
-  if (decl->Declaration.Shared)
- TXT(", SHARED");
+  switch (decl->Declaration.MemType) {
+  /* Note: ,GLOBAL is optional / the default */
+  case TGSI_MEMORY_TYPE_GLOBAL:  TXT(", GLOBAL");  break;
+  case TGSI_MEMORY_TYPE_SHARED:  TXT(", SHARED");  break;
+  case TGSI_MEMORY_TYPE_PRIVATE: TXT(", PRIVATE"); break;
+  case TGSI_MEMORY_TYPE_INPUT:   TXT(", INPUT");   break;
+  }
 }

 if (decl->Declaration.File == TGSI_FILE_SAMPLER_VIEW) {
diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c 
b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 77598d2..028633c 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -1390,8 +1390,18 @@ static boolean parse_declaration( struct translate_ctx 
*ctx )
  ctx->cur = cur;
   }
} else if (file == TGSI_FILE_MEMORY) {
- if (str_match_nocase_whole(&cur, "SHARED")) {
-decl.Declaration.Shared = 1;
+ if (str_match_nocase_whole(&cur, "GLOBAL")) {
+/* Note this is a no-op global is the default */
+decl.Declaration.MemType = TGSI_MEMORY_TYPE_GLOBAL;
+ctx->cur = cur;
+ } else if (str_match_nocase_whole(&cur, "SHARED")) {
+decl.Declaration.MemType = TGSI_MEMORY_TYPE_SHARED;
+ctx->cur = cur;
+ } else if (str_match_nocase_whole(&cur, "PRIVATE")) {
+decl.Declaration.MemType = TGSI_MEMORY_TYPE_PRIVATE;
+ctx->cur = cur;
+ } else if (str_match_nocase_whole(&cur, "INPUT")) {
+decl.Declaration.MemType = TGSI_MEMORY_TYPE_INPUT;
 

Re: [Nouveau] [PATCH mesa v2 1/3] tgsi: Fix decl.Atomic and .Shared not propagating when parsing tgsi text

2016-03-16 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 03/16/2016 09:55 AM, Hans de Goede wrote:

When support for decl.Atomic and .Shared was added, tgsi_build_declaration
was not updated to propagate these properly.

Signed-off-by: Hans de Goede <hdego...@redhat.com>
Reviewed-by: Ilia Mirkin <imir...@alum.mit.edu>
---
Changes in v2:
-Add Reviewed-by: Ilia Mirkin <imir...@alum.mit.edu>
---
  src/gallium/auxiliary/tgsi/tgsi_build.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c 
b/src/gallium/auxiliary/tgsi/tgsi_build.c
index e5355f5..1cb95b9 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -127,6 +127,8 @@ tgsi_build_declaration(
 unsigned invariant,
 unsigned local,
 unsigned array,
+   unsigned atomic,
+   unsigned shared,
 struct tgsi_header *header )
  {
 struct tgsi_declaration declaration;
@@ -143,6 +145,8 @@ tgsi_build_declaration(
 declaration.Invariant = invariant;
 declaration.Local = local;
 declaration.Array = array;
+   declaration.Atomic = atomic;
+   declaration.Shared = shared;
 header_bodysize_grow( header );

 return declaration;
@@ -401,6 +405,8 @@ tgsi_build_full_declaration(
full_decl->Declaration.Invariant,
full_decl->Declaration.Local,
full_decl->Declaration.Array,
+  full_decl->Declaration.Atomic,
+  full_decl->Declaration.Shared,
header );

 if (maxsize <= size)




Re: [Nouveau] [PATCH mesa] clover: Fix pipe_grid_info.indirect not being initialized

2016-03-14 Thread Samuel Pitoiset



On 03/14/2016 09:49 PM, Francisco Jerez wrote:

Samuel Pitoiset <samuel.pitoi...@gmail.com> writes:


On 03/14/2016 02:29 PM, Samuel Pitoiset wrote:



On 03/14/2016 02:26 PM, Hans de Goede wrote:

Hi,

On 14-03-16 14:01, Samuel Pitoiset wrote:



On 03/14/2016 01:50 PM, Hans de Goede wrote:

After pipe_grid_info.indirect was introduced, clover was not modified
to set it causing it to pass uninitialized memory for it to
launch_grid.

This commit fixes this by zero-ing the entire pipe_grid_info struct
when
declaring it, to avoid similar problems popping-up in the future.

Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
   src/gallium/state_trackers/clover/core/kernel.cpp | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp
b/src/gallium/state_trackers/clover/core/kernel.cpp
index 8396be9..dad66aa 100644
--- a/src/gallium/state_trackers/clover/core/kernel.cpp
+++ b/src/gallium/state_trackers/clover/core/kernel.cpp
@@ -55,7 +55,7 @@ kernel::launch(command_queue &q,
  const auto reduced_grid_size =
 map(divides(), grid_size, block_size);
  void *st = exec.bind(&q, grid_offset);
-   struct pipe_grid_info info;
+   struct pipe_grid_info info = { 0, };


Right, good catch, it's my fault.

= { 0 }; is enough btw.


I prefer to add the "," to make clear that we are initializing the
entire struct,
I read it as  ", ...".


Well, usually we use { 0 } in mesa, try to grep and you will see. :-)
There are only 3 occurrences of { 0, }, but I think they are quite old.


Of course, I'm not really against this ",", but I just want consistency
with the other parts.



In C++ '{}' is standard, more concise, and works for a wider range of
types regardless of their layout ('{ 0 }' is valid or not depending on
what the first member of the struct is, while '{}' works regardless, in
C++11 it can even be used to initialize non-POD types with custom
constructors), so it should be generally preferred instead.
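
A small sketch of the point about '{}', using hypothetical struct names (GridInfo is only a stand-in, not the real pipe_grid_info layout). Empty braces zero every member of a plain struct and also work when the first member is a class type, where '{ 0 }' would depend on what that first member accepts.

```cpp
#include <cassert>
#include <string>

// Hypothetical plain launch descriptor, loosely modelled on the idea of
// pipe_grid_info; the real struct has its own members.
struct GridInfo {
   unsigned pc;
   const void *input;
   unsigned block[3];
   unsigned grid[3];
   const void *indirect;
};

// A struct whose first member is a class type. '{ 0 }' would try to
// initialize 'name' from 0, so its validity depends on that member;
// '{}' does not have this problem.
struct Named {
   std::string name;
   int count;
};

// Every member is zero / null after empty-brace initialization.
static GridInfo make_grid_info()
{
   GridInfo info = {};
   return info;
}

// 'name' is default-constructed (empty), 'count' is zero.
static Named make_named()
{
   Named n = {};
   return n;
}
```

With this pattern a member added later, such as the indirect pointer in the original bug, starts out as null without touching the call site.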

Don't bother to resend just because of my nitpicking, I'll fix it up
before I push the last revision of your change, which is:


I didn't know that, thanks for this very good explanation. :-)



Reviewed-by: Francisco Jerez <curroje...@riseup.net>






This should be backported to mesa 11.2 I guess, could you please send
a v2 with this minor fix and add the cc thing?


Sure, as soon as we're done bikeshedding on the "," :)

Regards,

Hans




--
-Samuel


Re: [Nouveau] [RFC mesa] nouveau: Add support for OpenCL global memory buffers

2016-03-14 Thread Samuel Pitoiset



On 03/14/2016 08:50 PM, Hans de Goede wrote:

Hi,

On 14-03-16 16:41, Samuel Pitoiset wrote:



On 03/14/2016 04:28 PM, Hans de Goede wrote:

Hi,

On 14-03-16 16:05, Ilia Mirkin wrote:

There's a less hacky and more hacky way forward. The more hacky
solution is to set file index to -1 or something and then not do the
lowering when you see that.

The less hacky solution is the one you proposed as #1 - introduce a new
file for "buffer" memory and lower it to the global file by adding a
base offset.

Right now the meaning of global is overloaded - before lowering it
implicitly includes the buffer base address, and after lowering, it
explicitly includes it. Splitting it out into another file type seems
like the cleaner way forward, not sure what issue you were seeing with
that approach.
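
A rough sketch of the "less hacky" option, assuming hypothetical names throughout (MemFile, Access and lower_access are illustration only, not nv50_ir types): buffer accesses carry a slot plus a relative offset, and lowering turns them into flat global accesses by adding the slot's base address, while accesses that are already global pass through untouched.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical file kinds: Buffer is the proposed new file, Global is a
// flat address with no implied base.
enum class MemFile { Buffer, Global };

struct Access {
   MemFile file;
   int slot;        // buffer slot; only meaningful for MemFile::Buffer
   uint64_t offset; // relative to the buffer base, or flat for Global
};

// Lower a buffer access to a flat global access by adding the base
// address of its slot. Global accesses are returned unchanged, so an
// OpenCL-style flat pointer never gets a base added to it.
static Access lower_access(const Access &a, const uint64_t *buffer_bases)
{
   if (a.file != MemFile::Buffer)
      return a;
   return Access{ MemFile::Global, -1, buffer_bases[a.slot] + a.offset };
}
```

The file kind alone decides whether a base is added, so the lowering does not need to know whether the program came from GLSL or OpenCL.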


Ok.


I agree with you guys, the solution #1 is fine by me.

Btw, do you need someone with commit access to push your previous
series (the tgsi thing)? I can do this for you.


Thanks for the offer. IIRC Ilia wanted some minor fixes there, so I'll do
a v2 tomorrow. Talking about commit rights, I guess it would be
convenient for all if I got commit rights myself? I promise I won't
push anything without acks.


Yes sure, I trust you, no worries. :-)



I already have a freedesktop.org account, my username is jwrdegoede.


Please open a ticket on bugs.freedesktop to ask for commit rights.



Regards,

Hans









 > (I didn't understand your argument about potential future

issues.)


There was not much to understand, it is just something I worried about,
but was not sure if there actually was something to worry about :)

If you feel that solution #1 (which was also my first hunch) is
the right one then I will go and implement that.


What I really don't want is to somehow differentiate glsl-sourced
and opencl-sourced compute programs in the backend.


Ok, understood.

Regards,

Hans



On Mar 14, 2016 6:22 AM, "Hans de Goede" <hdego...@redhat.com> wrote:


This little "hack" fixes the use of OpenCL global memory buffers with
nouveau, but clearly the #if 0 is not a solution as it breaks buffers
with GLSL.

The reason I'm posting this as an RFC patch is to discuss how to solve
this properly, 2 solutions come to mind:

1) Use separate nv50_ir::FILE_MEMORY_xxx values for buffers versus
TGSI_FILE_MEMORY with TGSI_MEMORY_TYPE_GLOBAL, looking at
translateFile()
we currently have:

case TGSI_FILE_BUFFER:  return
nv50_ir::FILE_MEMORY_GLOBAL;
case TGSI_FILE_MEMORY:  return
nv50_ir::FILE_MEMORY_GLOBAL;

So doing a
s/nv50_ir::FILE_MEMORY_GLOBAL/nv50_ir::FILE_MEMORY_BUFFER/
everywhere and then adding a new FILE_MEMORY_GLOBAL seems like an
obvious fix.

But I'm afraid that we will have similar issues with OpenCL using
flat addresses whereas GLSL will have some implied base-address /
offset in other places too, which brings me to solution 2:

2) Add a flag to Program to indicate that it is an OpenCL compute
kernel; or possibly use a different Program::TYPE_* for OpenCL ?

I've a feeling that this is what we want since the addressing
models
are just different and we likely will need to implement different
behavior
in various places based on this.

This will also allow us to use INPUT and CONST in tgsi code built
from OpenCL programs and use that flag to do the right thing, rather
than introducing new MEMORY[x], INPUT resp. MEMORY[x], CONST
declarations for this.

I'm esp. worried that once GLSL gets global support it will want
different behavior for TGSI_FILE_MEMORY with TGSI_MEMORY_TYPE_GLOBAL
than OpenCL, just like things are now with buffers, rendering
solution 1 a non-solution.

So I'm seeking input on how to move forward with this ...  ?

Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 4

  src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 2 ++
  2 files changed, 6 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index de0c72b..15012ac 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1525,6 +1525,10 @@ Converter::makeSym(uint tgsiFile, int fileIdx,
int
idx, int c, uint32_t address)

 if (tgsiFile == TGSI_FILE_MEMORY) {
switch (code->memoryFiles[fileIdx].mem_type) {
+  case TGSI_MEMORY_TYPE_GLOBAL:
+ /* No-op this is the default for TGSI_FILE_MEMORY */
+ sym->setFile(FILE_MEMORY_GLOBAL);
+ break;
case TGSI_MEMORY_TYPE_SHARED:
   sym->setFile(FILE_MEMORY_SHARED);
   break;
diff --git
a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 6cb4dd4..bc

Re: [Nouveau] [RFC mesa] nouveau: Add support for OpenCL global memory buffers

2016-03-14 Thread Samuel Pitoiset



On 03/14/2016 04:28 PM, Hans de Goede wrote:

Hi,

On 14-03-16 16:05, Ilia Mirkin wrote:

There's a less hacky and more hacky way forward. The more hacky
solution is
to set file index to -1 or something and then not do the lowering when
you
see that.

The less hacky solution is the one you proposed as #1 - introduce a new
file for "buffer" memory and lower it to the global file by adding a base
offset.

Right now the meaning of global is overloaded - before lowering it
implicitly includes the buffer base address, and after lowering, it
explicitly includes it. Splitting it out into another file type seems
like the cleaner way forward, not sure what issue you were seeing with
that approach.
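
For illustration only, the "less hacky" option can be modelled outside of nv50_ir roughly like this. This is a standalone sketch, not driver code: MemFile, Access and lowerBuffer are made-up names, and the table of per-slot base addresses stands in for whatever the lowering pass would actually load.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the two file types discussed above.
enum class MemFile { Buffer, Global };

struct Access {
   MemFile file;
   int fileIndex;      // buffer slot; unused once the access is Global
   uint64_t address;   // offset inside the buffer, or a flat address
};

// Lower a Buffer access to a Global one by adding the slot's base address.
// Global accesses already carry a flat address and pass through untouched.
inline Access lowerBuffer(const Access &a, const std::array<uint64_t, 4> &bases)
{
   if (a.file != MemFile::Buffer)
      return a;
   const uint64_t base = bases.at(static_cast<std::size_t>(a.fileIndex));
   return Access{ MemFile::Global, -1, base + a.address };
}
```

With this split the base address is only ever added once, at lowering time, and an access that is already flat (the OpenCL case) is left alone.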


Ok.


I agree with you guys, the solution #1 is fine by me.

Btw, do you need someone with commit access to push your previous series 
(the tgsi thing)? I can do this for you.




 > (I didn't understand your argument about potential future

issues.)


There was not much to understand, it is just something I worried about,
but was not sure if there actually was something to worry about :)

If you feel that solution #1 (which was also my first hunch) is
the right one then I will go and implement that.


What I really don't want is to somehow differentiate glsl-sourced
and opencl-sourced compute programs in the backend.


Ok, understood.

Regards,

Hans



On Mar 14, 2016 6:22 AM, "Hans de Goede"  wrote:


This little "hack" fixes the use of OpenCL global memory buffers with
nouveau, but clearly the #if 0 is not a solution as it breaks buffers
with GLSL.

The reason I'm posting this as an RFC patch is to discuss how to solve
this properly, 2 solutions come to mind:

1) Use separate nv50_ir::FILE_MEMORY_xxx values for buffers versus
TGSI_FILE_MEMORY with TGSI_MEMORY_TYPE_GLOBAL, looking at
translateFile()
we currently have:

case TGSI_FILE_BUFFER:  return nv50_ir::FILE_MEMORY_GLOBAL;
case TGSI_FILE_MEMORY:  return nv50_ir::FILE_MEMORY_GLOBAL;

So doing a
s/nv50_ir::FILE_MEMORY_GLOBAL/nv50_ir::FILE_MEMORY_BUFFER/
everywhere and then adding a new FILE_MEMORY_GLOBAL seems like an
obvious fix.
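
A rough picture of what translateFile() would look like after that rename. The enums below are mocks so that the snippet stands alone; the real TGSI_FILE_* and nv50_ir::FILE_MEMORY_* values live in Mesa, and translateFileSketch is a made-up name.

```cpp
#include <cassert>

// Mock enums standing in for the Mesa definitions.
enum TgsiFileMock { MOCK_TGSI_FILE_BUFFER, MOCK_TGSI_FILE_MEMORY };
enum IrFileMock { MOCK_FILE_MEMORY_BUFFER, MOCK_FILE_MEMORY_GLOBAL };

// After the rename, buffers get their own file and only TGSI_FILE_MEMORY
// maps to the flat global file.
inline IrFileMock translateFileSketch(TgsiFileMock f)
{
   switch (f) {
   case MOCK_TGSI_FILE_BUFFER: return MOCK_FILE_MEMORY_BUFFER;
   case MOCK_TGSI_FILE_MEMORY: return MOCK_FILE_MEMORY_GLOBAL;
   }
   return MOCK_FILE_MEMORY_GLOBAL; // not reached for valid input
}
```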

But I'm afraid that we will have similar issues with OpenCL using
flat addresses whereas GLSL will have some implied base-address /
offset in other places too, which brings me to solution 2:

2) Add a flag to Program to indicate that it is an OpenCL compute
kernel; or possibly use a different Program::TYPE_* for OpenCL ?

I've a feeling that this is what we want since the addressing models
are just different and we likely will need to implement different
behavior
in various places based on this.

This will also allow us to use INPUT and CONST in tgsi code built
from OpenCL programs and use that flag to do the right thing, rather
than introducing new MEMORY[x], INPUT resp. MEMORY[x], CONST
declarations for this.

I'm esp. worried that once GLSL gets global support it will want
different behavior for TGSI_FILE_MEMORY with TGSI_MEMORY_TYPE_GLOBAL
than OpenCL, just like things are now with buffers, rendering
solution 1 a non-solution.

So I'm seeking input on how to move forward with this ...  ?

Signed-off-by: Hans de Goede 
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 4 
  src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 2 ++
  2 files changed, 6 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index de0c72b..15012ac 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1525,6 +1525,10 @@ Converter::makeSym(uint tgsiFile, int fileIdx,
int
idx, int c, uint32_t address)

 if (tgsiFile == TGSI_FILE_MEMORY) {
switch (code->memoryFiles[fileIdx].mem_type) {
+  case TGSI_MEMORY_TYPE_GLOBAL:
+ /* No-op this is the default for TGSI_FILE_MEMORY */
+ sym->setFile(FILE_MEMORY_GLOBAL);
+ break;
case TGSI_MEMORY_TYPE_SHARED:
   sym->setFile(FILE_MEMORY_SHARED);
   break;
diff --git
a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 6cb4dd4..bcc96de 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -2106,6 +2106,7 @@ NVC0LoweringPass::visit(Instruction *i)
} else if (i->src(0).getFile() == FILE_SHADER_OUTPUT) {
   assert(prog->getType() ==
Program::TYPE_TESSELLATION_CONTROL);
   i->op = OP_VFETCH;
+#if 0
} else if (i->src(0).getFile() == FILE_MEMORY_GLOBAL) {
   Value *ind = i->getIndirect(0, 1);
   Value *ptr = loadResInfo64(ind, i->getSrc(0)->reg.fileIndex *
16);
@@ -2126,6 +2127,7 @@ 

Re: [Nouveau] [PATCH mesa v2] clover: Fix pipe_grid_info.indirect not being initialized

2016-03-14 Thread Samuel Pitoiset

Thanks Hans!

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 03/14/2016 03:01 PM, Hans de Goede wrote:

After pipe_grid_info.indirect was introduced, clover was not modified
to set it causing it to pass uninitialized memory for it to launch_grid.

This commit fixes this by zero-ing the entire pipe_grid_info struct when
declaring it, to avoid similar problems popping-up in the future.

Cc: "11.2" <mesa-sta...@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
Changes in v2:
-Drop trailing "," from struct initializer
-Add Cc: "11.2" <mesa-sta...@lists.freedesktop.org>
---
  src/gallium/state_trackers/clover/core/kernel.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp 
b/src/gallium/state_trackers/clover/core/kernel.cpp
index 8396be9..1ab87ec 100644
--- a/src/gallium/state_trackers/clover/core/kernel.cpp
+++ b/src/gallium/state_trackers/clover/core/kernel.cpp
@@ -55,7 +55,7 @@ kernel::launch(command_queue ,
 const auto reduced_grid_size =
map(divides(), grid_size, block_size);
 void *st = exec.bind(, grid_offset);
-   struct pipe_grid_info info;
+   struct pipe_grid_info info = { 0 };

 // The handles are created during exec_context::bind(), so we need make
 // sure to call exec_context::bind() before retrieving them.
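
To see why zero-initializing at the declaration also covers members added later, here is a standalone sketch. grid_info_mock is a mock with a similar shape to pipe_grid_info; its field list is illustrative and not copied from the Gallium header.

```cpp
#include <cassert>
#include <cstdint>

// Mock struct; the real one is pipe_grid_info in Gallium.
struct grid_info_mock {
   uint32_t pc;
   const void *input;
   unsigned block[3];
   unsigned grid[3];
   void *indirect;          // the kind of member that was left uninitialized
   unsigned indirect_offset;
};

inline grid_info_mock make_info()
{
   // "= { 0 }" initializes the first member with 0 and value-initializes
   // every remaining member, so pointers are null and integers are zero.
   grid_info_mock info = { 0 };
   return info;
}
```

A caller then only assigns the fields it cares about and everything else, including any field introduced by a later Gallium change, starts out as zero or null.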



--
-Samuel
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH mesa] clover: Fix pipe_grid_info.indirect not being initialized

2016-03-14 Thread Samuel Pitoiset



On 03/14/2016 02:26 PM, Hans de Goede wrote:

Hi,

On 14-03-16 14:01, Samuel Pitoiset wrote:



On 03/14/2016 01:50 PM, Hans de Goede wrote:

After pipe_grid_info.indirect was introduced, clover was not modified
to set it causing it to pass uninitialized memory for it to launch_grid.

This commit fixes this by zero-ing the entire pipe_grid_info struct when
declaring it, to avoid similar problems popping-up in the future.

Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
  src/gallium/state_trackers/clover/core/kernel.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp
b/src/gallium/state_trackers/clover/core/kernel.cpp
index 8396be9..dad66aa 100644
--- a/src/gallium/state_trackers/clover/core/kernel.cpp
+++ b/src/gallium/state_trackers/clover/core/kernel.cpp
@@ -55,7 +55,7 @@ kernel::launch(command_queue ,
 const auto reduced_grid_size =
map(divides(), grid_size, block_size);
 void *st = exec.bind(, grid_offset);
-   struct pipe_grid_info info;
+   struct pipe_grid_info info = { 0, };


Right, good catch, it's my fault.

= { 0 }; is enough btw.


I prefer to add the "," to make clear that we are initializing the
entire struct,
I read it as  ", ...".


Well, usually we use { 0 } in mesa, try to grep and you will see. :-)
There are only 3 occurrences of { 0, }, but I think they are quite old.




This should be backported to mesa 11.2 I guess, could you please send
a v2 with this minor fix and add the cc thing?


Sure, as soon as we're done bikeshedding on the "," :)

Regards,

Hans


--
-Samuel


Re: [Nouveau] [PATCH mesa 3/3] nouveau: Add support for clover / OpenCL kernel input parameters

2016-03-10 Thread Samuel Pitoiset



On 03/10/2016 05:03 PM, Pierre Moreau wrote:

On 04:27 PM - Mar 10 2016, Samuel Pitoiset wrote:



On 03/10/2016 04:23 PM, Ilia Mirkin wrote:

On Thu, Mar 10, 2016 at 10:14 AM, Hans de Goede <hdego...@redhat.com> wrote:

Add support for clover / OpenCL kernel input parameters.

Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 18 +++---
  1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index a8258af..de0c72b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1523,9 +1523,21 @@ Converter::makeSym(uint tgsiFile, int fileIdx, int idx, 
int c, uint32_t address)

 sym->reg.fileIndex = fileIdx;

-   if (tgsiFile == TGSI_FILE_MEMORY &&
-   code->memoryFiles[fileIdx].mem_type == TGSI_MEMORY_TYPE_SHARED)
-  sym->setFile(FILE_MEMORY_SHARED);
+   if (tgsiFile == TGSI_FILE_MEMORY) {
+  switch (code->memoryFiles[fileIdx].mem_type) {
+  case TGSI_MEMORY_TYPE_SHARED:
+ sym->setFile(FILE_MEMORY_SHARED);


You might want to increment the address by at least
`info->prop.cp.inputOffset`, and if inputs still end up in shared on Tesla,
then increment further by the input size. This input offset of 0x10 (or is it
0x20?) is due to the card sticking the size of a block and of the grid inside
`s[0x0..0x10]` (or maybe Nouveau is doing that, but I doubt it.). So even if
the user inputs end up somewhere else in memory, you most likely still don't
want to overwrite the grid information. This should be necessary only for Tesla
cards.


cf. my previous comment. :-)




+ break;
+  case TGSI_MEMORY_TYPE_INPUT:
+ assert(prog->getType() == Program::TYPE_COMPUTE);
+ assert(idx == -1);
+ sym->setFile(FILE_SHADER_INPUT);
+ address += info->prop.cp.inputOffset;


What's the idea here? i.e. what is the inputOffset, how is it set, and why?


I don't get the idea either, btw.

But prop.cp.inputOffset is only defined for compute on Kepler. It's the
offset of input parameters in the screen->parm BO but as you already know,
it is going to be removed because the idea is to use screen->uniform_bo
instead. I'll do this change *after* the compute shaders support on Kepler.


If I understand correctly, the goal is to have user inputs in a
`screen->uniform_bo`, and so for all generations?


Sure for fermi, and probably for Tesla.



Pierre






   -ilia


+ break;
+  default:
+ assert(0); /* TODO: Add support for global and local memory */
+  }
+   }

 if (idx >= 0) {
if (sym->reg.file == FILE_SHADER_INPUT)
--
2.7.2



--
-Samuel


Re: [Nouveau] [PATCH mesa 3/3] nouveau: Add support for clover / OpenCL kernel input parameters

2016-03-10 Thread Samuel Pitoiset
Looks fine, except that you will need to lower FILE_SHADER_INPUT to
FILE_MEMORY_SHARED for Tesla because input kernel parameters are located
at s[0x10]. No need to do this for Fermi+ because it's already lowered
to c0[]. Note that input kernel parameters will probably end up in
c7[] after my changes but that doesn't change anything for you.
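
Roughly, as a standalone sketch rather than the actual nv50 patch: Chip, File, Location and lowerKernelInput are made-up names, the 0x10 base is the value mentioned in this thread, and leaving the offset unchanged on Fermi+ is an assumption made only for the example.

```cpp
#include <cassert>
#include <cstdint>

enum class Chip { Tesla, FermiOrLater };
enum class File { Shared, Const };

struct Location {
   File file;
   uint32_t address;
};

// Start of the kernel parameters in shared memory on Tesla, as stated in
// this thread; the bytes below it hold block/grid information.
constexpr uint32_t kTeslaInputBase = 0x10;

inline Location lowerKernelInput(Chip chip, uint32_t paramOffset)
{
   if (chip == Chip::Tesla)
      return Location{ File::Shared, kTeslaInputBase + paramOffset };
   // Assumption for this sketch: the constant-buffer offset is unchanged.
   return Location{ File::Const, paramOffset };
}
```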


I already have a patch for the nv50 bits btw, maybe it's the right time 
to send it?


https://cgit.freedesktop.org/~hakzsam/mesa/commit/?h=compute&id=640d68009bcf93c1814cee0b1a12939cb85e5895

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 03/10/2016 04:43 PM, Ilia Mirkin wrote:

On Thu, Mar 10, 2016 at 10:27 AM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:



On 03/10/2016 04:23 PM, Ilia Mirkin wrote:


On Thu, Mar 10, 2016 at 10:14 AM, Hans de Goede <hdego...@redhat.com>
wrote:


Add support for clover / OpenCL kernel input parameters.

Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
   .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 18
+++---
   1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index a8258af..de0c72b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1523,9 +1523,21 @@ Converter::makeSym(uint tgsiFile, int fileIdx, int
idx, int c, uint32_t address)

  sym->reg.fileIndex = fileIdx;

-   if (tgsiFile == TGSI_FILE_MEMORY &&
-   code->memoryFiles[fileIdx].mem_type == TGSI_MEMORY_TYPE_SHARED)
-  sym->setFile(FILE_MEMORY_SHARED);
+   if (tgsiFile == TGSI_FILE_MEMORY) {
+  switch (code->memoryFiles[fileIdx].mem_type) {
+  case TGSI_MEMORY_TYPE_SHARED:
+ sym->setFile(FILE_MEMORY_SHARED);
+ break;
+  case TGSI_MEMORY_TYPE_INPUT:
+ assert(prog->getType() == Program::TYPE_COMPUTE);
+ assert(idx == -1);
+ sym->setFile(FILE_SHADER_INPUT);
+ address += info->prop.cp.inputOffset;



What's the idea here? i.e. what is the inputOffset, how is it set, and
why?



I don't get the idea either, btw.

But prop.cp.inputOffset is only defined for compute on Kepler. It's the
offset of input parameters in the screen->parm BO but as you already know,
it is going to be removed because the idea is to use screen->uniform_bo
instead. I'll do this change *after* the compute shaders support on Kepler.


Actually looks like it's only set for nv50 that I can see, shifting
things over by 0x10. It used to be reflected by getResourceBase, but
we broke that abstraction... might be nice to get it back somehow,
perhaps by sending more arguments down to getResourceBase? Either way,
that can be done later. This patch is

Reviewed-by: Ilia Mirkin <imir...@alum.mit.edu>



--
-Samuel


Re: [Nouveau] [PATCH mesa 3/3] nouveau: Add support for clover / OpenCL kernel input parameters

2016-03-10 Thread Samuel Pitoiset



On 03/10/2016 04:43 PM, Ilia Mirkin wrote:

On Thu, Mar 10, 2016 at 10:27 AM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:



On 03/10/2016 04:23 PM, Ilia Mirkin wrote:


On Thu, Mar 10, 2016 at 10:14 AM, Hans de Goede <hdego...@redhat.com>
wrote:


Add support for clover / OpenCL kernel input parameters.

Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
   .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 18
+++---
   1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index a8258af..de0c72b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1523,9 +1523,21 @@ Converter::makeSym(uint tgsiFile, int fileIdx, int
idx, int c, uint32_t address)

  sym->reg.fileIndex = fileIdx;

-   if (tgsiFile == TGSI_FILE_MEMORY &&
-   code->memoryFiles[fileIdx].mem_type == TGSI_MEMORY_TYPE_SHARED)
-  sym->setFile(FILE_MEMORY_SHARED);
+   if (tgsiFile == TGSI_FILE_MEMORY) {
+  switch (code->memoryFiles[fileIdx].mem_type) {
+  case TGSI_MEMORY_TYPE_SHARED:
+ sym->setFile(FILE_MEMORY_SHARED);
+ break;
+  case TGSI_MEMORY_TYPE_INPUT:
+ assert(prog->getType() == Program::TYPE_COMPUTE);
+ assert(idx == -1);
+ sym->setFile(FILE_SHADER_INPUT);
+ address += info->prop.cp.inputOffset;



What's the idea here? i.e. what is the inputOffset, how is it set, and
why?



I don't get the idea either, btw.

But prop.cp.inputOffset is only defined for compute on Kepler. It's the
offset of input parameters in the screen->parm BO but as you already know,
it is going to be removed because the idea is to use screen->uniform_bo
instead. I'll do this change *after* the compute shaders support on Kepler.


Actually looks like it's only set for nv50 that I can see, shifting
things over by 0x10. It used to be reflected by getResourceBase, but
we broke that abstraction... might be nice to get it back somehow,
perhaps by sending more arguments down to getResourceBase? Either way,
that can be done later. This patch is


Oh yes, I was confused with prop.cp.gridInfoBase on Kepler...



Reviewed-by: Ilia Mirkin <imir...@alum.mit.edu>



--
-Samuel


Re: [Nouveau] [PATCH mesa 3/3] nouveau: Add support for clover / OpenCL kernel input parameters

2016-03-10 Thread Samuel Pitoiset



On 03/10/2016 04:23 PM, Ilia Mirkin wrote:

On Thu, Mar 10, 2016 at 10:14 AM, Hans de Goede  wrote:

Add support for clover / OpenCL kernel input parameters.

Signed-off-by: Hans de Goede 
---
  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 18 +++---
  1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index a8258af..de0c72b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1523,9 +1523,21 @@ Converter::makeSym(uint tgsiFile, int fileIdx, int idx, 
int c, uint32_t address)

 sym->reg.fileIndex = fileIdx;

-   if (tgsiFile == TGSI_FILE_MEMORY &&
-   code->memoryFiles[fileIdx].mem_type == TGSI_MEMORY_TYPE_SHARED)
-  sym->setFile(FILE_MEMORY_SHARED);
+   if (tgsiFile == TGSI_FILE_MEMORY) {
+  switch (code->memoryFiles[fileIdx].mem_type) {
+  case TGSI_MEMORY_TYPE_SHARED:
+ sym->setFile(FILE_MEMORY_SHARED);
+ break;
+  case TGSI_MEMORY_TYPE_INPUT:
+ assert(prog->getType() == Program::TYPE_COMPUTE);
+ assert(idx == -1);
+ sym->setFile(FILE_SHADER_INPUT);
+ address += info->prop.cp.inputOffset;


What's the idea here? i.e. what is the inputOffset, how is it set, and why?


I don't get the idea either, btw.

But prop.cp.inputOffset is only defined for compute on Kepler. It's the 
offset of input parameters in the screen->parm BO but as you already 
know, it is going to be removed because the idea is to use 
screen->uniform_bo instead. I'll do this change *after* the compute 
shaders support on Kepler.




   -ilia


+ break;
+  default:
+ assert(0); /* TODO: Add support for global and local memory */
+  }
+   }

 if (idx >= 0) {
if (sym->reg.file == FILE_SHADER_INPUT)
--
2.7.2



--
-Samuel


Re: [Nouveau] [PATCH mesa 1/3] tgsi: Fix decl.Atomic and .Shared not propagating when parsing tgsi text

2016-03-10 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>


On 03/10/2016 04:14 PM, Hans de Goede wrote:

When support for decl.Atomic and .Shared was added, tgsi_build_declaration
was not updated to propagate these properly.

Signed-off-by: Hans de Goede <hdego...@redhat.com>
---
  src/gallium/auxiliary/tgsi/tgsi_build.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c 
b/src/gallium/auxiliary/tgsi/tgsi_build.c
index cfe9b92..c420ae1 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -127,6 +127,8 @@ tgsi_build_declaration(
 unsigned invariant,
 unsigned local,
 unsigned array,
+   unsigned atomic,
+   unsigned shared,
 struct tgsi_header *header )
  {
 struct tgsi_declaration declaration;
@@ -143,6 +145,8 @@ tgsi_build_declaration(
 declaration.Invariant = invariant;
 declaration.Local = local;
 declaration.Array = array;
+   declaration.Atomic = atomic;
+   declaration.Shared = shared;
 header_bodysize_grow( header );

 return declaration;
@@ -401,6 +405,8 @@ tgsi_build_full_declaration(
full_decl->Declaration.Invariant,
full_decl->Declaration.Local,
full_decl->Declaration.Array,
+  full_decl->Declaration.Atomic,
+  full_decl->Declaration.Shared,
header );

 if (maxsize <= size)
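
The failure mode fixed here is easy to reproduce in miniature. The following is a reduced mock, not the tgsi_build.c code: decl_mock and build_decl are made-up names and only four of the declaration flags are modelled.

```cpp
#include <cassert>

// Reduced mock of a declaration token; only the flags relevant here.
struct decl_mock {
   unsigned Local  : 1;
   unsigned Array  : 1;
   unsigned Atomic : 1;
   unsigned Shared : 1;
};

// Every flag the text parser can set has to be forwarded explicitly. A flag
// that is not a parameter of the builder reads back as 0 after the rebuild.
inline decl_mock build_decl(unsigned local, unsigned array,
                            unsigned atomic, unsigned shared)
{
   decl_mock d = {};
   d.Local = local;
   d.Array = array;
   d.Atomic = atomic;
   d.Shared = shared;
   return d;
}
```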



--
-Samuel


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Samuel Pitoiset

Well the pipe_loader stuff is buggy in compute.c, I can't even
create a screen object... That's sad. It fails in pipe_loader_probe() & co.

On 02/22/2016 02:08 PM, Hans de Goede wrote:

Hi,

On 22-02-16 14:04, Samuel Pitoiset wrote:


On 02/22/2016 01:46 PM, Hans de Goede wrote:

Hi,

On 22-02-16 13:41, Samuel Pitoiset wrote:

Hi there,

On 02/22/2016 12:26 PM, Hans de Goede wrote:





So back to the problem of getting OpenCL(ish) code to work again with
the recent mesa changes. For starters I would like to get:

src/gallium/tests/trivial/compute.c and then the test with mask 8,
test_input_global() to work again, when that is working I should be
able to adjust my llvm work (and if necessary clover) to start to
work again.

Currently the test_input_global() test uses the following bit of
TGSI code:

COMP
DCL SV[0], THREAD_ID[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, RGLOBAL, TEMP[1].
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE RGLOBAL.x, TEMP[1]., TEMP[1]
RET
ENDSUB


Whereby RINPUT and RGLOBAL get replaced by processing the
code with cpp and the following defines:

#define RGLOBAL    RES[32767]
#define RLOCAL RES[32766]
#define RPRIVATE   RES[32765]
#define RINPUT RES[32764]

If I understand how memory is supposed to work, then I would need to
change the TGSI as follows:

COMP
DCL SV[0], THREAD_ID[0]
DCL MEMORY[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, MEMORY[0], TEMP[1].
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE MEMORY[0].x, TEMP[1]., TEMP[1]
RET
ENDSUB


Nope, this won't work because RINPUT is RES[32764]. And you have to
remove all occurrences of RES because it's no longer supported. In my
opinion, using BUFFER[0] as a first step should work. Currently, only
SHARED with MEMORY is supported.


Right, as I say below "This only solves the accessing of the global
memory, it does not solve
getting to the kernel input kernel parameters"


This assumes, that as discussed declaring memory without a , SHARED or
other
flag means the memory is global.

So 2 questions:

1) Do the above changes for using the new MEMORY keyword look as
intended
to you?

2) This only solves the accessing of the global memory, it does not
solve
getting to the kernel input kernel parameters, how would I deal with
those ?


The input kernel parameters are directly passed through a call to
pipe_context::launch_grid. You just have to fill the
pipe_grid_info::input array with your parameters and they will be
uploaded by nvXX_compute_upload_input().
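
On the host side that amounts to packing the parameters into one blob and pointing the input field at it. A standalone sketch, assuming tightly packed parameters with no alignment padding: grid_info_mock and push_param are made-up names, and only the input field of the real pipe_grid_info is mirrored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Mock of the one field used here; the real struct is pipe_grid_info.
struct grid_info_mock {
   const void *input;
};

// Append one trivially copyable parameter to the packed input blob.
template <typename T>
inline void push_param(std::vector<uint8_t> &blob, const T &value)
{
   const std::size_t at = blob.size();
   blob.resize(at + sizeof(T));
   std::memcpy(blob.data() + at, &value, sizeof(T));
}
```

The blob has to stay alive until the launch call has consumed it, since the struct only stores a pointer.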


Right, the uploading side I understand, the question is how to get to
them from
the compute kernel's tgsi code ?


Right, I wonder if there is already a DECL INPUT or something like
that for input parameters of shaders. Oh yeah, there is
TGSI_FILE_INPUT, maybe this is what you want?


Yes that sounds right, so now "all" we need to do is make
nvXX_compute_upload_input() and TGSI_FILE_INPUT work together.

Regards,

Hans


--
-Samuel


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Samuel Pitoiset



On 02/22/2016 01:46 PM, Hans de Goede wrote:

Hi,

On 22-02-16 13:41, Samuel Pitoiset wrote:

Hi there,

On 02/22/2016 12:26 PM, Hans de Goede wrote:





So back to the problem of getting OpenCL(ish) code to work again with
the recent mesa changes. For starters I would like to get:

src/gallium/tests/trivial/compute.c and then the test with mask 8,
test_input_global() to work again, when that is working I should be
able to adjust my llvm work (and if necessary clover) to start to
work again.

Currently the test_input_global() test uses the following bit of
TGSI code:

COMP
DCL SV[0], THREAD_ID[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, RGLOBAL, TEMP[1].
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE RGLOBAL.x, TEMP[1]., TEMP[1]
RET
ENDSUB


Whereby RINPUT and RGLOBAL get replaced by processing the
code with cpp and the following defines:

#define RGLOBAL    RES[32767]
#define RLOCAL RES[32766]
#define RPRIVATE   RES[32765]
#define RINPUT RES[32764]

If I understand how memory is supposed to work, then I would need to
change the TGSI as follows:

COMP
DCL SV[0], THREAD_ID[0]
DCL MEMORY[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, MEMORY[0], TEMP[1].
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE MEMORY[0].x, TEMP[1]., TEMP[1]
RET
ENDSUB


Nope, this won't work because RINPUT is RES[32764]. And you have to
remove all occurrences of RES because it's no longer supported. In my
opinion, using BUFFER[0] as a first step should work. Currently, only
SHARED with MEMORY is supported.


Right, as I say below "This only solves the accessing of the global
memory, it does not solve
getting to the kernel input kernel parameters"


This assumes, that as discussed declaring memory without a , SHARED or
other
flag means the memory is global.

So 2 questions:

1) Do the above changes for using the new MEMORY keyword look as
intended
to you?

2) This only solves the accessing of the global memory, it does not
solve
getting to the kernel input kernel parameters, how would I deal with
those ?


The input kernel parameters are directly passed through a call to
pipe_context::launch_grid. You just have to fill the
pipe_grid_info::input array with your parameters and they will be
uploaded by nvXX_compute_upload_input().


Right, the uploading side I understand, the question is how to get to
them from
the compute kernel's tgsi code ?


Right, I wonder if there is already a DECL INPUT or something like that 
for input parameters of shaders. Oh yeah, there is TGSI_FILE_INPUT, 
maybe this is what you want?




If I understand you correctly you are suggesting to use BUFFER[0] for this,
that is fine from a nouveau point-of-view, but might be a bit nouveau
centric way of looking at things, I think a better approach would be
a separate input register-file for this, as that will be more flexible
when people try to do opencl via clang->llvm->tgsi on other GPUs.


I will have a look at the test_input_global().


Thanks!

Regards,

Hans


--
-Samuel


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Samuel Pitoiset

Hi there,

On 02/22/2016 12:26 PM, Hans de Goede wrote:

Hi,

On 19-02-16 20:43, Ilia Mirkin wrote:

On Fri, Feb 19, 2016 at 5:36 AM, Hans de Goede 
wrote:

Hi,

On 18-02-16 17:39, Ilia Mirkin wrote:


On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede 
wrote:


But this does not seem to be hooked up yet for nouveau.



Samuel has patches. See
https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3



Cool, I will take a look at those.


So some questions:
-The commit by Samuel says:
   This introduces TGSI_FILE_MEMORY for shared, global and local memory.
   Only shared memory is currently supported.

   The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it
   also introduces a second option next to shared, so what are we going
   to use plain MEMORY[x] for?
   I suggest using it for global memory but we need to be in agreement
   on this.



That sounds fine to me. However what I had in mind was switching the
SHARED field into a 2-bit field and making it

1 = SHARED
2 = GLOBAL
3 = LOCAL

(since for OpenCL you also need to be able to address local or private
memory). I sorta wanted Samuel to do it, but since I had no idea where
you were at, or if you were even still working on this, I figured it
should be fixed up by the first person who needed it.



Sounds good, only the naming is somewhat unfortunate since opencl uses
different
naming. I.e. it has no "shared"


Sad. Well, "shared" is what OpenGL compute shaders use, which is why I
proposed it.



OpenCL has:
-global:  accessible by all worker-groups as well as by the CPU
-const:   read-only global
-local:   shared by worker-items in the same worker-group, not shared
between worker-groups
-private: accessible only to a single worker-item

So how do these map to the TGSI:


1 = SHARED
2 = GLOBAL
3 = LOCAL


OpenCL global = TGSI global
OpenCL const = TGSI global
OpenCL local = TGSI shared
OpenCL private = TGSI local

Not sure what the distinction between OpenCL const and global is.
If the const stuff is actually just user-supplied uniforms (and
doesn't need to be in a particular place in memory), then those should
go into TGSI CONST somehow.


AFAIK OpenCL const is really read-only global, so the data is filled
in by the CPU, then passed to the opencl-kernel running on the GPU
where all worker-items have access to it. I think that TGSI CONST might
indeed be usable for this, but it is probably easiest to treat
it as GLOBAL for now.


-What about kernel input parameters? So far these have been using
   RES[32764]. I must admit that I do not understand where the file_index
   of 32764 comes from (or where any of the file indexes come from for
   src/gallium/tests/trivial/compute.c).
   I have the feeling that these are not used at all, and everything simply
   goes to a flat (virtual) memory space, with the params at address 0,
   correct?



It was never particularly well-specified, which was one of the reasons
it went away. It also didn't map nicely onto the OpenGL model. There
is a remaining question of how to do addressing in memory... there's
40 bits of address space. Should these implicitly be U64
(dual-component in TGSI) addresses that are passed around? Not sure
what the OpenCL position on all this is.



So far I've been using U32 for addresses as that is what Francisco's
original code was using. And this also is what things like the TGSI LOAD
instruction take. If you're doing a LOAD on a 1d buffer then you will use
TEMP[#].x to specify the index, and the way this currently works with
OpenCL is that clCreateBuffer() will return a cl_mem type which then gets
passed into the kernel as input parameter and gets treated as a pointer by
the compiler, so e.g. global mem gets treated as a single address space
even if there are multiple global buffers and TEMP[#].x contains the value
passed in via cl_mem as start offset for the buffer + the index into the
buffer.
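
As a rough illustration of the addressing described above (not actual clover or nouveau code), the value ending up in TEMP[#].x can be thought of as the sum below. The function and parameter names are hypothetical; 'base' stands for the start offset passed in via cl_mem.

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: 'base' is the buffer's start offset in the single
 * flat address space, 'index' the element index into that buffer. */
static uint32_t
flat_address_u32(uint32_t base, uint32_t index, uint32_t elem_size)
{
   assert(elem_size > 0);
   /* Unsigned arithmetic wraps modulo 2^32, so the result can never
    * address more than 4 GiB. */
   return base + index * elem_size;
}
```

For example, a buffer starting at 0x1000 with 4-byte elements puts element 4 at 0x1010, and a base close to the 4 GiB mark silently wraps around.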

So this means that currently we are limited to U32 since TEMP[#].x is
only 32 bits wide. Internally 40-bit addresses can and should probably
be used so that at least the different memory spaces each have the full
32 bits available.

Note that we could fix this by adding some sort of LOAD64 opcode, which
uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not sure
how this would work for 3d buffers though. I foresee the llvm backend
eventually getting a 64 bit mode where it will use 64 bits for all
pointers and use something like a LOAD64 opcode to indicate that
the indexes (which it effectively uses as addresses / pointers)
are 64 bit wide.


Well, this LOAD64/STORE64 would just only be defined for MEMORY[]
src/dest, so you don't need to worry about 3d or anything like that. I
believe this is a good solution to the problem.
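
A sketch of what a LOAD64-style address could look like, assuming .x carries the low word and .y the high word (that ordering, and all names below, are assumptions; no such opcode exists yet):

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS_SKETCH 40  /* address-space width mentioned in this thread */

/* Combine two 32-bit components into one 64-bit address.
 * x = low word, y = high word (assumed ordering). */
static uint64_t
addr_from_xy(uint32_t x, uint32_t y)
{
   uint64_t addr = ((uint64_t)y << 32) | x;
   /* With a 40-bit address space only the low 8 bits of y are usable. */
   assert((addr >> ADDR_BITS_SKETCH) == 0);
   return addr;
}
```

Since this would only be defined for MEMORY[] operands, a single dual-component address is enough and no 2d/3d coordinates are involved.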


Right, for MEMORY this will work fine. I've no clue yet how images will
work with OpenCL though, hopefully we can avoid the one flat address
space thing there, but I simply don't know yet.


Also it would 

Re: [Nouveau] [PATCH] pci: fix typo in nvkm_pcie_set_link()

2016-02-03 Thread Samuel Pitoiset



On 02/03/2016 10:33 AM, Alexandre Courbot wrote:

Fix a test that would either do nothing on PCI systems, or crash badly
on Tegra.

Signed-off-by: Alexandre Courbot 
Cc: Karol Herbst 
---
  drm/nouveau/nvkm/subdev/pci/pcie.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c 
b/drm/nouveau/nvkm/subdev/pci/pcie.c
index b32954f5311e..d71e5db5028a 100644
--- a/drm/nouveau/nvkm/subdev/pci/pcie.c
+++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
@@ -119,7 +119,7 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum 
nvkm_pcie_speed speed, u8 width)
struct pci_bus *pbus;
int ret;

-   if (pci || !pci_is_pcie(pci->pdev))
+   if (!pci || !pci_is_pcie(pci->pdev))
return 0;
pbus = pci->pdev->bus;


This has already been fixed, but Ben still hasn't applied this change to 
his repository.


http://cgit.freedesktop.org/~airlied/linux/tree/drivers/gpu/drm/nouveau/nvkm/subdev/pci/pcie.c?h=drm-next#n122
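
For anyone wondering why the one-character typo matters, here is a standalone sketch of the two conditions. The struct is a stand-in for struct nvkm_pci and the function names are made up; only the shape of the test is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pci_sketch { bool is_pcie; };  /* stand-in for struct nvkm_pci */

/* Buggy form: a valid pointer makes the first operand true, so the
 * caller always bails out early. A NULL pointer falls through to the
 * dereference on the right-hand side. */
static bool
skip_buggy(const struct pci_sketch *pci)
{
   return pci || !pci->is_pcie;
}

/* Fixed form: NULL short-circuits before the dereference. */
static bool
skip_fixed(const struct pci_sketch *pci)
{
   return !pci || !pci->is_pcie;
}

static void
example(void)
{
   struct pci_sketch pcie = { .is_pcie = true };

   assert(skip_buggy(&pcie));   /* PCIe device wrongly skipped */
   assert(!skip_fixed(&pcie));  /* PCIe device handled */
   assert(skip_fixed(NULL));    /* no PCI device: skip safely */
   /* skip_buggy(NULL) is deliberately not called: it dereferences NULL. */
}
```

That matches the commit message: nothing happens when a PCI device is present, and a missing one (as on Tegra) leads to a NULL dereference.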






--
-Samuel
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 0/2] allow partly reclocking on chipset

2016-01-13 Thread Samuel Pitoiset

Hi!

Did you check on different Fermi chipsets or only with one variant?

Are you sure that engine reclocking works as expected on Fermi? Because 
enabling it without a strong inspection sounds like a prediction and it 
might not work.


On 01/13/2016 01:25 PM, Karol Herbst wrote:

Some chipsets, like Fermi, have working engine reclocking but broken memory
reclocking. For now, we should add the functionality to allow partial
reclocking for those.

Although this doesn't give as much performance as one might wish, it is still
noticeable and may improve performance enough to be noted.

Karol Herbst (2):
   clk: seperate engine and memory reclock toggles
   clk: allow engine reclock on fermi

  drm/nouveau/include/nvkm/subdev/clk.h |  3 ++-
  drm/nouveau/nvkm/subdev/clk/base.c| 21 ++---
  drm/nouveau/nvkm/subdev/clk/gf100.c   |  3 ++-
  drm/nouveau/nvkm/subdev/clk/gk104.c   |  3 ++-
  drm/nouveau/nvkm/subdev/clk/gk20a.c   |  2 +-
  drm/nouveau/nvkm/subdev/clk/gt215.c   |  2 +-
  drm/nouveau/nvkm/subdev/clk/mcp77.c   |  3 ++-
  drm/nouveau/nvkm/subdev/clk/nv04.c|  2 +-
  drm/nouveau/nvkm/subdev/clk/nv40.c|  2 +-
  drm/nouveau/nvkm/subdev/clk/nv50.c|  2 +-
  drm/nouveau/nvkm/subdev/clk/priv.h|  6 --
  11 files changed, 31 insertions(+), 18 deletions(-)
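
A minimal sketch of the idea behind the series, with purely hypothetical names (the actual fields and functions in the patches may look quite different): one toggle per clock domain instead of a single "allow reclock" flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-chipset capabilities; names are illustrative only. */
struct clk_caps_sketch {
   bool allow_engine_reclock;
   bool allow_memory_reclock;
};

enum clk_domain_sketch { DOMAIN_ENGINE, DOMAIN_MEMORY };

static bool
may_reclock(const struct clk_caps_sketch *caps, enum clk_domain_sketch d)
{
   return d == DOMAIN_ENGINE ? caps->allow_engine_reclock
                             : caps->allow_memory_reclock;
}

static void
example(void)
{
   /* Fermi as described above: engine works, memory is broken. */
   struct clk_caps_sketch fermi = { .allow_engine_reclock = true,
                                    .allow_memory_reclock = false };

   assert(may_reclock(&fermi, DOMAIN_ENGINE));
   assert(!may_reclock(&fermi, DOMAIN_MEMORY));
}
```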



--
-Samuel


Re: [Nouveau] [PATCH] debugfs: don't emit parameter names

2016-01-13 Thread Samuel Pitoiset



On 01/13/2016 02:12 PM, Karol Herbst wrote:

fixes a compile error


Yeah, and this is probably not the only error you will hit...
This happens when CONFIG_DEBUG_FS is set.

You need to include "nouveau_drm.h" and to fix nouveau_debugfs.c too.



Signed-off-by: Karol Herbst 
---
  drm/nouveau/nouveau_debugfs.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drm/nouveau/nouveau_debugfs.h b/drm/nouveau/nouveau_debugfs.h
index 52c7161..b8c03ff 100644
--- a/drm/nouveau/nouveau_debugfs.h
+++ b/drm/nouveau/nouveau_debugfs.h
@@ -34,13 +34,13 @@ nouveau_drm_debugfs_cleanup(struct drm_minor *minor)
  }

  static inline int
-nouveau_debugfs_init(struct nouveau_drm *)
+nouveau_debugfs_init(struct nouveau_drm *drm)
  {
return 0;
  }

  static inline void
-nouveau_debugfs_fini(struct nouveau_drm *)
+nouveau_debugfs_fini(struct nouveau_drm *drm)
  {
  }




--
-Samuel


Re: [Nouveau] [PATCH 0/2] allow partly reclocking on chipset

2016-01-13 Thread Samuel Pitoiset



On 01/13/2016 01:49 PM, Karol Herbst wrote:

On 13 January 2016 at 13:43, Samuel Pitoiset <samuel.pitoi...@gmail.com>
wrote:

Hi!

Did you check on different Fermi chipsets or only with one variant?


currently I only checked that on my nvc1, but I thought I could just send the
patches and it is easier for others to try it out this way.


Yeah, you need more feedback on different chips.
What about fixing memory reclocking instead of this series, which is 
going to be somewhat useless once that is done?


I mean, memory reclocking probably offers a bigger performance increase than 
engine reclocking, no?






Are you sure that engine reclocking works as expected on Fermi? Because
enabling it without a strong inspection sounds like a prediction and it
might not work.


It seems to work, because I got a huge performance increase in
gputest_pixmark_piano, check the second commit for details ;)


I did. Going from 5.1 to 6.4 in heaven is a good start, but definitely not a 
huge performance increase in my humble opinion. :-)
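
To put a number on it, the relative gain between the two figures quoted above can be computed as follows (the helper name is made up):

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent. */
static double
percent_gain(double before, double after)
{
   assert(before > 0.0);
   return (after - before) / before * 100.0;
}
```

percent_gain(5.1, 6.4) comes out at roughly 25%.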






On 01/13/2016 01:25 PM, Karol Herbst wrote:

Some chipsets, like Fermi, have working engine reclocking but broken memory
reclocking. For now, we should add the functionality to allow partial
reclocking for those.

Although this doesn't give as much performance as one might wish, it is still
noticeable and may improve performance enough to be noted.

Karol Herbst (2):
clk: seperate engine and memory reclock toggles
clk: allow engine reclock on fermi

drm/nouveau/include/nvkm/subdev/clk.h | 3 ++-
drm/nouveau/nvkm/subdev/clk/base.c | 21 ++---
drm/nouveau/nvkm/subdev/clk/gf100.c | 3 ++-
drm/nouveau/nvkm/subdev/clk/gk104.c | 3 ++-
drm/nouveau/nvkm/subdev/clk/gk20a.c | 2 +-
drm/nouveau/nvkm/subdev/clk/gt215.c | 2 +-
drm/nouveau/nvkm/subdev/clk/mcp77.c | 3 ++-
drm/nouveau/nvkm/subdev/clk/nv04.c | 2 +-
drm/nouveau/nvkm/subdev/clk/nv40.c | 2 +-
drm/nouveau/nvkm/subdev/clk/nv50.c | 2 +-
drm/nouveau/nvkm/subdev/clk/priv.h | 6 --
11 files changed, 31 insertions(+), 18 deletions(-)



--
-Samuel


--
-Samuel


Re: [Nouveau] [PATCH] debugfs: don't emit parameter names

2016-01-13 Thread Samuel Pitoiset



On 01/13/2016 02:33 PM, Samuel Pitoiset wrote:



On 01/13/2016 02:12 PM, Karol Herbst wrote:

fixes a compile error


Yeah, and this is probably not the only error you will hit...
This happens when CONFIG_DEBUG_FS is set.

You need to include "nouveau_drm.h" and to fix nouveau_debugfs.c too.


Okay, I read the patch too quickly. nouveau_debugfs.c is not compiled 
when CONFIG_DEBUG_FS is disabled, so this is enough to fix the 
compilation error.
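
For reference, the pattern at play is the usual "real prototype or inline stub" split. The sketch below uses stand-in names (CONFIG_DEBUG_FS_SKETCH, nouveau_drm_sketch, debugfs_init_sketch) rather than the real kernel ones.

```c
#include <assert.h>

struct nouveau_drm_sketch { int unused; };  /* stand-in for struct nouveau_drm */

/* CONFIG_DEBUG_FS_SKETCH stands in for the kernel's CONFIG_DEBUG_FS. */
#ifdef CONFIG_DEBUG_FS_SKETCH
/* Real implementation would live in a separately compiled .c file. */
int debugfs_init_sketch(struct nouveau_drm_sketch *drm);
#else
/* A function definition (unlike a prototype) needs a parameter name
 * in C before C23, hence "*drm" rather than a bare "*". */
static inline int
debugfs_init_sketch(struct nouveau_drm_sketch *drm)
{
   (void)drm;  /* unused in the stub */
   return 0;
}
#endif

static void
example(void)
{
   struct nouveau_drm_sketch drm = { 0 };

   assert(debugfs_init_sketch(&drm) == 0);
}
```

With the option disabled only the stub is seen by the compiler, which is why fixing the header alone is sufficient.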


However, I think your commit message needs to be updated.
What about s/emit/omit/? :-)

With that commit message fixed, this patch is:

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>





Signed-off-by: Karol Herbst <nouv...@karolherbst.de>
---
  drm/nouveau/nouveau_debugfs.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drm/nouveau/nouveau_debugfs.h
b/drm/nouveau/nouveau_debugfs.h
index 52c7161..b8c03ff 100644
--- a/drm/nouveau/nouveau_debugfs.h
+++ b/drm/nouveau/nouveau_debugfs.h
@@ -34,13 +34,13 @@ nouveau_drm_debugfs_cleanup(struct drm_minor *minor)
  }

  static inline int
-nouveau_debugfs_init(struct nouveau_drm *)
+nouveau_debugfs_init(struct nouveau_drm *drm)
  {
  return 0;
  }

  static inline void
-nouveau_debugfs_fini(struct nouveau_drm *)
+nouveau_debugfs_fini(struct nouveau_drm *drm)
  {
  }






--
-Samuel


Re: [Nouveau] [mesa v3 8/9] nvc0: remove use of deprecated sw class identifier

2015-12-18 Thread Samuel Pitoiset



On 12/18/2015 11:19 AM, Emil Velikov wrote:

The commit summary "remove use of deprecated..." is no longer
applicable. Feel free to tweak (use nvif provided class name/define ?)
before pushing.


Well, the commit summary is fine by me because the old sw class 
identifier is actually deprecated with that new interface and won't work 
if it's not updated accordingly.


But, feel free to change it. :-)



-Emil


Re: [Nouveau] [libdrm v3 01/14] nouveau: import and install a selection of nvif headers from the kernel

2015-12-18 Thread Samuel Pitoiset

Hi Ben,

I don't feel comfortable enough with the libdrm nouveau code to give you 
my Rb for this series, but as this seems to work as expected, this series is:


Tested-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

Thanks for your work!

On 12/17/2015 12:20 AM, Ben Skeggs wrote:

From: Ben Skeggs <bske...@redhat.com>

This commit also modifies the install path of the main libdrm_nouveau
header to be under a nouveau/ subdirectory.

Signed-off-by: Ben Skeggs <bske...@redhat.com>
---
  include/drm/nouveau_drm.h|   1 +
  nouveau/Makefile.am  |  11 +++-
  nouveau/libdrm_nouveau.pc.in |   2 +-
  nouveau/nvif/cl0080.h|  45 ++
  nouveau/nvif/cl9097.h|  44 ++
  nouveau/nvif/class.h | 141 +++
  nouveau/nvif/if0002.h|  38 
  nouveau/nvif/if0003.h|  33 ++
  nouveau/nvif/ioctl.h | 132 
  nouveau/nvif/unpack.h|  28 +
  10 files changed, 473 insertions(+), 2 deletions(-)
  create mode 100644 nouveau/nvif/cl0080.h
  create mode 100644 nouveau/nvif/cl9097.h
  create mode 100644 nouveau/nvif/class.h
  create mode 100644 nouveau/nvif/if0002.h
  create mode 100644 nouveau/nvif/if0003.h
  create mode 100644 nouveau/nvif/ioctl.h
  create mode 100644 nouveau/nvif/unpack.h

diff --git a/include/drm/nouveau_drm.h b/include/drm/nouveau_drm.h
index 87aefc5..e418f9f 100644
--- a/include/drm/nouveau_drm.h
+++ b/include/drm/nouveau_drm.h
@@ -200,6 +200,7 @@ struct drm_nouveau_sarea {
  #define DRM_NOUVEAU_GROBJ_ALLOC0x04
  #define DRM_NOUVEAU_NOTIFIEROBJ_ALLOC  0x05
  #define DRM_NOUVEAU_GPUOBJ_FREE0x06
+#define DRM_NOUVEAU_NVIF   0x07
  #define DRM_NOUVEAU_GEM_NEW0x40
  #define DRM_NOUVEAU_GEM_PUSHBUF0x41
  #define DRM_NOUVEAU_GEM_CPU_PREP   0x42
diff --git a/nouveau/Makefile.am b/nouveau/Makefile.am
index 25ea6dc..76cdeca 100644
--- a/nouveau/Makefile.am
+++ b/nouveau/Makefile.am
@@ -14,9 +14,18 @@ libdrm_nouveau_la_LIBADD = ../libdrm.la @PTHREADSTUBS_LIBS@

  libdrm_nouveau_la_SOURCES = $(LIBDRM_NOUVEAU_FILES)

-libdrm_nouveauincludedir = ${includedir}/libdrm
+libdrm_nouveauincludedir = ${includedir}/libdrm/nouveau
  libdrm_nouveauinclude_HEADERS = $(LIBDRM_NOUVEAU_H_FILES)

+libdrm_nouveaunvifincludedir = ${includedir}/libdrm/nouveau/nvif
+libdrm_nouveaunvifinclude_HEADERS = nvif/class.h \
+   nvif/cl0080.h \
+   nvif/cl9097.h \
+   nvif/if0002.h \
+   nvif/if0003.h \
+   nvif/ioctl.h \
+   nvif/unpack.h
+
  pkgconfigdir = @pkgconfigdir@
  pkgconfig_DATA = libdrm_nouveau.pc

diff --git a/nouveau/libdrm_nouveau.pc.in b/nouveau/libdrm_nouveau.pc.in
index 9abfd81..7d0622e 100644
--- a/nouveau/libdrm_nouveau.pc.in
+++ b/nouveau/libdrm_nouveau.pc.in
@@ -7,5 +7,5 @@ Name: libdrm_nouveau
  Description: Userspace interface to nouveau kernel DRM services
  Version: @PACKAGE_VERSION@
  Libs: -L${libdir} -ldrm_nouveau
-Cflags: -I${includedir} -I${includedir}/libdrm
+Cflags: -I${includedir} -I${includedir}/libdrm -I${includedir}/libdrm/nouveau
  Requires.private: libdrm
diff --git a/nouveau/nvif/cl0080.h b/nouveau/nvif/cl0080.h
new file mode 100644
index 000..331620a
--- /dev/null
+++ b/nouveau/nvif/cl0080.h
@@ -0,0 +1,45 @@
+#ifndef __NVIF_CL0080_H__
+#define __NVIF_CL0080_H__
+
+struct nv_device_v0 {
+   __u8  version;
+   __u8  pad01[7];
+   __u64 device;   /* device identifier, ~0 for client default */
+};
+
+#define NV_DEVICE_V0_INFO  0x00
+#define NV_DEVICE_V0_TIME  0x01
+
+struct nv_device_info_v0 {
+   __u8  version;
+#define NV_DEVICE_INFO_V0_IGP  0x00
+#define NV_DEVICE_INFO_V0_PCI  0x01
+#define NV_DEVICE_INFO_V0_AGP  0x02
+#define NV_DEVICE_INFO_V0_PCIE 0x03
+#define NV_DEVICE_INFO_V0_SOC  0x04
+   __u8  platform;
+   __u16 chipset;  /* from NV_PMC_BOOT_0 */
+   __u8  revision; /* from NV_PMC_BOOT_0 */
+#define NV_DEVICE_INFO_V0_TNT  0x01
+#define NV_DEVICE_INFO_V0_CELSIUS  0x02
+#define NV_DEVICE_INFO_V0_KELVIN   0x03
+#define NV_DEVICE_INFO_V0_RANKINE  0x04
+#define NV_DEVICE_INFO_V0_CURIE0x05
+#define NV_DEVICE_INFO_V0_TESLA0x06
+#define NV_DE

Re: [Nouveau] [mesa v3 1/9] nouveau: bump required libdrm version to 2.4.66

2015-12-17 Thread Samuel Pitoiset

This series is:

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 12/17/2015 12:21 AM, Ben Skeggs wrote:

From: Ben Skeggs <bske...@redhat.com>

v2. forgot bump for non-gallium driver

Signed-off-by: Ben Skeggs <bske...@redhat.com>
---
  configure.ac | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index b6680d0..965c6f7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -72,8 +72,8 @@ LIBDRM_REQUIRED=2.4.60
  LIBDRM_RADEON_REQUIRED=2.4.56
  LIBDRM_AMDGPU_REQUIRED=2.4.63
  LIBDRM_INTEL_REQUIRED=2.4.61
-LIBDRM_NVVIEUX_REQUIRED=2.4.33
-LIBDRM_NOUVEAU_REQUIRED=2.4.62
+LIBDRM_NVVIEUX_REQUIRED=2.4.66
+LIBDRM_NOUVEAU_REQUIRED=2.4.66
  LIBDRM_FREEDRENO_REQUIRED=2.4.65
  DRI2PROTO_REQUIRED=2.6
  DRI3PROTO_REQUIRED=1.0



--
-Samuel


Re: [Nouveau] [mesa v2 8/9] nvc0: remove allocation of unused sw class

2015-12-08 Thread Samuel Pitoiset

NACK.

This patch breaks MP performance counters on Fermi/Kepler because they 
actually use software methods to configure multiplexers. Global perf 
counters will also use software methods to init, sample and read 
hardware counters, so this SW object is definitely needed.


Instead of removing it, we need to do something like that:

http://paste.awesom.eu/EQeX

Thanks.

On 11/27/2015 02:05 AM, Ben Skeggs wrote:

From: Ben Skeggs 

This would need to be fixed before NVIF can be switched on, but since we
don't use it anyway, just remove it.

Signed-off-by: Ben Skeggs 
---
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 8 
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 1 -
  2 files changed, 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 4897ebe..11cb74a 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -447,7 +447,6 @@ nvc0_screen_destroy(struct pipe_screen *pscreen)
 nouveau_object_del(&screen->eng2d);
 nouveau_object_del(&screen->m2mf);
 nouveau_object_del(&screen->compute);
-   nouveau_object_del(&screen->nvsw);

 nouveau_screen_fini(&screen->base);

@@ -698,13 +697,6 @@ nvc0_screen_create(struct nouveau_device *dev)
 screen->base.fence.update = nvc0_screen_fence_update;


-   ret = nouveau_object_new(chan,
-(dev->chipset < 0xe0) ? 0x1f906e : 0x906e, 0x906e,
-NULL, 0, &screen->nvsw);
-   if (ret)
-  FAIL_SCREEN_INIT("Error creating SW object: %d\n", ret);
-
-
 switch (dev->chipset & ~0xf) {
 case 0x110:
 case 0x100:
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
index 8b73102..caf34aa 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
@@ -105,7 +105,6 @@ struct nvc0_screen {
 struct nouveau_object *eng2d;
 struct nouveau_object *m2mf;
 struct nouveau_object *compute;
-   struct nouveau_object *nvsw;
  };

  static inline struct nvc0_screen *



--
-Samuel


Re: [Nouveau] NV50 compute support questions

2015-12-07 Thread Samuel Pitoiset



On 12/07/2015 04:10 PM, Hans de Goede wrote:

Hi



Hi,


On 04-12-15 09:45, Hans de Goede wrote:


I've ordered a GTX740 (GK107) card, which should arrive soon, and
I'll be using that so I can (hopefully) focus on the llvm tgsi bits
again.


So the card arrived today and I've plugged it in; tests/trivial/compute
looks much better with this. But there does seem to be one issue
(other than the atomic bits not working):

- test_resource_indirect


Exactly, two or three tests don't work on Kepler < GK110.
It's on my todolist, but with a low priority. :-)

Thanks for reporting this anyway.


(1, 0)[0]: got 0x2/0.00, expected 0x3/0.00
(3, 0)[0]: got 0x6/0.00, expected 0x7/0.00
(5, 0)[0]: got 0xa/0.00, expected 0xb/0.00
(7, 0)[0]: got 0xe/0.00, expected 0xf/0.00
(9, 0)[0]: got 0x12/0.00, expected 0x13/0.00
(11, 0)[0]: got 0x16/0.00, expected 0x17/0.00
(13, 0)[0]: got 0x1a/0.00, expected 0x1b/0.00
(15, 0)[0]: got 0x1e/0.00, expected 0x1f/0.00
(17, 0)[0]: got 0x22/0.00, expected 0x23/0.00
(19, 0)[0]: got 0x26/0.00, expected 0x27/0.00
(21, 0)[0]: got 0x2a/0.00, expected 0x2b/0.00
(23, 0)[0]: got 0x2e/0.00, expected 0x2f/0.00
(25, 0)[0]: got 0x32/0.00, expected 0x33/0.00
(27, 0)[0]: got 0x36/0.00, expected 0x37/0.00
(29, 0)[0]: got 0x3a/0.00, expected 0x3b/0.00
(31, 0)[0]: got 0x3e/0.00, expected 0x3f/0.00
(33, 0)[0]: got 0x42/0.00, expected 0x43/0.00
(35, 0)[0]: got 0x46/0.00, expected 0x47/0.00
(37, 0)[0]: got 0x4a/0.00, expected 0x4b/0.00
(39, 0)[0]: got 0x4e/0.00, expected 0x4f/0.00
(64, 1): FAIL (32)

Regards,

Hans


--
-Samuel


Re: [Nouveau] NV50 compute support questions

2015-12-04 Thread Samuel Pitoiset



On 12/04/2015 10:12 AM, Hans de Goede wrote:

Hi,

On 04-12-15 09:54, Samuel Pitoiset wrote:



On 12/04/2015 09:45 AM, Hans de Goede wrote:





Please give a shot at this branch :
http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nvf0_compute

It fixes the initialization of the compute state and allows me to
launch 'test_input_global' (ie. ./compute 8) on my GK208 without
any dmesg fails. That's a good start but more patches are coming. :-)


This branch indeed works somewhat better, but things still hang on the
test_system_values compute test for me (this is the first test executed;
I did not try the others). So this seems to need more work.

What about test_input_global? test_system_values doesn't work on my
side but it doesn't hang the GPU.


Yes that one works.


Could you please provide dmesg log?


[2.786631] nouveau :01:00.0: NVIDIA GK208B (b06070b1)
[2.914291] nouveau :01:00.0: bios: version 80.28.79.00.0b
[2.937909] nouveau :01:00.0: priv: HUB0: 086014  (1f70820c)
[2.937953] nouveau :01:00.0: fb: 1024 MiB DDR3
[3.623202] [TTM] Zone  kernel: Available graphics memory: 2010556 kiB
[3.623205] [TTM] Initializing pool allocator
[3.623241] [TTM] Initializing DMA pool allocator
[3.623440] nouveau :01:00.0: DRM: VRAM: 1024 MiB
[3.623442] nouveau :01:00.0: DRM: GART: 1048576 MiB
[3.623447] nouveau :01:00.0: DRM: TMDS table version 2.0
[3.623449] nouveau :01:00.0: DRM: DCB version 4.0
[3.623451] nouveau :01:00.0: DRM: DCB outp 00: 01000f02 00020030
[3.623454] nouveau :01:00.0: DRM: DCB outp 01: 02011f62 00020010
[3.623456] nouveau :01:00.0: DRM: DCB outp 02: 02022f10 
[3.623458] nouveau :01:00.0: DRM: DCB conn 00: 1031
[3.623460] nouveau :01:00.0: DRM: DCB conn 01: 2161
[3.623462] nouveau :01:00.0: DRM: DCB conn 02: 0200
[3.627283] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[3.627285] [drm] Driver supports precise vblank timestamp query.
[3.671871] nouveau :01:00.0: DRM: MM: using COPY for buffer copies
[3.889940] nouveau :01:00.0: DRM: allocated 1920x1080 fb:
0x6, bo 88011905
[3.890952] fbcon: nouveaufb (fb0) is primary device
[4.132343] Console: switching to colour frame buffer device 240x67
[4.134930] nouveau :01:00.0: fb0: nouveaufb frame buffer device
[4.141094] [drm] Initialized nouveau 1.3.1 20120801 for :01:00.0
on minor 0



[ 1713.421460] nouveau :01:00.0: gr: TRAP ch 6 [003fa32000
compute[21117]]
[ 1713.421471] nouveau :01:00.0: gr: GPC0/TPC1/MP trap: global
 [] warp 3000e [MEM_OUT_OF_BOUNDS]
[ 1713.441248] nouveau :01:00.0: gr: TRAP ch 6 [003fa32000
compute[21117]]
[ 1713.441260] nouveau :01:00.0: gr: GPC0/TPC0/MP trap: global
0004 [MULTIPLE_WARP_ERRORS] warp 20005 [MISALIGNED_PC]
[ 1713.441265] nouveau :01:00.0: gr: GPC0/TPC1/MP trap: global
0004 [MULTIPLE_WARP_ERRORS] warp 20005 [MISALIGNED_PC]
[ 1717.773839] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1717.773848] nouveau :01:00.0: fifo: sw engine fault on channel 2,
recovering...
[ 1719.776529] nouveau :01:00.0: fifo: runlist 0 update timeout
[ 1722.068923] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1726.363660] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1730.658395] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1734.951720] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1739.241861] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1743.532005] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1747.826728] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1752.121462] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1756.416200] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1760.710930] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1765.005663] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1769.300396] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1773.595135] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1777.889863] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1782.184598] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1786.479328] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1789.730020] nouveau :01:00.0: compute[21117]: failed to idle
channel 6 [compute[21117]]
[ 1790.774060] nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1791.729963] nouveau :01:00.0: timeout at
drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogk104.c:47/gk104_fifo_gpfifo_kick()!

[ 1791.729966] nouveau :01:00.0: fifo: channel 6 [compute[21117]]
kick timeout
[ 1791.729973] nouveau: compute[21117]::a06f: detach gr
failed, -16
[ 1791.731401] nouveau :01:00.0: fifo: SCHED_ERROR 0d []
[ 1793.731275] nouveau :01

Re: [Nouveau] NV50 compute support questions

2015-12-04 Thread Samuel Pitoiset



On 12/04/2015 09:45 AM, Hans de Goede wrote:

Hi,

On 02-12-15 19:33, Samuel Pitoiset wrote:



On 12/02/2015 04:34 PM, Hans de Goede wrote:

On 01-12-15, Samuel Pitoiset wrote:

 >>> Ok, here is a MMT trace of vectorAdd:
 >>>
 >>> https://fedorapeople.org/~jwrdegoede/vectorAdd.log.gz
 >>
 >> Hi Hans,
 >>
 >> Thanks a lot.
 >
 > Well, I didn't know but Martin has a GK208...
 > I just tested the compute support on his card and ... it works without
 > any changes. :-)
 >
 > I'm sorry, I was sure the compute support didn't work on this chipset.

No need to be sorry because, ...

 > Feel free to test on your GK208 and report back if you have problems.

I've done that, and for me it does not work, if I try to enable compute
support like this:

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 461fcaa..ab4ea85 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -187,7 +187,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen,
enum pipe_cap param)
 case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
return (class_3d >= NVE4_3D_CLASS) ? 1 : 0;
 case PIPE_CAP_COMPUTE:
-  return (class_3d <= NVE4_3D_CLASS) ? 1 : 0;
+  return 1;
 case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
return nouveau_screen(pscreen)->vram_domain & NOUVEAU_BO_VRAM ?
1 : 0;

@@ -246,8 +246,6 @@ nvc0_screen_get_shader_param(struct pipe_screen
*pscreen, unsigned shader,
   return 0;
break;
 case PIPE_SHADER_COMPUTE:
-  if (class_3d > NVE4_3D_CLASS)
- return 0;
break;
 default:
return 0;
@@ -574,11 +572,10 @@ nvc0_screen_init_compute(struct nvc0_screen
*screen)
 case 0xd0:
return nvc0_screen_compute_setup(screen, screen->base.pushbuf);
 case 0xe0:
-  return nve4_screen_compute_setup(screen, screen->base.pushbuf);
 case 0xf0:
 case 0x100:
 case 0x110:
-  return 0;
+  return nve4_screen_compute_setup(screen, screen->base.pushbuf);
 default:
return -1;
 }
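
To spell out what the modified switch does per chipset, here is a standalone sketch. It assumes the switch keys on chipset & ~0xf, as elsewhere in nvc0_screen.c, and that a 0xc0 case precedes 0xd0 (not visible in the quoted hunk); the enum and function names are made up.

```c
#include <assert.h>

enum compute_setup_sketch { SETUP_NONE = -1, SETUP_NVC0, SETUP_NVE4 };

/* Mirrors the switch in the modified nvc0_screen_init_compute() above.
 * The 0xc0 case is an assumption. */
static enum compute_setup_sketch
pick_compute_setup(unsigned chipset)
{
   switch (chipset & ~0xfu) {   /* drop the low nibble to get the family */
   case 0xc0:
   case 0xd0:
      return SETUP_NVC0;
   case 0xe0:
   case 0xf0:
   case 0x100:
   case 0x110:
      return SETUP_NVE4;
   default:
      return SETUP_NONE;
   }
}

static void
example(void)
{
   assert(pick_compute_setup(0xe7) == SETUP_NVE4);   /* GK107 */
   assert(pick_compute_setup(0x108) == SETUP_NVE4);  /* GK208 (NV108) */
   assert(pick_compute_setup(0x106) == SETUP_NVE4);  /* GK208b (NV106) */
}
```

So with this change GK208 and GK208b both fall into the 0x100 family and go through the same nve4 setup path as GK107.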

Then as soon as I do startx (which starts gnome-shell) the machine
freezes. This is with mesa-master with the above changes on top.

X / gnome-shell will happily work if I do not call
nve4_screen_compute_setup()
but then test/trivial/compute fails with a null-ptr exception.

Do you perhaps have some extra patches in your tree, or am I just
unlucky ?

I've tested this on both a 4.2 and a 4.4-rc3 kernel.


Hi,

My bad... I used the wrong card on reator (which is the REing machine
of Martin). The primary card is a GK106 and the second one is the
GK208. That doesn't explain why I did something wrong but heh? :-)

You are right. With those bits added locally, the compute support
totally hangs the GPU on my GK208 (NV108), and a reboot is needed.

Please give a shot at this branch :
http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nvf0_compute

It fixes the initialization of the compute state and allows me to
launch 'test_input_global' (ie. ./compute 8) on my GK208 without
any dmesg fails. That's a good start but more patches are coming. :-)


This branch indeed works somewhat better, but things still hang on the
test_system_values compute test for me (this is the first test executed;
I did not try the others). So this seems to need more work.


What about test_input_global? test_system_values doesn't work on my side 
but it doesn't hang the GPU. Could you please provide dmesg log?




I've ordered a GTX740 (GK107) card, which should arrive soon, and
I'll be using that so I can (hopefully) focus on the llvm tgsi bits
again.


Yeah, GK107 will do the job. :-)




Btw, according to the trace you sent me, you have a GK208b (NV106).


Right, sorry I thought the differences between GK208 and GK208b would
not matter.


I don't know exactly the differences between these two chipsets but 
since test_system_values hangs your GPU and not mine, I think there are some.




Thanks for all the input / help!

Regards,

Hans




--
-Samuel


Re: [Nouveau] NV50 compute support questions

2015-12-02 Thread Samuel Pitoiset



On 12/02/2015 04:34 PM, Hans de Goede wrote:

On 01-12-15, Samuel Pitoiset wrote:

 >>> Ok, here is a MMT trace of vectorAdd:
 >>>
 >>> https://fedorapeople.org/~jwrdegoede/vectorAdd.log.gz
 >>
 >> Hi Hans,
 >>
 >> Thanks a lot.
 >
 > Well, I didn't know but Martin has a GK208...
 > I just tested the compute support on his card and ... it works without
 > any changes. :-)
 >
 > I'm sorry, I was sure the compute support didn't work on this chipset.

No need to be sorry because, ...

 > Feel free to test on your GK208 and report back if you have problems.

I've done that, and for me it does not work, if I try to enable compute
support like this:

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 461fcaa..ab4ea85 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -187,7 +187,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen,
enum pipe_cap param)
 case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
return (class_3d >= NVE4_3D_CLASS) ? 1 : 0;
 case PIPE_CAP_COMPUTE:
-  return (class_3d <= NVE4_3D_CLASS) ? 1 : 0;
+  return 1;
 case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
return nouveau_screen(pscreen)->vram_domain & NOUVEAU_BO_VRAM ?
1 : 0;

@@ -246,8 +246,6 @@ nvc0_screen_get_shader_param(struct pipe_screen
*pscreen, unsigned shader,
   return 0;
break;
 case PIPE_SHADER_COMPUTE:
-  if (class_3d > NVE4_3D_CLASS)
- return 0;
break;
 default:
return 0;
@@ -574,11 +572,10 @@ nvc0_screen_init_compute(struct nvc0_screen *screen)
 case 0xd0:
return nvc0_screen_compute_setup(screen, screen->base.pushbuf);
 case 0xe0:
-  return nve4_screen_compute_setup(screen, screen->base.pushbuf);
 case 0xf0:
 case 0x100:
 case 0x110:
-  return 0;
+  return nve4_screen_compute_setup(screen, screen->base.pushbuf);
 default:
return -1;
 }

Then as soon as I do startx (which starts gnome-shell) the machine
freezes. This is with mesa-master with the above changes on top.

X / gnome-shell will happily work if I do not call
nve4_screen_compute_setup()
but then test/trivial/compute fails with a null-ptr exception.

Do you perhaps have some extra patches in your tree, or am I just unlucky ?

I've tested this on both a 4.2 and a 4.4-rc3 kernel.


Hi,

My bad... I used the wrong card on reator (which is the REing machine of 
Martin). The primary card is a GK106 and the second one is the GK208. 
That doesn't explain why I did something wrong but heh? :-)


You are right. With those bits added locally, the compute support 
totally hangs the GPU on my GK208 (NV108), and a reboot is needed.


Please give a shot at this branch :
http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nvf0_compute

It fixes the initialization of the compute state and allows me to
launch 'test_input_global' (ie. ./compute 8) on my GK208 without
any dmesg fails. That's a good start but more patches are coming. :-)

Btw, according to the trace you sent me, you have a GK208b (NV106).

Thanks!



Regards,

Hans

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] NV50 compute support questions

2015-12-01 Thread Samuel Pitoiset



On 11/30/2015 04:13 PM, Samuel Pitoiset wrote:



On 11/30/2015 02:27 PM, Hans de Goede wrote:

Hi,

On 26-11-15 13:52, Samuel Pitoiset wrote:




I do not have a GK106, I've a GK208, and IIRC that one is known to not
work,
I guess I can give it a try.


Compute is not supported on GK110+, yeah...

If you provide me a MMT trace of, for example, vectorAdd from the CUDA
samples I could have a look.


Ok, here is a MMT trace of vectorAdd:

https://fedorapeople.org/~jwrdegoede/vectorAdd.log.gz


Hi Hans,

Thanks a lot.


Well, I didn't know but Martin has a GK208...
I just tested the compute support on his card and ... it works without 
any changes. :-)


I'm sorry, I was sure the compute support didn't work on this chipset.

Feel free to test on your GK208 and report back if you have problems.

Thanks.





Regards,

Hans




--
-Samuel


Re: [Nouveau] NV50 compute support questions

2015-11-30 Thread Samuel Pitoiset



On 11/30/2015 02:27 PM, Hans de Goede wrote:

Hi,

On 26-11-15 13:52, Samuel Pitoiset wrote:




I do not have a GK106, I've a GK208, and IIRC that one is known to not
work,
I guess I can give it a try.


Compute is not supported on GK110+, yeah...

If you provide me a MMT trace of, for example, vectorAdd from the CUDA
samples I could have a look.


Ok, here is a MMT trace of vectorAdd:

https://fedorapeople.org/~jwrdegoede/vectorAdd.log.gz


Hi Hans,

Thanks a lot.



Regards,

Hans


--
-Samuel


Re: [Nouveau] NV50 compute support questions

2015-11-26 Thread Samuel Pitoiset
Well, if you remove that assert locally, all compute tests in 
src/gallium/tests/trivial/compute.c pass on GK106, except the atomic 
ones. I'm working on the fermi case btw.


On 11/25/2015 03:43 PM, Hans de Goede wrote:

Hi,

On 20-11-15 17:07, Samuel Pitoiset wrote:



On 11/20/2015 11:36 AM, Hans de Goede wrote:

Hi Samuel, et al,


Hi Hans,



In
http://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/drivers/nouveau?id=ff72440b40211326eda118232fabd53965410afd


you write: "This compute support has been tested by
Pierre Moreau and myself with some compute kernels."

Can you provide testing instructions (and the
necessary files) so that I can try to reproduce
your tests ?

And once I've reproduced your tests, the next
question is where / how did you get the compute
kernels for testing. I guess you manually wrote them ?


Yeah, I wrote those compute kernels directly in assembly by hand.

As I already said a few days ago, you have some examples in
src/gallium/tests/trivial/compute.c which show how to use that compute
support stuff with TGSI kernels and without clover. Because clover is
not currently able to do OpenCL -> TGSI using Clang/LLVM, you can't
really use your backend directly.


Ok, so I've been trying to get Francisco's nbody.c to run,
but that does not work. It runs but the planet bodies all stay in
the same place. I still need to debug this further; any hints
on how to debug this are appreciated.

So I tried to run src/gallium/tests/trivial/compute.c,
with a recent mesa master, but that does not work either.

I get the following when I try to run this:

compute: nvc0/nvc0_resource.c:41: nvc0_surface_create: Assertion
`pres->target != PIPE_BUFFER' failed.

(gdb) bt
#0  0x76cbca98 in raise () from /lib64/libc.so.6
#1  0x76cbe69a in abort () from /lib64/libc.so.6
#2  0x76cb5227 in __assert_fail_base () from /lib64/libc.so.6
#3  0x76cb52d2 in __assert_fail () from /lib64/libc.so.6
#4  0x74e5b24c in nvc0_surface_create (pipe=,
 pres=, templ=) at
nvc0/nvc0_resource.c:41
#5  0x00404341 in init_compute_resources (ctx=ctx@entry=0x691010,
 slots=slots@entry=0x7fffd980) at compute.c:347
#6  0x00402cec in test_system_values (ctx=0x691010) at
compute.c:494
#7  main (argc=, argv=) at compute.c:1584

Which comes from the assert here:

static struct pipe_surface *
nvc0_surface_create(struct pipe_context *pipe,
 struct pipe_resource *pres,
 const struct pipe_surface *templ)
{
/* surfaces are assumed to be miptrees all over the place. */
assert(pres->target != PIPE_BUFFER);
if (unlikely(pres->target == PIPE_BUFFER))
   return nv50_surface_from_buffer(pipe, pres, templ);
return nvc0_miptree_surface_new(pipe, pres, templ);
}
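
As a minimal sketch of why dropping the assert changes anything (using made-up stand-in types, not the real gallium structures): the function already has a fallback for buffers, but in a build with assertions enabled the assert aborts before that fallback can run. The names `fake_resource`, `fake_target`, `surface_kind` and `surface_create` below are hypothetical and only illustrate the control flow once the assert is gone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the pipe_resource targets; not the gallium types. */
enum fake_target { FAKE_BUFFER, FAKE_TEXTURE_2D };

/* Which code path produced the surface. */
enum surface_kind { SURFACE_FROM_BUFFER, SURFACE_FROM_MIPTREE };

struct fake_resource {
   enum fake_target target;
};

/* Same shape as nvc0_surface_create() with the target assert removed.
 * The original asserts target != BUFFER right here, so a debug build
 * aborts before the buffer branch below is ever reached. */
static enum surface_kind
surface_create(const struct fake_resource *res)
{
   assert(res != NULL);
   if (res->target == FAKE_BUFFER)
      return SURFACE_FROM_BUFFER;   /* stands in for nv50_surface_from_buffer() */
   return SURFACE_FROM_MIPTREE;     /* stands in for nvc0_miptree_surface_new() */
}
```

With the assert gone, a buffer resource takes the buffer path instead of aborting, which matches the test getting further before hitting the SUSTx errors.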

Just dropping that assert helps somewhat, it leads to:

PIPE_COMPUTE_CAP_GRID_DIMENSION: { 3 }
PIPE_COMPUTE_CAP_MAX_GRID_SIZE: { 65535 65535 65535 }
PIPE_COMPUTE_CAP_MAX_BLOCK_SIZE: { 1024 1024 64 }
- test_system_values
ERROR: SUSTx not yet supported on < nve4
ERROR: SUSTx not yet supported on < nve4
ERROR: SUSTx not yet supported on < nve4
ERROR: SUSTx not yet supported on < nve4
(0, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(1, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(2, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(3, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(4, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x4/0.00
(5, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x3/0.00
(6, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x5/0.00
(7, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x1/0.00
(8, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x5/0.00
(9, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x4/0.00
(10, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x1/0.00
(11, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x1/0.00
(12, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(13, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(14, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(15, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(16, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(17, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(18, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(19, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(19200, 1): FAIL (19200)
- test_resource_access
ERROR: SULDB not yet supported on < nve4
ERROR: SUSTx not yet supported on < nve4
(0, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected
0x410

Re: [Nouveau] NV50 compute support questions

2015-11-26 Thread Samuel Pitoiset



On 11/26/2015 01:21 PM, Hans de Goede wrote:

Hi,

On 26-11-15 09:42, Samuel Pitoiset wrote:

Well, if you remove that assert locally, all compute tests in
src/gallium/tests/trivial/compute.c pass on GK106, except the atomic
ones.


Do you mean the:

 Assertion `pres->target != PIPE_BUFFER' failed.

or the:

 Assertion `tex->defExists(0) && tex->srcExists(0)' failed.

assert? Or is the first one not present for Kepler?


The first one. The second one doesn't happen on Kepler.



I do not have a GK106, I've a GK208, and IIRC that one is known to not
work,
I guess I can give it a try.


Compute is not supported on GK110+, yeah...

If you provide me a MMT trace of, for example, vectorAdd from the CUDA 
samples I could have a look.





I'm working on the fermi case btw.


Great, thanks.

Regards,

Hans




On 11/25/2015 03:43 PM, Hans de Goede wrote:

Hi,

On 20-11-15 17:07, Samuel Pitoiset wrote:



On 11/20/2015 11:36 AM, Hans de Goede wrote:

Hi Samuel, et al,


Hi Hans,



In
http://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/drivers/nouveau?id=ff72440b40211326eda118232fabd53965410afd



you write: "This compute support has been tested by
Pierre Moreau and myself with some compute kernels."

Can you provide testing instructions (and the
necessary files) so that I can try to reproduce
your tests ?

And once I've reproduced your tests, the next
question is where / how did you get the compute
kernels for testing. I guess you manually wrote them ?


Yeah, I wrote those compute kernels directly in assembly by hand.

As I already said a few days ago, you have some examples in
src/gallium/tests/trivial/compute.c which show how to use that compute
support stuff with TGSI kernels and without clover. Because clover is
not currently able to do OpenCL -> TGSI using Clang/LLVM, you can't
really use your backend directly.


Ok, so I've been trying to get Francisco's nbody.c to run,
but that does not work. It runs but the planet bodies all stay in
the same place. I still need to debug this further; any hints
on how to debug this are appreciated.

So I tried to run src/gallium/tests/trivial/compute.c,
with a recent mesa master, but that does not work either.

I get the following when I try to run this:

compute: nvc0/nvc0_resource.c:41: nvc0_surface_create: Assertion
`pres->target != PIPE_BUFFER' failed.

(gdb) bt
#0  0x76cbca98 in raise () from /lib64/libc.so.6
#1  0x76cbe69a in abort () from /lib64/libc.so.6
#2  0x76cb5227 in __assert_fail_base () from /lib64/libc.so.6
#3  0x76cb52d2 in __assert_fail () from /lib64/libc.so.6
#4  0x74e5b24c in nvc0_surface_create (pipe=,
 pres=, templ=) at
nvc0/nvc0_resource.c:41
#5  0x00404341 in init_compute_resources
(ctx=ctx@entry=0x691010,
 slots=slots@entry=0x7fffd980) at compute.c:347
#6  0x00402cec in test_system_values (ctx=0x691010) at
compute.c:494
#7  main (argc=, argv=) at compute.c:1584

Which comes from the assert here:

static struct pipe_surface *
nvc0_surface_create(struct pipe_context *pipe,
 struct pipe_resource *pres,
 const struct pipe_surface *templ)
{
/* surfaces are assumed to be miptrees all over the place. */
assert(pres->target != PIPE_BUFFER);
if (unlikely(pres->target == PIPE_BUFFER))
   return nv50_surface_from_buffer(pipe, pres, templ);
return nvc0_miptree_surface_new(pipe, pres, templ);
}

Just dropping that assert helps somewhat, it leads to:

PIPE_COMPUTE_CAP_GRID_DIMENSION: { 3 }
PIPE_COMPUTE_CAP_MAX_GRID_SIZE: { 65535 65535 65535 }
PIPE_COMPUTE_CAP_MAX_BLOCK_SIZE: { 1024 1024 64 }
- test_system_values
ERROR: SUSTx not yet supported on < nve4
ERROR: SUSTx not yet supported on < nve4
ERROR: SUSTx not yet supported on < nve4
ERROR: SUSTx not yet supported on < nve4
(0, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(1, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(2, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(3, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(4, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x4/0.00
(5, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x3/0.00
(6, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x5/0.00
(7, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x1/0.00
(8, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x5/0.00
(9, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x4/0.00
(10, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x1/0.00
(11, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x1/0.00
(12, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(13, 0)[0]: got 0xdeadbeef/-6259853398707798016.00, expected 0x0/0.00
(14, 0)[0]: got 0xdeadbeef

Re: [Nouveau] NV50 compute support questions

2015-11-20 Thread Samuel Pitoiset



On 11/20/2015 11:36 AM, Hans de Goede wrote:

Hi Samuel, et al,


Hi Hans,



In
http://cgit.freedesktop.org/mesa/mesa/commit/src/gallium/drivers/nouveau?id=ff72440b40211326eda118232fabd53965410afd

you write: "This compute support has been tested by
Pierre Moreau and myself with some compute kernels."

Can you provide testing instructions (and the
necessary files) so that I can try to reproduce
your tests ?

And once I've reproduced your tests, the next
question is where / how did you get the compute
kernels for testing. I guess you manually wrote them ?


Yeah, I wrote those compute kernels directly in assembly by hand.

As I already said a few days ago, you have some examples in 
src/gallium/tests/trivial/compute.c which show how to use that compute 
support stuff with TGSI kernels and without clover. Because clover is 
not currently able to do OpenCL -> TGSI using Clang/LLVM, you can't 
really use your backend directly.


Another way to achieve what you need is to copy/paste your TGSI kernel 
into src/gallium/tests/trivial/compute.c and set up the global buffers and 
other state (maybe samplers, textures and so on) yourself. This is a bit 
painful but should work as expected.




As you know I'm working on a llvm tgsi backend,
it actually produces some output now, if you want
to take a peek it lives here:
http://cgit.freedesktop.org/~jwrdegoede/llvm


I'm currently building your TGSI branch. :-)



Before working further on this I want to take
a bottom up approach, so I want to first make
sure we've working TGSI -> compute-kernel and
compute-kernel -> hardware steps. So the next
question is, do you know if we can go from
(manually written) TGSI to a compute-kernel
using say nouveau-compiler ?


Sure, you can use nouveau-compiler to convert TGSI to NV50 IR, but as I 
said, you can't directly execute your compute kernel without setting a 
ton of stuff before... That's a bunch of fun! :-)


Btw, do you still need compute support on your GK208? Or do you have 
another card for testing?




And if not, do you know what is missing to do
this?

Thanks & Regards,

Hans



Re: [Nouveau] llvm TGSI backend (WIP) questions

2015-11-13 Thread Samuel Pitoiset



On 11/13/2015 02:46 PM, Hans de Goede wrote:

Hi All,


Hey Hans,



So as discussed I've started working on a TGSI backend for
llvm to use as a way to get compute going on nouveau (and other gpu-s).

I'm still learning all the ins and outs of llvm so I do not have
much to show yet.

I've rebased Francisco's (curro's) latest version on top of llvm
trunk, and added a commit on top to actually get it to build with the
latest trunk. So currently I'm at the point where I've just
taken Francisco's code and made it compile, no more and no less.

I have a git repo with this work available here:

http://cgit.freedesktop.org/~jwrdegoede/llvm/


Thanks for sharing your work. :-)



So the next step would be to test this and see if it actually
does anything, questions:

1) Does anyone have a simple test case / command where I can
invoke just llvm and get TGSI asm output to check ?

2) Assuming I get the above to (somewhat) work, is there a
way to make llvm show the output of the various intermediate
passes in a human readable form ?


Basically, you need to ask Clang to emit LLVM code for you, for example, 
this command will emit LLVM IR:


clang -cc1 -cl-std=CL1.2 -emit-llvm -triple spir64-unknown-unknown kernel.cl

Note that this command only works with an old LLVM version (I don't 
remember exactly which one).


But in your case, and for that TGSI backend, I don't think there is a 
-emit-tgsi option which can directly output TGSI from OpenCL.


The other way, and in my opinion the best one, is to write a little C++ 
program based on the Clang/LLVM API for generating TGSI code. To do that,
you can have a look at 
src/gallium/state_trackers/clover/llvm/invocation.cpp which contains an 
example (but it seems to be outdated).


Basically, you need to call that CompilerInvocation object with some 
parameters and set up all the stuff around it. This should not take more than 
100 LOC in my opinion. I think the first step should be to emit LLVM IR 
before trying to get TGSI working.


I could write that program for you if you want, but I don't think I'll have 
time to do it this weekend.


Thanks.



Regards,

Hans


--
-Samuel


Re: [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-11 Thread Samuel Pitoiset
I did a full piglit run on Fermi. There are no regressions and you fixed 
texelFetch tests and other ones which failed with that assert.


I'm too lazy to do it on Tesla, so:

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

Thanks!

On 10/10/2015 11:09 AM, Ilia Mirkin wrote:

We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Cc: mesa-sta...@lists.freedesktop.org
---
  src/gallium/drivers/nouveau/nv50/nv50_shader_state.c |  9 ++---
  src/gallium/drivers/nouveau/nv50/nv50_transfer.c | 16 +++-
  src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c | 20 +---
  3 files changed, 10 insertions(+), 35 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c 
b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index fdde11f..941555f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50)
 PUSH_DATA (push, (b << 12) | (i << 8) | p | 1);
  }
  while (words) {
-   unsigned nr;
-
-   if (!PUSH_SPACE(push, 16))
-  break;
-   nr = PUSH_AVAIL(push);
-   assert(nr >= 16);
-   nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN);
+   unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
  
+   PUSH_SPACE(push, nr + 3);

 BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
 PUSH_DATA (push, (start << 8) | b);
 BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c 
b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
index be51407..9a3fd1e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
@@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv,
 PUSH_DATA (push, 0);
  
 while (count) {

-  unsigned nr;
-
-  if (!PUSH_SPACE(push, 16))
- break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 1);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
  
BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr);

PUSH_DATAp(push, src, nr);
@@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv,
 nouveau_pushbuf_validate(push);
  
 while (words) {

-  unsigned nr;
-
-  nr = PUSH_AVAIL(push);
-  nr = MIN2(nr - 7, words);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+  unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
  
+  PUSH_SPACE(push, nr + 7);

BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
PUSH_DATAh(push, bo->offset + base);
PUSH_DATA (push, bo->offset + base);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
index aaec60a..d459dd6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
@@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv,
 nouveau_pushbuf_validate(push);
  
 while (count) {

-  unsigned nr;
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
  
-  if (!PUSH_SPACE(push, 16))

+  if (!PUSH_SPACE(push, nr + 9))
   break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 9);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
  
BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2);

PUSH_DATAh(push, dst->offset + offset);
@@ -234,14 +230,10 @@ nve4_p2mf_push_linear(struct nouveau_context *nv,
 nouveau_pushbuf_validate(push);
  
 while (count) {

-  unsigned nr;
+  unsigned nr = MIN2(count, (NV04_PFIFO_MAX_PACKET_LEN - 1));
  
-  if (!PUSH_SPACE(push, 16))

+  if (!PUSH_SPACE(push, nr + 10))
   break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 8);
-  nr = MIN2(nr, (NV04_PFIFO_MAX_PACKET_LEN - 1));
  
BEGIN_NVC0(push, NVE4_P2MF(UPLOAD_DST_ADDRESS_HIGH), 2);

PUSH_DATAh(push, dst->offset + offset);
@@ -571,9 +563,7 @@ nvc0_cb_bo_push(struct nouveau_context *nv,
 PUSH_DATA (push, bo->offset + base);
  
 while (words) {

-  unsigned nr = PUSH_AVAIL(push);
-  nr = MIN2(nr, words);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+  unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN - 1);
  
PUSH_SPACE(push, nr + 2);

PUSH_REFN (push, bo, NOUVEAU_BO_WR | domain);



Re: [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Samuel Pitoiset



On 10/10/2015 09:42 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:

This patch looks fine except that it should be a bit more normalized. I
mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
PUSH_SPACE calls, sometimes you add it sometimes not.

Meh. We need to get our error checking situation straight, but this
isn't the patch to do it in.


Yeah, but this needs to be clarified.




Did you run a full piglit test this time ? :)

Nope, but I ran a full piglit before this patch. Almost took down my
box. Probably won't be running it again for this patch.


Ok, I'll run a full piglit this night then.




See my comment below.


On 10/10/2015 11:09 AM, Ilia Mirkin wrote:

We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Cc: mesa-sta...@lists.freedesktop.org
---
   src/gallium/drivers/nouveau/nv50/nv50_shader_state.c |  9 ++---
   src/gallium/drivers/nouveau/nv50/nv50_transfer.c | 16 +++-
   src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c | 20 +---
   3 files changed, 10 insertions(+), 35 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index fdde11f..941555f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50)
  PUSH_DATA (push, (b << 12) | (i << 8) | p | 1);
   }
   while (words) {
-   unsigned nr;
-
-   if (!PUSH_SPACE(push, 16))
-  break;
-   nr = PUSH_AVAIL(push);
-   assert(nr >= 16);
-   nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN);
+   unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
   +   PUSH_SPACE(push, nr + 3);


This PUSH_SPACE call doesn't seem to be needed to me because
NV50_PUSH_EXPLICIT_SPACE_CHECKING is not set and the following BEGIN_XXX
calls will allocate space.

I want to ensure that both of the below commands are in the same
batch. Not sure if it's necessary, but... don't want to find out. They
were in the same batch before. And this batch stuff is what was
causing the M2MF errors I was seeing earlier.




  BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
  PUSH_DATA (push, (start << 8) | b);
  BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
index be51407..9a3fd1e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
@@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv,
  PUSH_DATA (push, 0);
while (count) {
-  unsigned nr;
-
-  if (!PUSH_SPACE(push, 16))
- break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 1);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
   BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr);
 PUSH_DATAp(push, src, nr);
@@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv,
  nouveau_pushbuf_validate(push);
while (words) {
-  unsigned nr;
-
-  nr = PUSH_AVAIL(push);
-  nr = MIN2(nr - 7, words);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+  unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
   +  PUSH_SPACE(push, nr + 7);
 BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
 PUSH_DATAh(push, bo->offset + base);
 PUSH_DATA (push, bo->offset + base);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
index aaec60a..d459dd6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
@@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv,
  nouveau_pushbuf_validate(push);
while (count) {
-  unsigned nr;
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
   -  if (!PUSH_SPACE(push, 16))
+  if (!PUSH_SPACE(push, nr + 9))
break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 9);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
   BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2);
 PUSH_DATAh(push, dst->offset + offset);
@@ -234,14 +230,10 @@ nve4_p2mf_push_linear(struct nouveau_context *nv,
  nouveau_pushbuf_validate(push);
whi

Re: [Nouveau] [Mesa-dev] [PATCH] nouveau: avoid emitting new fences unnecessarily

2015-10-10 Thread Samuel Pitoiset

Does this fix those texelFetch piglit tests ? Or is it the second patch ?

Anyway, this patch is :

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 10/10/2015 08:12 AM, Ilia Mirkin wrote:

Right now we emit on every kick, but this is only necessary if something
will ever be able to observe that the fence completed. If there are no
refs, leave the fence alone and emit it another day.

This also happens to work around an issue for the kick handler -- a kick
can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
due to lack of space in the pushbuf. We want the emit to happen in the
current batch, so we want there to always be enough space. However an
explicit kick could take the reserved space for the implicitly-triggered
kick's fence emission if it happened right after. With the new mechanism,
hopefully there's no way to cause two fences to be emitted into the same
reserved space.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Cc: mesa-sta...@lists.freedesktop.org
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
---
  src/gallium/drivers/nouveau/nouveau_fence.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_fence.c 
b/src/gallium/drivers/nouveau/nouveau_fence.c
index ee4e08d..18b1592 100644
--- a/src/gallium/drivers/nouveau/nouveau_fence.c
+++ b/src/gallium/drivers/nouveau/nouveau_fence.c
@@ -190,8 +190,10 @@ nouveau_fence_wait(struct nouveau_fence *fence)
 /* wtf, someone is waiting on a fence in flush_notify handler? */
 assert(fence->state != NOUVEAU_FENCE_STATE_EMITTING);
  
-   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED)

+   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
+  PUSH_SPACE(screen->pushbuf, 8);
nouveau_fence_emit(fence);
+   }
  
 if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED)

if (nouveau_pushbuf_kick(screen->pushbuf, screen->pushbuf->channel))
@@ -224,8 +226,12 @@ nouveau_fence_wait(struct nouveau_fence *fence)
  void
  nouveau_fence_next(struct nouveau_screen *screen)
  {
-   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING)
-  nouveau_fence_emit(screen->fence.current);
+   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING) {
+  if (screen->fence.current->ref > 1)
+ nouveau_fence_emit(screen->fence.current);
+  else
+ return;
+   }
  
 nouveau_fence_ref(NULL, &screen->fence.current);
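
The decision made in nouveau_fence_next() above can be sketched in isolation as follows. This is only an illustration under assumptions: `fake_fence`, `fence_state` and `fence_next` are hypothetical stand-ins, the state change stands in for the real emit, and the sketch assumes the screen itself holds one reference, so `ref > 1` means someone else could observe the fence completing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified fence states (ordered like the real ones). */
enum fence_state { FENCE_AVAILABLE, FENCE_EMITTING, FENCE_EMITTED };

struct fake_fence {
   int ref;                 /* assumed: 1 == only the screen holds it */
   enum fence_state state;
};

/* Returns true when the caller should move on to a fresh fence,
 * false when the current, unobserved fence can be kept for later. */
static bool
fence_next(struct fake_fence *cur)
{
   assert(cur != NULL);
   if (cur->state < FENCE_EMITTING) {
      if (cur->ref > 1)
         cur->state = FENCE_EMITTED;  /* stands in for nouveau_fence_emit() */
      else
         return false;                /* nobody is watching: emit another day */
   }
   return true;
}
```

The point of the early return is that an unreferenced fence never consumes the space reserved at the end of the pushbuf.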
  




Re: [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Samuel Pitoiset
This patch looks fine except that it should be a bit more normalized. I 
mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for 
PUSH_SPACE calls, sometimes you add it sometimes not.


Did you run a full piglit test this time ? :)

See my comment below.

On 10/10/2015 11:09 AM, Ilia Mirkin wrote:

We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Cc: mesa-sta...@lists.freedesktop.org
---
  src/gallium/drivers/nouveau/nv50/nv50_shader_state.c |  9 ++---
  src/gallium/drivers/nouveau/nv50/nv50_transfer.c | 16 +++-
  src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c | 20 +---
  3 files changed, 10 insertions(+), 35 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c 
b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index fdde11f..941555f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50)
 PUSH_DATA (push, (b << 12) | (i << 8) | p | 1);
  }
  while (words) {
-   unsigned nr;
-
-   if (!PUSH_SPACE(push, 16))
-  break;
-   nr = PUSH_AVAIL(push);
-   assert(nr >= 16);
-   nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN);
+   unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
  
+   PUSH_SPACE(push, nr + 3);


This PUSH_SPACE call doesn't seem to be needed to me because 
NV50_PUSH_EXPLICIT_SPACE_CHECKING is not set and the following BEGIN_XXX 
calls will allocate space.



 BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
 PUSH_DATA (push, (start << 8) | b);
 BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c 
b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
index be51407..9a3fd1e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
@@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv,
 PUSH_DATA (push, 0);
  
 while (count) {

-  unsigned nr;
-
-  if (!PUSH_SPACE(push, 16))
- break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 1);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
  
BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr);

PUSH_DATAp(push, src, nr);
@@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv,
 nouveau_pushbuf_validate(push);
  
 while (words) {

-  unsigned nr;
-
-  nr = PUSH_AVAIL(push);
-  nr = MIN2(nr - 7, words);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+  unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
  
+  PUSH_SPACE(push, nr + 7);

BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
PUSH_DATAh(push, bo->offset + base);
PUSH_DATA (push, bo->offset + base);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
index aaec60a..d459dd6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
@@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv,
 nouveau_pushbuf_validate(push);
  
 while (count) {

-  unsigned nr;
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
  
-  if (!PUSH_SPACE(push, 16))

+  if (!PUSH_SPACE(push, nr + 9))
   break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 9);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
  
BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2);

PUSH_DATAh(push, dst->offset + offset);
@@ -234,14 +230,10 @@ nve4_p2mf_push_linear(struct nouveau_context *nv,
 nouveau_pushbuf_validate(push);
  
 while (count) {

-  unsigned nr;
+  unsigned nr = MIN2(count, (NV04_PFIFO_MAX_PACKET_LEN - 1));
  
-  if (!PUSH_SPACE(push, 16))

+  if (!PUSH_SPACE(push, nr + 10))
   break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 8);
-  nr = MIN2(nr, (NV04_PFIFO_MAX_PACKET_LEN - 1));
  
BEGIN_NVC0(push, NVE4_P2MF(UPLOAD_DST_ADDRESS_HIGH), 2);

PUSH_DATAh(push, dst->offset + offset);
@@ -571,9 +563,7 @@ nvc0_cb_bo_push(struct nouveau_context *nv,
 PUSH_DATA (push, bo->offset + base);
  
 while (words) {

-  unsigned nr = PUSH_AVAIL(push);
-  nr = MIN2(nr, words);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+  unsigned nr = MIN2(words, 

Re: [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Samuel Pitoiset



On 10/10/2015 09:58 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 3:55 PM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:


On 10/10/2015 09:42 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:

This patch looks fine except that it should be a bit more normalized. I
mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
PUSH_SPACE calls, sometimes you add it sometimes not.

Meh. We need to get our error checking situation straight, but this
isn't the patch to do it in.


Yeah, but this needs to be clarified.

What does?


I mean, we should either use PUSH_SPACE everywhere or not at all, and 
always break (or not) when PUSH_SPACE fails.

That's really a minor issue.


___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Samuel Pitoiset



On 10/10/2015 10:17 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 4:21 PM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:


On 10/10/2015 09:58 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 3:55 PM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:


On 10/10/2015 09:42 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:

This patch looks fine except that it should be a bit more normalized. I
mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same
for
PUSH_SPACE calls, sometimes you add it sometimes not.

Meh. We need to get our error checking situation straight, but this
isn't the patch to do it in.


Yeah, but this needs to be clarified.

What does?


I mean, we should either use PUSH_SPACE everywhere or not at all, and always
break (or not) when PUSH_SPACE fails.
That's really a minor issue.

It's actually a major issue. Error-handling is practically
non-existent. There are a couple of spots here and there, but it
doesn't really scale up. I guess I (semi-)accidentally removed a
couple of spots that error checked, but, again, meh. Doing this for
real will require some careful thought.


Yeah, okay. So we really need to improve error-handling. :)


   -ilia
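
To make the pattern under discussion concrete, here is a minimal sketch of a chunked upload loop that sizes each chunk from the remaining data (not from whatever space happens to be left) and always stops when the space check fails. Everything in it is hypothetical: `struct ring`, `ring_space()`, `ring_upload()`, `MAX_PACKET_LEN` and `RING_WORDS` are stand-ins invented for illustration, not the Mesa or libdrm API, and the "flush" is simulated by resetting the write cursor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PACKET_LEN 2047u   /* hypothetical per-packet payload limit */
#define RING_WORDS     4096u   /* hypothetical buffer capacity in words */

struct ring {
   uint32_t data[RING_WORDS];
   size_t   cur;       /* words used in the current buffer */
   unsigned flushes;   /* how many times the buffer was kicked */
};

/* Make sure `size` words fit, flushing if needed.  Returns false only when
 * the request can never fit, which the caller must treat as an error. */
static bool ring_space(struct ring *r, size_t size)
{
   if (size > RING_WORDS)
      return false;
   if (RING_WORDS - r->cur < size) {
      r->cur = 0;      /* stand-in for submitting the buffer */
      r->flushes++;
   }
   return true;
}

/* Upload `count` words in chunks, each preceded by `overhead` header words.
 * Returns the number of payload words actually written. */
static size_t ring_upload(struct ring *r, const uint32_t *src, size_t count,
                          size_t overhead)
{
   size_t done = 0;

   while (done < count) {
      /* Chunk size depends only on the data left and the packet limit. */
      size_t nr = count - done;
      if (nr > MAX_PACKET_LEN)
         nr = MAX_PACKET_LEN;

      /* Same policy everywhere: reserve first, stop on failure. */
      if (!ring_space(r, nr + overhead))
         break;
      assert(RING_WORDS - r->cur >= nr + overhead);

      for (size_t i = 0; i < overhead; i++)
         r->data[r->cur++] = 0;             /* placeholder header words */
      for (size_t i = 0; i < nr; i++)
         r->data[r->cur++] = src[done + i]; /* payload */
      done += nr;
   }
   return done;
}
```

A caller can compare the return value with `count` to detect a short upload, which gives every call site the same error path.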




Re: [Nouveau] [PATCH] nouveau: make sure there's always room to emit a fence

2015-10-05 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 10/05/2015 09:21 PM, Ilia Mirkin wrote:

I started seeing a lot of situations on nv30 where fence emission
wouldn't fit into the previous buffer (causing assertions). This ensures
that whenever checking for space, we always leave a bit of extra room
for the fence emission commands. Adjusts the nv30 and nvc0 fence
emission logic to bypass the space checking as well.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Cc: mesa-sta...@lists.freedesktop.org
---
  src/gallium/drivers/nouveau/nouveau_winsys.h   | 2 ++
  src/gallium/drivers/nouveau/nv30/nv30_screen.c | 4 +++-
  src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 ++-
  4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_winsys.h 
b/src/gallium/drivers/nouveau/nouveau_winsys.h
index 389a229..a44fd3e 100644
--- a/src/gallium/drivers/nouveau/nouveau_winsys.h
+++ b/src/gallium/drivers/nouveau/nouveau_winsys.h
@@ -24,6 +24,8 @@ PUSH_AVAIL(struct nouveau_pushbuf *push)
  static inline bool
  PUSH_SPACE(struct nouveau_pushbuf *push, uint32_t size)
  {
+   /* Provide a buffer so that fences always have room to be emitted */
+   size += 8;
 if (PUSH_AVAIL(push) < size)
return nouveau_pushbuf_space(push, size, 0, 0) == 0;
 return true;
diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index 39267b3..335c163 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -347,7 +347,9 @@ nv30_screen_fence_emit(struct pipe_screen *pscreen, 
uint32_t *sequence)
  
 *sequence = ++screen->base.fence.sequence;
  
-   BEGIN_NV04(push, NV30_3D(FENCE_OFFSET), 2);

+   assert(PUSH_AVAIL(push) >= 3);
+   PUSH_DATA (push, NV30_3D_FENCE_OFFSET |
+  (2 /* size */ << 18) | (7 /* subchan */ << 13));


Are there other places where we do something like this?
If so, maybe we should introduce NV30_FIFO_PKHDR_SQ.


 PUSH_DATA (push, 0);
 PUSH_DATA (push, *sequence);
  }
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 6012ff6..812b246 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -388,6 +388,7 @@ nv50_screen_fence_emit(struct pipe_screen *pscreen, u32 
*sequence)
 /* we need to do it after possible flush in MARK_RING */
 *sequence = ++screen->base.fence.sequence;
  
+   assert(PUSH_AVAIL(push) >= 5);

 PUSH_DATA (push, NV50_FIFO_PKHDR(NV50_3D(QUERY_ADDRESS_HIGH), 4));
 PUSH_DATAh(push, screen->fence.bo->offset);
 PUSH_DATA (push, screen->fence.bo->offset);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 32da76c..afd91e6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -537,7 +537,8 @@ nvc0_screen_fence_emit(struct pipe_screen *pscreen, u32 
*sequence)
 /* we need to do it after possible flush in MARK_RING */
 *sequence = ++screen->base.fence.sequence;
  
-   BEGIN_NVC0(push, NVC0_3D(QUERY_ADDRESS_HIGH), 4);

+   assert(PUSH_AVAIL(push) >= 5);
+   PUSH_DATA (push, NVC0_FIFO_PKHDR_SQ(NVC0_3D(QUERY_ADDRESS_HIGH), 4));
 PUSH_DATAh(push, screen->fence.bo->offset);
 PUSH_DATA (push, screen->fence.bo->offset);
 PUSH_DATA (push, *sequence);
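
The idea in this patch can be sketched in isolation: every ordinary space check asks for a few words more than it needs, so that fence emission can skip the check and rely on that slack. The code below is a simplified model under stated assumptions, not the driver code: `struct pushbuf`, `push_space()`, `push_words()` and `emit_fence()` are hypothetical, the buffer is a fixed array, and a failed check simply returns false where the real driver would request a new buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUF_WORDS     64u  /* hypothetical buffer size in words */
#define FENCE_RESERVE  8u  /* slack kept free by every space check */
#define FENCE_WORDS    5u  /* one header word plus four data words */

struct pushbuf {
   uint32_t data[BUF_WORDS];
   uint32_t cur;
};

static uint32_t push_avail(const struct pushbuf *p)
{
   return BUF_WORDS - p->cur;
}

/* Space check that over-asks by FENCE_RESERVE words. */
static bool push_space(const struct pushbuf *p, uint32_t size)
{
   return push_avail(p) >= size + FENCE_RESERVE;
}

/* Ordinary command emission: always goes through push_space(). */
static bool push_words(struct pushbuf *p, uint32_t n)
{
   if (!push_space(p, n))
      return false;
   for (uint32_t i = 0; i < n; i++)
      p->data[p->cur++] = 0;
   return true;
}

/* Fence emission bypasses push_space() and relies on the reserve. */
static void emit_fence(struct pushbuf *p, uint32_t sequence)
{
   assert(push_avail(p) >= FENCE_WORDS);
   p->data[p->cur++] = 0;         /* placeholder packet header */
   p->data[p->cur++] = 0;         /* placeholder address, high part */
   p->data[p->cur++] = 0;         /* placeholder address, low part */
   p->data[p->cur++] = sequence;  /* fence sequence number */
   p->data[p->cur++] = 0;         /* placeholder trigger word */
}
```

Because the reserve (8 words) is larger than the fence itself (5 words), the assertion in `emit_fence()` holds whenever all earlier writes went through `push_space()`.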




Re: [Nouveau] [PATCH] [resend] nouveau: Disable AGP for SiS 761

2015-09-30 Thread Samuel Pitoiset

This patch was merged by Ben yesterday.

http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=8c713f90a63ffca10d122af09d439f3409c933ed

Why did you send a new version? Is the previous patch wrong?

On 09/30/2015 01:48 PM, Ondrej Zary wrote:

SiS 761 chipset does not support AGP cards but has AGP capability (for
the onboard video). At least PC Chips A31G board using this chipset has
an AGP-like AGPro slot that's wired to the PCI bus. Enabling AGP will
fail (GPU lockup and software fbcon, X11 hangs).

Add support for matching just the host bridge in nvkm_device_agp_quirks
and add entry for SiS 761 with mode 0 (AGP disabled).

Signed-off-by: Ondrej Zary 
---
  drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c |8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c
index 814cb51..385a90f 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c
@@ -35,6 +35,8 @@ static const struct nvkm_device_agp_quirk
  nvkm_device_agp_quirks[] = {
/* VIA Apollo PRO133x / GeForce FX 5600 Ultra - fdo#20341 */
{ PCI_VENDOR_ID_VIA, 0x0691, PCI_VENDOR_ID_NVIDIA, 0x0311, 2 },
+   /* SiS 761 does not support AGP cards, use PCI mode */
+   { PCI_VENDOR_ID_SI, 0x0761, PCI_ANY_ID, PCI_ANY_ID, 0 },
{},
  };
  
@@ -137,8 +139,10 @@ nvkm_agp_ctor(struct nvkm_pci *pci)

while (quirk->hostbridge_vendor) {
if (info.device->vendor == quirk->hostbridge_vendor &&
info.device->device == quirk->hostbridge_device &&
-   pci->pdev->vendor == quirk->chip_vendor &&
-   pci->pdev->device == quirk->chip_device) {
+   (quirk->chip_vendor == (u16)PCI_ANY_ID ||
+   pci->pdev->vendor == quirk->chip_vendor) &&
+   (quirk->chip_device == (u16)PCI_ANY_ID ||
+   pci->pdev->device == quirk->chip_device)) {
nvkm_info(subdev, "forcing default agp mode to %dX, "
  "use NvAGP= to override\n",
  quirk->mode);
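
The wildcard matching added above can be tried outside the kernel with a small stand-alone sketch. It assumes, as in the patch, that the quirk fields are 16 bits wide, which is why the wildcard has to be compared as a 16-bit value: an int-sized "any" constant of -1 would never compare equal to a promoted 16-bit field. The names `struct agp_quirk`, `ANY_ID` and `agp_quirk_mode()` are hypothetical; the vendor IDs are the usual PCI ones for VIA (0x1106), SiS (0x1039) and NVIDIA (0x10de).

```c
#include <assert.h>
#include <stdint.h>

#define ANY_ID 0xffffu   /* stand-in for (u16)PCI_ANY_ID */

struct agp_quirk {
   uint16_t hostbridge_vendor;
   uint16_t hostbridge_device;
   uint16_t chip_vendor;
   uint16_t chip_device;
   int mode;
};

static const struct agp_quirk quirks[] = {
   { 0x1106, 0x0691, 0x10de, 0x0311, 2 },  /* VIA bridge + one specific GPU */
   { 0x1039, 0x0761, ANY_ID, ANY_ID, 0 },  /* SiS 761 + any GPU: AGP off */
   { 0 },                                  /* terminator */
};

/* Return the forced AGP mode for this bridge/GPU pair, or default_mode
 * when no quirk applies. */
static int agp_quirk_mode(uint16_t hb_vendor, uint16_t hb_device,
                          uint16_t chip_vendor, uint16_t chip_device,
                          int default_mode)
{
   assert(default_mode >= 0);

   for (const struct agp_quirk *q = quirks; q->hostbridge_vendor; q++) {
      if (hb_vendor == q->hostbridge_vendor &&
          hb_device == q->hostbridge_device &&
          (q->chip_vendor == ANY_ID || chip_vendor == q->chip_vendor) &&
          (q->chip_device == ANY_ID || chip_device == q->chip_device))
         return q->mode;
   }
   return default_mode;
}
```

With this table, any card behind the SiS 761 bridge gets mode 0, while the VIA entry still applies only to the one listed GPU.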




[Nouveau] [PATCH v3] ibus/gf100: increase wait timeout to avoid read faults

2015-09-24 Thread Samuel Pitoiset
Increase clock timeout of some unknown engines in order to avoid failure
at high gpcclk rate.

This fixes IBUS read faults on my GF119 when reclocking is manually
enabled. Note that memory reclocking is completely broken and NvMemExec
has to be disabled to allow core clock reclocking only.

Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
V3: changed some nvkm_mask() to nvkm_wr32() for gf100 (as the blob does)
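
For readers unfamiliar with the difference, the sketch below contrasts a read-modify-write helper with a plain register write on a fake register file. It assumes that nvkm_mask() behaves as a read-modify-write that replaces only the bits selected by the mask and returns the old value, which is my understanding of it rather than something shown in this patch. The helpers `rd32()`, `wr32()` and `mask32()` and the register values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 4u

/* Fake register file, indexed by register number instead of MMIO address. */
static uint32_t regs[NUM_REGS];

static uint32_t rd32(unsigned reg)
{
   assert(reg < NUM_REGS);
   return regs[reg];
}

/* Plain write: the whole register is overwritten. */
static void wr32(unsigned reg, uint32_t val)
{
   assert(reg < NUM_REGS);
   regs[reg] = val;
}

/* Read-modify-write: bits in `mask` are cleared, then `val` is ORed in.
 * Bits outside the mask keep their previous value.  Returns the old value. */
static uint32_t mask32(unsigned reg, uint32_t mask, uint32_t val)
{
   uint32_t old = rd32(reg);

   wr32(reg, (old & ~mask) | val);
   return old;
}
```

A masked update leaves whatever was in the untouched bits, so it only yields a known final value if the mask covers every field being programmed; a plain write always does.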

 drm/nouveau/include/nvkm/subdev/ibus.h |  1 +
 drm/nouveau/nvkm/engine/device/base.c  |  4 +--
 drm/nouveau/nvkm/subdev/ibus/Kbuild|  1 +
 drm/nouveau/nvkm/subdev/ibus/gf100.c   | 17 ++--
 drm/nouveau/nvkm/subdev/ibus/gf117.c   | 51 ++
 drm/nouveau/nvkm/subdev/ibus/priv.h|  7 +
 6 files changed, 77 insertions(+), 4 deletions(-)
 create mode 100644 drm/nouveau/nvkm/subdev/ibus/gf117.c
 create mode 100644 drm/nouveau/nvkm/subdev/ibus/priv.h

diff --git a/drm/nouveau/include/nvkm/subdev/ibus.h 
b/drm/nouveau/include/nvkm/subdev/ibus.h
index 9d512cd..c4dcd26 100644
--- a/drm/nouveau/include/nvkm/subdev/ibus.h
+++ b/drm/nouveau/include/nvkm/subdev/ibus.h
@@ -3,6 +3,7 @@
 #include 
 
 int gf100_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
+int gf117_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
 int gk104_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
 int gk20a_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
 #endif
diff --git a/drm/nouveau/nvkm/engine/device/base.c 
b/drm/nouveau/nvkm/engine/device/base.c
index 952a508..83383bc 100644
--- a/drm/nouveau/nvkm/engine/device/base.c
+++ b/drm/nouveau/nvkm/engine/device/base.c
@@ -1595,7 +1595,7 @@ nvd7_chipset = {
.fuse = gf100_fuse_new,
.gpio = gf119_gpio_new,
.i2c = gf117_i2c_new,
-   .ibus = gf100_ibus_new,
+   .ibus = gf117_ibus_new,
.imem = nv50_instmem_new,
.ltc = gf100_ltc_new,
.mc = gf100_mc_new,
@@ -1628,7 +1628,7 @@ nvd9_chipset = {
.fuse = gf100_fuse_new,
.gpio = gf119_gpio_new,
.i2c = gf119_i2c_new,
-   .ibus = gf100_ibus_new,
+   .ibus = gf117_ibus_new,
.imem = nv50_instmem_new,
.ltc = gf100_ltc_new,
.mc = gf100_mc_new,
diff --git a/drm/nouveau/nvkm/subdev/ibus/Kbuild 
b/drm/nouveau/nvkm/subdev/ibus/Kbuild
index a0b12d2..de888fa 100644
--- a/drm/nouveau/nvkm/subdev/ibus/Kbuild
+++ b/drm/nouveau/nvkm/subdev/ibus/Kbuild
@@ -1,3 +1,4 @@
 nvkm-y += nvkm/subdev/ibus/gf100.o
+nvkm-y += nvkm/subdev/ibus/gf117.o
 nvkm-y += nvkm/subdev/ibus/gk104.o
 nvkm-y += nvkm/subdev/ibus/gk20a.o
diff --git a/drm/nouveau/nvkm/subdev/ibus/gf100.c 
b/drm/nouveau/nvkm/subdev/ibus/gf100.c
index 37a0496..72d6330 100644
--- a/drm/nouveau/nvkm/subdev/ibus/gf100.c
+++ b/drm/nouveau/nvkm/subdev/ibus/gf100.c
@@ -21,7 +21,7 @@
  *
  * Authors: Ben Skeggs
  */
-#include 
+#include "priv.h"
 
 static void
 gf100_ibus_intr_hub(struct nvkm_subdev *ibus, int i)
@@ -56,7 +56,7 @@ gf100_ibus_intr_gpc(struct nvkm_subdev *ibus, int i)
nvkm_mask(device, 0x128128 + (i * 0x0400), 0x0200, 0x);
 }
 
-static void
+void
 gf100_ibus_intr(struct nvkm_subdev *ibus)
 {
struct nvkm_device *device = ibus->device;
@@ -92,8 +92,21 @@ gf100_ibus_intr(struct nvkm_subdev *ibus)
}
 }
 
+static int
+gf100_ibus_init(struct nvkm_subdev *ibus)
+{
+   struct nvkm_device *device = ibus->device;
+   nvkm_mask(device, 0x122310, 0x0003, 0x0800);
+   nvkm_wr32(device, 0x12232c, 0x00100064);
+   nvkm_wr32(device, 0x122330, 0x00100064);
+   nvkm_wr32(device, 0x122334, 0x00100064);
+   nvkm_mask(device, 0x122348, 0x0003, 0x0100);
+   return 0;
+}
+
 static const struct nvkm_subdev_func
 gf100_ibus = {
+   .init = gf100_ibus_init,
.intr = gf100_ibus_intr,
 };
 
diff --git a/drm/nouveau/nvkm/subdev/ibus/gf117.c 
b/drm/nouveau/nvkm/subdev/ibus/gf117.c
new file mode 100644
index 000..f69f263
--- /dev/null
+++ b/drm/nouveau/nvkm/subdev/ibus/gf117.c
@@ -0,0 +1,51 @@
+/*
+ * Copyright 2015 Samuel Pitosiet
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR 

[Nouveau] [PATCH 1/2] fb/ramgf100: disable memory reclocking by default

2015-09-23 Thread Samuel Pitoiset
Although memory reclocking seems to be completely broken on my GF119, we
can at least allow users to enable reclocking for the core clock.

Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
 drm/nouveau/nvkm/subdev/fb/ramgf100.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c 
b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
index 772425c..a3219a2 100644
--- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
+++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
@@ -409,7 +409,7 @@ gf100_ram_prog(struct nvkm_ram *base)
 {
struct gf100_ram *ram = gf100_ram(base);
struct nvkm_device *device = ram->base.fb->subdev.device;
-   ram_exec(&ram->fuc, nvkm_boolopt(device->cfgopt, "NvMemExec", true));
+   ram_exec(&ram->fuc, nvkm_boolopt(device->cfgopt, "NvMemExec", false));
return 0;
 }
 
-- 
2.5.3
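
To illustrate what flipping the default means for users, here is a simplified sketch of a boolean option lookup with a fallback value. It is not the nvkm_boolopt() implementation: `boolopt()` is a hypothetical helper that assumes a comma-separated "Name=value" option string and only understands the values 0 and 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Look up `opt` in a comma-separated "Name=value" list.  Returns `def`
 * when the option is absent or its value is not 0 or 1. */
static bool boolopt(const char *optstr, const char *opt, bool def)
{
   size_t len;

   assert(opt != NULL);
   len = strlen(opt);

   while (optstr && *optstr) {
      size_t item = strcspn(optstr, ",");   /* length of this entry */

      /* Entry must be long enough to hold "Name=" plus one value char. */
      if (item > len + 1 && !strncmp(optstr, opt, len) &&
          optstr[len] == '=') {
         if (optstr[len + 1] == '1')
            return true;
         if (optstr[len + 1] == '0')
            return false;
         return def;
      }

      optstr += item;
      if (*optstr == ',')
         optstr++;
   }
   return def;
}
```

With a default of false, memory reclocking scripts would only run when the user explicitly passes NvMemExec=1; with the old default of true they ran unless the user passed NvMemExec=0.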



Re: [Nouveau] [PATCH 2/2] clk/gf100: allow users to enable reclocking

2015-09-23 Thread Samuel Pitoiset



On 09/24/2015 12:00 AM, Martin Peres wrote:

On 24/09/15 00:20, Samuel Pitoiset wrote:

Only the core clock is currently supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
  drm/nouveau/nvkm/subdev/clk/gf100.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drm/nouveau/nvkm/subdev/clk/gf100.c 
b/drm/nouveau/nvkm/subdev/clk/gf100.c

index a52b7e7..807305a 100644
--- a/drm/nouveau/nvkm/subdev/clk/gf100.c
+++ b/drm/nouveau/nvkm/subdev/clk/gf100.c
@@ -462,5 +462,5 @@ gf100_clk_new(struct nvkm_device *device, int 
index, struct nvkm_clk **pclk)

  return -ENOMEM;
  *pclk = &clk->base;
  -return nvkm_clk_ctor(&gf100_clk, device, index, false, 
&clk->base);

+return nvkm_clk_ctor(&gf100_clk, device, index, true, &clk->base);
  }
What changed that suddenly made reclocking OK? You really need to 
prove it is and a few hours of testing are not enough ;)


Make sure that the clock tree is parsed correctly, then programmed 
correctly. Make sure that reclocking while the card is being used is 
also kind of stable, at least on your card.


After that, you may enable it and test on more Fermis. Until then, 
this patch is premature, at best.


Yeah, this is probably an experimental feature for now, but even 
though I haven't tested it much, it seems to be stable (with Heaven at least).


Anyway, I'll do more checks to prove that it is going to work as expected.

Thanks for your feedback, Martin.


[Nouveau] [PATCH 2/2] clk/gf100: allow users to enable reclocking

2015-09-23 Thread Samuel Pitoiset
Only the core clock is currently supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
 drm/nouveau/nvkm/subdev/clk/gf100.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drm/nouveau/nvkm/subdev/clk/gf100.c 
b/drm/nouveau/nvkm/subdev/clk/gf100.c
index a52b7e7..807305a 100644
--- a/drm/nouveau/nvkm/subdev/clk/gf100.c
+++ b/drm/nouveau/nvkm/subdev/clk/gf100.c
@@ -462,5 +462,5 @@ gf100_clk_new(struct nvkm_device *device, int index, struct 
nvkm_clk **pclk)
return -ENOMEM;
*pclk = &clk->base;
 
-   return nvkm_clk_ctor(&gf100_clk, device, index, false, &clk->base);
+   return nvkm_clk_ctor(&gf100_clk, device, index, true, &clk->base);
 }
-- 
2.5.3



[Nouveau] [PATCH] ibus/gf100: increase wait timeout to avoid read faults

2015-09-23 Thread Samuel Pitoiset
Increase clock timeout of some unknown engines in order to avoid failure
at high gpcclk rate.

This fixes IBUS read faults on my GF119 when reclocking is manually
enabled. Note that memory reclocking is completely broken and NvMemExec
has to be disabled to allow core clock reclocking only.

Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
 drm/nouveau/include/nvkm/subdev/ibus.h |  1 +
 drm/nouveau/nvkm/engine/device/base.c  |  4 +--
 drm/nouveau/nvkm/subdev/ibus/Kbuild|  1 +
 drm/nouveau/nvkm/subdev/ibus/gf100.c   | 17 ++--
 drm/nouveau/nvkm/subdev/ibus/gf117.c   | 51 ++
 drm/nouveau/nvkm/subdev/ibus/priv.h|  7 +
 6 files changed, 77 insertions(+), 4 deletions(-)
 create mode 100644 drm/nouveau/nvkm/subdev/ibus/gf117.c
 create mode 100644 drm/nouveau/nvkm/subdev/ibus/priv.h

diff --git a/drm/nouveau/include/nvkm/subdev/ibus.h 
b/drm/nouveau/include/nvkm/subdev/ibus.h
index 9d512cd..c4dcd26 100644
--- a/drm/nouveau/include/nvkm/subdev/ibus.h
+++ b/drm/nouveau/include/nvkm/subdev/ibus.h
@@ -3,6 +3,7 @@
 #include 
 
 int gf100_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
+int gf117_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
 int gk104_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
 int gk20a_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
 #endif
diff --git a/drm/nouveau/nvkm/engine/device/base.c 
b/drm/nouveau/nvkm/engine/device/base.c
index 952a508..83383bc 100644
--- a/drm/nouveau/nvkm/engine/device/base.c
+++ b/drm/nouveau/nvkm/engine/device/base.c
@@ -1595,7 +1595,7 @@ nvd7_chipset = {
.fuse = gf100_fuse_new,
.gpio = gf119_gpio_new,
.i2c = gf117_i2c_new,
-   .ibus = gf100_ibus_new,
+   .ibus = gf117_ibus_new,
.imem = nv50_instmem_new,
.ltc = gf100_ltc_new,
.mc = gf100_mc_new,
@@ -1628,7 +1628,7 @@ nvd9_chipset = {
.fuse = gf100_fuse_new,
.gpio = gf119_gpio_new,
.i2c = gf119_i2c_new,
-   .ibus = gf100_ibus_new,
+   .ibus = gf117_ibus_new,
.imem = nv50_instmem_new,
.ltc = gf100_ltc_new,
.mc = gf100_mc_new,
diff --git a/drm/nouveau/nvkm/subdev/ibus/Kbuild 
b/drm/nouveau/nvkm/subdev/ibus/Kbuild
index a0b12d2..de888fa 100644
--- a/drm/nouveau/nvkm/subdev/ibus/Kbuild
+++ b/drm/nouveau/nvkm/subdev/ibus/Kbuild
@@ -1,3 +1,4 @@
 nvkm-y += nvkm/subdev/ibus/gf100.o
+nvkm-y += nvkm/subdev/ibus/gf117.o
 nvkm-y += nvkm/subdev/ibus/gk104.o
 nvkm-y += nvkm/subdev/ibus/gk20a.o
diff --git a/drm/nouveau/nvkm/subdev/ibus/gf100.c 
b/drm/nouveau/nvkm/subdev/ibus/gf100.c
index 37a0496..382720f 100644
--- a/drm/nouveau/nvkm/subdev/ibus/gf100.c
+++ b/drm/nouveau/nvkm/subdev/ibus/gf100.c
@@ -21,7 +21,7 @@
  *
  * Authors: Ben Skeggs
  */
-#include 
+#include "priv.h"
 
 static void
 gf100_ibus_intr_hub(struct nvkm_subdev *ibus, int i)
@@ -56,7 +56,7 @@ gf100_ibus_intr_gpc(struct nvkm_subdev *ibus, int i)
nvkm_mask(device, 0x128128 + (i * 0x0400), 0x0200, 0x);
 }
 
-static void
+void
 gf100_ibus_intr(struct nvkm_subdev *ibus)
 {
struct nvkm_device *device = ibus->device;
@@ -92,8 +92,21 @@ gf100_ibus_intr(struct nvkm_subdev *ibus)
}
 }
 
+static int
+gf100_ibus_init(struct nvkm_subdev *ibus)
+{
+   struct nvkm_device *device = ibus->device;
+   nvkm_mask(device, 0x122310, 0x0003, 0x0800);
+   nvkm_mask(device, 0x12232c, 0x0003, 0x00100064);
+   nvkm_mask(device, 0x122330, 0x0003, 0x00100064);
+   nvkm_mask(device, 0x122334, 0x0003, 0x00100064);
+   nvkm_mask(device, 0x122348, 0x0003, 0x0100);
+   return 0;
+}
+
 static const struct nvkm_subdev_func
 gf100_ibus = {
+   .init = gf100_ibus_init,
.intr = gf100_ibus_intr,
 };
 
diff --git a/drm/nouveau/nvkm/subdev/ibus/gf117.c 
b/drm/nouveau/nvkm/subdev/ibus/gf117.c
new file mode 100644
index 000..f69f263
--- /dev/null
+++ b/drm/nouveau/nvkm/subdev/ibus/gf117.c
@@ -0,0 +1,51 @@
+/*
+ * Copyright 2015 Samuel Pitosiet
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAI

[Nouveau] [PATCH v2] ibus/gf100: increase wait timeout to avoid read faults

2015-09-23 Thread Samuel Pitoiset
Increase clock timeout of some unknown engines in order to avoid failure
at high gpcclk rate.

This fixes IBUS read faults on my GF119 when reclocking is manually
enabled. Note that memory reclocking is completely broken and NvMemExec
has to be disabled to allow core clock reclocking only.

Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
V2: increase mask for the gf100 case

 drm/nouveau/include/nvkm/subdev/ibus.h |  1 +
 drm/nouveau/nvkm/engine/device/base.c  |  4 +--
 drm/nouveau/nvkm/subdev/ibus/Kbuild|  1 +
 drm/nouveau/nvkm/subdev/ibus/gf100.c   | 17 ++--
 drm/nouveau/nvkm/subdev/ibus/gf117.c   | 51 ++
 drm/nouveau/nvkm/subdev/ibus/priv.h|  7 +
 6 files changed, 77 insertions(+), 4 deletions(-)
 create mode 100644 drm/nouveau/nvkm/subdev/ibus/gf117.c
 create mode 100644 drm/nouveau/nvkm/subdev/ibus/priv.h

diff --git a/drm/nouveau/include/nvkm/subdev/ibus.h 
b/drm/nouveau/include/nvkm/subdev/ibus.h
index 9d512cd..c4dcd26 100644
--- a/drm/nouveau/include/nvkm/subdev/ibus.h
+++ b/drm/nouveau/include/nvkm/subdev/ibus.h
@@ -3,6 +3,7 @@
 #include 
 
 int gf100_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
+int gf117_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
 int gk104_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
 int gk20a_ibus_new(struct nvkm_device *, int, struct nvkm_subdev **);
 #endif
diff --git a/drm/nouveau/nvkm/engine/device/base.c 
b/drm/nouveau/nvkm/engine/device/base.c
index 952a508..83383bc 100644
--- a/drm/nouveau/nvkm/engine/device/base.c
+++ b/drm/nouveau/nvkm/engine/device/base.c
@@ -1595,7 +1595,7 @@ nvd7_chipset = {
.fuse = gf100_fuse_new,
.gpio = gf119_gpio_new,
.i2c = gf117_i2c_new,
-   .ibus = gf100_ibus_new,
+   .ibus = gf117_ibus_new,
.imem = nv50_instmem_new,
.ltc = gf100_ltc_new,
.mc = gf100_mc_new,
@@ -1628,7 +1628,7 @@ nvd9_chipset = {
.fuse = gf100_fuse_new,
.gpio = gf119_gpio_new,
.i2c = gf119_i2c_new,
-   .ibus = gf100_ibus_new,
+   .ibus = gf117_ibus_new,
.imem = nv50_instmem_new,
.ltc = gf100_ltc_new,
.mc = gf100_mc_new,
diff --git a/drm/nouveau/nvkm/subdev/ibus/Kbuild 
b/drm/nouveau/nvkm/subdev/ibus/Kbuild
index a0b12d2..de888fa 100644
--- a/drm/nouveau/nvkm/subdev/ibus/Kbuild
+++ b/drm/nouveau/nvkm/subdev/ibus/Kbuild
@@ -1,3 +1,4 @@
 nvkm-y += nvkm/subdev/ibus/gf100.o
+nvkm-y += nvkm/subdev/ibus/gf117.o
 nvkm-y += nvkm/subdev/ibus/gk104.o
 nvkm-y += nvkm/subdev/ibus/gk20a.o
diff --git a/drm/nouveau/nvkm/subdev/ibus/gf100.c 
b/drm/nouveau/nvkm/subdev/ibus/gf100.c
index 37a0496..6c61d54 100644
--- a/drm/nouveau/nvkm/subdev/ibus/gf100.c
+++ b/drm/nouveau/nvkm/subdev/ibus/gf100.c
@@ -21,7 +21,7 @@
  *
  * Authors: Ben Skeggs
  */
-#include 
+#include "priv.h"
 
 static void
 gf100_ibus_intr_hub(struct nvkm_subdev *ibus, int i)
@@ -56,7 +56,7 @@ gf100_ibus_intr_gpc(struct nvkm_subdev *ibus, int i)
nvkm_mask(device, 0x128128 + (i * 0x0400), 0x0200, 0x);
 }
 
-static void
+void
 gf100_ibus_intr(struct nvkm_subdev *ibus)
 {
struct nvkm_device *device = ibus->device;
@@ -92,8 +92,21 @@ gf100_ibus_intr(struct nvkm_subdev *ibus)
}
 }
 
+static int
+gf100_ibus_init(struct nvkm_subdev *ibus)
+{
+   struct nvkm_device *device = ibus->device;
+   nvkm_mask(device, 0x122310, 0x0003, 0x0800);
+   nvkm_mask(device, 0x12232c, 0x0073, 0x00100064);
+   nvkm_mask(device, 0x122330, 0x0073, 0x00100064);
+   nvkm_mask(device, 0x122334, 0x0073, 0x00100064);
+   nvkm_mask(device, 0x122348, 0x0003, 0x0100);
+   return 0;
+}
+
 static const struct nvkm_subdev_func
 gf100_ibus = {
+   .init = gf100_ibus_init,
.intr = gf100_ibus_intr,
 };
 
diff --git a/drm/nouveau/nvkm/subdev/ibus/gf117.c 
b/drm/nouveau/nvkm/subdev/ibus/gf117.c
new file mode 100644
index 000..f69f263
--- /dev/null
+++ b/drm/nouveau/nvkm/subdev/ibus/gf117.c
@@ -0,0 +1,51 @@
+/*
+ * Copyright 2015 Samuel Pitosiet
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR 

[Nouveau] [PATCH] core: remove unused variables detected by Clang

2015-08-26 Thread Samuel Pitoiset
These variables have been left since the recent big merge.

Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
 drm/nouveau/nvkm/engine/device/base.c | 4 
 drm/nouveau/nvkm/engine/dma/base.c| 5 -
 drm/nouveau/nvkm/engine/sw/base.c | 5 -
 3 files changed, 14 deletions(-)

diff --git a/drm/nouveau/nvkm/engine/device/base.c 
b/drm/nouveau/nvkm/engine/device/base.c
index 952a508..a32ac99 100644
--- a/drm/nouveau/nvkm/engine/device/base.c
+++ b/drm/nouveau/nvkm/engine/device/base.c
@@ -2291,10 +2291,6 @@ nvkm_device_del(struct nvkm_device **pdevice)
}
 }
 
-static const struct nvkm_engine_func
-nvkm_device_func = {
-};
-
 int
 nvkm_device_ctor(const struct nvkm_device_func *func,
 const struct nvkm_device_quirk *quirk,
diff --git a/drm/nouveau/nvkm/engine/dma/base.c 
b/drm/nouveau/nvkm/engine/dma/base.c
index c1957ce..9769fc0 100644
--- a/drm/nouveau/nvkm/engine/dma/base.c
+++ b/drm/nouveau/nvkm/engine/dma/base.c
@@ -97,11 +97,6 @@ nvkm_dma_oclass_fifo_new(const struct nvkm_oclass *oclass, 
void *data, u32 size,
 }
 
 static const struct nvkm_sclass
-nvkm_dma_oclass_fifo = {
-   .ctor = nvkm_dma_oclass_fifo_new,
-};
-
-static const struct nvkm_sclass
 nvkm_dma_sclass[] = {
{ 0, 0, NV_DMA_FROM_MEMORY, NULL, nvkm_dma_oclass_fifo_new },
{ 0, 0, NV_DMA_TO_MEMORY, NULL, nvkm_dma_oclass_fifo_new },
diff --git a/drm/nouveau/nvkm/engine/sw/base.c 
b/drm/nouveau/nvkm/engine/sw/base.c
index d46f229..53c1f7e 100644
--- a/drm/nouveau/nvkm/engine/sw/base.c
+++ b/drm/nouveau/nvkm/engine/sw/base.c
@@ -55,11 +55,6 @@ nvkm_sw_oclass_new(const struct nvkm_oclass *oclass, void 
*data, u32 size,
return sclass->ctor(chan, oclass, data, size, pobject);
 }
 
-static const struct nvkm_sclass
-nvkm_sw_oclass = {
-   .ctor = nvkm_sw_oclass_new,
-};
-
 static int
 nvkm_sw_oclass_get(struct nvkm_oclass *oclass, int index)
 {
-- 
2.5.0



Re: [Nouveau] [Mesa-dev] [PATCH] nv50: avoid using inline vertex data submit when gl_VertexID is used

2015-08-24 Thread Samuel Pitoiset



On 08/24/2015 10:02 PM, Ilia Mirkin wrote:

Edge flag stuff is annoying. Pretty sure only blender uses it. shade
model = flat should get fixed on nv50 before edge flags, since blender
uses that too, and it produces much worse visual artifacts.


No rush for this one though.



I'm having second thoughts about this patch. I think I'm going to go
back to my previous approach of just calling
nv50_vertex_arrays_validate when vbo_fifo && vertexid. I suspect that
vertexid usage with small draws from client buffers is next to
nonexistent, no need to re-emit this stuff so often.


Good, I'd be happy to have a look at this second approach.



On Mon, Aug 24, 2015 at 4:07 PM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

This fix is simpler than I expected. What about the edge flag stuff now?
:)


On 08/24/2015 05:51 PM, Ilia Mirkin wrote:

The hardware only generates vertexid when vertices come from a VBO. This
fixes:

vertexid-drawelements
vertexid-drawarrays

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Cc: 11.0 <mesa-sta...@lists.freedesktop.org>
---
   src/gallium/drivers/nouveau/nv50/nv50_program.c| 1 +
   src/gallium/drivers/nouveau/nv50/nv50_program.h| 1 +
   src/gallium/drivers/nouveau/nv50/nv50_state_validate.c | 3 ++-
   src/gallium/drivers/nouveau/nv50/nv50_vbo.c| 8 
   4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 02dc367..eff4477 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -66,6 +66,7 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info
*info)
 case TGSI_SEMANTIC_VERTEXID:
prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID;
prog->vp.attrs[2] |=
NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID_DRAW_ARRAYS_ADD_START;
+ prog->vp.vertexid = 1;
continue;
 default:
break;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h
b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index 5d3ff56..f4e8e94 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -76,6 +76,7 @@ struct nv50_program {
 ubyte psiz;/* output slot of point size */
 ubyte bfc[2];  /* indices into varying for FFC (FP) or BFC
(VP) */
 ubyte edgeflag;
+  ubyte vertexid;
 ubyte clpd[2]; /* output slot of clip distance[i]'s 1st
component */
 ubyte clpd_nr;
  } vp;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
index b304a17..66dcf43 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
@@ -503,7 +503,8 @@ static struct state_validate {
   { nv50_validate_samplers,  NV50_NEW_SAMPLERS },
   { nv50_stream_output_validate, NV50_NEW_STRMOUT |
  NV50_NEW_VERTPROG | NV50_NEW_GMTYPROG
},
-{ nv50_vertex_arrays_validate, NV50_NEW_VERTEX | NV50_NEW_ARRAYS },
+{ nv50_vertex_arrays_validate, NV50_NEW_VERTEX | NV50_NEW_ARRAYS |
+   NV50_NEW_VERTPROG },
   { nv50_validate_min_samples,   NV50_NEW_MIN_SAMPLES },
   };
   #define validate_list_len (sizeof(validate_list) /
sizeof(validate_list[0]))
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
index 600b973..fb4305f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
@@ -301,6 +301,14 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50)
    unsigned i;
    const unsigned n = MAX2(vertex->num_elements, nv50->state.num_vtxelts);

+   /* A vertexid is not generated for inline data uploads. Have to use a
+    * VBO. This check must come after the vertprog has been validated,
+    * otherwise vertexid may be unset.
+    */
+   assert(nv50->vertprog->translated);
+   if (nv50->vertprog->vp.vertexid)
+      nv50->vbo_push_hint = 0;
+
    if (unlikely(vertex->need_conversion))
       nv50->vbo_fifo = ~0;
    else




___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [Mesa-dev] [PATCH] nv50: avoid using inline vertex data submit when gl_VertexID is used

2015-08-24 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset samuel.pitoi...@gmail.com

This fix is simpler than I expected. What about the edge flag stuff 
now? :)


On 08/24/2015 05:51 PM, Ilia Mirkin wrote:

The hardware only generates vertexid when vertices come from a VBO. This
fixes:

   vertexid-drawelements
   vertexid-drawarrays

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
Cc: 11.0 mesa-sta...@lists.freedesktop.org
---
  src/gallium/drivers/nouveau/nv50/nv50_program.c        | 1 +
  src/gallium/drivers/nouveau/nv50/nv50_program.h        | 1 +
  src/gallium/drivers/nouveau/nv50/nv50_state_validate.c | 3 ++-
  src/gallium/drivers/nouveau/nv50/nv50_vbo.c            | 8 ++++++++
  4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 02dc367..eff4477 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -66,6 +66,7 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
       case TGSI_SEMANTIC_VERTEXID:
          prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID;
          prog->vp.attrs[2] |=
             NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID_DRAW_ARRAYS_ADD_START;
+         prog->vp.vertexid = 1;
          continue;
       default:
          break;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h 
b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index 5d3ff56..f4e8e94 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -76,6 +76,7 @@ struct nv50_program {
ubyte psiz;/* output slot of point size */
ubyte bfc[2];  /* indices into varying for FFC (FP) or BFC (VP) */
ubyte edgeflag;
+  ubyte vertexid;
ubyte clpd[2]; /* output slot of clip distance[i]'s 1st component */
ubyte clpd_nr;
 } vp;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c 
b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
index b304a17..66dcf43 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
@@ -503,7 +503,8 @@ static struct state_validate {
  { nv50_validate_samplers,  NV50_NEW_SAMPLERS },
  { nv50_stream_output_validate, NV50_NEW_STRMOUT |
 NV50_NEW_VERTPROG | NV50_NEW_GMTYPROG },
-{ nv50_vertex_arrays_validate, NV50_NEW_VERTEX | NV50_NEW_ARRAYS },
+{ nv50_vertex_arrays_validate, NV50_NEW_VERTEX | NV50_NEW_ARRAYS |
+   NV50_NEW_VERTPROG },
  { nv50_validate_min_samples,   NV50_NEW_MIN_SAMPLES },
  };
  #define validate_list_len (sizeof(validate_list) / sizeof(validate_list[0]))
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c 
b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
index 600b973..fb4305f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
@@ -301,6 +301,14 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50)
    unsigned i;
    const unsigned n = MAX2(vertex->num_elements, nv50->state.num_vtxelts);
 
+   /* A vertexid is not generated for inline data uploads. Have to use a
+    * VBO. This check must come after the vertprog has been validated,
+    * otherwise vertexid may be unset.
+    */
+   assert(nv50->vertprog->translated);
+   if (nv50->vertprog->vp.vertexid)
+      nv50->vbo_push_hint = 0;
+
    if (unlikely(vertex->need_conversion))
       nv50->vbo_fifo = ~0;
    else
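
A reduced sketch of the check the patch adds, for readers without the Mesa tree at hand. The struct and function names below are hypothetical stand-ins for `struct nv50_context`, `struct nv50_program` and `nv50_vertex_arrays_validate()`; only the decision itself (clear the push hint once the translated vertex program is known to read the vertex ID) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced versions of the driver state used in the patch. */
struct vertprog_sketch {
	bool translated;      /* program has been compiled/validated */
	bool uses_vertexid;   /* stands in for vp.vertexid */
};

struct context_sketch {
	const struct vertprog_sketch *vertprog;
	unsigned int vbo_push_hint;
};

/* Drop the inline-upload hint when the vertex program reads the vertex
 * ID, so that vertices are fetched from a VBO instead. */
void validate_vertex_arrays(struct context_sketch *ctx)
{
	/* The flag is only meaningful after the program was translated. */
	assert(ctx && ctx->vertprog && ctx->vertprog->translated);
	if (ctx->vertprog->uses_vertexid)
		ctx->vbo_push_hint = 0;
}
```

This ordering requirement is also why the patch adds NV50_NEW_VERTPROG to the dirty mask of the vertex array validation entry.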




[Nouveau] [PATCH 1/4] pm: allow zeroed signals to enable sources

2015-08-04 Thread Samuel Pitoiset
Hardware signals with index 0x00 are defined for some domains, and they
must be allowed to enable sources like any other signal.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
 drm/nouveau/nvkm/engine/pm/base.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drm/nouveau/nvkm/engine/pm/base.c 
b/drm/nouveau/nvkm/engine/pm/base.c
index 94991d6..48c1ce6 100644
--- a/drm/nouveau/nvkm/engine/pm/base.c
+++ b/drm/nouveau/nvkm/engine/pm/base.c
@@ -134,7 +134,7 @@ nvkm_perfsrc_enable(struct nvkm_pm *ppm, struct nvkm_perfctr *ctr)
        u32 mask, value;
        int i, j;
 
-       for (i = 0; i < 4 && ctr->signal[i]; i++) {
+       for (i = 0; i < 4; i++) {
                for (j = 0; j < 8 && ctr->source[i][j]; j++) {
                        sig = nvkm_perfsig_find(ppm, ctr->domain,
                                                ctr->signal[i], &dom);
@@ -170,7 +170,7 @@ nvkm_perfsrc_disable(struct nvkm_pm *ppm, struct nvkm_perfctr *ctr)
        u32 mask;
        int i, j;
 
-       for (i = 0; i < 4 && ctr->signal[i]; i++) {
+       for (i = 0; i < 4; i++) {
                for (j = 0; j < 8 && ctr->source[i][j]; j++) {
                        sig = nvkm_perfsig_find(ppm, ctr->domain,
                                                ctr->signal[i], &dom);
-- 
2.4.6
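
A small standalone sketch of the behaviour change, not the kernel code itself. `struct perfctr` and both helpers are hypothetical stand-ins for `struct nvkm_perfctr` and the outer loop of `nvkm_perfsrc_enable()`; they only count how many signal slots get visited, to show that the old loop condition stops at the first slot whose signal index is 0x00.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the signal part of struct nvkm_perfctr. */
struct perfctr {
	uint8_t signal[4];
};

/* Old loop condition: stops at the first signal with index 0x00. */
int slots_visited_old(const struct perfctr *ctr)
{
	int i, n = 0;

	assert(ctr);
	for (i = 0; i < 4 && ctr->signal[i]; i++)
		n++;
	return n;
}

/* New loop condition: every slot is visited, including index 0x00. */
int slots_visited_new(const struct perfctr *ctr)
{
	int i, n = 0;

	assert(ctr);
	for (i = 0; i < 4; i++)
		n++;
	return n;
}
```

With a counter whose first signal is 0x00, the old condition visits no slot at all, so none of its sources would be enabled.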



[Nouveau] [PATCH 1/2] pm/nv50: fix wrong addr for ZCULL source on G80:GT215

2015-07-26 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
 drm/nouveau/nvkm/engine/pm/nv50.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drm/nouveau/nvkm/engine/pm/nv50.c 
b/drm/nouveau/nvkm/engine/pm/nv50.c
index a778bc7..14d474b 100644
--- a/drm/nouveau/nvkm/engine/pm/nv50.c
+++ b/drm/nouveau/nvkm/engine/pm/nv50.c
@@ -34,7 +34,7 @@ nv50_prop_sources[] = {
 
 const struct nvkm_specsrc
 nv50_zcull_sources[] = {
-   { 0x4002ca4, (const struct nvkm_specmux[]) {
+   { 0x402ca4, (const struct nvkm_specmux[]) {
                { 0x7fff, 0, "unk0" },
                {}
        }, "pgraph_zcull_pm_unka4" },
-- 
2.4.6



[Nouveau] [PATCH 2/2] pm/nv50: TPC[0x3] must be used for PGRAPH muxs on G80

2015-07-26 Thread Samuel Pitoiset
I thought that using TPC[0x0], as on G84:GT215, was sufficient on G80,
but that is not the case. According to NVIDIA PerfKit on Windows, the
PGRAPH-related muxes have to be configured on TPC[0x3] for this chipset.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
 drm/nouveau/nvkm/engine/pm/g84.c  | 25 +
 drm/nouveau/nvkm/engine/pm/nv50.c | 22 +++---
 drm/nouveau/nvkm/engine/pm/priv.h |  1 -
 3 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/drm/nouveau/nvkm/engine/pm/g84.c b/drm/nouveau/nvkm/engine/pm/g84.c
index dda539c..815bb0d 100644
--- a/drm/nouveau/nvkm/engine/pm/g84.c
+++ b/drm/nouveau/nvkm/engine/pm/g84.c
@@ -33,6 +33,15 @@ g84_vfetch_sources[] = {
 };
 
 static const struct nvkm_specsrc
+g84_prop_sources[] = {
+       { 0x408e50, (const struct nvkm_specmux[]) {
+                       { 0x1f, 0, "sel", true },
+                       {}
+               }, "pgraph_tpc0_prop_pm_mux" },
+       {}
+};
+
+static const struct nvkm_specsrc
 g84_crop_sources[] = {
        { 0x407008, (const struct nvkm_specmux[]) {
                        { 0xf, 0, "sel0", true },
@@ -109,14 +118,14 @@ g84_pm[] = {
                        { 0x31, "pc02_crop_01", g84_crop_sources },
                        { 0x32, "pc02_crop_02", g84_crop_sources },
                        { 0x33, "pc02_crop_03", g84_crop_sources },
-                       { 0x00, "pc02_prop_00", nv50_prop_sources },
-                       { 0x01, "pc02_prop_01", nv50_prop_sources },
-                       { 0x02, "pc02_prop_02", nv50_prop_sources },
-                       { 0x03, "pc02_prop_03", nv50_prop_sources },
-                       { 0x04, "pc02_prop_04", nv50_prop_sources },
-                       { 0x05, "pc02_prop_05", nv50_prop_sources },
-                       { 0x06, "pc02_prop_06", nv50_prop_sources },
-                       { 0x07, "pc02_prop_07", nv50_prop_sources },
+                       { 0x00, "pc02_prop_00", g84_prop_sources },
+                       { 0x01, "pc02_prop_01", g84_prop_sources },
+                       { 0x02, "pc02_prop_02", g84_prop_sources },
+                       { 0x03, "pc02_prop_03", g84_prop_sources },
+                       { 0x04, "pc02_prop_04", g84_prop_sources },
+                       { 0x05, "pc02_prop_05", g84_prop_sources },
+                       { 0x06, "pc02_prop_06", g84_prop_sources },
+                       { 0x07, "pc02_prop_07", g84_prop_sources },
                        { 0x48, "pc02_tex_00", g84_tex_sources },
                        { 0x49, "pc02_tex_01", g84_tex_sources },
                        { 0x4a, "pc02_tex_02", g84_tex_sources },
diff --git a/drm/nouveau/nvkm/engine/pm/nv50.c 
b/drm/nouveau/nvkm/engine/pm/nv50.c
index 14d474b..dee73af 100644
--- a/drm/nouveau/nvkm/engine/pm/nv50.c
+++ b/drm/nouveau/nvkm/engine/pm/nv50.c
@@ -24,15 +24,6 @@
 #include "nv40.h"
 
 const struct nvkm_specsrc
-nv50_prop_sources[] = {
-       { 0x408e50, (const struct nvkm_specmux[]) {
-                       { 0x1f, 0, "sel", true },
-                       {}
-               }, "pgraph_tpc0_prop_pm_mux" },
-       {}
-};
-
-const struct nvkm_specsrc
 nv50_zcull_sources[] = {
        { 0x402ca4, (const struct nvkm_specmux[]) {
                        { 0x7fff, 0, "unk0" },
@@ -52,6 +43,15 @@ nv50_zrop_sources[] = {
 };
 
 static const struct nvkm_specsrc
+nv50_prop_sources[] = {
+       { 0x40be50, (const struct nvkm_specmux[]) {
+                       { 0x1f, 0, "sel", true },
+                       {}
+               }, "pgraph_tpc3_prop_pm_mux" },
+       {}
+};
+
+static const struct nvkm_specsrc
 nv50_crop_sources[] = {
        { 0x407008, (const struct nvkm_specmux[]) {
                        { 0x7, 0, "sel0", true },
@@ -63,10 +63,10 @@ nv50_crop_sources[] = {
 
 static const struct nvkm_specsrc
 nv50_tex_sources[] = {
-       { 0x408808, (const struct nvkm_specmux[]) {
+       { 0x40b808, (const struct nvkm_specmux[]) {
                        { 0x3fff, 0, "unk0" },
                        {}
-               }, "pgraph_tpc0_tex_unk08" },
+               }, "pgraph_tpc3_tex_unk08" },
        {}
 };
 };
 
diff --git a/drm/nouveau/nvkm/engine/pm/priv.h 
b/drm/nouveau/nvkm/engine/pm/priv.h
index 5bcc739..69b7278 100644
--- a/drm/nouveau/nvkm/engine/pm/priv.h
+++ b/drm/nouveau/nvkm/engine/pm/priv.h
@@ -44,7 +44,6 @@ struct nvkm_perfsrc {
bool enable;
 };
 
-extern const struct nvkm_specsrc nv50_prop_sources[];
 extern const struct nvkm_specsrc nv50_zcull_sources[];
 extern const struct nvkm_specsrc nv50_zrop_sources[];
 extern const struct nvkm_specsrc g84_vfetch_sources[];
-- 
2.4.6
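
Both register pairs touched by this patch differ by 0x3000 (0x408e50 vs 0x40be50 for the PROP mux, 0x408808 vs 0x40b808 for the TEX source), which is consistent with a 0x1000 stride per TPC. The sketch below only illustrates that arithmetic; `TPC_STRIDE` and `tpc_reg()` are hypothetical names, and the stride is inferred from these two address pairs alone, not taken from hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-TPC register stride, inferred only from the addresses in
 * the patch above (TPC[0x0] vs TPC[0x3] differ by 0x3000). */
#define TPC_STRIDE 0x1000u

/* Hypothetical helper: address of a TPC[0x0] register on another TPC. */
uint32_t tpc_reg(uint32_t tpc0_addr, unsigned int tpc)
{
	assert(tpc < 16); /* arbitrary sanity bound for this sketch */
	return tpc0_addr + tpc * TPC_STRIDE;
}
```

For example, tpc_reg(0x408e50, 3) gives the 0x40be50 address that the patch uses for G80.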



Re: [Nouveau] [Mesa-dev] [PATCH] nvc0: bind a fake tess control program when there isn't one available

2015-07-26 Thread Samuel Pitoiset



On 07/26/2015 06:56 AM, Ilia Mirkin wrote:

Apparently this is necessary in order for tess factors to work in a tess
eval program without a tess control program bound. Probably because it
uses the fake program's shader header to work out the number of patch
constants.

Fixes vs-tes-tessinner-tessouter-inputs

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
  src/gallium/drivers/nouveau/nvc0/nvc0_context.c  |  5 +
  src/gallium/drivers/nouveau/nvc0/nvc0_context.h  |  3 +++
  src/gallium/drivers/nouveau/nvc0/nvc0_program.c  | 17 +
  src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c |  6 +-
  4 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
index 84f8db6..46970db 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
@@ -132,6 +132,9 @@ nvc0_context_unreference_resources(struct nvc0_context *nvc0)
        pipe_resource_reference(res, NULL);
     }
     util_dynarray_fini(&nvc0->global_residents);
+
+   if (nvc0->tcp_empty)
+      nvc0->base.pipe.delete_tcs_state(&nvc0->base.pipe, nvc0->tcp_empty);
  }
  
  static void
@@ -326,6 +329,8 @@ nvc0_create(struct pipe_screen *pscreen, void *priv)
  
     /* shader builtin library is per-screen, but we need a context for m2mf */
     nvc0_program_library_upload(nvc0);
+   nvc0_program_init_tcp_empty(nvc0);
+   nvc0->dirty |= NVC0_NEW_TCTLPROG;
  
     /* add permanently resident buffers to bufctxts */
  
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index f449942..df1a891 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -128,6 +128,8 @@ struct nvc0_context {
     struct nvc0_program *fragprog;
     struct nvc0_program *compprog;
  
+   struct nvc0_program *tcp_empty;
+
     struct nvc0_constbuf constbuf[6][NVC0_MAX_PIPE_CONSTBUFS];
     uint16_t constbuf_dirty[6];
     uint16_t constbuf_valid[6];
@@ -227,6 +229,7 @@ void nvc0_program_destroy(struct nvc0_context *, struct nvc0_program *);
  void nvc0_program_library_upload(struct nvc0_context *);
  uint32_t nvc0_program_symbol_offset(const struct nvc0_program *,
                                      uint32_t label);
+void nvc0_program_init_tcp_empty(struct nvc0_context *);
  
  /* nvc0_query.c */
  void nvc0_init_query_functions(struct nvc0_context *);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index 4941831..e9975ce 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -22,6 +22,8 @@
  
  #include "pipe/p_defines.h"
  
+#include "tgsi/tgsi_ureg.h"
+
  #include "nvc0/nvc0_context.h"
  
  #include "codegen/nv50_ir_driver.h"
@@ -803,3 +805,18 @@ nvc0_program_symbol_offset(const struct nvc0_program *prog, uint32_t label)
           return prog->code_base + base + syms[i].offset;
     return prog->code_base; /* no symbols or symbol not found */
  }
+
+void
+nvc0_program_init_tcp_empty(struct nvc0_context *nvc0)
+{
+   struct ureg_program *ureg;
+
+   ureg = ureg_create(TGSI_PROCESSOR_TESS_CTRL);
+   if (!ureg)
+      return;
+
+   ureg_property(ureg, TGSI_PROPERTY_TCS_VERTICES_OUT, 1);
+   ureg_END(ureg);
+
+   nvc0->tcp_empty = ureg_create_shader_and_destroy(ureg, &nvc0->base.pipe);
+}
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c
index 8aa127a..e21515f 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c
@@ -148,8 +148,12 @@ nvc0_tctlprog_validate(struct nvc0_context *nvc0)
        BEGIN_NVC0(push, NVC0_3D(SP_GPR_ALLOC(2)), 1);
        PUSH_DATA (push, tp->num_gprs);
     } else {
-      BEGIN_NVC0(push, NVC0_3D(SP_SELECT(2)), 1);
+      tp = nvc0->tcp_empty;
+      if (!nvc0_program_validate(nvc0, tp))
+         assert(!"unable to validate empty tcp");
+      BEGIN_NVC0(push, NVC0_3D(SP_SELECT(2)), 2);
        PUSH_DATA (push, 0x20);
+      PUSH_DATA (push, tp->code_base);
     }


It would be good to check that tp is not NULL before trying to validate 
the program.
And if the program can't be validated, I don't think we want to push 
tp->code_base, do we?



 nvc0_program_update_context_state(nvc0, tp, 1);
  }
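
One possible shape for the guard suggested in the review comment, as a standalone sketch rather than a proposed patch. `struct program`, `struct pushbuf` and `select_empty_tcp()` are hypothetical stand-ins for the driver types and for the else branch of `nvc0_tctlprog_validate()`; the `valid` flag stands in for the result of `nvc0_program_validate()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the driver types. */
struct program {
	bool valid;          /* what nvc0_program_validate() would report */
	uint32_t code_base;
};

struct pushbuf {
	uint32_t data[4];
	unsigned int count;
};

/* Emit the SP_SELECT data for the tess control stage. The code address
 * is only pushed when an empty program exists and validated correctly;
 * otherwise only the 0x20 word is emitted, as before the patch. */
bool select_empty_tcp(struct pushbuf *push, const struct program *tp)
{
	assert(push);
	push->count = 0;
	push->data[push->count++] = 0x20;
	if (!tp || !tp->valid)
		return false;
	push->data[push->count++] = tp->code_base;
	return true;
}
```

In the real function the method count passed to BEGIN_NVC0() would have to follow the same decision (2 words on success, 1 otherwise).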




Re: [Nouveau] [PATCH] nv50: adjust min/max lod by base level on G80

2015-07-20 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset samuel.pitoi...@gmail.com

On 07/20/2015 09:26 AM, Ilia Mirkin wrote:

Make the assumption that there's a 1:1 TIC <-> TSC connection, and
increase min/max lod by the relevant texture's base level. Also if
there's no mipfilter, we have to enable it while forcing min/max lod to
the base level.

This fixes many, but not all, tex-miplevel-selection tests on G80.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---

All the textureLod tests fail. If I also adjust the lod_bias by the
first_level, then the regular tests start failing.

Not sure what the right move is here... need to trace the blob to see
what it does here.

  src/gallium/drivers/nouveau/nv50/nv50_state.c  |  1 +
  .../drivers/nouveau/nv50/nv50_stateobj_tex.h   |  1 +
  src/gallium/drivers/nouveau/nv50/nv50_tex.c| 39 ++
  3 files changed, 41 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c 
b/src/gallium/drivers/nouveau/nv50/nv50_state.c
index d4d41af..98c4c3a 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c
@@ -464,6 +464,7 @@ nv50_sampler_state_create(struct pipe_context *pipe,
    struct nv50_tsc_entry *so = MALLOC_STRUCT(nv50_tsc_entry);
    float f[2];
 
+   so->pipe = *cso;
    so->id = -1;
 
    so->tsc[0] = (0x00026000 |

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_stateobj_tex.h 
b/src/gallium/drivers/nouveau/nv50/nv50_stateobj_tex.h
index 99548cb..9a19166 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_stateobj_tex.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_stateobj_tex.h
@@ -5,6 +5,7 @@
  #include "pipe/p_state.h"
  
  struct nv50_tsc_entry {
+   struct pipe_sampler_state pipe;
     int id;
     uint32_t tsc[8];
  };
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_tex.c 
b/src/gallium/drivers/nouveau/nv50/nv50_tex.c
index 17ae27f..d79c813 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_tex.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_tex.c
@@ -344,6 +344,45 @@ nv50_validate_tsc(struct nv50_context *nv50, int s)
          PUSH_DATA (push, (i << 4) | 0);
          continue;
       }
+      if (nv50->base.screen->class_3d == NV50_3D_CLASS) {
+         struct nv50_tic_entry *tic = nv50_tic_entry(nv50->textures[s][i]);
+
+         /* We must make sure that the MIN_LOD is at least set to the first
+          * level for the G80
+          */
+         bool need_update = false;
+         float min_lod = CLAMP(
+               tic->pipe.u.tex.first_level + tsc->pipe.min_lod, 0.0f, 15.0f);
+         float max_lod = CLAMP(
+               tic->pipe.u.tex.first_level + tsc->pipe.max_lod, 0.0f, 15.0f);
+
+         if (tsc->pipe.min_mip_filter == PIPE_TEX_MIPFILTER_NONE) {
+            uint32_t old_tsc1 = tsc->tsc[1];
+            tsc->tsc[1] &= ~NV50_TSC_1_MIPF__MASK;
+            if (tic->pipe.u.tex.first_level) {
+               tsc->tsc[1] |= NV50_TSC_1_MIPF_NEAREST;
+               max_lod = min_lod = tic->pipe.u.tex.first_level;
+            }
+            if (tsc->tsc[1] != old_tsc1)
+               need_update = true;
+         }
+
+         uint32_t new_tsc2 =
+            (((int)(max_lod * 256.0f) & 0xfff) << 12) |
+            ((int)(min_lod * 256.0f) & 0xfff);
+         if ((tsc->tsc[2] & 0xffffff) != new_tsc2) {
+            tsc->tsc[2] &= ~0xffffffu;
+            tsc->tsc[2] |= new_tsc2;
+            need_update = true;
+         }
+
+         if (need_update && tsc->id >= 0) {
+            nv50_sifc_linear_u8(&nv50->base, nv50->screen->txc,
+                                65536 + tsc->id * 32,
+                                NOUVEAU_BO_VRAM, 32, tsc->tsc);
+            need_flush = TRUE;
+         }
+      }
       if (tsc->id < 0) {
          tsc->id = nv50_screen_tsc_alloc(nv50->screen, tsc);
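
The LOD arithmetic in the hunk above can be tried in isolation. This is a minimal sketch, assuming only what the patch shows: both LOD values are offset by the texture's first level, clamped to [0, 15], converted to fixed point with 8 fractional bits, and packed as two 12-bit fields with max LOD in the upper one. `clampf()` and `pack_lod()` are hypothetical helper names, not Mesa functions.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp helper, equivalent in spirit to Mesa's CLAMP() macro. */
float clampf(float x, float lo, float hi)
{
	return x < lo ? lo : (x > hi ? hi : x);
}

/* Pack min/max LOD, offset by the texture's first level, into two
 * 12-bit fields with 8 fractional bits each (max LOD in bits 12..23). */
uint32_t pack_lod(unsigned int first_level, float min_lod, float max_lod)
{
	float lo = clampf((float)first_level + min_lod, 0.0f, 15.0f);
	float hi = clampf((float)first_level + max_lod, 0.0f, 15.0f);

	assert(min_lod <= max_lod);
	return ((((uint32_t)(hi * 256.0f)) & 0xfff) << 12) |
	       (((uint32_t)(lo * 256.0f)) & 0xfff);
}
```

With a base level of 2, a sampler range of [0.0, 3.5] becomes [2.0, 5.5], i.e. 0x200 and 0x580 in fixed point.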
  




Re: [Nouveau] [Mesa-dev] [PATCH] nvc0: fix geometry program revalidation of clipping params

2015-07-13 Thread Samuel Pitoiset
What piglit test does this fix?

On Sat, Jul 11, 2015 at 7:13 PM, Ilia Mirkin imir...@alum.mit.edu wrote:

 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 Cc: mesa-sta...@lists.freedesktop.org
 ---

 Even though in practice a geometry program will never be using UCP's,
 we still were revalidating (aka recompiling) the program when more
 clip planes became enabled (which also are used for regular clip
 distances).

 This seems like it should have led to massive fail, but I guess you
 don't change the number of clip planes when using geometry shaders.
 But I'm going to put this through a full piglit run just in case
 there's something I'm missing.

  src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
 b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
 index 785e52e..11f2b10 100644
 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
 +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
 @@ -339,7 +339,7 @@ nvc0_check_program_ucps(struct nvc0_context *nvc0,
nvc0_vertprog_validate(nvc0);
 else
     if (likely(vp == nvc0->gmtyprog))
 -  nvc0_vertprog_validate(nvc0);
 +  nvc0_gmtyprog_validate(nvc0);
 else
nvc0_tevlprog_validate(nvc0);
  }
 --
 2.3.6

 ___
 mesa-dev mailing list
 mesa-...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev




-- 
Best regards,
Samuel Pitoiset.


Re: [Nouveau] [RFC PATCH 3/8] nv50: allocate and map a notifier buffer object for PM

2015-06-28 Thread Samuel Pitoiset



On 06/26/2015 01:02 AM, Ilia Mirkin wrote:

On Mon, Jun 22, 2015 at 4:53 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:

This notifier buffer object will be used to read back global performance
counters results written by the kernel.

For each domain, we will store the handle of the perfdom object, an
array of 4 counters and the number of cycles. Like the Gallium's HUD,
we keep a list of busy queries in a ring in order to prevent stalls
when reading queries.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
  src/gallium/drivers/nouveau/nv50/nv50_screen.c | 29 ++
  src/gallium/drivers/nouveau/nv50/nv50_screen.h |  6 ++
  2 files changed, 35 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index c985344..3a99cc8 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -368,6 +368,7 @@ nv50_screen_destroy(struct pipe_screen *pscreen)
    nouveau_object_del(&screen->m2mf);
    nouveau_object_del(&screen->sync);
    nouveau_object_del(&screen->sw);
+   nouveau_object_del(&screen->query);

    nouveau_screen_fini(&screen->base);

@@ -699,9 +700,11 @@ nv50_screen_create(struct nouveau_device *dev)
 struct nv50_screen *screen;
 struct pipe_screen *pscreen;
 struct nouveau_object *chan;
+   struct nv04_fifo *fifo;
 uint64_t value;
 uint32_t tesla_class;
 unsigned stack_size;
+   uint32_t length;
 int ret;

 screen = CALLOC_STRUCT(nv50_screen);
@@ -727,6 +730,7 @@ nv50_screen_create(struct nouveau_device *dev)
    screen->base.pushbuf->rsvd_kick = 5;

    chan = screen->base.channel;
+   fifo = chan->data;

    pscreen->destroy = nv50_screen_destroy;
    pscreen->context_create = nv50_create;
@@ -772,6 +776,23 @@ nv50_screen_create(struct nouveau_device *dev)
goto fail;
 }

+   /* Compute size (in bytes) of the notifier buffer object which is used
+* in order to read back global performance counters results written
+* by the kernel. For each domain, we store the handle of the perfdom
+* object, an array of 4 counters and the number of cycles. Like for
+* the Gallium's HUD, we keep a list of busy queries in a ring in order
+* to prevent stalls when reading queries. */
+   length = (1 + (NV50_HW_PM_RING_BUFFER_NUM_DOMAINS * 6) *
+  NV50_HW_PM_RING_BUFFER_MAX_QUERIES) * 4;

This calculation may become apparent to me later, but it certainly
isn't now. What's the *6? You refer to an array of 4 counters...
should that have been 6 counters? Or should this have been a 4?


This refers to the handle of the object, the array of 4 counters and the 
number of cycles.

In other words, for each domain we store: id, ctr0, ctr1, ctr2, ctr3, clk.
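
To make the size computation concrete, here is the same formula as a standalone function. The two ring buffer constants are the ones defined in this patch; `WORDS_PER_DOMAIN` and `pm_notifier_length()` are hypothetical names introduced for the sketch. The patch does not spell out what the one leading word holds, so it is kept here without interpretation.

```c
#include <assert.h>
#include <stdint.h>

#define NV50_HW_PM_RING_BUFFER_NUM_DOMAINS 8
#define NV50_HW_PM_RING_BUFFER_MAX_QUERIES 9

/* Words stored per domain: id, ctr0, ctr1, ctr2, ctr3, clk. */
#define WORDS_PER_DOMAIN 6 /* hypothetical name; the patch hard-codes 6 */

/* Size in bytes of the notifier buffer, following the patch's formula:
 * one leading word plus 6 words per domain per query, 4 bytes each. */
uint32_t pm_notifier_length(uint32_t num_domains, uint32_t max_queries)
{
	assert(num_domains > 0 && max_queries > 0);
	return (1 + num_domains * WORDS_PER_DOMAIN * max_queries) * 4;
}
```

With 8 domains and 9 queries this is (1 + 8 * 6 * 9) * 4 = 1732 bytes.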




+
+   ret = nouveau_object_new(chan, 0xbeef0302, NOUVEAU_NOTIFIER_CLASS,
+                            &(struct nv04_notify){ .length = length },
+                            sizeof(struct nv04_notify), &screen->query);
+   if (ret) {
+       NOUVEAU_ERR("Failed to allocate notifier object for PM: %d\n", ret);
+       goto fail;
+   }
+
    ret = nouveau_object_new(chan, 0xbeef506e, 0x506e,
                             NULL, 0, &screen->sw);
    if (ret) {
@@ -845,6 +866,14 @@ nv50_screen_create(struct nouveau_device *dev)
    nouveau_heap_init(&screen->gp_code_heap, 0, 1 << NV50_CODE_BO_SIZE_LOG2);
    nouveau_heap_init(&screen->fp_code_heap, 0, 1 << NV50_CODE_BO_SIZE_LOG2);

+   ret = nouveau_bo_wrap(screen->base.device, fifo->notify, &screen->notify_bo);
+   if (ret == 0)
+      nouveau_bo_map(screen->notify_bo, 0, screen->base.client);

ret = ...


Good catch, thanks.




+   if (ret) {
+      NOUVEAU_ERR("Failed to map notifier object for PM: %d\n", ret);
+      goto fail;
+   }
+
    nouveau_getparam(dev, NOUVEAU_GETPARAM_GRAPH_UNITS, &value);

    screen->TPs = util_bitcount(value & 0xffff);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
index 69fdfdb..71a5247 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
@@ -59,6 +59,7 @@ struct nv50_screen {
 struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */
 struct nouveau_bo *stack_bo;
 struct nouveau_bo *tls_bo;
+   struct nouveau_bo *notify_bo;

 unsigned TPs;
 unsigned MPsInTP;
@@ -89,6 +90,7 @@ struct nv50_screen {
 } fence;

 struct nouveau_object *sync;
+   struct nouveau_object *query;

 struct nouveau_object *tesla;
 struct nouveau_object *eng2d;
@@ -96,6 +98,10 @@ struct nv50_screen {
 struct nouveau_object *sw;
  };

+/* Parameters of the ring buffer used to read back global PM counters. */
+#define NV50_HW_PM_RING_BUFFER_NUM_DOMAINS 8
+#define NV50_HW_PM_RING_BUFFER_MAX_QUERIES 9 /* HUD_NUM_QUERIES + 1 */
+
  static INLINE struct nv50_screen *
  nv50_screen(struct pipe_screen *screen)
  {
--
2.4.4

Re: [Nouveau] [RFC PATCH 6/8] nv50: add support for compute/graphics global performance counters

2015-06-28 Thread Samuel Pitoiset



On 06/26/2015 01:09 AM, Ilia Mirkin wrote:

What's with the \%'s everywhere?


Maybe "percent" would be better?



On Mon, Jun 22, 2015 at 4:53 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:

This commit adds support for both compute and graphics global
performance counters which have been reverse engineered with
CUPTI (Linux) and PerfKit (Windows).

Currently, only one query type can be monitored at a time because
Gallium's HUD is not a good fit for this. This will be improved later.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
  src/gallium/drivers/nouveau/nv50/nv50_query.c  | 1057 +++-
  src/gallium/drivers/nouveau/nv50/nv50_screen.h |   35 +
  2 files changed, 1087 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c 
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
index 1162110..b9d2914 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
@@ -27,6 +27,8 @@
  #include "nv50/nv50_context.h"
  #include "nv_object.xml.h"

+#include "nouveau_perfmon.h"
+
  #define NV50_QUERY_STATE_READY   0
  #define NV50_QUERY_STATE_ACTIVE  1
  #define NV50_QUERY_STATE_ENDED   2
@@ -51,10 +53,25 @@ struct nv50_query {
 boolean is64bit;
 struct nouveau_mm_allocation *mm;
 struct nouveau_fence *fence;
+   struct nouveau_object *perfdom;
  };

  #define NV50_QUERY_ALLOC_SPACE 256

+#ifdef DEBUG
+static void nv50_hw_pm_dump_perfdom(struct nvif_perfdom_v0 *args);
+#endif
+
+static boolean
+nv50_hw_pm_query_create(struct nv50_context *, struct nv50_query *);
+static void
+nv50_hw_pm_query_destroy(struct nv50_context *, struct nv50_query *);
+static boolean
+nv50_hw_pm_query_begin(struct nv50_context *, struct nv50_query *);
+static void nv50_hw_pm_query_end(struct nv50_context *, struct nv50_query *);
+static boolean nv50_hw_pm_query_result(struct nv50_context *,
+struct nv50_query *, boolean, void *);
+
  static INLINE struct nv50_query *
  nv50_query(struct pipe_query *pipe)
  {
@@ -96,12 +113,18 @@ nv50_query_allocate(struct nv50_context *nv50, struct nv50_query *q, int size)
  static void
  nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq)
  {
+   struct nv50_context *nv50 = nv50_context(pipe);
+   struct nv50_query *q = nv50_query(pq);
+
     if (!pq)
        return;

-   nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0);
-   nouveau_fence_ref(NULL, &nv50_query(pq)->fence);
-   FREE(nv50_query(pq));
+   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST))
+      nv50_hw_pm_query_destroy(nv50, q);
+
+   nv50_query_allocate(nv50, q, 0);
+   nouveau_fence_ref(NULL, &q->fence);
+   FREE(q);
  }

  static struct pipe_query *
@@ -130,6 +153,11 @@ nv50_query_create(struct pipe_context *pipe, unsigned type, unsigned index)
        q->data -= 32 / sizeof(*q->data); /* we advance before query_begin ! */
     }

+   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) {
+      if (!nv50_hw_pm_query_create(nv50, q))
+         return NULL;
+   }
+
     return (struct pipe_query *)q;
  }

@@ -154,6 +182,7 @@ nv50_query_begin(struct pipe_context *pipe, struct pipe_query *pq)
     struct nv50_context *nv50 = nv50_context(pipe);
     struct nouveau_pushbuf *push = nv50->base.pushbuf;
     struct nv50_query *q = nv50_query(pq);
+   boolean ret = TRUE;

     if (!pq)
        return FALSE;
@@ -211,10 +240,13 @@ nv50_query_begin(struct pipe_context *pipe, struct pipe_query *pq)
        nv50_query_get(push, q, 0x10, 0x5002);
        break;
     default:
+      if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) {
+         ret = nv50_hw_pm_query_begin(nv50, q);
+      }
        break;
     }
     q->state = NV50_QUERY_STATE_ACTIVE;
-   return true;
+   return ret;
  }

  static void
@@ -274,7 +306,9 @@ nv50_query_end(struct pipe_context *pipe, struct pipe_query *pq)
        q->state = NV50_QUERY_STATE_READY;
        break;
     default:
-      assert(0);
+      if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) {
+         nv50_hw_pm_query_end(nv50, q);
+      }
        break;
     }

@@ -309,6 +343,10 @@ nv50_query_result(struct pipe_context *pipe, struct pipe_query *pq,
     if (!pq)
        return FALSE;

+   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) {
+      return nv50_hw_pm_query_result(nv50, q, wait, result);
+   }
+
     if (q->state != NV50_QUERY_STATE_READY)
        nv50_query_update(q);

@@ -488,6 +526,1015 @@ nva0_so_target_save_offset(struct pipe_context *pipe,
     nv50_query_end(pipe, targ->pq);
  }

+/* === HARDWARE GLOBAL PERFORMANCE COUNTERS for NV50 === */
+
+struct nv50_hw_pm_source_cfg
+{
+   const char *name;
+   uint64_t value;
+};
+
+struct nv50_hw_pm_signal_cfg
+{
+   const char *name;
+   const struct nv50_hw_pm_source_cfg src[8];
+};
+
+struct nv50_hw_pm_counter_cfg
+{
+   uint16_t logic_op;
+   const struct

Re: [Nouveau] [RFC PATCH 4/8] nv50: configure the ring buffer for reading back PM counters

2015-06-28 Thread Samuel Pitoiset



On 06/26/2015 01:04 AM, Ilia Mirkin wrote:

Yeah, this whole thing has to be guarded by a drm version check,
otherwise it'll end up with errors in dmesg I assume. Perhaps only
allocate screen-query when the drm version matches, and gate things
on that for the rest of the code?


Yes, this sounds good to me.
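
A rough sketch of the gating idea agreed on above: create the PM notifier only when the kernel is new enough, and let every later PM code path test for its presence. All names are hypothetical, and `PM_MIN_DRM_VERSION` is a placeholder value; the thread does not say which DRM version introduces the interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder threshold: the first DRM version exposing the PM
 * interface. The thread does not name an actual version. */
#define PM_MIN_DRM_VERSION 0x01030000u

struct screen_sketch {
	uint32_t drm_version;
	bool has_query;      /* stands in for screen->query != NULL */
};

/* Allocate the PM notifier only when the kernel is new enough. */
void screen_init_pm(struct screen_sketch *screen)
{
	assert(screen);
	screen->has_query = screen->drm_version >= PM_MIN_DRM_VERSION;
}

/* Every later PM code path gates on the object's presence. */
bool screen_can_use_pm(const struct screen_sketch *screen)
{
	assert(screen);
	return screen->has_query;
}
```

Gating on the object rather than re-checking the version everywhere keeps the version comparison in a single place.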



On Mon, Jun 22, 2015 at 4:53 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:

To write data at the right offset, the kernel has to know some
parameters of this ring buffer, like the number of domains and the
maximum number of queries.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
  src/gallium/drivers/nouveau/nv50/nv50_screen.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 3a99cc8..53817c0 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -441,6 +441,13 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)

 BEGIN_NV04(push, SUBC_SW(NV01_SUBCHAN_OBJECT), 1);
    PUSH_DATA (push, screen->sw->handle);
+   BEGIN_NV04(push, SUBC_SW(0x0190), 1);
+   PUSH_DATA (push, screen->query->handle);
+   // XXX: Maybe add a check for DRM version here ?
+   BEGIN_NV04(push, SUBC_SW(0x0600), 1);
+   PUSH_DATA (push, NV50_HW_PM_RING_BUFFER_MAX_QUERIES);
+   BEGIN_NV04(push, SUBC_SW(0x0604), 1);
+   PUSH_DATA (push, NV50_HW_PM_RING_BUFFER_NUM_DOMAINS);

FYI you can do BEGIN_NV04(..., 2), since they're sequential.


I'm going to make the change.




 BEGIN_NV04(push, NV50_3D(COND_MODE), 1);
 PUSH_DATA (push, NV50_3D_COND_MODE_ALWAYS);
--
2.4.4
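
The review remark about BEGIN_NV04(..., 2) relies on data words after one header landing on consecutive methods. The sketch below models only that, assuming incrementing methods 4 bytes apart as the comment implies; `struct method_log` and `begin_and_push()` are hypothetical and do not reproduce the real push buffer encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical recorder standing in for a push buffer: it only notes
 * which method address each data word would be written to. */
struct method_log {
	uint32_t mthd[8];
	uint32_t data[8];
	unsigned int count;
	unsigned int headers;   /* number of BEGIN-style headers emitted */
};

/* Record one header followed by `size` data words. Consecutive words
 * go to consecutive methods, 4 bytes apart. */
void begin_and_push(struct method_log *log, uint32_t mthd,
		    const uint32_t *data, unsigned int size)
{
	unsigned int i;

	assert(log && data && log->count + size <= 8);
	log->headers++;
	for (i = 0; i < size; i++) {
		log->mthd[log->count] = mthd + 4 * i;
		log->data[log->count] = data[i];
		log->count++;
	}
}
```

Under that assumption, a single header at 0x0600 with two data words reaches both 0x0600 (maximum number of queries) and 0x0604 (number of domains).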



Re: [Nouveau] [Mesa-dev] [RFC PATCH 5/8] nv50: prevent NULL pointer dereference with pipe_query functions

2015-06-23 Thread Samuel Pitoiset



On 06/23/2015 08:57 AM, Michel Dänzer wrote:

On 23.06.2015 06:02, Samuel Pitoiset wrote:


On 06/22/2015 10:52 PM, Ilia Mirkin wrote:

If query_create fails, why would any of these functions get called?

Because the HUD doesn't check if query_create() fails and it calls other
pipe_query functions with NULL pointer instead of a valid query object.

Could the HUD code be fixed instead?
It's definitely possible, and probably a better solution than 
preventing NULL pointer dereferences in the underlying drivers. I'll make 
a patch.
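
A rough sketch of what a caller-side fix could look like, under the assumption that the caller simply disables a graph whose query could not be created. The `toy_*` types and functions are hypothetical and do not mirror the real HUD code.

```c
#include <stdbool.h>
#include <stdlib.h>

struct toy_query {
   unsigned samples;
};

/* Stand-in for a driver's query creation hook, which may return NULL. */
static struct toy_query *
toy_create_query(bool driver_supports_it)
{
   if (!driver_supports_it)
      return NULL;
   return calloc(1, sizeof(struct toy_query));
}

struct toy_hud_graph {
   struct toy_query *query;
   bool enabled;
};

/* The caller checks the creation result once and remembers the outcome. */
static bool
toy_hud_graph_init(struct toy_hud_graph *gr, bool driver_supports_it)
{
   gr->query = toy_create_query(driver_supports_it);
   gr->enabled = gr->query != NULL;
   return gr->enabled;
}

/* Returns true when a sample was actually taken. */
static bool
toy_hud_graph_sample(struct toy_hud_graph *gr)
{
   if (!gr->enabled)
      return false; /* never hand a NULL query to the driver */
   gr->query->samples++;
   return true;
}

static void
toy_hud_graph_fini(struct toy_hud_graph *gr)
{
   free(gr->query); /* free(NULL) is a no-op */
   gr->query = NULL;
   gr->enabled = false;
}
```

Checking once at creation time means the drivers do not each need their own NULL guards in begin, end and result.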







[Nouveau] [RFC PATCH 6/8] nv50: add support for compute/graphics global performance counters

2015-06-22 Thread Samuel Pitoiset
This commit adds support for both compute and graphics global
performance counters which have been reverse engineered with
CUPTI (Linux) and PerfKit (Windows).

Currently, only one query type can be monitored at a time because
the Gallium HUD does not fit this use case very well. This will be improved later.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
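
The dispatch code in this patch repeats the same range test on the query type in several places. As a sketch only, that test could be written once as a predicate. The `TOY_*` macros are stand-ins: the base value 256 and the definition of the last query index are assumptions, and 24 is the query count used later in the series.

```c
#include <stdbool.h>

#define TOY_PIPE_QUERY_DRIVER_SPECIFIC 256 /* stand-in base value */
#define TOY_HW_PM_QUERY_COUNT          24
#define TOY_HW_PM_QUERY(i)             (TOY_PIPE_QUERY_DRIVER_SPECIFIC + (i))
#define TOY_HW_PM_QUERY_LAST           TOY_HW_PM_QUERY(TOY_HW_PM_QUERY_COUNT - 1)

/* True when the query type falls inside the driver-specific PM range. */
static bool
toy_is_hw_pm_query(unsigned type)
{
   return type >= TOY_HW_PM_QUERY(0) && type <= TOY_HW_PM_QUERY_LAST;
}
```

Each `if ((q->type >= ... && q->type <= ...))` in the hunks below would then shrink to a single call.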
 src/gallium/drivers/nouveau/nv50/nv50_query.c  | 1057 +++-
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |   35 +
 2 files changed, 1087 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c 
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
index 1162110..b9d2914 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
@@ -27,6 +27,8 @@
 #include "nv50/nv50_context.h"
 #include "nv_object.xml.h"
 
+#include "nouveau_perfmon.h"
+
 #define NV50_QUERY_STATE_READY   0
 #define NV50_QUERY_STATE_ACTIVE  1
 #define NV50_QUERY_STATE_ENDED   2
@@ -51,10 +53,25 @@ struct nv50_query {
boolean is64bit;
struct nouveau_mm_allocation *mm;
struct nouveau_fence *fence;
+   struct nouveau_object *perfdom;
 };
 
 #define NV50_QUERY_ALLOC_SPACE 256
 
+#ifdef DEBUG
+static void nv50_hw_pm_dump_perfdom(struct nvif_perfdom_v0 *args);
+#endif
+
+static boolean
+nv50_hw_pm_query_create(struct nv50_context *, struct nv50_query *);
+static void
+nv50_hw_pm_query_destroy(struct nv50_context *, struct nv50_query *);
+static boolean
+nv50_hw_pm_query_begin(struct nv50_context *, struct nv50_query *);
+static void nv50_hw_pm_query_end(struct nv50_context *, struct nv50_query *);
+static boolean nv50_hw_pm_query_result(struct nv50_context *,
+struct nv50_query *, boolean, void *);
+
 static INLINE struct nv50_query *
 nv50_query(struct pipe_query *pipe)
 {
@@ -96,12 +113,18 @@ nv50_query_allocate(struct nv50_context *nv50, struct 
nv50_query *q, int size)
 static void
 nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq)
 {
+   struct nv50_context *nv50 = nv50_context(pipe);
+   struct nv50_query *q = nv50_query(pq);
+
if (!pq)
   return;
 
-   nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0);
-   nouveau_fence_ref(NULL, &nv50_query(pq)->fence);
-   FREE(nv50_query(pq));
+   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST))
+  nv50_hw_pm_query_destroy(nv50, q);
+
+   nv50_query_allocate(nv50, q, 0);
+   nouveau_fence_ref(NULL, &q->fence);
+   FREE(q);
 }
 
 static struct pipe_query *
@@ -130,6 +153,11 @@ nv50_query_create(struct pipe_context *pipe, unsigned 
type, unsigned index)
   q->data -= 32 / sizeof(*q->data); /* we advance before query_begin ! */
}
 
+   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) {
+  if (!nv50_hw_pm_query_create(nv50, q))
+ return NULL;
+   }
+
return (struct pipe_query *)q;
 }
 
@@ -154,6 +182,7 @@ nv50_query_begin(struct pipe_context *pipe, struct 
pipe_query *pq)
struct nv50_context *nv50 = nv50_context(pipe);
struct nouveau_pushbuf *push = nv50->base.pushbuf;
struct nv50_query *q = nv50_query(pq);
+   boolean ret = TRUE;
 
if (!pq)
   return FALSE;
@@ -211,10 +240,13 @@ nv50_query_begin(struct pipe_context *pipe, struct 
pipe_query *pq)
   nv50_query_get(push, q, 0x10, 0x5002);
   break;
default:
+  if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= 
NV50_HW_PM_QUERY_LAST)) {
+ ret = nv50_hw_pm_query_begin(nv50, q);
+  }
   break;
}
q->state = NV50_QUERY_STATE_ACTIVE;
-   return true;
+   return ret;
 }
 
 static void
@@ -274,7 +306,9 @@ nv50_query_end(struct pipe_context *pipe, struct pipe_query 
*pq)
   q->state = NV50_QUERY_STATE_READY;
   break;
default:
-  assert(0);
+  if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= 
NV50_HW_PM_QUERY_LAST)) {
+ nv50_hw_pm_query_end(nv50, q);
+  }
   break;
}
 
@@ -309,6 +343,10 @@ nv50_query_result(struct pipe_context *pipe, struct 
pipe_query *pq,
if (!pq)
   return FALSE;
 
+   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) {
+  return nv50_hw_pm_query_result(nv50, q, wait, result);
+   }
+
if (q->state != NV50_QUERY_STATE_READY)
   nv50_query_update(q);
 
@@ -488,6 +526,1015 @@ nva0_so_target_save_offset(struct pipe_context *pipe,
nv50_query_end(pipe, targ->pq);
 }
 
+/* === HARDWARE GLOBAL PERFORMANCE COUNTERS for NV50 === */
+
+struct nv50_hw_pm_source_cfg
+{
+   const char *name;
+   uint64_t value;
+};
+
+struct nv50_hw_pm_signal_cfg
+{
+   const char *name;
+   const struct nv50_hw_pm_source_cfg src[8];
+};
+
+struct nv50_hw_pm_counter_cfg
+{
+   uint16_t logic_op;
+   const struct nv50_hw_pm_signal_cfg sig[4];
+};
+
+enum nv50_hw_pm_query_display
+{
+   NV50_HW_PM_EVENT_DISPLAY_RAW,
+   NV50_HW_PM_EVENT_DISPLAY_RATIO,
+};
+
+enum nv50_hw_pm_query_count
+{
+   NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   NV50_HW_PM_EVENT_COUNT_B4

[Nouveau] [RFC PATCH 3/8] nv50: allocate and map a notifier buffer object for PM

2015-06-22 Thread Samuel Pitoiset
This notifier buffer object will be used to read back global performance
counters results written by the kernel.

For each domain, we will store the handle of the perfdom object, an
array of 4 counters and the number of cycles. Like the Gallium's HUD,
we keep a list of busy queries in a ring in order to prevent stalls
when reading queries.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
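
As a worked example of the size computation in this patch: each domain takes 6 words (perfdom handle, 4 counters, cycle count), and there is one extra leading word whose purpose the patch does not spell out. With 8 domains and 9 queries that gives 1 + 8 * 6 * 9 = 433 words, or 1732 bytes. The helper below is a hypothetical restatement of that formula, not code from the series.

```c
#include <stdint.h>

#define TOY_WORDS_PER_DOMAIN 6 /* handle + 4 counters + cycles */

/* Size in bytes of the notifier buffer: one leading word, then
 * max_queries slots of num_domains * 6 words each, 4 bytes per word. */
static uint32_t
toy_pm_notifier_length(uint32_t num_domains, uint32_t max_queries)
{
   uint32_t words = 1 + num_domains * TOY_WORDS_PER_DOMAIN * max_queries;

   return words * 4;
}
```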
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 29 ++
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |  6 ++
 2 files changed, 35 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index c985344..3a99cc8 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -368,6 +368,7 @@ nv50_screen_destroy(struct pipe_screen *pscreen)
nouveau_object_del(&screen->m2mf);
nouveau_object_del(&screen->sync);
nouveau_object_del(&screen->sw);
+   nouveau_object_del(&screen->query);
 
nouveau_screen_fini(&screen->base);
 
@@ -699,9 +700,11 @@ nv50_screen_create(struct nouveau_device *dev)
struct nv50_screen *screen;
struct pipe_screen *pscreen;
struct nouveau_object *chan;
+   struct nv04_fifo *fifo;
uint64_t value;
uint32_t tesla_class;
unsigned stack_size;
+   uint32_t length;
int ret;
 
screen = CALLOC_STRUCT(nv50_screen);
@@ -727,6 +730,7 @@ nv50_screen_create(struct nouveau_device *dev)
screen->base.pushbuf->rsvd_kick = 5;
 
chan = screen->base.channel;
+   fifo = chan->data;
 
pscreen->destroy = nv50_screen_destroy;
pscreen->context_create = nv50_create;
@@ -772,6 +776,23 @@ nv50_screen_create(struct nouveau_device *dev)
   goto fail;
}
 
+   /* Compute size (in bytes) of the notifier buffer object which is used
+* in order to read back global performance counters results written
+* by the kernel. For each domain, we store the handle of the perfdom
+* object, an array of 4 counters and the number of cycles. Like for
+* the Gallium's HUD, we keep a list of busy queries in a ring in order
+* to prevent stalls when reading queries. */
+   length = (1 + (NV50_HW_PM_RING_BUFFER_NUM_DOMAINS * 6) *
+  NV50_HW_PM_RING_BUFFER_MAX_QUERIES) * 4;
+
+   ret = nouveau_object_new(chan, 0xbeef0302, NOUVEAU_NOTIFIER_CLASS,
+                            &(struct nv04_notify){ .length = length },
+                            sizeof(struct nv04_notify), &screen->query);
+   if (ret) {
+   NOUVEAU_ERR("Failed to allocate notifier object for PM: %d\n", ret);
+   goto fail;
+   }
+
ret = nouveau_object_new(chan, 0xbeef506e, 0x506e,
 NULL, 0, &screen->sw);
if (ret) {
@@ -845,6 +866,14 @@ nv50_screen_create(struct nouveau_device *dev)
nouveau_heap_init(&screen->gp_code_heap, 0, 1 << NV50_CODE_BO_SIZE_LOG2);
nouveau_heap_init(&screen->fp_code_heap, 0, 1 << NV50_CODE_BO_SIZE_LOG2);
 
+   ret = nouveau_bo_wrap(screen->base.device, fifo->notify, 
&screen->notify_bo);
+   if (ret == 0)
+  nouveau_bo_map(screen->notify_bo, 0, screen->base.client);
+   if (ret) {
+  NOUVEAU_ERR("Failed to map notifier object for PM: %d\n", ret);
+  goto fail;
+   }
+
nouveau_getparam(dev, NOUVEAU_GETPARAM_GRAPH_UNITS, &value);
 
screen-TPs = util_bitcount(value  0x);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
index 69fdfdb..71a5247 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
@@ -59,6 +59,7 @@ struct nv50_screen {
struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */
struct nouveau_bo *stack_bo;
struct nouveau_bo *tls_bo;
+   struct nouveau_bo *notify_bo;
 
unsigned TPs;
unsigned MPsInTP;
@@ -89,6 +90,7 @@ struct nv50_screen {
} fence;
 
struct nouveau_object *sync;
+   struct nouveau_object *query;
 
struct nouveau_object *tesla;
struct nouveau_object *eng2d;
@@ -96,6 +98,10 @@ struct nv50_screen {
struct nouveau_object *sw;
 };
 
+/* Parameters of the ring buffer used to read back global PM counters. */
+#define NV50_HW_PM_RING_BUFFER_NUM_DOMAINS 8
+#define NV50_HW_PM_RING_BUFFER_MAX_QUERIES 9 /* HUD_NUM_QUERIES + 1 */
+
 static INLINE struct nv50_screen *
 nv50_screen(struct pipe_screen *screen)
 {
-- 
2.4.4



[Nouveau] [RFC PATCH 7/8] nv50: expose global performance counters to the HUD

2015-06-22 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
 src/gallium/drivers/nouveau/nv50/nv50_query.c  | 41 ++
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |  1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |  3 ++
 3 files changed, 45 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c 
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
index b9d2914..062d427 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
@@ -1535,6 +1535,47 @@ nv50_hw_pm_query_result(struct nv50_context *nv50, 
struct nv50_query *q,
return TRUE;
 }
 
+int
+nv50_screen_get_driver_query_info(struct pipe_screen *pscreen,
+  unsigned id,
+  struct pipe_driver_query_info *info)
+{
+   struct nv50_screen *screen = nv50_screen(pscreen);
+   int count = 0;
+
+   // TODO: Check DRM version when nvif will be merged in libdrm!
+   if (screen->base.perfmon) {
+  nv50_identify_events(screen);
+  count += NV50_HW_PM_QUERY_COUNT;
+   }
+
+   if (!info)
+  return count;
+
+   /* Init default values. */
+   info->name = "this_is_not_the_query_you_are_looking_for";
+   info->query_type = 0xdeadd01d;
+   info->type = PIPE_DRIVER_QUERY_TYPE_UINT64;
+   info->max_value.u64 = 0;
+   info->group_id = -1;
+
+   if (id < count) {
+  if (screen->base.perfmon) {
+ const struct nv50_hw_pm_query_cfg *cfg =
+nv50_hw_pm_query_get_cfg(screen, NV50_HW_PM_QUERY(id));
+
+ info->name = cfg->event->name;
+ info->query_type = NV50_HW_PM_QUERY(id);
+ info->max_value.u64 =
+(cfg->event->display == NV50_HW_PM_EVENT_DISPLAY_RATIO) ? 100 : 0;
+ return 1;
+  }
+   }
+
+   /* User asked for info about non-existing query. */
+   return 0;
+}
+
 void
 nv50_init_query_functions(struct nv50_context *nv50)
 {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 53817c0..f07798e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -745,6 +745,7 @@ nv50_screen_create(struct nouveau_device *dev)
pscreen->get_param = nv50_screen_get_param;
pscreen->get_shader_param = nv50_screen_get_shader_param;
pscreen->get_paramf = nv50_screen_get_paramf;
+   pscreen->get_driver_query_info = nv50_screen_get_driver_query_info;
 
nv50_screen_init_resource_functions(pscreen);
 
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
index 0449659..69127c0 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
@@ -143,6 +143,9 @@ nv50_screen(struct pipe_screen *screen)
 #define NV50_HW_PM_QUERY_TEX_CACHE_HIT  22
 #define NV50_HW_PM_QUERY_TEX_WAITS_FOR_FB   23
 
+int nv50_screen_get_driver_query_info(struct pipe_screen *, unsigned,
+  struct pipe_driver_query_info *);
+
 boolean nv50_blitter_create(struct nv50_screen *);
 void nv50_blitter_destroy(struct nv50_screen *);
 
-- 
2.4.4



[Nouveau] [RFC PATCH 0/8] nv50: expose global performance counters

2015-06-22 Thread Samuel Pitoiset
Hello there,

This series exposes NVIDIA's global performance counters for Tesla through the
Gallium's HUD and the GL_AMD_performance_monitor extension.

This adds support for 24 hardware events which have been reverse engineered
with PerfKit (Windows) and CUPTI (Linux). These hardware events will allow
developers to profile OpenGL applications.

To reduce latency and to improve accuracy, these global performance counters
are tied to the command stream of the GPU using a set of software methods
instead of ioctls. Results are then written by the kernel to a mapped notifier
buffer object that allows userspace to read them back.

However, the libdrm branch which implements the new nvif interface exposed by
Nouveau and the software methods interface are not upstream yet. I hope this
will be done in the next few days.

The code of this series can be found here:
http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nouveau_perfmon

The libdrm branch can be found here:
http://cgit.freedesktop.org/~hakzsam/drm/log/?h=nouveau_perfmon

The code of the software methods interface can be found here (two last commits):
http://cgit.freedesktop.org/~hakzsam/nouveau/log/?h=nouveau_perfmon

Another series which exposes global performance counters for Fermi and Kepler
will be submitted once I have got enough reviews for this one.

Feel free to review.

Thanks,
Samuel.

Samuel Pitoiset (8):
  nouveau: implement the nvif hardware performance counters interface
  nv50: allocate a software object class
  nv50: allocate and map a notifier buffer object for PM
  nv50: configure the ring buffer for reading back PM counters
  nv50: prevent NULL pointer dereference with pipe_query functions
  nv50: add support for compute/graphics global performance counters
  nv50: expose global performance counters to the HUD
  nv50: enable GL_AMD_performance_monitor

 src/gallium/drivers/nouveau/Makefile.sources   |2 +
 src/gallium/drivers/nouveau/nouveau_perfmon.c  |  302 +++
 src/gallium/drivers/nouveau/nouveau_perfmon.h  |   59 ++
 src/gallium/drivers/nouveau/nouveau_screen.c   |5 +
 src/gallium/drivers/nouveau/nouveau_screen.h   |1 +
 src/gallium/drivers/nouveau/nv50/nv50_query.c  | 1148 +++-
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |   49 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |   51 ++
 src/gallium/drivers/nouveau/nv50/nv50_winsys.h |1 +
 9 files changed, 1612 insertions(+), 6 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/nouveau_perfmon.c
 create mode 100644 src/gallium/drivers/nouveau/nouveau_perfmon.h

-- 
2.4.4



[Nouveau] [RFC PATCH 8/8] nv50: enable GL_AMD_performance_monitor

2015-06-22 Thread Samuel Pitoiset
This exposes a group of global performance counters that enables
GL_AMD_performance_monitor. All piglit tests are okay.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
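
The group-info hook added below follows a two-step convention: called with a NULL info pointer it returns the number of groups, otherwise it fills in the requested group. A minimal sketch of how a caller might enumerate groups under that convention follows. The `toy_*` names are hypothetical, and the boolean parameter stands in for the perfmon check in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_group_info {
   const char *name;
   unsigned num_queries;
   unsigned max_active_queries;
};

/* Mimics the hook: info == NULL asks for the group count only. */
static int
toy_get_group_info(bool has_perfmon, unsigned id, struct toy_group_info *info)
{
   int count = has_perfmon ? 1 : 0;

   if (!info)
      return count;

   if (id == 0 && has_perfmon) {
      info->name = "Global performance counters";
      info->num_queries = 24;
      info->max_active_queries = 1;
      return 1;
   }

   /* Unknown group: report empty values. */
   info->name = "unknown group";
   info->num_queries = 0;
   info->max_active_queries = 0;
   return 0;
}

/* Caller side: ask for the count first, then walk every group id. */
static unsigned
toy_total_queries(bool has_perfmon)
{
   int num_groups = toy_get_group_info(has_perfmon, 0, NULL);
   unsigned total = 0;

   for (int id = 0; id < num_groups; id++) {
      struct toy_group_info info;

      if (toy_get_group_info(has_perfmon, (unsigned)id, &info))
         total += info.num_queries;
   }
   return total;
}
```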
 src/gallium/drivers/nouveau/nv50/nv50_query.c  | 35 ++
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |  1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |  6 +
 3 files changed, 42 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c 
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
index 062d427..6638e82 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
@@ -1566,6 +1566,7 @@ nv50_screen_get_driver_query_info(struct pipe_screen 
*pscreen,
 
  info->name = cfg->event->name;
  info->query_type = NV50_HW_PM_QUERY(id);
+ info->group_id = NV50_HW_PM_QUERY_GROUP;
  info->max_value.u64 =
 (cfg->event->display == NV50_HW_PM_EVENT_DISPLAY_RATIO) ? 100 : 0;
  return 1;
@@ -1576,6 +1577,40 @@ nv50_screen_get_driver_query_info(struct pipe_screen 
*pscreen,
return 0;
 }
 
+int
+nv50_screen_get_driver_query_group_info(struct pipe_screen *pscreen,
+unsigned id,
+struct pipe_driver_query_group_info 
*info)
+{
+   struct nv50_screen *screen = nv50_screen(pscreen);
+   int count = 0;
+
+   // TODO: Check DRM version when nvif will be merged in libdrm!
+   if (screen->base.perfmon) {
+  count++; /* NV50_HW_PM_QUERY_GROUP */
+   }
+
+   if (!info)
+  return count;
+
+   if (id == NV50_HW_PM_QUERY_GROUP) {
+  if (screen->base.perfmon) {
+ info->name = "Global performance counters";
+ info->type = PIPE_DRIVER_QUERY_GROUP_TYPE_GPU;
+ info->num_queries = NV50_HW_PM_QUERY_COUNT;
+ info->max_active_queries = 1; /* TODO: get rid of this limitation! */
+ return 1;
+  }
+   }
+
+   /* user asked for info about non-existing query group */
+   info->name = "this_is_not_the_query_group_you_are_looking_for";
+   info->max_active_queries = 0;
+   info->num_queries = 0;
+   info->type = 0;
+   return 0;
+}
+
 void
 nv50_init_query_functions(struct nv50_context *nv50)
 {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index f07798e..dfe20c9 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -746,6 +746,7 @@ nv50_screen_create(struct nouveau_device *dev)
pscreen->get_shader_param = nv50_screen_get_shader_param;
pscreen->get_paramf = nv50_screen_get_paramf;
pscreen->get_driver_query_info = nv50_screen_get_driver_query_info;
+   pscreen->get_driver_query_group_info = 
nv50_screen_get_driver_query_group_info;
 
nv50_screen_init_resource_functions(pscreen);
 
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
index 69127c0..807ae0e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
@@ -114,6 +114,9 @@ nv50_screen(struct pipe_screen *screen)
return (struct nv50_screen *)screen;
 }
 
+/* Hardware global performance counters groups. */
+#define NV50_HW_PM_QUERY_GROUP 0
+
 /* Hardware global performance counters. */
 #define NV50_HW_PM_QUERY_COUNT  24
 #define NV50_HW_PM_QUERY(i)(PIPE_QUERY_DRIVER_SPECIFIC + (i))
@@ -146,6 +149,9 @@ nv50_screen(struct pipe_screen *screen)
 int nv50_screen_get_driver_query_info(struct pipe_screen *, unsigned,
   struct pipe_driver_query_info *);
 
+int nv50_screen_get_driver_query_group_info(struct pipe_screen *, unsigned,
+struct 
pipe_driver_query_group_info *);
+
 boolean nv50_blitter_create(struct nv50_screen *);
 void nv50_blitter_destroy(struct nv50_screen *);
 
-- 
2.4.4



[Nouveau] [RFC PATCH 5/8] nv50: prevent NULL pointer dereference with pipe_query functions

2015-06-22 Thread Samuel Pitoiset
This may happen when nv50_query_create() fails to create a new query.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
 src/gallium/drivers/nouveau/nv50/nv50_query.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c 
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
index 55fcac8..1162110 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
@@ -96,6 +96,9 @@ nv50_query_allocate(struct nv50_context *nv50, struct 
nv50_query *q, int size)
 static void
 nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq)
 {
+   if (!pq)
+  return;
+
nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0);
nouveau_fence_ref(NULL, &nv50_query(pq)->fence);
FREE(nv50_query(pq));
@@ -152,6 +155,9 @@ nv50_query_begin(struct pipe_context *pipe, struct 
pipe_query *pq)
struct nouveau_pushbuf *push = nv50->base.pushbuf;
struct nv50_query *q = nv50_query(pq);
 
+   if (!pq)
+  return FALSE;
+
/* For occlusion queries we have to change the storage, because a previous
 * query might set the initial render conition to FALSE even *after* we re-
 * initialized it to TRUE.
@@ -218,6 +224,9 @@ nv50_query_end(struct pipe_context *pipe, struct pipe_query 
*pq)
struct nouveau_pushbuf *push = nv50->base.pushbuf;
struct nv50_query *q = nv50_query(pq);
 
+   if (!pq)
+  return;
+
q->state = NV50_QUERY_STATE_ENDED;
 
switch (q->type) {
@@ -294,9 +303,12 @@ nv50_query_result(struct pipe_context *pipe, struct 
pipe_query *pq,
uint64_t *res64 = (uint64_t *)result;
uint32_t *res32 = (uint32_t *)result;
boolean *res8 = (boolean *)result;
-   uint64_t *data64 = (uint64_t *)q->data;
+   uint64_t *data64;
int i;
 
+   if (!pq)
+  return FALSE;
+
if (q->state != NV50_QUERY_STATE_READY)
   nv50_query_update(q);
 
@@ -314,6 +326,7 @@ nv50_query_result(struct pipe_context *pipe, struct 
pipe_query *pq,
}
q->state = NV50_QUERY_STATE_READY;
 
+   data64 = (uint64_t *)q->data;
switch (q->type) {
case PIPE_QUERY_GPU_FINISHED:
   res8[0] = TRUE;
-- 
2.4.4



[Nouveau] [RFC PATCH 4/8] nv50: configure the ring buffer for reading back PM counters

2015-06-22 Thread Samuel Pitoiset
To write data at the right offset, the kernel has to know some
parameters of this ring buffer, like the number of domains and the
maximum number of queries.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 3a99cc8..53817c0 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -441,6 +441,13 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
 
BEGIN_NV04(push, SUBC_SW(NV01_SUBCHAN_OBJECT), 1);
PUSH_DATA (push, screen->sw->handle);
+   BEGIN_NV04(push, SUBC_SW(0x0190), 1);
+   PUSH_DATA (push, screen->query->handle);
+   // XXX: Maybe add a check for DRM version here ?
+   BEGIN_NV04(push, SUBC_SW(0x0600), 1);
+   PUSH_DATA (push, NV50_HW_PM_RING_BUFFER_MAX_QUERIES);
+   BEGIN_NV04(push, SUBC_SW(0x0604), 1);
+   PUSH_DATA (push, NV50_HW_PM_RING_BUFFER_NUM_DOMAINS);
 
BEGIN_NV04(push, NV50_3D(COND_MODE), 1);
PUSH_DATA (push, NV50_3D_COND_MODE_ALWAYS);
-- 
2.4.4
