
[Nouveau] [PATCH] nouveau: forward error generated while resuming objects tree

2019-03-28 Thread Tobias Klausmann

On a failed resume we may experience unrecoverable errors. Plumb the error code
through to actually let the driver fail. On a reverse-prime setup this helps the
drm subsystem to at least recover the integrated gpu.

This can especially happen with secboot timing out, leaving the hardware in a
non-functioning state.

Signed-off-by: Tobias Klausmann 
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 5020265bfbd9..56a107f3a0e1 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -802,10 +802,15 @@ nouveau_do_suspend(struct drm_device *dev, bool runtime)
 static int
 nouveau_do_resume(struct drm_device *dev, bool runtime)
 {
+	int ret = 0;
 	struct nouveau_drm *drm = nouveau_drm(dev);
 
 	NV_DEBUG(drm, "resuming object tree...\n");
-	nvif_client_resume(&drm->master.base);
+	ret = nvif_client_resume(&drm->master.base);
+	if (ret) {
+		NV_ERROR(drm, "Client resume failed with error: %d\n", ret);
+		return ret;
+	}
 
 	NV_DEBUG(drm, "resuming fence...\n");
 	if (drm->fence && nouveau_fence(drm)->resume)
@@ -925,6 +930,7 @@ nouveau_pmops_runtime_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
+	struct nouveau_drm *drm = nouveau_drm(drm_dev);
 	struct nvif_device *device = &nouveau_drm(drm_dev)->client.device;
 	int ret;
 
@@ -941,6 +947,10 @@ nouveau_pmops_runtime_resume(struct device *dev)
 	pci_set_master(pdev);
 
 	ret = nouveau_do_resume(drm_dev, true);
+	if (ret) {
+		NV_ERROR(drm, "resume failed with: %d\n", ret);
+		return ret;
+	}
 
 	/* do magic */
 	nvif_mask(&device->object, 0x088488, (1 << 25), (1 << 25));
-- 
2.21.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
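
The error plumbing described in the patch above can be sketched in plain
user-space C. This is only an illustration of the control flow, not driver
code: fake_client_resume(), fake_do_resume() and fake_runtime_resume() are
hypothetical stand-ins for nvif_client_resume(), nouveau_do_resume() and
nouveau_pmops_runtime_resume(), and the choice of -ETIMEDOUT for a secboot
timeout is an assumption.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for nvif_client_resume(): returns 0 on success
 * or a negative errno, e.g. when secboot times out. */
static int fake_client_resume(int simulate_timeout)
{
	return simulate_timeout ? -ETIMEDOUT : 0;
}

/* Same shape as nouveau_do_resume() after the patch: stop at the first
 * failing step and hand the error code back to the caller. */
static int fake_do_resume(int simulate_timeout)
{
	int ret;

	ret = fake_client_resume(simulate_timeout);
	if (ret) {
		fprintf(stderr, "client resume failed with error: %d\n", ret);
		return ret;
	}

	/* Later steps (fence, display, ...) would only run on success. */
	return 0;
}

/* Same shape as nouveau_pmops_runtime_resume() after the patch: do not
 * touch the hardware any further when the resume itself failed. */
static int fake_runtime_resume(int simulate_timeout, int *touched_hw)
{
	int ret;

	ret = fake_do_resume(simulate_timeout);
	if (ret)
		return ret;

	*touched_hw = 1; /* stands in for the register write after resume */
	return 0;
}
```

Calling fake_runtime_resume(1, &touched) returns the negative errno and
leaves touched at 0, which is the behaviour the patch is after: the caller
sees the failure instead of continuing with a dead device.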

Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)

2019-03-21 Thread Tobias Klausmann



On 21.03.19 20:30, Tobias Klausmann wrote:

On 21.03.19 18:12, Jerome Glisse wrote:

On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:

Hi,

just for your information, and maybe for some help: with 5.1-rc1 and SVM
enabled I see the following backtrace [1] when the nouveau card (reverse
prime) goes to sleep. For now I have papered over it with [2], which
leaves me with userspace hangs. Any pointers on where to look for the
actual culprit?


PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)

Greetings,

Tobias

Can you check whether the attached patch fixes the issue?

Cheers,
Jérôme



Hi,

the patch is fine, you can add my R-b & Tested-by!



Of course, I tested the second patch you sent out, not the first one!




PS: I still have another, unrelated error keeping my card from being
happy; that's now next on my todo list:


[ 1102.004901] ------------[ cut here ]------------
[ 1102.004902] nouveau :01:00.0: timeout
[ 1102.004948] WARNING: CPU: 2 PID: 55 at 
drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:183 
acr_ls_sec2_post_run+0x139/0x190 [nouveau]
[ 1102.004949] Modules linked in: rfcomm af_packet bnep btusb uvcvideo 
btrtl btbcm rtsx_usb_sdmmc btintel videobuf2_vmalloc rtsx_usb_ms 
videobuf2_memops mmc_core bluetooth memstick videobuf2_v4l2 videodev 
videobuf2_common ecdh_generic rtsx_usb snd_hda_codec_hdmi usbhid 
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nouveau arc4 
nls_iso8859_1 nls_cp437 i915 vfat fat intel_rapl x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel ath10k_pci msr kvm ath10k_core 
snd_hda_intel irqbypass ath mxm_wmi snd_hda_codec ttm joydev mac80211 
snd_hda_core drm_kms_helper crct10dif_pclmul snd_hwdep crc32_pclmul 
snd_pcm crc32c_intel drm hid_multitouch ghash_clmulni_intel snd_timer 
hid_generic iTCO_wdt aesni_intel mei_hdcp iTCO_vendor_support snd 
aes_x86_64 fb_sys_fops cfg80211 crypto_simd acerfan syscopyarea r8169 
sysfillrect cryptd sysimgblt glue_helper realtek idma64 acer_wmi 
i2c_algo_bit mei_me libphy pcspkr sparse_keymap intel_lpss_pci 
intel_wmi_thunderbolt soundcore
[ 1102.004965]  intel_pch_thermal mei i2c_i801 intel_lpss rfkill 
wmi_bmof thermal tpm_crb tpm_tis pinctrl_sunrisepoint tpm_tis_core ac 
pinctrl_intel battery tpm button acpi_pad pcc_cpufreq xhci_pci 
xhci_hcd serio_raw usbcore i2c_hid wmi video sg dm_multipath dm_mod 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs autofs4
[ 1102.004972] CPU: 2 PID: 55 Comm: kworker/2:1 Not tainted 
5.1.0-rc1-desktop-debug+ #80
[ 1102.004973] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS 
V1.11 08/01/2018

[ 1102.004976] Workqueue: pm pm_runtime_work
[ 1102.005007] RIP: 0010:acr_ls_sec2_post_run+0x139/0x190 [nouveau]
[ 1102.005008] Code: 04 24 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 
f6 74 1e e8 b9 2d 6a dd 48 89 c6 4c 89 f2 48 c7 c7 39 15 fb c0 e8 8c 
b6 20 dd <0f> 0b e9 4c ff ff ff 4c 8b 77 10 eb dc 48 8b 04 24 48 8b 40 
10 48

[ 1102.005009] RSP: 0018:a45c00ee7ab8 EFLAGS: 00010296
[ 1102.005009] RAX: 001d RBX: 912f0e366900 RCX: 
0006
[ 1102.005010] RDX: 0007 RSI: 0086 RDI: 
912f3ec963f0
[ 1102.005010] RBP:  R08: 03cb R09: 
0004
[ 1102.005011] R10:  R11: 0001 R12: 
912f330cc400
[ 1102.005011] R13: 0040 R14: 912df09f0060 R15: 
912df09f80b0
[ 1102.005012] FS:  () GS:912f3ec8() 
knlGS:

[ 1102.005012] CS:  0010 DS:  ES:  CR0: 80050033
[ 1102.005013] CR2: 7fed2968e020 CR3: 00028a728004 CR4: 
003606e0

[ 1102.005013] Call Trace:
[ 1102.005044]  acr_r352_bootstrap+0x16e/0x1d0 [nouveau]
[ 1102.005073]  acr_r352_reset+0x21/0x190 [nouveau]
[ 1102.005105]  gf100_gr_init_ctxctl_ext+0x59/0x500 [nouveau]
[ 1102.005136]  gf100_gr_init_ctxctl+0x19/0x270 [nouveau]
[ 1102.005167]  ? gf100_gr_init+0x533/0x570 [nouveau]
[ 1102.005181]  nvkm_engine_init+0xa2/0x120 [nouveau]
[ 1102.005196]  nvkm_subdev_init+0x8d/0xc0 [nouveau]
[ 1102.005226]  nvkm_device_init+0x107/0x190 [nouveau]
[ 1102.005255]  nvkm_udevice_init+0x3c/0x60 [nouveau]
[ 1102.005269]  nvkm_object_init+0x39/0x100 [nouveau]
[ 1102.005284]  nvkm_object_init+0x6c/0x100 [nouveau]
[ 1102.005299]  nvkm_object_init+0x6c/0x100 [nouveau]
[ 1102.005328]  nouveau_do_resume+0x23/0xb0 [nouveau]
[ 1102.005357]  nouveau_pmops_runtime_resume+0x7c/0x150 [nouveau]
[ 1102.005360]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005361]  pci_pm_runtime_resume+0x6f/0xc0
[ 1102.005362]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005363]  __rpm_callback+0x76/0x120
[ 1102.005365]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005366]  rpm_callback+0x1a/0x70
[ 1102.005367]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005368]  rpm_resume+0x3f5/0x5f0
[ 1102.005369]  pm_runtime_work+0x4e/0xa0
[ 1102.005370]  process_one_work+0x1d4/0x360
[ 1102.005372]  worker_thread+0x28/0x3c0
[ 1

Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)

2019-03-21 Thread Tobias Klausmann


On 21.03.19 18:12, Jerome Glisse wrote:

On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:

Hi,

just for your information, and maybe for some help: with 5.1-rc1 and SVM
enabled I see the following backtrace [1] when the nouveau card (reverse
prime) goes to sleep. For now I have papered over it with [2], which
leaves me with userspace hangs. Any pointers on where to look for the
actual culprit?

PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)

Greetings,

Tobias

Can you check whether the attached patch fixes the issue?

Cheers,
Jérôme



Hi,

the patch is fine, you can add my R-b & Tested-by!

PS: I still have another, unrelated error keeping my card from being
happy; that's now next on my todo list:


[ 1102.004901] ------------[ cut here ]------------
[ 1102.004902] nouveau :01:00.0: timeout
[ 1102.004948] WARNING: CPU: 2 PID: 55 at 
drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:183 
acr_ls_sec2_post_run+0x139/0x190 [nouveau]
[ 1102.004949] Modules linked in: rfcomm af_packet bnep btusb uvcvideo 
btrtl btbcm rtsx_usb_sdmmc btintel videobuf2_vmalloc rtsx_usb_ms 
videobuf2_memops mmc_core bluetooth memstick videobuf2_v4l2 videodev 
videobuf2_common ecdh_generic rtsx_usb snd_hda_codec_hdmi usbhid 
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nouveau arc4 
nls_iso8859_1 nls_cp437 i915 vfat fat intel_rapl x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel ath10k_pci msr kvm ath10k_core 
snd_hda_intel irqbypass ath mxm_wmi snd_hda_codec ttm joydev mac80211 
snd_hda_core drm_kms_helper crct10dif_pclmul snd_hwdep crc32_pclmul 
snd_pcm crc32c_intel drm hid_multitouch ghash_clmulni_intel snd_timer 
hid_generic iTCO_wdt aesni_intel mei_hdcp iTCO_vendor_support snd 
aes_x86_64 fb_sys_fops cfg80211 crypto_simd acerfan syscopyarea r8169 
sysfillrect cryptd sysimgblt glue_helper realtek idma64 acer_wmi 
i2c_algo_bit mei_me libphy pcspkr sparse_keymap intel_lpss_pci 
intel_wmi_thunderbolt soundcore
[ 1102.004965]  intel_pch_thermal mei i2c_i801 intel_lpss rfkill 
wmi_bmof thermal tpm_crb tpm_tis pinctrl_sunrisepoint tpm_tis_core ac 
pinctrl_intel battery tpm button acpi_pad pcc_cpufreq xhci_pci xhci_hcd 
serio_raw usbcore i2c_hid wmi video sg dm_multipath dm_mod scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua efivarfs autofs4
[ 1102.004972] CPU: 2 PID: 55 Comm: kworker/2:1 Not tainted 
5.1.0-rc1-desktop-debug+ #80
[ 1102.004973] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 
08/01/2018

[ 1102.004976] Workqueue: pm pm_runtime_work
[ 1102.005007] RIP: 0010:acr_ls_sec2_post_run+0x139/0x190 [nouveau]
[ 1102.005008] Code: 04 24 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 f6 
74 1e e8 b9 2d 6a dd 48 89 c6 4c 89 f2 48 c7 c7 39 15 fb c0 e8 8c b6 20 
dd <0f> 0b e9 4c ff ff ff 4c 8b 77 10 eb dc 48 8b 04 24 48 8b 40 10 48

[ 1102.005009] RSP: 0018:a45c00ee7ab8 EFLAGS: 00010296
[ 1102.005009] RAX: 001d RBX: 912f0e366900 RCX: 
0006
[ 1102.005010] RDX: 0007 RSI: 0086 RDI: 
912f3ec963f0
[ 1102.005010] RBP:  R08: 03cb R09: 
0004
[ 1102.005011] R10:  R11: 0001 R12: 
912f330cc400
[ 1102.005011] R13: 0040 R14: 912df09f0060 R15: 
912df09f80b0
[ 1102.005012] FS:  () GS:912f3ec8() 
knlGS:

[ 1102.005012] CS:  0010 DS:  ES:  CR0: 80050033
[ 1102.005013] CR2: 7fed2968e020 CR3: 00028a728004 CR4: 
003606e0

[ 1102.005013] Call Trace:
[ 1102.005044]  acr_r352_bootstrap+0x16e/0x1d0 [nouveau]
[ 1102.005073]  acr_r352_reset+0x21/0x190 [nouveau]
[ 1102.005105]  gf100_gr_init_ctxctl_ext+0x59/0x500 [nouveau]
[ 1102.005136]  gf100_gr_init_ctxctl+0x19/0x270 [nouveau]
[ 1102.005167]  ? gf100_gr_init+0x533/0x570 [nouveau]
[ 1102.005181]  nvkm_engine_init+0xa2/0x120 [nouveau]
[ 1102.005196]  nvkm_subdev_init+0x8d/0xc0 [nouveau]
[ 1102.005226]  nvkm_device_init+0x107/0x190 [nouveau]
[ 1102.005255]  nvkm_udevice_init+0x3c/0x60 [nouveau]
[ 1102.005269]  nvkm_object_init+0x39/0x100 [nouveau]
[ 1102.005284]  nvkm_object_init+0x6c/0x100 [nouveau]
[ 1102.005299]  nvkm_object_init+0x6c/0x100 [nouveau]
[ 1102.005328]  nouveau_do_resume+0x23/0xb0 [nouveau]
[ 1102.005357]  nouveau_pmops_runtime_resume+0x7c/0x150 [nouveau]
[ 1102.005360]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005361]  pci_pm_runtime_resume+0x6f/0xc0
[ 1102.005362]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005363]  __rpm_callback+0x76/0x120
[ 1102.005365]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005366]  rpm_callback+0x1a/0x70
[ 1102.005367]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005368]  rpm_resume+0x3f5/0x5f0
[ 1102.005369]  pm_runtime_work+0x4e/0xa0
[ 1102.005370]  process_one_work+0x1d4/0x360
[ 1102.005372]  worker_thread+0x28/0x3c0
[ 1102.005372]  ? process_one_work+0x360/0x360
[ 1102.005374]  kthread+0x10d/0x130
[ 1102.005375]  ? kthread_create_worker_on_cpu+0x40/0x40
[ 1

[Nouveau] Nouveau dmem NULL Pointer deref (SVM)

2019-03-21 Thread Tobias Klausmann


Hi,

just for your information, and maybe for some help: with 5.1-rc1 and SVM
enabled I see the following backtrace [1] when the nouveau card (reverse
prime) goes to sleep. For now I have papered over it with [2], which
leaves me with userspace hangs. Any pointers on where to look for the
actual culprit?


PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)

Greetings,

Tobias


[1]:

BUG: unable to handle kernel NULL pointer dereference at 0028
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops:  [#1] PREEMPT SMP PTI
CPU: 3 PID: 435 Comm: kworker/3:4 Not tainted 5.1.0-rc1-desktop-debug+ #80
Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018
Workqueue: pm pm_runtime_work
RIP: 0010:nouveau_bo_unpin (linux/./include/linux/compiler.h:193 
linux/./arch/x86/include/asm/atomic.h:31 
linux/./include/asm-generic/atomic-instrumented.h:27 
linux/./include/linux/refcount.h:43 linux/./include/linux/kref.h:38 
linux/./include/drm/ttm/ttm_bo_driver.h:721 
linux/drivers/gpu/drm/nouveau/nouveau_bo.c:454) nouveau
Code: 89 d9 48 c7 c6 50 04 e5 c0 c4 42 79 f7 c0 bd f0 ff ff ff e8 42 d5 
7a c6 ff 83 00 04 00 00 e9 17 ff ff ff 41 54 55 53 48 89 fb <8b> 47 28 
85 c0 0f 84 cf 00 00 00 48 8b bb c0 01 00 00 31 f6 4c 8b

All code

   0:    89 d9        mov    %ebx,%ecx
   2:    48 c7 c6 50 04 e5 c0     mov    $0xc0e50450,%rsi
   9:    c4 42 79 f7 c0       shlx   %eax,%r8d,%r8d
   e:    bd f0 ff ff ff       mov    $0xfff0,%ebp
  13:    e8 42 d5 7a c6       callq  0xc67ad55a
  18:    ff 83 00 04 00 00        incl   0x400(%rbx)
  1e:    e9 17 ff ff ff       jmpq   0xff3a
  23:    41 54        push   %r12
  25:    55       push   %rbp
  26:    53       push   %rbx
  27:    48 89 fb     mov    %rdi,%rbx
  2a:*    8b 47 28     mov    0x28(%rdi),%eax <-- trapping 
instruction

  2d:    85 c0        test   %eax,%eax
  2f:    0f 84 cf 00 00 00        je 0x104
  35:    48 8b bb c0 01 00 00     mov    0x1c0(%rbx),%rdi
  3c:    31 f6        xor    %esi,%esi
  3e:    4c       rex.WR
  3f:    8b       .byte 0x8b

Code starting with the faulting instruction
===
   0:    8b 47 28     mov    0x28(%rdi),%eax
   3:    85 c0        test   %eax,%eax
   5:    0f 84 cf 00 00 00        je 0xda
   b:    48 8b bb c0 01 00 00     mov    0x1c0(%rbx),%rdi
  12:    31 f6        xor    %esi,%esi
  14:    4c       rex.WR
  15:    8b       .byte 0x8b
RSP: 0018:bf0b41237d20 EFLAGS: 00010216
RAX: 9dfe0ba2ec00 RBX:  RCX: c0ceb630
RDX: 9dfe0ba2ec38 RSI: 7fff RDI: 
RBP: 9dfe0a07e000 R08:  R09: c0d4a9a0
R10: 8080808080808080 R11: 1800 R12: 0001
R13:  R14:  R15: 0008
FS:  () GS:9dfe3ecc() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0028 CR3: 0001a500e002 CR4: 003606e0
Call Trace:
nouveau_dmem_suspend (linux/drivers/gpu/drm/nouveau/nouveau_dmem.c:482 
(discriminator 9)) nouveau

nouveau_do_suspend (linux/drivers/gpu/drm/nouveau/nouveau_drm.c:748) nouveau
nouveau_pmops_runtime_suspend 
(linux/drivers/gpu/drm/nouveau/nouveau_drm.c:915) nouveau

pci_pm_runtime_suspend (linux/drivers/pci/pci-driver.c:1262)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238)
__rpm_callback (linux/drivers/base/power/runtime.c:357)
? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238)
rpm_callback (linux/drivers/base/power/runtime.c:490)
? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238)
rpm_suspend (linux/drivers/base/power/runtime.c:629)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
pm_runtime_work (linux/drivers/base/power/runtime.c:922)
process_one_work (linux/./arch/x86/include/asm/preempt.h:26 
linux/kernel/workqueue.c:2278)
worker_thread (linux/./include/linux/compiler.h:193 
linux/./include/linux/list.h:237 linux/kernel/workqueue.c:2416)

? process_one_work (linux/kernel/workqueue.c:2358)
kthread (linux/kernel/kthread.c:253)
? kthread_create_worker_on_cpu (linux/kernel/kthread.c:213)
ret_from_fork (linux/arch/x86/entry/entry_64.S:358)
Modules linked in: rfcomm af_packet snd_hda_codec_hdmi bnep uvcvideo 
videobuf2_vmalloc rtsx_usb_sdmmc videobuf2_memops btusb rtsx_usb_ms 
videobuf2_v4l2 btrtl mmc_core memstick btbcm videodev btintel 
videobuf2_common rtsx_usb bluetooth usbhid

Re: [Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed

2018-01-26 Thread Tobias Klausmann

Well, fixing the return of wrong values in this function is reasonable in
any case. Of course, not reading the memory in the first place would be
nicer, but deciding that is IMHO not in the scope of a temp_get function;
it belongs somewhere in the code calling temp_get.


On 1/26/18 3:03 PM, Karol Herbst wrote:

well I just tried to say that you are not fixing the issue you think
you're fixing. In your case the GPU is powered off and you get garbage
values from any mmio read, so parsing those values is just wrong and
we need to prevent doing anything on the hw whenever it is powered off,
directly in hwmon.

On Fri, Jan 26, 2018 at 2:40 PM, Tobias Klausmann
<tobias.johannes.klausm...@mni.thm.de> wrote:

Not sure if I understand completely what you intend to say here. With this
we prevent hwmon from reporting utterly wrong temperature values by returning
an error (we could return -EBUSY or something instead, granted), yet if the
device is shadowed, getting a sane temp value out of it seems unlikely to
me!

Greetings,

Tobias


On 1/26/18 12:40 PM, Karol Herbst wrote:

no, we can't do that. We actually have to prevent this from hwmon. The
issue here is, that the reg read returns 0xffffffff and parsing that
is the first step in the first place.

On Thu, Jan 25, 2018 at 7:16 PM, Tobias Klausmann
<tobias.johannes.klausm...@mni.thm.de> wrote:

This fixes wrong temperature outputs e.g. 511°C if the card is asleep.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
   drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
index 9f0dea3f61dc..45d0ec632b5a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
@@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm)
  u32 inttemp = (tsensor & 0x0001fff8);

  /* device SHADOWed */
-   if (tsensor & 0x4000)
+   if (tsensor & 0x4000) {
  nvkm_trace(subdev, "reading temperature from SHADOWed
sensor\n");
+   return -ENODEV;
+   }

  /* device valid */
  if (tsensor & 0x2000)
--
2.16.1
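
The 511°C figure from the thread can be reproduced with a little
arithmetic. The following is a minimal sketch, assuming that a register
read from a powered-off device returns all ones (0xffffffff) and that the
integer temperature is the masked value shifted right by 8; the mask
0x0001fff8 is taken from the patch context, while the shift, the all-ones
check and the function names are assumptions made for illustration.
Checking for all ones is also only a stand-in for the power-state check
Karol asks for at the hwmon level.

```c
#include <errno.h>
#include <stdint.h>

#define TSENSOR_TEMP_MASK  0x0001fff8u /* from the patch context */
#define TSENSOR_TEMP_SHIFT 8           /* assumption, see text above */
#define MMIO_ALL_ONES      0xffffffffu /* assumed read value when powered off */

/* Decode the temperature without looking at the device state. */
static int decode_temp_unchecked(uint32_t tsensor)
{
	return (int)((tsensor & TSENSOR_TEMP_MASK) >> TSENSOR_TEMP_SHIFT);
}

/* Refuse to decode a value that cannot have come from a live sensor. */
static int decode_temp_checked(uint32_t tsensor)
{
	if (tsensor == MMIO_ALL_ONES)
		return -ENODEV;
	return decode_temp_unchecked(tsensor);
}
```

With these assumptions, (0xffffffff & 0x0001fff8) >> 8 is 0x1ff, i.e. 511,
which matches the bogus reading reported while the card is asleep.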


Re: [Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed

2018-01-26 Thread Tobias Klausmann

Not sure if I understand completely what you intend to say here. With
this we prevent hwmon from reporting utterly wrong temperature values by
returning an error (we could return -EBUSY or something instead,
granted), yet if the device is shadowed, getting a sane temp value out
of it seems unlikely to me!


Greetings,

Tobias

On 1/26/18 12:40 PM, Karol Herbst wrote:

no, we can't do that. We actually have to prevent this from hwmon. The
issue here is, that the reg read returns 0xffffffff and parsing that
is the first step in the first place.

On Thu, Jan 25, 2018 at 7:16 PM, Tobias Klausmann
<tobias.johannes.klausm...@mni.thm.de> wrote:

This fixes wrong temperature outputs e.g. 511°C if the card is asleep.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
  drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
index 9f0dea3f61dc..45d0ec632b5a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
@@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm)
 u32 inttemp = (tsensor & 0x0001fff8);

 /* device SHADOWed */
-   if (tsensor & 0x4000)
+   if (tsensor & 0x4000) {
 nvkm_trace(subdev, "reading temperature from SHADOWed 
sensor\n");
+   return -ENODEV;
+   }

 /* device valid */
 if (tsensor & 0x2000)
--
2.16.1


[Nouveau] [PATCH v2] nv50/ir: Initialize all members of GCRA (trivial)

2017-12-30 Thread Tobias Klausmann

v2: use initialization list (Pierre)

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
Reviewed-by: Pierre Moreau <pierre.mor...@free.fr>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index 361918a161..a70a54f6b8 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -1144,7 +1144,9 @@ GCRA::RIG_Node::addRegPreference(RIG_Node *node)
 GCRA::GCRA(Function *fn, SpillCodeInserter& spill) :
func(fn),
regs(fn->getProgram()->getTarget()),
-   spill(spill)
+   spill(spill),
+   nodeCount(0),
+   nodes(NULL)
 {
prog = func->getProgram();
 
-- 
2.15.1


Re: [Nouveau] nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152

2017-12-18 Thread Tobias Klausmann



On 12/18/17 7:06 PM, Mike Galbraith wrote:

Greetings,

Kernel bound workloads seem to trigger the below for whatever reason.
  I only see this when beating up NFS.  There was a kworker wakeup
latency issue, but with a bandaid applied to fix that up, I can still
trigger this.



Hi,

I have seen this one as well on my system, but I could not find an
easy way to trigger it for bisecting purposes. If you can trigger it
conveniently, a bisect would be nice!


Greetings,

Tobias




[ 1313.811031] nouveau 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 1313.811035] swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
[ 1313.811038] CPU: 6 PID: 3026 Comm: Xorg Tainted: GE
4.15.0.g1291a0d5-master #355
[ 1313.811040] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[ 1313.811041] Call Trace:
[ 1313.811049]  dump_stack+0x7c/0xb6
[ 1313.811053]  swiotlb_alloc_coherent+0x13f/0x150
[ 1313.811060]  ttm_dma_pool_alloc_new_pages+0x106/0x3c0 [ttm]
[ 1313.811066]  ttm_dma_pool_get_pages+0x10a/0x1e0 [ttm]
[ 1313.811070]  ttm_dma_populate+0x21f/0x2f0 [ttm]
[ 1313.811075]  ttm_tt_bind+0x2f/0x60 [ttm]
[ 1313.811079]  ttm_bo_handle_move_mem+0x51f/0x580 [ttm]
[ 1313.811084]  ? ttm_bo_handle_move_mem+0x5/0x580 [ttm]
[ 1313.811088]  ttm_bo_validate+0x10c/0x120 [ttm]
[ 1313.811092]  ? ttm_bo_validate+0x5/0x120 [ttm]
[ 1313.811106]  ? drm_mode_setcrtc+0x20e/0x540 [drm]
[ 1313.811109]  ttm_bo_init_reserved+0x290/0x490 [ttm]
[ 1313.84]  ttm_bo_init+0x52/0xb0 [ttm]
[ 1313.811141]  ? nv10_bo_put_tile_region+0x60/0x60 [nouveau]
[ 1313.811163]  nouveau_bo_new+0x465/0x5e0 [nouveau]
[ 1313.811184]  ? nv10_bo_put_tile_region+0x60/0x60 [nouveau]
[ 1313.811203]  nouveau_gem_new+0x66/0x110 [nouveau]
[ 1313.811223]  ? nouveau_gem_new+0x110/0x110 [nouveau]
[ 1313.811241]  nouveau_gem_ioctl_new+0x48/0xc0 [nouveau]
[ 1313.811249]  drm_ioctl_kernel+0x64/0xb0 [drm]
[ 1313.811257]  drm_ioctl+0x2a4/0x360 [drm]
[ 1313.811276]  ? nouveau_gem_new+0x110/0x110 [nouveau]
[ 1313.811285]  ? drm_ioctl+0x5/0x360 [drm]
[ 1313.811304]  nouveau_drm_ioctl+0x50/0xb0 [nouveau]
[ 1313.811308]  do_vfs_ioctl+0x90/0x690
[ 1313.811311]  ? do_vfs_ioctl+0x5/0x690
[ 1313.811313]  SyS_ioctl+0x3b/0x70
[ 1313.811316]  entry_SYSCALL_64_fastpath+0x1f/0x91
[ 1313.811320] RIP: 0033:0x7f3234746227
[ 1313.811321] RSP: 002b:7ffc3ace0408 EFLAGS: 3246 ORIG_RAX: 
0010
[ 1313.811324] RAX: ffda RBX: 025515d0 RCX: 7f3234746227
[ 1313.811325] RDX: 7ffc3ace0460 RSI: c0306480 RDI: 000b
[ 1313.811326] RBP: 00824120 R08: 02548f80 R09: 025490d0
[ 1313.811328] R10:  R11: 3246 R12: 093d
[ 1313.811329] R13: 02aff74c R14: 00824150 R15: 


Re: [Nouveau] [PATCH] Accept 3d controllers and not only VGA controllers.

2017-12-14 Thread Tobias Klausmann



On 12/3/17 8:56 PM, Josef Larsson wrote:

Sure, I can easily split it into two commits, but I would like to have
an OK on the actual code changes before splitting the patch.

Best regards,

Josef Larsson


On 2017-11-11 01:05, Tobias Klausmann wrote:

On 11/10/17 7:49 PM, Josef Larsson wrote:

Accept 3d controllers and not only VGA controllers. According to Ilia
Mirkin, the VGA controller check should be removed. This makes it possible
to use external connectors on a docking station (40A5) for a Thinkpad P51.
(See Bug 101778).

lspci example:

01:00.0 3D controller: NVIDIA Corporation GM107GLM [Quadro M1200 Mobile] (rev a2)

Also include safe-guards to avoid NULL dereferencing of fbcon, which is
how this bug was found.
---
   drivers/gpu/drm/nouveau/nouveau_fbcon.c |  3 +--
   drivers/gpu/drm/nouveau/nv50_display.c  | 13 +
   2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
index 2b12d82aac15..6b4d374a9d82 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
@@ -498,8 +498,7 @@ nouveau_fbcon_init(struct drm_device *dev)
   int preferred_bpp;
   int ret;
   -    if (!dev->mode_config.num_crtc ||
-        (dev->pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
+    if (!dev->mode_config.num_crtc)
       return 0;
     fbcon = kzalloc(sizeof(struct nouveau_fbdev), GFP_KERNEL);
diff --git a/drivers/gpu/drm/nouveau/nv50_display.c
b/drivers/gpu/drm/nouveau/nv50_display.c
index fb47d46050ec..061daf036407 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -3214,6 +3214,13 @@ nv50_mstm_destroy_connector(struct
drm_dp_mst_topology_mgr *mgr,
   struct nouveau_drm *drm = nouveau_drm(connector->dev);
   struct nv50_mstc *mstc = nv50_mstc(connector);
   +    if (!drm->fbcon)
+    {
+        NV_WARN(drm, "drm->fbcon of %s point to NULL. Will not destroy
connector\n",
+            connector->name);
+        return;
+    }
+
   drm_connector_unregister(&mstc->connector);
     drm_modeset_lock_all(drm->dev);
@@ -3229,6 +3236,12 @@ nv50_mstm_register_connector(struct drm_connector
*connector)
   {
   struct nouveau_drm *drm = nouveau_drm(connector->dev);
   +    if (!drm->fbcon)
+    {
+        NV_WARN(drm, "drm->fbcon of %s point to NULL. Will not register
connector\n",
+            connector->name);
+        return;
+    }
   drm_modeset_lock_all(drm->dev);
   drm_fb_helper_add_one_connector(&drm->fbcon->helper, connector);
   drm_modeset_unlock_all(drm->dev);



Hi,

the patch looks OK to me, yet as noted on IRC, I'd like to have this
patch split into two and have the ->fbcon check as a precondition to
the 3D controller part. But let's see what the other, more clever
people think about it! :)

Greetings,

Tobias





Ping,

adding Ben Skeggs and Dave Airlie to CC, maybe this will get this
little one committed!


Greetings,

Tobias



Re: [Nouveau] [RFC PATCH] gr: did you try turning it off and on again.

2017-11-28 Thread Tobias Klausmann


Hi,

comments inline

On 11/28/17 2:11 PM, Karol Herbst wrote:

Fixes secure boot on my gp107. No idea why. Otherwise the GPU enters
complete lockdown after starting the gpccs and fecs with the LS images
loaded.

Signed-off-by: Karol Herbst 
---
  drm/nouveau/nvkm/engine/gr/gf100.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drm/nouveau/nvkm/engine/gr/gf100.c 
b/drm/nouveau/nvkm/engine/gr/gf100.c
index 2f8dc107..322d9fa6 100644
--- a/drm/nouveau/nvkm/engine/gr/gf100.c
+++ b/drm/nouveau/nvkm/engine/gr/gf100.c
@@ -1731,8 +1731,15 @@ gf100_gr_init_(struct nvkm_gr *base)
  {
struct gf100_gr *gr = gf100_gr(base);
struct nvkm_subdev *subdev = >engine.subdev;
+   struct nvkm_device *device = subdev->device;
u32 ret;
  
+	/* did you try turning it off and on again? Apparently we need this

+* on pascal, otherwise secboot will just fail.



The comment about turning it off and on looks silly; at least put it in
quotation marks, or rewrite it, e.g. "Apparently we need to turn it off
and on for the Pascal generation, otherwise secboot will just fail."



+*/
+   nvkm_mask(device, 0x200, 0x1000, 0x00000000);
+   nvkm_mask(device, 0x200, 0x1000, 0x1000);
+



It is needed on Pascal, but does it harm other generations that call
this init? Maybe guard it against execution on Maxwell.



Greetings,

Tobias



nvkm_pmu_pgob(gr->base.engine.subdev.device->pmu, false);
  
  	ret = nvkm_falcon_get(gr->fecs, subdev);
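
The guard suggested in the review above could look roughly like the
following user-space sketch. Everything in it is hypothetical: struct
fake_device, fake_mask(), the is_pascal flag and ENGINE_ENABLE_BIT are
invented for illustration, and fake_mask() only imitates a
read-modify-write helper in the style of nvkm_mask() (clear the masked
bits, OR in the new data, return the old value).

```c
#include <stdint.h>

/* Hypothetical one-register "device" so the sketch runs in user space. */
struct fake_device {
	uint32_t reg;
	int is_pascal;
};

/* Illustrative enable bit, not taken from hardware documentation. */
#define ENGINE_ENABLE_BIT 0x00001000u

/* Read-modify-write: clear the bits in mask, OR in data, and return
 * the previous register value. */
static uint32_t fake_mask(struct fake_device *dev, uint32_t mask, uint32_t data)
{
	uint32_t old = dev->reg;

	dev->reg = (old & ~mask) | data;
	return old;
}

/* Toggle the enable bit off and on, but only on the generation that is
 * known to need it. Returns the number of register writes performed. */
static int reset_engine_if_needed(struct fake_device *dev)
{
	if (!dev->is_pascal)
		return 0;

	fake_mask(dev, ENGINE_ENABLE_BIT, 0);                 /* off */
	fake_mask(dev, ENGINE_ENABLE_BIT, ENGINE_ENABLE_BIT); /* on again */
	return 2;
}
```

Keeping the generation check next to the toggle leaves other chipsets on
their existing init path until someone confirms the reset is harmless
there.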


[Nouveau] [PATCH v3] nouveau/compiler: Allow to omit line numbers when printing instructions

2017-11-24 Thread Tobias Klausmann

This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff!

V2:
 - Use an environment variable (Karol Herbst)
V3:
 - Use the already populated nv50_ir_prog_info to forward information to the
   print pass (Pierre Moreau)

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h  |  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 12 +---
 src/gallium/drivers/nouveau/nouveau_compiler.c|  2 ++
 src/gallium/drivers/nouveau/nv50/nv50_program.c   |  2 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c   |  2 ++
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index ffd53c9cd3..604a22ba89 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -82,6 +82,7 @@ struct nv50_ir_prog_info
 
uint8_t optLevel; /* optimization level (0 to 3) */
uint8_t dbgFlags;
+   bool omitLineNum;
 
struct {
   int16_t maxGPR; /* may be -1 if none used */
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index 9145801b62..eb7e9057b5 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -691,7 +691,7 @@ void Instruction::print() const
 class PrintPass : public Pass
 {
 public:
-   PrintPass() : serial(0) { }
+   PrintPass(bool omitLineNum = false) : serial(0), omit_serial(omitLineNum) { 
}
 
virtual bool visit(Function *);
virtual bool visit(BasicBlock *);
@@ -699,6 +699,7 @@ public:
 
 private:
int serial;
+   bool omit_serial;
 };
 
 bool
@@ -762,7 +763,12 @@ PrintPass::visit(BasicBlock *bb)
 bool
 PrintPass::visit(Instruction *insn)
 {
-   INFO("%3i: ", serial++);
+   if (omit_serial) {
+  INFO(" ");
+  serial++;
+   }
+   else
+  INFO("%3i: ", serial++);
insn->print();
return true;
 }
@@ -777,7 +783,7 @@ Function::print()
 void
 Program::print()
 {
-   PrintPass pass;
+   PrintPass pass(driver->omitLineNum);
init_colours();
pass.run(this, true, false);
 }
diff --git a/src/gallium/drivers/nouveau/nouveau_compiler.c 
b/src/gallium/drivers/nouveau/nouveau_compiler.c
index 3151a6f420..1214cf3565 100644
--- a/src/gallium/drivers/nouveau/nouveau_compiler.c
+++ b/src/gallium/drivers/nouveau/nouveau_compiler.c
@@ -122,6 +122,8 @@ nouveau_codegen(int chipset, int type, struct tgsi_token 
tokens[],
 
info.optLevel = debug_get_num_option("NV50_PROG_OPTIMIZE", 3);
info.dbgFlags = debug_get_num_option("NV50_PROG_DEBUG", 0);
+   info.omitLineNum =
+ debug_get_num_option("NV50_PROG_DEBUG_OMIT_LINENUM", 0) ? true : 
false;
 
ret = nv50_ir_generate_code();
if (ret) {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 6e943a3d94..fb5c9ed777 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -367,6 +367,8 @@ nv50_program_translate(struct nv50_program *prog, uint16_t 
chipset,
 #ifdef DEBUG
info->optLevel = debug_get_num_option("NV50_PROG_OPTIMIZE", 3);
info->dbgFlags = debug_get_num_option("NV50_PROG_DEBUG", 0);
+   info->omitLineNum =
+ debug_get_num_option("NV50_PROG_DEBUG_OMIT_LINENUM", 0) ? true : 
false;
 #else
info->optLevel = 3;
 #endif
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index c95a96c717..8dced66437 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -575,6 +575,8 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t 
chipset,
info->target = debug_get_num_option("NV50_PROG_CHIPSET", chipset);
info->optLevel = debug_get_num_option("NV50_PROG_OPTIMIZE", 3);
info->dbgFlags = debug_get_num_option("NV50_PROG_DEBUG", 0);
+   info->omitLineNum =
+ debug_get_num_option("NV50_PROG_DEBUG_OMIT_LINENUM", 0) ? true : 
false;
 #else
info->optLevel = 3;
 #endif
-- 
2.15.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
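
As a rough illustration of the patch above, the sketch below shows how an environment variable can switch the serial-number prefix off so that two dumps diff cleanly. It is a standalone sketch, not the Mesa code: the helpers `omitLineNumFromEnv` and `formatLine` and the fixed five-column padding are assumptions made for this example; only the variable name NV50_PROG_DEBUG_OMIT_LINENUM comes from the patch.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: any non-zero numeric value of the variable means "omit".
static bool omitLineNumFromEnv()
{
   const char *val = std::getenv("NV50_PROG_DEBUG_OMIT_LINENUM");
   return val && std::atoi(val) != 0;
}

// Hypothetical helper: build one output line for an instruction.
// The serial advances even when it is not printed, as in the patch.
static std::string formatLine(int &serial, bool omit, const std::string &insn)
{
   char prefix[16];
   if (omit)
      std::snprintf(prefix, sizeof(prefix), "%5s", "");       // padding only
   else
      std::snprintf(prefix, sizeof(prefix), "%3i: ", serial); // e.g. "  7: "
   ++serial;
   return std::string(prefix) + insn;
}
```

A caller would read the flag once with `omitLineNumFromEnv()` and pass it to `formatLine()` for every instruction. With the flag set, inserting or removing one instruction no longer renumbers every following line, so diff only reports the lines that really changed.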

[Nouveau] [PATCH v2] nouveau/compiler: Allow to omit line numbers when printing instructions

2017-11-17 Thread Tobias Klausmann

This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff!

V2:
 - Use an environment variable (Karol Herbst)

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp|  6 +++---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h   |  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp  | 14 ++
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp |  2 +-
 src/gallium/drivers/nouveau/nouveau_compiler.c |  3 +++
 src/gallium/drivers/nouveau/nv50/nv50_program.c|  2 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c|  2 ++
 8 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
index e9363101bf..4bf6c73837 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
@@ -1249,7 +1249,7 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info)
if (ret < 0)
   goto out;
if (prog->dbgFlags & NV50_IR_DEBUG_VERBOSE)
-  prog->print();
+  prog->print(info->omitLineNum);
 
targ->parseDriverInfo(info);
prog->getTarget()->runLegalizePass(prog, nv50_ir::CG_STAGE_PRE_SSA);
@@ -1257,13 +1257,13 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info)
prog->convertToSSA();
 
if (prog->dbgFlags & NV50_IR_DEBUG_VERBOSE)
-  prog->print();
+  prog->print(info->omitLineNum);
 
prog->optimizeSSA(info->optLevel);
prog->getTarget()->runLegalizePass(prog, nv50_ir::CG_STAGE_SSA);
 
if (prog->dbgFlags & NV50_IR_DEBUG_BASIC)
-  prog->print();
+  prog->print(info->omitLineNum);
 
if (!prog->registerAllocation()) {
   ret = -4;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index f2ce16d882..a3c7fd2f94 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -1249,7 +1249,7 @@ public:
Program(Type type, Target *targ);
~Program();
 
-   void print();
+   void print(bool omitLineNum);
 
Type getType() const { return progType; }
 
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index ffd53c9cd3..604a22ba89 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -82,6 +82,7 @@ struct nv50_ir_prog_info
 
uint8_t optLevel; /* optimization level (0 to 3) */
uint8_t dbgFlags;
+   bool omitLineNum;
 
struct {
   int16_t maxGPR; /* may be -1 if none used */
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index 9145801b62..d6fe928af4 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -691,7 +691,7 @@ void Instruction::print() const
 class PrintPass : public Pass
 {
 public:
-   PrintPass() : serial(0) { }
+   PrintPass(bool omitLineNum = false) : serial(0), omit_serial(omitLineNum) { 
}
 
virtual bool visit(Function *);
virtual bool visit(BasicBlock *);
@@ -699,6 +699,7 @@ public:
 
 private:
int serial;
+   bool omit_serial;
 };
 
 bool
@@ -762,7 +763,12 @@ PrintPass::visit(BasicBlock *bb)
 bool
 PrintPass::visit(Instruction *insn)
 {
-   INFO("%3i: ", serial++);
+   if (omit_serial) {
+  INFO(" ");
+  serial++;
+   }
+   else
+  INFO("%3i: ", serial++);
insn->print();
return true;
 }
@@ -775,9 +781,9 @@ Function::print()
 }
 
 void
-Program::print()
+Program::print(bool omitLineNum)
 {
-   PrintPass pass;
+   PrintPass pass(omitLineNum);
init_colours();
pass.run(this, true, false);
 }
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
index 298e7c6ef9..96ad70d28a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
@@ -371,7 +371,7 @@ Program::emitBinary(struct nv50_ir_prog_info *info)
emit->prepareEmission(this);
 
if (dbgFlags & NV50_IR_DEBUG_BASIC)
-  this->print();
+  this->print(info->omitLineNum);
 
if (!binSize) {
   code = NULL;
diff --git a/src/gallium/drivers/nouveau/nouveau_compiler.c 
b/src/gallium/drivers/nouveau/nouveau_compiler.c
index 3151a6f420..20a4966433 100644
--- a/src/gallium/drivers/nouveau/nouveau_compiler.c
+++ b/src/gallium/drivers/nouveau/nouveau_compiler.c
@@ -122,6 +122,8 @@ nouveau_codegen(int chipset, int type, struct tgsi_token 
tokens[],
 
info.optLevel = debug_get_num_option(

Re: [Nouveau] [PATCH] nouveau/codegen: dump tgsi floats as hex values

2017-11-16 Thread Tobias Klausmann



On 11/16/17 1:30 PM, Karol Herbst wrote:

the problem is that you also need to be able to save the TGSI into a
file and run it through nouveau_compiler. Not really sure if it is
worth the effort. Printing hex instead of numbers makes more sense in
this regard anyhow, because we are more precise and are able to
debug some issues much better in the end. As long as the new version
is still correctly parsed with nouveau_compiler,



Yes, it is still parsed correctly!



  this change is
acked-by me.

On Wed, Nov 15, 2017 at 10:52 PM, Tobias Klausmann
<tobias.johannes.klausm...@mni.thm.de> wrote:

Hi,

yeah, in the long run showing both in an ordered manner would be a nice thing
to have! That would include patching the output and the tgsi parser (who
wants to delete half the output to parse it again, e.g. with
nouveau_compiler?).

I can imagine an output similar to the one below:

IMM[5] FLT32 {0., 0., 0., 0.} ^ IMM[5] FLT32
{0x0019, 0x000f, 0x0005, 0x001e}
IMM[6] FLT32 {0., 0., 0., 0.} = IMM[6] FLT32
{0x001e, 0x0005, 0x000a, 0x0014}
IMM[7] FLT32 {0., 0., 0., 0.}   IMM[7] FLT32
{0x0014, 0x000a, 0x000f, 0x0019}

Greetings,

Tobias


PS: I have no push rights to commit this!



On 11/15/17 10:44 PM, Pierre Moreau wrote:

This looks like the saner approach, compared to changing tgsi_dump.c to
display
more fractional digits. Maybe there could be a second option to display as
both
float and hex?

Reviewed-by: Pierre Moreau <pierre.mor...@free.fr>

On 2017-11-14 — 15:11, Tobias Klausmann wrote:

Printing without this could lead to the following output, while the
values are
not exactly zero:
IMM[5] FLT32 {0., 0., 0., 0.}
IMM[6] FLT32 {0., 0., 0., 0.}
IMM[7] FLT32 {0., 0., 0., 0.}

when printing the values as hex, we can now see the differences:
IMM[5] FLT32 {0x0019, 0x000f, 0x0005, 0x001e}
IMM[6] FLT32 {0x001e, 0x0005, 0x000a, 0x0014}
IMM[7] FLT32 {0x0014, 0x000a, 0x000f, 0x0019}

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
   src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 34351dab51..898031811d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1095,7 +1095,7 @@ Source::Source(struct nv50_ir_prog_info *prog) :
info(prog)
  tokens = (const struct tgsi_token *)info->bin.source;
if (prog->dbgFlags & NV50_IR_DEBUG_BASIC)
-  tgsi_dump(tokens, 0);
+  tgsi_dump(tokens, TGSI_DUMP_FLOAT_AS_HEX);
   }
 Source::~Source()
--
2.15.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
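
The point of the thread above can be reproduced outside of Mesa. The sketch below is a standalone illustration, not the tgsi_dump code: the helper names and the bit patterns are assumptions chosen for the example. It shows that very small non-zero floats all print as zero with a few fractional digits, while their raw bit patterns remain distinguishable.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Return the raw IEEE-754 bit pattern of a 32-bit float.
static uint32_t floatBits(float f)
{
   static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");
   uint32_t bits;
   std::memcpy(&bits, &f, sizeof(bits));
   return bits;
}

// Build a float from a raw bit pattern.
static float floatFromBits(uint32_t bits)
{
   float f;
   std::memcpy(&f, &bits, sizeof(f));
   return f;
}

// Decimal rendering with four fractional digits.
static std::string asDecimal(float f)
{
   char buf[64];
   std::snprintf(buf, sizeof(buf), "%.4f", f);
   return buf;
}

// Hexadecimal rendering of the bit pattern.
static std::string asHex(float f)
{
   char buf[16];
   std::snprintf(buf, sizeof(buf), "0x%08x", static_cast<unsigned>(floatBits(f)));
   return buf;
}
```

For example, the floats built from the bit patterns 0x19 and 0x1e are different denormal values, yet `asDecimal()` renders both as 0.0000, whereas `asHex()` keeps them apart. The hex form also round-trips exactly, which matters when the dump is fed back into a parser.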

Re: [Nouveau] [PATCH] nouveau/codegen: dump tgsi floats as hex values

2017-11-15 Thread Tobias Klausmann


Hi,

yeah, in the long run showing both in an ordered manner would be a nice 
thing to have! That would include patching the output and the tgsi 
parser (who wants to delete half the output to parse it again, e.g. with 
nouveau_compiler?).


I can imagine an output similar to the one below:

IMM[5] FLT32 {0., 0., 0., 0.} ^ IMM[5] FLT32 
{0x0019, 0x000f, 0x0005, 0x001e}
IMM[6] FLT32 {0., 0., 0., 0.} = IMM[6] FLT32 
{0x001e, 0x0005, 0x000a, 0x0014}
IMM[7] FLT32 {0., 0., 0., 0.}   IMM[7] FLT32 
{0x0014, 0x000a, 0x000f, 0x0019}

Greetings,

Tobias


PS: I have no push rights to commit this!


On 11/15/17 10:44 PM, Pierre Moreau wrote:

This looks like the saner approach, compared to changing tgsi_dump.c to display
more fractional digits. Maybe there could be a second option to display as both
float and hex?

Reviewed-by: Pierre Moreau <pierre.mor...@free.fr>

On 2017-11-14 — 15:11, Tobias Klausmann wrote:

Printing without this could lead to the following output, while the values are
not exactly zero:
IMM[5] FLT32 {0., 0., 0., 0.}
IMM[6] FLT32 {0., 0., 0., 0.}
IMM[7] FLT32 {0., 0., 0., 0.}

when printing the values as hex, we can now see the differences:
IMM[5] FLT32 {0x0019, 0x000f, 0x0005, 0x001e}
IMM[6] FLT32 {0x001e, 0x0005, 0x000a, 0x0014}
IMM[7] FLT32 {0x0014, 0x000a, 0x000f, 0x0019}

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 34351dab51..898031811d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1095,7 +1095,7 @@ Source::Source(struct nv50_ir_prog_info *prog) : 
info(prog)
 tokens = (const struct tgsi_token *)info->bin.source;
  
 if (prog->dbgFlags & NV50_IR_DEBUG_BASIC)

-  tgsi_dump(tokens, 0);
+  tgsi_dump(tokens, TGSI_DUMP_FLOAT_AS_HEX);
  }
  
  Source::~Source()

--
2.15.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH] nouveau/codegen: dump tgsi floats as hex values

2017-11-15 Thread Tobias Klausmann


ping!

On 11/14/17 3:11 PM, Tobias Klausmann wrote:

Printing without this could lead to the following output, while the values are
not exactly zero:
IMM[5] FLT32 {0., 0., 0., 0.}
IMM[6] FLT32 {0., 0., 0., 0.}
IMM[7] FLT32 {0., 0., 0., 0.}

when printing the values as hex, we can now see the differences:
IMM[5] FLT32 {0x0019, 0x000f, 0x0005, 0x001e}
IMM[6] FLT32 {0x001e, 0x0005, 0x000a, 0x0014}
IMM[7] FLT32 {0x0014, 0x000a, 0x000f, 0x0019}

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 34351dab51..898031811d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1095,7 +1095,7 @@ Source::Source(struct nv50_ir_prog_info *prog) : 
info(prog)
 tokens = (const struct tgsi_token *)info->bin.source;
  
 if (prog->dbgFlags & NV50_IR_DEBUG_BASIC)

-  tgsi_dump(tokens, 0);
+  tgsi_dump(tokens, TGSI_DUMP_FLOAT_AS_HEX);
  }
  
  Source::~Source()

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [RFC PATCH] nouveau/compiler: Allow to omit line numbers when printing instructions

2017-11-14 Thread Tobias Klausmann

This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff!

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp|  6 +++---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h   |  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp  | 12 
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp |  2 +-
 src/gallium/drivers/nouveau/nouveau_compiler.c |  8 ++--
 src/gallium/drivers/nouveau/nv50/nv50_program.c|  1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c|  1 +
 8 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
index e9363101bf..4bf6c73837 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
@@ -1249,7 +1249,7 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info)
if (ret < 0)
   goto out;
if (prog->dbgFlags & NV50_IR_DEBUG_VERBOSE)
-  prog->print();
+  prog->print(info->omitLineNum);
 
targ->parseDriverInfo(info);
prog->getTarget()->runLegalizePass(prog, nv50_ir::CG_STAGE_PRE_SSA);
@@ -1257,13 +1257,13 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info)
prog->convertToSSA();
 
if (prog->dbgFlags & NV50_IR_DEBUG_VERBOSE)
-  prog->print();
+  prog->print(info->omitLineNum);
 
prog->optimizeSSA(info->optLevel);
prog->getTarget()->runLegalizePass(prog, nv50_ir::CG_STAGE_SSA);
 
if (prog->dbgFlags & NV50_IR_DEBUG_BASIC)
-  prog->print();
+  prog->print(info->omitLineNum);
 
if (!prog->registerAllocation()) {
   ret = -4;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index f2ce16d882..a3c7fd2f94 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -1249,7 +1249,7 @@ public:
Program(Type type, Target *targ);
~Program();
 
-   void print();
+   void print(bool omitLineNum);
 
Type getType() const { return progType; }
 
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index ffd53c9cd3..604a22ba89 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -82,6 +82,7 @@ struct nv50_ir_prog_info
 
uint8_t optLevel; /* optimization level (0 to 3) */
uint8_t dbgFlags;
+   bool omitLineNum;
 
struct {
   int16_t maxGPR; /* may be -1 if none used */
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index f5253b3745..a42fb44940 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -689,7 +689,7 @@ void Instruction::print() const
 class PrintPass : public Pass
 {
 public:
-   PrintPass() : serial(0) { }
+   PrintPass(bool omitLineNum = false) : serial(0), omit_serial(omitLineNum) { 
}
 
virtual bool visit(Function *);
virtual bool visit(BasicBlock *);
@@ -697,6 +697,7 @@ public:
 
 private:
int serial;
+   bool omit_serial;
 };
 
 bool
@@ -760,7 +761,10 @@ PrintPass::visit(BasicBlock *bb)
 bool
 PrintPass::visit(Instruction *insn)
 {
-   INFO("%3i: ", serial++);
+   if (omit_serial)
+  INFO("   ");
+   else
+  INFO("%3i: ", serial++);
insn->print();
return true;
 }
@@ -773,9 +777,9 @@ Function::print()
 }
 
 void
-Program::print()
+Program::print(bool omitLineNum)
 {
-   PrintPass pass;
+   PrintPass pass(omitLineNum);
init_colours();
pass.run(this, true, false);
 }
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
index 298e7c6ef9..96ad70d28a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
@@ -371,7 +371,7 @@ Program::emitBinary(struct nv50_ir_prog_info *info)
emit->prepareEmission(this);
 
if (dbgFlags & NV50_IR_DEBUG_BASIC)
-  this->print();
+  this->print(info->omitLineNum);
 
if (!binSize) {
   code = NULL;
diff --git a/src/gallium/drivers/nouveau/nouveau_compiler.c 
b/src/gallium/drivers/nouveau/nouveau_compiler.c
index 3151a6f420..ed68031383 100644
--- a/src/gallium/drivers/nouveau/nouveau_compiler.c
+++ b/src/gallium/drivers/nouveau/nouveau_compiler.c
@@ -103,7 +103,7 @@ dummy_assign_slots(struct nv50_ir_prog_info *info)
 
 static int
 nouveau_codegen(int chipset, int type, struct tgsi_token tokens[],
-unsigned *size, unsigned **code) {

Re: [Nouveau] [PATCH] Accept 3d controllers and not only VGA controllers.

2017-11-10 Thread Tobias Klausmann



On 11/10/17 7:49 PM, Josef Larsson wrote:

Accept 3d controllers and not only VGA controllers. According to Ilia
Mirkin,
the VGA controller check should be removed. This makes it possible
to use external connectors on a docking station (40A5) for a Thinkpad P51.
(See Bug 101778).

lspci example:

01:00.0 3D controller: NVIDIA Corporation GM107GLM [Quadro M1200 Mobile]
(rev a2)

Also include safe-guards to avoid NULL dereferencing of fbcon, which is
how this bug was found.
---
  drivers/gpu/drm/nouveau/nouveau_fbcon.c |  3 +--
  drivers/gpu/drm/nouveau/nv50_display.c  | 13 +
  2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
index 2b12d82aac15..6b4d374a9d82 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
@@ -498,8 +498,7 @@ nouveau_fbcon_init(struct drm_device *dev)
  int preferred_bpp;
  int ret;
  
-    if (!dev->mode_config.num_crtc ||

-        (dev->pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
+    if (!dev->mode_config.num_crtc)
      return 0;
  
  fbcon = kzalloc(sizeof(struct nouveau_fbdev), GFP_KERNEL);

diff --git a/drivers/gpu/drm/nouveau/nv50_display.c
b/drivers/gpu/drm/nouveau/nv50_display.c
index fb47d46050ec..061daf036407 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -3214,6 +3214,13 @@ nv50_mstm_destroy_connector(struct
drm_dp_mst_topology_mgr *mgr,
  struct nouveau_drm *drm = nouveau_drm(connector->dev);
  struct nv50_mstc *mstc = nv50_mstc(connector);
  
+    if (!drm->fbcon)

+    {
+        NV_WARN(drm, "drm->fbcon of %s point to NULL. Will not destroy
connector\n",
+            connector->name);
+        return;
+    }
+
  drm_connector_unregister(&mstc->connector);
  
  drm_modeset_lock_all(drm->dev);

@@ -3229,6 +3236,12 @@ nv50_mstm_register_connector(struct drm_connector
*connector)
  {
  struct nouveau_drm *drm = nouveau_drm(connector->dev);
  
+    if (!drm->fbcon)

+    {
+        NV_WARN(drm, "drm->fbcon of %s point to NULL. Will not register
connector\n",
+            connector->name);
+        return;
+    }
  drm_modeset_lock_all(drm->dev);
  drm_fb_helper_add_one_connector(&drm->fbcon->helper, connector);
  drm_modeset_unlock_all(drm->dev);




Hi,

the patch looks OK to me, yet as noted on IRC, I'd like to have this 
patch split into two and have the ->fbcon check as a precondition to the 
3D controller part. But let's see what the other and more clever people 
think about it! :)


Greetings,

Tobias

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
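
To make the class check discussed above concrete, here is a small standalone sketch. The two constants carry the values from the kernel's include/linux/pci_ids.h; the helper functions and the CRTC count used in the example are hypothetical and only model the condition in nouveau_fbcon_init() before and after the patch.

```cpp
#include <cstdint>

// Values as defined in the Linux kernel's include/linux/pci_ids.h.
constexpr uint32_t PCI_CLASS_DISPLAY_VGA = 0x0300;
constexpr uint32_t PCI_CLASS_DISPLAY_3D  = 0x0302;

// The 24-bit class code is laid out as base class, subclass, programming
// interface (8 bits each). Shifting by 8 drops the programming interface.
static bool isVgaController(uint32_t classCode)
{
   return (classCode >> 8) == PCI_CLASS_DISPLAY_VGA;
}

static bool is3dController(uint32_t classCode)
{
   return (classCode >> 8) == PCI_CLASS_DISPLAY_3D;
}

// Model of the old condition: fbcon is only set up for VGA controllers.
static bool oldCheckAllowsFbcon(unsigned numCrtc, uint32_t classCode)
{
   return numCrtc != 0 && isVgaController(classCode);
}

// Model of the new condition: any device with at least one CRTC qualifies.
static bool newCheckAllowsFbcon(unsigned numCrtc)
{
   return numCrtc != 0;
}
```

A device that lspci lists as "3D controller" reports class code 0x030200, so the old condition skips fbcon for it even if it drives display heads, while the new condition only looks at the number of CRTCs.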

Re: [Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m

2017-09-13 Thread Tobias Klausmann

Hi,

the system fails to initialize your vbios using secure boot (I had a rare
chance to witness it again on my system). For now I have traced it to
acr_boot_falcon() in
"linux/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0148cdec.c", where it
returns -110, which is -ETIMEDOUT. You could try to increase the timeout
and see if it helps, similar to the following:


diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
index 77273b53672c..fc0cb187d80d 100644
--- a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
+++ b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c
@@ -326,7 +326,7 @@ nvkm_msgqueue_post(struct nvkm_msgqueue *priv, enum
msgqueue_msg_priority prio,
    int ret;
 
    if (wait_init && !wait_for_completion_timeout(>init_done,
-    msecs_to_jiffies(1000)))
+    msecs_to_jiffies(5000)))
    return -ETIMEDOUT;
 
    queue = priv->func->cmd_queue(priv, prio);

diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
index fec0273158f6..c2ae525a0780 100644
--- a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
+++ b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c
@@ -279,6 +279,7 @@ acr_boot_falcon(struct nvkm_msgqueue *priv, enum
nvkm_secboot_falcon falcon)
    u32 flags;
    u32 falcon_id;
    } cmd;
+   const struct nvkm_subdev *subdev = priv->falcon->owner;
 
    memset(&cmd, 0, sizeof(cmd));
 
@@ -290,7 +291,8 @@ acr_boot_falcon(struct nvkm_msgqueue *priv, enum
nvkm_secboot_falcon falcon)
    nvkm_msgqueue_post(priv, MSGQUEUE_MSG_PRIORITY_HIGH, ,
    acr_boot_falcon_callback, , true);
 
-   if (!wait_for_completion_timeout(,
msecs_to_jiffies(1000)))
+   nvkm_error(subdev, "waiting for timeout in acr_boot_falcon
(msgqueue_0137bca5)\n");
+   if (!wait_for_completion_timeout(,
msecs_to_jiffies(5000)))
    return -ETIMEDOUT;
 
    return 0;



On 9/13/17 11:37 AM, Nicolas Mercier wrote:
> I am still looking for a solution. I have hacked around in the code
> and found out the following:
> - Nouveau prefers using PCIE power management over ACPI Optimus calls.
> I tried to force it to use Optimus ACPI calls, but there was an error
> calling the ACPI method so it bails out and uses PCIE PM anyway.
> - I tried to debug the PCIE pm states which internally uses ACPI to
> turn power on/off. I could print different statuses here and there.
> When the power is switched off, ACPI calls turn the power off then the
> kernel successfully puts the device in state D3Cold (also turning off
> power to the PCI Express port). When waking up, ACPI turns the power
> on, apparently successfully (Device [PEGP] transitioned to D0). But a
> read from the PCI bus to get the power state & other flags return
> 65535 (~0) and the kernel fails to set the device in D0 (although ACPI
> claims it is in D0)
> The call to pci_raw_set_power_state (in drivers/pci/pci.c) seems to
> fail because pci_read_config_word returns "~0" (and does not return
> any error code)
>
> I have tried different things; if I use pcie_port_pm=off, the NVidia
> card goes to state D3Hot (if I am not mistaken, its PCIE port is still
> powered) but that did not fix it. I tried to turn on or off different
> PCI/PCIexpress features such as hotplug, PM and so on. The only thing
> that works is that PM is fully disabled, which equals to the device
> not being powered off, so that would be equivalent to nouveau.runpm=0,
> which is not helping a lot. I have tried to force pcie aspm by
> recompiling the ACPI table, still no luck.
>
> I am still taking a look, but it seems like the problem comes from the
> PCIExpress PM functions and ACPI, not directly from Nouveau
>
> /n
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
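
The timeout pattern touched by the diff above (wait for an event, give up with -ETIMEDOUT once the deadline passes) can be sketched in userspace as follows. This is only an analogue under stated assumptions: it polls a caller-supplied predicate instead of using the kernel's completion API, and `waitWithTimeout` is a hypothetical name.

```cpp
#include <cerrno>
#include <chrono>
#include <thread>

// Poll `done` until it returns true or `timeout` has elapsed.
// Returns 0 on success and -ETIMEDOUT if the deadline passes first.
template <typename Pred>
static int waitWithTimeout(Pred done, std::chrono::milliseconds timeout)
{
   const auto deadline = std::chrono::steady_clock::now() + timeout;
   while (!done()) {
      if (std::chrono::steady_clock::now() >= deadline)
         return -ETIMEDOUT;
      // Back off briefly so the loop does not spin at full speed.
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
   }
   return 0;
}
```

Raising the timeout from 1000 ms to 5000 ms, as the diff does, only helps if the event arrives late. If the event never arrives, the caller still gets -ETIMEDOUT (the -110 from the report), just four seconds later.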

Re: [Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m

2017-09-11 Thread Tobias Klausmann

Hi,

I remember seeing the same error once in a while on boot with an earlier
firmware version on a similar system (GP106), yet it does not happen
with newer versions. Maybe you could try to update the firmware to the
latest version from kernel-firmware.

As a small addition, as far as I remember you should not enable
_OSI(Linux), as it may break the system in some ways. If adding
_OSI(Linux) does not fix a specific bug for you, removing it from the
cmdline is a thing to test!

Greetings,

Tobias


On 9/11/17 4:54 PM, Nicolas Mercier wrote:
> Hi,
> I have an Optimus-enabled laptop with a GTX 1060m. I never got it to
> fully work with Nouveau even after Pascal support was added. I need to
> run the kernel with nouveau.runpm=0 to get it to work. Unfortunately
> without proper power management support, my laptop will run out of
> battery after about 1h30, so I'd love to get Optimus working.
>
> What I can see is that when the extra GPU is not in use, Nouveau will
> try to shut it off. This seems to work, as the indicator of the laptop
> changes to amber (discrete GPU in use) to blue (discrete GPU powered
> off), and the kernel log (attached) reports that the GPU went off.
>
> If I wake up the GPU (plugging in/out the external monitor, running a
> GL command with DRI_PRIME=1, or even lspci) will make the computer
> unresponsive and only a force shutdown will work, as no graphics
> command seems to be able to execute anymore. Nouveau (likely, vga
> switcheroo) tries to wake up the GPU but it seems to fail. The LED
> indicator does indicate the GPU has power though.
>
> Most of the time (but not often when I have an external monitor
> plugged into the discrete card before I boot the computer) I get a
> timeout during boot or during modprobe (see kernel log). Even with
> runpm=0, I can't seem to be able to run GL commands on it:
>
> yngwe@labarbara: % DRI_PRIME=1 glxinfo
> name of display: :0.0
> nvc0_screen_create:857 - Error allocating PGRAPH context for M2MF: -16
> libGL error: failed to create dri screen
> libGL error: failed to load driver: nouveau
> display: :0  screen: 0
> direct rendering: Yes
> ... follows up using the Intel card ...
>
> the error I get in the kernel in that case:
> [  201.612583] nouveau :01:00.0: gr: FECS falcon already acquired
> by gr!
> [  201.612586] nouveau :01:00.0: gr: init failed, -16
>
> It runs on Debian testing + the firmware from Ubuntu as Debian's
> firmware does not have the nvidia blobs, kernel was either 4.12/4.13
> release candidates from Debian repositories or kernel 4.13.1 self
> compiled. I always had exactly the same symptoms on all these kernels.
>
> /nicolas
>
>
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH] nv50/ra: Only increment DefValue counter if we are going to spill

2017-08-19 Thread Tobias Klausmann

This is in preparation of an upcoming patch changing how we keep track of the
defs.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index e4f38c8e46..5034f8f989 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -1750,8 +1750,7 @@ SpillCodeInserter::run(const std::list& lst)
   // multiple destinations that all need to be spilled (like OP_SPLIT).
   unordered_set to_del;
 
-  for (Value::DefIterator d = lval->defs.begin(); d != lval->defs.end();
-   ++d) {
+  for (Value::DefIterator d = lval->defs.begin(); d != lval->defs.end();) {
  Value *slot = mem ?
 static_cast(mem) : new_LValue(func, FILE_GPR);
  Value *tmp = NULL;
@@ -1787,13 +1786,13 @@ SpillCodeInserter::run(const std::list& lst)
  assert(defi);
  if (defi->isPseudo()) {
 d = lval->defs.erase(d);
---d;
 if (slot->reg.file == FILE_MEMORY_LOCAL)
to_del.insert(defi);
 else
defi->setDef(0, slot);
  } else {
 spill(defi, slot, dval);
+d++;
  }
   }
 
-- 
2.14.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
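
The loop shape the patch above moves to is the usual erase-while-iterating idiom: advance the iterator only when nothing was erased, and otherwise continue from the iterator returned by erase(). A minimal standalone sketch follows, assuming a std::list; `eraseMatching` is a hypothetical helper, not part of the Mesa code.

```cpp
#include <list>

// Erase every element for which `pred` returns true and report how many
// elements were removed. Each element is visited exactly once.
template <typename T, typename Pred>
static int eraseMatching(std::list<T> &items, Pred pred)
{
   int erased = 0;
   for (auto it = items.begin(); it != items.end();) {
      if (pred(*it)) {
         it = items.erase(it); // erase() returns the element after the erased one
         ++erased;
      } else {
         ++it; // only advance when nothing was erased
      }
   }
   return erased;
}
```

Compared with erasing, stepping back and letting the loop header step forward again, this form never decrements an iterator that may already point at the first element, which is not valid for standard containers.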

[Nouveau] [PATCH] nvc0: fix handling of inverted render condition

2017-08-17 Thread Tobias Klausmann

Whether we wait on an inverted rendering condition or not, we should not render
on a passed query.

This fixes the CTS test case 'KHR-GL45.conditional_render_inverted.functional'.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
index e92695bd6a..e6c7d5a3ad 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
@@ -132,7 +132,7 @@ nvc0_render_condition(struct pipe_context *pipe,
 else
cond = NVC0_3D_COND_MODE_RES_NON_ZERO;
  } else {
-cond = wait ? NVC0_3D_COND_MODE_EQUAL : NVC0_3D_COND_MODE_ALWAYS;
+cond = NVC0_3D_COND_MODE_EQUAL;
  }
  break;
   default:
-- 
2.14.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs

2017-08-16 Thread Tobias Klausmann

ping on this v2

On 8/13/17 3:02 AM, Tobias Klausmann wrote:
> When using builtin functions we have to move the inputs to registers $0 and
> $1. If one of the input values is an immediate, we fail to propagate the
> immediate:
>
> ...
> mov u32 $r477 0x0003 (0)
> ...
> mov u32 $r0 %r473 (0)
> mov u32 $r1 $r477 (0)
> call abs BUILTIN:0 (0)
> mov u32 %r495 $r1 (0)
> ...
>
> With this patch the immediate is propagated, potentially causing the first MOV
> to be superfluous, which we'd remove in that case:
>
> ...
>
> mov u32 $r0 %r473 (0)
> mov u32 $r1 0x0003 (0)
> call abs BUILTIN:0 (0)
> mov u32 %r495 $r1 (0)
> ...
>
> Shaderdb stats:
> total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
> total gprs used in shared programs: 582972 -> 582881 (-0.02%)
> total local used in shared programs   : 17960 -> 17960 (0.00%)
>
> localgpr   inst  bytes
> helped   0  91 112 112
>   hurt   0   0   0   0
>
> v2:
>  implement some changes proposed by imirkin, the manual deletion of the dead
>  mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call
>  to division function") as the potentially dead mov is unlinked properly,
>  causing later passes to not notice the mov op at all and thus not cleaning it
>  up. That makes up a big chunk of the regression the above commit caused.
>  Keep the deletion of the op where it is; deleting it later unnecessarily
>  blows up the size of the change.
>
> Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
> ---
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp   | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> index c8f0701572..7243b1d2e4 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
> int builtin;
>  
> bld.setPosition(i, false);
> -   bld.mkMovToReg(0, i->getSrc(0));
> -   bld.mkMovToReg(1, i->getSrc(1));
> +
> +   // Generate movs to the input regs for the call we want to generate
> +   for (int s = 0; i->srcExists(s); ++s) {
> +  Instruction *ld = i->getSrc(s)->getInsn();
> +  assert(ld->getSrc(0) != NULL);
> +  // check if we are moving an immediate, propagate it in that case
> +  if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
> +!(ld->src(0).getFile() == FILE_IMMEDIATE))
> + bld.mkMovToReg(s, i->getSrc(s));
> +  else {
> + bld.mkMovToReg(s, ld->getSrc(0));
> + // Clear the src, to make code elimination possible here before we
> + // delete the instruction i later
> + i->setSrc(s, NULL);
> + if (ld->isDead())
> +delete_Instruction(prog, ld);
> +  }
> +   }
> +
> switch (i->dType) {
> case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
> case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;

[Nouveau] [PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs

2017-08-12 Thread Tobias Klausmann

When using builtin functions we have to move the inputs to registers $0 and $1.
If one of the input values is an immediate, we fail to propagate the immediate:

...
mov u32 $r477 0x0003 (0)
...
mov u32 $r0 %r473 (0)
mov u32 $r1 $r477 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

With this patch the immediate is propagated, potentially causing the first MOV
to be superfluous, which we'd remove in that case:

...

mov u32 $r0 %r473 (0)
mov u32 $r1 0x0003 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

Shaderdb stats:
total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
total gprs used in shared programs: 582972 -> 582881 (-0.02%)
total local used in shared programs   : 17960 -> 17960 (0.00%)

         local    gpr   inst  bytes
helped       0     91    112    112
  hurt       0      0      0      0

v2:
 Implement some changes proposed by imirkin. The manual deletion of the dead
 mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call
 to division function"): the potentially dead mov is now unlinked properly,
 so later passes do not notice the mov op at all and thus do not clean it
 up. That makes up a big chunk of the regression the above commit caused.
 Keep the deletion of the op where it is; deleting it later unnecessarily
 blows up the size of the change.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp   | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index c8f0701572..7243b1d2e4 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
int builtin;
 
bld.setPosition(i, false);
-   bld.mkMovToReg(0, i->getSrc(0));
-   bld.mkMovToReg(1, i->getSrc(1));
+
+   // Generate movs to the input regs for the call we want to generate
+   for (int s = 0; i->srcExists(s); ++s) {
+  Instruction *ld = i->getSrc(s)->getInsn();
+  assert(ld->getSrc(0) != NULL);
+  // check if we are moving an immediate, propagate it in that case
+  if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
+!(ld->src(0).getFile() == FILE_IMMEDIATE))
+ bld.mkMovToReg(s, i->getSrc(s));
+  else {
+ bld.mkMovToReg(s, ld->getSrc(0));
+ // Clear the src, to make code elimination possible here before we
+ // delete the instruction i later
+ i->setSrc(s, NULL);
+ if (ld->isDead())
+delete_Instruction(prog, ld);
+  }
+   }
+
switch (i->dType) {
case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;
-- 
2.14.0
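
For readers following the thread, the decision the patch makes per call input can be sketched outside of Mesa roughly as below. This is only a simplified model: `Insn`, `CallInput`, `lowerCallInput` and the `dead` list are hypothetical stand-ins, not the nv50_ir API, and the reference count is a plain integer. Unlike the posted hunk, the sketch tests the defining instruction for NULL before touching it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for the nv50_ir types (not the real API).
enum class File { Gpr, Immediate };
enum class Op { Mov, Load, Add };

struct Insn {
   Op op;
   bool fixed;        // instruction must not be modified
   File srcFile;      // register file of source 0
   unsigned srcValue; // immediate value or register id of source 0
   int defRefs;       // remaining uses of the value this instruction defines
};

struct CallInput {
   bool isImmediate;
   unsigned value; // the immediate itself, or the id of the register to copy
};

// Decide what gets moved into a call input register: the immediate itself if
// the source is defined by an unfixed MOV/LOAD of an immediate, otherwise the
// original source register.
CallInput lowerCallInput(Insn *def, unsigned srcReg, std::vector<Insn *> &dead)
{
   bool propagate = def && !def->fixed &&
                    (def->op == Op::Mov || def->op == Op::Load) &&
                    def->srcFile == File::Immediate;
   if (!propagate)
      return {false, srcReg};

   // The call input no longer references the defining instruction, so drop
   // one use and remember the instruction if nothing else needs it.
   assert(def->defRefs > 0);
   if (--def->defRefs == 0)
      dead.push_back(def);
   return {true, def->srcValue};
}
```

With the values from the commit message, a `mov u32 $r477 0x0003` that is used only by the call yields an immediate input of 3 and lands in `dead`, while an input defined by any other instruction is copied from its register unchanged.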


Re: [Nouveau] [PATCH] nvc0/ir: propagate immediates to CALL input MOVs

2017-08-12 Thread Tobias Klausmann


On 8/12/17 10:20 PM, Ilia Mirkin wrote:
> On Sat, Aug 12, 2017 at 3:33 PM, Tobias Klausmann
> <tobias.johannes.klausm...@mni.thm.de> wrote:
>> When using builtin functions we have to move the inputs to registers $0 and
>> $1. If one of the input values is an immediate, we fail to propagate the
>> immediate:
>>
>> ...
>> mov u32 $r477 0x0003 (0)
>> ...
>> mov u32 $r0 %r473 (0)
>> mov u32 $r1 $r477 (0)
>> call abs BUILTIN:0 (0)
>> mov u32 %r495 $r1 (0)
>> ...
>>
>> With this patch the immediate is propagated, potentially causing the first
>> MOV to be superfluous, which we'd remove in that case:
>>
>> ...
>>
>> mov u32 $r0 %r473 (0)
>> mov u32 $r1 0x0003 (0)
>> call abs BUILTIN:0 (0)
>> mov u32 %r495 $r1 (0)
>> ...
>>
>> Shaderdb stats:
>> total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
>> total gprs used in shared programs: 582972 -> 582881 (-0.02%)
>> total local used in shared programs   : 17960 -> 17960 (0.00%)
>>
>>          local    gpr   inst  bytes
>> helped       0     91    112    112
>>   hurt       0      0      0      0
>>
>> Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
>> ---
>>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp   | 21 +++--
>>  1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> index c8f0701572..861d08af24 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
>> int builtin;
>>
>> bld.setPosition(i, false);
>> -   bld.mkMovToReg(0, i->getSrc(0));
>> -   bld.mkMovToReg(1, i->getSrc(1));
>> +
>> +   // Generate movs to the input regs for the call we want to generate
>> +   for (int s = 0; i->srcExists(s); ++s) {
>> +  Instruction *ld = i->getSrc(s)->getInsn();
>> +  ImmediateValue imm;
>> +  // check if we are moving an immediate, propagate it in that case
>> +  if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
>> +!ld->src(0).getImmediate(imm))
> At this point you don't even have to use getImmediate - you can just
> look at ld->src(0).getFile() == FILE_IMMEDIATE.

That was actually my fallback while getImmediate() was not working, and it is
easily doable if preferred.


>
> Normally you'd just do i->src(s).getImmediate(imm) and moved on with
> life. But you kinda need the ld here, which is annoying. Perhaps you
> can just drop the manual deletion of the op here... which would let
> you do the much simpler thing. Can you see if there's any effect from
> that?


Will do!


>> + bld.mkMovToReg(s, i->getSrc(s));
>> +  else {
>> + bld.mkMovToReg(s, ld->getSrc(0));
>> + // Clear the src, to make code elimination possible here before we
>> + // delete the instruction i later
>> + i->setSrc(s, NULL);
> i gets deleted later on. move the deletion of ld after that happens?


This would cause more indirection (saving the insn for later); I'm not
sure whether that makes the code more readable or whether this clear is more
straightforward. As you see, I'd go with the clear, but if you really
want I can add the extra save and delete.


>
>> + if (ld->getDef(0)->refCount() == 0)
> ld->isDead()


ok


>
>> +delete_Instruction(prog, ld);
>> +  }
>> +   }
>> +
>> switch (i->dType) {
>> case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
>> case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;
>> --
>> 2.14.0
>>
>> ___
>> Nouveau mailing list
>> Nouveau@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH] nvc0/ir: propagate immediates to CALL input MOVs

2017-08-12 Thread Tobias Klausmann

When using builtin functions we have to move the inputs to registers $0 and $1.
If one of the input values is an immediate, we fail to propagate the immediate:

...
mov u32 $r477 0x0003 (0)
...
mov u32 $r0 %r473 (0)
mov u32 $r1 $r477 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

With this patch the immediate is propagated, potentially causing the first MOV
to be superfluous, which we'd remove in that case:

...

mov u32 $r0 %r473 (0)
mov u32 $r1 0x0003 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

Shaderdb stats:
total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
total gprs used in shared programs: 582972 -> 582881 (-0.02%)
total local used in shared programs   : 17960 -> 17960 (0.00%)

         local    gpr   inst  bytes
helped       0     91    112    112
  hurt       0      0      0      0

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp   | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index c8f0701572..861d08af24 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i)
int builtin;
 
bld.setPosition(i, false);
-   bld.mkMovToReg(0, i->getSrc(0));
-   bld.mkMovToReg(1, i->getSrc(1));
+
+   // Generate movs to the input regs for the call we want to generate
+   for (int s = 0; i->srcExists(s); ++s) {
+  Instruction *ld = i->getSrc(s)->getInsn();
+  ImmediateValue imm;
+  // check if we are moving an immediate, propagate it in that case
+  if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) ||
+!ld->src(0).getImmediate(imm))
+ bld.mkMovToReg(s, i->getSrc(s));
+  else {
+ bld.mkMovToReg(s, ld->getSrc(0));
+ // Clear the src, to make code elimination possible here before we
+ // delete the instruction i later
+ i->setSrc(s, NULL);
+ if (ld->getDef(0)->refCount() == 0)
+delete_Instruction(prog, ld);
+  }
+   }
+
switch (i->dType) {
case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break;
case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break;
-- 
2.14.0


Re: [Nouveau] [PATCH 17/29] drm/nouveau: switch to drm_*{get, put} helpers

2017-08-03 Thread Tobias Klausmann

Looks good to me!

Reviewed-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>


On 8/3/17 1:58 PM, Cihangir Akturk wrote:
> drm_*_reference() and drm_*_unreference() functions are just
> compatibility aliases for drm_*_get() and drm_*_put() and should not be
> used by new code. So convert all users of compatibility functions to use
> the new APIs.
>
> Signed-off-by: Cihangir Akturk <cakt...@gmail.com>
> ---
>  drivers/gpu/drm/nouveau/dispnv04/crtc.c   |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_abi16.c   |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_display.c |  8 
>  drivers/gpu/drm/nouveau/nouveau_fbcon.c   |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_gem.c | 14 +++---
>  drivers/gpu/drm/nouveau/nv50_display.c|  2 +-
>  6 files changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> index 4b4b0b4..18b4be1 100644
> --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> @@ -1019,7 +1019,7 @@ nv04_crtc_cursor_set(struct drm_crtc *crtc, struct drm_file *file_priv,
>   nv_crtc->cursor.set_offset(nv_crtc, nv_crtc->cursor.offset);
>   nv_crtc->cursor.show(nv_crtc, true);
>  out:
> - drm_gem_object_unreference_unlocked(gem);
> + drm_gem_object_put_unlocked(gem);
>   return ret;
>  }
>  
> diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> index f98f800..3e9db5a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> @@ -136,7 +136,7 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16,
>   if (chan->ntfy) {
>   nouveau_bo_vma_del(chan->ntfy, &chan->ntfy_vma);
>   nouveau_bo_unpin(chan->ntfy);
> - drm_gem_object_unreference_unlocked(&chan->ntfy->gem);
> + drm_gem_object_put_unlocked(&chan->ntfy->gem);
>   }
>  
>   if (chan->heap.block_size)
> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
> index 8d1df56..a68fe1a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
> @@ -206,7 +206,7 @@ nouveau_user_framebuffer_destroy(struct drm_framebuffer *drm_fb)
>   struct nouveau_framebuffer *fb = nouveau_framebuffer(drm_fb);
>  
>   if (fb->nvbo)
> - drm_gem_object_unreference_unlocked(&fb->nvbo->gem);
> + drm_gem_object_put_unlocked(&fb->nvbo->gem);
>  
>   drm_framebuffer_cleanup(drm_fb);
>   kfree(fb);
> @@ -267,7 +267,7 @@ nouveau_user_framebuffer_create(struct drm_device *dev,
>   if (ret == 0)
>   return >base;
>  
> - drm_gem_object_unreference_unlocked(gem);
> + drm_gem_object_put_unlocked(gem);
>   return ERR_PTR(ret);
>  }
>  
> @@ -947,7 +947,7 @@ nouveau_display_dumb_create(struct drm_file *file_priv, struct drm_device *dev,
>   return ret;
>  
>   ret = drm_gem_handle_create(file_priv, >gem, >handle);
> - drm_gem_object_unreference_unlocked(>gem);
> + drm_gem_object_put_unlocked(>gem);
>   return ret;
>  }
>  
> @@ -962,7 +962,7 @@ nouveau_display_dumb_map_offset(struct drm_file *file_priv,
>   if (gem) {
>   struct nouveau_bo *bo = nouveau_gem_object(gem);
>   *poffset = drm_vma_node_offset_addr(&bo->bo.vma_node);
> - drm_gem_object_unreference_unlocked(gem);
> + drm_gem_object_put_unlocked(gem);
>   return 0;
>   }
>  
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
> index 2665a07..6c9e1ec 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
> @@ -451,7 +451,7 @@ nouveau_fbcon_destroy(struct drm_device *dev, struct nouveau_fbdev *fbcon)
>   nouveau_bo_vma_del(nouveau_fb->nvbo, &nouveau_fb->vma);
>   nouveau_bo_unmap(nouveau_fb->nvbo);
>   nouveau_bo_unpin(nouveau_fb->nvbo);
> - drm_framebuffer_unreference(&nouveau_fb->base);
> + drm_framebuffer_put(&nouveau_fb->base);
>   }
>  
>   return 0;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index 2170534..653425c 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -281,7 +281,7 @@ nouveau_gem_ioctl_new(struct drm_device *dev, void *data,
>   }
>  
>   /* drop reference from allocate -

[Nouveau] [RFC PATCH v2] nv50/ir: allow spilling of def values for constrained MERGES/UNIONS

2017-08-01 Thread Tobias Klausmann

This lets us spill more values and compile a Civilization 6 shader with many
local vars. As a precaution, only spill those vars as a fallback option if all
other spillable vars are already spilled!

shader-db run shows:
total instructions in shared programs : 4427020 -> 4427388 (0.01%)
total gprs used in shared programs: 522836 -> 522871 (0.01%)
total local used in shared programs   : 17128 -> 17464 (1.96%)

         local    gpr   inst  bytes
helped       0      0      0      0
  hurt       0      0      0      0

The additional instructions (+368), gprs (+35) and local (+336) are all
contained in the Civilization 6 shader:

90.shader_test - type: 0, local: 336, gpr: 35, inst: 368, bytes: 3928

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp|  2 ++
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  3 ++
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 39 --
 3 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
index 08181b790f..2fa8c22e33 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
@@ -233,6 +233,7 @@ LValue::LValue(Function *fn, DataFile file)
ssa = 0;
fixedReg = 0;
noSpill = 0;
+   softNoSpill = 0;
 
fn->add(this, this->id);
 }
@@ -250,6 +251,7 @@ LValue::LValue(Function *fn, LValue *lval)
ssa = 0;
fixedReg = 0;
noSpill = 0;
+   softNoSpill = 0;
 
fn->add(this, this->id);
 }
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index bc15992df0..ca5bcb5362 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -704,6 +704,9 @@ public:
unsigned ssa  : 1;
unsigned fixedReg : 1; // set & used by RA, earlier just use (id < 0)
unsigned noSpill  : 1; // do not spill (e.g. if spill temporary already)
+   unsigned softNoSpill : 1; /* only spill these values if all other values are
+  * spilled already!
+  */
 };
 
 class Symbol : public Value
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index b33d7b4010..9d70ec3c9c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -769,7 +769,7 @@ private:
bool coalesce(ArrayList&);
bool doCoalesce(ArrayList&, unsigned int mask);
void calculateSpillWeights();
-   bool simplify();
+   bool simplify(bool useSoftNoSpill);
bool selectRegisters();
void cleanup(const bool success);
 
@@ -1242,7 +1242,7 @@ GCRA::calculateSpillWeights()
   }
   LValue *val = nodes[i].getValue();
 
-  if (!val->noSpill) {
+  if (!val->noSpill || val->softNoSpill) {
  int rc = 0;
  for (Value::DefIterator it = val->defs.begin();
   it != val->defs.end();
@@ -1304,7 +1304,7 @@ GCRA::simplifyNode(RIG_Node *node)
 }
 
 bool
-GCRA::simplify()
+GCRA::simplify(bool useSoftNoSpill)
 {
for (;;) {
   if (!DLLIST_EMPTY(&lo[0])) {
@@ -1317,17 +1317,32 @@ GCRA::simplify()
   } else
   if (!DLLIST_EMPTY(&hi)) {
  RIG_Node *best = hi.next;
- float bestScore = best->weight / (float)best->degree;
+ bool spillable = false;
+ if (best->getValue()->noSpill && best->getValue()->softNoSpill &&
+   useSoftNoSpill)
+spillable = true;
+ float bestScore = INFINITY;
+ if (!best->getValue()->noSpill || spillable)
+  bestScore = best->weight / (float)best->degree;
  // spill candidate
  for (RIG_Node *it = best->next; it != &hi; it = it->next) {
-float score = it->weight / (float)it->degree;
+float score = INFINITY;
+bool spillable = false;
+if (it->getValue()->noSpill && it->getValue()->softNoSpill &&
+  useSoftNoSpill)
+   spillable = true;
+if (!it->getValue()->noSpill || spillable) {
+   score = it->weight / (float)it->degree;
+}
 if (score < bestScore) {
best = it;
bestScore = score;
 }
  }
  if (isinf(bestScore)) {
-ERROR("no viable spill candidates left\n");
+if (useSoftNoSpill)
+   ERROR("no viable spill candidates left\n");
+
 return false;
  }
  simplifyNode(best);
@@ -1491,9 +1506,12 @@ GCRA::allocateRegisters(ArrayList&
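
Since the hunk above is cut off in the archive, here is a rough standalone sketch of the selection rule it introduces. `Node`, `pickSpillCandidate` and `pickWithFallback` are hypothetical names, not the GCRA code, and the two-stage caller is an assumption based on the commit message ("only spill those vars as a fallback option"); the truncated `allocateRegisters()` change is not shown in this archive.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of an interference-graph node; the real
// RIG_Node/LValue classes carry much more state.
struct Node {
   float weight;     // spill weight of the value
   int degree;       // number of interfering nodes, assumed > 0
   bool noSpill;     // must not be spilled in the normal pass
   bool softNoSpill; // may be spilled, but only as a last resort
};

// Returns the index of the node with the lowest weight/degree score among the
// spillable nodes, or -1 if there is no viable candidate.
int pickSpillCandidate(const std::vector<Node> &nodes, bool useSoftNoSpill)
{
   int best = -1;
   float bestScore = INFINITY;
   for (std::size_t i = 0; i < nodes.size(); ++i) {
      const Node &n = nodes[i];
      assert(n.degree > 0);
      bool spillable = !n.noSpill || (n.softNoSpill && useSoftNoSpill);
      if (!spillable)
         continue; // treated like an infinite score in the patch
      float score = n.weight / static_cast<float>(n.degree);
      if (score < bestScore) {
         best = static_cast<int>(i);
         bestScore = score;
      }
   }
   return best;
}

// Assumed caller: try the ordinary candidates first and only fall back to
// softNoSpill values when nothing else is left.
int pickWithFallback(const std::vector<Node> &nodes)
{
   int idx = pickSpillCandidate(nodes, false);
   return idx >= 0 ? idx : pickSpillCandidate(nodes, true);
}
```

The point of the second flag is that a softNoSpill value never competes with ordinary candidates in the first pass, even when its score would be lower.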

[Nouveau] [RFC PATCH] nv50/ir: allow spilling of def values for constrained MERGES/UNIONS

2017-07-31 Thread Tobias Klausmann

This lets us spill more values and compile a big shader for Civilization 6.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index b33d7b4010..f29c8a1a95 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -2344,8 +2344,6 @@ RegAlloc::InsertConstraintsPass::insertConstraintMoves()
 cst->setSrc(s, mov->getDef(0));
 cst->bb->insertBefore(cst, mov);
 
-cst->getDef(0)->asLValue()->noSpill = 1; // doesn't help
-
 if (cst->op == OP_UNION)
mov->setPredicate(defi->cc, defi->getPredicate());
  }
-- 
2.13.3


Re: [Nouveau] [PATCH] drm: disable vblank only if it got previously enabled

2017-07-20 Thread Tobias Klausmann

Mh ok,

paper over in nouveau_display_fini until Ben comes up with a better idea
then?!


Greetings,

Tobias


On 7/20/17 10:13 AM, Daniel Vetter wrote:
> On Wed, Jul 19, 2017 at 04:10:50PM -0400, Ilia Mirkin wrote:
>> I believe the solution is to not call drm_crtc_vblank_off for atomic
>> modesetting in nouveau_display_fini. I think Ben's working on it.
> Yes, the goal of vblank_on/off was very much to not paper over driver bugs
> with clever tricks like these. If the driver can't keep track of its
> vblank, something has gone wrong, and the core should _not_ fix it up.
> Otherwise we're back to the old style vblank horror show.
>
> Thanks, Daniel
>
>> On Wed, Jul 19, 2017 at 1:25 PM, Tobias Klausmann
>> <tobias.johannes.klausm...@mni.thm.de> wrote:
>>> mimic the behavior of vblank_disable_fn(), another caller of
>>> drm_vblank_disable_and_save().
>>>
>>> This avoids oopsing, while trying to disable vblank on a not connected display:
>>>
>>> [   12.768079] WARNING: CPU: 0 PID: 274 at drivers/gpu/drm/drm_vblank.c:609 
>>> drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm]
>>> [   12.768080] Modules linked in: bnep snd_hda_codec_hdmi rtsx_usb_sdmmc 
>>> uvcvideo rtsx_usb_ms mmc_core videobuf2_vmalloc memstick videobuf2_memops 
>>> videobuf2_v4l2 videobuf2_core rtsx_usb videodev btusb btrtl arc4 
>>> snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 
>>> hid_multitouch nls_cp437 intel_rapl x86_pkg_temp_thermal intel_powerclamp 
>>> vfat coretemp fat kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass 
>>> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc 
>>> aesni_intel ath10k_pci snd_hda_intel ath10k_core aes_x86_64 snd_hda_codec 
>>> crypto_simd ath glue_helper cryptd snd_hda_core mac80211 snd_hwdep snd_pcm 
>>> pcspkr r8169 cfg80211 mii snd_timer acer_wmi snd sparse_keymap wmi_bmof 
>>> idma64 hci_uart virt_dma mei_me soundcore i2c_i801 mei btbcm shpchp 
>>> intel_lpss_pci intel_pch_thermal
>>> [   12.768130]  serdev btqca ucsi_acpi btintel typec_ucsi thermal typec 
>>> bluetooth ecdh_generic battery ac pinctrl_sunrisepoint rfkill 
>>> intel_lpss_acpi pinctrl_intel intel_lpss acpi_pad nouveau serio_raw i915 
>>> mxm_wmi ttm i2c_algo_bit drm_kms_helper xhci_pci syscopyarea sysfillrect 
>>> sysimgblt xhci_hcd fb_sys_fops usbcore drm i2c_hid wmi video button sg 
>>> efivarfs
>>> [   12.768158] CPU: 0 PID: 274 Comm: kworker/0:2 Not tainted 
>>> 4.12.0-desktop-debug-drm+ #2
>>> [   12.768160] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.04 
>>> 03/30/2017
>>> [   12.768164] Workqueue: pm pm_runtime_work
>>> [   12.768166] task: 889bf1627040 task.stack: 9541013e4000
>>> [   12.768180] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 
>>> [drm]
>>> [   12.768181] RSP: 0018:9541013e7b30 EFLAGS: 00010086
>>> [   12.768183] RAX: 001c RBX: 889b4cebd000 RCX: 
>>> 0004
>>> [   12.768184] RDX: 8004 RSI: 87a2d952 RDI: 
>>> 
>>> [   12.768186] RBP: 9541013e7b90 R08: 0001 R09: 
>>> 039f
>>> [   12.768187] R10: c05fe530 R11:  R12: 
>>> 
>>> [   12.768188] R13: 9541013e7ba4 R14: 889bf0426088 R15: 
>>> 889bf0426000
>>> [   12.768190] FS:  () GS:889bfec0() 
>>> knlGS:
>>> [   12.768191] CS:  0010 DS:  ES:  CR0: 80050033
>>> [   12.768192] CR2: 00edb16580b8 CR3: 00020cc09000 CR4: 
>>> 003406f0
>>> [   12.768193] Call Trace:
>>> [   12.768198]  ? enqueue_task_fair+0x64/0x600
>>> [   12.768211]  ? drm_get_last_vbltimestamp+0x47/0x70 [drm]
>>> [   12.768223]  ? drm_update_vblank_count+0x65/0x240 [drm]
>>> [   12.768227]  ? pci_pm_runtime_resume+0xa0/0xa0
>>> [   12.768238]  ? drm_vblank_disable_and_save+0x55/0xc0 [drm]
>>> [   12.768250]  ? drm_crtc_vblank_off+0xa9/0x1e0 [drm]
>>> [   12.768253]  ? pci_pm_runtime_resume+0xa0/0xa0
>>> [   12.768299]  ? nouveau_display_fini+0x56/0xd0 [nouveau]
>>> [   12.768339]  ? nouveau_display_suspend+0x51/0x110 [nouveau]
>>> [   12.768378]  ? nouveau_do_suspend+0x76/0x1c0 [nouveau]
>>> [   12.768413]  ? nouveau_pmops_runtime_suspend+0x54/0xb0 [nouveau]
>>> [   12.768416]  ? pci_pm_runtime_suspend+0x5c/0x160
>>> [   12.768419]  ? __rpm_callback+0xb6/0x1e0
>>

[Nouveau] [PATCH] drm: disable vblank only if it got previously enabled

2017-07-19 Thread Tobias Klausmann

Mimic the behavior of vblank_disable_fn(), another caller of
drm_vblank_disable_and_save().

This avoids oopsing while trying to disable vblank on a display that is not
connected:

[   12.768079] WARNING: CPU: 0 PID: 274 at drivers/gpu/drm/drm_vblank.c:609 
drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm]
[   12.768080] Modules linked in: bnep snd_hda_codec_hdmi rtsx_usb_sdmmc 
uvcvideo rtsx_usb_ms mmc_core videobuf2_vmalloc memstick videobuf2_memops 
videobuf2_v4l2 videobuf2_core rtsx_usb videodev btusb btrtl arc4 
snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 hid_multitouch 
nls_cp437 intel_rapl x86_pkg_temp_thermal intel_powerclamp vfat coretemp fat 
kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass crct10dif_pclmul 
crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel ath10k_pci 
snd_hda_intel ath10k_core aes_x86_64 snd_hda_codec crypto_simd ath glue_helper 
cryptd snd_hda_core mac80211 snd_hwdep snd_pcm pcspkr r8169 cfg80211 mii 
snd_timer acer_wmi snd sparse_keymap wmi_bmof idma64 hci_uart virt_dma mei_me 
soundcore i2c_i801 mei btbcm shpchp intel_lpss_pci intel_pch_thermal
[   12.768130]  serdev btqca ucsi_acpi btintel typec_ucsi thermal typec 
bluetooth ecdh_generic battery ac pinctrl_sunrisepoint rfkill intel_lpss_acpi 
pinctrl_intel intel_lpss acpi_pad nouveau serio_raw i915 mxm_wmi ttm 
i2c_algo_bit drm_kms_helper xhci_pci syscopyarea sysfillrect sysimgblt xhci_hcd 
fb_sys_fops usbcore drm i2c_hid wmi video button sg efivarfs
[   12.768158] CPU: 0 PID: 274 Comm: kworker/0:2 Not tainted 
4.12.0-desktop-debug-drm+ #2
[   12.768160] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.04 
03/30/2017
[   12.768164] Workqueue: pm pm_runtime_work
[   12.768166] task: 889bf1627040 task.stack: 9541013e4000
[   12.768180] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm]
[   12.768181] RSP: 0018:9541013e7b30 EFLAGS: 00010086
[   12.768183] RAX: 001c RBX: 889b4cebd000 RCX: 0004
[   12.768184] RDX: 8004 RSI: 87a2d952 RDI: 
[   12.768186] RBP: 9541013e7b90 R08: 0001 R09: 039f
[   12.768187] R10: c05fe530 R11:  R12: 
[   12.768188] R13: 9541013e7ba4 R14: 889bf0426088 R15: 889bf0426000
[   12.768190] FS:  () GS:889bfec0() 
knlGS:
[   12.768191] CS:  0010 DS:  ES:  CR0: 80050033
[   12.768192] CR2: 00edb16580b8 CR3: 00020cc09000 CR4: 003406f0
[   12.768193] Call Trace:
[   12.768198]  ? enqueue_task_fair+0x64/0x600
[   12.768211]  ? drm_get_last_vbltimestamp+0x47/0x70 [drm]
[   12.768223]  ? drm_update_vblank_count+0x65/0x240 [drm]
[   12.768227]  ? pci_pm_runtime_resume+0xa0/0xa0
[   12.768238]  ? drm_vblank_disable_and_save+0x55/0xc0 [drm]
[   12.768250]  ? drm_crtc_vblank_off+0xa9/0x1e0 [drm]
[   12.768253]  ? pci_pm_runtime_resume+0xa0/0xa0
[   12.768299]  ? nouveau_display_fini+0x56/0xd0 [nouveau]
[   12.768339]  ? nouveau_display_suspend+0x51/0x110 [nouveau]
[   12.768378]  ? nouveau_do_suspend+0x76/0x1c0 [nouveau]
[   12.768413]  ? nouveau_pmops_runtime_suspend+0x54/0xb0 [nouveau]
[   12.768416]  ? pci_pm_runtime_suspend+0x5c/0x160
[   12.768419]  ? __rpm_callback+0xb6/0x1e0
[   12.768423]  ? kobject_uevent_env+0x111/0x5e0
[   12.768425]  ? pci_pm_runtime_resume+0xa0/0xa0
[   12.768427]  ? rpm_callback+0x1f/0x70
[   12.768429]  ? pci_pm_runtime_resume+0xa0/0xa0
[   12.768431]  ? rpm_suspend+0x11f/0x640
[   12.768441]  ? drm_fb_helper_hotplug_event+0x9a/0xe0 [drm_kms_helper]
[   12.768447]  ? output_poll_execute+0x17b/0x1a0 [drm_kms_helper]
[   12.768449]  ? pm_runtime_work+0x64/0xa0
[   12.768453]  ? process_one_work+0x1db/0x410
[   12.768456]  ? worker_thread+0x47/0x3d0
[   12.768459]  ? process_one_work+0x410/0x410
[   12.768461]  ? kthread+0x117/0x130
[   12.768463]  ? kthread_create_on_node+0x40/0x40
[   12.768466]  ? ret_from_fork+0x25/0x30
[   12.768468] Code: 80 3d 26 f3 01 00 00 0f 85 ad fd ff ff 48 8b 43 20 48 c7 
c7 31 a2 20 c0 c6 05 0e f3 01 00 01 48 8b b0 60 01 00 00 e8 75 2e ec c6 <0f> ff 
e9 88 fd ff ff 31 f6 44 88 55 b0 e8 38 fa ed c6 44 0f b6
[   12.768508] ---[ end trace d9bb853af3659bd5 ]---

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
 drivers/gpu/drm/drm_vblank.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index a233a6be934a..4a21756bf2bd 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -1140,8 +1140,11 @@ void drm_crtc_vblank_off(struct drm_crtc *crtc)
 
/* Avoid redundant vblank disables without previous
 * drm_crtc_vblank_on(). */
-   if (drm_core_check_feature(dev, DRIVER_ATOMIC) || !vblank->inmodeset)
+   if (drm_core_check_feature(dev, DRIVER_ATOMIC) || (!vblank->inmodeset &&
+

[Nouveau] [PATCH v2] drm/nouveau: honor return type of nvif_mthd, trivial

2017-07-14 Thread Tobias Klausmann

nvif_mthd() returns an int, so provide that for return checking

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
V2: declare var only once
The other patch should never have gone out at all, but while I'm at it, here
is a fixed one!

 drivers/gpu/drm/nouveau/nouveau_display.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 8d1df5678eaa..58375669d492 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -113,10 +113,11 @@ nouveau_display_scanoutpos_head(struct drm_crtc *crtc, int *vpos, int *hpos,
struct drm_vblank_crtc *vblank = &crtc->dev->vblank[drm_crtc_index(crtc)];
int retry = 20;
bool ret = false;
+   int mthd_ret;
 
do {
-   ret = nvif_mthd(&disp->disp, 0, &args, sizeof(args));
-   if (ret != 0)
+   mthd_ret = nvif_mthd(&disp->disp, 0, &args, sizeof(args));
+   if (mthd_ret != 0)
return false;
 
if (args.scan.vline) {
-- 
2.13.2
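
To illustrate why the separate int matters, a small user-space sketch follows. `fake_mthd`, `viaBool` and `viaInt` are hypothetical helpers, not nouveau code; the only assumption carried over is that the callee returns 0 on success and a negative errno on failure.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for a call that returns 0 or a negative errno.
static int fake_mthd(bool fail) { return fail ? -EINVAL : 0; }

struct Result {
   bool ok; // true if the call succeeded
   int err; // what the caller still knows about the error
};

// Stores the result in a bool: any non-zero value collapses to true (1), so
// the failure is still detectable but the errno value is gone.
Result viaBool(bool fail)
{
   bool ret = fake_mthd(fail);
   return {!ret, static_cast<int>(ret)};
}

// Stores the result in an int, as the patch does with mthd_ret: the original
// error code stays available for logging or forwarding.
Result viaInt(bool fail)
{
   int ret = fake_mthd(fail);
   return {ret == 0, ret};
}
```

Both variants detect the failure, which is why the original check still worked; only the int variant keeps the actual error value.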


Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-14 Thread Tobias Klausmann

The conversion is a nice catch, but I'd like to have a bit more context, see
below!


With a better description:

Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>


On 7/14/17 5:10 PM, Karol Herbst wrote:

Yeah, we shouldn't let the machine die. Are there more WARN_ON_ONCE
usage we could convert to WARN_ONCE?

Reviewed-By: Karol Herbst <karolher...@gmail.com>

On Fri, Jul 14, 2017 at 5:05 PM, Tobias Klausmann
<tobias.johannes.klausm...@mni.thm.de> wrote:

On 7/14/17 3:41 PM, Mike Galbraith wrote:

On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote:

   All DRM did was to slip a
WARN_ON_ONCE() that nouveau triggers into a kernel module where such
things no longer warn, they blow the box out of the water.

BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c
into a WARN_ONCE(), and all is peachy, you get the warning, box lives.

---
   drivers/gpu/drm/drm_vblank.c |3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp
  */
 if (mode->crtc_clock == 0) {
 DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
-   WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
+   WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n",


"report me" seems a bit odd, maybe just uninitialized mode?



+ dev->driver->name);
 return false;
 }



Hey,

confirmed, this helps save the box, but we still have to find the root
cause! Backtrace [1] with the above fix applied (and the one which came in
with the latest drm-fixes merge)!


[1] https://hastebin.com/uyoqifijed.http

Thanks,

Tobias
Reviewed-By: Karol Herbst <karolher...@gmail.com>
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-14 Thread Tobias Klausmann


On 7/14/17 3:41 PM, Mike Galbraith wrote:

On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote:

  All DRM did was to slip a
WARN_ON_ONCE() that nouveau triggers into a kernel module where such
things no longer warn, they blow the box out of the water.

BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c
into a WARN_ONCE(), and all is peachy, you get the warning, box lives.

---
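For readers who do not have the two macros in their head: both warn at most once per call site and both evaluate to the condition, the only difference in use is that WARN_ONCE() also takes a printf-style message. Below is a minimal userspace sketch of that once-only behaviour. It is not the kernel implementation; `WarnSite` and `warn_once` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Hypothetical userspace stand-in for one warning call site.
// The "already warned" flag lives per site, as it does for the kernel macros.
struct WarnSite {
    bool warned = false;
    int emitted = 0;   // number of messages actually printed
};

// Returns the condition and prints the formatted message only the first
// time the condition is true for this site.
inline bool warn_once(WarnSite &site, bool condition, const char *fmt, ...)
{
    if (condition && !site.warned) {
        site.warned = true;
        ++site.emitted;
        va_list ap;
        va_start(ap, fmt);
        std::vfprintf(stderr, fmt, ap);
        va_end(ap);
    }
    return condition;
}
```

A caller can still branch on the return value every time, while the log only ever sees one message per site.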
  drivers/gpu/drm/drm_vblank.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp
 	 */
 	if (mode->crtc_clock == 0) {
 		DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
-		WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
+		WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n",
+			  dev->driver->name);
 		return false;
 	}



Hey,

confirmed, this helps save the box, but we still have to find the root 
cause! Backtrace [1] with the above fix applied (and the one which came in 
with the latest drm-fixes merge)!



[1] https://hastebin.com/uyoqifijed.http

Thanks,

Tobias

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH] drm/nouveau: split nouveau_drm_postclose back in pre/postclose

2017-07-12 Thread Tobias Klausmann


Mike,

sorry, I forgot to add: can you please test this patch?


Thanks,

tobias

On 7/12/17 11:56 PM, Tobias Klausmann wrote:

This patch brings back the old nouveau_drm_preclose and nouveau_drm_postclose
functions for closing down a drm device

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
  drivers/gpu/drm/nouveau/nouveau_drm.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 90757af9bc73..0ca2b65bdc4f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -877,7 +877,7 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file *fpriv)
 }
 
 static void
-nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
+nouveau_drm_preclose(struct drm_device *dev, struct drm_file *fpriv)
 {
 	struct nouveau_cli *cli = nouveau_cli(fpriv);
 	struct nouveau_drm *drm = nouveau_drm(dev);
@@ -892,7 +892,12 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
 	mutex_lock(&drm->client.mutex);
 	list_del(&cli->head);
 	mutex_unlock(&drm->client.mutex);
+}
 
+static void
+nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
+{
+	struct nouveau_cli *cli = nouveau_cli(fpriv);
 	nouveau_cli_fini(cli);
 	kfree(cli);
 	pm_runtime_mark_last_busy(dev->dev);
@@ -964,6 +969,7 @@ driver_stub = {
 	.load = nouveau_drm_load,
 	.unload = nouveau_drm_unload,
 	.open = nouveau_drm_open,
+	.preclose = nouveau_drm_preclose,
 	.postclose = nouveau_drm_postclose,
 	.lastclose = nouveau_vga_lastclose,
  

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH] drm/nouveau: split nouveau_drm_postclose back in pre/postclose

2017-07-12 Thread Tobias Klausmann

This patch brings back the old nouveau_drm_preclose and nouveau_drm_postclose
functions for closing down a drm device

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
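To make the intent of the split easier to follow, here is a small sketch of a two-phase close. It assumes the core calls the driver's preclose hook before it tears down its own per-file state and the postclose hook afterwards. `File`, `DriverOps`, `core_release` and the two demo hooks are hypothetical names for illustration only, they are not the drm API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-file object; the log records the order of the steps.
struct File {
    std::vector<std::string> log;
};

// Hypothetical hook table, named after the two callbacks in the patch.
struct DriverOps {
    void (*preclose)(File &);
    void (*postclose)(File &);
};

// Assumed core behaviour: driver preclose, core teardown, driver postclose.
inline void core_release(const DriverOps &ops, File &f)
{
    if (ops.preclose)
        ops.preclose(f);
    f.log.push_back("core: release per-file state");
    if (ops.postclose)
        ops.postclose(f);
}

// Work that must happen while the core state still exists.
inline void demo_preclose(File &f)
{
    f.log.push_back("driver: unlink client from list");
}

// Work that frees the driver's own per-file data last.
inline void demo_postclose(File &f)
{
    f.log.push_back("driver: fini and free client");
}
```

With both hooks registered, the driver's final free only runs after the core is done with the file, which is the ordering the patch restores.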
 drivers/gpu/drm/nouveau/nouveau_drm.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 90757af9bc73..0ca2b65bdc4f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -877,7 +877,7 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file *fpriv)
 }
 
 static void
-nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
+nouveau_drm_preclose(struct drm_device *dev, struct drm_file *fpriv)
 {
 	struct nouveau_cli *cli = nouveau_cli(fpriv);
 	struct nouveau_drm *drm = nouveau_drm(dev);
@@ -892,7 +892,12 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
 	mutex_lock(&drm->client.mutex);
 	list_del(&cli->head);
 	mutex_unlock(&drm->client.mutex);
+}
 
+static void
+nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
+{
+	struct nouveau_cli *cli = nouveau_cli(fpriv);
 	nouveau_cli_fini(cli);
 	kfree(cli);
 	pm_runtime_mark_last_busy(dev->dev);
@@ -964,6 +969,7 @@ driver_stub = {
 	.load = nouveau_drm_load,
 	.unload = nouveau_drm_unload,
 	.open = nouveau_drm_open,
+	.preclose = nouveau_drm_preclose,
 	.postclose = nouveau_drm_postclose,
 	.lastclose = nouveau_vga_lastclose,
 
-- 
2.13.2

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH] drm/nouveau: honor return type of nvif_mthd, trivial

2017-07-12 Thread Tobias Klausmann

nvif_mthd() returns an int, so provide that for return checking

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
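As a small illustration of why the return value wants an int: storing an int error code in a bool keeps only "zero or not" and drops the code itself. The sketch below uses a hypothetical `fake_method` that returns 0 or a negative errno-style value; it is not nvif_mthd().

```cpp
#include <cassert>

// Hypothetical stand-in for a call returning 0 on success or a negative
// errno-style code on failure.
inline int fake_method(bool fail)
{
    return fail ? -22 : 0;
}

struct Result {
    bool as_bool;
    int as_int;
};

// Store the same return value once in a bool and once in an int.
inline Result call_both_ways(bool fail)
{
    Result r;
    r.as_bool = fake_method(fail);  // any nonzero value collapses to true
    r.as_int = fake_method(fail);   // the error code is preserved
    return r;
}
```

The `!= 0` check behaves the same either way; what the int buys is a usable error code for logging or for passing further up.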
 drivers/gpu/drm/nouveau/nouveau_display.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 8d1df5678eaa..f8f555e2e912 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -113,10 +113,11 @@ nouveau_display_scanoutpos_head(struct drm_crtc *crtc, int *vpos, int *hpos,
 	struct drm_vblank_crtc *vblank = &crtc->dev->vblank[drm_crtc_index(crtc)];
 	int retry = 20;
 	bool ret = false;
+	int method_ret;
 
 	do {
-		ret = nvif_mthd(&disp->disp, 0, &args, sizeof(args));
-		if (ret != 0)
+		method_ret = nvif_mthd(&disp->disp, 0, &args, sizeof(args));
+		if (method_ret != 0)
 			return false;
 
 		if (args.scan.vline) {
-- 
2.13.2

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-12 Thread Tobias Klausmann



On 7/12/17 7:19 PM, Mike Galbraith wrote:

On Wed, 2017-07-12 at 07:37 -0400, Ilia Mirkin wrote:

On Wed, Jul 12, 2017 at 7:25 AM, Mike Galbraith  wrote:

On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:

On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:

Some display stuff did change for 4.13 for GM20x+ boards. If it's not
too much trouble, a bisect would be pretty useful.

Bisection seemingly went fine, but the result is odd.

e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit

But it really really is bad.  Looking at gitk fork in the road leading
to it...

52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good
e4e818cc2d7c drm: make drm_panel.h self-contained - good
9cf8f5802f39 drm: add missing declaration to drm_blend.h  - good

Before the git highway splits, all is well.  The lane with commits
works fine at both ends, but e98c58e55f68 is busted.  Merge artifact?

Hmmm... that tree does not appear to have gotten a v4.12 backmerge at
any point. The last backmerge from Linus as far as I can tell was
v4.11-rc7. Could be an interaction with some out-of-tree change.

FWIW, checking out the fingered commit then..

git log --oneline 52d9d38c183b..e98c58e55f68|grep nouveau and reverting
the lot helped not at all.

Checking out 6b7781b42dc9 and reverting the fingered commit did.  Given
the nouveau bits reverted are mostly the vblank changes, CC to Daniel,
maybe he'll know why both GTX 980 and GeForce 8600 GT get all upset.

Either I'm damn lucky, both of my nvidia equipped boxen going boom 100%
repeatably, or there are a lot of folks out there who haven't yet tried
suspend with our latest/greatest kernel.  I suspect the latter.

-Mike



I should have had a look at my inbox, it would have saved me a lot of work 
bisecting. Yet I come to the same conclusion:


# first bad commit: [e98c58e55f68f8785aebfab1f8c9a03d8de0afe1] Merge tag 
'drm-misc-next-2017-05-16' of git://anongit.freedesktop.org/git/drm-misc 
into drm-next



I suspect it is some vblank change as it shows up in every trace i have 
seen while bisecting, but that is just a wild guess...


Greetings,

Tobias

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-11 Thread Tobias Klausmann


>On Tue, Jul 11, 2017 at 2:08 PM, Mike Galbraith wrote:
>> On Tue, 2017-07-11 at 13:51 -0400, Ilia Mirkin wrote:
>>> Some details that may be useful in analysis of the bug:
>>>
>>> 1. lspci -nn -d 10de:
>>
>> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
>> 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
>>
>>> 2. What displays, if any, you have plugged into the NVIDIA board when
>>> this happens?
>>
>> A Philips 273V, via DVI.
>>
>>> 3. Any boot parameters, esp relating to ACPI, PM, or related?
>>
>> None for those, what's there that will be unfamiliar to you are for
>> patches that aren't applied.
>>
>> nortsched hpc_cpusets skew_tick=1 ftrace_dump_on_oops audit=0
>> nodelayacct cgroup_disable=memory rtkthreads=1 rtworkqueues=2 panic=60
>> ignore_loglevel crashkernel=256M,high
>
> OK, thanks. So in other words, a fairly standard desktop with a PCIe
> board plugged in. No funny business. (Laptops can create a ton of
> additional weirdness, which I assumed you had since you were talking
> about STR.)
>
> My best guess is that gf119_head_vblank_put either has a bogus head id
> (should be in the 0..3 range) which causes it to do an out-of-bounds
> read on MMIO space, or that the MMIO mapping has already been removed
> by the time nouveau_display_suspend runs. Adding Ben Skeggs for
> additional insight.
>
> Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> too much trouble, a bisect would be pretty useful.


Hey Mike,
just to inform you: I have a quite similar bug with no monitor attached 
while putting my nouveau card to sleep (laptop/optimus system) within 
nouveau_display_suspend().


I'm going to bisect this, hopefully in the long run this will aid in 
resolving your issue as well!


Greetings,
Tobias

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016-10-02 Thread Tobias Klausmann




On 02.10.2016 20:03, Ilia Mirkin wrote:

On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann
<tobias.johannes.klausm...@mni.thm.de> wrote:

Previously we'd end up with an unnecessary mov for the third immediate value.

total instructions in shared programs : 851881 -> 851864 (-0.00%)
total gprs used in shared programs: 110295 -> 110295 (0.00%)
total local used in shared programs   : 1020 -> 1020 (0.00%)

         local   gpr   inst   bytes
helped       0     0     17      17
  hurt       0     0      0       0

Suggested-by: Karol Herbst <nouv...@karolherbst.de>
Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ---
  1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 9875738..8bb5cf9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
break;
 case OP_MAD:
if (imm0.isInteger(0)) {
+ ImmediateValue imm1;
   i->setSrc(0, i->getSrc(2));
   i->src(0).mod = i->src(2).mod;
   i->setSrc(1, NULL);
   i->setSrc(2, NULL);
- i->op = i->src(0).mod.getOp();
- if (i->op != OP_CVT)
-i->src(0).mod = 0;
+ if (i->src(0).getImmediate(imm1)) {
+bld.setPosition(i, false);
+newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64),
+ i->dType);
+delete_Instruction(prog, i);

What's an example of a situation where this helps? It shouldn't
matter, the mov's should get cleaned up. [Clearly 17 shaders
disagree...] Is this just a side-effect of the fact that we don't run
the opts to a fixed point?


The second mov is what prevents the immediate from being folded in later. 
Here is the output of a test shader [1]:


  0: nop u32 %r56 (0)
  1: ld  u32 %r31 c0[0x0] (0)
  2: ld  u32 %r37 c0[0x140] (0)
  3: mov u32 %r38 0x00000000 (0)
  4: mov u32 %r39 0x3f800000 (0)
  5: mad f32 %r40 %r37 %r38 %r39 (0)
  6: mad f32 %r44 %r37 %r38 %r38 (0)
  7: add f32 %r53 %r31 %r40 (0)
  8: add f32 %r54 %r31 %r44 (0)
  9: add f32 %r57 %r56 %r44 (0)

Constantfolding...

MAIN:-1 ()
BB:0 (14 instructions) - df = { }
 -> BB:1 (tree)
  0: nop u32 %r56 (0)
  1: ld  u32 %r31 c0[0x0] (0)
  2: ld  u32 %r37 c0[0x140] (0)
  3: mov u32 %r38 0x00000000 (0)
  4: mov u32 %r39 0x3f800000 (0)
  5: mov f32 %r40 %r39 (0)
  6: mov f32 %r44 %r38 (0)
  7: add f32 %r53 %r31 %r40 (0)
  8: mov f32 %r54 %r31 (0)
  9: mov f32 %r57 %r56 (0)


The outcome:
  0: ld  u32 $r2 c0[0x0] (8)
  1: mov u32 $r0 0x3f800000 (8)
  2: add ftz f32 $r0 $r2 $r0 (8)
  3: mov f32 $r3 $r1 (8)
  4: mov u32 $r1 $r2 (8)
  5: export b128 # o[0x0] $r0q (8)

With patch:
  0: ld  u32 $r2 c0[0x0] (8)
  1: add ftz f32 $r0 $r2 1.000000 (8)
  2: mov f32 $r3 $r1 (8)
  3: mov u32 $r1 $r2 (8)
  4: export b128 # o[0x0] $r0q (8)


[1]:
VERT
PROPERTY NEXT_SHADER FRAG
DCL OUT[0], GENERIC[0]
DCL CONST[0]
DCL TEMP[0..1], LOCAL
IMM[0] FLT32 {0.0078,-1., 0., 0.5000}
IMM[1] FLT32 {1., 0., 65535., 0.0100}
  0: MOV TEMP[0].xyz, CONST[0].
 39: MAD TEMP[1], CONST[20]., IMM[1]., IMM[1].xyyy
 41: ADD TEMP[1], TEMP[0], TEMP[1]
208: MOV OUT[0], TEMP[1]
211: END






+ }
+ else {
+i->op = i->src(0).mod.getOp();
+if (i->op != OP_CVT)
+   i->src(0).mod = 0;
+ }
} else
if (i->subOp != NV50_IR_SUBOP_MUL_HIGH &&
(imm0.isInteger(1) || imm0.isInteger(-1))) {
--
2.10.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016-10-02 Thread Tobias Klausmann

Previously we'd end up with an unnecessary mov for the third immediate value.

total instructions in shared programs : 851881 -> 851864 (-0.00%)
total gprs used in shared programs: 110295 -> 110295 (0.00%)
total local used in shared programs   : 1020 -> 1020 (0.00%)

         local   gpr   inst   bytes
helped       0     0     17      17
  hurt       0     0      0       0

Suggested-by: Karol Herbst <nouv...@karolherbst.de>
Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
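A toy model of the fold, for illustration only: when one multiplicand of a MAD is the immediate 0, the instruction reduces to a move of the addend, and if the addend is itself known to be an immediate, the move can carry the constant directly instead of copying a register. `Reg`, `Mov` and `foldMad` are hypothetical names, not the nv50_ir classes, and values are treated as raw 32-bit patterns.

```cpp
#include <cassert>
#include <cstdint>

// Toy virtual register that may be known to hold an immediate.
struct Reg {
    int id;
    bool definedByImm;
    uint32_t imm;   // only meaningful when definedByImm is true
};

// Result of the fold: a move from an immediate or from a register.
struct Mov {
    bool fromImm;
    uint32_t imm;   // valid when fromImm
    int srcReg;     // valid when !fromImm
};

// Fold "a * b + c" when a is the immediate 0. The product vanishes, so b is
// irrelevant and only c survives. Returns false if the fold does not apply.
inline bool foldMad(const Reg &a, const Reg &c, Mov &out)
{
    if (!(a.definedByImm && a.imm == 0))
        return false;
    if (c.definedByImm)
        out = Mov{true, c.imm, -1};   // propagate the constant itself
    else
        out = Mov{false, 0, c.id};    // plain register copy
    return true;
}
```

The first branch is the case the patch adds: later passes see a move of a constant and can fold it into its users.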
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 9875738..8bb5cf9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
   break;
case OP_MAD:
   if (imm0.isInteger(0)) {
+ ImmediateValue imm1;
  i->setSrc(0, i->getSrc(2));
  i->src(0).mod = i->src(2).mod;
  i->setSrc(1, NULL);
  i->setSrc(2, NULL);
- i->op = i->src(0).mod.getOp();
- if (i->op != OP_CVT)
-i->src(0).mod = 0;
+ if (i->src(0).getImmediate(imm1)) {
+bld.setPosition(i, false);
+newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64),
+ i->dType);
+delete_Instruction(prog, i);
+ }
+ else {
+i->op = i->src(0).mod.getOp();
+if (i->op != OP_CVT)
+   i->src(0).mod = 0;
+ }
   } else
   if (i->subOp != NV50_IR_SUBOP_MUL_HIGH &&
   (imm0.isInteger(1) || imm0.isInteger(-1))) {
-- 
2.10.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH v2] nv50/ir: constant fold OP_SPLIT

2016-09-30 Thread Tobias Klausmann




On 30.09.2016 23:57, Ilia Mirkin wrote:

On Fri, Sep 30, 2016 at 5:50 PM, Tobias Klausmann
<tobias.johannes.klausm...@mni.thm.de> wrote:

Split the source immediate value into two new values and create OP_MOV
instructions for the two newly created values.

V2: get rid of special cases

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 16 
  1 file changed, 16 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 9875738..d56b057 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -932,6 +932,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
 Instruction *newi = i;

 switch (i->op) {
+   case OP_SPLIT: {
+  uint8_t size = typeSizeof(i->dType);
+  DataType type = typeOfSize(size / 2, isFloatType(i->dType),
+ isSignedType(i->dType));

Er wait, sorry, I might have confused matters here...

Why do you need to compute type at all? Why not just reuse i->dType?


i->dType comes in the same as i->sType, so we need to evaluate the old 
type and set the new type accordingly.
e.g in my test shader a u64 is split, still i->dType == TYPE_U64. Maybe 
we are doing something wrong somewhere else, but looking only at this 
folding, setting the new type is needed (note that originally i->sType 
was used)





+  if (likely(type != TYPE_NONE)) {
+ uint64_t val = imm0.reg.data.u64;
+ uint16_t shift = size * 8;
+ bld.setPosition(i, false);
+ for (int8_t d = 0; i->defExists(d); ++d) {
+bld.mkMov(i->getDef(d), bld.mkImm(val & ((1 << shift) - 1)), type);

1ULL


+val >>= shift;
+ }
+ delete_Instruction(prog, i);
+  }
+   }
+   break;
 case OP_MUL:
if (i->dType == TYPE_F32)
   tryCollapseChainedMULs(i, s, imm0);
--
2.10.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH v2] nv50/ir: constant fold OP_SPLIT

2016-09-30 Thread Tobias Klausmann

Split the source immediate value into two new values and create OP_MOV
instructions for the two newly created values.

V2: get rid of special cases

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
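A standalone sketch of the splitting arithmetic, separate from the patch itself. It assumes the shift amount is the width of each destination piece (half the source width for a two-way split) and builds the mask from 1ULL as suggested in review, since shifting a plain int by 32 or more is undefined. `splitImmediate` is a hypothetical helper, not part of nv50_ir.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split a 64-bit immediate into `parts` pieces of `partBits` bits each,
// lowest bits first.
inline std::vector<uint64_t> splitImmediate(uint64_t val, unsigned partBits,
                                            unsigned parts)
{
    assert(partBits > 0 && partBits < 64);
    // 1ULL makes the shift happen in 64 bits; (1 << 32) on int would not work.
    const uint64_t mask = (1ULL << partBits) - 1;
    std::vector<uint64_t> out;
    for (unsigned d = 0; d < parts; ++d) {
        out.push_back(val & mask);
        val >>= partBits;
    }
    return out;
}
```

For a u64 source split into two u32 values, `partBits` would be 32 and `parts` 2; each element of the result is what one of the generated movs would load.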
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 9875738..d56b057 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -932,6 +932,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
Instruction *newi = i;
 
switch (i->op) {
+   case OP_SPLIT: {
+  uint8_t size = typeSizeof(i->dType);
+  DataType type = typeOfSize(size / 2, isFloatType(i->dType),
+ isSignedType(i->dType));
+  if (likely(type != TYPE_NONE)) {
+ uint64_t val = imm0.reg.data.u64;
+ uint16_t shift = size * 8;
+ bld.setPosition(i, false);
+ for (int8_t d = 0; i->defExists(d); ++d) {
+bld.mkMov(i->getDef(d), bld.mkImm(val & ((1 << shift) - 1)), type);
+val >>= shift;
+ }
+ delete_Instruction(prog, i);
+  }
+   }
+   break;
case OP_MUL:
   if (i->dType == TYPE_F32)
  tryCollapseChainedMULs(i, s, imm0);
-- 
2.10.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH] nv50/ir: constant fold OP_SPLIT

2016-09-30 Thread Tobias Klausmann




On 28.09.2016 02:01, Ilia Mirkin wrote:

On Tue, Sep 27, 2016 at 7:25 PM, Tobias Klausmann
<tobias.johannes.klausm...@mni.thm.de> wrote:

Split the source immediate value into two new values and create OP_MOV
instructions for the two newly created values.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 23 ++
  1 file changed, 23 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 74a5a85..f71 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -920,6 +920,29 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
 Instruction *newi = i;

 switch (i->op) {
+   case OP_SPLIT: {
+  uint16_t shift = 0;
+  DataType type = TYPE_NONE;
+  bld.setPosition(i, false);
+  if (i->sType == TYPE_U64 || i->sType == TYPE_S64) {
+ shift = 32;
+ type = (i->sType == TYPE_U64) ? TYPE_U32 : TYPE_S32;
+  }
+  if (i->sType == TYPE_U32 || i->sType == TYPE_S32) {
+ shift = 16;
+ type = (i->sType == TYPE_U32) ? TYPE_U16 : TYPE_S16;
+  }
+  if (i->sType == TYPE_U16 || i->sType == TYPE_S16) {
+ shift = 8;
+ type = (i->sType == TYPE_U16) ? TYPE_U8 : TYPE_S8;
+  }

shift = typeSizeOf(i->dType);


+  if (type != TYPE_NONE) {
+ bld.mkMov(i->getDef(0), bld.mkImm(imm0.reg.data.u64 >> shift), type);
+ bld.mkMov(i->getDef(1), bld.mkImm(imm0.reg.data.u64), type);

u64 val = ...u64;
for (d = 0; i->defExists(d); ++d) {
   bld.mkMov(i->getDef(d), bld.mkImm(val & ((1 << shift) - 1));
   val >>= shift;
}

I think this will account for every case, and with a lot less
special-casing. What do you think?


Well, with this you'd not set the new type right: bld.mkMov(def, val, 
>>type<<), where you would always use TYPE_U32. Not sure if that is 
what we want... other than that, shortening it like this would be nice!





+ delete_Instruction(prog, i);
+  }
+   }
+   break;
 case OP_MUL:
if (i->dType == TYPE_F32)
   tryCollapseChainedMULs(i, s, imm0);
--
2.10.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH] nv50/ir: use unordered_set instead of list to keep track of var defs

2015-09-06 Thread Tobias Klausmann

The set of variable defs does not need to be ordered in any way, and
removing/adding elements is a fairly common operation in various
optimization passes.

This shortens runtime of piglit test fp-long-alu to ~11s from ~22s
No piglit regressions observed on nvc0!

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
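A small sketch of the trade-off, using std::list and std::unordered_set directly (the tree's own `unordered_set` name may map to a different implementation, so treat the std:: types as an assumption). Removing by value from a list walks the list, while erasing from a hash set looks the key up by hash. The price is that a set has no front(), so code that needs one particular def has to search for it. `Def`, `dropFromList`, `dropFromSet` and `findById` are hypothetical names.

```cpp
#include <cassert>
#include <list>
#include <unordered_set>

// Toy definition record.
struct Def {
    int id;
};

// std::list::remove compares every element against the value.
inline void dropFromList(std::list<Def *> &defs, Def *d)
{
    defs.remove(d);
}

// std::unordered_set::erase(key) hashes the pointer and removes it if present.
inline void dropFromSet(std::unordered_set<Def *> &defs, Def *d)
{
    defs.erase(d);
}

// Iteration order of a hash set is unspecified, so a specific element has to
// be found by searching rather than by taking "the first one".
inline Def *findById(const std::unordered_set<Def *> &defs, int id)
{
    for (Def *d : defs)
        if (d->id == id)
            return d;
    return nullptr;
}
```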
 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp|  4 ++--
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  7 +++---
 .../drivers/nouveau/codegen/nv50_ir_inlines.h  | 28 +-
 .../nouveau/codegen/nv50_ir_lowering_nv50.cpp  |  4 ++--
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  |  6 ++---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  4 ++--
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 17 +++--
 7 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
index cce6055..745cdc9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
@@ -154,9 +154,9 @@ ValueDef::set(Value *defVal)
if (value == defVal)
   return;
if (value)
-  value->defs.remove(this);
+  value->defs.erase(this);
if (defVal)
-  defVal->defs.push_back(this);
+  defVal->defs.insert(this);
 
value = defVal;
 }
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index ba1b085..deeabff 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -570,6 +570,7 @@ public:
 
inline Value *rep() const { return join; }
 
+   inline Instruction *getUniqueInsnMerged() const;
inline Instruction *getUniqueInsn() const;
inline Instruction *getInsn() const; // use when uniqueness is certain
 
@@ -586,11 +587,11 @@ public:
static inline Value *get(Iterator&);
 
    unordered_set<ValueRef *> uses;
-   std::list<ValueDef *> defs;
+   unordered_set<ValueDef *> defs;
    typedef unordered_set<ValueRef *>::iterator UseIterator;
    typedef unordered_set<ValueRef *>::const_iterator UseCIterator;
-   typedef std::list<ValueDef *>::iterator DefIterator;
-   typedef std::list<ValueDef *>::const_iterator DefCIterator;
+   typedef unordered_set<ValueDef *>::iterator DefIterator;
+   typedef unordered_set<ValueDef *>::const_iterator DefCIterator;
 
int id;
Storage reg;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h
index e465f24..8c8e54c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h
@@ -205,21 +205,26 @@ const LValue *ValueDef::preSSA() const
 
 Instruction *Value::getInsn() const
 {
-   return defs.empty() ? NULL : defs.front()->getInsn();
+   return defs.empty() ? NULL : (*defs.begin())->getInsn();
 }
 
-Instruction *Value::getUniqueInsn() const
+Instruction *Value::getUniqueInsnMerged() const
 {
if (defs.empty())
   return NULL;
+   /* It is not guaranteed that this is the first in the set, lets find it */
+   for (DefCIterator it = defs.begin(); it != defs.end(); ++it)
+  if ((*it)->get() == this)
+ return (*it)->getInsn();
+   /* We should never hit this assert */
+   assert(0);
+   return NULL;
+}
 
-   // after regalloc, the definitions of coalesced values are linked
-   if (join != this) {
-  for (DefCIterator it = defs.begin(); it != defs.end(); ++it)
- if ((*it)->get() == this)
-return (*it)->getInsn();
-  // should be unreachable and trigger assertion at the end
-   }
+Instruction *Value::getUniqueInsn() const
+{
+   if (defs.empty())
+  return NULL;
 #ifdef DEBUG
if (reg.data.id < 0) {
   int n = 0;
@@ -230,8 +235,9 @@ Instruction *Value::getUniqueInsn() const
  WARN("value %%%i not uniquely defined\n", id); // return NULL ?
}
 #endif
-   assert(defs.front()->get() == this);
-   return defs.front()->getInsn();
+   ValueDef *def = *defs.begin();
+   assert(def->get() == this);
+   return def->getInsn();
 }
 
 inline bool Instruction::constrainedDefs() const
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
index bea293b..9d1244d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
@@ -211,7 +211,7 @@ NV50LegalizePostRA::visit(Function *fn)
if (outWrites) {
       for (std::list<Instruction *>::iterator it = outWrites->begin();
            it != outWrites->end(); ++it)
-         (*it)->getSrc(1)->defs.front()->getInsn()->setDef(0, (*it)->getSrc(0));
+         (*(*it)->getSrc(1)->defs.begin())->getInsn()->setDef(0, (*it)->getSrc(0));
   // instructions will be deleted on exit
   outWrites->clear();
}
@@ -343,7 +343,7 @@ NV50Lega

Re: [Nouveau] [PATCH] nv50: avoid using inline vertex data submit when gl_VertexID is used

2015-08-24 Thread Tobias Klausmann




On 24.08.2015 17:51, Ilia Mirkin wrote:

The hardware only generates vertexid when vertices come from a VBO. This
fixes:

   vertexid-drawelements
   vertexid-drawarrays

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Cc: "11.0" <mesa-sta...@lists.freedesktop.org>
---
  src/gallium/drivers/nouveau/nv50/nv50_program.c| 1 +
  src/gallium/drivers/nouveau/nv50/nv50_program.h| 1 +
  src/gallium/drivers/nouveau/nv50/nv50_state_validate.c | 3 ++-
  src/gallium/drivers/nouveau/nv50/nv50_vbo.c| 8 
  4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 02dc367..eff4477 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -66,6 +66,7 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
case TGSI_SEMANTIC_VERTEXID:
          prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID;
          prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID_DRAW_ARRAYS_ADD_START;
+         prog->vp.vertexid = 1;
   continue;
default:
   break;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h 
b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index 5d3ff56..f4e8e94 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -76,6 +76,7 @@ struct nv50_program {
ubyte psiz;/* output slot of point size */
ubyte bfc[2];  /* indices into varying for FFC (FP) or BFC (VP) */
ubyte edgeflag;
+  ubyte vertexid;
ubyte clpd[2]; /* output slot of clip distance[i]'s 1st component */
ubyte clpd_nr;
 } vp;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c 
b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
index b304a17..66dcf43 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
@@ -503,7 +503,8 @@ static struct state_validate {
  { nv50_validate_samplers,  NV50_NEW_SAMPLERS },
  { nv50_stream_output_validate, NV50_NEW_STRMOUT |
 NV50_NEW_VERTPROG | NV50_NEW_GMTYPROG },
-{ nv50_vertex_arrays_validate, NV50_NEW_VERTEX | NV50_NEW_ARRAYS },
+{ nv50_vertex_arrays_validate, NV50_NEW_VERTEX | NV50_NEW_ARRAYS |
+   NV50_NEW_VERTPROG },
  { nv50_validate_min_samples,   NV50_NEW_MIN_SAMPLES },
  };
  #define validate_list_len (sizeof(validate_list) / sizeof(validate_list[0]))
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c 
b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
index 600b973..fb4305f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
@@ -301,6 +301,14 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50)
 unsigned i;
    const unsigned n = MAX2(vertex->num_elements, nv50->state.num_vtxelts);
 
+   /* A vertexid is not generated for inline data uploads. Have to use a
+    * VBO. This check must come after the vertprog has been validated,
+    * otherwise vertexid may be unset.
+    */
+   assert(nv50->vertprog->translated);
+   if (nv50->vertprog->vp.vertexid)
+      nv50->vbo_push_hint = 0;
+
    if (unlikely(vertex->need_conversion))
       nv50->vbo_fifo = ~0;
 else

LGTM!
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH] glsl: Extend lowering pass for gl_ClipDistance to support other arrays

2015-08-17 Thread Tobias Klausmann

This will come in handy when we want to lower gl_CullDistance into
gl_CullDistanceMESA.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de>
---
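For context, a sketch of the index arithmetic such a lowering relies on, assuming the lowered variable packs the float array into vec4 elements, four floats per element, the way the existing clip-distance pass does. `PackedIndex` and `packDistanceIndex` are hypothetical names, not functions from the pass.

```cpp
#include <cassert>

// Position of one float inside an array of 4-component vectors.
struct PackedIndex {
    unsigned element;    // which vec4 in the lowered array
    unsigned component;  // 0..3, i.e. x, y, z, w
};

// Map index i of the original float array to its slot in the packed array.
inline PackedIndex packDistanceIndex(unsigned i)
{
    return PackedIndex{i / 4, i % 4};
}
```

Under that assumption, element 5 of the float array lands in the y component of the second vec4.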
 src/glsl/Makefile.sources|   2 +-
 src/glsl/ir_optimization.h   |   1 +
 src/glsl/lower_clip_distance.cpp | 574 
 src/glsl/lower_distance.cpp  | 606 +++
 4 files changed, 608 insertions(+), 575 deletions(-)
 delete mode 100644 src/glsl/lower_clip_distance.cpp
 create mode 100644 src/glsl/lower_distance.cpp

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index 0b77244..00ba480 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -143,7 +143,7 @@ LIBGLSL_FILES = \
loop_analysis.h \
loop_controls.cpp \
loop_unroll.cpp \
-   lower_clip_distance.cpp \
+   lower_distance.cpp \
lower_const_arrays_to_uniforms.cpp \
lower_discard.cpp \
lower_discard_flow.cpp \
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index eef107e..fe62e74 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -120,6 +120,7 @@ bool lower_variable_index_to_cond_assign(gl_shader_stage 
stage,
 bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz);
 bool lower_const_arrays_to_uniforms(exec_list *instructions);
 bool lower_clip_distance(gl_shader *shader);
+bool lower_cull_distance(gl_shader *shader);
 void lower_output_reads(unsigned stage, exec_list *instructions);
 bool lower_packing_builtins(exec_list *instructions, int op_mask);
 void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions);
diff --git a/src/glsl/lower_clip_distance.cpp b/src/glsl/lower_clip_distance.cpp
deleted file mode 100644
index 1ada215..000
--- a/src/glsl/lower_clip_distance.cpp
+++ /dev/null
@@ -1,574 +0,0 @@
-/*
- * Copyright © 2011 Intel Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
- */
-
-/**
- * \file lower_clip_distance.cpp
- *
- * This pass accounts for the difference between the way
- * gl_ClipDistance is declared in standard GLSL (as an array of
- * floats), and the way it is frequently implemented in hardware (as
- * a pair of vec4s, with four clip distances packed into each).
- *
- * The declaration of gl_ClipDistance is replaced with a declaration
- * of gl_ClipDistanceMESA, and any references to gl_ClipDistance are
- * translated to refer to gl_ClipDistanceMESA with the appropriate
- * swizzling of array indices.  For instance:
- *
- *   gl_ClipDistance[i]
- *
- * is translated into:
- *
- *   gl_ClipDistanceMESA[i>>2][i&3]
- *
- * Since some hardware may not internally represent gl_ClipDistance as a pair
- * of vec4's, this lowering pass is optional.  To enable it, set the
- * LowerClipDistance flag in gl_shader_compiler_options to true.
- */
-
-#include "glsl_symbol_table.h"
-#include "ir_rvalue_visitor.h"
-#include "ir.h"
-#include "program/prog_instruction.h" /* For WRITEMASK_* */
-
-namespace {
-
-class lower_clip_distance_visitor : public ir_rvalue_visitor {
-public:
-   explicit lower_clip_distance_visitor(gl_shader_stage shader_stage)
-  : progress(false), old_clip_distance_out_var(NULL),
-old_clip_distance_in_var(NULL), new_clip_distance_out_var(NULL),
-new_clip_distance_in_var(NULL), shader_stage(shader_stage)
-   {
-   }
-
-   virtual ir_visitor_status visit(ir_variable *);
-   void create_indices(ir_rvalue*, ir_rvalue *, ir_rvalue *);
-   bool is_clip_distance_vec8(ir_rvalue *ir);
-   ir_rvalue *lower_clip_distance_vec8(ir_rvalue *ir);
-   virtual ir_visitor_status visit_leave(ir_assignment *);
-   void visit_new_assignment(ir_assignment *ir);
-   virtual ir_visitor_status visit_leave(ir_call *);
-
-   virtual void handle_rvalue(ir_rvalue **rvalue);
-
-   void fix_lhs(ir_assignment *);
-
-   bool progress;
-
-   /**
-* Pointer
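
Aside on the index mapping described in the file comment of this patch: eight
float distances are packed into a pair of vec4s, so array index i selects
vec4 number i >> 2 and component i & 3. A minimal C sketch of that split
follows. The struct and function names are hypothetical and only illustrate
the arithmetic; they are not part of the Mesa pass.

```c
/* Hypothetical helper: where a scalar distance index lands once the
 * float[8] array is viewed as vec4[2]. Names are illustrative only. */
struct packed_index {
   unsigned vec;   /* which vec4 of the pair: i >> 2 */
   unsigned comp;  /* which component, 0 = x .. 3 = w: i & 3 */
};

static struct packed_index
lower_distance_index(unsigned i)
{
   struct packed_index p;

   p.vec = i >> 2;   /* four distances per vec4 */
   p.comp = i & 3;   /* position inside that vec4 */
   return p;
}
```

The same split would presumably apply to a lowered gl_CullDistance array,
which is what the rename to lower_distance.cpp prepares for.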

Re: [Nouveau] [PATCH] avoid build fail without COMPOSITE

2015-07-14 Thread Tobias Klausmann


Lgtm! You can add my R-b if you want!

On 14.07.2015 23:17, Ilia Mirkin wrote:

---
  src/nouveau_dri2.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/nouveau_dri2.c b/src/nouveau_dri2.c
index f22e319..4398559 100644
--- a/src/nouveau_dri2.c
+++ b/src/nouveau_dri2.c
@@ -142,6 +142,7 @@ nouveau_dri2_copy_region2(ScreenPtr pScreen, DrawablePtr 
pDraw, RegionPtr pRegio
NVPtr pNv = NVPTR(xf86ScreenToScrn(pScreen));
RegionPtr pCopyClip;
GCPtr pGC;
+   PixmapPtr pPix;
DrawablePtr src_draw, dst_draw;
Bool translate = FALSE;
int off_x = 0, off_y = 0;
@@ -170,9 +171,13 @@ nouveau_dri2_copy_region2(ScreenPtr pScreen, DrawablePtr 
pDraw, RegionPtr pRegio
}
  
  	if (translate && pDraw->type == DRAWABLE_WINDOW) {
-   PixmapPtr pPix = get_drawable_pixmap(pDraw);
-   off_x = pDraw->x - pPix->screen_x;
-   off_y = pDraw->y - pPix->screen_y;
+   off_x = pDraw->x;
+   off_y = pDraw->y;
+#ifdef COMPOSITE
+   pPix = get_drawable_pixmap(pDraw);
+   off_x -= pPix->screen_x;
+   off_y -= pPix->screen_y;
+#endif
}
  
  	pGC = GetScratchGC(pDraw->depth, pScreen);
@@ -194,8 +199,8 @@ nouveau_dri2_copy_region2(ScreenPtr pScreen, DrawablePtr 
pDraw, RegionPtr pRegio
if (extents->x1 == 0 && extents->y1 == 0 &&
extents->x2 == pDraw->width &&
extents->y2 == pDraw->height) {
-   PixmapPtr fpix = get_drawable_pixmap(dst_draw);
-   struct nouveau_bo *bo = nouveau_pixmap_bo(fpix);
+   pPix = get_drawable_pixmap(dst_draw);
+   struct nouveau_bo *bo = nouveau_pixmap_bo(pPix);
if (bo)
nouveau_bo_wait(bo, NOUVEAU_BO_RD, pNv->client);
}



[Nouveau] [PATCH 1/2] nouveau/compiler: fix trivial compiler warnings

2015-07-08 Thread Tobias Klausmann

nouveau_compiler.c: In function ‘main’:
nouveau_compiler.c:216:27: warning: ‘code’ may be used uninitialized in
this function [-Wmaybe-uninitialized]
   printf("%08x ", code[i / 4]);
   ^
nouveau_compiler.c:215:4: warning: ‘size’ may be used uninitialized in
this function [-Wmaybe-uninitialized]
for (i = 0; i < size; i += 4) {

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/gallium/drivers/nouveau/nouveau_compiler.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_compiler.c 
b/src/gallium/drivers/nouveau/nouveau_compiler.c
index 8660498..ca128b5 100644
--- a/src/gallium/drivers/nouveau/nouveau_compiler.c
+++ b/src/gallium/drivers/nouveau/nouveau_compiler.c
@@ -144,7 +144,7 @@ main(int argc, char *argv[])
const char *filename = NULL;
FILE *f;
char text[65536] = {0};
-   unsigned size, *code;
+   unsigned size = 0, *code = NULL;
 
for (i = 1; i < argc; i++) {
   if (!strcmp(argv[i], "-a"))
-- 
2.4.5


[Nouveau] [PATCH 2/2] nv50/ir: fix a compiler warning with debug-only code

2015-07-08 Thread Tobias Klausmann

codegen/nv50_ir_emit_nv50.cpp: In member function
‘void nv50_ir::CodeEmitterNV50::emitLOAD(const nv50_ir::Instruction*)’:
codegen/nv50_ir_emit_nv50.cpp:620:12: warning: unused variable ‘offset’
 [-Wunused-variable]
int32_t offset = i->getSrc(0)->reg.data.offset;

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index 67ea6df..86b16f2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -616,8 +616,11 @@ CodeEmitterNV50::emitLoadStoreSizeCS(DataType ty)
 void
 CodeEmitterNV50::emitLOAD(const Instruction *i)
 {
-   DataFile sf = i->src(0).getFile();
+#ifdef DEBUG
int32_t offset = i->getSrc(0)->reg.data.offset;
+#endif
+
+   DataFile sf = i->src(0).getFile();
 
switch (sf) {
case FILE_SHADER_INPUT:
-- 
2.4.5


Re: [Nouveau] [PATCH 1/2] nouveau/compiler: fix trivial compiler warnings

2015-07-08 Thread Tobias Klausmann




On 08.07.2015 21:42, Emil Velikov wrote:

On 8 July 2015 at 20:34, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

Mh, I'm not aware of ever having changed nouveau_compiler. But I'm happy to
see this made you laugh, so it has something positive at least... :/


Story time:
This particular compiler warning has been brought up (incl here) four
or five times. Each time, Ilia feels reluctant about the fix as the
(gcc) compiler gets it wrong.

Personally I do not see a problem with explicitly initialising the
variable at this instance, yet I'm curious for how long Ilia will say
no to this (type of) patch(es) :-P

No offence, I just find it funny.
Emil
Oh, I even answered in a thread for a patch from Martin where he
proposed the same change (even with the same prefix :D). Ilia, maybe you
should take this after all, as it seems you are haunted by this :P
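
For readers wondering what kind of code produces this warning: below is a
minimal C sketch, not the actual nouveau_compiler.c control flow. The helper
load_code() and its data are hypothetical. The outputs are only written on
the success path and only read after a success check, which is the pattern
gcc sometimes fails to prove safe; initialising at the declaration, as the
patch does, silences the warning either way.

```c
#include <stddef.h>

/* Hypothetical stand-in for a routine that fills its outputs only on
 * success. Not taken from nouveau_compiler.c. */
static int
load_code(int ok, unsigned *size, const unsigned **code)
{
   static const unsigned words[] = { 0x10000001u, 0x20000002u };

   if (!ok)
      return -1;            /* outputs are left untouched on failure */
   *size = sizeof(words);   /* size in bytes, like the loop in the warning */
   *code = words;
   return 0;
}

/* Returns the number of non-zero 32-bit words, or -1 on failure. */
static int
count_words(int ok)
{
   /* Explicit initialisation, as in the patch. Without it the code is
    * still correct, because both variables are only read after
    * load_code() succeeded, but the compiler may not see that. */
   unsigned size = 0;
   const unsigned *code = NULL;
   unsigned i;
   int n = 0;

   if (load_code(ok, &size, &code))
      return -1;

   for (i = 0; i < size; i += 4)
      n += (code[i / 4] != 0);
   return n;
}
```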


Re: [Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance

2015-07-08 Thread Tobias Klausmann




On 25.05.2015 17:07, Ilia Mirkin wrote:

On Mon, May 25, 2015 at 9:40 AM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

On 25.05.2015 07:17, Dave Airlie wrote:

On 25 May 2015 at 08:11, Marek Olšák mar...@gmail.com wrote:

It's the same on Radeon. There are 2x ClipOrCullDistance output
vectors and a mask saying it should clip or cull or do nothing.

Marek


My thinking was gallium should have a single semantic and a mask in
the shader definition maybe.

though it doesn't solve the "does nvidia do the right thing with
cull[0] and clip[0]", and what is the right thing.

Dave.


I'm still convinced that both clip[0] and cull[0] should be possible. Plus i
have written a shader_test for this a while ago which you pushed to piglit
(fs-cull-and-clip-distance-different.shader_test). If i remember right
nvidia passed that test just fine.

My take (and note that I last read the extension many months ago) is
that you're supposed to figure out the max gl_ClipDistance[] written,
and then write all your cull distances above that. So if you, e.g.,
have something like

gl_ClipDistance[5] = 1;
gl_CullDistance[0] = 1;

Then it would decide that there are 6 clip distances (or if there's an
explicit out float gl_ClipDistance[n], then use that), and 1 cull
distance. In the TGSI, I'm thinking this might look approximately like

PROPERTY CULL_MASK (16)
DCL OUT[0], CLIPDIST[0]
DCL OUT[1], CLIPDIST[1]
MOV OUT[1].y, 1 (clip distance[5])
MOV OUT[1].z, 1 (cull distance[0])

Then basically you'd have

(rast->clip_enable & shader->actual_clip_writes_mask) | cull_mask =
the enabled distances
cull_mask = cull mask

This would work *very* well for nouveau, not sure how suitable it is
for other hardware.

Cheers,

   -ilia
I wonder where this step should be implemented after all. It was brought 
up that llvmpipe already supports cull_distance (it does!), so maybe we 
should implement this in the drivers to evade llvmpipe breakage. Any 
suggestions appreciated :)


Tobias
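
To make the proposal quoted above a bit more concrete, here is a rough C
sketch of the mask arithmetic, under stated assumptions: cull distances are
packed directly above the clip distances, bit n of each mask stands for
packed slot n, and at most 8 slots exist in total. The struct and function
names are made up for illustration, and the bit numbering is my assumption,
so it may not match the literal CULL_MASK value in the quoted TGSI listing.

```c
#include <stdint.h>

/* Hypothetical result type; not an existing gallium structure. */
struct distance_masks {
   uint8_t cull_mask;  /* packed slots that hold cull distances */
   uint8_t enabled;    /* packed slots the rasterizer has to look at */
};

/* Assumes num_clip + num_cull <= 8 and that cull distances sit directly
 * above the clip distances in the packed CLIPDIST outputs. */
static struct distance_masks
pack_distances(unsigned num_clip, unsigned num_cull,
               uint8_t clip_enable, uint8_t clip_writes_mask)
{
   struct distance_masks m;

   m.cull_mask = (uint8_t)(((1u << num_cull) - 1u) << num_clip);
   /* (rast->clip_enable & shader->actual_clip_writes_mask) | cull_mask */
   m.enabled = (uint8_t)((clip_enable & clip_writes_mask) | m.cull_mask);
   return m;
}
```

With the example from the quoted mail (gl_ClipDistance[5] and
gl_CullDistance[0] written), num_clip is 6 and num_cull is 1, so the cull
distance ends up in slot 6, which is OUT[1].z.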

Re: [Nouveau] [PATCH 2/2] nv50/ir: fix a compiler warning with debug-only code

2015-07-08 Thread Tobias Klausmann




On 08.07.2015 21:34, Emil Velikov wrote:

On 8 July 2015 at 19:27, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

codegen/nv50_ir_emit_nv50.cpp: In member function
‘void nv50_ir::CodeEmitterNV50::emitLOAD(const nv50_ir::Instruction*)’:
codegen/nv50_ir_emit_nv50.cpp:620:12: warning: unused variable ‘offset’
  [-Wunused-variable]
 int32_t offset = i->getSrc(0)->reg.data.offset;

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index 67ea6df..86b16f2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -616,8 +616,11 @@ CodeEmitterNV50::emitLoadStoreSizeCS(DataType ty)
  void
  CodeEmitterNV50::emitLOAD(const Instruction *i)
  {
-   DataFile sf = i->src(0).getFile();
+#ifdef DEBUG
 int32_t offset = i->getSrc(0)->reg.data.offset;
+#endif
+

assert is (normally) guarded by NDEBUG. Mesa/gallium has an in-house
replacement, which (not 100% sure) should be fine as well.

-Emil
As far as I can see in u_debug.h, assert (debug_assert) is guarded by
DEBUG, just like the change above...
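
The point under discussion is that the guard around the variable has to
match whatever macro switches the assert off. A small C sketch with the
standard assert(), which is controlled by NDEBUG, is below. The struct and
function are hypothetical; for Mesa's in-tree assert the equivalent guard
would be DEBUG, if the reading of u_debug.h above is right.

```c
#include <assert.h>

/* Hypothetical source operand; only here to give the assert something
 * to check. */
struct src {
   int file;
   int offset;
};

static int
emit_load(const struct src *s)
{
#ifndef NDEBUG
   /* Declared only while assert() is live. With NDEBUG defined, assert()
    * expands to nothing and an unguarded declaration would trigger
    * -Wunused-variable. */
   int offset = s->offset;
#endif

   assert(offset % 4 == 0);
   return s->file;
}
```

An alternative that avoids the preprocessor guard is to drop the local
variable and put the expression directly into the assert.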


Re: [Nouveau] [PATCH 1/2] nouveau/compiler: fix trivial compiler warnings

2015-07-08 Thread Tobias Klausmann




On 08.07.2015 20:38, Ilia Mirkin wrote:

Compiler is wrong.


So just "nouveau: ..." then? Anyway, change it to your liking.


On Wed, Jul 8, 2015 at 2:27 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

nouveau_compiler.c: In function ‘main’:
nouveau_compiler.c:216:27: warning: ‘code’ may be used uninitialized in
this function [-Wmaybe-uninitialized]
printf("%08x ", code[i / 4]);
^
nouveau_compiler.c:215:4: warning: ‘size’ may be used uninitialized in
this function [-Wmaybe-uninitialized]
 for (i = 0; i < size; i += 4) {

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
  src/gallium/drivers/nouveau/nouveau_compiler.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_compiler.c 
b/src/gallium/drivers/nouveau/nouveau_compiler.c
index 8660498..ca128b5 100644
--- a/src/gallium/drivers/nouveau/nouveau_compiler.c
+++ b/src/gallium/drivers/nouveau/nouveau_compiler.c
@@ -144,7 +144,7 @@ main(int argc, char *argv[])
 const char *filename = NULL;
 FILE *f;
 char text[65536] = {0};
-   unsigned size, *code;
+   unsigned size = 0, *code = NULL;

 for (i = 1; i < argc; i++) {
if (!strcmp(argv[i], "-a"))
--
2.4.5


Re: [Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance

2015-05-27 Thread Tobias Klausmann




On 27.05.2015 18:28, Marek Olšák wrote:

Another thing to consider is linking shaders that occur before the
rasterizer (e.g. any two shaders from VS-TCS-TES-GS). The maximum
number of written distances is still 8, but what happens if VS writes
1x clip and 7x cull and GS reads 8x clip and no cull?


I think this should be rejected anyway (in the glsl?!), constraining the VS
output to be the same as the GS input, where the last definition is the valid
one



  In this case
it's basically a generic varying. How is linking separate shaders
supposed to work with one combined clip-or-cull array? It doesn't seem
to be possible.

Marek

On Mon, May 25, 2015 at 5:07 PM, Ilia Mirkin imir...@alum.mit.edu wrote:

On Mon, May 25, 2015 at 9:40 AM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

On 25.05.2015 07:17, Dave Airlie wrote:

On 25 May 2015 at 08:11, Marek Olšák mar...@gmail.com wrote:

It's the same on Radeon. There are 2x ClipOrCullDistance output
vectors and a mask saying it should clip or cull or do nothing.

Marek


My thinking was gallium should have a single semantic and a mask in
the shader definition maybe.

though it doesn't solve the "does nvidia do the right thing with
cull[0] and clip[0]", and what is the right thing.

Dave.


I'm still convinced that both clip[0] and cull[0] should be possible. Plus i
have written a shader_test for this a while ago which you pushed to piglit
(fs-cull-and-clip-distance-different.shader_test). If i remember right
nvidia passed that test just fine.


Ah btw, if we follow Brian Paul, overlapping indexes are fine! (and it 
is way more intuitive to use for a shader developer)



My take (and note that I last read the extension many months ago) is
that you're supposed to figure out the max gl_ClipDistance[] written,
and then write all your cull distances above that. So if you, e.g.,
have something like

gl_ClipDistance[5] = 1;
gl_CullDistance[0] = 1;

Then it would decide that there are 6 clip distances (or if there's an
explicit out float gl_ClipDistance[n], then use that), and 1 cull
distance. In the TGSI, I'm thinking this might look approximately like

PROPERTY CULL_MASK (16)
DCL OUT[0], CLIPDIST[0]
DCL OUT[1], CLIPDIST[1]
MOV OUT[1].y, 1 (clip distance[5])
MOV OUT[1].z, 1 (cull distance[0])

Then basically you'd have

(rast->clip_enable & shader->actual_clip_writes_mask) | cull_mask =
the enabled distances
cull_mask = cull mask

This would work *very* well for nouveau, not sure how suitable it is
for other hardware.

Cheers,

   -ilia



Re: [Nouveau] Self introduction Hans de Goede

2015-05-26 Thread Tobias Klausmann




On 26.05.2015 09:43, Hans de Goede wrote:

Hi,

On 26-05-15 09:41, Martin Peres wrote:

On 26/05/15 10:29, Hans de Goede wrote:

Hi All,

Since I will be working on nouveau pretty much starting today I thought
it would be good to write a quick self introduction.

I'm an FOSS enthusiast / developer since 1996. I've written (and still
maintain) various hwmon drivers, various USB webcam drivers, libv4l and
libgphoto camlibs for photoframes which use a custom usb protocol and
usb redirection over the network for qemu.

And although not the original author I'm now a days a maintainer of
libusb, usb-3 bulk streams support in the xhci driver, the uas driver,
and libahci_platform.

In my spare time I work on u-boot and Linux support for Allwinner ARM
SoCs, I'm a u-boot maintainer for these, and maintain the sata, mmc
and usb kernel drivers.

About a year ago I joined the Red Hat graphics team, where my first
task was to help to get libinput up to a level where it could be
used as the input stack for Wayland. The work on this is winding
down, so my next project is to help out with nouveau and here I am.

My knowledge of GPU-s and basically anything 3d is quite limited
atm, so I will likely be asking a lot of questions to get up to
speed. I'll also join the #nouveau channel on irc, so I'll see
you all there.

Regards,

Hans


Hey Hans.

That's very good news! Welcome to the project!

Do you know if you are going to help mostly on the kernel or mesa side?


Where-ever help is needed is the best way I can describe the plan
I guess. Step 1 is getting up to speed, I'll likely focus on Mesa
first for that.

Regards,

Hans



We sure need help! Welcome to the project from my side as well!

Tobias




Re: [Nouveau] [PATCH 2/2] nv30/draw: switch varying hookup logic to know about texcoords

2015-05-26 Thread Tobias Klausmann




On 26.05.2015 02:59, Ilia Mirkin wrote:

On Mon, May 25, 2015 at 8:55 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:


On 26.05.2015 02:49, Ilia Mirkin wrote:

On Mon, May 25, 2015 at 8:37 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:


On 25.05.2015 21:29, Ilia Mirkin wrote:

Commit 8acaf862dfe switched things over to use TEXCOORD instead of
GENERIC, but did not update the nv30 swtnl draw paths. This teaches the
draw logic about TEXCOORD.

Among other things, this fixes a crash in demos/arbocclude when using
swtnl. Curiously enough, the point-sprite piglit works without this.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
Cc: 10.5 10.6 mesa-sta...@lists.freedesktop.org
---
src/gallium/drivers/nouveau/nv30/nv30_draw.c | 25
-
1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nv30_draw.c
b/src/gallium/drivers/nouveau/nv30/nv30_draw.c
index a681135..03c0c70 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_draw.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_draw.c
@@ -230,22 +230,24 @@ static const struct {
   [TGSI_SEMANTIC_BCOLOR  ] = { EMIT_4F, INTERP_LINEAR , 1, 3,
0x0004 },
   [TGSI_SEMANTIC_FOG ] = { EMIT_4F, INTERP_PERSPECTIVE, 5, 5,
0x0010 },
   [TGSI_SEMANTIC_PSIZE   ] = { EMIT_1F_PSIZE, INTERP_POS  , 6, 6,
0x0020 },
-   [TGSI_SEMANTIC_GENERIC ] = { EMIT_4F, INTERP_PERSPECTIVE, 8, 7,
0x4000 }
+   [TGSI_SEMANTIC_TEXCOORD] = { EMIT_4F, INTERP_PERSPECTIVE, 8, 7,
0x4000 },
};
  static boolean
vroute_add(struct nv30_render *r, uint attrib, uint sem, uint *idx)
{
-   struct pipe_screen *pscreen = r-nv30-screen-base.base;
+   struct nv30_screen *screen = r-nv30-screen;
   struct nv30_fragprog *fp = r-nv30-fragprog.program;
   struct vertex_info *vinfo = r-vertex_info;
   enum pipe_format format;
   uint emit = EMIT_OMIT;
   uint result = *idx;
-   if (sem == TGSI_SEMANTIC_GENERIC && result >= 8) {
-  for (result = 0; result < 8; result++) {
- if (fp->texcoord[result] == *idx) {
+   if (sem == TGSI_SEMANTIC_GENERIC) {
+  uint num_texcoords = (screen->eng3d->oclass < NV40_3D_CLASS) ? 8 : 10;
+  for (result = 0; result < num_texcoords; result++) {
+ if (fp->texcoord[result] == *idx + 8) {


maybe i'm too tired, but why exactly *idx + 8 ?

See nvfx_fragprog.c:

 fpc->fp->texcoord[hw] = fdec->Semantic.Index + 8;

when the semantic is GENERIC. (and 0xfffe when it's PCOORD). This is
because there can be up to 8 TEXCOORD's.


yet you run for 8 or 10 texcoords. Won't this cause problems on nv40+?

this is just the handle... it could just as well be + 1000. As long as
it's  8, since that's what gets stored for the TEXCOORD semantics.


oh right :)




+sem = TGSI_SEMANTIC_TEXCOORD;
emit = vroute[sem].emit;
break;
 }
@@ -260,11 +262,11 @@ vroute_add(struct nv30_render *r, uint attrib,
uint
sem, uint *idx)
   draw_emit_vertex_attr(vinfo, emit, vroute[sem].interp, attrib);
   format = draw_translate_vinfo_format(emit);
-   r-vtxfmt[attrib] = nv30_vtxfmt(pscreen, format)-hw;
+   r-vtxfmt[attrib] = nv30_vtxfmt(screen-base.base, format)-hw;
   r-vtxptr[attrib] = vinfo-size | NV30_3D_VTXBUF_DMA1;
   vinfo-size += draw_translate_vinfo_size(emit);
-   if (nv30_screen(pscreen)-eng3d-oclass  NV40_3D_CLASS) {
+   if (screen-eng3d-oclass  NV40_3D_CLASS) {
  r-vtxprog[attrib][0] = 0x001f38d8;
  r-vtxprog[attrib][1] = 0x0080001b | (attrib  9);
  r-vtxprog[attrib][2] = 0x0836106c;
@@ -276,7 +278,12 @@ vroute_add(struct nv30_render *r, uint attrib, uint
sem, uint *idx)
  r-vtxprog[attrib][3] = 0x6041ff80 | (result +
vroute[sem].vp40)
 2;
   }
-   *idx = vroute[sem].ow40  result;
+   if (result  8)
+  *idx = vroute[sem].ow40  result;
+   else {
+  assert(sem == TGSI_SEMANTIC_TEXCOORD);
+  *idx = 0x1000  (result - 8);
+   }
   return TRUE;
}
@@ -330,7 +337,7 @@ nv30_render_validate(struct nv30_context *nv30)
 while (pntc  attrib  16) {
  uint index = ffs(pntc) - 1; pntc = ~(1  index);
-  if (vroute_add(r, attrib, TGSI_SEMANTIC_GENERIC, index)) {
+  if (vroute_add(r, attrib, TGSI_SEMANTIC_TEXCOORD, index)) {
 vp_attribs |= (1  attrib++);
 vp_results |= index;
  }





Re: [Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance

2015-05-25 Thread Tobias Klausmann



On 25.05.2015 07:17, Dave Airlie wrote:

On 25 May 2015 at 08:11, Marek Olšák mar...@gmail.com wrote:

It's the same on Radeon. There are 2x ClipOrCullDistance output
vectors and a mask saying it should clip or cull or do nothing.

Marek


My thinking was gallium should have a single semantic and a mask in
the shader definition maybe.

though it doesn't solve the "does nvidia do the right thing with
cull[0] and clip[0]", and what is the right thing.

Dave.


I'm still convinced that both clip[0] and cull[0] should be possible. 
Plus i have written a shader_test for this a while ago which you pushed 
to piglit (fs-cull-and-clip-distance-different.shader_test). If i 
remember right nvidia passed that test just fine.


[Nouveau] [PATCH] docs: Mark ARB_cull_distance as in progress

2015-05-25 Thread Tobias Klausmann

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---

I'm already getting emails wanting me to do this, so just mark it;
it won't change anything really

 docs/GL3.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 44a824b..8e1c8cd 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -190,7 +190,7 @@ GL 4.5, GLSL 4.50:
   GL_ARB_ES3_1_compatibility   not started
   GL_ARB_clip_control  DONE (i965, nv50, nvc0, 
r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_conditional_render_inverted   DONE (i965, nv50, nvc0, 
llvmpipe, softpipe)
-  GL_ARB_cull_distance not started
+  GL_ARB_cull_distance in progress (Tobias)
   GL_ARB_derivative_controlDONE (i965, nv50, nvc0, 
r600)
   GL_ARB_direct_state_access   DONE (all drivers)
   - Transform Feedback object  DONE
-- 
2.4.1


Re: [Nouveau] [Mesa-dev] [PATCH 09/11] gallium: add support for arb_cull_distance

2015-05-25 Thread Tobias Klausmann




On 25.05.2015 20:36, Roland Scheidegger wrote:

This doesn't really do what the commit message claims - it just adds a
cap bit, not actual support for arb_cull_distance (which was already
there), so the log should be changed accordingly.


Yep, you are completely right here, will change it to better represent 
what was done here.



Apart from what was already mentioned (that is, the cap bit isn't used
by st/mesa which instead mistakenly uses glsl13),


will change it for v2


  it would be nice if
you could test softpipe/llvmpipe with this enabled, as it should already
work (at least for llvmpipe, I'm not entirely sure for softpipe) if not
there's some unaccounted difference somewhere how we'd thought of how
this should work for dx10 vs. opengl).


Hey nice to know llvmpipe already supports this, i'll test it!

Tobias



Roland


Am 24.05.2015 um 19:58 schrieb Tobias Klausmann:

Add another pipe cap so we can safely enable or disable this extension

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
  src/gallium/auxiliary/cso_cache/cso_context.c| 3 +++
  src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
  src/gallium/drivers/i915/i915_screen.c   | 1 +
  src/gallium/drivers/ilo/ilo_screen.c | 1 +
  src/gallium/drivers/llvmpipe/lp_screen.c | 2 ++
  src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 1 +
  src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 1 +
  src/gallium/drivers/r300/r300_screen.c   | 1 +
  src/gallium/drivers/r600/r600_pipe.c | 1 +
  src/gallium/drivers/radeonsi/si_pipe.c   | 1 +
  src/gallium/drivers/softpipe/sp_screen.c | 2 ++
  src/gallium/drivers/svga/svga_screen.c   | 1 +
  src/gallium/drivers/vc4/vc4_screen.c | 1 +
  src/gallium/include/pipe/p_defines.h | 1 +
  15 files changed, 19 insertions(+)

diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c 
b/src/gallium/auxiliary/cso_cache/cso_context.c
index 744b00c..7612b43 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.c
+++ b/src/gallium/auxiliary/cso_cache/cso_context.c
@@ -119,6 +119,9 @@ struct cso_context {
 struct pipe_clip_state clip;
 struct pipe_clip_state clip_saved;
  
+   struct pipe_clip_state cull;

+   struct pipe_clip_state cull_saved;
+
 struct pipe_framebuffer_state fb, fb_saved;
 struct pipe_viewport_state vp, vp_saved;
 struct pipe_blend_color blend_color;
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index c596d03..986a942 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -221,6 +221,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
+   case PIPE_CAP_CULL_DISTANCE:
return 0;
  
  	case PIPE_CAP_MAX_VIEWPORTS:

diff --git a/src/gallium/drivers/i915/i915_screen.c 
b/src/gallium/drivers/i915/i915_screen.c
index 03fecd1..678347d 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -242,6 +242,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap 
cap)
 case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
 case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
 case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
+   case PIPE_CAP_CULL_DISTANCE:
return 0;
  
 case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:

diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
b/src/gallium/drivers/ilo/ilo_screen.c
index b0fed73..f92d5de 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -459,6 +459,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
 case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
 case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
 case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
+   case PIPE_CAP_CULL_DISTANCE:
return 0;
  
 case PIPE_CAP_VENDOR_ID:

diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
b/src/gallium/drivers/llvmpipe/lp_screen.c
index 09ac9af..c90c405 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -293,6 +293,8 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
 case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
 case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
return 0;
+   case PIPE_CAP_CULL_DISTANCE:
+  return 0;
 }
 /* should only get here on unhandled cases */
 debug_printf("Unexpected PIPE_CAP %d query\n", param);
diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index bb79ccc..fc33ddf 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -162,6 +162,7

Re: [Nouveau] [PATCH 1/2] nv30/draw: rework some of the output vertex buffer logic

2015-05-25 Thread Tobias Klausmann




On 25.05.2015 21:29, Ilia Mirkin wrote:

This makes the vertex buffer go to GART, not VRAM, and redoes the
mapping to not use the UNSYNCHRONIZED access (which is meaningless on a
VRAM buffer anyways). While we're at it, add some flushes for VBO data.

Moving the vertex buffer from VRAM to GART makes glxgears work fully
with NV30_SWTNL=1. The other changes just seem like a good idea. I'm not
sure *why* moving the buffer from VRAM makes it work... perhaps
something doesn't get flushed in time? However this is a single use by
the GPU buffer, so STREAM seems like the correct usage semantic for it.


I'm not really happy about moving things to GART, and I don't see why this
resolves the issue, but granted, if it works out :-)


Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de


Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
Cc: 10.5 10.6 mesa-sta...@lists.freedesktop.org
---
  src/gallium/drivers/nouveau/nv30/nv30_draw.c | 30 +---
  1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nv30_draw.c 
b/src/gallium/drivers/nouveau/nv30/nv30_draw.c
index 6a0d06f..a681135 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_draw.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_draw.c
@@ -71,12 +71,12 @@ nv30_render_allocate_vertices(struct vbuf_render *render,
    struct nv30_render *r = nv30_render(render);
    struct nv30_context *nv30 = r->nv30;
 
-   r->length = vertex_size * nr_vertices;
+   r->length = (uint32_t)vertex_size * (uint32_t)nr_vertices;
 
    if (r->offset + r->length >= render->max_vertex_buffer_bytes) {
       pipe_resource_reference(&r->buffer, NULL);
       r->buffer = pipe_buffer_create(&nv30->screen->base.base,
-                                     PIPE_BIND_VERTEX_BUFFER, 0,
+                                     PIPE_BIND_VERTEX_BUFFER, PIPE_USAGE_STREAM,
                                      render->max_vertex_buffer_bytes);
       if (!r->buffer)
          return FALSE;
@@ -91,10 +91,14 @@ static void *
  nv30_render_map_vertices(struct vbuf_render *render)
  {
    struct nv30_render *r = nv30_render(render);
-   char *map = pipe_buffer_map(&r->nv30->base.pipe, r->buffer,
-                               PIPE_TRANSFER_WRITE |
-                               PIPE_TRANSFER_UNSYNCHRONIZED, &r->transfer);
-   return map + r->offset;
+   char *map = pipe_buffer_map_range(
+         &r->nv30->base.pipe, r->buffer,
+         r->offset, r->length,
+         PIPE_TRANSFER_WRITE |
+         PIPE_TRANSFER_DISCARD_RANGE,
+         &r->transfer);
+   assert(map);
+   return map;
 }
  
  static void

@@ -127,12 +131,18 @@ nv30_render_draw_elements(struct vbuf_render *render,
    for (i = 0; i < r->vertex_info.num_attribs; i++) {
       PUSH_RESRC(push, NV30_3D(VTXBUF(i)), BUFCTX_VTXTMP,
                        nv04_resource(r->buffer), r->offset + r->vtxptr[i],
-                       NOUVEAU_BO_LOW | NOUVEAU_BO_RD, 0, 0);
+                       NOUVEAU_BO_LOW | NOUVEAU_BO_RD, 0, NV30_3D_VTXBUF_DMA1);
    }
 
    if (!nv30_state_validate(nv30, ~0, FALSE))
       return;
 
+   if (nv30->base.vbo_dirty) {
+      BEGIN_NV04(push, NV30_3D(VTX_CACHE_INVALIDATE_1710), 1);
+      PUSH_DATA (push, 0);
+      nv30->base.vbo_dirty = FALSE;
+   }
+
    BEGIN_NV04(push, NV30_3D(VERTEX_BEGIN_END), 1);
    PUSH_DATA (push, r->prim);
 
@@ -178,6 +188,12 @@ nv30_render_draw_arrays(struct vbuf_render *render, unsigned start, uint nr)
    if (!nv30_state_validate(nv30, ~0, FALSE))
       return;
 
+   if (nv30->base.vbo_dirty) {
+      BEGIN_NV04(push, NV30_3D(VTX_CACHE_INVALIDATE_1710), 1);
+      PUSH_DATA (push, 0);
+      nv30->base.vbo_dirty = FALSE;
+   }
+
    BEGIN_NV04(push, NV30_3D(VERTEX_BEGIN_END), 1);
    PUSH_DATA (push, r->prim);
  


___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 2/2] nv30/draw: switch varying hookup logic to know about texcoords

2015-05-25 Thread Tobias Klausmann




On 26.05.2015 02:49, Ilia Mirkin wrote:

On Mon, May 25, 2015 at 8:37 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:


On 25.05.2015 21:29, Ilia Mirkin wrote:

Commit 8acaf862dfe switched things over to use TEXCOORD instead of
GENERIC, but did not update the nv30 swtnl draw paths. This teaches the
draw logic about TEXCOORD.

Among other things, this fixes a crash in demos/arbocclude when using
swtnl. Curiously enough, the point-sprite piglit works without this.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
Cc: 10.5 10.6 mesa-sta...@lists.freedesktop.org
---
   src/gallium/drivers/nouveau/nv30/nv30_draw.c | 25
-
   1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nv30_draw.c
b/src/gallium/drivers/nouveau/nv30/nv30_draw.c
index a681135..03c0c70 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_draw.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_draw.c
@@ -230,22 +230,24 @@ static const struct {
    [TGSI_SEMANTIC_BCOLOR  ] = { EMIT_4F, INTERP_LINEAR     , 1, 3, 0x0004 },
    [TGSI_SEMANTIC_FOG     ] = { EMIT_4F, INTERP_PERSPECTIVE, 5, 5, 0x0010 },
    [TGSI_SEMANTIC_PSIZE   ] = { EMIT_1F_PSIZE, INTERP_POS  , 6, 6, 0x0020 },
-   [TGSI_SEMANTIC_GENERIC ] = { EMIT_4F, INTERP_PERSPECTIVE, 8, 7, 0x4000 }
+   [TGSI_SEMANTIC_TEXCOORD] = { EMIT_4F, INTERP_PERSPECTIVE, 8, 7, 0x4000 },
 };
 
 static boolean
 vroute_add(struct nv30_render *r, uint attrib, uint sem, uint *idx)
 {
-   struct pipe_screen *pscreen = &r->nv30->screen->base.base;
+   struct nv30_screen *screen = r->nv30->screen;
    struct nv30_fragprog *fp = r->nv30->fragprog.program;
    struct vertex_info *vinfo = &r->vertex_info;
    enum pipe_format format;
    uint emit = EMIT_OMIT;
    uint result = *idx;
 
-   if (sem == TGSI_SEMANTIC_GENERIC && result >= 8) {
-      for (result = 0; result < 8; result++) {
-         if (fp->texcoord[result] == *idx) {
+   if (sem == TGSI_SEMANTIC_GENERIC) {
+      uint num_texcoords = (screen->eng3d->oclass < NV40_3D_CLASS) ? 8 : 10;
+      for (result = 0; result < num_texcoords; result++) {
+         if (fp->texcoord[result] == *idx + 8) {


Maybe I'm too tired, but why exactly *idx + 8?

See nvfx_fragprog.c:

fpc-fp-texcoord[hw] = fdec-Semantic.Index + 8;

when the semantic is GENERIC. (and 0xfffe when it's PCOORD). This is
because there can be up to 8 TEXCOORD's.
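
To make the "+ 8" offset concrete, here is a minimal stand-alone sketch of the encoding described above. It is not the driver code: `enum sem`, `slot_encode` and `find_generic` are made-up names for illustration, and the only thing taken from the thread is that TEXCOORD indices occupy 0..7 while GENERIC[n] is stored as n + 8.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two TGSI semantics discussed above. */
enum sem { SEM_TEXCOORD, SEM_GENERIC };

/* Up to 8 TEXCOORDs take the values 0..7, so GENERIC[n] is stored as n + 8
 * to keep the two ranges from colliding. */
static unsigned slot_encode(enum sem s, unsigned index)
{
    if (s == SEM_TEXCOORD) {
        assert(index < 8); /* assumption: at most 8 texcoords */
        return index;
    }
    return index + 8;
}

/* Search the first n hardware slots for GENERIC[index].
 * Returns the slot number, or -1 when no slot carries that generic. */
static int find_generic(const unsigned *texcoord, unsigned n, unsigned index)
{
    for (unsigned i = 0; i < n; i++)
        if (texcoord[i] == index + 8)
            return (int)i;
    return -1;
}
```

With a slot table such as `{ 0, 1, 8, 9, ... }`, `find_generic(table, 10, 0)` would report slot 2, which is the comparison the patch performs with `*idx + 8`.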


Yet you loop over 8 or 10 texcoords. Won't this cause problems on nv40+?




+            sem = TGSI_SEMANTIC_TEXCOORD;
             emit = vroute[sem].emit;
             break;
          }
@@ -260,11 +262,11 @@ vroute_add(struct nv30_render *r, uint attrib, uint sem, uint *idx)
    draw_emit_vertex_attr(vinfo, emit, vroute[sem].interp, attrib);
    format = draw_translate_vinfo_format(emit);
 
-   r->vtxfmt[attrib] = nv30_vtxfmt(pscreen, format)->hw;
+   r->vtxfmt[attrib] = nv30_vtxfmt(&screen->base.base, format)->hw;
    r->vtxptr[attrib] = vinfo->size | NV30_3D_VTXBUF_DMA1;
    vinfo->size += draw_translate_vinfo_size(emit);
 
-   if (nv30_screen(pscreen)->eng3d->oclass < NV40_3D_CLASS) {
+   if (screen->eng3d->oclass < NV40_3D_CLASS) {
       r->vtxprog[attrib][0] = 0x001f38d8;
       r->vtxprog[attrib][1] = 0x0080001b | (attrib << 9);
       r->vtxprog[attrib][2] = 0x0836106c;
@@ -276,7 +278,12 @@ vroute_add(struct nv30_render *r, uint attrib, uint sem, uint *idx)
       r->vtxprog[attrib][3] = 0x6041ff80 | (result + vroute[sem].vp40) << 2;
    }
 
-   *idx = vroute[sem].ow40 << result;
+   if (result < 8)
+      *idx = vroute[sem].ow40 << result;
+   else {
+      assert(sem == TGSI_SEMANTIC_TEXCOORD);
+      *idx = 0x1000 << (result - 8);
+   }
    return TRUE;
 }
 
@@ -330,7 +337,7 @@ nv30_render_validate(struct nv30_context *nv30)
    while (pntc && attrib < 16) {
       uint index = ffs(pntc) - 1; pntc &= ~(1 << index);
-      if (vroute_add(r, attrib, TGSI_SEMANTIC_GENERIC, &index)) {
+      if (vroute_add(r, attrib, TGSI_SEMANTIC_TEXCOORD, &index)) {
          vp_attribs |= (1 << attrib++);
          vp_results |= index;
       }




___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [Mesa-dev] [PATCH 2/2] nv30: fix clip plane uploads and enable changes

2015-05-24 Thread Tobias Klausmann




On 24.05.2015 16:15, Pierre Moreau wrote:

On 24 May 2015, at 16:03, Tobias Klausmann 
tobias.johannes.klausm...@mni.thm.de wrote:



On 24.05.2015 10:38, Samuel Pitoiset wrote:


On 05/24/2015 06:58 AM, Ilia Mirkin wrote:

nv30_validate_clip depends on the rasterizer state. Also we should
upload all the new clip planes on change since next time the plane data
won't have changed, but the enables might.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
  src/gallium/drivers/nouveau/nv30/nv30_state_validate.c | 16 +++-
  1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c 
b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c
index 86ac4f7..a954dcc 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c
@@ -272,15 +272,13 @@ nv30_validate_clip(struct nv30_context *nv30)
    uint32_t clpd_enable = 0;
 
    for (i = 0; i < 6; i++) {
-      if (nv30->rast->pipe.clip_plane_enable & (1 << i)) {
-         if (nv30->dirty & NV30_NEW_CLIP) {
-            BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5);
-            PUSH_DATA (push, i);
-            PUSH_DATAp(push, nv30->clip.ucp[i], 4);
-         }
-
-         clpd_enable |= 1 << (1 + 4*i);
+      if (nv30->dirty & NV30_NEW_CLIP) {
+         BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5);
+         PUSH_DATA (push, i);
+         PUSH_DATAp(push, nv30->clip.ucp[i], 4);
       }
+      if (nv30->rast->pipe.clip_plane_enable & (1 << i))
+         clpd_enable |= 2 << (4*i);

Can you explain why you changed this line?

This does bother me as well :)

It should be the same as before but with one fewer addition: shifting 1 by
(1 + 4*i) gives the same result as shifting 2 by 4*i.
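
For anyone who wants to convince themselves, a small stand-alone check of that equivalence follows. It is only a sketch: `clpd_old` and `clpd_new` are made-up helper names that reproduce the two shift expressions from the patch for a 6-bit enable mask, without any of the push-buffer code.

```c
#include <stdint.h>

/* Old form from the patch: 1 << (1 + 4*i). */
static uint32_t clpd_old(uint32_t enable)
{
    uint32_t out = 0;
    for (int i = 0; i < 6; i++)
        if (enable & (1u << i))
            out |= 1u << (1 + 4 * i);
    return out;
}

/* New form: 2 << (4*i), i.e. the "+ 1" is folded into the shifted constant. */
static uint32_t clpd_new(uint32_t enable)
{
    uint32_t out = 0;
    for (int i = 0; i < 6; i++)
        if (enable & (1u << i))
            out |= 2u << (4 * i);
    return out;
}
```

Both helpers set bit 1 + 4*i for every enabled plane i, so they agree for all 64 possible enable masks.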


*dang* you are right. Maybe we should change those lines in nv50 and nvc0
as well, to save the additional addition :-)


With this sorted out, series is:

Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de






 }
   BEGIN_NV04(push, NV30_3D(VP_CLIP_PLANES_ENABLE), 1);
@@ -389,7 +387,7 @@ static struct state_validate hwtnl_validate_list[] = {
  { nv30_validate_stipple,   NV30_NEW_STIPPLE },
  { nv30_validate_scissor,   NV30_NEW_SCISSOR | NV30_NEW_RASTERIZER },
  { nv30_validate_viewport,  NV30_NEW_VIEWPORT },
-{ nv30_validate_clip,  NV30_NEW_CLIP },
+{ nv30_validate_clip,  NV30_NEW_CLIP | NV30_NEW_RASTERIZER },
  { nv30_fragprog_validate,  NV30_NEW_FRAGPROG | NV30_NEW_FRAGCONST },
  { nv30_vertprog_validate,  NV30_NEW_VERTPROG | NV30_NEW_VERTCONST |
 NV30_NEW_FRAGPROG | NV30_NEW_RASTERIZER },
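
As a rough illustration of why the extra NV30_NEW_RASTERIZER bit matters in this table, here is a minimal sketch of a dirty-mask driven validate list. Everything in it (`struct ctx`, the `NEW_*` flags, the counters) is invented for the example and only mirrors the shape of hwtnl_validate_list, not the real nv30 code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dirty flags, standing in for the NV30_NEW_* bits. */
#define NEW_CLIP       (1u << 0)
#define NEW_RASTERIZER (1u << 1)
#define NEW_VIEWPORT   (1u << 2)

struct ctx {
    uint32_t dirty;     /* which state groups changed since the last draw */
    int clip_runs;      /* how often the clip validator ran */
    int viewport_runs;  /* how often the viewport validator ran */
};

struct state_validate {
    void (*func)(struct ctx *);
    uint32_t states;    /* run func when any of these bits is dirty */
};

static void validate_clip(struct ctx *c)     { c->clip_runs++; }
static void validate_viewport(struct ctx *c) { c->viewport_runs++; }

static const struct state_validate validate_list[] = {
    { validate_viewport, NEW_VIEWPORT },
    /* Clip enables live in the rasterizer state, so listen to both. */
    { validate_clip,     NEW_CLIP | NEW_RASTERIZER },
    { NULL, 0 }
};

/* Run every validator whose mask intersects the dirty bits, then clear them. */
static void state_validate(struct ctx *c)
{
    for (const struct state_validate *v = validate_list; v->func; v++)
        if (c->dirty & v->states)
            v->func(c);
    c->dirty = 0;
}
```

In this sketch a rasterizer-only change (`dirty == NEW_RASTERIZER`) still reaches `validate_clip`, which is the behaviour the one-line table change is after.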

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [Mesa-dev] [PATCH 2/2] nv30: fix clip plane uploads and enable changes

2015-05-24 Thread Tobias Klausmann




On 24.05.2015 10:38, Samuel Pitoiset wrote:



On 05/24/2015 06:58 AM, Ilia Mirkin wrote:

nv30_validate_clip depends on the rasterizer state. Also we should
upload all the new clip planes on change since next time the plane data
won't have changed, but the enables might.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
  src/gallium/drivers/nouveau/nv30/nv30_state_validate.c | 16 +++-
  1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c 
b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c
index 86ac4f7..a954dcc 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c
@@ -272,15 +272,13 @@ nv30_validate_clip(struct nv30_context *nv30)
    uint32_t clpd_enable = 0;
 
    for (i = 0; i < 6; i++) {
-      if (nv30->rast->pipe.clip_plane_enable & (1 << i)) {
-         if (nv30->dirty & NV30_NEW_CLIP) {
-            BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5);
-            PUSH_DATA (push, i);
-            PUSH_DATAp(push, nv30->clip.ucp[i], 4);
-         }
-
-         clpd_enable |= 1 << (1 + 4*i);
+      if (nv30->dirty & NV30_NEW_CLIP) {
+         BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5);
+         PUSH_DATA (push, i);
+         PUSH_DATAp(push, nv30->clip.ucp[i], 4);
       }
+      if (nv30->rast->pipe.clip_plane_enable & (1 << i))
+         clpd_enable |= 2 << (4*i);


Can you explain why you changed this line?


This does bother me as well :)





 }
   BEGIN_NV04(push, NV30_3D(VP_CLIP_PLANES_ENABLE), 1);
@@ -389,7 +387,7 @@ static struct state_validate hwtnl_validate_list[] = {
  { nv30_validate_stipple,   NV30_NEW_STIPPLE },
  { nv30_validate_scissor,   NV30_NEW_SCISSOR | NV30_NEW_RASTERIZER },
  { nv30_validate_viewport,  NV30_NEW_VIEWPORT },
-{ nv30_validate_clip,  NV30_NEW_CLIP },
+{ nv30_validate_clip,  NV30_NEW_CLIP | NV30_NEW_RASTERIZER },
  { nv30_fragprog_validate,  NV30_NEW_FRAGPROG | NV30_NEW_FRAGCONST },
  { nv30_vertprog_validate,  NV30_NEW_VERTPROG | NV30_NEW_VERTCONST |
 NV30_NEW_FRAGPROG | NV30_NEW_RASTERIZER },


___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau



[Nouveau] [PATCH 01/11] glapi: add GL_ARB_cull_distance

2015-05-24 Thread Tobias Klausmann

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/mapi/glapi/gen/gl_API.xml | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 3090b9f..a792056 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8247,7 +8247,12 @@
     <enum name="QUERY_BY_REGION_NO_WAIT_INVERTED" value="0x8E1A"/>
 </category>
 
-<!-- ARB extensions 162 - 163 -->
+<category name="ARB_cull_distance" number="162">
+    <enum name="MAX_CULL_DISTANCES"                   value="0x82F9"/>
+    <enum name="MAX_COMBINED_CLIP_AND_CULL_DISTANCES" value="0x82FA"/>
+</category>
+
+<!-- ARB extensions 163 -->
 
 <xi:include href="ARB_direct_state_access.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
 
-- 
2.4.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance

2015-05-24 Thread Tobias Klausmann

This patch series adds the needed support for this extension to the various
parts of mesa to finally enable it for nvc0.

Dave Airlie (1):
  glsl: lower cull_distance into cull_distance_mesa

Tobias Klausmann (10):
  glapi: add GL_ARB_cull_distance
  mesa/main: add support for GL_ARB_cull_distance
  mesa/prog: Add varyings for arb_cull_distance
  mesa/st: add support for GL_ARB_cull_distance
  glsl: Add a helper to see if an array was unsize in the shader
  glsl: Add arb_cull_distance support
  i965: rename UsesClipDistanceOut to UsesClipCullDistanceOut
  gallium: add support for arb_cull_distance
  nouveau/codegen: sort in galliums cull_distance semantic into the
drivers bitmask
  nouveau/nvc0: implement cull_distance as a special form of clip
distance

 docs/GL3.txt   |   2 +-
 docs/relnotes/10.7.0.html  |   4 +-
 src/gallium/auxiliary/cso_cache/cso_context.c  |   3 +
 src/gallium/drivers/freedreno/freedreno_screen.c   |   1 +
 src/gallium/drivers/i915/i915_screen.c |   1 +
 src/gallium/drivers/ilo/ilo_screen.c   |   1 +
 src/gallium/drivers/llvmpipe/lp_screen.c   |   2 +
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |   5 +
 src/gallium/drivers/nouveau/nv30/nv30_screen.c |   1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |   1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c|   6 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_program.h|   1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |   1 +
 .../drivers/nouveau/nvc0/nvc0_state_validate.c |   1 +
 src/gallium/drivers/r300/r300_screen.c |   1 +
 src/gallium/drivers/r600/r600_pipe.c   |   1 +
 src/gallium/drivers/radeonsi/si_pipe.c |   1 +
 src/gallium/drivers/softpipe/sp_screen.c   |   2 +
 src/gallium/drivers/svga/svga_screen.c |   1 +
 src/gallium/drivers/vc4/vc4_screen.c   |   1 +
 src/gallium/include/pipe/p_defines.h   |   1 +
 src/glsl/Makefile.sources  |   1 +
 src/glsl/ast_to_hir.cpp|  14 +
 src/glsl/builtin_variables.cpp |  13 +-
 src/glsl/glcpp/glcpp-parse.y   |   3 +
 src/glsl/glsl_parser_extras.cpp|   1 +
 src/glsl/glsl_parser_extras.h  |   3 +
 src/glsl/glsl_types.cpp|   8 +-
 src/glsl/glsl_types.h  |  10 +-
 src/glsl/ir_optimization.h |   1 +
 src/glsl/link_varyings.cpp |  17 +-
 src/glsl/link_varyings.h   |   3 +-
 src/glsl/linker.cpp| 124 +++--
 src/glsl/lower_cull_distance.cpp   | 549 +
 src/glsl/standalone_scaffolding.cpp|   1 +
 src/glsl/tests/varyings_test.cpp   |  27 +
 src/mapi/glapi/gen/gl_API.xml  |   7 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |   2 +-
 src/mesa/drivers/dri/i965/brw_gs.c |   2 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp |   2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |   2 +-
 src/mesa/drivers/dri/i965/brw_vs.c |   2 +-
 src/mesa/main/extensions.c |   1 +
 src/mesa/main/get.c|  26 +
 src/mesa/main/get_hash_params.py   |   4 +
 src/mesa/main/mtypes.h |  22 +-
 src/mesa/main/shaderapi.c  |   4 +-
 src/mesa/main/tests/enum_strings.cpp   |   2 +
 src/mesa/program/prog_print.c  |   4 +
 src/mesa/state_tracker/st_extensions.c |   4 +
 src/mesa/state_tracker/st_program.c|  34 ++
 51 files changed, 859 insertions(+), 72 deletions(-)
 create mode 100644 src/glsl/lower_cull_distance.cpp

-- 
2.4.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH 02/11] mesa/main: add support for GL_ARB_cull_distance

2015-05-24 Thread Tobias Klausmann

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/mesa/main/extensions.c   |  1 +
 src/mesa/main/get.c  | 26 ++
 src/mesa/main/get_hash_params.py |  4 
 src/mesa/main/mtypes.h   | 22 +-
 src/mesa/main/shaderapi.c|  4 ++--
 src/mesa/main/tests/enum_strings.cpp |  2 ++
 6 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index c82416a..2145502 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -99,6 +99,7 @@ static const struct extension extension_table[] = {
    { "GL_ARB_copy_buffer",         o(dummy_true),              GL, 2008 },
    { "GL_ARB_copy_image",          o(ARB_copy_image),          GL, 2012 },
    { "GL_ARB_conservative_depth",  o(ARB_conservative_depth),  GL, 2011 },
+   { "GL_ARB_cull_distance",       o(ARB_cull_distance),       GL, 2014 },
    { "GL_ARB_debug_output",        o(dummy_true),              GL, 2009 },
    { "GL_ARB_depth_buffer_float",  o(ARB_depth_buffer_float),  GL, 2008 },
    { "GL_ARB_depth_clamp",         o(ARB_depth_clamp),         GL, 2003 },
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 8a6c81a..1dcfcc9 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -143,6 +143,8 @@ enum value_extra {
EXTRA_VALID_DRAW_BUFFER,
EXTRA_VALID_TEXTURE_UNIT,
EXTRA_VALID_CLIP_DISTANCE,
+   EXTRA_VALID_CULL_DISTANCE,
+   EXTRA_VALID_CULL_AND_CLIP_DISTANCE,
EXTRA_FLUSH_CURRENT,
EXTRA_GLSL_130,
EXTRA_EXT_UBO_GS4,
@@ -267,6 +269,13 @@ static const int extra_valid_clip_distance[] = {
EXTRA_END
 };
 
+static const int extra_valid_clip_and_cull_distance[] = {
+   EXTRA_VALID_CLIP_DISTANCE,
+   EXTRA_VALID_CULL_DISTANCE,
+   EXTRA_VALID_CULL_AND_CLIP_DISTANCE,
+   EXTRA_END
+};
+
 static const int extra_flush_current_valid_texture_unit[] = {
EXTRA_FLUSH_CURRENT,
EXTRA_VALID_TEXTURE_UNIT,
@@ -393,6 +402,7 @@ EXTRA_EXT(INTEL_performance_query);
 EXTRA_EXT(ARB_explicit_uniform_location);
 EXTRA_EXT(ARB_clip_control);
 EXTRA_EXT(EXT_polygon_offset_clamp);
+EXTRA_EXT(ARB_cull_distance);
 
 static const int
 extra_ARB_color_buffer_float_or_glcore[] = {
@@ -1116,6 +1126,22 @@ check_extra(struct gl_context *ctx, const char *func, const struct value_desc *d
             return GL_FALSE;
          }
          break;
+      case EXTRA_VALID_CULL_DISTANCE:
+         if (d->pname - GL_MAX_CULL_DISTANCES >= ctx->Const.MaxClipPlanes) {
+            _mesa_error(ctx, GL_INVALID_ENUM, "%s(cull distance %u)",
+                        func, d->pname - GL_MAX_CULL_DISTANCES);
+            return GL_FALSE;
+         }
+         break;
+      case EXTRA_VALID_CULL_AND_CLIP_DISTANCE:
+         if (d->pname - GL_MAX_COMBINED_CLIP_AND_CULL_DISTANCES >=
+             ctx->Const.MaxClipPlanes) {
+            _mesa_error(ctx, GL_INVALID_ENUM,
+                        "%s(combined clip and cull distance %u)", func,
+                        d->pname - GL_MAX_COMBINED_CLIP_AND_CULL_DISTANCES);
+            return GL_FALSE;
+         }
+         break;
       case EXTRA_GLSL_130:
          api_check = GL_TRUE;
          if (ctx->Const.GLSLVersion >= 130)
diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index 41cb2c1..a63aba7 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -798,6 +798,10 @@ descriptor=[
   [ "MIN_FRAGMENT_INTERPOLATION_OFFSET", "CONTEXT_FLOAT(Const.MinFragmentInterpolationOffset)", "extra_ARB_gpu_shader5" ],
   [ "MAX_FRAGMENT_INTERPOLATION_OFFSET", "CONTEXT_FLOAT(Const.MaxFragmentInterpolationOffset)", "extra_ARB_gpu_shader5" ],
   [ "FRAGMENT_INTERPOLATION_OFFSET_BITS", "CONST(FRAGMENT_INTERPOLATION_OFFSET_BITS)", "extra_ARB_gpu_shader5" ],
+
+# GL_ARB_cull_distance
+  [ "MAX_CULL_DISTANCES", "CONTEXT_INT(Const.MaxClipPlanes)", "extra_ARB_cull_distance" ],
+  [ "MAX_COMBINED_CLIP_AND_CULL_DISTANCES", "CONTEXT_INT(Const.MaxClipPlanes)", "extra_ARB_cull_distance" ],
 ]},
 
 # Enums restricted to OpenGL Core profile
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 8342517..6425c06 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -236,6 +236,8 @@ typedef enum
VARYING_SLOT_CLIP_VERTEX, /* Does not appear in FS */
VARYING_SLOT_CLIP_DIST0,
VARYING_SLOT_CLIP_DIST1,
+   VARYING_SLOT_CULL_DIST0,
+   VARYING_SLOT_CULL_DIST1,
VARYING_SLOT_PRIMITIVE_ID, /* Does not appear in VS */
VARYING_SLOT_LAYER, /* Appears as VS or GS output */
VARYING_SLOT_VIEWPORT, /* Appears as VS or GS output */
@@ -272,6 +274,8 @@ typedef enum
 #define VARYING_BIT_CLIP_VERTEX BITFIELD64_BIT(VARYING_SLOT_CLIP_VERTEX)
 #define

[Nouveau] [PATCH 08/11] i965: rename UsesClipDistanceOut to UsesClipCullDistanceOut

2015-05-24 Thread Tobias Klausmann

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 2 +-
 src/mesa/drivers/dri/i965/brw_gs.c | 2 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 2 +-
 src/mesa/drivers/dri/i965/brw_vs.c | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index ead7768..c4439fd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -3912,7 +3912,7 @@ fs_visitor::emit_urb_writes()
fs_reg sources[8];
 
/* Lower legacy ff and ClipVertex clipping to clip distances */
-   if (key->base.userclip_active && !prog->UsesClipDistanceOut)
+   if (key->base.userclip_active && !prog->UsesClipCullDistanceOut)
       compute_clip_distance();
 
/* If we don't have any valid slots to write, just do a minimal urb write
diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
b/src/mesa/drivers/dri/i965/brw_gs.c
index 52c7303..2cb3fe2 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -314,7 +314,7 @@ brw_gs_populate_key(struct brw_context *brw,
 
    key->base.program_string_id = gp->id;
    brw_setup_vue_key_clip_info(brw, &key->base,
-                               gp->program.Base.UsesClipDistanceOut);
+                               gp->program.Base.UsesClipCullDistanceOut);
 
    /* _NEW_TEXTURE */
    brw_populate_sampler_prog_key_data(ctx, prog, stage_state->sampler_count,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e9681b7..d99dcd0 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1730,7 +1730,7 @@ vec4_visitor::run()
}
base_ir = NULL;
 
-   if (key->userclip_active && !prog->UsesClipDistanceOut)
+   if (key->userclip_active && !prog->UsesClipCullDistanceOut)
       setup_uniform_clipplane_values();
 
emit_thread_end();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 5a60fe4..0401171 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -3250,7 +3250,7 @@ vec4_visitor::emit_vertex()
}
 
/* Lower legacy ff and ClipVertex clipping to clip distances */
-   if (key->userclip_active && !prog->UsesClipDistanceOut) {
+   if (key->userclip_active && !prog->UsesClipCullDistanceOut) {
       current_annotation = "user clip distances";
 
       output_reg[VARYING_SLOT_CLIP_DIST0] = dst_reg(this, glsl_type::vec4_type);
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
b/src/mesa/drivers/dri/i965/brw_vs.c
index d03567e..eb69cc7 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -435,7 +435,7 @@ brw_vs_populate_key(struct brw_context *brw,
 */
    key->base.program_string_id = vp->id;
    brw_setup_vue_key_clip_info(brw, &key->base,
-                               vp->program.Base.UsesClipDistanceOut);
+                               vp->program.Base.UsesClipCullDistanceOut);
 
    /* _NEW_POLYGON */
    if (brw->gen < 6) {
-- 
2.4.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH 05/11] glsl: Add a helper to see if an array was unsize in the shader

2015-05-24 Thread Tobias Klausmann

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/glsl/glsl_types.cpp |  8 
 src/glsl/glsl_types.h   | 10 --
 src/glsl/linker.cpp |  2 +-
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
index f675e90..4bc7324 100644
--- a/src/glsl/glsl_types.cpp
+++ b/src/glsl/glsl_types.cpp
@@ -340,12 +340,12 @@ _mesa_glsl_release_types(void)
 }
 
 
-glsl_type::glsl_type(const glsl_type *array, unsigned length) :
+glsl_type::glsl_type(const glsl_type *array, unsigned length, bool was_unsized) :
    base_type(GLSL_TYPE_ARRAY),
    sampler_dimensionality(0), sampler_shadow(0), sampler_array(0),
    sampler_type(0), interface_packing(0),
    vector_elements(0), matrix_columns(0),
-   length(length), name(NULL)
+   length(length), name(NULL), was_unsized(was_unsized)
 {
    this->fields.array = array;
/* Inherit the gl type of the base. The GL type is used for
@@ -635,7 +635,7 @@ glsl_type::get_sampler_instance(enum glsl_sampler_dim dim,
 }
 
 const glsl_type *
-glsl_type::get_array_instance(const glsl_type *base, unsigned array_size)
+glsl_type::get_array_instance(const glsl_type *base, unsigned array_size, bool was_unsized)
 {
    /* Generate a name using the base type pointer in the key.  This is
     * done because the name of the base type may not be unique across
@@ -656,7 +656,7 @@ glsl_type::get_array_instance(const glsl_type *base, unsigned array_size)
 
    if (t == NULL) {
       mtx_unlock(&glsl_type::mutex);
-      t = new glsl_type(base, array_size);
+      t = new glsl_type(base, array_size, was_unsized);
       mtx_lock(&glsl_type::mutex);
 
       hash_table_insert(array_types, (void *) t, ralloc_strdup(mem_ctx, key));
diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
index f54a939..d6ad450 100644
--- a/src/glsl/glsl_types.h
+++ b/src/glsl/glsl_types.h
@@ -183,6 +183,12 @@ struct glsl_type {
} fields;
 
/**
+* For \c GLSL_TYPE_ARRAY this determines if an array was unsized and
+* got changed to a sized array.
+*/
+   bool was_unsized;
+
+   /**
 * \name Pointers to various public type singletons
 */
/*@{*/
@@ -246,7 +252,7 @@ struct glsl_type {
 * Get the instance of an array type
 */
    static const glsl_type *get_array_instance(const glsl_type *base,
-                                              unsigned elements);
+                                              unsigned elements, bool was_unsized = false);
 
/**
 * Get the instance of a record type
@@ -677,7 +683,7 @@ private:
 enum glsl_interface_packing packing, const char *name);
 
/** Constructor for array types */
-   glsl_type(const glsl_type *array, unsigned length);
+   glsl_type(const glsl_type *array, unsigned length, bool was_unsized);
 
/** Hash table containing the known array types. */
static struct hash_table *array_types;
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 9798afe..8eace14 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -1261,7 +1261,7 @@ private:
{
       if ((*type)->is_unsized_array()) {
          *type = glsl_type::get_array_instance((*type)->fields.array,
-                                               max_array_access + 1);
+                                               max_array_access + 1, true);
          assert(*type != NULL);
   }
}
-- 
2.4.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH 06/11] glsl: lower cull_distance into cull_distance_mesa

2015-05-24 Thread Tobias Klausmann

From: Dave Airlie airl...@redhat.com

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/glsl/Makefile.sources|   1 +
 src/glsl/ir_optimization.h   |   1 +
 src/glsl/link_varyings.cpp   |  15 +-
 src/glsl/link_varyings.h |   3 +-
 src/glsl/linker.cpp  |   1 +
 src/glsl/lower_cull_distance.cpp | 549 +++
 6 files changed, 565 insertions(+), 5 deletions(-)
 create mode 100644 src/glsl/lower_cull_distance.cpp

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index d784a81..502b6ca 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -143,6 +143,7 @@ LIBGLSL_FILES = \
loop_unroll.cpp \
lower_clip_distance.cpp \
lower_const_arrays_to_uniforms.cpp \
+   lower_cull_distance.cpp \
lower_discard.cpp \
lower_discard_flow.cpp \
lower_if_to_cond_assign.cpp \
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index e6939f3..1220df6 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -119,6 +119,7 @@ bool lower_variable_index_to_cond_assign(exec_list 
*instructions,
 bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz);
 bool lower_const_arrays_to_uniforms(exec_list *instructions);
 bool lower_clip_distance(gl_shader *shader);
+bool lower_cull_distance(gl_shader *shader);
 void lower_output_reads(exec_list *instructions);
 bool lower_packing_builtins(exec_list *instructions, int op_mask);
 void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions);
diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 7b2d4bd..46f84c6 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -301,6 +301,7 @@ tfeedback_decl::init(struct gl_context *ctx, const void *mem_ctx,
    this->location = -1;
    this->orig_name = input;
    this->is_clip_distance_mesa = false;
+   this->is_cull_distance_mesa = false;
    this->skip_components = 0;
    this->next_buffer_separator = false;
    this->matched_candidate = NULL;
@@ -351,6 +352,10 @@ tfeedback_decl::init(struct gl_context *ctx, const void *mem_ctx,
        strcmp(this->var_name, "gl_ClipDistance") == 0) {
       this->is_clip_distance_mesa = true;
    }
+   if (ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].LowerClipDistance &&
+       strcmp(this->var_name, "gl_CullDistance") == 0) {
+      this->is_cull_distance_mesa = true;
+   }
 }
 
 
@@ -397,7 +402,8 @@ tfeedback_decl::assign_location(struct gl_context *ctx,
          this->matched_candidate->type->fields.array->matrix_columns;
       const unsigned vector_elements =
          this->matched_candidate->type->fields.array->vector_elements;
-      unsigned actual_array_size = this->is_clip_distance_mesa ?
+      unsigned actual_array_size =
+         (this->is_clip_distance_mesa || this->is_cull_distance_mesa) ?
          prog->LastClipDistanceArraySize :
          this->matched_candidate->type->array_size();
 
@@ -410,7 +416,8 @@ tfeedback_decl::assign_location(struct gl_context *ctx,
                       actual_array_size);
             return false;
          }
-         unsigned array_elem_size = this->is_clip_distance_mesa ?
+         unsigned array_elem_size =
+            (this->is_clip_distance_mesa || this->is_cull_distance_mesa) ?
             1 : vector_elements * matrix_cols;
          fine_location += array_elem_size * this->array_subscript;
          this->size = 1;
@@ -419,7 +426,7 @@ tfeedback_decl::assign_location(struct gl_context *ctx,
       }
       this->vector_elements = vector_elements;
       this->matrix_columns = matrix_cols;
-      if (this->is_clip_distance_mesa)
+      if (this->is_clip_distance_mesa || this->is_cull_distance_mesa)
          this->type = GL_FLOAT;
       else
          this->type = this->matched_candidate->type->fields.array->gl_type;
@@ -542,7 +549,7 @@ const tfeedback_candidate *
 tfeedback_decl::find_candidate(gl_shader_program *prog,
                                hash_table *tfeedback_candidates)
 {
-   const char *name = this->is_clip_distance_mesa
+   const char *name = this->is_cull_distance_mesa ? "gl_CullDistanceMESA" : this->is_clip_distance_mesa
       ? "gl_ClipDistanceMESA" : this->var_name;
    this->matched_candidate = (const tfeedback_candidate *)
       hash_table_find(tfeedback_candidates, name);
diff --git a/src/glsl/link_varyings.h b/src/glsl/link_varyings.h
index afc16a8..842ab7c 100644
--- a/src/glsl/link_varyings.h
+++ b/src/glsl/link_varyings.h
@@ -128,7 +128,7 @@ public:
 */
unsigned num_components() const
{
-      if (this->is_clip_distance_mesa)
+      if (this->is_clip_distance_mesa || this->is_cull_distance_mesa)
          return this->size;
       else
          return this->vector_elements * this->matrix_columns * this->size;
@@ -165,6 +165,7 @@ private:
 * gl_ClipDistance to gl_ClipDistanceMESA.
 */
bool is_clip_distance_mesa;
+   bool is_cull_distance_mesa

[Nouveau] [PATCH 07/11] glsl: Add arb_cull_distance support

2015-05-24 Thread Tobias Klausmann

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/glsl/ast_to_hir.cpp |  14 +
 src/glsl/builtin_variables.cpp  |  13 +++-
 src/glsl/glcpp/glcpp-parse.y|   3 +
 src/glsl/glsl_parser_extras.cpp |   1 +
 src/glsl/glsl_parser_extras.h   |   3 +
 src/glsl/link_varyings.cpp  |   2 +-
 src/glsl/linker.cpp | 121 +---
 src/glsl/standalone_scaffolding.cpp |   1 +
 src/glsl/tests/varyings_test.cpp|  27 
 9 files changed, 145 insertions(+), 40 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 8aebb13..4db2b2e 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -1045,6 +1045,20 @@ check_builtin_array_max_size(const char *name, unsigned 
size,
       _mesa_glsl_error(&loc, state, "`gl_ClipDistance' array size cannot "
                        "be larger than gl_MaxClipDistances (%u)",
                        state->Const.MaxClipPlanes);
+   } else if (strcmp("gl_CullDistance", name) == 0
+              && size > state->Const.MaxClipPlanes) {
+      /* From the ARB_cull_distance spec:
+       *
+       *    The gl_CullDistance array is predeclared as unsized and
+       *    must be sized by the shader either redeclaring it with
+       *    a size or indexing it only with integral constant
+       *    expressions. The size determines the number and set of
+       *    enabled cull distances and can be at most
+       *    gl_MaxCullDistances.
+       */
+      _mesa_glsl_error(&loc, state, "`gl_CullDistance' array size cannot "
+                       "be larger than gl_MaxCullDistances (%u)",
+                       state->Const.MaxClipPlanes);
}
 }
 
diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
index 6806aa1..78c8db2 100644
--- a/src/glsl/builtin_variables.cpp
+++ b/src/glsl/builtin_variables.cpp
@@ -298,7 +298,7 @@ public:
const glsl_type *construct_interface_instance() const;
 
 private:
-   glsl_struct_field fields[10];
+   glsl_struct_field fields[11];
unsigned num_fields;
 };
 
@@ -600,6 +600,12 @@ builtin_variable_generator::generate_constants()
       add_const("gl_MaxVaryingComponents", state->ctx->Const.MaxVarying * 4);
    }
 
+   if (state->is_version(450, 0) || state->ARB_cull_distance_enable) {
+      add_const("gl_MaxCullDistances", state->Const.MaxClipPlanes);
+      add_const("gl_MaxCombinedClipAndCullDistances",
+                state->Const.MaxClipPlanes);
+   }
+
    if (state->is_version(150, 0)) {
       add_const("gl_MaxVertexOutputComponents",
                 state->Const.MaxVertexOutputComponents);
@@ -1029,6 +1035,11 @@ builtin_variable_generator::generate_varyings()
                   "gl_ClipDistance");
    }
 
+   if (state->is_version(450, 0) || state->ARB_cull_distance_enable) {
+       ADD_VARYING(VARYING_SLOT_CULL_DIST0, array(float_t, 0),
+                   "gl_CullDistance");
+   }
+
    if (compatibility) {
       ADD_VARYING(VARYING_SLOT_TEX0, array(vec4_t, 0), "gl_TexCoord");
       ADD_VARYING(VARYING_SLOT_FOGC, float_t, "gl_FogFragCoord");
diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
index a11b6b2..536c17f 100644
--- a/src/glsl/glcpp/glcpp-parse.y
+++ b/src/glsl/glcpp/glcpp-parse.y
@@ -2483,6 +2483,9 @@ _glcpp_parser_handle_version_declaration(glcpp_parser_t 
*parser, intmax_t versio
 
       if (extensions->ARB_shader_precision)
          add_builtin_define(parser, "GL_ARB_shader_precision", 1);
+
+      if (extensions->ARB_cull_distance)
+         add_builtin_define(parser, "GL_ARB_cull_distance", 1);
   }
}
 
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 046d5d7..d1cd8ff 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -554,6 +554,7 @@ static const _mesa_glsl_extension 
_mesa_glsl_supported_extensions[] = {
EXT(ARB_arrays_of_arrays,   true,  false, ARB_arrays_of_arrays),
EXT(ARB_compute_shader, true,  false, ARB_compute_shader),
EXT(ARB_conservative_depth, true,  false, 
ARB_conservative_depth),
+   EXT(ARB_cull_distance,  true,  false, ARB_cull_distance),
EXT(ARB_derivative_control, true,  false, 
ARB_derivative_control),
EXT(ARB_draw_buffers,   true,  false, dummy_true),
EXT(ARB_draw_instanced, true,  false, ARB_draw_instanced),
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index 9a0c24e..8572905 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -378,6 +378,7 @@ struct _mesa_glsl_parse_state {
 
   /* ARB_viewport_array */
   unsigned MaxViewports;
+
} Const;
 
/**
@@ -430,6 +431,8 @@ struct _mesa_glsl_parse_state {
bool ARB_compute_shader_warn;
bool ARB_conservative_depth_enable;
bool ARB_conservative_depth_warn;
+   bool

[Nouveau] [PATCH 10/11] nouveau/codegen: sort in galliums cull_distance semantic into the drivers bitmask

2015-05-24 Thread Tobias Klausmann

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index ecd115f..381a958 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1063,6 +1063,11 @@ bool Source::scanDeclaration(const struct 
tgsi_full_declaration *decl)
                decl->Declaration.UsageMask << (si * 4);
             info->io.genUserClip = -1;
             break;
+         case TGSI_SEMANTIC_CULLDIST:
+            info->io.cullDistanceMask |=
+               decl->Declaration.UsageMask << (si * 4);
+            info->io.genUserClip = -1;
+            break;
          case TGSI_SEMANTIC_SAMPLEMASK:
             info->io.sampleMask = i;
             break;
-- 
2.4.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH 04/11] mesa/st: add support for GL_ARB_cull_distance

2015-05-24 Thread Tobias Klausmann

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/mesa/state_tracker/st_extensions.c |  4 
 src/mesa/state_tracker/st_program.c| 34 ++
 2 files changed, 38 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 23a4588..63f3334 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -466,6 +466,7 @@ void st_init_extensions(struct pipe_screen *screen,
   { o(ARB_conditional_render_inverted),  
PIPE_CAP_CONDITIONAL_RENDER_INVERTED  },
   { o(ARB_texture_view), PIPE_CAP_SAMPLER_VIEW_TARGET  
},
   { o(ARB_clip_control), PIPE_CAP_CLIP_HALFZ   
},
+  { o(ARB_cull_distance),PIPE_CAP_CULL_DISTANCE
},
   { o(EXT_polygon_offset_clamp), PIPE_CAP_POLYGON_OFFSET_CLAMP 
},
};
 
@@ -678,6 +679,9 @@ void st_init_extensions(struct pipe_screen *screen,
    if (glsl_feature_level >= 410)
       extensions->ARB_shader_precision = GL_TRUE;
 
+   if (glsl_feature_level >= 130)
+      extensions->ARB_cull_distance = GL_TRUE;
+
    /* This extension needs full OpenGL 3.2, but we don't know if that's
     * supported at this point. Only check the GLSL version. */
    if (consts->GLSLVersion >= 150 &&
diff --git a/src/mesa/state_tracker/st_program.c 
b/src/mesa/state_tracker/st_program.c
index a9110d3..79e8ad7 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -253,6 +253,14 @@ st_prepare_vertex_program(struct gl_context *ctx,
          stvp->output_semantic_name[slot] = TGSI_SEMANTIC_CLIPDIST;
          stvp->output_semantic_index[slot] = 1;
          break;
+      case VARYING_SLOT_CULL_DIST0:
+         stvp->output_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST;
+         stvp->output_semantic_index[slot] = 0;
+         break;
+      case VARYING_SLOT_CULL_DIST1:
+         stvp->output_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST;
+         stvp->output_semantic_index[slot] = 1;
+         break;
       case VARYING_SLOT_EDGE:
          assert(0);
          break;
@@ -606,6 +614,16 @@ st_translate_fragment_program(struct st_context *st,
 input_semantic_index[slot] = 1;
 interpMode[slot] = TGSI_INTERPOLATE_PERSPECTIVE;
 break;
+ case VARYING_SLOT_CULL_DIST0:
+input_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST;
+input_semantic_index[slot] = 0;
+interpMode[slot] = TGSI_INTERPOLATE_PERSPECTIVE;
+break;
+ case VARYING_SLOT_CULL_DIST1:
+input_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST;
+input_semantic_index[slot] = 1;
+interpMode[slot] = TGSI_INTERPOLATE_PERSPECTIVE;
+break;
 /* In most cases, there is nothing special about these
  * inputs, so adopt a convention to use the generic
  * semantic name and the mesa VARYING_SLOT_ number as the
@@ -941,6 +959,14 @@ st_translate_geometry_program(struct st_context *st,
 input_semantic_name[slot] = TGSI_SEMANTIC_CLIPDIST;
 input_semantic_index[slot] = 1;
 break;
+ case VARYING_SLOT_CULL_DIST0:
+input_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST;
+input_semantic_index[slot] = 0;
+break;
+ case VARYING_SLOT_CULL_DIST1:
+input_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST;
+input_semantic_index[slot] = 1;
+break;
  case VARYING_SLOT_PSIZ:
 input_semantic_name[slot] = TGSI_SEMANTIC_PSIZE;
 input_semantic_index[slot] = 0;
@@ -1028,6 +1054,14 @@ st_translate_geometry_program(struct st_context *st,
 gs_output_semantic_name[slot] = TGSI_SEMANTIC_CLIPDIST;
 gs_output_semantic_index[slot] = 1;
 break;
+ case VARYING_SLOT_CULL_DIST0:
+gs_output_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST;
+gs_output_semantic_index[slot] = 0;
+break;
+ case VARYING_SLOT_CULL_DIST1:
+gs_output_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST;
+gs_output_semantic_index[slot] = 1;
+break;
  case VARYING_SLOT_LAYER:
 gs_output_semantic_name[slot] = TGSI_SEMANTIC_LAYER;
 gs_output_semantic_index[slot] = 0;
-- 
2.4.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH 11/11] nouveau/nvc0: implement cull_distance as a special form of clip distance

2015-05-24 Thread Tobias Klausmann

This enables ARB_cull_distance.

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 docs/GL3.txt   | 2 +-
 docs/relnotes/10.7.0.html  | 4 +++-
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 6 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_program.h| 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 1 +
 6 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 9d56ee5..ebdae38 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -190,7 +190,7 @@ GL 4.5, GLSL 4.50:
   GL_ARB_ES3_1_compatibility   not started
   GL_ARB_clip_control  DONE (i965, nv50, nvc0, 
r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_conditional_render_inverted   DONE (i965, nv50, nvc0, 
llvmpipe, softpipe)
-  GL_ARB_cull_distance not started
+  GL_ARB_cull_distance DONE (nvc0)
   GL_ARB_derivative_controlDONE (i965, nv50, nvc0, 
r600)
   GL_ARB_direct_state_access   DONE (all drivers)
   - Transform Feedback object  DONE
diff --git a/docs/relnotes/10.7.0.html b/docs/relnotes/10.7.0.html
index 6206716..12e6b5b 100644
--- a/docs/relnotes/10.7.0.html
+++ b/docs/relnotes/10.7.0.html
@@ -43,7 +43,9 @@ TBD.
 Note: some of the new features are only available with certain drivers.
 </p>
 
-TBD.
+<ul>
+<li>GL_ARB_cull_distance on nvc0</li>
+</ul>
 
 <h2>Bug fixes</h2>
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index 4a47cb2..aa3b751 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -46,6 +46,7 @@ nvc0_shader_input_address(unsigned sn, unsigned si, unsigned 
ubase)
case TGSI_SEMANTIC_BCOLOR:   return 0x2a0 + si * 0x10;
case NV50_SEMANTIC_CLIPDISTANCE: return 0x2c0 + si * 0x4;
case TGSI_SEMANTIC_CLIPDIST: return 0x2c0 + si * 0x10;
+   case TGSI_SEMANTIC_CULLDIST: return 0x2c0 + si * 0x10;
case TGSI_SEMANTIC_CLIPVERTEX:   return 0x270;
case TGSI_SEMANTIC_PCOORD:   return 0x2e0;
case NV50_SEMANTIC_TESSCOORD:return 0x2f0;
@@ -75,6 +76,7 @@ nvc0_shader_output_address(unsigned sn, unsigned si, unsigned 
ubase)
case TGSI_SEMANTIC_BCOLOR:return 0x2a0 + si * 0x10;
case NV50_SEMANTIC_CLIPDISTANCE:  return 0x2c0 + si * 0x4;
case TGSI_SEMANTIC_CLIPDIST:  return 0x2c0 + si * 0x10;
+   case TGSI_SEMANTIC_CULLDIST:  return 0x2c0 + si * 0x10;
case TGSI_SEMANTIC_CLIPVERTEX:return 0x270;
case TGSI_SEMANTIC_TEXCOORD:  return 0x300 + si * 0x10;
case TGSI_SEMANTIC_EDGEFLAG:  return ~0;
@@ -255,11 +257,13 @@ nvc0_vtgp_gen_header(struct nvc0_program *vp, struct 
nv50_ir_prog_info *info)
   }
}
 
-   vp->vp.clip_enable = info->io.clipDistanceMask;
    for (i = 0; i < 8; ++i)
       if (info->io.cullDistanceMask & (1 << i))
          vp->vp.clip_mode |= 1 << (i * 4);
 
+   vp->vp.clip_enable = info->io.clipDistanceMask;
+   vp->vp.cull_enable = info->io.cullDistanceMask;
+
    if (info->io.genUserClip < 0)
       vp->vp.num_ucps = PIPE_MAX_CLIP_PLANES + 1; /* prevent rebuilding */
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_program.h
index 3fd9d21..b8b1a5a 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.h
@@ -39,6 +39,7 @@ struct nvc0_program {
struct {
   uint32_t clip_mode; /* clip/cull selection */
   uint8_t clip_enable; /* mask of defined clip planes */
+  uint8_t cull_enable; /* mask of defined cull planes */
   uint8_t num_ucps; /* also set to max if ClipDistance is used */
   uint8_t edgeflag; /* attribute index of edgeflag input */
   boolean need_vertex_id;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index c942dda..56d22a0 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -174,6 +174,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_CLIP_HALFZ:
case PIPE_CAP_POLYGON_OFFSET_CLAMP:
case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
+   case PIPE_CAP_CULL_DISTANCE:
   return 1;
case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
       return (class_3d >= NVE4_3D_CLASS) ? 1 : 0;
@@ -194,7 +195,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_VERTEXID_NOBASE:
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
-   case PIPE_CAP_CULL_DISTANCE:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/nouveau

[Nouveau] [PATCH 09/11] gallium: add support for arb_cull_distance

2015-05-24 Thread Tobias Klausmann

Add another pipe cap so we can safely enable or disable this extension.

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/gallium/auxiliary/cso_cache/cso_context.c| 3 +++
 src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
 src/gallium/drivers/i915/i915_screen.c   | 1 +
 src/gallium/drivers/ilo/ilo_screen.c | 1 +
 src/gallium/drivers/llvmpipe/lp_screen.c | 2 ++
 src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 1 +
 src/gallium/drivers/r300/r300_screen.c   | 1 +
 src/gallium/drivers/r600/r600_pipe.c | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c   | 1 +
 src/gallium/drivers/softpipe/sp_screen.c | 2 ++
 src/gallium/drivers/svga/svga_screen.c   | 1 +
 src/gallium/drivers/vc4/vc4_screen.c | 1 +
 src/gallium/include/pipe/p_defines.h | 1 +
 15 files changed, 19 insertions(+)

diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c 
b/src/gallium/auxiliary/cso_cache/cso_context.c
index 744b00c..7612b43 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.c
+++ b/src/gallium/auxiliary/cso_cache/cso_context.c
@@ -119,6 +119,9 @@ struct cso_context {
struct pipe_clip_state clip;
struct pipe_clip_state clip_saved;
 
+   struct pipe_clip_state cull;
+   struct pipe_clip_state cull_saved;
+
struct pipe_framebuffer_state fb, fb_saved;
struct pipe_viewport_state vp, vp_saved;
struct pipe_blend_color blend_color;
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index c596d03..986a942 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -221,6 +221,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
+   case PIPE_CAP_CULL_DISTANCE:
return 0;
 
case PIPE_CAP_MAX_VIEWPORTS:
diff --git a/src/gallium/drivers/i915/i915_screen.c 
b/src/gallium/drivers/i915/i915_screen.c
index 03fecd1..678347d 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -242,6 +242,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap 
cap)
case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
+   case PIPE_CAP_CULL_DISTANCE:
   return 0;
 
case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
b/src/gallium/drivers/ilo/ilo_screen.c
index b0fed73..f92d5de 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -459,6 +459,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
+   case PIPE_CAP_CULL_DISTANCE:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
b/src/gallium/drivers/llvmpipe/lp_screen.c
index 09ac9af..c90c405 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -293,6 +293,8 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
   return 0;
+   case PIPE_CAP_CULL_DISTANCE:
+  return 0;
}
/* should only get here on unhandled cases */
    debug_printf("Unexpected PIPE_CAP %d query\n", param);
diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index bb79ccc..fc33ddf 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -162,6 +162,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
+   case PIPE_CAP_CULL_DISTANCE:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index f455a7f..d8efd75 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -210,6 +210,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: /* potentially supported on some hw */
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
+   case PIPE_CAP_CULL_DISTANCE:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b

[Nouveau] [PATCH 03/11] mesa/prog: Add varyings for arb_cull_distance

2015-05-24 Thread Tobias Klausmann

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/mesa/program/prog_print.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/mesa/program/prog_print.c b/src/mesa/program/prog_print.c
index d588d07..e8855cd 100644
--- a/src/mesa/program/prog_print.c
+++ b/src/mesa/program/prog_print.c
@@ -147,6 +147,8 @@ arb_input_attrib_string(GLuint index, GLenum progType)
       "fragment.(twenty-one)", /* VARYING_SLOT_VIEWPORT */
       "fragment.(twenty-two)", /* VARYING_SLOT_FACE */
       "fragment.(twenty-three)", /* VARYING_SLOT_PNTC */
+      "fragment.(twenty-four)", /* VARYING_SLOT_CULL_DIST0 */
+      "fragment.(twenty-five)", /* VARYING_SLOT_CULL_DIST1 */
       "fragment.varying[0]",
       "fragment.varying[1]",
       "fragment.varying[2]",
@@ -272,6 +274,8 @@ arb_output_attrib_string(GLuint index, GLenum progType)
       "result.(twenty-one)", /* VARYING_SLOT_VIEWPORT */
       "result.(twenty-two)", /* VARYING_SLOT_FACE */
       "result.(twenty-three)", /* VARYING_SLOT_PNTC */
+      "result.(twenty-four)", /* VARYING_SLOT_CULL_DIST0 */
+      "result.(twenty-five)", /* VARYING_SLOT_CULL_DIST1 */
       "result.varying[0]",
       "result.varying[1]",
       "result.varying[2]",
-- 
2.4.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 02/11] mesa/main: add support for GL_ARB_cull_distance

2015-05-24 Thread Tobias Klausmann




On 24.05.2015 20:11, Ilia Mirkin wrote:

On Sun, May 24, 2015 at 1:58 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
  src/mesa/main/extensions.c   |  1 +
  src/mesa/main/get.c  | 26 ++
  src/mesa/main/get_hash_params.py |  4 
  src/mesa/main/mtypes.h   | 22 +-
  src/mesa/main/shaderapi.c|  4 ++--
  src/mesa/main/tests/enum_strings.cpp |  2 ++
  6 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index c82416a..2145502 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -99,6 +99,7 @@ static const struct extension extension_table[] = {
    { "GL_ARB_copy_buffer",                         o(dummy_true),                              GL,             2008 },
    { "GL_ARB_copy_image",                          o(ARB_copy_image),                          GL,             2012 },
    { "GL_ARB_conservative_depth",                  o(ARB_conservative_depth),                  GL,             2011 },
+   { "GL_ARB_cull_distance",                       o(ARB_cull_distance),                       GL,             2014 },
    { "GL_ARB_debug_output",                        o(dummy_true),                              GL,             2009 },
    { "GL_ARB_depth_buffer_float",                  o(ARB_depth_buffer_float),                  GL,             2008 },
    { "GL_ARB_depth_clamp",                         o(ARB_depth_clamp),                         GL,             2003 },
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 8a6c81a..1dcfcc9 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -143,6 +143,8 @@ enum value_extra {
 EXTRA_VALID_DRAW_BUFFER,
 EXTRA_VALID_TEXTURE_UNIT,
 EXTRA_VALID_CLIP_DISTANCE,
+   EXTRA_VALID_CULL_DISTANCE,
+   EXTRA_VALID_CULL_AND_CLIP_DISTANCE,
 EXTRA_FLUSH_CURRENT,
 EXTRA_GLSL_130,
 EXTRA_EXT_UBO_GS4,
@@ -267,6 +269,13 @@ static const int extra_valid_clip_distance[] = {
 EXTRA_END
  };

+static const int extra_valid_clip_and_cull_distance[] = {
+   EXTRA_VALID_CLIP_DISTANCE,
+   EXTRA_VALID_CULL_DISTANCE,
+   EXTRA_VALID_CULL_AND_CLIP_DISTANCE,
+   EXTRA_END
+};
+
  static const int extra_flush_current_valid_texture_unit[] = {
 EXTRA_FLUSH_CURRENT,
 EXTRA_VALID_TEXTURE_UNIT,
@@ -393,6 +402,7 @@ EXTRA_EXT(INTEL_performance_query);
  EXTRA_EXT(ARB_explicit_uniform_location);
  EXTRA_EXT(ARB_clip_control);
  EXTRA_EXT(EXT_polygon_offset_clamp);
+EXTRA_EXT(ARB_cull_distance);

  static const int
  extra_ARB_color_buffer_float_or_glcore[] = {
@@ -1116,6 +1126,22 @@ check_extra(struct gl_context *ctx, const char *func, 
const struct value_desc *d
 return GL_FALSE;
  }
  break;
+   case EXTRA_VALID_CULL_DISTANCE:
+      if (d->pname - GL_MAX_CULL_DISTANCES >= ctx->Const.MaxClipPlanes) {
+         _mesa_error(ctx, GL_INVALID_ENUM, "%s(cull distance %u)",
+                     func, d->pname - GL_MAX_CULL_DISTANCES);
+         return GL_FALSE;
+      }
+      break;
+   case EXTRA_VALID_CULL_AND_CLIP_DISTANCE:
+      if (d->pname - GL_MAX_COMBINED_CLIP_AND_CULL_DISTANCES >=
+          ctx->Const.MaxClipPlanes) {
+         _mesa_error(ctx, GL_INVALID_ENUM,
+                     "%s(combined clip and cull distance %u)", func,
+                     d->pname - GL_MAX_COMBINED_CLIP_AND_CULL_DISTANCES);
+         return GL_FALSE;
+      }

huh?

I guess you were copying EXTRA_VALID_CLIP_DISTANCE? That's for
validating GL_CLIP_DISTANCE0..7 all in one go (and erroring out for
ones that are too high). That doesn't seem to apply here.

You don't appear to use extra_valid_clip_and_cull_distance either, so
I guess that makes sense... should remove the whole lot.
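
For anyone following along, here is a minimal stand-alone sketch of the point above. It is not Mesa code: the two helper functions are made up for illustration, and the enum values are quoted from memory of the GL headers (the argument does not depend on the exact numbers). A range check only makes sense for a family of consecutive enums such as GL_CLIP_DISTANCE0..7; applied to a single query enum, the offset is always 0 and the check can never fail.

```c
#include <stdbool.h>

/* Enum values as recalled from the GL headers (assumption, see lead-in). */
#define GL_CLIP_DISTANCE0      0x3000
#define GL_MAX_CULL_DISTANCES  0x82F9

/* Hypothetical helper in the style of EXTRA_VALID_CLIP_DISTANCE:
 * pname is GL_CLIP_DISTANCE0 + n, and n must stay below the limit. */
static bool clip_distance_pname_valid(unsigned pname, unsigned max_clip_planes)
{
   return pname - GL_CLIP_DISTANCE0 < max_clip_planes;
}

/* The same pattern copied for a single enum: the only pname that ever
 * reaches this check is GL_MAX_CULL_DISTANCES itself, so the offset is 0
 * and the check passes for any non-zero limit. */
static bool cull_distance_pname_valid(unsigned pname, unsigned max_clip_planes)
{
   return pname - GL_MAX_CULL_DISTANCES < max_clip_planes;
}
```

With a limit of 8, GL_CLIP_DISTANCE0 + 7 is accepted and GL_CLIP_DISTANCE0 + 8 is rejected, while the cull variant accepts its one possible input unconditionally, which is why the extra cases can simply be dropped.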


will do!




+break;
case EXTRA_GLSL_130:
   api_check = GL_TRUE;
   if (ctx->Const.GLSLVersion >= 130)
diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index 41cb2c1..a63aba7 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -798,6 +798,10 @@ descriptor=[
   [ "MIN_FRAGMENT_INTERPOLATION_OFFSET", "CONTEXT_FLOAT(Const.MinFragmentInterpolationOffset)", "extra_ARB_gpu_shader5" ],
   [ "MAX_FRAGMENT_INTERPOLATION_OFFSET", "CONTEXT_FLOAT(Const.MaxFragmentInterpolationOffset)", "extra_ARB_gpu_shader5" ],
   [ "FRAGMENT_INTERPOLATION_OFFSET_BITS", "CONST(FRAGMENT_INTERPOLATION_OFFSET_BITS)", "extra_ARB_gpu_shader5" ],
+
+# GL_ARB_cull_distance
+  [ "MAX_CULL_DISTANCES", "CONTEXT_INT(Const.MaxClipPlanes)", "extra_ARB_cull_distance" ],
+  [ "MAX_COMBINED_CLIP_AND_CULL_DISTANCES", "CONTEXT_INT(Const.MaxClipPlanes)", "extra_ARB_cull_distance" ],
  ]},

  # Enums restricted to OpenGL Core profile
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 8342517..6425c06 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main

Re: [Nouveau] [Mesa-dev] [PATCH 04/11] mesa/st: add support for GL_ARB_cull_distance

2015-05-24 Thread Tobias Klausmann




On 24.05.2015 20:12, Marek Olšák wrote:

On Sun, May 24, 2015 at 7:58 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
  src/mesa/state_tracker/st_extensions.c |  4 
  src/mesa/state_tracker/st_program.c| 34 ++
  2 files changed, 38 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 23a4588..63f3334 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -466,6 +466,7 @@ void st_init_extensions(struct pipe_screen *screen,
{ o(ARB_conditional_render_inverted),  
PIPE_CAP_CONDITIONAL_RENDER_INVERTED  },
{ o(ARB_texture_view), PIPE_CAP_SAMPLER_VIEW_TARGET 
 },
{ o(ARB_clip_control), PIPE_CAP_CLIP_HALFZ  
 },
+  { o(ARB_cull_distance),PIPE_CAP_CULL_DISTANCE
},
{ o(EXT_polygon_offset_clamp), PIPE_CAP_POLYGON_OFFSET_CLAMP
 },
 };

@@ -678,6 +679,9 @@ void st_init_extensions(struct pipe_screen *screen,
     if (glsl_feature_level >= 410)
        extensions->ARB_shader_precision = GL_TRUE;

+   if (glsl_feature_level >= 130)
+      extensions->ARB_cull_distance = GL_TRUE;
+

This hunk is wrong and seems to be completely unnecessary.

Also, the patch which adds PIPE_CAP_CULL_DISTANCE should be before this patch.

Marek

removing and noted, thanks!
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [Mesa-dev] [PATCH 2/2] nv30: fix clip plane uploads and enable changes

2015-05-24 Thread Tobias Klausmann




On 24.05.2015 17:42, Ilia Mirkin wrote:

On Sun, May 24, 2015 at 10:56 AM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:


On 24.05.2015 16:15, Pierre Moreau wrote:

On 24 May 2015, at 16:03, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:



On 24.05.2015 10:38, Samuel Pitoiset wrote:


On 05/24/2015 06:58 AM, Ilia Mirkin wrote:

nv30_validate_clip depends on the rasterizer state. Also we should
upload all the new clip planes on change since next time the plane data
won't have changed, but the enables might.
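
To illustrate the reasoning in this commit message, here is a small stand-alone model. It is only a sketch, not the driver code: the struct, the dirty bits, and both validate functions are made up, and one int stands in for the four floats of a plane. It shows how the old logic leaves stale plane data in the hardware when a plane is set while disabled and only enabled later.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PLANES 6
#define NEW_CLIP       (1u << 0)   /* hypothetical dirty bit: plane data changed */
#define NEW_RASTERIZER (1u << 1)   /* hypothetical dirty bit: enables changed */

struct clip_model {
   uint32_t dirty;
   uint32_t enable_mask;        /* comes from the rasterizer state */
   int plane[NUM_PLANES];       /* plane data held by the driver */
   int hw_plane[NUM_PLANES];    /* what the hardware last received */
};

/* Old behaviour: upload a plane only if it is enabled while the clip
 * state is dirty. */
static void validate_old(struct clip_model *c)
{
   for (unsigned i = 0; i < NUM_PLANES; i++)
      if ((c->enable_mask & (1u << i)) && (c->dirty & NEW_CLIP))
         c->hw_plane[i] = c->plane[i];
   c->dirty = 0;
}

/* New behaviour: upload every plane whenever the clip state is dirty,
 * regardless of the current enables. */
static void validate_new(struct clip_model *c)
{
   for (unsigned i = 0; i < NUM_PLANES; i++)
      if (c->dirty & NEW_CLIP)
         c->hw_plane[i] = c->plane[i];
   c->dirty = 0;
}

/* Set plane 0 while it is disabled, validate, then enable it and validate
 * again. Returns true if the hardware ends up with the current data. */
static bool plane0_current_after_late_enable(void (*validate)(struct clip_model *))
{
   struct clip_model c = {0};

   c.plane[0] = 42;
   c.dirty = NEW_CLIP;
   validate(&c);

   c.enable_mask = 1u;
   c.dirty = NEW_RASTERIZER;
   validate(&c);

   return c.hw_plane[0] == c.plane[0];
}
```

In this model the old logic never uploads plane 0, because by the time the plane is enabled the clip state is no longer dirty; uploading all planes on every clip change avoids that.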

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
   src/gallium/drivers/nouveau/nv30/nv30_state_validate.c | 16
+++-
   1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c
b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c
index 86ac4f7..a954dcc 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c
@@ -272,15 +272,13 @@ nv30_validate_clip(struct nv30_context *nv30)
  uint32_t clpd_enable = 0;
    for (i = 0; i < 6; i++) {
-      if (nv30->rast->pipe.clip_plane_enable & (1 << i)) {
-         if (nv30->dirty & NV30_NEW_CLIP) {
-            BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5);
-            PUSH_DATA (push, i);
-            PUSH_DATAp(push, nv30->clip.ucp[i], 4);
-         }
-
-         clpd_enable |= 1 << (1 + 4*i);
+      if (nv30->dirty & NV30_NEW_CLIP) {
+         BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5);
+         PUSH_DATA (push, i);
+         PUSH_DATAp(push, nv30->clip.ucp[i], 4);
       }
+      if (nv30->rast->pipe.clip_plane_enable & (1 << i))
+         clpd_enable |= 2 << (4*i);

Can you explain why did you change this line?

This does bother me as well :)

It should be the same as before, just with one addition fewer: shifting 1 by
(1 + 4*i) gives the same value as shifting 2 by 4*i.
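
A minimal sketch to make the equivalence concrete (the two function names are made up for illustration; only the shift expressions come from the patch). Both forms select the same enable bit for plane i, since 2 is just 1 shifted left by one.

```c
#include <stdint.h>

/* Enable bit for clip plane i, written as in the old code. */
static uint32_t clip_enable_old(unsigned i)
{
   return UINT32_C(1) << (1 + 4 * i);
}

/* The same bit, written as in the patch: the "+ 1" is folded into the
 * constant being shifted. */
static uint32_t clip_enable_new(unsigned i)
{
   return UINT32_C(2) << (4 * i);
}
```

For i = 1 both return 0x20, which matches the NV30_3D_VP_CLIP_PLANES_ENABLE_PLANE1 define quoted later in this thread.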


*dang* you are right. Maybe we should change those lines in nv50 and
nvc0 as well, to save the additional addition :-)

What lines?


With this sorted out, series is:

Not sure what you mean here... what do you want me to sort out? The 2
back into a +1? I was just looking at the defines like


Nah, I meant that it _is_ all right the way you did it, and that we should 
change the similar lines for clip in nv50/nvc0 the way you did here.




#define NV30_3D_VP_CLIP_PLANES_ENABLE_PLANE1  0x0020

and the 2 << 4i seemed more obviously correct. Although they're
obviously identical.


Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de



  }
BEGIN_NV04(push, NV30_3D(VP_CLIP_PLANES_ENABLE), 1);
@@ -389,7 +387,7 @@ static struct state_validate hwtnl_validate_list[]
= {
   { nv30_validate_stipple,   NV30_NEW_STIPPLE },
   { nv30_validate_scissor,   NV30_NEW_SCISSOR |
NV30_NEW_RASTERIZER },
   { nv30_validate_viewport,  NV30_NEW_VIEWPORT },
-{ nv30_validate_clip,  NV30_NEW_CLIP },
+{ nv30_validate_clip,  NV30_NEW_CLIP | NV30_NEW_RASTERIZER
},
   { nv30_fragprog_validate,  NV30_NEW_FRAGPROG |
NV30_NEW_FRAGCONST },
   { nv30_vertprog_validate,  NV30_NEW_VERTPROG |
NV30_NEW_VERTCONST |
  NV30_NEW_FRAGPROG |
NV30_NEW_RASTERIZER },

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance

2015-05-24 Thread Tobias Klausmann




On 24.05.2015 20:25, Ilia Mirkin wrote:

I'm having a bit of trouble tracing through this. What happens if I
have a shader that just does:

gl_ClipDistance[0] = 1;
gl_CullDistance[0] = 1;

what does the resulting TGSI look like? (Assuming that clip plane 0 is
enabled.) What about the generated nvc0 code (for the vertex shader)?


(hack up a patch for this, run it without DRI_PRIME=1, see it pass and 
forget to check it again)

yeah those are equal, sorry for wasting your time on this :/



On Sun, May 24, 2015 at 1:57 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

This patch series adds the needed support for this extension to the various
parts of mesa to finally enable it for nvc0.

Dave Airlie (1):
   glsl: lower cull_distance into cull_distance_mesa

Tobias Klausmann (10):
   glapi: add GL_ARB_cull_distance
   mesa/main: add support for GL_ARB_cull_distance
   mesa/prog: Add varyings for arb_cull_distance
   mesa/st: add support for GL_ARB_cull_distance
   glsl: Add a helper to see if an array was unsize in the shader
   glsl: Add arb_cull_distance support
   i965: rename UsesClipDistanceOut to UsesClipCullDistanceOut
   gallium: add support for arb_cull_distance
   nouveau/codegen: sort in galliums cull_distance semantic into the
 drivers bitmask
   nouveau/nvc0: implement cull_distance as a special form of clip
 distance

  docs/GL3.txt   |   2 +-
  docs/relnotes/10.7.0.html  |   4 +-
  src/gallium/auxiliary/cso_cache/cso_context.c  |   3 +
  src/gallium/drivers/freedreno/freedreno_screen.c   |   1 +
  src/gallium/drivers/i915/i915_screen.c |   1 +
  src/gallium/drivers/ilo/ilo_screen.c   |   1 +
  src/gallium/drivers/llvmpipe/lp_screen.c   |   2 +
  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |   5 +
  src/gallium/drivers/nouveau/nv30/nv30_screen.c |   1 +
  src/gallium/drivers/nouveau/nv50/nv50_screen.c |   1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_program.c|   6 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_program.h|   1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |   1 +
  .../drivers/nouveau/nvc0/nvc0_state_validate.c |   1 +
  src/gallium/drivers/r300/r300_screen.c |   1 +
  src/gallium/drivers/r600/r600_pipe.c   |   1 +
  src/gallium/drivers/radeonsi/si_pipe.c |   1 +
  src/gallium/drivers/softpipe/sp_screen.c   |   2 +
  src/gallium/drivers/svga/svga_screen.c |   1 +
  src/gallium/drivers/vc4/vc4_screen.c   |   1 +
  src/gallium/include/pipe/p_defines.h   |   1 +
  src/glsl/Makefile.sources  |   1 +
  src/glsl/ast_to_hir.cpp|  14 +
  src/glsl/builtin_variables.cpp |  13 +-
  src/glsl/glcpp/glcpp-parse.y   |   3 +
  src/glsl/glsl_parser_extras.cpp|   1 +
  src/glsl/glsl_parser_extras.h  |   3 +
  src/glsl/glsl_types.cpp|   8 +-
  src/glsl/glsl_types.h  |  10 +-
  src/glsl/ir_optimization.h |   1 +
  src/glsl/link_varyings.cpp |  17 +-
  src/glsl/link_varyings.h   |   3 +-
  src/glsl/linker.cpp| 124 +++--
  src/glsl/lower_cull_distance.cpp   | 549 +
  src/glsl/standalone_scaffolding.cpp|   1 +
  src/glsl/tests/varyings_test.cpp   |  27 +
  src/mapi/glapi/gen/gl_API.xml  |   7 +-
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |   2 +-
  src/mesa/drivers/dri/i965/brw_gs.c |   2 +-
  src/mesa/drivers/dri/i965/brw_vec4.cpp |   2 +-
  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |   2 +-
  src/mesa/drivers/dri/i965/brw_vs.c |   2 +-
  src/mesa/main/extensions.c |   1 +
  src/mesa/main/get.c|  26 +
  src/mesa/main/get_hash_params.py   |   4 +
  src/mesa/main/mtypes.h |  22 +-
  src/mesa/main/shaderapi.c  |   4 +-
  src/mesa/main/tests/enum_strings.cpp   |   2 +
  src/mesa/program/prog_print.c  |   4 +
  src/mesa/state_tracker/st_extensions.c |   4 +
  src/mesa/state_tracker/st_program.c|  34 ++
  51 files changed, 859 insertions(+), 72 deletions(-)
  create mode 100644 src/glsl/lower_cull_distance.cpp

--
2.4.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance

2015-05-24 Thread Tobias Klausmann




On 24.05.2015 21:36, Ilia Mirkin wrote:

On Sun, May 24, 2015 at 3:30 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:


On 24.05.2015 20:25, Ilia Mirkin wrote:

I'm having a bit of trouble tracing through this. What happens if I
have a shader that just does:

gl_ClipDistance[0] = 1;
gl_CullDistance[0] = 1;

what does the resulting TGSI look like? (Assuming that clip plane 0 is
enabled.) What about the generated nvc0 code (for the vertex shader)?


(I hacked up a patch for this, ran it without DRI_PRIME=1, saw it pass and
forgot to check it again.)
Yeah, those are equal, sorry for wasting your time on this :/

Not a waste at all... let's ignore any shortcomings of your patches
for a second, and think it through -- what do you want the TGSI to
look like? I'm not even sure.

Do you want to have a separate 2x CLIPDIST and 2x CULLDIST and let the
driver worry about figuring out the max clip dist used and sticking
the cull dists above it? Or do you want to work it out at a lower
level where they share a single CLIPORCULLDIST semantic and get a
separate e.g. shader property that gives them the mask?

I don't know how other hardware works, but nv50/nvc0 hw has 8
clip_or_cull distances, and a mask that selects whether each is a clip
or a cull distance. But perhaps other hw has them totally separate,
dunno.


With my limited experience of other hardware, I'd go with separate 
clip/cull and let the drivers figure out the right way to place them. 
That gives us the freedom to have it the way nv50/nvc0 works and other 
ways, like separated clip/cull_distances, if needed. Maybe we should just 
consider nouveau and radeon and decide based on the hw of these often-used 
drivers.


Marek, any comments on how the various radeons work?
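
The nv50/nvc0 layout described above (8 shared clip_or_cull slots plus a
per-slot mask, with the cull distances placed above the highest clip distance)
could be sketched roughly as follows. This is only an illustration, not driver
code: ClipCullLayout, packClipCull and the convention that a set bit in
cullMask means "this slot is a cull distance" are all assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of hardware with 8 shared clip_or_cull distance slots.
// Assumption: bit n of cullMask set means slot n is used as a cull distance.
struct ClipCullLayout {
   bool valid;        // false if the shader needs more than 8 slots
   uint8_t clipMask;  // slots used as clip distances
   uint8_t cullMask;  // slots used as cull distances
};

// Place numClip clip distances in the low slots and numCull cull distances
// directly above them.
ClipCullLayout packClipCull(unsigned numClip, unsigned numCull)
{
   ClipCullLayout l = { false, 0, 0 };
   if (numClip + numCull > 8)
      return l;
   l.valid = true;
   l.clipMask = static_cast<uint8_t>((1u << numClip) - 1);
   l.cullMask = static_cast<uint8_t>(((1u << numCull) - 1) << numClip);
   return l;
}
```

For the two-line shader quoted above (one clip and one cull distance) this
gives a clip mask of 0x01 and a cull mask of 0x02.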






On Sun, May 24, 2015 at 1:57 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

This patch series adds the needed support for this extension to the
various
parts of mesa to finally enable it for nvc0.

Dave Airlie (1):
glsl: lower cull_distance into cull_distance_mesa

Tobias Klausmann (10):
glapi: add GL_ARB_cull_distance
mesa/main: add support for GL_ARB_cull_distance
mesa/prog: Add varyings for arb_cull_distance
mesa/st: add support for GL_ARB_cull_distance
glsl: Add a helper to see if an array was unsize in the shader
glsl: Add arb_cull_distance support
i965: rename UsesClipDistanceOut to UsesClipCullDistanceOut
gallium: add support for arb_cull_distance
nouveau/codegen: sort in galliums cull_distance semantic into the
  drivers bitmask
nouveau/nvc0: implement cull_distance as a special form of clip
  distance

   docs/GL3.txt   |   2 +-
   docs/relnotes/10.7.0.html  |   4 +-
   src/gallium/auxiliary/cso_cache/cso_context.c  |   3 +
   src/gallium/drivers/freedreno/freedreno_screen.c   |   1 +
   src/gallium/drivers/i915/i915_screen.c |   1 +
   src/gallium/drivers/ilo/ilo_screen.c   |   1 +
   src/gallium/drivers/llvmpipe/lp_screen.c   |   2 +
   .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |   5 +
   src/gallium/drivers/nouveau/nv30/nv30_screen.c |   1 +
   src/gallium/drivers/nouveau/nv50/nv50_screen.c |   1 +
   src/gallium/drivers/nouveau/nvc0/nvc0_program.c|   6 +-
   src/gallium/drivers/nouveau/nvc0/nvc0_program.h|   1 +
   src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |   1 +
   .../drivers/nouveau/nvc0/nvc0_state_validate.c |   1 +
   src/gallium/drivers/r300/r300_screen.c |   1 +
   src/gallium/drivers/r600/r600_pipe.c   |   1 +
   src/gallium/drivers/radeonsi/si_pipe.c |   1 +
   src/gallium/drivers/softpipe/sp_screen.c   |   2 +
   src/gallium/drivers/svga/svga_screen.c |   1 +
   src/gallium/drivers/vc4/vc4_screen.c   |   1 +
   src/gallium/include/pipe/p_defines.h   |   1 +
   src/glsl/Makefile.sources  |   1 +
   src/glsl/ast_to_hir.cpp|  14 +
   src/glsl/builtin_variables.cpp |  13 +-
   src/glsl/glcpp/glcpp-parse.y   |   3 +
   src/glsl/glsl_parser_extras.cpp|   1 +
   src/glsl/glsl_parser_extras.h  |   3 +
   src/glsl/glsl_types.cpp|   8 +-
   src/glsl/glsl_types.h  |  10 +-
   src/glsl/ir_optimization.h |   1 +
   src/glsl/link_varyings.cpp |  17 +-
   src/glsl/link_varyings.h   |   3 +-
   src/glsl/linker.cpp| 124 +++--
   src/glsl/lower_cull_distance.cpp   | 549
+
   src/glsl/standalone_scaffolding.cpp|   1 +
   src/glsl/tests/varyings_test.cpp   |  27 +
   src/mapi/glapi/gen/gl_API.xml  |   7 +-
   src/mesa

Re: [Nouveau] [PATCH v2] nv50/ir: avoid messing up arg1 of PFETCH

2015-05-23 Thread Tobias Klausmann


Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de

On 23.05.2015 18:56, Ilia Mirkin wrote:

There can be scenarios where the indirect arg of a PFETCH becomes
known, and so the code will attempt to propagate it. Use this
opportunity to just fold it into the first argument, and prevent the
load propagation pass from touching PFETCH further.

This fixes gs-input-array-vec4-index-rd.shader_test and
vs-output-array-vec4-index-wr-before-gs.shader_test on nvc0 at least.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
Cc: 10.5 10.6 mesa-sta...@lists.freedesktop.org
---

v1 -> v2:
  - redo final section of ConstantFolding::expr using a switch, per tobijk

  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 20 ++--
  1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 72dd31e..b7fcd56 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -236,6 +236,9 @@ LoadPropagation::visit(BasicBlock *bb)
if (i->op == OP_CALL) // calls have args as sources, they must be in regs
   continue;
  
+  if (i->op == OP_PFETCH) // pfetch expects arg1 to be a reg
+ continue;
+
if (i->srcExists(1))
   checkSwapSrc01(i);
  
@@ -581,6 +584,11 @@ ConstantFolding::expr(Instruction *i,
 case OP_POPCNT:
res.data.u32 = util_bitcount(a->data.u32 & b->data.u32);
break;
+   case OP_PFETCH:
+  // The two arguments to pfetch are logically added together. Normally
+  // the second argument will not be constant, but that can happen.
+  res.data.u32 = a->data.u32 + b->data.u32;
+  break;
 default:
return;
 }
@@ -595,7 +603,9 @@ ConstantFolding::expr(Instruction *i,
  
 i->getSrc(0)->reg.data = res.data;
  
-   if (i->op == OP_MAD || i->op == OP_FMA) {
+   switch (i->op) {
+   case OP_MAD:
+   case OP_FMA: {
i->op = OP_ADD;
  
i->setSrc(1, i->getSrc(0));
@@ -610,8 +620,14 @@ ConstantFolding::expr(Instruction *i,
   bld.setPosition(i, false);
   i->setSrc(1, bld.loadImm(NULL, res.data.u32));
}
-   } else {
+  break;
+   }
+   case OP_PFETCH:
+  // Leave PFETCH alone... we just folded its 2 args into 1.
+  break;
+   default:
i->op = i->saturate ? OP_SAT : OP_MOV; /* SAT handled by unary() */
+  break;
 }
 i->subOp = 0;
  }
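
The idea of the patch (fold both immediates of a PFETCH into one value, but
keep the opcode, while other folded ops turn into a plain move) can be
illustrated with a toy model. This is a sketch only: ToyInsn, ToyOp and
foldTwoImmediates are made-up names and do not mirror the real nv50_ir
classes.

```cpp
#include <cassert>
#include <cstdint>

// Made-up miniature opcode set, only for this illustration.
enum ToyOp { TOY_ADD, TOY_MAD, TOY_PFETCH, TOY_MOV, TOY_SAT };

struct ToyInsn {
   ToyOp op;
   bool saturate;
   uint32_t imm;   // receives the folded immediate
};

// Fold two known-constant sources. Returns false if the op is not handled.
bool foldTwoImmediates(ToyInsn &i, uint32_t a, uint32_t b)
{
   switch (i.op) {
   case TOY_ADD:
      i.imm = a + b;
      break;
   case TOY_PFETCH:
      // The two arguments to pfetch are logically added together.
      i.imm = a + b;
      break;
   default:
      return false;
   }

   switch (i.op) {
   case TOY_PFETCH:
      // Leave the opcode alone, only its arguments were merged.
      break;
   default:
      // Everything else becomes a move (or saturate) of the result.
      i.op = i.saturate ? TOY_SAT : TOY_MOV;
      break;
   }
   return true;
}
```

Using a switch for the second step, as requested in the v1 review, leaves room
for further special cases without growing an else-if chain.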



Re: [Nouveau] [Mesa-dev] [PATCH] nv50/ir: avoid messing up arg1 of PFETCH

2015-05-23 Thread Tobias Klausmann




On 23.05.2015 08:06, Ilia Mirkin wrote:

There can be scenarios where the indirect arg of a PFETCH becomes
known, and so the code will attempt to propagate it. Use this
opportunity to just fold it into the first argument, and prevent the
load propagation pass from touching PFETCH further.

This fixes gs-input-array-vec4-index-rd.shader_test and
vs-output-array-vec4-index-wr-before-gs.shader_test on nvc0 at least.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
Cc: 10.5 10.6 mesa-sta...@lists.freedesktop.org
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 72dd31e..98e3d1f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -236,6 +236,9 @@ LoadPropagation::visit(BasicBlock *bb)
if (i->op == OP_CALL) // calls have args as sources, they must be in regs
   continue;
  
+  if (i->op == OP_PFETCH) // pfetch expects arg1 to be a reg
+ continue;
+
if (i->srcExists(1))
   checkSwapSrc01(i);
  
@@ -581,6 +584,11 @@ ConstantFolding::expr(Instruction *i,
 case OP_POPCNT:
res.data.u32 = util_bitcount(a->data.u32 & b->data.u32);
break;
+   case OP_PFETCH:
+  // The two arguments to pfetch are logically added together. Normally
+  // the second argument will not be constant, but that can happen.
+  res.data.u32 = a->data.u32 + b->data.u32;
+  break;
 default:
return;
 }
@@ -610,6 +618,8 @@ ConstantFolding::expr(Instruction *i,
   bld.setPosition(i, false);
   i->setSrc(1, bld.loadImm(NULL, res.data.u32));
}
+   } else if (i->op == OP_PFETCH) {
+  // Leave PFETCH alone... we just folded its 2 args into 1.
 } else {
i->op = i->saturate ? OP_SAT : OP_MOV; /* SAT handled by unary() */
 }

This last part certainly works, but it gets ugly. While you are at it, can 
you change it to a switch statement?


Re: [Nouveau] [PATCH 4/9] nvkm/fb/ramnv50: Ressurect timing code, use proper timing/rammap handlers

2015-05-22 Thread Tobias Klausmann




On 23.05.2015 00:33, Roy Spliet wrote:

Might need some generalisation to < GT200. For those: use at your own risk!

Signed-off-by: Roy Spliet rspl...@eclipso.eu
---
  .../drm/nouveau/include/nvkm/subdev/bios/ramcfg.h  |  16 ++
  .../drm/nouveau/include/nvkm/subdev/bios/rammap.h  |   2 +
  drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.c  |  29 
  drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramnv50.c   | 167 +
  4 files changed, 181 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/ramcfg.h 
b/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/ramcfg.h
index c6fb6aa..f09b6bf 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/ramcfg.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/ramcfg.h
@@ -35,6 +35,22 @@ struct nvbios_ramcfg {
unsigned ramcfg_DLLoff;
union {
struct {
+   unsigned ramcfg_00_03_01:1;
+   unsigned ramcfg_00_03_02:1;
+   unsigned ramcfg_00_03_08:1;
+   unsigned ramcfg_00_03_10:1;
+   unsigned ramcfg_00_04_02:1;
+   unsigned ramcfg_00_04_04:1;
+   unsigned ramcfg_00_04_20:1;
+   unsigned ramcfg_00_05:8;
+   unsigned ramcfg_00_06:8;
+   unsigned ramcfg_00_07:8;
+   unsigned ramcfg_00_08:8;
+   unsigned ramcfg_00_09:8;
+   unsigned ramcfg_00_0a_0f:4;
+   unsigned ramcfg_00_0a_f0:4;
+   };
+   struct {
unsigned ramcfg_10_02_01:1;
unsigned ramcfg_10_02_02:1;
unsigned ramcfg_10_02_04:1;
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/rammap.h 
b/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/rammap.h
index 609a905..2044fc9 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/rammap.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/rammap.h
@@ -15,6 +15,8 @@ u32 nvbios_rammapEm(struct nvkm_bios *, u16 mhz,
  u32 nvbios_rammapSe(struct nvkm_bios *, u32 data,
u8 ever, u8 ehdr, u8 ecnt, u8 elen, int idx,
u8 *ver, u8 *hdr);
+u32 nvbios_rammapSp_from_perf(struct nvkm_bios *bios, u32 data, u8 size, int 
idx,
+   struct nvbios_ramcfg *p);
  u32 nvbios_rammapSp(struct nvkm_bios *, u32 data,
u8 ever, u8 ehdr, u8 ecnt, u8 elen, int idx,
u8 *ver, u8 *hdr, struct nvbios_ramcfg *);
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.c
index a688d3b..4ec376a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.c
@@ -141,6 +141,35 @@ nvbios_rammapSe(struct nvkm_bios *bios, u32 data,
  }
  
  u32

+nvbios_rammapSp_from_perf(struct nvkm_bios *bios, u32 data, u8 size, int idx,
+   struct nvbios_ramcfg *p)
+{
+   data += (idx * size);
+
+   if (size < 11)
+   return NULL;
+
+   p->ramcfg_timing   =  nv_ro08(bios, data + 0x01);
+   p->ramcfg_00_03_01 = (nv_ro08(bios, data + 0x03) & 0x01) >> 0;
+   p->ramcfg_00_03_02 = (nv_ro08(bios, data + 0x03) & 0x02) >> 1;
+   p->ramcfg_DLLoff   = (nv_ro08(bios, data + 0x03) & 0x04) >> 2;
+   p->ramcfg_00_03_08 = (nv_ro08(bios, data + 0x03) & 0x08) >> 3;
+   p->ramcfg_00_03_10 = (nv_ro08(bios, data + 0x03) & 0x10) >> 4;
+   p->ramcfg_00_04_02 = (nv_ro08(bios, data + 0x04) & 0x02) >> 1;
+   p->ramcfg_00_04_04 = (nv_ro08(bios, data + 0x04) & 0x04) >> 2;
+   p->ramcfg_00_04_20 = (nv_ro08(bios, data + 0x04) & 0x20) >> 5;
+   p->ramcfg_00_05    = (nv_ro08(bios, data + 0x05) & 0xff) >> 0;
+   p->ramcfg_00_06    = (nv_ro08(bios, data + 0x06) & 0xff) >> 0;
+   p->ramcfg_00_07    = (nv_ro08(bios, data + 0x07) & 0xff) >> 0;
+   p->ramcfg_00_08    = (nv_ro08(bios, data + 0x08) & 0xff) >> 0;
+   p->ramcfg_00_09    = (nv_ro08(bios, data + 0x09) & 0xff) >> 0;
+   p->ramcfg_00_0a_0f = (nv_ro08(bios, data + 0x0a) & 0x0f) >> 0;
+   p->ramcfg_00_0a_f0 = (nv_ro08(bios, data + 0x0a) & 0xf0) >> 4;
+
+   return data;
+}
+
+u32
  nvbios_rammapSp(struct nvkm_bios *bios, u32 data,
u8 ever, u8 ehdr, u8 ecnt, u8 elen, int idx,
u8 *ver, u8 *hdr, struct nvbios_ramcfg *p)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramnv50.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramnv50.c
index a96e512..51f93a0 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramnv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramnv50.c
@@ -29,6 +29,7 @@
  #include <subdev/bios.h>
  #include <subdev/bios/perf.h>
  #include <subdev/bios/pll.h>
+#include <subdev/bios/rammap.h>
  #include <subdev/bios/timing.h>
  #include <subdev/clk/pll.h>
  
@@ -55,6 +56,84 @@ struct nv50_ram {

struct nv50_ramseq hwsq;
  };
  
+#define T(t)
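
The field naming in nvbios_rammapSp_from_perf() above follows one pattern:
ramcfg_VV_OO_MM is the byte at offset OO of the entry, masked with MM and
shifted down to bit 0. A small standalone sketch of that pattern, with
hypothetical names (field, ToyRamcfg, parseEntry) and a plain byte array
standing in for the VBIOS image:

```cpp
#include <cassert>
#include <cstdint>

// Mask a byte and shift the selected bits down to bit 0.
constexpr unsigned field(uint8_t byte, uint8_t mask, unsigned shift)
{
   return (byte & mask) >> shift;
}

// Hypothetical subset of the fields, for illustration only.
struct ToyRamcfg {
   unsigned b03_01:1;
   unsigned dlloff:1;
   unsigned b0a_0f:4;
   unsigned b0a_f0:4;
};

// entry must point at no fewer than 11 bytes, matching the size check
// in the patch.
ToyRamcfg parseEntry(const uint8_t *entry)
{
   ToyRamcfg p = {};
   p.b03_01 = field(entry[0x03], 0x01, 0);
   p.dlloff = field(entry[0x03], 0x04, 2);
   p.b0a_0f = field(entry[0x0a], 0x0f, 0);
   p.b0a_f0 = field(entry[0x0a], 0xf0, 4);
   return p;
}
```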

Re: [Nouveau] [PATCH] fix a wrong use of a logical operator in drmmode_output_dpms()

2015-05-20 Thread Tobias Klausmann


looks good to me! :)

Feel free to add my R-b.

On 20.05.2015 17:08, Samuel Pitoiset wrote:

This is probably a typo error which has been introduced in 2009...
This fixes the following warning detected by Clang :

drmmode_display.c:907:30: warning: use of logical '&&' with constant operand 
[-Wconstant-logical-operand]
 if (props && (props->flags && DRM_MODE_PROP_ENUM)) {

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
  src/drmmode_display.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/drmmode_display.c b/src/drmmode_display.c
index 7c1d2bb..161bccd 100644
--- a/src/drmmode_display.c
+++ b/src/drmmode_display.c
@@ -904,7 +904,7 @@ drmmode_output_dpms(xf86OutputPtr output, int mode)
  
  	for (i = 0; i < koutput->count_props; i++) {
props = drmModeGetProperty(drmmode->fd, koutput->props[i]);
-   if (props && (props->flags && DRM_MODE_PROP_ENUM)) {
+   if (props && (props->flags & DRM_MODE_PROP_ENUM)) {
if (!strcmp(props->name, "DPMS")) {
mode_id = koutput->props[i];
drmModeFreeProperty(props);
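
Why the one-character change matters: with a logical operator the constant is
merely treated as "true", so the test degenerates to "flags is non-zero". A
minimal sketch, using hypothetical stand-in flag values rather than the real
libdrm definitions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for property type flags.
constexpr uint32_t PROP_RANGE = 1u << 1;
constexpr uint32_t PROP_ENUM  = 1u << 3;

// Buggy form: true for any non-zero flags, whatever bits are set.
bool isEnumLogical(uint32_t flags)
{
   return flags && PROP_ENUM;
}

// Intended form: true only if the ENUM bit itself is set.
bool isEnumBitwise(uint32_t flags)
{
   return (flags & PROP_ENUM) != 0;
}
```

A range-only property passes the logical test but not the bitwise one, which
is the case the warning was pointing at.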



Re: [Nouveau] [PATCH] ram/gf100-: error out if a ridiculous amount of vram is detected

2015-05-20 Thread Tobias Klausmann


Any idea on how to solve the problem, other than just reporting it?

But for now this adds a helpful error message... you may add my R-b.

On 20.05.2015 22:01, Ilia Mirkin wrote:

Some newer chips have trouble coming up, and we get bad MMIO reads from
them, like 0xbadf100. This ends up translating into crazy amounts of
VRAM, which destroys all sorts of other logic down the line. Instead,
fail device init.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
Cc: sta...@kernel.org
---
  drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c 
b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
index de9f395..9d4d196 100644
--- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
+++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
@@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct 
nvkm_object *engine,
}
}
  
+	/* if over 1TB of VRAM is reported, something went very wrong, bail */

+   if (ram->size > (1ULL << 40)) {
+   nv_error(pfb, "invalid vram size: %llx\n", ram->size);
+   return -EINVAL;
+   }
+
/* if all controllers have the same amount attached, there's no holes */
if (uniform) {
offset = rsvd_head;
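
A standalone sketch of the sanity check, to show how a garbage register read
such as 0xbadf100 ends up far above the 1 TiB limit. The helper names and the
assumption that each partition register reports its size in MiB are
illustrative, not taken from the patch:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kMaxPlausibleVram = 1ULL << 40;   // 1 TiB

// Assumption for this sketch: every partition reports its size in MiB.
uint64_t totalVram(const uint32_t *partMiB, unsigned parts)
{
   uint64_t total = 0;
   for (unsigned i = 0; i < parts; ++i)
      total += static_cast<uint64_t>(partMiB[i]) << 20;
   return total;
}

// Hypothetical helper: 0 if the size looks sane, -1 otherwise.
int checkVramSize(uint64_t sizeBytes)
{
   return sizeBytes > kMaxPlausibleVram ? -1 : 0;
}
```

Under that assumption, two partitions of 1024 MiB each pass the check, while a
single partition reading back 0xbadf100 is rejected.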



Re: [Nouveau] [Mesa-dev] [PATCH 00/12] Tessellation support for nvc0

2015-05-17 Thread Tobias Klausmann

As far as I can evaluate this without deeper insight into tess, this 
patch series looks good to me!


Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de

On 17.05.2015 07:07, Ilia Mirkin wrote:

This is enough to enable tessellation support on nvc0. It seems to
work a lot better on my GF108 than GK208. I suspect that there's some
sort of scheduling shenanigans that need to be adjusted for
kepler+. Or perhaps some shader header things.

Even with the GF108, I still get occasional blue triangles in Heaven,
but I get a *ton* of them on the GK208 -- seemingly the same issue,
but it's much worse on there.

Also there's about a 100% chance that gl_PrimitiveID doesn't work.

In any case, I plan on pushing this semi-soon unless there are any
loud objections. I don't think it's going to do too much good sitting
in my tree, or too much evil sitting upstream while core + st/mesa are
worked out.

Ilia Mirkin (12):
   nvc0: preliminary tess support
   nvc0: add support for setting patch vertices at draw time
   nvc0: add handling for set_tess_state callback
   nvc0: TESSCOORD comes in as a sysval, not an input
   nvc0/ir: mark varyings as per-patch based on semantic name
   nv50/ir: populate info structure based on new tess properties
   nv50/ir: set perPatch flag on load/stores to per-patch varyings
   nv50/ir: add support for reading outputs in tess control shaders
   nvc0/ir: patch vertex count is stored in the upper bits
   nvc0/ir: handle loads from outputs in control shaders
   nvc0/ir: allow tess eval output loads to be CSE'd
   nv50/ir: cleanup private enums that have graduated to gallium

  src/gallium/drivers/nouveau/codegen/nv50_ir.cpp|  4 +-
  .../drivers/nouveau/codegen/nv50_ir_driver.h   | 12 
  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 56 +++--
  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  |  7 +++
  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  2 +
  src/gallium/drivers/nouveau/nvc0/nvc0_context.h|  8 ++-
  src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 56 +++--
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  7 +--
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.h |  1 +
  .../drivers/nouveau/nvc0/nvc0_shader_state.c   |  3 -
  src/gallium/drivers/nouveau/nvc0/nvc0_state.c  | 71 ++
  .../drivers/nouveau/nvc0/nvc0_state_validate.c | 11 
  src/gallium/drivers/nouveau/nvc0/nvc0_tex.c| 34 +--
  src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c|  9 ++-
  .../drivers/nouveau/nvc0/nvc0_vbo_translate.c  |  3 +-
  15 files changed, 200 insertions(+), 84 deletions(-)




[Nouveau] [PATCH] nvc0: fix context destruction for partly implemented tesselation

2015-05-17 Thread Tobias Klausmann

Backtrace:
...
0x757fc392 in __assert_fail () from /lib64/libc.so.6
0x70cf5bec in nvc0_shader_stage (pipe=<optimized out>) at 
./nvc0/nvc0_context.h:204
nvc0_set_constant_buffer (pipe=0x631080, shader=<optimized out>, 
index=<optimized out>, cb=0x0)
  at nvc0/nvc0_state.c:771
0x70a2cf63 in st_destroy_context (st=0x68a9f0) at 
state_tracker/st_context.c:382
...

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index 09d08e4..f910541 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -195,8 +195,8 @@ nvc0_shader_stage(unsigned pipe)
 {
switch (pipe) {
case PIPE_SHADER_VERTEX: return 0;
-/* case PIPE_SHADER_TESSELLATION_CONTROL: return 1; */
-/* case PIPE_SHADER_TESSELLATION_EVALUATION: return 2; */
+   case PIPE_SHADER_TESS_CTRL: return 1;
+   case PIPE_SHADER_TESS_EVAL: return 2;
case PIPE_SHADER_GEOMETRY: return 3;
case PIPE_SHADER_FRAGMENT: return 4;
case PIPE_SHADER_COMPUTE: return 5;
-- 
2.4.1
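
The failure mode in the backtrace can be modelled in a few lines: a helper
maps every shader type to a hardware stage index, and any type without a case
falls through to the assert. The sketch below uses a made-up enum instead of
the gallium PIPE_SHADER_* values and returns -1 where the real helper asserts:

```cpp
#include <cassert>

// Made-up stand-ins for the shader types, for illustration only.
enum ToyShader {
   TOY_VERTEX, TOY_FRAGMENT, TOY_GEOMETRY,
   TOY_TESS_CTRL, TOY_TESS_EVAL, TOY_COMPUTE
};

// Map a shader type to a stage index, -1 if the type is not handled.
int toyShaderStage(ToyShader type)
{
   switch (type) {
   case TOY_VERTEX:    return 0;
   case TOY_TESS_CTRL: return 1;
   case TOY_TESS_EVAL: return 2;
   case TOY_GEOMETRY:  return 3;
   case TOY_FRAGMENT:  return 4;
   case TOY_COMPUTE:   return 5;
   }
   return -1;
}
```

With the two tessellation cases commented out, a caller that walks over all
shader types (as the context teardown in the backtrace appears to do) would
hit the unhandled path.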


Re: [Nouveau] [PATCH] nvc0: switch mechanism for shader eviction to be a while loop

2015-05-10 Thread Tobias Klausmann




On 10.05.2015 07:57, Ilia Mirkin wrote:

This aligns it to work similarly to nv50. However there's no library
code there, so the whole thing can be freed. Here we end up with an
allocated node that's not attached to a specific program.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86792
Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
Cc: mesa-sta...@lists.freedesktop.org
---
  src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index c156e91..5589695 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -683,11 +683,12 @@ nvc0_program_upload_code(struct nvc0_context *nvc0, 
struct nvc0_program *prog)
 ret = nouveau_heap_alloc(screen->text_heap, size, prog, &prog->mem);
 if (ret) {
struct nouveau_heap *heap = screen->text_heap;
-  struct nouveau_heap *iter;
-  for (iter = heap; iter && iter->next != heap; iter = iter->next) {
- struct nvc0_program *evict = iter->priv;
- if (evict)
-nouveau_heap_free(&evict->mem);
+  /* Note that the code library, which is allocated before anything else,
+   * does not have a priv pointer. We can stop once we hit it.
+   */
+  while (heap->next && heap->next->priv) {
+ struct nvc0_program *evict = heap->next->priv;
+ nouveau_heap_free(&evict->mem);
}
debug_printf("WARNING: out of code space, evicting all shaders.\n");
ret = nouveau_heap_alloc(heap, size, prog, &prog->mem);


The new comment is a bit upside down, but that's not really a problem.

R-b here as well
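
The difference between the two loops is easier to see on a toy list. Freeing a
node unlinks it, so a for loop that keeps advancing an iterator can end up
standing on freed memory, whereas the while loop always re-reads heap->next.
The sketch below uses a simplified doubly linked list (Node, Program, heapFree
and insertAfter are invented for this example; the real nouveau_heap behaves
differently in detail):

```cpp
#include <cassert>

struct Node {
   Node *prev;
   Node *next;
   void *priv;    // owning program, NULL for the code library block
};

struct Program {
   Node *mem;
};

// Unlink and delete the node, and clear the owner's pointer.
void heapFree(Node **res)
{
   Node *n = *res;
   *res = nullptr;
   if (!n)
      return;
   if (n->prev)
      n->prev->next = n->next;
   if (n->next)
      n->next->prev = n->prev;
   delete n;
}

// Insert a new node directly after head; owner may be NULL.
Node *insertAfter(Node *head, Program *owner)
{
   Node *n = new Node{ head, head->next, owner };
   if (head->next)
      head->next->prev = n;
   head->next = n;
   if (owner)
      owner->mem = n;
   return n;
}

// Evict every program until the block without a priv pointer is reached.
unsigned evictAll(Node *heap)
{
   unsigned evicted = 0;
   while (heap->next && heap->next->priv) {
      Program *evict = static_cast<Program *>(heap->next->priv);
      heapFree(&evict->mem);
      ++evicted;
   }
   return evicted;
}
```

In this model the library block is inserted first and therefore ends up behind
all program blocks, so the loop stops exactly when only the library is left.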

Re: [Nouveau] [PATCH 4/4] nv50/ir: allow OP_SET to merge with OP_SET_AND/etc as well as a neg

2015-05-09 Thread Tobias Klausmann




On 09.05.2015 07:35, Ilia Mirkin wrote:

This covers the pattern where a KILL_IF is used, which triggers a
comparison of -x to 0. This can usually be folded into the comparison whose
result is being compared to 0, however it may, itself, have already been
combined with another comparison. That shouldn't impact the logic of
this pass however. With this and the & 1.0 change, code like

0020: 001c0001 80081df4 set b32 $r0 lt f32 $r0 0x3e80
0028: 001c 201fc000 and b32 $r0 $r0 0x3f80
0030: 7f9c001e dd885c00 set $p0 0x1 lt f32 neg $r0 0x0
0038: 003c 1980 $p0 discard

becomes

0020: 001c001d b5881df4 set $p0 0x1 lt f32 $r0 0x3e80
0028: 003c 1980 $p0 discard

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 51 ++
  1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index d8af19a..43a2fe9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -278,7 +278,6 @@ private:
  
 void tryCollapseChainedMULs(Instruction *, const int s, ImmediateValue&);
  
-   // TGSI 'true' is converted to -1 by F2I(NEG(SET)), track back to SET
 CmpInstruction *findOriginForTestWithZero(Value *);
  
 unsigned int foldCount;
@@ -337,16 +336,10 @@ ConstantFolding::findOriginForTestWithZero(Value *value)
return NULL;
 Instruction *insn = value->getInsn();
  
-   while (insn && insn->op != OP_SET) {
+   while (insn && insn->op != OP_SET && insn->op != OP_SET_AND &&
+  insn->op != OP_SET_OR && insn->op != OP_SET_XOR) {
Instruction *next = NULL;
switch (insn->op) {
-  case OP_NEG:
-  case OP_ABS:
-  case OP_CVT:
- next = insn->getSrc(0)->getInsn();
- if (insn->sType != next->dType)
-return NULL;
- break;
case OP_MOV:
   next = insn->getSrc(0)->getInsn();
   break;
@@ -946,29 +939,51 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
  
 case OP_SET: // TODO: SET_AND,OR,XOR


delete this comment?!


 {
+  /* This optimizes the case where the output of a set is being compared
+   * to zero. Since the set can only produce 0/-1 (int) or 0/1 (float), we
+   * can be a lot cleverer in our comparison.
+   */
CmpInstruction *si = findOriginForTestWithZero(i->getSrc(t));
CondCode cc, ccZ;
-  if (i->src(t).mod != Modifier(0))
- return;
-  if (imm0.reg.data.u32 != 0 || !si || si->op != OP_SET)
+  if (imm0.reg.data.u32 != 0 || !si)
   return;
cc = si->setCond;
ccZ = (CondCode)((unsigned int)i->asCmp()->setCond & ~CC_U);
+  // We do everything assuming var (cmp) 0, reverse the condition if 0 is
+  // first.
if (s == 0)
   ccZ = reverseCondCode(ccZ);
+  // If there is a negative modifier, we need to undo that, by flipping
+  // the comparison to zero.
+  if (i->src(t).mod.neg())
+ ccZ = reverseCondCode(ccZ);
+  // If this is a signed comparison, we expect the input to be a regular
+  // boolean, i.e. 0/-1. However the rest of the logic assumes that true
+  // is positive, so just flip the sign.
+  if (i->sType == TYPE_S32) {
+ assert(!isFloatType(si->dType));
+ ccZ = reverseCondCode(ccZ);
+  }


Can both this and the previous condition evaluate to true? If yes, this 
double-flips ccZ...



switch (ccZ) {
-  case CC_LT: cc = CC_FL; break;
-  case CC_GE: cc = CC_TR; break;
-  case CC_EQ: cc = inverseCondCode(cc); break;
-  case CC_LE: cc = inverseCondCode(cc); break;
-  case CC_GT: break;
-  case CC_NE: break;
+  case CC_LT: cc = CC_FL; break; // bool < 0 -- this is never true
+  case CC_GE: cc = CC_TR; break; // bool >= 0 -- this is always true
+  case CC_EQ: cc = inverseCondCode(cc); break; // bool == 0 -- !bool
+  case CC_LE: cc = inverseCondCode(cc); break; // bool <= 0 -- !bool
+  case CC_GT: break; // bool > 0 -- bool
+  case CC_NE: break; // bool != 0 -- bool
default:
   return;
}
+
+  // Update the condition of this SET to be identical to the origin set,
+  // but with the updated condition code. The original SET should get
+  // DCE'd, ideally.
+  i->op = si->op;
i->asCmp()->setCond = cc;
i->setSrc(0, si->src(0));
i->setSrc(1, si->src(1));
+  if (si->srcExists(2))
+ i->setSrc(2, si->src(2));
i->sType = si->sType;
 }
break;
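
The table in the switch above can be checked exhaustively in isolation. The
sketch below assumes the boolean has already been normalised to 0 (false) or
1 (true), as the comments in the patch describe, and uses made-up enum names
rather than the real CondCode values:

```cpp
#include <cassert>

enum ToyCmp { TOY_LT, TOY_LE, TOY_GT, TOY_GE, TOY_EQ, TOY_NE };
enum ToyFold { FOLD_FALSE, FOLD_TRUE, FOLD_SAME, FOLD_INVERTED };

// What does "(bool cmp 0)" reduce to, for a bool encoded as 0 or 1?
ToyFold foldBoolVsZero(ToyCmp c)
{
   switch (c) {
   case TOY_LT: return FOLD_FALSE;     // bool <  0 is never true
   case TOY_GE: return FOLD_TRUE;      // bool >= 0 is always true
   case TOY_EQ: return FOLD_INVERTED;  // bool == 0 is !bool
   case TOY_LE: return FOLD_INVERTED;  // bool <= 0 is !bool
   case TOY_GT: return FOLD_SAME;      // bool >  0 is bool
   case TOY_NE: return FOLD_SAME;      // bool != 0 is bool
   }
   return FOLD_SAME;
}

// Reference evaluation of the original comparison.
bool evalCmp(ToyCmp c, int a, int b)
{
   switch (c) {
   case TOY_LT: return a < b;
   case TOY_LE: return a <= b;
   case TOY_GT: return a > b;
   case TOY_GE: return a >= b;
   case TOY_EQ: return a == b;
   case TOY_NE: return a != b;
   }
   return false;
}

// Apply the folded form to the original boolean.
bool applyFold(ToyFold f, bool b)
{
   switch (f) {
   case FOLD_FALSE:    return false;
   case FOLD_TRUE:     return true;
   case FOLD_SAME:     return b;
   case FOLD_INVERTED: return !b;
   }
   return b;
}
```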



Re: [Nouveau] [PATCH 3/4] nvc0/ir: optimize set & 1.0 to produce boolean-float sets

2015-05-09 Thread Tobias Klausmann




On 09.05.2015 07:35, Ilia Mirkin wrote:

This has started to happen more now that the backend is producing
KILL_IF more often.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 29 ++
  .../nouveau/codegen/nv50_ir_target_nv50.cpp|  2 ++
  2 files changed, 31 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 14446b6..d8af19a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -973,6 +973,35 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
 }
break;
  
+   case OP_AND:
+   {
+  CmpInstruction *cmp = i->getSrc(t)->getInsn()->asCmp();
+  if (!cmp || cmp->op == OP_SLCT)


how about if (cmp == NULL || ...) and kill the same condition later?


+ return;
+  if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
+ return;
+  if (imm0.reg.data.f32 != 1.0)
+ return;
+  if (cmp == NULL)
+ return;
+  if (i->getSrc(t)->getInsn()->dType != TYPE_U32)
+ return;
+
+  i->getSrc(t)->getInsn()->dType = TYPE_F32;
+  if (i->src(t).mod != Modifier(0)) {
+ assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
+ i->src(t).mod = Modifier(0);
+ cmp->setCond = reverseCondCode(cmp->setCond);
+  }
+  i->op = OP_MOV;
+  i->setSrc(s, NULL);
+  if (t) {
+ i->setSrc(0, i->getSrc(t));
+ i->setSrc(t, NULL);
+  }
+   }
+  break;
+
 case OP_SHL:
 {
if (s != 1 || i-src(0).mod != Modifier(0))
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index 178a167..70180eb 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -413,6 +413,8 @@ TargetNV50::isOpSupported(operation op, DataType ty) const
return false;
 case OP_SAD:
return ty == TYPE_S32;
+   case OP_SET:
+  return !isFloatType(ty);
 default:
return true;
 }
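
The reason "set & 1.0" can be replaced by a float-typed set: an integer set
yields all ones or zero, and ANDing that with the bit pattern of 1.0f gives
exactly the bits of 1.0f or 0.0f. A small sketch that checks this, assuming
IEEE 754 single-precision floats (function names are made up):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern.
uint32_t floatBits(float f)
{
   uint32_t u;
   std::memcpy(&u, &f, sizeof u);
   return u;
}

// Integer-typed set: all ones for true, zero for false.
uint32_t setU32(bool cond)
{
   return cond ? 0xffffffffu : 0u;
}

// Float-typed set: 1.0f for true, 0.0f for false.
float setF32(bool cond)
{
   return cond ? 1.0f : 0.0f;
}

// The pattern being optimised: integer set, then AND with 1.0f.
uint32_t setAndOne(bool cond)
{
   return setU32(cond) & floatBits(1.0f);
}
```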



Re: [Nouveau] [PATCH] nv50/ir: only enable mul saturate on G200+

2015-05-09 Thread Tobias Klausmann


Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de

On 09.05.2015 09:31, Ilia Mirkin wrote:

Commit 44673512a84 enabled support for saturating fmul. However
experimentally this does not seem to work on the older chips. Restrict
the feature to G200 (NVA0) and later.

Reported-by: Pierre Moreau pierre.mor...@free.fr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90350
Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
Cc: mesa-sta...@lists.freedesktop.org
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index 70180eb..ca545a6 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -84,7 +84,7 @@ static const struct opProperties _initProps[] =
 //   neg  abs  not  sat  c[]  s[], a[], imm
 { OP_ADD,0x3, 0x0, 0x0, 0x8, 0x2, 0x1, 0x1, 0x2 },
 { OP_SUB,0x3, 0x0, 0x0, 0x8, 0x2, 0x1, 0x1, 0x2 },
-   { OP_MUL,0x3, 0x0, 0x0, 0x8, 0x2, 0x1, 0x1, 0x2 },
+   { OP_MUL,0x3, 0x0, 0x0, 0x0, 0x2, 0x1, 0x1, 0x2 },
 { OP_MAX,0x3, 0x3, 0x0, 0x0, 0x2, 0x1, 0x1, 0x0 },
 { OP_MIN,0x3, 0x3, 0x0, 0x0, 0x2, 0x1, 0x1, 0x0 },
 { OP_MAD,0x7, 0x0, 0x0, 0x8, 0x6, 0x1, 0x1, 0x0 }, // special 
constraint
@@ -188,6 +188,9 @@ void TargetNV50::initOpInfo()
if (prop->mSat & 8)
   opInfo[prop->op].dstMods = NV50_IR_MOD_SAT;
 }
+
+   if (chipset >= 0xa0)
+  opInfo[OP_MUL].dstMods = NV50_IR_MOD_SAT;
  }
  
  unsigned int



Re: [Nouveau] [PATCH 3/4] nvc0/ir: optimize set & 1.0 to produce boolean-float sets

2015-05-09 Thread Tobias Klausmann




On 09.05.2015 19:53, Ilia Mirkin wrote:

On Sat, May 9, 2015 at 11:27 AM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:


On 09.05.2015 07:35, Ilia Mirkin wrote:

This has started to happen more now that the backend is producing
KILL_IF more often.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
   .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 29
++
   .../nouveau/codegen/nv50_ir_target_nv50.cpp|  2 ++
   2 files changed, 31 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 14446b6..d8af19a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -973,6 +973,35 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue
imm0, int s)
  }
 break;
   +   case OP_AND:
+   {
+  CmpInstruction *cmp = i->getSrc(t)->getInsn()->asCmp();
+  if (!cmp || cmp->op == OP_SLCT)


how about if (cmp == NULL || ...) and kill the same condition later?

I just killed the other one. I think the usual style tends to be if
(!ptr) rather than if (ptr == NULL) in codegen. Both are acceptable
though.


It was mainly about the dead code you have now removed. With that it looks 
fine to me, so feel free to add


Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de





+ return;
+  if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
+ return;
+  if (imm0.reg.data.f32 != 1.0)
+ return;
+  if (cmp == NULL)
+ return;
+  if (i->getSrc(t)->getInsn()->dType != TYPE_U32)
+ return;
+
+  i->getSrc(t)->getInsn()->dType = TYPE_F32;
+  if (i->src(t).mod != Modifier(0)) {
+ assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
+ i->src(t).mod = Modifier(0);
+ cmp->setCond = reverseCondCode(cmp->setCond);
+  }
+  i->op = OP_MOV;
+  i->setSrc(s, NULL);
+  if (t) {
+ i->setSrc(0, i->getSrc(t));
+ i->setSrc(t, NULL);
+  }
+   }
+  break;
+
  case OP_SHL:
  {
 if (s != 1 || i->src(0).mod != Modifier(0))
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index 178a167..70180eb 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -413,6 +413,8 @@ TargetNV50::isOpSupported(operation op, DataType ty)
const
 return false;
  case OP_SAD:
 return ty == TYPE_S32;
+   case OP_SET:
+  return !isFloatType(ty);
  default:
 return true;
  }
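
For illustration only (not part of the patch): a small standalone sketch of why the fold discussed above is value-preserving. An integer-typed set yields an all-ones or all-zeros mask, and ANDing that mask with the bit pattern of 1.0f gives exactly what a float-typed set would produce. The function names are hypothetical, and the compare is fixed to "less than" for brevity.

```cpp
#include <cstdint>
#include <cstring>

// Integer-typed compare: all ones when true, zero when false.
uint32_t setU32(float a, float b) { return a < b ? 0xffffffffu : 0u; }

// Float-typed compare: 1.0f when true, 0.0f when false.
float setF32(float a, float b) { return a < b ? 1.0f : 0.0f; }

// AND of the integer mask with the bit pattern of 1.0f,
// reinterpreted as a float.
float andWithOne(uint32_t mask)
{
   const float one = 1.0f;
   uint32_t bits;
   std::memcpy(&bits, &one, sizeof bits);
   bits &= mask;
   float out;
   std::memcpy(&out, &bits, sizeof out);
   return out;
}
```

For any inputs, andWithOne(setU32(a, b)) equals setF32(a, b), which is why the AND can be dropped once the set itself is retyped to F32.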





Re: [Nouveau] Problem with GTX 970 under Fedora 21

2015-01-26 Thread Tobias Klausmann



On 25.01.2015 21:40, super_7b wrote:

Hi,

I took this issue to the Fedora forum initially, but no-one there has 
been able to offer any guidance, so I have decided to come to the 
nouveau community directly.


I was running a KDE desktop under Fedora 21 successfully, including 
the ability to use a direct text login on an older card (GTX 560 Ti) 
with an entirely stock Fedora, fully updated.


I simply replaced the old card with an Asus GTX 970 and re-booted.

I run with the rhgb quiet option removed from grub2, so I can see 
all that happens up to the graphical login screen and I noticed that 
the re-boot was different at the point at which the screen changed 
from a basic (80x25 ?) text to a higher resolution (128X50 ?). This no 
longer happened and I got a simple blank screen.


If I wait, the graphical boot screen eventually appears and I can 
login to KDE successfully and run my desktop apps as normal. If I 
switch consoles (e.g. Ctrl-Alt-F2) for a text login, I get a black 
screen.


The last visible thing on my boot screen before the black is something 
like fb: switching to nouveaufb from EFI VGA.


When I looked at the dmesg output, lsmod and lshw, it appears that the 
nouveau driver does not correctly detect and initialise the GTX 970.


Have I a configuration error, or is there something not working in 
nouveau?


I attach logs from dmesg, lshw and lsmod and I can supply more data if 
needed.


BR

Mick



Hi,
The 9xx series is rather new and is not supported by nouveau for now 
(modesetting is in 3.19rcX imho). Concerning acceleration: Nvidia needs 
to provide signed firmware for those cards under a license that allows 
inclusion in an open source project. Until that happens, there won't be 
substantial improvements.


Greetings
Tobias

Re: [Nouveau] [PATCH] fuse/gm107: simplify the return logic

2015-01-25 Thread Tobias Klausmann



On 25.01.2015 20:35, Martin Peres wrote:

Spotted by coccinelle:
drivers/gpu/drm/nouveau/core/subdev/fuse/gm107.c:50:5-8: WARNING: end returns can be simpified

Signed-off-by: Martin Peres martin.pe...@free.fr
---
 drm/nouveau/nvkm/subdev/fuse/gm107.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drm/nouveau/nvkm/subdev/fuse/gm107.c b/drm/nouveau/nvkm/subdev/fuse/gm107.c
index ba19158..0b256aa 100644
--- a/drm/nouveau/nvkm/subdev/fuse/gm107.c
+++ b/drm/nouveau/nvkm/subdev/fuse/gm107.c
@@ -45,10 +45,8 @@ gm107_fuse_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
 	ret = nvkm_fuse_create(parent, engine, oclass, priv);
 	*pobject = nv_object(priv);
-	if (ret)
-		return ret;
-	return 0;
+	return ret;
 }
 struct nvkm_oclass


Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de

If it is helping :)

Re: [Nouveau] [PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

2015-01-24 Thread Tobias Klausmann



On 11.01.2015 23:53, Ilia Mirkin wrote:

On Sun, Jan 11, 2015 at 5:48 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

On 11.01.2015 23:12, Ilia Mirkin wrote:

On Sun, Jan 11, 2015 at 5:08 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

On 11.01.2015 22:54, Ilia Mirkin wrote:

On Sun, Jan 11, 2015 at 4:40 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32},
{S16/32})->F32

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
V2: Split out F64 parts
V3: remove handling of saturate for (U/S)32,

 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 73 ++
 1 file changed, 73 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 21d20ca..aaf0d0d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -997,6 +997,79 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue
imm0, int s)
       i->op = OP_MOV;
       break;
    }
+   case OP_CVT: {
+      Storage res;
+      bld.setPosition(i, true); /* make sure bld is init'ed */
+      switch(i->dType) {
+      case TYPE_U16:
+         switch (i->sType) {
+         case TYPE_F32:
+            if (i->saturate)
+               res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0,
+                                                UINT16_MAX));
+            else
+               res.data.u16 = util_iround(imm0.reg.data.f32);
+            break;
+         default:
+            return;
+         }

This won't get hit for the U32 -> U16 conversion though right? Did
you test that case? Am I misreading/misunderstanding perhaps?

A complete piglit run did not hit i->saturate for U32 or S32. That
said, i kept the assert() there on purpose for now to actually make
sure we are not hitting such a case. Do i misread you now? :)

From my read of the code, we'd hit that case now with TXF on a
2D_ARRAY with a constant as the array element. i.e. a piglit with

uniform sampler2DArray foo;
texelFetch(foo, ivec3(1, 2, 3));

Tested this (hope i did the right thing) and the assert did not get
triggered, but i am still uncertain of this.

-> move the assert into the F32 case for U32/S32 just to make sure...
switch (i->sType)
case TYPE_F32:
   assert(...)
...

other than that, we are not even going to fold U32 -> U16 ;-)

Right, and that's the problem. Try it with a piglit that has the code
I suggest... if you don't end up collapsing it, include the TGSI
that's generated (and also the shader test source) and we'll go from
there.

  -ilia


Haven't found a piglit test triggering that, but i have created a TGSI 
shader on my own. That's the only reason i am writing this email and not 
just posting the patch. This one is collapsed just fine though.


FRAG
DCL OUT[0..2], COLOR
DCL CONST[0..2]
DCL TEMP[0..2], LOCAL
IMM[0] FLT32 { 1, 2, 3, 4}
IMM[1] UINT32 { 5, 6, 7, 8}
0: TEX TEMP[0], IMM[0], SAMP[0], 2D_ARRAY
1: TXF TEMP[1], IMM[1], SAMP[0], 2D_ARRAY
2: MOV OUT[0], TEMP[0]
3: MOV OUT[1], TEMP[1]
4: END


Greetings,
Tobias

[Nouveau] [PATCH v4] nv50/ir: Handle OP_CVT when folding constant expressions

2015-01-24 Thread Tobias Klausmann

Folding for conversions:
   F32->(U{16/32}, S{16/32})
   (U{16/32}, {S16/32})->F32
   U32 -> U16

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
V2: Split out F64 parts
V3: remove handling of saturate for (U/S)32
V4: handle U32->U16 for OP_TXF

 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 79 ++
 1 file changed, 79 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 21d20ca..235aed9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -997,6 +997,85 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
imm0, int s)
       i->op = OP_MOV;
       break;
    }
+   case OP_CVT: {
+      Storage res;
+      bld.setPosition(i, true); /* make sure bld is init'ed */
+      switch(i->dType) {
+      case TYPE_U16:
+         switch (i->sType) {
+         case TYPE_F32:
+            if (i->saturate)
+               res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0,
+                                                UINT16_MAX));
+            else
+               res.data.u16 = util_iround(imm0.reg.data.f32);
+            break;
+         case TYPE_U32:
+            if (i->saturate)
+               res.data.u16 = CLAMP(imm0.reg.data.u32, 0, UINT16_MAX);
+            else
+               res.data.u16 = imm0.reg.data.u32;
+            break;
+         default:
+            return;
+         }
+         i->setSrc(0, bld.mkImm(res.data.u16));
+         break;
+      case TYPE_U32:
+         assert(!i->saturate);
+         switch (i->sType) {
+         case TYPE_F32:
+            res.data.u32 = util_iround(imm0.reg.data.f32);
+            break;
+         default:
+            return;
+         }
+         i->setSrc(0, bld.mkImm(res.data.u32));
+         break;
+      case TYPE_S16:
+         switch (i->sType) {
+         case TYPE_F32:
+            if (i->saturate)
+               res.data.s16 = util_iround(CLAMP(imm0.reg.data.f32, INT16_MIN,
+                                                INT16_MAX));
+            else
+               res.data.s16 = util_iround(imm0.reg.data.f32);
+            break;
+         default:
+            return;
+         }
+         i->setSrc(0, bld.mkImm(res.data.s16));
+         break;
+      case TYPE_S32:
+         assert(!i->saturate);
+         switch (i->sType) {
+         case TYPE_F32:
+            res.data.s32 = util_iround(imm0.reg.data.f32);
+            break;
+         default:
+            return;
+         }
+         i->setSrc(0, bld.mkImm(res.data.s32));
+         break;
+      case TYPE_F32:
+         switch (i->sType) {
+         case TYPE_U16: res.data.f32 = (float) imm0.reg.data.u16; break;
+         case TYPE_U32: res.data.f32 = (float) imm0.reg.data.u32; break;
+         case TYPE_S16: res.data.f32 = (float) imm0.reg.data.s16; break;
+         case TYPE_S32: res.data.f32 = (float) imm0.reg.data.s32; break;
+         default:
+            return;
+         }
+         i->setSrc(0, bld.mkImm(res.data.f32));
+         break;
+      default:
+         return;
+      }
+      i->setType(i->dType); /* Remove i->sType, which we don't need anymore */
+      i->op = OP_MOV;
+      i->src(0).mod = Modifier(0); /* Clear the already applied modifier */
+      break;
+   }
    default:
       return;
    }
-- 
2.2.2
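
For illustration only (not part of the patch): a standalone sketch of the two U16 folds the patch performs, with and without saturate. The function names are hypothetical, and std::lround is used here as an assumed stand-in for Mesa's util_iround; the two may differ in corner cases.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// F32 -> U16: clamp to [0, UINT16_MAX] first when saturating, then round.
// std::lround is an assumed stand-in for util_iround.
uint16_t foldF32ToU16(float v, bool saturate)
{
   if (saturate)
      v = std::min(std::max(v, 0.0f), static_cast<float>(UINT16_MAX));
   return static_cast<uint16_t>(std::lround(v));
}

// U32 -> U16: clamp when saturating, otherwise keep only the low 16 bits.
uint16_t foldU32ToU16(uint32_t v, bool saturate)
{
   if (saturate)
      v = std::min<uint32_t>(v, UINT16_MAX);
   return static_cast<uint16_t>(v);
}
```

The U32 case is the one used for the array layer of OP_TXF: with saturate an out-of-range layer index folds to 65535, without it the value simply wraps.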


Re: [Nouveau] [PATCH v2] nv50/ir: Handle OP_CVT when folding constant expressions

2015-01-11 Thread Tobias Klausmann




On 11.01.2015 20:19, Ilia Mirkin wrote:

On Sun, Jan 11, 2015 at 12:27 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:


On 11.01.2015 01:58, Ilia Mirkin wrote:

On Fri, Jan 9, 2015 at 8:24 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32},
{S16/32})->F32

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
V2: beat me, whip me, split out F64

   .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 81
++
   1 file changed, 81 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 9a0bb60..741c74f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -997,6 +997,87 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue
imm0, int s)
 i->op = OP_MOV;
 break;
  }
+   case OP_CVT: {
+  Storage res;
+  bld.setPosition(i, true); /* make sure bld is init'ed */
+  switch(i->dType) {
+  case TYPE_U16:
+ switch (i->sType) {
+ case TYPE_F32:
+if (i->saturate)
+   res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0,
+UINT16_MAX));

Where did this saturate stuff come from? It doesn't make sense to
saturate to a non-float dtype. I'd go ahead and just
assert(!i->saturate) in the int dtype cases.

One does wonder what the hw does if the float doesn't fit in the
destination... whether it saturates or not. I don't hugely care
though.

Actually i can't remember why that was added in the first place, i'll go
ahead and follow your advice here.

Oh wait... this was to support saturating an array access into a u16...

  const int sat = (i->op == OP_TXF) ? 1 : 0;
  DataType sTy = (i->op == OP_TXF) ? TYPE_U32 : TYPE_F32;
  bld.mkCvt(OP_CVT, TYPE_U16, layer, sTy, src)->saturate = sat;

So... basically if the source is a U32 and the dest is a U16, we want
to saturate there? IMO this is such a minor use-case that it doesn't
really matter. However I guess you can keep the saturate bits around
if you like.

We can do it with or without the saturate if we rely on the tests. 
Adding assert(!i->saturate) is the only thing that breaks the tests you 
presumably meant:


glsl-resource-not-bound 1DArray
glsl-resource-not-bound 2DArray
glsl-resource-not-bound 2DMSArray



   -ilia



[Nouveau] Re: [RFC] mesa/st: Avoid passing a NULL buffer to the drivers

2015-01-11 Thread Tobias Klausmann




On 11.01.2015 06:05, Ilia Mirkin wrote:

Can you elaborate a bit as to why that's the right thing to do?

On Wed, Jan 7, 2015 at 1:52 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

If we capture transform feedback from n streams in (n-1) buffers we face a
NULL buffer; use buffer (n-1) to capture the output of stream n.

This fixes one piglit test with nvc0:
arb_gpu_shader5-xfb-streams-without-invocations

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
  src/mesa/state_tracker/st_cb_xformfb.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/src/mesa/state_tracker/st_cb_xformfb.c 
b/src/mesa/state_tracker/st_cb_xformfb.c
index 8f75eda..5a12da4 100644
--- a/src/mesa/state_tracker/st_cb_xformfb.c
+++ b/src/mesa/state_tracker/st_cb_xformfb.c
@@ -123,6 +123,11 @@ st_begin_transform_feedback(struct gl_context *ctx, GLenum 
mode,
struct st_buffer_object *bo = st_buffer_object(sobj->base.Buffers[i]);

if (bo) {
+ if (!bo->buffer)
+/* If we capture transform feedback from n streams into (n-1)
+ * buffers we have to write to buffer (n-1) for stream n.
+ */
+bo = st_buffer_object(sobj->base.Buffers[i-1]);
   /* Check whether we need to recreate the target. */
   if (!sobj->targets[i] ||
   sobj->targets[i] == sobj->draw_count ||
--
2.2.1

Quoted from Ilia Mirkin, to specify what shall be elaborated:
Can you explain (on-list) why using buffer n - 1 is the right thing to
do to capture output of stream n? I would have thought that the output
for that stream should be discarded or something.

Like with a spec quotation or some other justification. i.e. why is
the code you wrote correct? Why is it better than, say, bo =
buffers[0], or some other thing entirely?

Yeah, that's the most concerning point i see as well. The problem is that 
there is an interaction between arb_gpu_shader5 and 
arb_transform_feedback3, but after a bit of reading i think the patch is 
actually what we should do:


From the arb_transform_feedback3 spec:

(3) How might you use transform feedback with geometry shaders and
multiple vertex streams?

  RESOLVED:  As a simple example, let's say you are processing triangles
  and capture both processed triangle vertices and some values that are
  computed per-primitive (e.g., facet normal).  The geometry shader
  might declare its outputs like the following:

    layout(stream = 0) out vec4 position;
    layout(stream = 0) out vec4 texcoord;
    layout(stream = 1) out vec4 normal;

  "position" and "texcoord" would be per-vertex attributes written to
  vertex stream 0; "normal" would be a per-triangle facet normal.  The
  geometry shader would emit three vertices to stream zero (the processed
  input vertices) and a single vertex to stream one (the per-triangle
  data).  The transform feedback API usage for this case would be
  something like:

    // Set up buffer objects 21 and 22 to capture data for per-vertex and
    // per primitive values.
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, 21);
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 1, 22);

    // Set up XFB to capture position and texcoord to buffer binding
    // point 0 (buffer 21 bound), and normal to binding point 1 (buffer
    // 22 bound).
    char *strings[] = { "position", "texcoord", "gl_NextBuffer",
                        "normal" };


-> The comments especially are enlightening as to where the outputs 
should go. That's what happens with the 
arb_gpu_shader5-xfb-streams-without-invocations test, where two 
streams (outputs) are captured into one buffer.
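
For illustration only: a minimal sketch of how the varyings list in the spec example maps names to buffer binding points, with "gl_NextBuffer" acting purely as a separator. The helper assignBuffers is hypothetical; it models only the separator semantics from the quoted spec text, not the state tracker fallback proposed in the patch.

```cpp
#include <string>
#include <utility>
#include <vector>

// Returns (varying name, binding point) pairs. Each "gl_NextBuffer" entry
// advances the binding point and is not itself captured.
std::vector<std::pair<std::string, int>>
assignBuffers(const std::vector<std::string> &varyings)
{
   std::vector<std::pair<std::string, int>> out;
   int buffer = 0;
   for (const std::string &v : varyings) {
      if (v == "gl_NextBuffer")
         ++buffer;
      else
         out.emplace_back(v, buffer);
   }
   return out;
}
```

Applied to the list from the spec, position and texcoord land on binding point 0 and normal on binding point 1.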


One might argue now if we have to count .Buffers[i-1] for all buffers 
after this...


Comments and additional feedback are always appreciated!

Greetings,
Tobias


Re: [Nouveau] [PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

2015-01-11 Thread Tobias Klausmann




On 11.01.2015 22:54, Ilia Mirkin wrote:

On Sun, Jan 11, 2015 at 4:40 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
V2: Split out F64 parts
V3: remove handling of saturate for (U/S)32,

  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 73 ++
  1 file changed, 73 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 21d20ca..aaf0d0d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -997,6 +997,79 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
imm0, int s)
 i->op = OP_MOV;
break;
 }
+   case OP_CVT: {
+  Storage res;
+  bld.setPosition(i, true); /* make sure bld is init'ed */
+  switch(i->dType) {
+  case TYPE_U16:
+ switch (i->sType) {
+ case TYPE_F32:
+if (i->saturate)
+   res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0,
+UINT16_MAX));
+else
+  res.data.u16 = util_iround(imm0.reg.data.f32);
+break;
+ default:
+return;
+ }

This won't get hit for the U32 -> U16 conversion though right? Did you
test that case? Am I misreading/misunderstanding perhaps?

A complete piglit run did not hit i->saturate for U32 or S32. That said, 
i kept the assert() there on purpose for now to actually make sure we 
are not hitting such a case. Do i misread you now? :)


Re: [Nouveau] [PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

2015-01-11 Thread Tobias Klausmann




On 11.01.2015 23:12, Ilia Mirkin wrote:

On Sun, Jan 11, 2015 at 5:08 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:


On 11.01.2015 22:54, Ilia Mirkin wrote:

On Sun, Jan 11, 2015 at 4:40 PM, Tobias Klausmann
tobias.johannes.klausm...@mni.thm.de wrote:

Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32},
{S16/32})->F32

Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de
---
V2: Split out F64 parts
V3: remove handling of saturate for (U/S)32,

   .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 73
++
   1 file changed, 73 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 21d20ca..aaf0d0d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -997,6 +997,79 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue
imm0, int s)
 i->op = OP_MOV;
 break;
  }
+   case OP_CVT: {
+  Storage res;
+  bld.setPosition(i, true); /* make sure bld is init'ed */
+  switch(i->dType) {
+  case TYPE_U16:
+ switch (i->sType) {
+ case TYPE_F32:
+if (i->saturate)
+   res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0,
+UINT16_MAX));
+else
+  res.data.u16 = util_iround(imm0.reg.data.f32);
+break;
+ default:
+return;
+ }

This won't get hit for the U32 -> U16 conversion though right? Did you
test that case? Am I misreading/misunderstanding perhaps?

A complete piglit run did not hit i->saturate for U32 or S32. That said, i
kept the assert() there on purpose for now to actually make sure we are not
hitting such a case. Do i misread you now? :)

 From my read of the code, we'd hit that case now with TXF on a
2D_ARRAY with a constant as the array element. i.e. a piglit with

uniform sampler2DArray foo;
texelFetch(foo, ivec3(1, 2, 3));

Tested this (hope i did the right thing) and the assert did not get 
triggered, but i am still uncertain of this.

-> move the assert into the F32 case for U32/S32 just to make sure...
switch (i->sType)
case TYPE_F32:
   assert(...)
...

other than that, we are not even going to fold U32 -> U16 ;-)

Greetings,
Tobias

