[Nouveau] [PATCH] nouveau: forward error generated while resuming objects tree
On a failed resume we may experience unrecoverable errors. Plumb the error code through to actually let the driver fail. On a reverse-prime setup this helps the drm subsystem to at least recover the integrated gpu. This can especially happen with secboot timing out, leaving the hardware in a non-functioning state. Signed-off-by: Tobias Klausmann --- drivers/gpu/drm/nouveau/nouveau_drm.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index 5020265bfbd9..56a107f3a0e1 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -802,10 +802,15 @@ nouveau_do_suspend(struct drm_device *dev, bool runtime) static int nouveau_do_resume(struct drm_device *dev, bool runtime) { + int ret = 0; struct nouveau_drm *drm = nouveau_drm(dev); NV_DEBUG(drm, "resuming object tree...\n"); - nvif_client_resume(>master.base); + ret = nvif_client_resume(>master.base); + if (ret) { + NV_ERROR(drm, "Client resume failed with error: %d\n", ret); + return ret; + } NV_DEBUG(drm, "resuming fence...\n"); if (drm->fence && nouveau_fence(drm)->resume) @@ -925,6 +930,7 @@ nouveau_pmops_runtime_resume(struct device *dev) { struct pci_dev *pdev = to_pci_dev(dev); struct drm_device *drm_dev = pci_get_drvdata(pdev); + struct nouveau_drm *drm = nouveau_drm(drm_dev); struct nvif_device *device = _drm(drm_dev)->client.device; int ret; @@ -941,6 +947,10 @@ nouveau_pmops_runtime_resume(struct device *dev) pci_set_master(pdev); ret = nouveau_do_resume(drm_dev, true); + if (ret) { + NV_ERROR(drm, "resume failed with: %d\n", ret); + return ret; + } /* do magic */ nvif_mask(>object, 0x088488, (1 << 25), (1 << 25)); -- 2.21.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)
On 21.03.19 20:30, Tobias Klausmann wrote: On 21.03.19 18:12, Jerome Glisse wrote: On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote: Hi, just for your information and maybe for some help: with 5.1rc1 and SVM enabled i see the following backtrace [1] when the nouveau card (reverse prime) goes to sleep, for now i have papered over with [2] which leaves me with userspace hangs. Any pointers where to look for the actual culprit? PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1) Greetings, Tobias Can you check if attached patch fix the issue ? Cheers, Jérôme Hi, the patch is fine, you can add my R-b & Tested-by! Of course i tested the second patch you send out, not the first one! PS: yet i have another unrelated error keeping my card from beeing happy, thats now the next on my todo list: [ 1102.004901] [ cut here ] [ 1102.004902] nouveau :01:00.0: timeout [ 1102.004948] WARNING: CPU: 2 PID: 55 at drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:183 acr_ls_sec2_post_run+0x139/0x190 [nouveau] [ 1102.004949] Modules linked in: rfcomm af_packet bnep btusb uvcvideo btrtl btbcm rtsx_usb_sdmmc btintel videobuf2_vmalloc rtsx_usb_ms videobuf2_memops mmc_core bluetooth memstick videobuf2_v4l2 videodev videobuf2_common ecdh_generic rtsx_usb snd_hda_codec_hdmi usbhid snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nouveau arc4 nls_iso8859_1 nls_cp437 i915 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ath10k_pci msr kvm ath10k_core snd_hda_intel irqbypass ath mxm_wmi snd_hda_codec ttm joydev mac80211 snd_hda_core drm_kms_helper crct10dif_pclmul snd_hwdep crc32_pclmul snd_pcm crc32c_intel drm hid_multitouch ghash_clmulni_intel snd_timer hid_generic iTCO_wdt aesni_intel mei_hdcp iTCO_vendor_support snd aes_x86_64 fb_sys_fops cfg80211 crypto_simd acerfan syscopyarea r8169 sysfillrect cryptd sysimgblt glue_helper realtek idma64 acer_wmi i2c_algo_bit mei_me libphy pcspkr sparse_keymap intel_lpss_pci intel_wmi_thunderbolt soundcore [ 1102.004965] intel_pch_thermal mei i2c_i801 intel_lpss rfkill wmi_bmof thermal tpm_crb tpm_tis pinctrl_sunrisepoint tpm_tis_core ac pinctrl_intel battery tpm button acpi_pad pcc_cpufreq xhci_pci xhci_hcd serio_raw usbcore i2c_hid wmi video sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs autofs4 [ 1102.004972] CPU: 2 PID: 55 Comm: kworker/2:1 Not tainted 5.1.0-rc1-desktop-debug+ #80 [ 1102.004973] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018 [ 1102.004976] Workqueue: pm pm_runtime_work [ 1102.005007] RIP: 0010:acr_ls_sec2_post_run+0x139/0x190 [nouveau] [ 1102.005008] Code: 04 24 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 f6 74 1e e8 b9 2d 6a dd 48 89 c6 4c 89 f2 48 c7 c7 39 15 fb c0 e8 8c b6 20 dd <0f> 0b e9 4c ff ff ff 4c 8b 77 10 eb dc 48 8b 04 24 48 8b 40 10 48 [ 1102.005009] RSP: 0018:a45c00ee7ab8 EFLAGS: 00010296 [ 1102.005009] RAX: 001d RBX: 912f0e366900 RCX: 0006 [ 1102.005010] RDX: 0007 RSI: 0086 RDI: 912f3ec963f0 [ 1102.005010] RBP: R08: 03cb R09: 0004 [ 1102.005011] R10: R11: 0001 R12: 912f330cc400 [ 1102.005011] R13: 0040 R14: 912df09f0060 R15: 912df09f80b0 [ 1102.005012] FS: () GS:912f3ec8() knlGS: [ 1102.005012] CS: 0010 DS: ES: CR0: 80050033 [ 1102.005013] CR2: 7fed2968e020 CR3: 00028a728004 CR4: 003606e0 [ 1102.005013] Call Trace: [ 1102.005044] acr_r352_bootstrap+0x16e/0x1d0 [nouveau] [ 1102.005073] acr_r352_reset+0x21/0x190 [nouveau] [ 1102.005105] gf100_gr_init_ctxctl_ext+0x59/0x500 [nouveau] [ 1102.005136] gf100_gr_init_ctxctl+0x19/0x270 [nouveau] [ 1102.005167] ? gf100_gr_init+0x533/0x570 [nouveau] [ 1102.005181] nvkm_engine_init+0xa2/0x120 [nouveau] [ 1102.005196] nvkm_subdev_init+0x8d/0xc0 [nouveau] [ 1102.005226] nvkm_device_init+0x107/0x190 [nouveau] [ 1102.005255] nvkm_udevice_init+0x3c/0x60 [nouveau] [ 1102.005269] nvkm_object_init+0x39/0x100 [nouveau] [ 1102.005284] nvkm_object_init+0x6c/0x100 [nouveau] [ 1102.005299] nvkm_object_init+0x6c/0x100 [nouveau] [ 1102.005328] nouveau_do_resume+0x23/0xb0 [nouveau] [ 1102.005357] nouveau_pmops_runtime_resume+0x7c/0x150 [nouveau] [ 1102.005360] ? pci_restore_standard_config+0x40/0x40 [ 1102.005361] pci_pm_runtime_resume+0x6f/0xc0 [ 1102.005362] ? pci_restore_standard_config+0x40/0x40 [ 1102.005363] __rpm_callback+0x76/0x120 [ 1102.005365] ? pci_restore_standard_config+0x40/0x40 [ 1102.005366] rpm_callback+0x1a/0x70 [ 1102.005367] ? pci_restore_standard_config+0x40/0x40 [ 1102.005368] rpm_resume+0x3f5/0x5f0 [ 1102.005369] pm_runtime_work+0x4e/0xa0 [ 1102.005370] process_one_work+0x1d4/0x360 [ 1102.005372] worker_thread+0x28/0x3c0 [ 1
Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)
On 21.03.19 18:12, Jerome Glisse wrote: On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote: Hi, just for your information and maybe for some help: with 5.1rc1 and SVM enabled i see the following backtrace [1] when the nouveau card (reverse prime) goes to sleep, for now i have papered over with [2] which leaves me with userspace hangs. Any pointers where to look for the actual culprit? PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1) Greetings, Tobias Can you check if attached patch fix the issue ? Cheers, Jérôme Hi, the patch is fine, you can add my R-b & Tested-by! PS: yet i have another unrelated error keeping my card from beeing happy, thats now the next on my todo list: [ 1102.004901] [ cut here ] [ 1102.004902] nouveau :01:00.0: timeout [ 1102.004948] WARNING: CPU: 2 PID: 55 at drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:183 acr_ls_sec2_post_run+0x139/0x190 [nouveau] [ 1102.004949] Modules linked in: rfcomm af_packet bnep btusb uvcvideo btrtl btbcm rtsx_usb_sdmmc btintel videobuf2_vmalloc rtsx_usb_ms videobuf2_memops mmc_core bluetooth memstick videobuf2_v4l2 videodev videobuf2_common ecdh_generic rtsx_usb snd_hda_codec_hdmi usbhid snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nouveau arc4 nls_iso8859_1 nls_cp437 i915 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ath10k_pci msr kvm ath10k_core snd_hda_intel irqbypass ath mxm_wmi snd_hda_codec ttm joydev mac80211 snd_hda_core drm_kms_helper crct10dif_pclmul snd_hwdep crc32_pclmul snd_pcm crc32c_intel drm hid_multitouch ghash_clmulni_intel snd_timer hid_generic iTCO_wdt aesni_intel mei_hdcp iTCO_vendor_support snd aes_x86_64 fb_sys_fops cfg80211 crypto_simd acerfan syscopyarea r8169 sysfillrect cryptd sysimgblt glue_helper realtek idma64 acer_wmi i2c_algo_bit mei_me libphy pcspkr sparse_keymap intel_lpss_pci intel_wmi_thunderbolt soundcore [ 1102.004965] intel_pch_thermal mei i2c_i801 intel_lpss rfkill wmi_bmof thermal tpm_crb tpm_tis pinctrl_sunrisepoint tpm_tis_core ac pinctrl_intel battery tpm button acpi_pad pcc_cpufreq xhci_pci xhci_hcd serio_raw usbcore i2c_hid wmi video sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs autofs4 [ 1102.004972] CPU: 2 PID: 55 Comm: kworker/2:1 Not tainted 5.1.0-rc1-desktop-debug+ #80 [ 1102.004973] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018 [ 1102.004976] Workqueue: pm pm_runtime_work [ 1102.005007] RIP: 0010:acr_ls_sec2_post_run+0x139/0x190 [nouveau] [ 1102.005008] Code: 04 24 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 f6 74 1e e8 b9 2d 6a dd 48 89 c6 4c 89 f2 48 c7 c7 39 15 fb c0 e8 8c b6 20 dd <0f> 0b e9 4c ff ff ff 4c 8b 77 10 eb dc 48 8b 04 24 48 8b 40 10 48 [ 1102.005009] RSP: 0018:a45c00ee7ab8 EFLAGS: 00010296 [ 1102.005009] RAX: 001d RBX: 912f0e366900 RCX: 0006 [ 1102.005010] RDX: 0007 RSI: 0086 RDI: 912f3ec963f0 [ 1102.005010] RBP: R08: 03cb R09: 0004 [ 1102.005011] R10: R11: 0001 R12: 912f330cc400 [ 1102.005011] R13: 0040 R14: 912df09f0060 R15: 912df09f80b0 [ 1102.005012] FS: () GS:912f3ec8() knlGS: [ 1102.005012] CS: 0010 DS: ES: CR0: 80050033 [ 1102.005013] CR2: 7fed2968e020 CR3: 00028a728004 CR4: 003606e0 [ 1102.005013] Call Trace: [ 1102.005044] acr_r352_bootstrap+0x16e/0x1d0 [nouveau] [ 1102.005073] acr_r352_reset+0x21/0x190 [nouveau] [ 1102.005105] gf100_gr_init_ctxctl_ext+0x59/0x500 [nouveau] [ 1102.005136] gf100_gr_init_ctxctl+0x19/0x270 [nouveau] [ 1102.005167] ? gf100_gr_init+0x533/0x570 [nouveau] [ 1102.005181] nvkm_engine_init+0xa2/0x120 [nouveau] [ 1102.005196] nvkm_subdev_init+0x8d/0xc0 [nouveau] [ 1102.005226] nvkm_device_init+0x107/0x190 [nouveau] [ 1102.005255] nvkm_udevice_init+0x3c/0x60 [nouveau] [ 1102.005269] nvkm_object_init+0x39/0x100 [nouveau] [ 1102.005284] nvkm_object_init+0x6c/0x100 [nouveau] [ 1102.005299] nvkm_object_init+0x6c/0x100 [nouveau] [ 1102.005328] nouveau_do_resume+0x23/0xb0 [nouveau] [ 1102.005357] nouveau_pmops_runtime_resume+0x7c/0x150 [nouveau] [ 1102.005360] ? pci_restore_standard_config+0x40/0x40 [ 1102.005361] pci_pm_runtime_resume+0x6f/0xc0 [ 1102.005362] ? pci_restore_standard_config+0x40/0x40 [ 1102.005363] __rpm_callback+0x76/0x120 [ 1102.005365] ? pci_restore_standard_config+0x40/0x40 [ 1102.005366] rpm_callback+0x1a/0x70 [ 1102.005367] ? pci_restore_standard_config+0x40/0x40 [ 1102.005368] rpm_resume+0x3f5/0x5f0 [ 1102.005369] pm_runtime_work+0x4e/0xa0 [ 1102.005370] process_one_work+0x1d4/0x360 [ 1102.005372] worker_thread+0x28/0x3c0 [ 1102.005372] ? process_one_work+0x360/0x360 [ 1102.005374] kthread+0x10d/0x130 [ 1102.005375] ? kthread_create_worker_on_cpu+0x40/0x40 [ 1
[Nouveau] Nouveau dmem NULL Pointer deref (SVM)
Hi, just for your information and maybe for some help: with 5.1rc1 and SVM enabled i see the following backtrace [1] when the nouveau card (reverse prime) goes to sleep, for now i have papered over with [2] which leaves me with userspace hangs. Any pointers where to look for the actual culprit? PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1) Greetings, Tobias [1]: BUG: unable to handle kernel NULL pointer dereference at 0028 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: [#1] PREEMPT SMP PTI CPU: 3 PID: 435 Comm: kworker/3:4 Not tainted 5.1.0-rc1-desktop-debug+ #80 Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018 Workqueue: pm pm_runtime_work RIP: 0010:nouveau_bo_unpin (linux/./include/linux/compiler.h:193 linux/./arch/x86/include/asm/atomic.h:31 linux/./include/asm-generic/atomic-instrumented.h:27 linux/./include/linux/refcount.h:43 linux/./include/linux/kref.h:38 linux/./include/drm/ttm/ttm_bo_driver.h:721 linux/drivers/gpu/drm/nouveau/nouveau_bo.c:454) nouveau Code: 89 d9 48 c7 c6 50 04 e5 c0 c4 42 79 f7 c0 bd f0 ff ff ff e8 42 d5 7a c6 ff 83 00 04 00 00 e9 17 ff ff ff 41 54 55 53 48 89 fb <8b> 47 28 85 c0 0f 84 cf 00 00 00 48 8b bb c0 01 00 00 31 f6 4c 8b All code 0: 89 d9 mov %ebx,%ecx 2: 48 c7 c6 50 04 e5 c0 mov $0xc0e50450,%rsi 9: c4 42 79 f7 c0 shlx %eax,%r8d,%r8d e: bd f0 ff ff ff mov $0xfff0,%ebp 13: e8 42 d5 7a c6 callq 0xc67ad55a 18: ff 83 00 04 00 00 incl 0x400(%rbx) 1e: e9 17 ff ff ff jmpq 0xff3a 23: 41 54 push %r12 25: 55 push %rbp 26: 53 push %rbx 27: 48 89 fb mov %rdi,%rbx 2a:* 8b 47 28 mov 0x28(%rdi),%eax <-- trapping instruction 2d: 85 c0 test %eax,%eax 2f: 0f 84 cf 00 00 00 je 0x104 35: 48 8b bb c0 01 00 00 mov 0x1c0(%rbx),%rdi 3c: 31 f6 xor %esi,%esi 3e: 4c rex.WR 3f: 8b .byte 0x8b Code starting with the faulting instruction === 0: 8b 47 28 mov 0x28(%rdi),%eax 3: 85 c0 test %eax,%eax 5: 0f 84 cf 00 00 00 je 0xda b: 48 8b bb c0 01 00 00 mov 0x1c0(%rbx),%rdi 12: 31 f6 xor %esi,%esi 14: 4c rex.WR 15: 8b .byte 0x8b RSP: 0018:bf0b41237d20 EFLAGS: 00010216 RAX: 9dfe0ba2ec00 RBX: RCX: c0ceb630 RDX: 9dfe0ba2ec38 RSI: 7fff RDI: RBP: 9dfe0a07e000 R08: R09: c0d4a9a0 R10: 8080808080808080 R11: 1800 R12: 0001 R13: R14: R15: 0008 FS: () GS:9dfe3ecc() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0028 CR3: 0001a500e002 CR4: 003606e0 Call Trace: nouveau_dmem_suspend (linux/drivers/gpu/drm/nouveau/nouveau_dmem.c:482 (discriminator 9)) nouveau nouveau_do_suspend (linux/drivers/gpu/drm/nouveau/nouveau_drm.c:748) nouveau nouveau_pmops_runtime_suspend (linux/drivers/gpu/drm/nouveau/nouveau_drm.c:915) nouveau pci_pm_runtime_suspend (linux/drivers/pci/pci-driver.c:1262) ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312) ? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238) __rpm_callback (linux/drivers/base/power/runtime.c:357) ? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238) rpm_callback (linux/drivers/base/power/runtime.c:490) ? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238) rpm_suspend (linux/drivers/base/power/runtime.c:629) ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312) ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312) ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312) ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312) ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312) pm_runtime_work (linux/drivers/base/power/runtime.c:922) process_one_work (linux/./arch/x86/include/asm/preempt.h:26 linux/kernel/workqueue.c:2278) worker_thread (linux/./include/linux/compiler.h:193 linux/./include/linux/list.h:237 linux/kernel/workqueue.c:2416) ? process_one_work (linux/kernel/workqueue.c:2358) kthread (linux/kernel/kthread.c:253) ? kthread_create_worker_on_cpu (linux/kernel/kthread.c:213) ret_from_fork (linux/arch/x86/entry/entry_64.S:358) Modules linked in: rfcomm af_packet snd_hda_codec_hdmi bnep uvcvideo videobuf2_vmalloc rtsx_usb_sdmmc videobuf2_memops btusb rtsx_usb_ms videobuf2_v4l2 btrtl mmc_core memstick btbcm videodev btintel videobuf2_common rtsx_usb bluetooth usbhid
Re: [Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed
Well fixing the return of wrong values in this function is reasonable by any means, of course not reading the mem in the first place would be nice, but deciding this is imho not in the scope of a temp_get function but somewhere in the code calling temp_get. On 1/26/18 3:03 PM, Karol Herbst wrote: well I just tried to say, that you are not fixing the issue you think were fixing. In your case the GPU is powered off and you get garbage values from any mmio read, so parsing those values is just wrong and we need to prevent doing anything on the hw whenever it is powered off directly in hwmon. On Fri, Jan 26, 2018 at 2:40 PM, Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> wrote: Not sure if i understand completely what you intend to say here, with this we prevent hwmon from reporting utterly wrong temperature values returning an error (we could return -EBUSY or somehting instead, granted), yet if the device is shadowed, getting a sane temp value out of is seems unlikely to me! Greetings, Tobias On 1/26/18 12:40 PM, Karol Herbst wrote: no, we can't do that. We actually have to prevent this from hwom. The issue here is, that the reg read returns 0x and parsing that is the first step in the first place. On Thu, Jan 25, 2018 at 7:16 PM, Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> wrote: This fixes wrong temperature outputs e.g. 511°C if the card is asleep. Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c index 9f0dea3f61dc..45d0ec632b5a 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c @@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm) u32 inttemp = (tsensor & 0x0001fff8); /* device SHADOWed */ - if (tsensor & 0x4000) + if (tsensor & 0x4000) { nvkm_trace(subdev, "reading temperature from SHADOWed sensor\n"); + return -ENODEV; + } /* device valid */ if (tsensor & 0x2000) -- 2.16.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed
Not sure if i understand completely what you intend to say here, with this we prevent hwmon from reporting utterly wrong temperature values returning an error (we could return -EBUSY or somehting instead, granted), yet if the device is shadowed, getting a sane temp value out of is seems unlikely to me! Greetings, Tobias On 1/26/18 12:40 PM, Karol Herbst wrote: no, we can't do that. We actually have to prevent this from hwom. The issue here is, that the reg read returns 0x and parsing that is the first step in the first place. On Thu, Jan 25, 2018 at 7:16 PM, Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> wrote: This fixes wrong temperature outputs e.g. 511°C if the card is asleep. Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c index 9f0dea3f61dc..45d0ec632b5a 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c @@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm) u32 inttemp = (tsensor & 0x0001fff8); /* device SHADOWed */ - if (tsensor & 0x4000) + if (tsensor & 0x4000) { nvkm_trace(subdev, "reading temperature from SHADOWed sensor\n"); + return -ENODEV; + } /* device valid */ if (tsensor & 0x2000) -- 2.16.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH v2] nv50/ir: Initialize all members of GCRA (trivial)
v2: use initialization list (Pierre) Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> Reviewed-by: Pierre Moreau <pierre.mor...@free.fr> --- src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp index 361918a161..a70a54f6b8 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp @@ -1144,7 +1144,9 @@ GCRA::RIG_Node::addRegPreference(RIG_Node *node) GCRA::GCRA(Function *fn, SpillCodeInserter& spill) : func(fn), regs(fn->getProgram()->getTarget()), - spill(spill) + spill(spill), + nodeCount(0), + nodes(NULL) { prog = func->getProgram(); -- 2.15.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
On 12/18/17 7:06 PM, Mike Galbraith wrote: Greetings, Kernel bound workloads seem to trigger the below for whatever reason. I only see this when beating up NFS. There was a kworker wakeup latency issue, but with a bandaid applied to fix that up, I can still trigger this. Hi, i have seen this one as well with my system, but i could not find an easy way to trigger it for bisecting purpose. If you can trigger it conveniently, a bisect would be nice! Greetings, Tobias [ 1313.811031] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes) [ 1313.811035] swiotlb: coherent allocation failed for device :01:00.0 size=2097152 [ 1313.811038] CPU: 6 PID: 3026 Comm: Xorg Tainted: GE 4.15.0.g1291a0d5-master #355 [ 1313.811040] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013 [ 1313.811041] Call Trace: [ 1313.811049] dump_stack+0x7c/0xb6 [ 1313.811053] swiotlb_alloc_coherent+0x13f/0x150 [ 1313.811060] ttm_dma_pool_alloc_new_pages+0x106/0x3c0 [ttm] [ 1313.811066] ttm_dma_pool_get_pages+0x10a/0x1e0 [ttm] [ 1313.811070] ttm_dma_populate+0x21f/0x2f0 [ttm] [ 1313.811075] ttm_tt_bind+0x2f/0x60 [ttm] [ 1313.811079] ttm_bo_handle_move_mem+0x51f/0x580 [ttm] [ 1313.811084] ? ttm_bo_handle_move_mem+0x5/0x580 [ttm] [ 1313.811088] ttm_bo_validate+0x10c/0x120 [ttm] [ 1313.811092] ? ttm_bo_validate+0x5/0x120 [ttm] [ 1313.811106] ? drm_mode_setcrtc+0x20e/0x540 [drm] [ 1313.811109] ttm_bo_init_reserved+0x290/0x490 [ttm] [ 1313.84] ttm_bo_init+0x52/0xb0 [ttm] [ 1313.811141] ? nv10_bo_put_tile_region+0x60/0x60 [nouveau] [ 1313.811163] nouveau_bo_new+0x465/0x5e0 [nouveau] [ 1313.811184] ? nv10_bo_put_tile_region+0x60/0x60 [nouveau] [ 1313.811203] nouveau_gem_new+0x66/0x110 [nouveau] [ 1313.811223] ? nouveau_gem_new+0x110/0x110 [nouveau] [ 1313.811241] nouveau_gem_ioctl_new+0x48/0xc0 [nouveau] [ 1313.811249] drm_ioctl_kernel+0x64/0xb0 [drm] [ 1313.811257] drm_ioctl+0x2a4/0x360 [drm] [ 1313.811276] ? nouveau_gem_new+0x110/0x110 [nouveau] [ 1313.811285] ? drm_ioctl+0x5/0x360 [drm] [ 1313.811304] nouveau_drm_ioctl+0x50/0xb0 [nouveau] [ 1313.811308] do_vfs_ioctl+0x90/0x690 [ 1313.811311] ? do_vfs_ioctl+0x5/0x690 [ 1313.811313] SyS_ioctl+0x3b/0x70 [ 1313.811316] entry_SYSCALL_64_fastpath+0x1f/0x91 [ 1313.811320] RIP: 0033:0x7f3234746227 [ 1313.811321] RSP: 002b:7ffc3ace0408 EFLAGS: 3246 ORIG_RAX: 0010 [ 1313.811324] RAX: ffda RBX: 025515d0 RCX: 7f3234746227 [ 1313.811325] RDX: 7ffc3ace0460 RSI: c0306480 RDI: 000b [ 1313.811326] RBP: 00824120 R08: 02548f80 R09: 025490d0 [ 1313.811328] R10: R11: 3246 R12: 093d [ 1313.811329] R13: 02aff74c R14: 00824150 R15: ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] Accept 3d controllers and not only VGA controllers.
On 12/3/17 8:56 PM, Josef Larsson wrote: Sure, I can easily split it into two commits, but I would like to have an OK on the actual code changes before splitting the patch. Best regards, Josef Larsson On 2017-11-11 01:05, Tobias Klausmann wrote: On 11/10/17 7:49 PM, Josef Larsson wrote: Accept 3d controllers and not only VGA controllers. According to Ilia Mirkin, the VGA controller check should be removed. This makes it possible to use external connectors on a docking station (40A5) for a Thinkpad P51. (See Bug 101778). lspci example: 01:00.0 3D controller: NVIDIA Corporation GM107GLM [Quadro M1200 Mobile] (rev a2) Also include safe-guards to avoid NULL dereferencing of fbcon, which is how this bug was found. --- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 3 +-- drivers/gpu/drm/nouveau/nv50_display.c | 13 + 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c index 2b12d82aac15..6b4d374a9d82 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c +++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c @@ -498,8 +498,7 @@ nouveau_fbcon_init(struct drm_device *dev) int preferred_bpp; int ret; - if (!dev->mode_config.num_crtc || - (dev->pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA) + if (!dev->mode_config.num_crtc) return 0; fbcon = kzalloc(sizeof(struct nouveau_fbdev), GFP_KERNEL); diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c index fb47d46050ec..061daf036407 100644 --- a/drivers/gpu/drm/nouveau/nv50_display.c +++ b/drivers/gpu/drm/nouveau/nv50_display.c @@ -3214,6 +3214,13 @@ nv50_mstm_destroy_connector(struct drm_dp_mst_topology_mgr *mgr, struct nouveau_drm *drm = nouveau_drm(connector->dev); struct nv50_mstc *mstc = nv50_mstc(connector); + if (!drm->fbcon) + { + NV_WARN(drm, "drm->fbcon of %s point to NULL. Will not destroy connector\n", + connector->name); + return; + } + drm_connector_unregister(>connector); drm_modeset_lock_all(drm->dev); @@ -3229,6 +3236,12 @@ nv50_mstm_register_connector(struct drm_connector *connector) { struct nouveau_drm *drm = nouveau_drm(connector->dev); + if (!drm->fbcon) + { + NV_WARN(drm, "drm->fbcon of %s point to NULL. Will not register connector\n", + connector->name); + return; + } drm_modeset_lock_all(drm->dev); drm_fb_helper_add_one_connector(>fbcon->helper, connector); drm_modeset_unlock_all(drm->dev); Hi, the patch looks OK to me, yet as noted in IRC, i'd like to have this patch split into two and have the ->fbcon check as a precondition to the 3D Controller part. But lets see what the other and more clever people think about it! :) Greetings, Tobias Ping, adding Ben Skeggs and Dave Airlied to CC, maybe this will get this little one commited! Greetings, Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [RFC PATCH] gr: did you try turning it off and on again.
Hi, comments inline On 11/28/17 2:11 PM, Karol Herbst wrote: Fixes secure boot on my gp107. No idea why. Otherwise the GPU enters complete lockdown after starting the gpccs and fecs with the LS images loaded. Signed-off-by: Karol Herbst--- drm/nouveau/nvkm/engine/gr/gf100.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drm/nouveau/nvkm/engine/gr/gf100.c b/drm/nouveau/nvkm/engine/gr/gf100.c index 2f8dc107..322d9fa6 100644 --- a/drm/nouveau/nvkm/engine/gr/gf100.c +++ b/drm/nouveau/nvkm/engine/gr/gf100.c @@ -1731,8 +1731,15 @@ gf100_gr_init_(struct nvkm_gr *base) { struct gf100_gr *gr = gf100_gr(base); struct nvkm_subdev *subdev = >engine.subdev; + struct nvkm_device *device = subdev->device; u32 ret; + /* did you try turning it off and on again? Apparently we need this +* on pascal, otherwise secboot will just fail. The comments about the off and on looks silly, at least put it in quotation marks, or rewrite it, e.g. "Apparently we need to turn it off and on for the pascal generation, otherwise secboot will just fail." +*/ + nvkm_mask(device, 0x200, 0x1000, 0x); + nvkm_mask(device, 0x200, 0x1000, 0x1000); + It is needed with pascal, but does it harm other generations calling this init? Maybe guard it against exectution on maxwell Greetings, Tobias nvkm_pmu_pgob(gr->base.engine.subdev.device->pmu, false); ret = nvkm_falcon_get(gr->fecs, subdev); ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH v3] nouveau/compiler: Allow to omit line numbers when printing instructions
This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff! V2: - Use environmental variable (Karol Herbst) V3: - Use the already populated nv50_ir_prog_info to forward information to the print pass (Pierre Moreau) Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 12 +--- src/gallium/drivers/nouveau/nouveau_compiler.c| 2 ++ src/gallium/drivers/nouveau/nv50/nv50_program.c | 2 ++ src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 2 ++ 5 files changed, 16 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h index ffd53c9cd3..604a22ba89 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h @@ -82,6 +82,7 @@ struct nv50_ir_prog_info uint8_t optLevel; /* optimization level (0 to 3) */ uint8_t dbgFlags; + bool omitLineNum; struct { int16_t maxGPR; /* may be -1 if none used */ diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp index 9145801b62..eb7e9057b5 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp @@ -691,7 +691,7 @@ void Instruction::print() const class PrintPass : public Pass { public: - PrintPass() : serial(0) { } + PrintPass(bool omitLineNum = false) : serial(0), omit_serial(omitLineNum) { } virtual bool visit(Function *); virtual bool visit(BasicBlock *); @@ -699,6 +699,7 @@ public: private: int serial; + bool omit_serial; }; bool @@ -762,7 +763,12 @@ PrintPass::visit(BasicBlock *bb) bool PrintPass::visit(Instruction *insn) { - INFO("%3i: ", serial++); + if (omit_serial) { + INFO(" "); + serial++; + } + else + INFO("%3i: ", serial++); insn->print(); return true; } @@ -777,7 +783,7 @@ Function::print() void Program::print() { - PrintPass pass; + PrintPass pass(driver->omitLineNum); init_colours(); pass.run(this, true, false); } diff --git a/src/gallium/drivers/nouveau/nouveau_compiler.c b/src/gallium/drivers/nouveau/nouveau_compiler.c index 3151a6f420..1214cf3565 100644 --- a/src/gallium/drivers/nouveau/nouveau_compiler.c +++ b/src/gallium/drivers/nouveau/nouveau_compiler.c @@ -122,6 +122,8 @@ nouveau_codegen(int chipset, int type, struct tgsi_token tokens[], info.optLevel = debug_get_num_option("NV50_PROG_OPTIMIZE", 3); info.dbgFlags = debug_get_num_option("NV50_PROG_DEBUG", 0); + info.omitLineNum = + debug_get_num_option("NV50_PROG_DEBUG_OMIT_LINENUM", 0) ? true : false; ret = nv50_ir_generate_code(); if (ret) { diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c index 6e943a3d94..fb5c9ed777 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_program.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c @@ -367,6 +367,8 @@ nv50_program_translate(struct nv50_program *prog, uint16_t chipset, #ifdef DEBUG info->optLevel = debug_get_num_option("NV50_PROG_OPTIMIZE", 3); info->dbgFlags = debug_get_num_option("NV50_PROG_DEBUG", 0); + info->omitLineNum = + debug_get_num_option("NV50_PROG_DEBUG_OMIT_LINENUM", 0) ? true : false; #else info->optLevel = 3; #endif diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c index c95a96c717..8dced66437 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c @@ -575,6 +575,8 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t chipset, info->target = debug_get_num_option("NV50_PROG_CHIPSET", chipset); info->optLevel = debug_get_num_option("NV50_PROG_OPTIMIZE", 3); info->dbgFlags = debug_get_num_option("NV50_PROG_DEBUG", 0); + info->omitLineNum = + debug_get_num_option("NV50_PROG_DEBUG_OMIT_LINENUM", 0) ? true : false; #else info->optLevel = 3; #endif -- 2.15.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH v2] nouveau/compiler: Allow to omit line numbers when printing instructions
This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff! V2: - Use environmental variable (Karol Herbst) Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir.cpp| 6 +++--- src/gallium/drivers/nouveau/codegen/nv50_ir.h | 2 +- src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 14 ++ src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp | 2 +- src/gallium/drivers/nouveau/nouveau_compiler.c | 3 +++ src/gallium/drivers/nouveau/nv50/nv50_program.c| 2 ++ src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 2 ++ 8 files changed, 23 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp index e9363101bf..4bf6c73837 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp @@ -1249,7 +1249,7 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info) if (ret < 0) goto out; if (prog->dbgFlags & NV50_IR_DEBUG_VERBOSE) - prog->print(); + prog->print(info->omitLineNum); targ->parseDriverInfo(info); prog->getTarget()->runLegalizePass(prog, nv50_ir::CG_STAGE_PRE_SSA); @@ -1257,13 +1257,13 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info) prog->convertToSSA(); if (prog->dbgFlags & NV50_IR_DEBUG_VERBOSE) - prog->print(); + prog->print(info->omitLineNum); prog->optimizeSSA(info->optLevel); prog->getTarget()->runLegalizePass(prog, nv50_ir::CG_STAGE_SSA); if (prog->dbgFlags & NV50_IR_DEBUG_BASIC) - prog->print(); + prog->print(info->omitLineNum); if (!prog->registerAllocation()) { ret = -4; diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index f2ce16d882..a3c7fd2f94 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -1249,7 +1249,7 @@ public: Program(Type type, Target *targ); ~Program(); - void print(); + void print(bool omitLineNum); Type getType() const { return progType; } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h index ffd53c9cd3..604a22ba89 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h @@ -82,6 +82,7 @@ struct nv50_ir_prog_info uint8_t optLevel; /* optimization level (0 to 3) */ uint8_t dbgFlags; + bool omitLineNum; struct { int16_t maxGPR; /* may be -1 if none used */ diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp index 9145801b62..d6fe928af4 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp @@ -691,7 +691,7 @@ void Instruction::print() const class PrintPass : public Pass { public: - PrintPass() : serial(0) { } + PrintPass(bool omitLineNum = false) : serial(0), omit_serial(omitLineNum) { } virtual bool visit(Function *); virtual bool visit(BasicBlock *); @@ -699,6 +699,7 @@ public: private: int serial; + bool omit_serial; }; bool @@ -762,7 +763,12 @@ PrintPass::visit(BasicBlock *bb) bool PrintPass::visit(Instruction *insn) { - INFO("%3i: ", serial++); + if (omit_serial) { + INFO(" "); + serial++; + } + else + INFO("%3i: ", serial++); insn->print(); return true; } @@ -775,9 +781,9 @@ Function::print() } void -Program::print() +Program::print(bool omitLineNum) { - PrintPass pass; + PrintPass pass(omitLineNum); init_colours(); pass.run(this, true, false); } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp index 298e7c6ef9..96ad70d28a 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp @@ -371,7 +371,7 @@ Program::emitBinary(struct nv50_ir_prog_info *info) emit->prepareEmission(this); if (dbgFlags & NV50_IR_DEBUG_BASIC) - this->print(); + this->print(info->omitLineNum); if (!binSize) { code = NULL; diff --git a/src/gallium/drivers/nouveau/nouveau_compiler.c b/src/gallium/drivers/nouveau/nouveau_compiler.c index 3151a6f420..20a4966433 100644 --- a/src/gallium/drivers/nouveau/nouveau_compiler.c +++ b/src/gallium/drivers/nouveau/nouveau_compiler.c @@ -122,6 +122,8 @@ nouveau_codegen(int chipset, int type, struct tgsi_token tokens[], info.optLevel = debug_get_num_option(
Re: [Nouveau] [PATCH] nouveau/codegen: dump tgsi floats as hex values
On 11/16/17 1:30 PM, Karol Herbst wrote: the problem is, that you also need to be able to save the TGSI into a file and run it rhough nouveau_compiler. Not really sure if it is worth the effort. Printing hex instead of numbers make more sense in this regard anyhow, because we are more precise and being able to debug some issues much better in the end. As long as the new version is still correctly parsed with nouveau_compiler, Yes, it is still parsed correctly! this change is acked-by me. On Wed, Nov 15, 2017 at 10:52 PM, Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> wrote: Hi, yeah in the long run showing both in an ordered manner would be a nice thing to have! That would include patching the output and the tgsi parser (who wants to delete half the output to parse it again e.g. with nouveau_compiler). I can image an output similar to the one below: IMM[5] FLT32 {0., 0., 0., 0.} ^ IMM[5] FLT32 {0x0019, 0x000f, 0x0005, 0x001e} IMM[6] FLT32 {0., 0., 0., 0.} = IMM[6] FLT32 {0x001e, 0x0005, 0x000a, 0x0014} IMM[7] FLT32 {0., 0., 0., 0.} IMM[7] FLT32 {0x0014, 0x000a, 0x000f, 0x0019} Greetings, Tobias PS: I have no push rights to commit this! On 11/15/17 10:44 PM, Pierre Moreau wrote: This looks like the saner approach, compared to changing tgsi_dump.c to display more fractional digits. Maybe there could be a second option to display as both float and hex? Reviewed-by: Pierre Moreau <pierre.mor...@free.fr> On 2017-11-14 — 15:11, Tobias Klausmann wrote: Printing without this could lead to the following output, while the values are not exactly zero: IMM[5] FLT32 {0., 0., 0., 0.} IMM[6] FLT32 {0., 0., 0., 0.} IMM[7] FLT32 {0., 0., 0., 0.} when printing the values as hex, we can now see the differences: IMM[5] FLT32 {0x0019, 0x000f, 0x0005, 0x001e} IMM[6] FLT32 {0x001e, 0x0005, 0x000a, 0x0014} IMM[7] FLT32 {0x0014, 0x000a, 0x000f, 0x0019} Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index 34351dab51..898031811d 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -1095,7 +1095,7 @@ Source::Source(struct nv50_ir_prog_info *prog) : info(prog) tokens = (const struct tgsi_token *)info->bin.source; if (prog->dbgFlags & NV50_IR_DEBUG_BASIC) - tgsi_dump(tokens, 0); + tgsi_dump(tokens, TGSI_DUMP_FLOAT_AS_HEX); } Source::~Source() -- 2.15.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nouveau/codegen: dump tgsi floats as hex values
Hi, yeah in the long run showing both in an ordered manner would be a nice thing to have! That would include patching the output and the tgsi parser (who wants to delete half the output to parse it again e.g. with nouveau_compiler). I can image an output similar to the one below: IMM[5] FLT32 {0., 0., 0., 0.} ^ IMM[5] FLT32 {0x0019, 0x000f, 0x0005, 0x001e} IMM[6] FLT32 {0., 0., 0., 0.} = IMM[6] FLT32 {0x001e, 0x0005, 0x000a, 0x0014} IMM[7] FLT32 {0., 0., 0., 0.} IMM[7] FLT32 {0x0014, 0x000a, 0x000f, 0x0019} Greetings, Tobias PS: I have no push rights to commit this! On 11/15/17 10:44 PM, Pierre Moreau wrote: This looks like the saner approach, compared to changing tgsi_dump.c to display more fractional digits. Maybe there could be a second option to display as both float and hex? Reviewed-by: Pierre Moreau <pierre.mor...@free.fr> On 2017-11-14 — 15:11, Tobias Klausmann wrote: Printing without this could lead to the following output, while the values are not exactly zero: IMM[5] FLT32 {0., 0., 0., 0.} IMM[6] FLT32 {0., 0., 0., 0.} IMM[7] FLT32 {0., 0., 0., 0.} when printing the values as hex, we can now see the differences: IMM[5] FLT32 {0x0019, 0x000f, 0x0005, 0x001e} IMM[6] FLT32 {0x001e, 0x0005, 0x000a, 0x0014} IMM[7] FLT32 {0x0014, 0x000a, 0x000f, 0x0019} Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index 34351dab51..898031811d 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -1095,7 +1095,7 @@ Source::Source(struct nv50_ir_prog_info *prog) : info(prog) tokens = (const struct tgsi_token *)info->bin.source; if (prog->dbgFlags & NV50_IR_DEBUG_BASIC) - tgsi_dump(tokens, 0); + tgsi_dump(tokens, TGSI_DUMP_FLOAT_AS_HEX); } Source::~Source() -- 2.15.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nouveau/codegen: dump tgsi floats as hex values
ping! On 11/14/17 3:11 PM, Tobias Klausmann wrote: Printing without this could lead to the following output, while the values are not exactly zero: IMM[5] FLT32 {0., 0., 0., 0.} IMM[6] FLT32 {0., 0., 0., 0.} IMM[7] FLT32 {0., 0., 0., 0.} when printing the values as hex, we can now see the differences: IMM[5] FLT32 {0x0019, 0x000f, 0x0005, 0x001e} IMM[6] FLT32 {0x001e, 0x0005, 0x000a, 0x0014} IMM[7] FLT32 {0x0014, 0x000a, 0x000f, 0x0019} Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index 34351dab51..898031811d 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -1095,7 +1095,7 @@ Source::Source(struct nv50_ir_prog_info *prog) : info(prog) tokens = (const struct tgsi_token *)info->bin.source; if (prog->dbgFlags & NV50_IR_DEBUG_BASIC) - tgsi_dump(tokens, 0); + tgsi_dump(tokens, TGSI_DUMP_FLOAT_AS_HEX); } Source::~Source() ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [RFC PATCH] nouveau/compiler: Allow to omit line numbers when printing instructions
This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff! Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir.cpp| 6 +++--- src/gallium/drivers/nouveau/codegen/nv50_ir.h | 2 +- src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 12 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp | 2 +- src/gallium/drivers/nouveau/nouveau_compiler.c | 8 ++-- src/gallium/drivers/nouveau/nv50/nv50_program.c| 1 + src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 1 + 8 files changed, 22 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp index e9363101bf..4bf6c73837 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp @@ -1249,7 +1249,7 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info) if (ret < 0) goto out; if (prog->dbgFlags & NV50_IR_DEBUG_VERBOSE) - prog->print(); + prog->print(info->omitLineNum); targ->parseDriverInfo(info); prog->getTarget()->runLegalizePass(prog, nv50_ir::CG_STAGE_PRE_SSA); @@ -1257,13 +1257,13 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info) prog->convertToSSA(); if (prog->dbgFlags & NV50_IR_DEBUG_VERBOSE) - prog->print(); + prog->print(info->omitLineNum); prog->optimizeSSA(info->optLevel); prog->getTarget()->runLegalizePass(prog, nv50_ir::CG_STAGE_SSA); if (prog->dbgFlags & NV50_IR_DEBUG_BASIC) - prog->print(); + prog->print(info->omitLineNum); if (!prog->registerAllocation()) { ret = -4; diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index f2ce16d882..a3c7fd2f94 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -1249,7 +1249,7 @@ public: Program(Type type, Target *targ); ~Program(); - void print(); + void print(bool omitLineNum); Type getType() const { return progType; } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h index ffd53c9cd3..604a22ba89 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h @@ -82,6 +82,7 @@ struct nv50_ir_prog_info uint8_t optLevel; /* optimization level (0 to 3) */ uint8_t dbgFlags; + bool omitLineNum; struct { int16_t maxGPR; /* may be -1 if none used */ diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp index f5253b3745..a42fb44940 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp @@ -689,7 +689,7 @@ void Instruction::print() const class PrintPass : public Pass { public: - PrintPass() : serial(0) { } + PrintPass(bool omitLineNum = false) : serial(0), omit_serial(omitLineNum) { } virtual bool visit(Function *); virtual bool visit(BasicBlock *); @@ -697,6 +697,7 @@ public: private: int serial; + bool omit_serial; }; bool @@ -760,7 +761,10 @@ PrintPass::visit(BasicBlock *bb) bool PrintPass::visit(Instruction *insn) { - INFO("%3i: ", serial++); + if (omit_serial) + INFO(" "); + else + INFO("%3i: ", serial++); insn->print(); return true; } @@ -773,9 +777,9 @@ Function::print() } void -Program::print() +Program::print(bool omitLineNum) { - PrintPass pass; + PrintPass pass(omitLineNum); init_colours(); pass.run(this, true, false); } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp index 298e7c6ef9..96ad70d28a 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp @@ -371,7 +371,7 @@ Program::emitBinary(struct nv50_ir_prog_info *info) emit->prepareEmission(this); if (dbgFlags & NV50_IR_DEBUG_BASIC) - this->print(); + this->print(info->omitLineNum); if (!binSize) { code = NULL; diff --git a/src/gallium/drivers/nouveau/nouveau_compiler.c b/src/gallium/drivers/nouveau/nouveau_compiler.c index 3151a6f420..ed68031383 100644 --- a/src/gallium/drivers/nouveau/nouveau_compiler.c +++ b/src/gallium/drivers/nouveau/nouveau_compiler.c @@ -103,7 +103,7 @@ dummy_assign_slots(struct nv50_ir_prog_info *info) static int nouveau_codegen(int chipset, int type, struct tgsi_token tokens[], -unsigned *size, unsigned **code) {
Re: [Nouveau] [PATCH] Accept 3d controllers and not only VGA controllers.
On 11/10/17 7:49 PM, Josef Larsson wrote: Accept 3d controllers and not only VGA controllers. According to Ilia Mirkin, the VGA controller check should be removed. This makes it possible to use external connectors on a docking station (40A5) for a Thinkpad P51. (See Bug 101778). lspci example: 01:00.0 3D controller: NVIDIA Corporation GM107GLM [Quadro M1200 Mobile] (rev a2) Also include safe-guards to avoid NULL dereferencing of fbcon, which is how this bug was found. --- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 3 +-- drivers/gpu/drm/nouveau/nv50_display.c | 13 + 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c index 2b12d82aac15..6b4d374a9d82 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c +++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c @@ -498,8 +498,7 @@ nouveau_fbcon_init(struct drm_device *dev) int preferred_bpp; int ret; - if (!dev->mode_config.num_crtc || - (dev->pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA) + if (!dev->mode_config.num_crtc) return 0; fbcon = kzalloc(sizeof(struct nouveau_fbdev), GFP_KERNEL); diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c index fb47d46050ec..061daf036407 100644 --- a/drivers/gpu/drm/nouveau/nv50_display.c +++ b/drivers/gpu/drm/nouveau/nv50_display.c @@ -3214,6 +3214,13 @@ nv50_mstm_destroy_connector(struct drm_dp_mst_topology_mgr *mgr, struct nouveau_drm *drm = nouveau_drm(connector->dev); struct nv50_mstc *mstc = nv50_mstc(connector); + if (!drm->fbcon) + { + NV_WARN(drm, "drm->fbcon of %s point to NULL. Will not destroy connector\n", + connector->name); + return; + } + drm_connector_unregister(>connector); drm_modeset_lock_all(drm->dev); @@ -3229,6 +3236,12 @@ nv50_mstm_register_connector(struct drm_connector *connector) { struct nouveau_drm *drm = nouveau_drm(connector->dev); + if (!drm->fbcon) + { + NV_WARN(drm, "drm->fbcon of %s point to NULL. Will not register connector\n", + connector->name); + return; + } drm_modeset_lock_all(drm->dev); drm_fb_helper_add_one_connector(>fbcon->helper, connector); drm_modeset_unlock_all(drm->dev); Hi, the patch looks OK to me, yet as noted in IRC, i'd like to have this patch split into two and have the ->fbcon check as a precondition to the 3D Controller part. But lets see what the other and more clever people think about it! :) Greetings, Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m
Hi, the system fails to initialize your vbios using secureboot (i had a rare chance to on my system to witness it again), for now i traced it to acr_boot_falcon() in "linux/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0148cdec.c" where it throws -110 which is -ETIMEDOUT. You could try to increase the timeout and see if it helps something, similar to the following: diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c index 77273b53672c..fc0cb187d80d 100644 --- a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c +++ b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue.c @@ -326,7 +326,7 @@ nvkm_msgqueue_post(struct nvkm_msgqueue *priv, enum msgqueue_msg_priority prio, int ret; if (wait_init && !wait_for_completion_timeout(>init_done, - msecs_to_jiffies(1000))) + msecs_to_jiffies(5000))) return -ETIMEDOUT; queue = priv->func->cmd_queue(priv, prio); diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c index fec0273158f6..c2ae525a0780 100644 --- a/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c +++ b/drivers/gpu/drm/nouveau/nvkm/falcon/msgqueue_0137c63d.c @@ -279,6 +279,7 @@ acr_boot_falcon(struct nvkm_msgqueue *priv, enum nvkm_secboot_falcon falcon) u32 flags; u32 falcon_id; } cmd; + const struct nvkm_subdev *subdev = priv->falcon->owner; memset(, 0, sizeof(cmd)); @@ -290,7 +291,8 @@ acr_boot_falcon(struct nvkm_msgqueue *priv, enum nvkm_secboot_falcon falcon) nvkm_msgqueue_post(priv, MSGQUEUE_MSG_PRIORITY_HIGH, , acr_boot_falcon_callback, , true); - if (!wait_for_completion_timeout(, msecs_to_jiffies(1000))) + nvkm_error(subdev, "waiting for timeout in acr_boot_falcon (msgqueue_0137bca5)\n"); + if (!wait_for_completion_timeout(, msecs_to_jiffies(5000))) return -ETIMEDOUT; return 0; On 9/13/17 11:37 AM, Nicolas Mercier wrote: > I am still looking for a solution. I have hacked around in the code > and found out the following: > - Nouveau prefers using PCIE power managemet over ACPI Optimus calls. > I tried to force it to use Optimus ACPI calls, but there was an error > calling the ACPI method so it bails out and uses PCIE PM anyway. > - I tried to debug the PCIE pm states which internally uses ACPI to > turn power on/off. I could print different statuses here and there. > When the power is switched off, ACPI calls turn the power off then the > kernel successfully puts the device in state D3Cold (also turning off > power to the PCI Express port). When waking up, ACPI turns the power > on, apparently successfully (Device [PEGP] transitioned to D0). But a > read from the PCI bus to get the power state & other flags return > 65535 (~0) and the kernel fails to set the device in D0 (although ACPI > claims it is in D0) > The call to pci_raw_set_power_state (in drivers/pci/pci.c) seems to > fail because pci_read_config_word returns "~0" (and does not return > any error code) > > I have tried different things; if I use pcie_port_pm=off, the NVidia > card goes to state D3Hot (if I am not mistaken, its PCIE port is still > powered) but that did not fix it. I tried to turn on or off different > PCI/PCIexpress features such as hotplug, PM and so on. The only thing > that works is that PM is fully disabled, which equals to the device > not being powered off, so that would be equivalent to nouveau.runpm=0, > which is not helping a lot. I have tried to force pcie aspm by > recompiling the ACPI table, still no luck. > > I am still taking a look, but it seems like the problem comes from the > PCIExpress PM functions and ACPI, not directly from Nouveau > > /n ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Nouveau: kernel hang on Optimus+Intel+NVidia GeForce 1060m
Hi, i remember seeing the same error with earlier firmware version with a similar system (GP106) once in a while on boot, yet it does not happen with newer versions. Maybe you could try to update the firmware to the latest version from kernel-firmware. As a small addition: I remember deeply: you should ignore the _OSI(Linux) query, as it may break the system in some ways, if you don't have a specific bug fixed with adding _OSI(Linux), removing it from the cmdline is a thing to test! Greetings, Tobias On 9/11/17 4:54 PM, Nicolas Mercier wrote: > Hi, > I have an Optimus-enabled laptop with a GTX 1060m. I never got it to > fully work with Nouveau even after Pascal support was added. I need to > run the kernel with nouveau.runpm=0 to get it to work. Unfortunately > without proper power mangement support, my laptop will run out of > battery after about 1h30, so I'd love to get Optimus working. > > What I can see is that when the extra GPU is not in use, Nouveau will > try to shut it off. This seems to work, as the indicator of the laptop > changes to amber (discrete GPU in use) to blue (discrete GPU powered > off), and the kernel log (attached) reports that the GPU went off. > > If I wake up the GPU (plugging in/out the external monitor, running a > GL command with DRI_PRIME=1, or even lspci) will make the computer > unresponsive and only a force shutdown will work, as no graphics > command seems to be able to execute anymore. Nouveau (likely, vga > switcheroo) tries to wake up the GPU but it seems to fail. The LED > indicator does indicate the GPU has power though. > > Most of the time (but not often when I have an external monitor > plugged into the discrete card before I boot the computer) I get a > timeout during boot or during modprobe (see kernel log). Even with > runpm=0, I can't seem to be able to run GL commands on it: > > yngwe@labarbara: % DRI_PRIME=1 glxinfo > name of display: :0.0 > nvc0_screen_create:857 - Error allocating PGRAPH context for M2MF: -16 > libGL error: failed to create dri screen > libGL error: failed to load driver: nouveau > display: :0 screen: 0 > direct rendering: Yes > ... follows up using the Intel card ... > > the error I get in the kernel in that case: > [ 201.612583] nouveau :01:00.0: gr: FECS falcon already acquired > by gr! > [ 201.612586] nouveau :01:00.0: gr: init failed, -16 > > It runs on Debian testing + the firmware from Ubuntu as Debian's > firmware does not have the nvidia blobs, kernel was either 4.12/4.13 > release candidates from Debian repositories or kernel 4.13.1 self > compiled. I always had exactly the same symptoms on all these kernels. > > /nicolas > > > ___ > Nouveau mailing list > Nouveau@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] nv50/ra: Only increment DefValue counter if we are going to spill
This is in preparation of an upcoming patch changing how we keep track of the defs. Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp index e4f38c8e46..5034f8f989 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp @@ -1750,8 +1750,7 @@ SpillCodeInserter::run(const std::list& lst) // multiple destinations that all need to be spilled (like OP_SPLIT). unordered_set to_del; - for (Value::DefIterator d = lval->defs.begin(); d != lval->defs.end(); - ++d) { + for (Value::DefIterator d = lval->defs.begin(); d != lval->defs.end();) { Value *slot = mem ? static_cast(mem) : new_LValue(func, FILE_GPR); Value *tmp = NULL; @@ -1787,13 +1786,13 @@ SpillCodeInserter::run(const std::list& lst) assert(defi); if (defi->isPseudo()) { d = lval->defs.erase(d); ---d; if (slot->reg.file == FILE_MEMORY_LOCAL) to_del.insert(defi); else defi->setDef(0, slot); } else { spill(defi, slot, dval); +d++; } } -- 2.14.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] nvc0: fix handling of inverted render condition
Wether we wait on an inverted rendering condition or not, we should not render on a passed query. This fixes the CTS test case 'KHR-GL45.conditional_render_inverted.functional'. Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/nvc0/nvc0_query.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c index e92695bd6a..e6c7d5a3ad 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c @@ -132,7 +132,7 @@ nvc0_render_condition(struct pipe_context *pipe, else cond = NVC0_3D_COND_MODE_RES_NON_ZERO; } else { -cond = wait ? NVC0_3D_COND_MODE_EQUAL : NVC0_3D_COND_MODE_ALWAYS; +cond = NVC0_3D_COND_MODE_EQUAL; } break; default: -- 2.14.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs
ping on this v2 On 8/13/17 3:02 AM, Tobias Klausmann wrote: > On using builtin functions we have to move the input to registers $0 and $1, > if > one of the input value is an immediate, we fail to propagate the immediate: > > ... > mov u32 $r477 0x0003 (0) > ... > mov u32 $r0 %r473 (0) > mov u32 $r1 $r477 (0) > call abs BUILTIN:0 (0) > mov u32 %r495 $r1 (0) > ... > > With this patch the immediate is propagated, potentially causing the first MOV > to be superfluous, which we'd remove in that case: > > ... > > mov u32 $r0 %r473 (0) > mov u32 $r1 0x0003 (0) > call abs BUILTIN:0 (0) > mov u32 %r495 $r1 (0) > ... > > Shaderdb stats: > total instructions in shared programs : 4893460 -> 4893324 (-0.00%) > total gprs used in shared programs: 582972 -> 582881 (-0.02%) > total local used in shared programs : 17960 -> 17960 (0.00%) > > localgpr inst bytes > helped 0 91 112 112 > hurt 0 0 0 0 > > v2: > implement some changes proposed by imirkin, the manual deletion of the dead > mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call > to division function") as the potentially dead mov is unlinked properly, > causing later passes to not notice the mov op at all and thus not cleaning it > up. That makes up a big chunk of the regression the above commit caused. > Keep the deletion of the op where it is, deleting it later unnecessarily > blows > up size of the change. > > Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> > --- > .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 > +++-- > 1 file changed, 19 insertions(+), 2 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > index c8f0701572..7243b1d2e4 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i) > int builtin; > > bld.setPosition(i, false); > - bld.mkMovToReg(0, i->getSrc(0)); > - bld.mkMovToReg(1, i->getSrc(1)); > + > + // Generate movs to the input regs for the call we want to generate > + for (int s = 0; i->srcExists(s); ++s) { > + Instruction *ld = i->getSrc(s)->getInsn(); > + assert(ld->getSrc(0) != NULL); > + // check if we are moving an immediate, propagate it in that case > + if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) || > +!(ld->src(0).getFile() == FILE_IMMEDIATE)) > + bld.mkMovToReg(s, i->getSrc(s)); > + else { > + bld.mkMovToReg(s, ld->getSrc(0)); > + // Clear the src, to make code elimination possible here before we > + // delete the instruction i later > + i->setSrc(s, NULL); > + if (ld->isDead()) > +delete_Instruction(prog, ld); > + } > + } > + > switch (i->dType) { > case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break; > case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break; ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs
On using builtin functions we have to move the input to registers $0 and $1, if one of the input value is an immediate, we fail to propagate the immediate: ... mov u32 $r477 0x0003 (0) ... mov u32 $r0 %r473 (0) mov u32 $r1 $r477 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... With this patch the immediate is propagated, potentially causing the first MOV to be superfluous, which we'd remove in that case: ... mov u32 $r0 %r473 (0) mov u32 $r1 0x0003 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... Shaderdb stats: total instructions in shared programs : 4893460 -> 4893324 (-0.00%) total gprs used in shared programs: 582972 -> 582881 (-0.02%) total local used in shared programs : 17960 -> 17960 (0.00%) localgpr inst bytes helped 0 91 112 112 hurt 0 0 0 0 v2: implement some changes proposed by imirkin, the manual deletion of the dead mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call to division function") as the potentially dead mov is unlinked properly, causing later passes to not notice the mov op at all and thus not cleaning it up. That makes up a big chunk of the regression the above commit caused. Keep the deletion of the op where it is, deleting it later unnecessarily blows up size of the change. Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index c8f0701572..7243b1d2e4 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i) int builtin; bld.setPosition(i, false); - bld.mkMovToReg(0, i->getSrc(0)); - bld.mkMovToReg(1, i->getSrc(1)); + + // Generate movs to the input regs for the call we want to generate + for (int s = 0; i->srcExists(s); ++s) { + Instruction *ld = i->getSrc(s)->getInsn(); + assert(ld->getSrc(0) != NULL); + // check if we are moving an immediate, propagate it in that case + if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) || +!(ld->src(0).getFile() == FILE_IMMEDIATE)) + bld.mkMovToReg(s, i->getSrc(s)); + else { + bld.mkMovToReg(s, ld->getSrc(0)); + // Clear the src, to make code elimination possible here before we + // delete the instruction i later + i->setSrc(s, NULL); + if (ld->isDead()) +delete_Instruction(prog, ld); + } + } + switch (i->dType) { case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break; case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break; -- 2.14.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nvc0/ir: propagate immediates to CALL input MOVs
On 8/12/17 10:20 PM, Ilia Mirkin wrote: > On Sat, Aug 12, 2017 at 3:33 PM, Tobias Klausmann > <tobias.johannes.klausm...@mni.thm.de> wrote: >> On using builtin functions we have to move the input to registers $0 and $1, >> if >> one of the input value is an immediate, we fail to propagate the immediate: >> >> ... >> mov u32 $r477 0x0003 (0) >> ... >> mov u32 $r0 %r473 (0) >> mov u32 $r1 $r477 (0) >> call abs BUILTIN:0 (0) >> mov u32 %r495 $r1 (0) >> ... >> >> With this patch the immediate is propagated, potentially causing the first >> MOV >> to be superfluous, which we'd remove in that case: >> >> ... >> >> mov u32 $r0 %r473 (0) >> mov u32 $r1 0x0003 (0) >> call abs BUILTIN:0 (0) >> mov u32 %r495 $r1 (0) >> ... >> >> Shaderdb stats: >> total instructions in shared programs : 4893460 -> 4893324 (-0.00%) >> total gprs used in shared programs: 582972 -> 582881 (-0.02%) >> total local used in shared programs : 17960 -> 17960 (0.00%) >> >> localgpr inst bytes >> helped 0 91 112 112 >> hurt 0 0 0 0 >> >> Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> >> --- >> .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 >> +++-- >> 1 file changed, 19 insertions(+), 2 deletions(-) >> >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> index c8f0701572..861d08af24 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i) >> int builtin; >> >> bld.setPosition(i, false); >> - bld.mkMovToReg(0, i->getSrc(0)); >> - bld.mkMovToReg(1, i->getSrc(1)); >> + >> + // Generate movs to the input regs for the call we want to generate >> + for (int s = 0; i->srcExists(s); ++s) { >> + Instruction *ld = i->getSrc(s)->getInsn(); >> + ImmediateValue imm; >> + // check if we are moving an immediate, propagate it in that case >> + if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) || >> +!ld->src(0).getImmediate(imm)) > At this point you don't even have to use getImmediate - you can just > look at ld->src(0).getFile() == FILE_IMMEDIATE. That was actually the fallback to the before non-working getImmediate() and is easily doable if favored. > > Normally you'd just do i->src(s).getImmediate(imm) and moved on with > life. But you kinda need the ld here, which is annoying. Perhaps you > can just drop the manual deletion of the op here... which would let > you do the much simpler thing. Can you see if there's any effect from > that? Will do! >> + bld.mkMovToReg(s, i->getSrc(s)); >> + else { >> + bld.mkMovToReg(s, ld->getSrc(0)); >> + // Clear the src, to make code elimination possible here before we >> + // delete the instruction i later >> + i->setSrc(s, NULL); > i gets deleted later on. move the deletion of ld after that happens? this would cause more indirection (saving of the insn for later), not sure if that makes the code more readable or if this clear is more straight forward. As you see i'd go with the clear, but if you really want i can add the extra save and delete. > >> + if (ld->getDef(0)->refCount() == 0) > ld->isDead() ok > >> +delete_Instruction(prog, ld); >> + } >> + } >> + >> switch (i->dType) { >> case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break; >> case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break; >> -- >> 2.14.0 >> >> ___ >> Nouveau mailing list >> Nouveau@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] nvc0/ir: propagate immediates to CALL input MOVs
On using builtin functions we have to move the input to registers $0 and $1, if one of the input value is an immediate, we fail to propagate the immediate: ... mov u32 $r477 0x0003 (0) ... mov u32 $r0 %r473 (0) mov u32 $r1 $r477 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... With this patch the immediate is propagated, potentially causing the first MOV to be superfluous, which we'd remove in that case: ... mov u32 $r0 %r473 (0) mov u32 $r1 0x0003 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... Shaderdb stats: total instructions in shared programs : 4893460 -> 4893324 (-0.00%) total gprs used in shared programs: 582972 -> 582881 (-0.02%) total local used in shared programs : 17960 -> 17960 (0.00%) localgpr inst bytes helped 0 91 112 112 hurt 0 0 0 0 Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index c8f0701572..861d08af24 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i) int builtin; bld.setPosition(i, false); - bld.mkMovToReg(0, i->getSrc(0)); - bld.mkMovToReg(1, i->getSrc(1)); + + // Generate movs to the input regs for the call we want to generate + for (int s = 0; i->srcExists(s); ++s) { + Instruction *ld = i->getSrc(s)->getInsn(); + ImmediateValue imm; + // check if we are moving an immediate, propagate it in that case + if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) || +!ld->src(0).getImmediate(imm)) + bld.mkMovToReg(s, i->getSrc(s)); + else { + bld.mkMovToReg(s, ld->getSrc(0)); + // Clear the src, to make code elimination possible here before we + // delete the instruction i later + i->setSrc(s, NULL); + if (ld->getDef(0)->refCount() == 0) +delete_Instruction(prog, ld); + } + } + switch (i->dType) { case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break; case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break; -- 2.14.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 17/29] drm/nouveau: switch to drm_*{get, put} helpers
Looks good to me! Reviewed-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> On 8/3/17 1:58 PM, Cihangir Akturk wrote: > drm_*_reference() and drm_*_unreference() functions are just > compatibility alias for drm_*_get() and drm_*_put() adn should not be > used by new code. So convert all users of compatibility functions to use > the new APIs. > > Signed-off-by: Cihangir Akturk <cakt...@gmail.com> > --- > drivers/gpu/drm/nouveau/dispnv04/crtc.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_abi16.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_display.c | 8 > drivers/gpu/drm/nouveau/nouveau_fbcon.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_gem.c | 14 +++--- > drivers/gpu/drm/nouveau/nv50_display.c| 2 +- > 6 files changed, 15 insertions(+), 15 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c > b/drivers/gpu/drm/nouveau/dispnv04/crtc.c > index 4b4b0b4..18b4be1 100644 > --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c > +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c > @@ -1019,7 +1019,7 @@ nv04_crtc_cursor_set(struct drm_crtc *crtc, struct > drm_file *file_priv, > nv_crtc->cursor.set_offset(nv_crtc, nv_crtc->cursor.offset); > nv_crtc->cursor.show(nv_crtc, true); > out: > - drm_gem_object_unreference_unlocked(gem); > + drm_gem_object_put_unlocked(gem); > return ret; > } > > diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c > b/drivers/gpu/drm/nouveau/nouveau_abi16.c > index f98f800..3e9db5a 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c > +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c > @@ -136,7 +136,7 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16, > if (chan->ntfy) { > nouveau_bo_vma_del(chan->ntfy, >ntfy_vma); > nouveau_bo_unpin(chan->ntfy); > - drm_gem_object_unreference_unlocked(>ntfy->gem); > + drm_gem_object_put_unlocked(>ntfy->gem); > } > > if (chan->heap.block_size) > diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c > b/drivers/gpu/drm/nouveau/nouveau_display.c > index 8d1df56..a68fe1a 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_display.c > +++ b/drivers/gpu/drm/nouveau/nouveau_display.c > @@ -206,7 +206,7 @@ nouveau_user_framebuffer_destroy(struct drm_framebuffer > *drm_fb) > struct nouveau_framebuffer *fb = nouveau_framebuffer(drm_fb); > > if (fb->nvbo) > - drm_gem_object_unreference_unlocked(>nvbo->gem); > + drm_gem_object_put_unlocked(>nvbo->gem); > > drm_framebuffer_cleanup(drm_fb); > kfree(fb); > @@ -267,7 +267,7 @@ nouveau_user_framebuffer_create(struct drm_device *dev, > if (ret == 0) > return >base; > > - drm_gem_object_unreference_unlocked(gem); > + drm_gem_object_put_unlocked(gem); > return ERR_PTR(ret); > } > > @@ -947,7 +947,7 @@ nouveau_display_dumb_create(struct drm_file *file_priv, > struct drm_device *dev, > return ret; > > ret = drm_gem_handle_create(file_priv, >gem, >handle); > - drm_gem_object_unreference_unlocked(>gem); > + drm_gem_object_put_unlocked(>gem); > return ret; > } > > @@ -962,7 +962,7 @@ nouveau_display_dumb_map_offset(struct drm_file > *file_priv, > if (gem) { > struct nouveau_bo *bo = nouveau_gem_object(gem); > *poffset = drm_vma_node_offset_addr(>bo.vma_node); > - drm_gem_object_unreference_unlocked(gem); > + drm_gem_object_put_unlocked(gem); > return 0; > } > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c > b/drivers/gpu/drm/nouveau/nouveau_fbcon.c > index 2665a07..6c9e1ec 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c > +++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c > @@ -451,7 +451,7 @@ nouveau_fbcon_destroy(struct drm_device *dev, struct > nouveau_fbdev *fbcon) > nouveau_bo_vma_del(nouveau_fb->nvbo, _fb->vma); > nouveau_bo_unmap(nouveau_fb->nvbo); > nouveau_bo_unpin(nouveau_fb->nvbo); > - drm_framebuffer_unreference(_fb->base); > + drm_framebuffer_put(_fb->base); > } > > return 0; > diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c > b/drivers/gpu/drm/nouveau/nouveau_gem.c > index 2170534..653425c 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_gem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c > @@ -281,7 +281,7 @@ nouveau_gem_ioctl_new(struct drm_device *dev, void *data, > } > > /* drop reference from allocate -
[Nouveau] [RFC PATCH v2] nv50/ir: allow spilling of def values for constrained MERGES/UNIONS
This lets us spill more values and compile a Civilization 6 shader with many local vars. As a precation, only spill those vars as a fallback option if all other spillable vars are already spilled! shader-db run shows: total instructions in shared programs : 4427020 -> 4427388 (0.01%) total gprs used in shared programs: 522836 -> 522871 (0.01%) total local used in shared programs : 17128 -> 17464 (1.96%) localgpr inst bytes helped 0 0 0 0 hurt 0 0 0 0 The additional instructions (+368) gprs (+35) and local (+336) are contained in the Civilization 6 shader: 90.shader_test - type: 0, local: 336, gpr: 35, inst: 368, bytes: 3928 Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir.cpp| 2 ++ src/gallium/drivers/nouveau/codegen/nv50_ir.h | 3 ++ src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 39 -- 3 files changed, 34 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp index 08181b790f..2fa8c22e33 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp @@ -233,6 +233,7 @@ LValue::LValue(Function *fn, DataFile file) ssa = 0; fixedReg = 0; noSpill = 0; + softNoSpill = 0; fn->add(this, this->id); } @@ -250,6 +251,7 @@ LValue::LValue(Function *fn, LValue *lval) ssa = 0; fixedReg = 0; noSpill = 0; + softNoSpill = 0; fn->add(this, this->id); } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index bc15992df0..ca5bcb5362 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -704,6 +704,9 @@ public: unsigned ssa : 1; unsigned fixedReg : 1; // set & used by RA, earlier just use (id < 0) unsigned noSpill : 1; // do not spill (e.g. if spill temporary already) + unsigned softNoSpill : 1; /* only spill these values if all other values are + * spilled already! + */ }; class Symbol : public Value diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp index b33d7b4010..9d70ec3c9c 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp @@ -769,7 +769,7 @@ private: bool coalesce(ArrayList&); bool doCoalesce(ArrayList&, unsigned int mask); void calculateSpillWeights(); - bool simplify(); + bool simplify(bool useSoftNoSpill); bool selectRegisters(); void cleanup(const bool success); @@ -1242,7 +1242,7 @@ GCRA::calculateSpillWeights() } LValue *val = nodes[i].getValue(); - if (!val->noSpill) { + if (!val->noSpill || val->softNoSpill) { int rc = 0; for (Value::DefIterator it = val->defs.begin(); it != val->defs.end(); @@ -1304,7 +1304,7 @@ GCRA::simplifyNode(RIG_Node *node) } bool -GCRA::simplify() +GCRA::simplify(bool useSoftNoSpill) { for (;;) { if (!DLLIST_EMPTY([0])) { @@ -1317,17 +1317,32 @@ GCRA::simplify() } else if (!DLLIST_EMPTY()) { RIG_Node *best = hi.next; - float bestScore = best->weight / (float)best->degree; + bool spillable = false; + if (best->getValue()->noSpill && best->getValue()->softNoSpill && + useSoftNoSpill) +spillable = true; + float bestScore = INFINITY; + if (!best->getValue()->noSpill || spillable) + bestScore = best->weight / (float)best->degree; // spill candidate for (RIG_Node *it = best->next; it != it = it->next) { -float score = it->weight / (float)it->degree; +float score = INFINITY; +bool spillable = false; +if (it->getValue()->noSpill && it->getValue()->softNoSpill && + useSoftNoSpill) + spillable = true; +if (!it->getValue()->noSpill || spillable) { + score = it->weight / (float)it->degree; +} if (score < bestScore) { best = it; bestScore = score; } } if (isinf(bestScore)) { -ERROR("no viable spill candidates left\n"); +if (useSoftNoSpill) + ERROR("no viable spill candidates left\n"); + return false; } simplifyNode(best); @@ -1491,9 +1506,12 @@ GCRA::allocateRegisters(ArrayList&
[Nouveau] [RFC PATCH] nv50/ir: allow spilling of def values for constrained MERGES/UNIONS
This lets us spill more values and compile a big shader for Civilization 6. Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp index b33d7b4010..f29c8a1a95 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp @@ -2344,8 +2344,6 @@ RegAlloc::InsertConstraintsPass::insertConstraintMoves() cst->setSrc(s, mov->getDef(0)); cst->bb->insertBefore(cst, mov); -cst->getDef(0)->asLValue()->noSpill = 1; // doesn't help - if (cst->op == OP_UNION) mov->setPredicate(defi->cc, defi->getPredicate()); } -- 2.13.3 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] drm: disable vblank only if it got previously enabled
Mh ok, paper over in nouveau_display_fini until Ben comes up with a better idea then?! Greetings, Tobias On 7/20/17 10:13 AM, Daniel Vetter wrote: > On Wed, Jul 19, 2017 at 04:10:50PM -0400, Ilia Mirkin wrote: >> I believe the solution is to not call drm_crtc_vblank_off for atomic >> modesetting in nouveau_display_fini. I think Ben's working on it. > Yes, the goal of vblank_on/off was very much to not paper over driver bugs > with clever tricks like these. If the driver cant keep track of its > vblank, something has gone wrong, and the core should _not_ fix it up. > Otherwise we're back to the old style vblank horror show. > > Thanks, Daniel > >> On Wed, Jul 19, 2017 at 1:25 PM, Tobias Klausmann >> <tobias.johannes.klausm...@mni.thm.de> wrote: >>> mimic the behavior of vblank_disable_fn(), another caller of >>> drm_vblank_disable_and_save(). >>> >>> This avoids oopsing, while trying to disable vblank on a not connected >>> display: >>> >>> [ 12.768079] WARNING: CPU: 0 PID: 274 at drivers/gpu/drm/drm_vblank.c:609 >>> drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm] >>> [ 12.768080] Modules linked in: bnep snd_hda_codec_hdmi rtsx_usb_sdmmc >>> uvcvideo rtsx_usb_ms mmc_core videobuf2_vmalloc memstick videobuf2_memops >>> videobuf2_v4l2 videobuf2_core rtsx_usb videodev btusb btrtl arc4 >>> snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 >>> hid_multitouch nls_cp437 intel_rapl x86_pkg_temp_thermal intel_powerclamp >>> vfat coretemp fat kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass >>> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc >>> aesni_intel ath10k_pci snd_hda_intel ath10k_core aes_x86_64 snd_hda_codec >>> crypto_simd ath glue_helper cryptd snd_hda_core mac80211 snd_hwdep snd_pcm >>> pcspkr r8169 cfg80211 mii snd_timer acer_wmi snd sparse_keymap wmi_bmof >>> idma64 hci_uart virt_dma mei_me soundcore i2c_i801 mei btbcm shpchp >>> intel_lpss_pci intel_pch_thermal >>> [ 12.768130] serdev btqca ucsi_acpi btintel typec_ucsi thermal typec >>> bluetooth ecdh_generic battery ac pinctrl_sunrisepoint rfkill >>> intel_lpss_acpi pinctrl_intel intel_lpss acpi_pad nouveau serio_raw i915 >>> mxm_wmi ttm i2c_algo_bit drm_kms_helper xhci_pci syscopyarea sysfillrect >>> sysimgblt xhci_hcd fb_sys_fops usbcore drm i2c_hid wmi video button sg >>> efivarfs >>> [ 12.768158] CPU: 0 PID: 274 Comm: kworker/0:2 Not tainted >>> 4.12.0-desktop-debug-drm+ #2 >>> [ 12.768160] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.04 >>> 03/30/2017 >>> [ 12.768164] Workqueue: pm pm_runtime_work >>> [ 12.768166] task: 889bf1627040 task.stack: 9541013e4000 >>> [ 12.768180] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 >>> [drm] >>> [ 12.768181] RSP: 0018:9541013e7b30 EFLAGS: 00010086 >>> [ 12.768183] RAX: 001c RBX: 889b4cebd000 RCX: >>> 0004 >>> [ 12.768184] RDX: 8004 RSI: 87a2d952 RDI: >>> >>> [ 12.768186] RBP: 9541013e7b90 R08: 0001 R09: >>> 039f >>> [ 12.768187] R10: c05fe530 R11: R12: >>> >>> [ 12.768188] R13: 9541013e7ba4 R14: 889bf0426088 R15: >>> 889bf0426000 >>> [ 12.768190] FS: () GS:889bfec0() >>> knlGS: >>> [ 12.768191] CS: 0010 DS: ES: CR0: 80050033 >>> [ 12.768192] CR2: 00edb16580b8 CR3: 00020cc09000 CR4: >>> 003406f0 >>> [ 12.768193] Call Trace: >>> [ 12.768198] ? enqueue_task_fair+0x64/0x600 >>> [ 12.768211] ? drm_get_last_vbltimestamp+0x47/0x70 [drm] >>> [ 12.768223] ? drm_update_vblank_count+0x65/0x240 [drm] >>> [ 12.768227] ? pci_pm_runtime_resume+0xa0/0xa0 >>> [ 12.768238] ? drm_vblank_disable_and_save+0x55/0xc0 [drm] >>> [ 12.768250] ? drm_crtc_vblank_off+0xa9/0x1e0 [drm] >>> [ 12.768253] ? pci_pm_runtime_resume+0xa0/0xa0 >>> [ 12.768299] ? nouveau_display_fini+0x56/0xd0 [nouveau] >>> [ 12.768339] ? nouveau_display_suspend+0x51/0x110 [nouveau] >>> [ 12.768378] ? nouveau_do_suspend+0x76/0x1c0 [nouveau] >>> [ 12.768413] ? nouveau_pmops_runtime_suspend+0x54/0xb0 [nouveau] >>> [ 12.768416] ? pci_pm_runtime_suspend+0x5c/0x160 >>> [ 12.768419] ? __rpm_callback+0xb6/0x1e0 >>
[Nouveau] [PATCH] drm: disable vblank only if it got previously enabled
mimic the behavior of vblank_disable_fn(), another caller of drm_vblank_disable_and_save(). This avoids oopsing, while trying to disable vblank on a not connected display: [ 12.768079] WARNING: CPU: 0 PID: 274 at drivers/gpu/drm/drm_vblank.c:609 drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm] [ 12.768080] Modules linked in: bnep snd_hda_codec_hdmi rtsx_usb_sdmmc uvcvideo rtsx_usb_ms mmc_core videobuf2_vmalloc memstick videobuf2_memops videobuf2_v4l2 videobuf2_core rtsx_usb videodev btusb btrtl arc4 snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 hid_multitouch nls_cp437 intel_rapl x86_pkg_temp_thermal intel_powerclamp vfat coretemp fat kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel ath10k_pci snd_hda_intel ath10k_core aes_x86_64 snd_hda_codec crypto_simd ath glue_helper cryptd snd_hda_core mac80211 snd_hwdep snd_pcm pcspkr r8169 cfg80211 mii snd_timer acer_wmi snd sparse_keymap wmi_bmof idma64 hci_uart virt_dma mei_me soundcore i2c_i801 mei btbcm shpchp intel_lpss_pci intel_pch_thermal [ 12.768130] serdev btqca ucsi_acpi btintel typec_ucsi thermal typec bluetooth ecdh_generic battery ac pinctrl_sunrisepoint rfkill intel_lpss_acpi pinctrl_intel intel_lpss acpi_pad nouveau serio_raw i915 mxm_wmi ttm i2c_algo_bit drm_kms_helper xhci_pci syscopyarea sysfillrect sysimgblt xhci_hcd fb_sys_fops usbcore drm i2c_hid wmi video button sg efivarfs [ 12.768158] CPU: 0 PID: 274 Comm: kworker/0:2 Not tainted 4.12.0-desktop-debug-drm+ #2 [ 12.768160] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.04 03/30/2017 [ 12.768164] Workqueue: pm pm_runtime_work [ 12.768166] task: 889bf1627040 task.stack: 9541013e4000 [ 12.768180] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm] [ 12.768181] RSP: 0018:9541013e7b30 EFLAGS: 00010086 [ 12.768183] RAX: 001c RBX: 889b4cebd000 RCX: 0004 [ 12.768184] RDX: 8004 RSI: 87a2d952 RDI: [ 12.768186] RBP: 9541013e7b90 R08: 0001 R09: 039f [ 12.768187] R10: c05fe530 R11: R12: [ 12.768188] R13: 9541013e7ba4 R14: 889bf0426088 R15: 889bf0426000 [ 12.768190] FS: () GS:889bfec0() knlGS: [ 12.768191] CS: 0010 DS: ES: CR0: 80050033 [ 12.768192] CR2: 00edb16580b8 CR3: 00020cc09000 CR4: 003406f0 [ 12.768193] Call Trace: [ 12.768198] ? enqueue_task_fair+0x64/0x600 [ 12.768211] ? drm_get_last_vbltimestamp+0x47/0x70 [drm] [ 12.768223] ? drm_update_vblank_count+0x65/0x240 [drm] [ 12.768227] ? pci_pm_runtime_resume+0xa0/0xa0 [ 12.768238] ? drm_vblank_disable_and_save+0x55/0xc0 [drm] [ 12.768250] ? drm_crtc_vblank_off+0xa9/0x1e0 [drm] [ 12.768253] ? pci_pm_runtime_resume+0xa0/0xa0 [ 12.768299] ? nouveau_display_fini+0x56/0xd0 [nouveau] [ 12.768339] ? nouveau_display_suspend+0x51/0x110 [nouveau] [ 12.768378] ? nouveau_do_suspend+0x76/0x1c0 [nouveau] [ 12.768413] ? nouveau_pmops_runtime_suspend+0x54/0xb0 [nouveau] [ 12.768416] ? pci_pm_runtime_suspend+0x5c/0x160 [ 12.768419] ? __rpm_callback+0xb6/0x1e0 [ 12.768423] ? kobject_uevent_env+0x111/0x5e0 [ 12.768425] ? pci_pm_runtime_resume+0xa0/0xa0 [ 12.768427] ? rpm_callback+0x1f/0x70 [ 12.768429] ? pci_pm_runtime_resume+0xa0/0xa0 [ 12.768431] ? rpm_suspend+0x11f/0x640 [ 12.768441] ? drm_fb_helper_hotplug_event+0x9a/0xe0 [drm_kms_helper] [ 12.768447] ? output_poll_execute+0x17b/0x1a0 [drm_kms_helper] [ 12.768449] ? pm_runtime_work+0x64/0xa0 [ 12.768453] ? process_one_work+0x1db/0x410 [ 12.768456] ? worker_thread+0x47/0x3d0 [ 12.768459] ? process_one_work+0x410/0x410 [ 12.768461] ? kthread+0x117/0x130 [ 12.768463] ? kthread_create_on_node+0x40/0x40 [ 12.768466] ? ret_from_fork+0x25/0x30 [ 12.768468] Code: 80 3d 26 f3 01 00 00 0f 85 ad fd ff ff 48 8b 43 20 48 c7 c7 31 a2 20 c0 c6 05 0e f3 01 00 01 48 8b b0 60 01 00 00 e8 75 2e ec c6 <0f> ff e9 88 fd ff ff 31 f6 44 88 55 b0 e8 38 fa ed c6 44 0f b6 [ 12.768508] ---[ end trace d9bb853af3659bd5 ]--- Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- drivers/gpu/drm/drm_vblank.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c index a233a6be934a..4a21756bf2bd 100644 --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -1140,8 +1140,11 @@ void drm_crtc_vblank_off(struct drm_crtc *crtc) /* Avoid redundant vblank disables without previous * drm_crtc_vblank_on(). */ - if (drm_core_check_feature(dev, DRIVER_ATOMIC) || !vblank->inmodeset) + if (drm_core_check_feature(dev, DRIVER_ATOMIC) || (!vblank->inmodeset && +
[Nouveau] [PATCH v2] drm/nouveau: honor return type of nvif_mthd, trivial
nvif_mthd() returns an int, so provide that for return checking Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- V2: declare var only once The other patch should have never get out at all, but as i'm at it, send out a fixed one! drivers/gpu/drm/nouveau/nouveau_display.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 8d1df5678eaa..58375669d492 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -113,10 +113,11 @@ nouveau_display_scanoutpos_head(struct drm_crtc *crtc, int *vpos, int *hpos, struct drm_vblank_crtc *vblank = >dev->vblank[drm_crtc_index(crtc)]; int retry = 20; bool ret = false; + int mthd_ret; do { - ret = nvif_mthd(>disp, 0, , sizeof(args)); - if (ret != 0) + mthd_ret = nvif_mthd(>disp, 0, , sizeof(args)); + if (mthd_ret != 0) return false; if (args.scan.vline) { -- 2.13.2 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
The conversion is a nice catch, but i'd like to have a bit more context, see below! With a better description: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> On 7/14/17 5:10 PM, Karol Herbst wrote: Yeah, we shouldn't let the machine die. Are there more WARN_ON_ONCE usage we could convert to WARN_ONCE? Reviewed-By: Karol Herbst <karolher...@gmail.com> On Fri, Jul 14, 2017 at 5:05 PM, Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> wrote: On 7/14/17 3:41 PM, Mike Galbraith wrote: On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote: All DRM did was to slip a WARN_ON_ONCE() that nouveau triggers into a kernel module where such things no longer warn, they blow the box out of the water. BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c into a WARN_ONCE(), and all is peachy, you get the warning, box lives. --- drivers/gpu/drm/drm_vblank.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp */ if (mode->crtc_clock == 0) { DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe); - WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev)); + WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n", "report me" seems a bit odd, maybe just uninitialized mode? + dev->driver->name); return false; } Hey, confirmed this helps saving the box, but we still have to find the root cause! Backtrace with the above fix applied (and the one which came in with the latest drm-fixes merge)! [1] https://hastebin.com/uyoqifijed.http Thanks, Tobias Reviewed-By: Karol Herbst <karolher...@gmail.com> ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On 7/14/17 3:41 PM, Mike Galbraith wrote: On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote: All DRM did was to slip a WARN_ON_ONCE() that nouveau triggers into a kernel module where such things no longer warn, they blow the box out of the water. BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c into a WARN_ONCE(), and all is peachy, you get the warning, box lives. --- drivers/gpu/drm/drm_vblank.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp */ if (mode->crtc_clock == 0) { DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe); - WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev)); + WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n", + dev->driver->name); return false; } Hey, confirmed this helps saving the box, but we still have to find the root cause! Backtrace with the above fix applied (and the one which came in with the latest drm-fixes merge)! [1] https://hastebin.com/uyoqifijed.http Thanks, Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] drm/nouveau: split nouveau_drm_postclose back in pre/postclose
Mike, sorry i forgot to add: Can you plase test this patch? Thanks, tobias On 7/12/17 11:56 PM, Tobias Klausmann wrote: This patch brings back the old nouveau_drm_preclose and nouveau_drm_postclose functions for closing down a drm device Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- drivers/gpu/drm/nouveau/nouveau_drm.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index 90757af9bc73..0ca2b65bdc4f 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -877,7 +877,7 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file *fpriv) } static void -nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv) +nouveau_drm_preclose(struct drm_device *dev, struct drm_file *fpriv) { struct nouveau_cli *cli = nouveau_cli(fpriv); struct nouveau_drm *drm = nouveau_drm(dev); @@ -892,7 +892,12 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv) mutex_lock(>client.mutex); list_del(>head); mutex_unlock(>client.mutex); +} +static void +nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv) +{ + struct nouveau_cli *cli = nouveau_cli(fpriv); nouveau_cli_fini(cli); kfree(cli); pm_runtime_mark_last_busy(dev->dev); @@ -964,6 +969,7 @@ driver_stub = { .load = nouveau_drm_load, .unload = nouveau_drm_unload, .open = nouveau_drm_open, + .preclose = nouveau_drm_preclose, .postclose = nouveau_drm_postclose, .lastclose = nouveau_vga_lastclose, ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] drm/nouveau: split nouveau_drm_postclose back in pre/postclose
This patch brings back the old nouveau_drm_preclose and nouveau_drm_postclose functions for closing down a drm device Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- drivers/gpu/drm/nouveau/nouveau_drm.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index 90757af9bc73..0ca2b65bdc4f 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -877,7 +877,7 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file *fpriv) } static void -nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv) +nouveau_drm_preclose(struct drm_device *dev, struct drm_file *fpriv) { struct nouveau_cli *cli = nouveau_cli(fpriv); struct nouveau_drm *drm = nouveau_drm(dev); @@ -892,7 +892,12 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv) mutex_lock(>client.mutex); list_del(>head); mutex_unlock(>client.mutex); +} +static void +nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv) +{ + struct nouveau_cli *cli = nouveau_cli(fpriv); nouveau_cli_fini(cli); kfree(cli); pm_runtime_mark_last_busy(dev->dev); @@ -964,6 +969,7 @@ driver_stub = { .load = nouveau_drm_load, .unload = nouveau_drm_unload, .open = nouveau_drm_open, + .preclose = nouveau_drm_preclose, .postclose = nouveau_drm_postclose, .lastclose = nouveau_vga_lastclose, -- 2.13.2 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] drm/nouveau: honor return type of nvif_mthd, trivial
nvif_mthd() returns an int, so provide that for return checking Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- drivers/gpu/drm/nouveau/nouveau_display.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 8d1df5678eaa..f8f555e2e912 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -113,10 +113,11 @@ nouveau_display_scanoutpos_head(struct drm_crtc *crtc, int *vpos, int *hpos, struct drm_vblank_crtc *vblank = >dev->vblank[drm_crtc_index(crtc)]; int retry = 20; bool ret = false; + int method_ret; do { - ret = nvif_mthd(>disp, 0, , sizeof(args)); - if (ret != 0) + int method_ret = nvif_mthd(>disp, 0, , sizeof(args)); + if (method_ret != 0) return false; if (args.scan.vline) { -- 2.13.2 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On 7/12/17 7:19 PM, Mike Galbraith wrote: On Wed, 2017-07-12 at 07:37 -0400, Ilia Mirkin wrote: On Wed, Jul 12, 2017 at 7:25 AM, Mike Galbraithwrote: On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote: On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote: Some display stuff did change for 4.13 for GM20x+ boards. If it's not too much trouble, a bisect would be pretty useful. Bisection seemingly went fine, but the result is odd. e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit But it really really is bad. Looking at gitk fork in the road leading to it... 52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good e4e818cc2d7c drm: make drm_panel.h self-contained - good 9cf8f5802f39 drm: add missing declaration to drm_blend.h - good Before the git highway splits, all is well. The lane with commits works fine at both ends, but e98c58e55f68 is busted. Merge arfifact? Hmmm... that tree does not appear to have gotten a v4.12 backmerge at any point. The last backmerge from Linus as far as I can tell was v4.11-rc7. Could be an interaction with some out-of-tree change. FWIW, checking out the fingered commit then.. git log --oneline 52d9d38c183b..e98c58e55f68|grep nouveau and reverting the lot helped not at all. Checking out 6b7781b42dc9 and reverting the fingered commit did. Given the nouveau bits reverted are mostly the vblank changes, CC to Daniel, maybe he'll know why both GTX 980 and GeForce 8600 GT get all upset. Either I'm damn lucky, both of my nvidia equipped boxen going boom 100% repeatably, or there are a lot of folks out there who haven't yet tried suspend with our latest/greatest kernel. I suspect the later. -Mike I should have had a look at my inbox, would have save me a log of work bisecting. Yet i come to the same conclusion: # first bad commit: [e98c58e55f68f8785aebfab1f8c9a03d8de0afe1] Merge tag 'drm-misc-next-2017-05-16' of git://anongit.freedesktop.org/git/drm-misc into drm-next I suspect it is some vblank change as it shows up in every trace i have seen while bisecting, but that is just a wild guess... Greetings, Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
>On Tue, Jul 11, 2017 at 2:08 PM, Mike Galbraith wrote: >> On Tue, 2017-07-11 at 13:51 -0400, Ilia Mirkin wrote: >>> Some details that may be useful in analysis of the bug: >>> >>> 1. lspci -nn -d 10de: >> >> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1) >> 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1 >> >>> 2. What displays, if any, you have plugged into the NVIDIA board when >>> this happens? >> >> A Philips 273V, via DVI. >> >>> 3. Any boot parameters, esp relating to ACPI, PM, or related? >> >> None for those, what's there that will be unfamiliar to you are for >> patches that aren't applied. >> >> nortsched hpc_cpusets skew_tick=1 ftrace_dump_on_oops audit=0 >> nodelayacct cgroup_disable=memory rtkthreads=1 rtworkqueues=2 panic=60 >> ignore_loglevel crashkernel=256M,high > > OK, thanks. So in other words, a fairly standard desktop with a PCIe > board plugged in. No funny business. (Laptops can create a ton of > additional weirdness, which I assumed you had since you were talking > about STR.) > > My best guess is that gf119_head_vblank_put either has a bogus head id > (should be in the 0..3 range) which causes it to do an out-of-bounds > read on MMIO space, or that the MMIO mapping has already been removed > by the time nouveau_display_suspend runs. Adding Ben Skeggs for > additional insight. > > Some display stuff did change for 4.13 for GM20x+ boards. If it's not > too much trouble, a bisect would be pretty useful. Hey Mike, just to inform you: i have a quite similar bug with no monitor attached while putting my nouveau card to sleep (laptop/optimus system) within nouveau_display_suspend(). I'm going to bisect this, hopefully on the long run this will aid in resolving your issue as well! Greeting, Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:03, Ilia Mirkin wrote: On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> wrote: Previously we'd end up with an unnecessary mov for the thirs immediate value. total instructions in shared programs : 851881 -> 851864 (-0.00%) total gprs used in shared programs: 110295 -> 110295 (0.00%) total local used in shared programs : 1020 -> 1020 (0.00%) localgpr inst bytes helped 0 0 17 17 hurt 0 0 0 0 Suggested-by: Karol Herbst <nouv...@karolherbst.de> Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 9875738..8bb5cf9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue , int s) break; case OP_MAD: if (imm0.isInteger(0)) { + ImmediateValue imm1; i->setSrc(0, i->getSrc(2)); i->src(0).mod = i->src(2).mod; i->setSrc(1, NULL); i->setSrc(2, NULL); - i->op = i->src(0).mod.getOp(); - if (i->op != OP_CVT) -i->src(0).mod = 0; + if (i->src(0).getImmediate(imm1)) { +bld.setPosition(i, false); +newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), + i->dType); +delete_Instruction(prog, i); What's an example of a situation where this helps? It shouldn't matter, the mov's should get cleaned up. [Clearly 17 shaders disagree...] Is this just a side-effect of the fact that we don't run the opts to a fixed point? It is a second mov that causes a problem for later folding in the imm, here output of a testshader[1]: 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x (0) 4: mov u32 %r39 0x3f80 (0) 5: mad f32 %r40 %r37 %r38 %r39 (0) 6: mad f32 %r44 %r37 %r38 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: add f32 %r54 %r31 %r44 (0) 9: add f32 %r57 %r56 %r44 (0) Constantfolding... MAIN:-1 () BB:0 (14 instructions) - df = { } -> BB:1 (tree) 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x (0) 4: mov u32 %r39 0x3f80 (0) 5: mov f32 %r40 %r39 (0) 6: mov f32 %r44 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: mov f32 %r54 %r31 (0) 9: mov f32 %r57 %r56 (0) The outcome: 0: ld u32 $r2 c0[0x0] (8) 1: mov u32 $r0 0x3f80 (8) 2: add ftz f32 $r0 $r2 $r0 (8) 3: mov f32 $r3 $r1 (8) 4: mov u32 $r1 $r2 (8) 5: export b128 # o[0x0] $r0q (8) With patch: 0: ld u32 $r2 c0[0x0] (8) 1: add ftz f32 $r0 $r2 1.00 (8) 2: mov f32 $r3 $r1 (8) 3: mov u32 $r1 $r2 (8) 4: export b128 # o[0x0] $r0q (8) [1]: VERT PROPERTY NEXT_SHADER FRAG DCL OUT[0], GENERIC[0] DCL CONST[0] DCL TEMP[0..1], LOCAL IMM[0] FLT32 {0.0078,-1., 0., 0.5000} IMM[1] FLT32 {1., 0., 65535., 0.0100} 0: MOV TEMP[0].xyz, CONST[0]. 39: MAD TEMP[1], CONST[20]., IMM[1]., IMM[1].xyyy 41: ADD TEMP[1], TEMP[0], TEMP[1] 208: MOV OUT[0], TEMP[1] 211: END + } + else { +i->op = i->src(0).mod.getOp(); +if (i->op != OP_CVT) + i->src(0).mod = 0; + } } else if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && (imm0.isInteger(1) || imm0.isInteger(-1))) { -- 2.10.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
Previously we'd end up with an unnecessary mov for the thirs immediate value. total instructions in shared programs : 851881 -> 851864 (-0.00%) total gprs used in shared programs: 110295 -> 110295 (0.00%) total local used in shared programs : 1020 -> 1020 (0.00%) localgpr inst bytes helped 0 0 17 17 hurt 0 0 0 0 Suggested-by: Karol Herbst <nouv...@karolherbst.de> Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 9875738..8bb5cf9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue , int s) break; case OP_MAD: if (imm0.isInteger(0)) { + ImmediateValue imm1; i->setSrc(0, i->getSrc(2)); i->src(0).mod = i->src(2).mod; i->setSrc(1, NULL); i->setSrc(2, NULL); - i->op = i->src(0).mod.getOp(); - if (i->op != OP_CVT) -i->src(0).mod = 0; + if (i->src(0).getImmediate(imm1)) { +bld.setPosition(i, false); +newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), + i->dType); +delete_Instruction(prog, i); + } + else { +i->op = i->src(0).mod.getOp(); +if (i->op != OP_CVT) + i->src(0).mod = 0; + } } else if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && (imm0.isInteger(1) || imm0.isInteger(-1))) { -- 2.10.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH v2] nv50/ir: constant fold OP_SPLIT
On 30.09.2016 23:57, Ilia Mirkin wrote: On Fri, Sep 30, 2016 at 5:50 PM, Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> wrote: Split the source immediate value into two new values and create OP_MOV instructions the two newly created values. V2: get rid of special cases Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 16 1 file changed, 16 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 9875738..d56b057 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -932,6 +932,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue , int s) Instruction *newi = i; switch (i->op) { + case OP_SPLIT: { + uint8_t size = typeSizeof(i->dType); + DataType type = typeOfSize(size / 2, isFloatType(i->dType), + isSignedType(i->dType)); Er wait, sorry, I might have confused matters here... Why do you need to compute type at all? Why not just reuse i->dType? i->dType comes in the same as i->sType, so we need to evaluate the old type and set the new type accordingly. e.g in my test shader a u64 is split, still i->dType == TYPE_U64. Maybe we are doing something wrong somewhere else, but looking only at this folding, setting the new type is needed (note that originally i->sType was used) + if (likely(type != TYPE_NONE)) { + uint64_t val = imm0.reg.data.u64; + uint16_t shift = size * 8; + bld.setPosition(i, false); + for (int8_t d = 0; i->defExists(d); ++d) { +bld.mkMov(i->getDef(d), bld.mkImm(val & ((1 << shift) - 1)), type); 1ULL +val >>= shift; + } + delete_Instruction(prog, i); + } + } + break; case OP_MUL: if (i->dType == TYPE_F32) tryCollapseChainedMULs(i, s, imm0); -- 2.10.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH v2] nv50/ir: constant fold OP_SPLIT
Split the source immediate value into two new values and create OP_MOV instructions the two newly created values. V2: get rid of special cases Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 16 1 file changed, 16 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 9875738..d56b057 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -932,6 +932,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue , int s) Instruction *newi = i; switch (i->op) { + case OP_SPLIT: { + uint8_t size = typeSizeof(i->dType); + DataType type = typeOfSize(size / 2, isFloatType(i->dType), + isSignedType(i->dType)); + if (likely(type != TYPE_NONE)) { + uint64_t val = imm0.reg.data.u64; + uint16_t shift = size * 8; + bld.setPosition(i, false); + for (int8_t d = 0; i->defExists(d); ++d) { +bld.mkMov(i->getDef(d), bld.mkImm(val & ((1 << shift) - 1)), type); +val >>= shift; + } + delete_Instruction(prog, i); + } + } + break; case OP_MUL: if (i->dType == TYPE_F32) tryCollapseChainedMULs(i, s, imm0); -- 2.10.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nv50/ir: constant fold OP_SPLIT
On 28.09.2016 02:01, Ilia Mirkin wrote: On Tue, Sep 27, 2016 at 7:25 PM, Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> wrote: Split the source immediate value into two new values and create OP_MOV instructions the two newly created values. Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 23 ++ 1 file changed, 23 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 74a5a85..f71 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -920,6 +920,29 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue , int s) Instruction *newi = i; switch (i->op) { + case OP_SPLIT: { + uint16_t shift = 0; + DataType type = TYPE_NONE; + bld.setPosition(i, false); + if (i->sType == TYPE_U64 || i->sType == TYPE_S64) { + shift = 32; + type = (i->sType == TYPE_U64) ? TYPE_U32 : TYPE_S32; + } + if (i->sType == TYPE_U32 || i->sType == TYPE_S32) { + shift = 16; + type = (i->sType == TYPE_U32) ? TYPE_U16 : TYPE_S16; + } + if (i->sType == TYPE_U16 || i->sType == TYPE_S16) { + shift = 8; + type = (i->sType == TYPE_U16) ? TYPE_U8 : TYPE_S8; + } shift = typeSizeOf(i->dType); + if (type != TYPE_NONE) { + bld.mkMov(i->getDef(0), bld.mkImm(imm0.reg.data.u64 >> shift), type); + bld.mkMov(i->getDef(1), bld.mkImm(imm0.reg.data.u64), type); u64 val = ...u64; for (d = 0; i->defExists(d); ++d) { bld.mkMov(i->getDef(d), bld.mkImm(val & ((1 << shift) - 1)); val >>= shift; } I think this will account for every case, and with a lot less special-casing. What do you think? Well with this you'd not set the new type right: bld.mkMov(def, val, >>type<<), where you always would use TYPE_U32. Not sure if that is what we want... other than that that, shorten it like this would be nice! + delete_Instruction(prog, i); + } + } + break; case OP_MUL: if (i->dType == TYPE_F32) tryCollapseChainedMULs(i, s, imm0); -- 2.10.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] nv50/ir: use unordered_set instead of list to keep track of var defs
The set of variable defs does not need to be ordered in any way, and removing/adding elements is a fairly common operation in various optimization passes. This shortens runtime of piglit test fp-long-alu to ~11s from ~22s No piglit regressions observed on nvc0! Signed-off-by: Tobias Klausmann <tobias.johannes.klausm...@mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir.cpp| 4 ++-- src/gallium/drivers/nouveau/codegen/nv50_ir.h | 7 +++--- .../drivers/nouveau/codegen/nv50_ir_inlines.h | 28 +- .../nouveau/codegen/nv50_ir_lowering_nv50.cpp | 4 ++-- .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 6 ++--- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 4 ++-- src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 17 +++-- 7 files changed, 38 insertions(+), 32 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp index cce6055..745cdc9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp @@ -154,9 +154,9 @@ ValueDef::set(Value *defVal) if (value == defVal) return; if (value) - value->defs.remove(this); + value->defs.erase(this); if (defVal) - defVal->defs.push_back(this); + defVal->defs.insert(this); value = defVal; } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index ba1b085..deeabff 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -570,6 +570,7 @@ public: inline Value *rep() const { return join; } + inline Instruction *getUniqueInsnMerged() const; inline Instruction *getUniqueInsn() const; inline Instruction *getInsn() const; // use when uniqueness is certain @@ -586,11 +587,11 @@ public: static inline Value *get(Iterator&); unordered_set uses; - std::list defs; + unordered_set defs; typedef unordered_set::iterator UseIterator; typedef unordered_set::const_iterator UseCIterator; - typedef std::list::iterator DefIterator; - typedef std::list::const_iterator DefCIterator; + typedef unordered_set::iterator DefIterator; + typedef unordered_set::const_iterator DefCIterator; int id; Storage reg; diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h index e465f24..8c8e54c 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h @@ -205,21 +205,26 @@ const LValue *ValueDef::preSSA() const Instruction *Value::getInsn() const { - return defs.empty() ? NULL : defs.front()->getInsn(); + return defs.empty() ? NULL : (*defs.begin())->getInsn(); } -Instruction *Value::getUniqueInsn() const +Instruction *Value::getUniqueInsnMerged() const { if (defs.empty()) return NULL; + /* It is not guaranteed that this is the first in the set, lets find it */ + for (DefCIterator it = defs.begin(); it != defs.end(); ++it) + if ((*it)->get() == this) + return (*it)->getInsn(); + /* We should never hit this assert */ + assert(0); + return NULL; +} - // after regalloc, the definitions of coalesced values are linked - if (join != this) { - for (DefCIterator it = defs.begin(); it != defs.end(); ++it) - if ((*it)->get() == this) -return (*it)->getInsn(); - // should be unreachable and trigger assertion at the end - } +Instruction *Value::getUniqueInsn() const +{ + if (defs.empty()) + return NULL; #ifdef DEBUG if (reg.data.id < 0) { int n = 0; @@ -230,8 +235,9 @@ Instruction *Value::getUniqueInsn() const WARN("value %%%i not uniquely defined\n", id); // return NULL ? } #endif - assert(defs.front()->get() == this); - return defs.front()->getInsn(); + ValueDef *def = *defs.begin(); + assert(def->get() == this); + return def->getInsn(); } inline bool Instruction::constrainedDefs() const diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp index bea293b..9d1244d 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp @@ -211,7 +211,7 @@ NV50LegalizePostRA::visit(Function *fn) if (outWrites) { for (std::list::iterator it = outWrites->begin(); it != outWrites->end(); ++it) - (*it)->getSrc(1)->defs.front()->getInsn()->setDef(0, (*it)->getSrc(0)); + (*(*it)->getSrc(1)->defs.begin())->getInsn()->setDef(0, (*it)->getSrc(0)); // instructions will be deleted on exit outWrites->clear(); } @@ -343,7 +343,7 @@ NV50Lega
Re: [Nouveau] [PATCH] nv50: avoid using inline vertex data submit when gl_VertexID is used
On 24.08.2015 17:51, Ilia Mirkin wrote: The hardware only generates vertexid when vertices come from a VBO. This fixes: vertexid-drawelements vertexid-drawarrays Signed-off-by: Ilia Mirkin imir...@alum.mit.edu Cc: 11.0 mesa-sta...@lists.freedesktop.org --- src/gallium/drivers/nouveau/nv50/nv50_program.c| 1 + src/gallium/drivers/nouveau/nv50/nv50_program.h| 1 + src/gallium/drivers/nouveau/nv50/nv50_state_validate.c | 3 ++- src/gallium/drivers/nouveau/nv50/nv50_vbo.c| 8 4 files changed, 12 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c index 02dc367..eff4477 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_program.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c @@ -66,6 +66,7 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info) case TGSI_SEMANTIC_VERTEXID: prog-vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID; prog-vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID_DRAW_ARRAYS_ADD_START; + prog-vp.vertexid = 1; continue; default: break; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h b/src/gallium/drivers/nouveau/nv50/nv50_program.h index 5d3ff56..f4e8e94 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_program.h +++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h @@ -76,6 +76,7 @@ struct nv50_program { ubyte psiz;/* output slot of point size */ ubyte bfc[2]; /* indices into varying for FFC (FP) or BFC (VP) */ ubyte edgeflag; + ubyte vertexid; ubyte clpd[2]; /* output slot of clip distance[i]'s 1st component */ ubyte clpd_nr; } vp; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c index b304a17..66dcf43 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c @@ -503,7 +503,8 @@ static struct state_validate { { nv50_validate_samplers, NV50_NEW_SAMPLERS }, { nv50_stream_output_validate, NV50_NEW_STRMOUT | NV50_NEW_VERTPROG | NV50_NEW_GMTYPROG }, -{ nv50_vertex_arrays_validate, NV50_NEW_VERTEX | NV50_NEW_ARRAYS }, +{ nv50_vertex_arrays_validate, NV50_NEW_VERTEX | NV50_NEW_ARRAYS | + NV50_NEW_VERTPROG }, { nv50_validate_min_samples, NV50_NEW_MIN_SAMPLES }, }; #define validate_list_len (sizeof(validate_list) / sizeof(validate_list[0])) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c index 600b973..fb4305f 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c @@ -301,6 +301,14 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50) unsigned i; const unsigned n = MAX2(vertex-num_elements, nv50-state.num_vtxelts); + /* A vertexid is not generated for inline data uploads. Have to use a +* VBO. This check must come after the vertprog has been validated, +* otherwise vertexid may be unset. +*/ + assert(nv50-vertprog-translated); + if (nv50-vertprog-vp.vertexid) + nv50-vbo_push_hint = 0; + if (unlikely(vertex-need_conversion)) nv50-vbo_fifo = ~0; else LGTM! ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] glsl: Extend lowering pass for gl_ClipDistance to support other arrays
This will come in handy when we want to lower gl_CullDistance into gl_CullDistanceMESA. Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/glsl/Makefile.sources| 2 +- src/glsl/ir_optimization.h | 1 + src/glsl/lower_clip_distance.cpp | 574 src/glsl/lower_distance.cpp | 606 +++ 4 files changed, 608 insertions(+), 575 deletions(-) delete mode 100644 src/glsl/lower_clip_distance.cpp create mode 100644 src/glsl/lower_distance.cpp diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index 0b77244..00ba480 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -143,7 +143,7 @@ LIBGLSL_FILES = \ loop_analysis.h \ loop_controls.cpp \ loop_unroll.cpp \ - lower_clip_distance.cpp \ + lower_distance.cpp \ lower_const_arrays_to_uniforms.cpp \ lower_discard.cpp \ lower_discard_flow.cpp \ diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h index eef107e..fe62e74 100644 --- a/src/glsl/ir_optimization.h +++ b/src/glsl/ir_optimization.h @@ -120,6 +120,7 @@ bool lower_variable_index_to_cond_assign(gl_shader_stage stage, bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz); bool lower_const_arrays_to_uniforms(exec_list *instructions); bool lower_clip_distance(gl_shader *shader); +bool lower_cull_distance(gl_shader *shader); void lower_output_reads(unsigned stage, exec_list *instructions); bool lower_packing_builtins(exec_list *instructions, int op_mask); void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions); diff --git a/src/glsl/lower_clip_distance.cpp b/src/glsl/lower_clip_distance.cpp deleted file mode 100644 index 1ada215..000 --- a/src/glsl/lower_clip_distance.cpp +++ /dev/null @@ -1,574 +0,0 @@ -/* - * Copyright © 2011 Intel Corporation - * - * Permission is hereby granted, free of charge, to any person obtaining a - * copy of this software and associated documentation files (the Software), - * to deal in the Software without restriction, including without limitation - * the rights to use, copy, modify, merge, publish, distribute, sublicense, - * and/or sell copies of the Software, and to permit persons to whom the - * Software is furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice (including the next - * paragraph) shall be included in all copies or substantial portions of the - * Software. - * - * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER - * DEALINGS IN THE SOFTWARE. - */ - -/** - * \file lower_clip_distance.cpp - * - * This pass accounts for the difference between the way - * gl_ClipDistance is declared in standard GLSL (as an array of - * floats), and the way it is frequently implemented in hardware (as - * a pair of vec4s, with four clip distances packed into each). - * - * The declaration of gl_ClipDistance is replaced with a declaration - * of gl_ClipDistanceMESA, and any references to gl_ClipDistance are - * translated to refer to gl_ClipDistanceMESA with the appropriate - * swizzling of array indices. For instance: - * - * gl_ClipDistance[i] - * - * is translated into: - * - * gl_ClipDistanceMESA[i2][i3] - * - * Since some hardware may not internally represent gl_ClipDistance as a pair - * of vec4's, this lowering pass is optional. To enable it, set the - * LowerClipDistance flag in gl_shader_compiler_options to true. - */ - -#include glsl_symbol_table.h -#include ir_rvalue_visitor.h -#include ir.h -#include program/prog_instruction.h /* For WRITEMASK_* */ - -namespace { - -class lower_clip_distance_visitor : public ir_rvalue_visitor { -public: - explicit lower_clip_distance_visitor(gl_shader_stage shader_stage) - : progress(false), old_clip_distance_out_var(NULL), -old_clip_distance_in_var(NULL), new_clip_distance_out_var(NULL), -new_clip_distance_in_var(NULL), shader_stage(shader_stage) - { - } - - virtual ir_visitor_status visit(ir_variable *); - void create_indices(ir_rvalue*, ir_rvalue *, ir_rvalue *); - bool is_clip_distance_vec8(ir_rvalue *ir); - ir_rvalue *lower_clip_distance_vec8(ir_rvalue *ir); - virtual ir_visitor_status visit_leave(ir_assignment *); - void visit_new_assignment(ir_assignment *ir); - virtual ir_visitor_status visit_leave(ir_call *); - - virtual void handle_rvalue(ir_rvalue **rvalue); - - void fix_lhs(ir_assignment *); - - bool progress; - - /** -* Pointer
Re: [Nouveau] [PATCH] avoid build fail without COMPOSITE
Lgtm! You can add my R-b if you want! On 14.07.2015 23:17, Ilia Mirkin wrote: --- src/nouveau_dri2.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/src/nouveau_dri2.c b/src/nouveau_dri2.c index f22e319..4398559 100644 --- a/src/nouveau_dri2.c +++ b/src/nouveau_dri2.c @@ -142,6 +142,7 @@ nouveau_dri2_copy_region2(ScreenPtr pScreen, DrawablePtr pDraw, RegionPtr pRegio NVPtr pNv = NVPTR(xf86ScreenToScrn(pScreen)); RegionPtr pCopyClip; GCPtr pGC; + PixmapPtr pPix; DrawablePtr src_draw, dst_draw; Bool translate = FALSE; int off_x = 0, off_y = 0; @@ -170,9 +171,13 @@ nouveau_dri2_copy_region2(ScreenPtr pScreen, DrawablePtr pDraw, RegionPtr pRegio } if (translate pDraw-type == DRAWABLE_WINDOW) { - PixmapPtr pPix = get_drawable_pixmap(pDraw); - off_x = pDraw-x - pPix-screen_x; - off_y = pDraw-y - pPix-screen_y; + off_x = pDraw-x; + off_y = pDraw-y; +#ifdef COMPOSITE + pPix = get_drawable_pixmap(pDraw); + off_x -= pPix-screen_x; + off_y -= pPix-screen_y; +#endif } pGC = GetScratchGC(pDraw-depth, pScreen); @@ -194,8 +199,8 @@ nouveau_dri2_copy_region2(ScreenPtr pScreen, DrawablePtr pDraw, RegionPtr pRegio if (extents-x1 == 0 extents-y1 == 0 extents-x2 == pDraw-width extents-y2 == pDraw-height) { - PixmapPtr fpix = get_drawable_pixmap(dst_draw); - struct nouveau_bo *bo = nouveau_pixmap_bo(fpix); + pPix = get_drawable_pixmap(dst_draw); + struct nouveau_bo *bo = nouveau_pixmap_bo(pPix); if (bo) nouveau_bo_wait(bo, NOUVEAU_BO_RD, pNv-client); } ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH 1/2] nouveau/compiler: fix trivial compiler warnings
nouveau_compiler.c: In function ‘main’: nouveau_compiler.c:216:27: warning: ‘code’ may be used uninitialized in this function [-Wmaybe-uninitialized] printf(%08x , code[i / 4]); ^ nouveau_compiler.c:215:4: warning: ‘size’ may be used uninitialized in this function [-Wmaybe-uninitialized] for (i = 0; i size; i += 4) { Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/gallium/drivers/nouveau/nouveau_compiler.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nouveau_compiler.c b/src/gallium/drivers/nouveau/nouveau_compiler.c index 8660498..ca128b5 100644 --- a/src/gallium/drivers/nouveau/nouveau_compiler.c +++ b/src/gallium/drivers/nouveau/nouveau_compiler.c @@ -144,7 +144,7 @@ main(int argc, char *argv[]) const char *filename = NULL; FILE *f; char text[65536] = {0}; - unsigned size, *code; + unsigned size = 0, *code = NULL; for (i = 1; i argc; i++) { if (!strcmp(argv[i], -a)) -- 2.4.5 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH 2/2] nv50/ir: fix a compiler warning with debug-only code
codegen/nv50_ir_emit_nv50.cpp: In member function ‘void nv50_ir::CodeEmitterNV50::emitLOAD(const nv50_ir::Instruction*)’: codegen/nv50_ir_emit_nv50.cpp:620:12: warning: unused variable ‘offset’ [-Wunused-variable] int32_t offset = i-getSrc(0)-reg.data.offset; Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp index 67ea6df..86b16f2 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp @@ -616,8 +616,11 @@ CodeEmitterNV50::emitLoadStoreSizeCS(DataType ty) void CodeEmitterNV50::emitLOAD(const Instruction *i) { - DataFile sf = i-src(0).getFile(); +#ifdef DEBUG int32_t offset = i-getSrc(0)-reg.data.offset; +#endif + + DataFile sf = i-src(0).getFile(); switch (sf) { case FILE_SHADER_INPUT: -- 2.4.5 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 1/2] nouveau/compiler: fix trivial compiler warnings
On 08.07.2015 21:42, Emil Velikov wrote: On 8 July 2015 at 20:34, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: Mh i'm not aware of me ever changed the nouveau_compiler. But i'm happy to see this made you laugh, so it has something positive at least... :/ Story time: This particular compiler warning has been brought up (incl here) four or five times. Each time, Ilia feels reluctant about the fix as the (gcc) compiler gets it wrong. Personally I do not see a problem with explicitly initialising the variable at this instance, yet I'm curious for how long Ilia will say no to this (type of) patch(es) :-P No offence, I just find it funny. Emil Oh i did even answer in a thread for a patch from Martin where he propose the same change (even with the same prefix :D). Ilia maybe you should take this after all, as it seems you are haunted by this :P ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance
On 25.05.2015 17:07, Ilia Mirkin wrote: On Mon, May 25, 2015 at 9:40 AM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 25.05.2015 07:17, Dave Airlie wrote: On 25 May 2015 at 08:11, Marek Olšák mar...@gmail.com wrote: It's the same on Radeon. There are 2x ClipOrCullDistance output vectors and a mask saying it should clip or cull or do nothing. Marek My thinking was gallium should have a single semantic and a mask in the shader definition maybe. though it doesn't solve the does nvidia do the right thing with cull[0] and clip[0], and what is the right thing. Dave. I'm still convinced that both clip[0] and cull[0] should be possible. Plus i have written a shader_test for this a while ago which you pushed to piglit (fs-cull-and-clip-distance-different.shader_test). If i remember right nvidia passed that test just fine. My take (and note that I last read the extension many months ago) is that you're supposed to figure out the max gl_ClipDistance[] written, and then write all your cull distances above that. So if you, e.g., have something like gl_ClipDistance[5] = 1; gl_CullDistance[0] = 1; Then it would decide that there are 6 clip distances (or if there's an explicit out float gl_ClipDistance[n], then use that), and 1 cull distance. In the TGSI, I'm thinking this might look approximately like PROPERTY CULL_MASK (16) DCL OUT[0], CLIPDIST[0] DCL OUT[1], CLIPDIST[1] MOV OUT[1].y, 1 (clip distance[5]) MOV OUT[1].z, 1 (cull distance[0]) Then basically you'd have (rast-clip_enable shader-actual_clip_writes_mask) | cull_mask = the enabled distances cull_mask = cull mask This would work *very* well for nouveau, not sure how suitable it is for other hardware. Cheers, -ilia I wonder where this step should be implemented after all. It was brought up that llvmpipe already supports cull_distance (it does!), so maybe we should implement this in the drivers to evade llvmpipe breakage. Any suggestions appreciated :) Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 2/2] nv50/ir: fix a compiler warning with debug-only code
On 08.07.2015 21:34, Emil Velikov wrote: On 8 July 2015 at 19:27, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: codegen/nv50_ir_emit_nv50.cpp: In member function ‘void nv50_ir::CodeEmitterNV50::emitLOAD(const nv50_ir::Instruction*)’: codegen/nv50_ir_emit_nv50.cpp:620:12: warning: unused variable ‘offset’ [-Wunused-variable] int32_t offset = i-getSrc(0)-reg.data.offset; Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp index 67ea6df..86b16f2 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp @@ -616,8 +616,11 @@ CodeEmitterNV50::emitLoadStoreSizeCS(DataType ty) void CodeEmitterNV50::emitLOAD(const Instruction *i) { - DataFile sf = i-src(0).getFile(); +#ifdef DEBUG int32_t offset = i-getSrc(0)-reg.data.offset; +#endif + assert is (normally) guarded by NDEBUG. Mesa/gallium has an in-house replacement, which (not 100% sure) should be fine as well. -Emil As far as i can see it in u_debug.h assert (debug_assert) is guarded by DEBUG as the above change... ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 1/2] nouveau/compiler: fix trivial compiler warnings
On 08.07.2015 20:38, Ilia Mirkin wrote: Compiler is wrong. So just nouveau: ... then? Anyway, change it to your liking. On Wed, Jul 8, 2015 at 2:27 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: nouveau_compiler.c: In function ‘main’: nouveau_compiler.c:216:27: warning: ‘code’ may be used uninitialized in this function [-Wmaybe-uninitialized] printf(%08x , code[i / 4]); ^ nouveau_compiler.c:215:4: warning: ‘size’ may be used uninitialized in this function [-Wmaybe-uninitialized] for (i = 0; i size; i += 4) { Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/gallium/drivers/nouveau/nouveau_compiler.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nouveau_compiler.c b/src/gallium/drivers/nouveau/nouveau_compiler.c index 8660498..ca128b5 100644 --- a/src/gallium/drivers/nouveau/nouveau_compiler.c +++ b/src/gallium/drivers/nouveau/nouveau_compiler.c @@ -144,7 +144,7 @@ main(int argc, char *argv[]) const char *filename = NULL; FILE *f; char text[65536] = {0}; - unsigned size, *code; + unsigned size = 0, *code = NULL; for (i = 1; i argc; i++) { if (!strcmp(argv[i], -a)) -- 2.4.5 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance
On 27.05.2015 18:28, Marek Olšák wrote: Another thing to consider is linking shaders that occur before the rasterizer (e.g. any two shaders from VS-TCS-TES-GS). The maximum number of written distances is still 8, but what happens if VS writes 1x clip and 7x cull and GS reads 8x clip and no cull? i think this should be rejected anyway (in the glsl?!), constraining vs output to be the same as gs input where the last definition is the valid one In this case it's basically a generic varying. How is linking separate shaders supposed to work with one combined clip-or-cull array? It doesn't seem to be possible. Marek On Mon, May 25, 2015 at 5:07 PM, Ilia Mirkin imir...@alum.mit.edu wrote: On Mon, May 25, 2015 at 9:40 AM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 25.05.2015 07:17, Dave Airlie wrote: On 25 May 2015 at 08:11, Marek Olšák mar...@gmail.com wrote: It's the same on Radeon. There are 2x ClipOrCullDistance output vectors and a mask saying it should clip or cull or do nothing. Marek My thinking was gallium should have a single semantic and a mask in the shader definition maybe. though it doesn't solve the does nvidia do the right thing with cull[0] and clip[0], and what is the right thing. Dave. I'm still convinced that both clip[0] and cull[0] should be possible. Plus i have written a shader_test for this a while ago which you pushed to piglit (fs-cull-and-clip-distance-different.shader_test). If i remember right nvidia passed that test just fine. Ah btw, if we follow Brian Paul, overlapping indexes are fine! (and it is way more intuitive to use for a shader developer) My take (and note that I last read the extension many months ago) is that you're supposed to figure out the max gl_ClipDistance[] written, and then write all your cull distances above that. So if you, e.g., have something like gl_ClipDistance[5] = 1; gl_CullDistance[0] = 1; Then it would decide that there are 6 clip distances (or if there's an explicit out float gl_ClipDistance[n], then use that), and 1 cull distance. In the TGSI, I'm thinking this might look approximately like PROPERTY CULL_MASK (16) DCL OUT[0], CLIPDIST[0] DCL OUT[1], CLIPDIST[1] MOV OUT[1].y, 1 (clip distance[5]) MOV OUT[1].z, 1 (cull distance[0]) Then basically you'd have (rast-clip_enable shader-actual_clip_writes_mask) | cull_mask = the enabled distances cull_mask = cull mask This would work *very* well for nouveau, not sure how suitable it is for other hardware. Cheers, -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Self introduction Hans de Goede
On 26.05.2015 09:43, Hans de Goede wrote: Hi, On 26-05-15 09:41, Martin Peres wrote: On 26/05/15 10:29, Hans de Goede wrote: Hi All, Since I will be working on nouveau pretty much starting today I thought it would be good to write a quick self introduction. I'm an FOSS enthusiast / developer since 1996. I've written (and still maintain) various hwmon drivers, various USB webcam drivers, libv4l and libgphoto camlibs for photoframes which use a custom usb protocol and usb redirection over the network for qemu. And although not the original author I'm now a days a maintainer of libusb, usb-3 bulk streams support in the xhci driver, the uas driver, and libahci_platform. In my spare time I work on u-boot and Linux support for Allwinner ARM SoCs, I'm a u-boot maintainer for these, and maintain the sata, mmc and usb kernel drivers. About a year ago I joined the Red Hat graphics team, where my first task was to help to get libinput up to a level where it could be used as the input stack for Wayland. The work on this is winding down, so my next project is to help out with nouveau and here I am. My knowledge of GPU-s and basically anything 3d is quite limited atm, so I will likely be asking a lot of questions to get up to speed. I'll also join the #nouveau channel on irc, so I'll see you all there. Regards, Hans Hey Hans. That's very good news! Welcome to the project! Do you know if you are going to help mostly on the kernel or mesa side? Where-ever help is needed is the best way I can describe the plan I guess. Step 1 is getting up to speed, I'll likely focus on Mesa first for that. Regards, Hans We sure need help! Welcome to the project from my side as well! Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 2/2] nv30/draw: switch varying hookup logic to know about texcoords
On 26.05.2015 02:59, Ilia Mirkin wrote: On Mon, May 25, 2015 at 8:55 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 26.05.2015 02:49, Ilia Mirkin wrote: On Mon, May 25, 2015 at 8:37 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 25.05.2015 21:29, Ilia Mirkin wrote: Commit 8acaf862dfe switched things over to use TEXCOORD instead of GENERIC, but did not update the nv30 swtnl draw paths. This teaches the draw logic about TEXCOORD. Among other things, this fixes a crash in demos/arbocclude when using swtnl. Curiously enough, the point-sprite piglit works without this. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu Cc: 10.5 10.6 mesa-sta...@lists.freedesktop.org --- src/gallium/drivers/nouveau/nv30/nv30_draw.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv30/nv30_draw.c b/src/gallium/drivers/nouveau/nv30/nv30_draw.c index a681135..03c0c70 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_draw.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_draw.c @@ -230,22 +230,24 @@ static const struct { [TGSI_SEMANTIC_BCOLOR ] = { EMIT_4F, INTERP_LINEAR , 1, 3, 0x0004 }, [TGSI_SEMANTIC_FOG ] = { EMIT_4F, INTERP_PERSPECTIVE, 5, 5, 0x0010 }, [TGSI_SEMANTIC_PSIZE ] = { EMIT_1F_PSIZE, INTERP_POS , 6, 6, 0x0020 }, - [TGSI_SEMANTIC_GENERIC ] = { EMIT_4F, INTERP_PERSPECTIVE, 8, 7, 0x4000 } + [TGSI_SEMANTIC_TEXCOORD] = { EMIT_4F, INTERP_PERSPECTIVE, 8, 7, 0x4000 }, }; static boolean vroute_add(struct nv30_render *r, uint attrib, uint sem, uint *idx) { - struct pipe_screen *pscreen = r-nv30-screen-base.base; + struct nv30_screen *screen = r-nv30-screen; struct nv30_fragprog *fp = r-nv30-fragprog.program; struct vertex_info *vinfo = r-vertex_info; enum pipe_format format; uint emit = EMIT_OMIT; uint result = *idx; - if (sem == TGSI_SEMANTIC_GENERIC result = 8) { - for (result = 0; result 8; result++) { - if (fp-texcoord[result] == *idx) { + if (sem == TGSI_SEMANTIC_GENERIC) { + uint num_texcoords = (screen-eng3d-oclass NV40_3D_CLASS) ? 8 : 10; + for (result = 0; result num_texcoords; result++) { + if (fp-texcoord[result] == *idx + 8) { maybe i'm too tired, but why exactly *idx + 8 ? See nvfx_fragprog.c: fpc-fp-texcoord[hw] = fdec-Semantic.Index + 8; when the semantic is GENERIC. (and 0xfffe when it's PCOORD). This is because there can be up to 8 TEXCOORD's. yet you run for 8 or 10 texcoords. Wont this cause problems on nv40+? this is just the handle... it could just as well be + 1000. As long as it's 8, since that's what gets stored for the TEXCOORD semantics. oh right :) +sem = TGSI_SEMANTIC_TEXCOORD; emit = vroute[sem].emit; break; } @@ -260,11 +262,11 @@ vroute_add(struct nv30_render *r, uint attrib, uint sem, uint *idx) draw_emit_vertex_attr(vinfo, emit, vroute[sem].interp, attrib); format = draw_translate_vinfo_format(emit); - r-vtxfmt[attrib] = nv30_vtxfmt(pscreen, format)-hw; + r-vtxfmt[attrib] = nv30_vtxfmt(screen-base.base, format)-hw; r-vtxptr[attrib] = vinfo-size | NV30_3D_VTXBUF_DMA1; vinfo-size += draw_translate_vinfo_size(emit); - if (nv30_screen(pscreen)-eng3d-oclass NV40_3D_CLASS) { + if (screen-eng3d-oclass NV40_3D_CLASS) { r-vtxprog[attrib][0] = 0x001f38d8; r-vtxprog[attrib][1] = 0x0080001b | (attrib 9); r-vtxprog[attrib][2] = 0x0836106c; @@ -276,7 +278,12 @@ vroute_add(struct nv30_render *r, uint attrib, uint sem, uint *idx) r-vtxprog[attrib][3] = 0x6041ff80 | (result + vroute[sem].vp40) 2; } - *idx = vroute[sem].ow40 result; + if (result 8) + *idx = vroute[sem].ow40 result; + else { + assert(sem == TGSI_SEMANTIC_TEXCOORD); + *idx = 0x1000 (result - 8); + } return TRUE; } @@ -330,7 +337,7 @@ nv30_render_validate(struct nv30_context *nv30) while (pntc attrib 16) { uint index = ffs(pntc) - 1; pntc = ~(1 index); - if (vroute_add(r, attrib, TGSI_SEMANTIC_GENERIC, index)) { + if (vroute_add(r, attrib, TGSI_SEMANTIC_TEXCOORD, index)) { vp_attribs |= (1 attrib++); vp_results |= index; } ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance
On 25.05.2015 07:17, Dave Airlie wrote: On 25 May 2015 at 08:11, Marek Olšák mar...@gmail.com wrote: It's the same on Radeon. There are 2x ClipOrCullDistance output vectors and a mask saying it should clip or cull or do nothing. Marek My thinking was gallium should have a single semantic and a mask in the shader definition maybe. though it doesn't solve the does nvidia do the right thing with cull[0] and clip[0], and what is the right thing. Dave. I'm still convinced that both clip[0] and cull[0] should be possible. Plus i have written a shader_test for this a while ago which you pushed to piglit (fs-cull-and-clip-distance-different.shader_test). If i remember right nvidia passed that test just fine. ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] docs: Mark ARB_cull_distance as in progress
Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- I'm already getting emails wanting me to do this, so just mark it, wont change anything really docs/GL3.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 44a824b..8e1c8cd 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -190,7 +190,7 @@ GL 4.5, GLSL 4.50: GL_ARB_ES3_1_compatibility not started GL_ARB_clip_control DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe) GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, llvmpipe, softpipe) - GL_ARB_cull_distance not started + GL_ARB_cull_distance in progress (Tobias) GL_ARB_derivative_controlDONE (i965, nv50, nvc0, r600) GL_ARB_direct_state_access DONE (all drivers) - Transform Feedback object DONE -- 2.4.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [Mesa-dev] [PATCH 09/11] gallium: add support for arb_cull_distance
On 25.05.2015 20:36, Roland Scheidegger wrote: This doesn't really do what the commit message claims - it just adds a cap bit, not actual support for arb_cull_distance (which was already there), so the log should be changed accordingly. Yep, you are completely right here, will change it to better represent what was done here. Apart from what was already mentioned (that is, the cap bit isn't used by st/mesa which instead mistakenly uses glsl13), will change it for v2 it would be nice if you could test softpipe/llvmpipe with this enabled, as it should already work (at least for llvmpipe, I'm not entirely sure for softpipe) if not there's some unaccounted difference somewhere how we'd thought of how this should work for dx10 vs. opengl). Hey nice to know llvmpipe already supports this, i'll test it! Tobias Roland Am 24.05.2015 um 19:58 schrieb Tobias Klausmann: Add another pipe cap so we can savely enable of disable this extension Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/gallium/auxiliary/cso_cache/cso_context.c| 3 +++ src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 2 ++ src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 2 ++ src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + 15 files changed, 19 insertions(+) diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c b/src/gallium/auxiliary/cso_cache/cso_context.c index 744b00c..7612b43 100644 --- a/src/gallium/auxiliary/cso_cache/cso_context.c +++ b/src/gallium/auxiliary/cso_cache/cso_context.c @@ -119,6 +119,9 @@ struct cso_context { struct pipe_clip_state clip; struct pipe_clip_state clip_saved; + struct pipe_clip_state cull; + struct pipe_clip_state cull_saved; + struct pipe_framebuffer_state fb, fb_saved; struct pipe_viewport_state vp, vp_saved; struct pipe_blend_color blend_color; diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index c596d03..986a942 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -221,6 +221,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: + case PIPE_CAP_CULL_DISTANCE: return 0; case PIPE_CAP_MAX_VIEWPORTS: diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c index 03fecd1..678347d 100644 --- a/src/gallium/drivers/i915/i915_screen.c +++ b/src/gallium/drivers/i915/i915_screen.c @@ -242,6 +242,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap) case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: + case PIPE_CAP_CULL_DISTANCE: return 0; case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS: diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c index b0fed73..f92d5de 100644 --- a/src/gallium/drivers/ilo/ilo_screen.c +++ b/src/gallium/drivers/ilo/ilo_screen.c @@ -459,6 +459,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: + case PIPE_CAP_CULL_DISTANCE: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c index 09ac9af..c90c405 100644 --- a/src/gallium/drivers/llvmpipe/lp_screen.c +++ b/src/gallium/drivers/llvmpipe/lp_screen.c @@ -293,6 +293,8 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: return 0; + case PIPE_CAP_CULL_DISTANCE: + return 0; } /* should only get here on unhandled cases */ debug_printf(Unexpected PIPE_CAP %d query\n, param); diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c index bb79ccc..fc33ddf 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c @@ -162,6 +162,7
Re: [Nouveau] [PATCH 1/2] nv30/draw: rework some of the output vertex buffer logic
On 25.05.2015 21:29, Ilia Mirkin wrote: This makes the vertex buffer go to GART, not VRAM, and redoes the mapping to not use the UNSYNCHRONIZED access (which is meaningless on a VRAM buffer anyways). While we're at it, add some flushes for VBO data. Moving the vertex buffer from VRAM to GART makes glxgears work fully with NV30_SWTNL=1. The other changes just seem like a good idea. I'm not sure *why* moving the buffer from VRAM makes it work... perhaps something doesn't get flushed in time? However this is a single use by the GPU buffer, so STREAM seems like the correct usage semantic for it. i'm not really happy moving things to gart and don't see why this resolves the issue but granted if it works out :-) Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de Signed-off-by: Ilia Mirkin imir...@alum.mit.edu Cc: 10.5 10.6 mesa-sta...@lists.freedesktop.org --- src/gallium/drivers/nouveau/nv30/nv30_draw.c | 30 +--- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv30/nv30_draw.c b/src/gallium/drivers/nouveau/nv30/nv30_draw.c index 6a0d06f..a681135 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_draw.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_draw.c @@ -71,12 +71,12 @@ nv30_render_allocate_vertices(struct vbuf_render *render, struct nv30_render *r = nv30_render(render); struct nv30_context *nv30 = r-nv30; - r-length = vertex_size * nr_vertices; + r-length = (uint32_t)vertex_size * (uint32_t)nr_vertices; if (r-offset + r-length = render-max_vertex_buffer_bytes) { pipe_resource_reference(r-buffer, NULL); r-buffer = pipe_buffer_create(nv30-screen-base.base, - PIPE_BIND_VERTEX_BUFFER, 0, + PIPE_BIND_VERTEX_BUFFER, PIPE_USAGE_STREAM, render-max_vertex_buffer_bytes); if (!r-buffer) return FALSE; @@ -91,10 +91,14 @@ static void * nv30_render_map_vertices(struct vbuf_render *render) { struct nv30_render *r = nv30_render(render); - char *map = pipe_buffer_map(r-nv30-base.pipe, r-buffer, - PIPE_TRANSFER_WRITE | - PIPE_TRANSFER_UNSYNCHRONIZED, r-transfer); - return map + r-offset; + char *map = pipe_buffer_map_range( + r-nv30-base.pipe, r-buffer, + r-offset, r-length, + PIPE_TRANSFER_WRITE | + PIPE_TRANSFER_DISCARD_RANGE, + r-transfer); + assert(map); + return map; } static void @@ -127,12 +131,18 @@ nv30_render_draw_elements(struct vbuf_render *render, for (i = 0; i r-vertex_info.num_attribs; i++) { PUSH_RESRC(push, NV30_3D(VTXBUF(i)), BUFCTX_VTXTMP, nv04_resource(r-buffer), r-offset + r-vtxptr[i], - NOUVEAU_BO_LOW | NOUVEAU_BO_RD, 0, 0); + NOUVEAU_BO_LOW | NOUVEAU_BO_RD, 0, NV30_3D_VTXBUF_DMA1); } if (!nv30_state_validate(nv30, ~0, FALSE)) return; + if (nv30-base.vbo_dirty) { + BEGIN_NV04(push, NV30_3D(VTX_CACHE_INVALIDATE_1710), 1); + PUSH_DATA (push, 0); + nv30-base.vbo_dirty = FALSE; + } + BEGIN_NV04(push, NV30_3D(VERTEX_BEGIN_END), 1); PUSH_DATA (push, r-prim); @@ -178,6 +188,12 @@ nv30_render_draw_arrays(struct vbuf_render *render, unsigned start, uint nr) if (!nv30_state_validate(nv30, ~0, FALSE)) return; + if (nv30-base.vbo_dirty) { + BEGIN_NV04(push, NV30_3D(VTX_CACHE_INVALIDATE_1710), 1); + PUSH_DATA (push, 0); + nv30-base.vbo_dirty = FALSE; + } + BEGIN_NV04(push, NV30_3D(VERTEX_BEGIN_END), 1); PUSH_DATA (push, r-prim); ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 2/2] nv30/draw: switch varying hookup logic to know about texcoords
On 26.05.2015 02:49, Ilia Mirkin wrote: On Mon, May 25, 2015 at 8:37 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 25.05.2015 21:29, Ilia Mirkin wrote: Commit 8acaf862dfe switched things over to use TEXCOORD instead of GENERIC, but did not update the nv30 swtnl draw paths. This teaches the draw logic about TEXCOORD. Among other things, this fixes a crash in demos/arbocclude when using swtnl. Curiously enough, the point-sprite piglit works without this. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu Cc: 10.5 10.6 mesa-sta...@lists.freedesktop.org --- src/gallium/drivers/nouveau/nv30/nv30_draw.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv30/nv30_draw.c b/src/gallium/drivers/nouveau/nv30/nv30_draw.c index a681135..03c0c70 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_draw.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_draw.c @@ -230,22 +230,24 @@ static const struct { [TGSI_SEMANTIC_BCOLOR ] = { EMIT_4F, INTERP_LINEAR , 1, 3, 0x0004 }, [TGSI_SEMANTIC_FOG ] = { EMIT_4F, INTERP_PERSPECTIVE, 5, 5, 0x0010 }, [TGSI_SEMANTIC_PSIZE ] = { EMIT_1F_PSIZE, INTERP_POS , 6, 6, 0x0020 }, - [TGSI_SEMANTIC_GENERIC ] = { EMIT_4F, INTERP_PERSPECTIVE, 8, 7, 0x4000 } + [TGSI_SEMANTIC_TEXCOORD] = { EMIT_4F, INTERP_PERSPECTIVE, 8, 7, 0x4000 }, }; static boolean vroute_add(struct nv30_render *r, uint attrib, uint sem, uint *idx) { - struct pipe_screen *pscreen = r-nv30-screen-base.base; + struct nv30_screen *screen = r-nv30-screen; struct nv30_fragprog *fp = r-nv30-fragprog.program; struct vertex_info *vinfo = r-vertex_info; enum pipe_format format; uint emit = EMIT_OMIT; uint result = *idx; - if (sem == TGSI_SEMANTIC_GENERIC result = 8) { - for (result = 0; result 8; result++) { - if (fp-texcoord[result] == *idx) { + if (sem == TGSI_SEMANTIC_GENERIC) { + uint num_texcoords = (screen-eng3d-oclass NV40_3D_CLASS) ? 8 : 10; + for (result = 0; result num_texcoords; result++) { + if (fp-texcoord[result] == *idx + 8) { maybe i'm too tired, but why exactly *idx + 8 ? See nvfx_fragprog.c: fpc-fp-texcoord[hw] = fdec-Semantic.Index + 8; when the semantic is GENERIC. (and 0xfffe when it's PCOORD). This is because there can be up to 8 TEXCOORD's. yet you run for 8 or 10 texcoords. Wont this cause problems on nv40+? +sem = TGSI_SEMANTIC_TEXCOORD; emit = vroute[sem].emit; break; } @@ -260,11 +262,11 @@ vroute_add(struct nv30_render *r, uint attrib, uint sem, uint *idx) draw_emit_vertex_attr(vinfo, emit, vroute[sem].interp, attrib); format = draw_translate_vinfo_format(emit); - r-vtxfmt[attrib] = nv30_vtxfmt(pscreen, format)-hw; + r-vtxfmt[attrib] = nv30_vtxfmt(screen-base.base, format)-hw; r-vtxptr[attrib] = vinfo-size | NV30_3D_VTXBUF_DMA1; vinfo-size += draw_translate_vinfo_size(emit); - if (nv30_screen(pscreen)-eng3d-oclass NV40_3D_CLASS) { + if (screen-eng3d-oclass NV40_3D_CLASS) { r-vtxprog[attrib][0] = 0x001f38d8; r-vtxprog[attrib][1] = 0x0080001b | (attrib 9); r-vtxprog[attrib][2] = 0x0836106c; @@ -276,7 +278,12 @@ vroute_add(struct nv30_render *r, uint attrib, uint sem, uint *idx) r-vtxprog[attrib][3] = 0x6041ff80 | (result + vroute[sem].vp40) 2; } - *idx = vroute[sem].ow40 result; + if (result 8) + *idx = vroute[sem].ow40 result; + else { + assert(sem == TGSI_SEMANTIC_TEXCOORD); + *idx = 0x1000 (result - 8); + } return TRUE; } @@ -330,7 +337,7 @@ nv30_render_validate(struct nv30_context *nv30) while (pntc attrib 16) { uint index = ffs(pntc) - 1; pntc = ~(1 index); - if (vroute_add(r, attrib, TGSI_SEMANTIC_GENERIC, index)) { + if (vroute_add(r, attrib, TGSI_SEMANTIC_TEXCOORD, index)) { vp_attribs |= (1 attrib++); vp_results |= index; } ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [Mesa-dev] [PATCH 2/2] nv30: fix clip plane uploads and enable changes
On 24.05.2015 16:15, Pierre Moreau wrote: On 24 May 2015, at 16:03, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 24.05.2015 10:38, Samuel Pitoiset wrote: On 05/24/2015 06:58 AM, Ilia Mirkin wrote: nv30_validate_clip depends on the rasterizer state. Also we should upload all the new clip planes on change since next time the plane data won't have changed, but the enables might. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/drivers/nouveau/nv30/nv30_state_validate.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c index 86ac4f7..a954dcc 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c @@ -272,15 +272,13 @@ nv30_validate_clip(struct nv30_context *nv30) uint32_t clpd_enable = 0; for (i = 0; i 6; i++) { - if (nv30-rast-pipe.clip_plane_enable (1 i)) { - if (nv30-dirty NV30_NEW_CLIP) { -BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5); -PUSH_DATA (push, i); -PUSH_DATAp(push, nv30-clip.ucp[i], 4); - } - - clpd_enable |= 1 (1 + 4*i); + if (nv30-dirty NV30_NEW_CLIP) { + BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5); + PUSH_DATA (push, i); + PUSH_DATAp(push, nv30-clip.ucp[i], 4); } + if (nv30-rast-pipe.clip_plane_enable (1 i)) + clpd_enable |= 2 (4*i); Can you explain why did you change this line? This does bother me as well :) It should be the same as before but using one less addition: shifting 1 by 5 or 2 by 4 is similar. *dang* you are right. maybe we should change those lines along in nv50 and nvc0, save the additional addition :-) With this sorted out, series is: Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de } BEGIN_NV04(push, NV30_3D(VP_CLIP_PLANES_ENABLE), 1); @@ -389,7 +387,7 @@ static struct state_validate hwtnl_validate_list[] = { { nv30_validate_stipple, NV30_NEW_STIPPLE }, { nv30_validate_scissor, NV30_NEW_SCISSOR | NV30_NEW_RASTERIZER }, { nv30_validate_viewport, NV30_NEW_VIEWPORT }, -{ nv30_validate_clip, NV30_NEW_CLIP }, +{ nv30_validate_clip, NV30_NEW_CLIP | NV30_NEW_RASTERIZER }, { nv30_fragprog_validate, NV30_NEW_FRAGPROG | NV30_NEW_FRAGCONST }, { nv30_vertprog_validate, NV30_NEW_VERTPROG | NV30_NEW_VERTCONST | NV30_NEW_FRAGPROG | NV30_NEW_RASTERIZER }, ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [Mesa-dev] [PATCH 2/2] nv30: fix clip plane uploads and enable changes
On 24.05.2015 10:38, Samuel Pitoiset wrote: On 05/24/2015 06:58 AM, Ilia Mirkin wrote: nv30_validate_clip depends on the rasterizer state. Also we should upload all the new clip planes on change since next time the plane data won't have changed, but the enables might. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/drivers/nouveau/nv30/nv30_state_validate.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c index 86ac4f7..a954dcc 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c @@ -272,15 +272,13 @@ nv30_validate_clip(struct nv30_context *nv30) uint32_t clpd_enable = 0; for (i = 0; i 6; i++) { - if (nv30-rast-pipe.clip_plane_enable (1 i)) { - if (nv30-dirty NV30_NEW_CLIP) { -BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5); -PUSH_DATA (push, i); -PUSH_DATAp(push, nv30-clip.ucp[i], 4); - } - - clpd_enable |= 1 (1 + 4*i); + if (nv30-dirty NV30_NEW_CLIP) { + BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5); + PUSH_DATA (push, i); + PUSH_DATAp(push, nv30-clip.ucp[i], 4); } + if (nv30-rast-pipe.clip_plane_enable (1 i)) + clpd_enable |= 2 (4*i); Can you explain why did you change this line? This does bother me as well :) } BEGIN_NV04(push, NV30_3D(VP_CLIP_PLANES_ENABLE), 1); @@ -389,7 +387,7 @@ static struct state_validate hwtnl_validate_list[] = { { nv30_validate_stipple, NV30_NEW_STIPPLE }, { nv30_validate_scissor, NV30_NEW_SCISSOR | NV30_NEW_RASTERIZER }, { nv30_validate_viewport, NV30_NEW_VIEWPORT }, -{ nv30_validate_clip, NV30_NEW_CLIP }, +{ nv30_validate_clip, NV30_NEW_CLIP | NV30_NEW_RASTERIZER }, { nv30_fragprog_validate, NV30_NEW_FRAGPROG | NV30_NEW_FRAGCONST }, { nv30_vertprog_validate, NV30_NEW_VERTPROG | NV30_NEW_VERTCONST | NV30_NEW_FRAGPROG | NV30_NEW_RASTERIZER }, ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH 01/11] glapi: add GL_ARB_cull_distance
Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/mapi/glapi/gen/gl_API.xml | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml index 3090b9f..a792056 100644 --- a/src/mapi/glapi/gen/gl_API.xml +++ b/src/mapi/glapi/gen/gl_API.xml @@ -8247,7 +8247,12 @@ enum name=QUERY_BY_REGION_NO_WAIT_INVERTED value=0x8E1A/ /category -!-- ARB extensions 162 - 163 -- +category name=ARB_cull_distance number=162 +enum name=MAX_CULL_DISTANCES value=0x82F9/ +enum name=MAX_COMBINED_CLIP_AND_CULL_DISTANCESvalue=0x82FA/ +/category + +!-- ARB extensions 163 -- xi:include href=ARB_direct_state_access.xml xmlns:xi=http://www.w3.org/2001/XInclude/ -- 2.4.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance
This patch series adds the needed support for this extension to the various parts of mesa to finally enable it for nvc0. Dave Airlie (1): glsl: lower cull_distance into cull_distance_mesa Tobias Klausmann (10): glapi: add GL_ARB_cull_distance mesa/main: add support for GL_ARB_cull_distance mesa/prog: Add varyings for arb_cull_distance mesa/st: add support for GL_ARB_cull_distance glsl: Add a helper to see if an array was unsize in the shader glsl: Add arb_cull_distance support i965: rename UsesClipDistanceOut to UsesClipCullDistanceOut gallium: add support for arb_cull_distance nouveau/codegen: sort in galliums cull_distance semantic into the drivers bitmask nouveau/nvc0: implement cull_distance as a special form of clip distance docs/GL3.txt | 2 +- docs/relnotes/10.7.0.html | 4 +- src/gallium/auxiliary/cso_cache/cso_context.c | 3 + src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 2 + .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 5 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 6 +- src/gallium/drivers/nouveau/nvc0/nvc0_program.h| 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + .../drivers/nouveau/nvc0/nvc0_state_validate.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 2 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + src/glsl/Makefile.sources | 1 + src/glsl/ast_to_hir.cpp| 14 + src/glsl/builtin_variables.cpp | 13 +- src/glsl/glcpp/glcpp-parse.y | 3 + src/glsl/glsl_parser_extras.cpp| 1 + src/glsl/glsl_parser_extras.h | 3 + src/glsl/glsl_types.cpp| 8 +- src/glsl/glsl_types.h | 10 +- src/glsl/ir_optimization.h | 1 + src/glsl/link_varyings.cpp | 17 +- src/glsl/link_varyings.h | 3 +- src/glsl/linker.cpp| 124 +++-- src/glsl/lower_cull_distance.cpp | 549 + src/glsl/standalone_scaffolding.cpp| 1 + src/glsl/tests/varyings_test.cpp | 27 + src/mapi/glapi/gen/gl_API.xml | 7 +- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +- src/mesa/drivers/dri/i965/brw_gs.c | 2 +- src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vs.c | 2 +- src/mesa/main/extensions.c | 1 + src/mesa/main/get.c| 26 + src/mesa/main/get_hash_params.py | 4 + src/mesa/main/mtypes.h | 22 +- src/mesa/main/shaderapi.c | 4 +- src/mesa/main/tests/enum_strings.cpp | 2 + src/mesa/program/prog_print.c | 4 + src/mesa/state_tracker/st_extensions.c | 4 + src/mesa/state_tracker/st_program.c| 34 ++ 51 files changed, 859 insertions(+), 72 deletions(-) create mode 100644 src/glsl/lower_cull_distance.cpp -- 2.4.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH 02/11] mesa/main: add support for GL_ARB_cull_distance
Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/mesa/main/extensions.c | 1 + src/mesa/main/get.c | 26 ++ src/mesa/main/get_hash_params.py | 4 src/mesa/main/mtypes.h | 22 +- src/mesa/main/shaderapi.c| 4 ++-- src/mesa/main/tests/enum_strings.cpp | 2 ++ 6 files changed, 48 insertions(+), 11 deletions(-) diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c index c82416a..2145502 100644 --- a/src/mesa/main/extensions.c +++ b/src/mesa/main/extensions.c @@ -99,6 +99,7 @@ static const struct extension extension_table[] = { { GL_ARB_copy_buffer, o(dummy_true), GL, 2008 }, { GL_ARB_copy_image, o(ARB_copy_image), GL, 2012 }, { GL_ARB_conservative_depth, o(ARB_conservative_depth), GL, 2011 }, + { GL_ARB_cull_distance, o(ARB_cull_distance), GL, 2014 }, { GL_ARB_debug_output,o(dummy_true), GL, 2009 }, { GL_ARB_depth_buffer_float, o(ARB_depth_buffer_float), GL, 2008 }, { GL_ARB_depth_clamp, o(ARB_depth_clamp), GL, 2003 }, diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 8a6c81a..1dcfcc9 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -143,6 +143,8 @@ enum value_extra { EXTRA_VALID_DRAW_BUFFER, EXTRA_VALID_TEXTURE_UNIT, EXTRA_VALID_CLIP_DISTANCE, + EXTRA_VALID_CULL_DISTANCE, + EXTRA_VALID_CULL_AND_CLIP_DISTANCE, EXTRA_FLUSH_CURRENT, EXTRA_GLSL_130, EXTRA_EXT_UBO_GS4, @@ -267,6 +269,13 @@ static const int extra_valid_clip_distance[] = { EXTRA_END }; +static const int extra_valid_clip_and_cull_distance[] = { + EXTRA_VALID_CLIP_DISTANCE, + EXTRA_VALID_CULL_DISTANCE, + EXTRA_VALID_CULL_AND_CLIP_DISTANCE, + EXTRA_END +}; + static const int extra_flush_current_valid_texture_unit[] = { EXTRA_FLUSH_CURRENT, EXTRA_VALID_TEXTURE_UNIT, @@ -393,6 +402,7 @@ EXTRA_EXT(INTEL_performance_query); EXTRA_EXT(ARB_explicit_uniform_location); EXTRA_EXT(ARB_clip_control); EXTRA_EXT(EXT_polygon_offset_clamp); +EXTRA_EXT(ARB_cull_distance); static const int extra_ARB_color_buffer_float_or_glcore[] = { @@ -1116,6 +1126,22 @@ check_extra(struct gl_context *ctx, const char *func, const struct value_desc *d return GL_FALSE; } break; + case EXTRA_VALID_CULL_DISTANCE: +if (d-pname - GL_MAX_CULL_DISTANCES = ctx-Const.MaxClipPlanes) { + _mesa_error(ctx, GL_INVALID_ENUM, %s(cull distance %u), + func, d-pname - GL_MAX_CULL_DISTANCES); + return GL_FALSE; +} +break; + case EXTRA_VALID_CULL_AND_CLIP_DISTANCE: +if (d-pname - GL_MAX_COMBINED_CLIP_AND_CULL_DISTANCES = + ctx-Const.MaxClipPlanes) { + _mesa_error(ctx, GL_INVALID_ENUM, + %s(combined clip and cull distance %u), func, + d-pname - GL_MAX_COMBINED_CLIP_AND_CULL_DISTANCES); + return GL_FALSE; +} +break; case EXTRA_GLSL_130: api_check = GL_TRUE; if (ctx-Const.GLSLVersion = 130) diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py index 41cb2c1..a63aba7 100644 --- a/src/mesa/main/get_hash_params.py +++ b/src/mesa/main/get_hash_params.py @@ -798,6 +798,10 @@ descriptor=[ [ MIN_FRAGMENT_INTERPOLATION_OFFSET, CONTEXT_FLOAT(Const.MinFragmentInterpolationOffset), extra_ARB_gpu_shader5 ], [ MAX_FRAGMENT_INTERPOLATION_OFFSET, CONTEXT_FLOAT(Const.MaxFragmentInterpolationOffset), extra_ARB_gpu_shader5 ], [ FRAGMENT_INTERPOLATION_OFFSET_BITS, CONST(FRAGMENT_INTERPOLATION_OFFSET_BITS), extra_ARB_gpu_shader5 ], + +# GL_ARB_cull_distance + [ MAX_CULL_DISTANCES, CONTEXT_INT(Const.MaxClipPlanes), extra_ARB_cull_distance ], + [ MAX_COMBINED_CLIP_AND_CULL_DISTANCES, CONTEXT_INT(Const.MaxClipPlanes), extra_ARB_cull_distance ], ]}, # Enums restricted to OpenGL Core profile diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 8342517..6425c06 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -236,6 +236,8 @@ typedef enum VARYING_SLOT_CLIP_VERTEX, /* Does not appear in FS */ VARYING_SLOT_CLIP_DIST0, VARYING_SLOT_CLIP_DIST1, + VARYING_SLOT_CULL_DIST0, + VARYING_SLOT_CULL_DIST1, VARYING_SLOT_PRIMITIVE_ID, /* Does not appear in VS */ VARYING_SLOT_LAYER, /* Appears as VS or GS output */ VARYING_SLOT_VIEWPORT, /* Appears as VS or GS output */ @@ -272,6 +274,8 @@ typedef enum #define VARYING_BIT_CLIP_VERTEX BITFIELD64_BIT(VARYING_SLOT_CLIP_VERTEX) #define
[Nouveau] [PATCH 08/11] i965: rename UsesClipDistanceOut to UsesClipCullDistanceOut
Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +- src/mesa/drivers/dri/i965/brw_gs.c | 2 +- src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vs.c | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index ead7768..c4439fd 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -3912,7 +3912,7 @@ fs_visitor::emit_urb_writes() fs_reg sources[8]; /* Lower legacy ff and ClipVertex clipping to clip distances */ - if (key-base.userclip_active !prog-UsesClipDistanceOut) + if (key-base.userclip_active !prog-UsesClipCullDistanceOut) compute_clip_distance(); /* If we don't have any valid slots to write, just do a minimal urb write diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c index 52c7303..2cb3fe2 100644 --- a/src/mesa/drivers/dri/i965/brw_gs.c +++ b/src/mesa/drivers/dri/i965/brw_gs.c @@ -314,7 +314,7 @@ brw_gs_populate_key(struct brw_context *brw, key-base.program_string_id = gp-id; brw_setup_vue_key_clip_info(brw, key-base, - gp-program.Base.UsesClipDistanceOut); + gp-program.Base.UsesClipCullDistanceOut); /* _NEW_TEXTURE */ brw_populate_sampler_prog_key_data(ctx, prog, stage_state-sampler_count, diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index e9681b7..d99dcd0 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1730,7 +1730,7 @@ vec4_visitor::run() } base_ir = NULL; - if (key-userclip_active !prog-UsesClipDistanceOut) + if (key-userclip_active !prog-UsesClipCullDistanceOut) setup_uniform_clipplane_values(); emit_thread_end(); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 5a60fe4..0401171 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -3250,7 +3250,7 @@ vec4_visitor::emit_vertex() } /* Lower legacy ff and ClipVertex clipping to clip distances */ - if (key-userclip_active !prog-UsesClipDistanceOut) { + if (key-userclip_active !prog-UsesClipCullDistanceOut) { current_annotation = user clip distances; output_reg[VARYING_SLOT_CLIP_DIST0] = dst_reg(this, glsl_type::vec4_type); diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c index d03567e..eb69cc7 100644 --- a/src/mesa/drivers/dri/i965/brw_vs.c +++ b/src/mesa/drivers/dri/i965/brw_vs.c @@ -435,7 +435,7 @@ brw_vs_populate_key(struct brw_context *brw, */ key-base.program_string_id = vp-id; brw_setup_vue_key_clip_info(brw, key-base, - vp-program.Base.UsesClipDistanceOut); + vp-program.Base.UsesClipCullDistanceOut); /* _NEW_POLYGON */ if (brw-gen 6) { -- 2.4.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH 05/11] glsl: Add a helper to see if an array was unsize in the shader
Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/glsl/glsl_types.cpp | 8 src/glsl/glsl_types.h | 10 -- src/glsl/linker.cpp | 2 +- 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp index f675e90..4bc7324 100644 --- a/src/glsl/glsl_types.cpp +++ b/src/glsl/glsl_types.cpp @@ -340,12 +340,12 @@ _mesa_glsl_release_types(void) } -glsl_type::glsl_type(const glsl_type *array, unsigned length) : +glsl_type::glsl_type(const glsl_type *array, unsigned length, bool was_unsized) : base_type(GLSL_TYPE_ARRAY), sampler_dimensionality(0), sampler_shadow(0), sampler_array(0), sampler_type(0), interface_packing(0), vector_elements(0), matrix_columns(0), - length(length), name(NULL) + length(length), name(NULL), was_unsized(was_unsized) { this-fields.array = array; /* Inherit the gl type of the base. The GL type is used for @@ -635,7 +635,7 @@ glsl_type::get_sampler_instance(enum glsl_sampler_dim dim, } const glsl_type * -glsl_type::get_array_instance(const glsl_type *base, unsigned array_size) +glsl_type::get_array_instance(const glsl_type *base, unsigned array_size, bool was_unsized) { /* Generate a name using the base type pointer in the key. This is * done because the name of the base type may not be unique across @@ -656,7 +656,7 @@ glsl_type::get_array_instance(const glsl_type *base, unsigned array_size) if (t == NULL) { mtx_unlock(glsl_type::mutex); - t = new glsl_type(base, array_size); + t = new glsl_type(base, array_size, was_unsized); mtx_lock(glsl_type::mutex); hash_table_insert(array_types, (void *) t, ralloc_strdup(mem_ctx, key)); diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h index f54a939..d6ad450 100644 --- a/src/glsl/glsl_types.h +++ b/src/glsl/glsl_types.h @@ -183,6 +183,12 @@ struct glsl_type { } fields; /** +* For \c GLSL_TYPE_ARRAY this determines if an array was unsized and +* got changed to a sized array. +*/ + bool was_unsized; + + /** * \name Pointers to various public type singletons */ /*@{*/ @@ -246,7 +252,7 @@ struct glsl_type { * Get the instance of an array type */ static const glsl_type *get_array_instance(const glsl_type *base, - unsigned elements); + unsigned elements, bool was_unsized = false); /** * Get the instance of a record type @@ -677,7 +683,7 @@ private: enum glsl_interface_packing packing, const char *name); /** Constructor for array types */ - glsl_type(const glsl_type *array, unsigned length); + glsl_type(const glsl_type *array, unsigned length, bool was_unsized); /** Hash table containing the known array types. */ static struct hash_table *array_types; diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index 9798afe..8eace14 100644 --- a/src/glsl/linker.cpp +++ b/src/glsl/linker.cpp @@ -1261,7 +1261,7 @@ private: { if ((*type)-is_unsized_array()) { *type = glsl_type::get_array_instance((*type)-fields.array, - max_array_access + 1); + max_array_access + 1, true); assert(*type != NULL); } } -- 2.4.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH 06/11] glsl: lower cull_distance into cull_distance_mesa
From: Dave Airlie airl...@redhat.com Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/glsl/Makefile.sources| 1 + src/glsl/ir_optimization.h | 1 + src/glsl/link_varyings.cpp | 15 +- src/glsl/link_varyings.h | 3 +- src/glsl/linker.cpp | 1 + src/glsl/lower_cull_distance.cpp | 549 +++ 6 files changed, 565 insertions(+), 5 deletions(-) create mode 100644 src/glsl/lower_cull_distance.cpp diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index d784a81..502b6ca 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -143,6 +143,7 @@ LIBGLSL_FILES = \ loop_unroll.cpp \ lower_clip_distance.cpp \ lower_const_arrays_to_uniforms.cpp \ + lower_cull_distance.cpp \ lower_discard.cpp \ lower_discard_flow.cpp \ lower_if_to_cond_assign.cpp \ diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h index e6939f3..1220df6 100644 --- a/src/glsl/ir_optimization.h +++ b/src/glsl/ir_optimization.h @@ -119,6 +119,7 @@ bool lower_variable_index_to_cond_assign(exec_list *instructions, bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz); bool lower_const_arrays_to_uniforms(exec_list *instructions); bool lower_clip_distance(gl_shader *shader); +bool lower_cull_distance(gl_shader *shader); void lower_output_reads(exec_list *instructions); bool lower_packing_builtins(exec_list *instructions, int op_mask); void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions); diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp index 7b2d4bd..46f84c6 100644 --- a/src/glsl/link_varyings.cpp +++ b/src/glsl/link_varyings.cpp @@ -301,6 +301,7 @@ tfeedback_decl::init(struct gl_context *ctx, const void *mem_ctx, this-location = -1; this-orig_name = input; this-is_clip_distance_mesa = false; + this-is_cull_distance_mesa = false; this-skip_components = 0; this-next_buffer_separator = false; this-matched_candidate = NULL; @@ -351,6 +352,10 @@ tfeedback_decl::init(struct gl_context *ctx, const void *mem_ctx, strcmp(this-var_name, gl_ClipDistance) == 0) { this-is_clip_distance_mesa = true; } + if (ctx-Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].LowerClipDistance + strcmp(this-var_name, gl_CullDistance) == 0) { + this-is_cull_distance_mesa = true; + } } @@ -397,7 +402,8 @@ tfeedback_decl::assign_location(struct gl_context *ctx, this-matched_candidate-type-fields.array-matrix_columns; const unsigned vector_elements = this-matched_candidate-type-fields.array-vector_elements; - unsigned actual_array_size = this-is_clip_distance_mesa ? + unsigned actual_array_size = + (this-is_clip_distance_mesa || this-is_cull_distance_mesa) ? prog-LastClipDistanceArraySize : this-matched_candidate-type-array_size(); @@ -410,7 +416,8 @@ tfeedback_decl::assign_location(struct gl_context *ctx, actual_array_size); return false; } - unsigned array_elem_size = this-is_clip_distance_mesa ? + unsigned array_elem_size = +(this-is_clip_distance_mesa || this-is_cull_distance_mesa) ? 1 : vector_elements * matrix_cols; fine_location += array_elem_size * this-array_subscript; this-size = 1; @@ -419,7 +426,7 @@ tfeedback_decl::assign_location(struct gl_context *ctx, } this-vector_elements = vector_elements; this-matrix_columns = matrix_cols; - if (this-is_clip_distance_mesa) + if (this-is_clip_distance_mesa || this-is_cull_distance_mesa) this-type = GL_FLOAT; else this-type = this-matched_candidate-type-fields.array-gl_type; @@ -542,7 +549,7 @@ const tfeedback_candidate * tfeedback_decl::find_candidate(gl_shader_program *prog, hash_table *tfeedback_candidates) { - const char *name = this-is_clip_distance_mesa + const char *name = this-is_cull_distance_mesa ? gl_CullDistanceMESA : this-is_clip_distance_mesa ? gl_ClipDistanceMESA : this-var_name; this-matched_candidate = (const tfeedback_candidate *) hash_table_find(tfeedback_candidates, name); diff --git a/src/glsl/link_varyings.h b/src/glsl/link_varyings.h index afc16a8..842ab7c 100644 --- a/src/glsl/link_varyings.h +++ b/src/glsl/link_varyings.h @@ -128,7 +128,7 @@ public: */ unsigned num_components() const { - if (this-is_clip_distance_mesa) + if (this-is_clip_distance_mesa || this-is_cull_distance_mesa) return this-size; else return this-vector_elements * this-matrix_columns * this-size; @@ -165,6 +165,7 @@ private: * gl_ClipDistance to gl_ClipDistanceMESA. */ bool is_clip_distance_mesa; + bool is_cull_distance_mesa
[Nouveau] [PATCH 07/11] glsl: Add arb_cull_distance support
Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/glsl/ast_to_hir.cpp | 14 + src/glsl/builtin_variables.cpp | 13 +++- src/glsl/glcpp/glcpp-parse.y| 3 + src/glsl/glsl_parser_extras.cpp | 1 + src/glsl/glsl_parser_extras.h | 3 + src/glsl/link_varyings.cpp | 2 +- src/glsl/linker.cpp | 121 +--- src/glsl/standalone_scaffolding.cpp | 1 + src/glsl/tests/varyings_test.cpp| 27 9 files changed, 145 insertions(+), 40 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 8aebb13..4db2b2e 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -1045,6 +1045,20 @@ check_builtin_array_max_size(const char *name, unsigned size, _mesa_glsl_error(loc, state, `gl_ClipDistance' array size cannot be larger than gl_MaxClipDistances (%u), state-Const.MaxClipPlanes); + } else if (strcmp(gl_CullDistance, name) == 0 + size state-Const.MaxClipPlanes) { + /* From the ARB_cull_distance spec: + * + * The gl_CullDistance array is predeclared as unsized and + *must be sized by the shader either redeclaring it with + *a size or indexing it only with integral constant + *expressions. The size determines the number and set of + *enabled cull distances and can be at most + *gl_MaxCullDistances. + */ + _mesa_glsl_error(loc, state, `gl_CullDistance' array size cannot + be larger than gl_MaxCullDistances (%u), + state-Const.MaxClipPlanes); } } diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp index 6806aa1..78c8db2 100644 --- a/src/glsl/builtin_variables.cpp +++ b/src/glsl/builtin_variables.cpp @@ -298,7 +298,7 @@ public: const glsl_type *construct_interface_instance() const; private: - glsl_struct_field fields[10]; + glsl_struct_field fields[11]; unsigned num_fields; }; @@ -600,6 +600,12 @@ builtin_variable_generator::generate_constants() add_const(gl_MaxVaryingComponents, state-ctx-Const.MaxVarying * 4); } + if (state-is_version(450, 0) || state-ARB_cull_distance_enable) { + add_const(gl_MaxCullDistances, state-Const.MaxClipPlanes); + add_const(gl_MaxCombinedClipAndCullDistances, +state-Const.MaxClipPlanes); + } + if (state-is_version(150, 0)) { add_const(gl_MaxVertexOutputComponents, state-Const.MaxVertexOutputComponents); @@ -1029,6 +1035,11 @@ builtin_variable_generator::generate_varyings() gl_ClipDistance); } + if (state-is_version(450, 0) || state-ARB_cull_distance_enable) { + ADD_VARYING(VARYING_SLOT_CULL_DIST0, array(float_t, 0), + gl_CullDistance); + } + if (compatibility) { ADD_VARYING(VARYING_SLOT_TEX0, array(vec4_t, 0), gl_TexCoord); ADD_VARYING(VARYING_SLOT_FOGC, float_t, gl_FogFragCoord); diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y index a11b6b2..536c17f 100644 --- a/src/glsl/glcpp/glcpp-parse.y +++ b/src/glsl/glcpp/glcpp-parse.y @@ -2483,6 +2483,9 @@ _glcpp_parser_handle_version_declaration(glcpp_parser_t *parser, intmax_t versio if (extensions-ARB_shader_precision) add_builtin_define(parser, GL_ARB_shader_precision, 1); + + if (extensions-ARB_cull_distance) +add_builtin_define(parser, GL_ARB_cull_distance, 1); } } diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index 046d5d7..d1cd8ff 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -554,6 +554,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = { EXT(ARB_arrays_of_arrays, true, false, ARB_arrays_of_arrays), EXT(ARB_compute_shader, true, false, ARB_compute_shader), EXT(ARB_conservative_depth, true, false, ARB_conservative_depth), + EXT(ARB_cull_distance, true, false, ARB_cull_distance), EXT(ARB_derivative_control, true, false, ARB_derivative_control), EXT(ARB_draw_buffers, true, false, dummy_true), EXT(ARB_draw_instanced, true, false, ARB_draw_instanced), diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h index 9a0c24e..8572905 100644 --- a/src/glsl/glsl_parser_extras.h +++ b/src/glsl/glsl_parser_extras.h @@ -378,6 +378,7 @@ struct _mesa_glsl_parse_state { /* ARB_viewport_array */ unsigned MaxViewports; + } Const; /** @@ -430,6 +431,8 @@ struct _mesa_glsl_parse_state { bool ARB_compute_shader_warn; bool ARB_conservative_depth_enable; bool ARB_conservative_depth_warn; + bool
[Nouveau] [PATCH 10/11] nouveau/codegen: sort in galliums cull_distance semantic into the drivers bitmask
Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index ecd115f..381a958 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -1063,6 +1063,11 @@ bool Source::scanDeclaration(const struct tgsi_full_declaration *decl) decl-Declaration.UsageMask (si * 4); info-io.genUserClip = -1; break; + case TGSI_SEMANTIC_CULLDIST: +info-io.cullDistanceMask |= + decl-Declaration.UsageMask (si * 4); +info-io.genUserClip = -1; +break; case TGSI_SEMANTIC_SAMPLEMASK: info-io.sampleMask = i; break; -- 2.4.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH 04/11] mesa/st: add support for GL_ARB_cull_distance
Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/mesa/state_tracker/st_extensions.c | 4 src/mesa/state_tracker/st_program.c| 34 ++ 2 files changed, 38 insertions(+) diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c index 23a4588..63f3334 100644 --- a/src/mesa/state_tracker/st_extensions.c +++ b/src/mesa/state_tracker/st_extensions.c @@ -466,6 +466,7 @@ void st_init_extensions(struct pipe_screen *screen, { o(ARB_conditional_render_inverted), PIPE_CAP_CONDITIONAL_RENDER_INVERTED }, { o(ARB_texture_view), PIPE_CAP_SAMPLER_VIEW_TARGET }, { o(ARB_clip_control), PIPE_CAP_CLIP_HALFZ }, + { o(ARB_cull_distance),PIPE_CAP_CULL_DISTANCE }, { o(EXT_polygon_offset_clamp), PIPE_CAP_POLYGON_OFFSET_CLAMP }, }; @@ -678,6 +679,9 @@ void st_init_extensions(struct pipe_screen *screen, if (glsl_feature_level = 410) extensions-ARB_shader_precision = GL_TRUE; + if (glsl_feature_level = 130) + extensions-ARB_cull_distance = GL_TRUE; + /* This extension needs full OpenGL 3.2, but we don't know if that's * supported at this point. Only check the GLSL version. */ if (consts-GLSLVersion = 150 diff --git a/src/mesa/state_tracker/st_program.c b/src/mesa/state_tracker/st_program.c index a9110d3..79e8ad7 100644 --- a/src/mesa/state_tracker/st_program.c +++ b/src/mesa/state_tracker/st_program.c @@ -253,6 +253,14 @@ st_prepare_vertex_program(struct gl_context *ctx, stvp-output_semantic_name[slot] = TGSI_SEMANTIC_CLIPDIST; stvp-output_semantic_index[slot] = 1; break; + case VARYING_SLOT_CULL_DIST0: +stvp-output_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST; +stvp-output_semantic_index[slot] = 0; +break; + case VARYING_SLOT_CULL_DIST1: +stvp-output_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST; +stvp-output_semantic_index[slot] = 1; +break; case VARYING_SLOT_EDGE: assert(0); break; @@ -606,6 +614,16 @@ st_translate_fragment_program(struct st_context *st, input_semantic_index[slot] = 1; interpMode[slot] = TGSI_INTERPOLATE_PERSPECTIVE; break; + case VARYING_SLOT_CULL_DIST0: +input_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST; +input_semantic_index[slot] = 0; +interpMode[slot] = TGSI_INTERPOLATE_PERSPECTIVE; +break; + case VARYING_SLOT_CULL_DIST1: +input_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST; +input_semantic_index[slot] = 1; +interpMode[slot] = TGSI_INTERPOLATE_PERSPECTIVE; +break; /* In most cases, there is nothing special about these * inputs, so adopt a convention to use the generic * semantic name and the mesa VARYING_SLOT_ number as the @@ -941,6 +959,14 @@ st_translate_geometry_program(struct st_context *st, input_semantic_name[slot] = TGSI_SEMANTIC_CLIPDIST; input_semantic_index[slot] = 1; break; + case VARYING_SLOT_CULL_DIST0: +input_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST; +input_semantic_index[slot] = 0; +break; + case VARYING_SLOT_CULL_DIST1: +input_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST; +input_semantic_index[slot] = 1; +break; case VARYING_SLOT_PSIZ: input_semantic_name[slot] = TGSI_SEMANTIC_PSIZE; input_semantic_index[slot] = 0; @@ -1028,6 +1054,14 @@ st_translate_geometry_program(struct st_context *st, gs_output_semantic_name[slot] = TGSI_SEMANTIC_CLIPDIST; gs_output_semantic_index[slot] = 1; break; + case VARYING_SLOT_CULL_DIST0: +gs_output_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST; +gs_output_semantic_index[slot] = 0; +break; + case VARYING_SLOT_CULL_DIST1: +gs_output_semantic_name[slot] = TGSI_SEMANTIC_CULLDIST; +gs_output_semantic_index[slot] = 1; +break; case VARYING_SLOT_LAYER: gs_output_semantic_name[slot] = TGSI_SEMANTIC_LAYER; gs_output_semantic_index[slot] = 0; -- 2.4.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH 11/11] nouveau/nvc0: implement cull_distance as a special form of clip distance
This enables ARB_cull_distance. Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- docs/GL3.txt | 2 +- docs/relnotes/10.7.0.html | 4 +++- src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 6 +- src/gallium/drivers/nouveau/nvc0/nvc0_program.h| 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 1 + 6 files changed, 12 insertions(+), 4 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 9d56ee5..ebdae38 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -190,7 +190,7 @@ GL 4.5, GLSL 4.50: GL_ARB_ES3_1_compatibility not started GL_ARB_clip_control DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe) GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, llvmpipe, softpipe) - GL_ARB_cull_distance not started + GL_ARB_cull_distance DONE (nvc0) GL_ARB_derivative_controlDONE (i965, nv50, nvc0, r600) GL_ARB_direct_state_access DONE (all drivers) - Transform Feedback object DONE diff --git a/docs/relnotes/10.7.0.html b/docs/relnotes/10.7.0.html index 6206716..12e6b5b 100644 --- a/docs/relnotes/10.7.0.html +++ b/docs/relnotes/10.7.0.html @@ -43,7 +43,9 @@ TBD. Note: some of the new features are only available with certain drivers. /p -TBD. +ul +liGL_ARB_cull_distance on nvc0/li +/ul h2Bug fixes/h2 diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c index 4a47cb2..aa3b751 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c @@ -46,6 +46,7 @@ nvc0_shader_input_address(unsigned sn, unsigned si, unsigned ubase) case TGSI_SEMANTIC_BCOLOR: return 0x2a0 + si * 0x10; case NV50_SEMANTIC_CLIPDISTANCE: return 0x2c0 + si * 0x4; case TGSI_SEMANTIC_CLIPDIST: return 0x2c0 + si * 0x10; + case TGSI_SEMANTIC_CULLDIST: return 0x2c0 + si * 0x10; case TGSI_SEMANTIC_CLIPVERTEX: return 0x270; case TGSI_SEMANTIC_PCOORD: return 0x2e0; case NV50_SEMANTIC_TESSCOORD:return 0x2f0; @@ -75,6 +76,7 @@ nvc0_shader_output_address(unsigned sn, unsigned si, unsigned ubase) case TGSI_SEMANTIC_BCOLOR:return 0x2a0 + si * 0x10; case NV50_SEMANTIC_CLIPDISTANCE: return 0x2c0 + si * 0x4; case TGSI_SEMANTIC_CLIPDIST: return 0x2c0 + si * 0x10; + case TGSI_SEMANTIC_CULLDIST: return 0x2c0 + si * 0x10; case TGSI_SEMANTIC_CLIPVERTEX:return 0x270; case TGSI_SEMANTIC_TEXCOORD: return 0x300 + si * 0x10; case TGSI_SEMANTIC_EDGEFLAG: return ~0; @@ -255,11 +257,13 @@ nvc0_vtgp_gen_header(struct nvc0_program *vp, struct nv50_ir_prog_info *info) } } - vp-vp.clip_enable = info-io.clipDistanceMask; for (i = 0; i 8; ++i) if (info-io.cullDistanceMask (1 i)) vp-vp.clip_mode |= 1 (i * 4); + vp-vp.clip_enable = info-io.clipDistanceMask; + vp-vp.cull_enable = info-io.cullDistanceMask; + if (info-io.genUserClip 0) vp-vp.num_ucps = PIPE_MAX_CLIP_PLANES + 1; /* prevent rebuilding */ diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.h b/src/gallium/drivers/nouveau/nvc0/nvc0_program.h index 3fd9d21..b8b1a5a 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.h +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.h @@ -39,6 +39,7 @@ struct nvc0_program { struct { uint32_t clip_mode; /* clip/cull selection */ uint8_t clip_enable; /* mask of defined clip planes */ + uint8_t cull_enable; /* mask of defined cull planes */ uint8_t num_ucps; /* also set to max if ClipDistance is used */ uint8_t edgeflag; /* attribute index of edgeflag input */ boolean need_vertex_id; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c index c942dda..56d22a0 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c @@ -174,6 +174,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_CLIP_HALFZ: case PIPE_CAP_POLYGON_OFFSET_CLAMP: case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: + case PIPE_CAP_CULL_DISTANCE: return 1; case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE: return (class_3d = NVE4_3D_CLASS) ? 1 : 0; @@ -194,7 +195,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_VERTEXID_NOBASE: case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: - case PIPE_CAP_CULL_DISTANCE: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/nouveau
[Nouveau] [PATCH 09/11] gallium: add support for arb_cull_distance
Add another pipe cap so we can savely enable of disable this extension Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/gallium/auxiliary/cso_cache/cso_context.c| 3 +++ src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 2 ++ src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 2 ++ src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + 15 files changed, 19 insertions(+) diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c b/src/gallium/auxiliary/cso_cache/cso_context.c index 744b00c..7612b43 100644 --- a/src/gallium/auxiliary/cso_cache/cso_context.c +++ b/src/gallium/auxiliary/cso_cache/cso_context.c @@ -119,6 +119,9 @@ struct cso_context { struct pipe_clip_state clip; struct pipe_clip_state clip_saved; + struct pipe_clip_state cull; + struct pipe_clip_state cull_saved; + struct pipe_framebuffer_state fb, fb_saved; struct pipe_viewport_state vp, vp_saved; struct pipe_blend_color blend_color; diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index c596d03..986a942 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -221,6 +221,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: + case PIPE_CAP_CULL_DISTANCE: return 0; case PIPE_CAP_MAX_VIEWPORTS: diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c index 03fecd1..678347d 100644 --- a/src/gallium/drivers/i915/i915_screen.c +++ b/src/gallium/drivers/i915/i915_screen.c @@ -242,6 +242,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap) case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: + case PIPE_CAP_CULL_DISTANCE: return 0; case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS: diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c index b0fed73..f92d5de 100644 --- a/src/gallium/drivers/ilo/ilo_screen.c +++ b/src/gallium/drivers/ilo/ilo_screen.c @@ -459,6 +459,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: + case PIPE_CAP_CULL_DISTANCE: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c index 09ac9af..c90c405 100644 --- a/src/gallium/drivers/llvmpipe/lp_screen.c +++ b/src/gallium/drivers/llvmpipe/lp_screen.c @@ -293,6 +293,8 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: return 0; + case PIPE_CAP_CULL_DISTANCE: + return 0; } /* should only get here on unhandled cases */ debug_printf(Unexpected PIPE_CAP %d query\n, param); diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c index bb79ccc..fc33ddf 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c @@ -162,6 +162,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: + case PIPE_CAP_CULL_DISTANCE: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index f455a7f..d8efd75 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -210,6 +210,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: /* potentially supported on some hw */ case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: + case PIPE_CAP_CULL_DISTANCE: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b
[Nouveau] [PATCH 03/11] mesa/prog: Add varyings for arb_cull_distance
Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/mesa/program/prog_print.c | 4 1 file changed, 4 insertions(+) diff --git a/src/mesa/program/prog_print.c b/src/mesa/program/prog_print.c index d588d07..e8855cd 100644 --- a/src/mesa/program/prog_print.c +++ b/src/mesa/program/prog_print.c @@ -147,6 +147,8 @@ arb_input_attrib_string(GLuint index, GLenum progType) fragment.(twenty-one), /* VARYING_SLOT_VIEWPORT */ fragment.(twenty-two), /* VARYING_SLOT_FACE */ fragment.(twenty-three), /* VARYING_SLOT_PNTC */ + fragment.(twenty-four), /* VARYING_SLOT_CULL_DIST0 */ + fragment.(twenty-five), /* VARYING_SLOT_CULL_DIST1 */ fragment.varying[0], fragment.varying[1], fragment.varying[2], @@ -272,6 +274,8 @@ arb_output_attrib_string(GLuint index, GLenum progType) result.(twenty-one), /* VARYING_SLOT_VIEWPORT */ result.(twenty-two), /* VARYING_SLOT_FACE */ result.(twenty-three), /* VARYING_SLOT_PNTC */ + result.(twenty-four), /* VARYING_SLOT_CULL_DIST0 */ + result.(twenty-five), /* VARYING_SLOT_CULL_DIST1 */ result.varying[0], result.varying[1], result.varying[2], -- 2.4.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 02/11] mesa/main: add support for GL_ARB_cull_distance
On 24.05.2015 20:11, Ilia Mirkin wrote: On Sun, May 24, 2015 at 1:58 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/mesa/main/extensions.c | 1 + src/mesa/main/get.c | 26 ++ src/mesa/main/get_hash_params.py | 4 src/mesa/main/mtypes.h | 22 +- src/mesa/main/shaderapi.c| 4 ++-- src/mesa/main/tests/enum_strings.cpp | 2 ++ 6 files changed, 48 insertions(+), 11 deletions(-) diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c index c82416a..2145502 100644 --- a/src/mesa/main/extensions.c +++ b/src/mesa/main/extensions.c @@ -99,6 +99,7 @@ static const struct extension extension_table[] = { { GL_ARB_copy_buffer, o(dummy_true), GL, 2008 }, { GL_ARB_copy_image, o(ARB_copy_image), GL, 2012 }, { GL_ARB_conservative_depth, o(ARB_conservative_depth), GL, 2011 }, + { GL_ARB_cull_distance, o(ARB_cull_distance), GL, 2014 }, { GL_ARB_debug_output,o(dummy_true), GL, 2009 }, { GL_ARB_depth_buffer_float, o(ARB_depth_buffer_float), GL, 2008 }, { GL_ARB_depth_clamp, o(ARB_depth_clamp), GL, 2003 }, diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 8a6c81a..1dcfcc9 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -143,6 +143,8 @@ enum value_extra { EXTRA_VALID_DRAW_BUFFER, EXTRA_VALID_TEXTURE_UNIT, EXTRA_VALID_CLIP_DISTANCE, + EXTRA_VALID_CULL_DISTANCE, + EXTRA_VALID_CULL_AND_CLIP_DISTANCE, EXTRA_FLUSH_CURRENT, EXTRA_GLSL_130, EXTRA_EXT_UBO_GS4, @@ -267,6 +269,13 @@ static const int extra_valid_clip_distance[] = { EXTRA_END }; +static const int extra_valid_clip_and_cull_distance[] = { + EXTRA_VALID_CLIP_DISTANCE, + EXTRA_VALID_CULL_DISTANCE, + EXTRA_VALID_CULL_AND_CLIP_DISTANCE, + EXTRA_END +}; + static const int extra_flush_current_valid_texture_unit[] = { EXTRA_FLUSH_CURRENT, EXTRA_VALID_TEXTURE_UNIT, @@ -393,6 +402,7 @@ EXTRA_EXT(INTEL_performance_query); EXTRA_EXT(ARB_explicit_uniform_location); EXTRA_EXT(ARB_clip_control); EXTRA_EXT(EXT_polygon_offset_clamp); +EXTRA_EXT(ARB_cull_distance); static const int extra_ARB_color_buffer_float_or_glcore[] = { @@ -1116,6 +1126,22 @@ check_extra(struct gl_context *ctx, const char *func, const struct value_desc *d return GL_FALSE; } break; + case EXTRA_VALID_CULL_DISTANCE: +if (d-pname - GL_MAX_CULL_DISTANCES = ctx-Const.MaxClipPlanes) { + _mesa_error(ctx, GL_INVALID_ENUM, %s(cull distance %u), + func, d-pname - GL_MAX_CULL_DISTANCES); + return GL_FALSE; +} +break; + case EXTRA_VALID_CULL_AND_CLIP_DISTANCE: +if (d-pname - GL_MAX_COMBINED_CLIP_AND_CULL_DISTANCES = + ctx-Const.MaxClipPlanes) { + _mesa_error(ctx, GL_INVALID_ENUM, + %s(combined clip and cull distance %u), func, + d-pname - GL_MAX_COMBINED_CLIP_AND_CULL_DISTANCES); + return GL_FALSE; +} huh? I guess you were copying EXTRA_VALID_CLIP_DISTANCE? That's for validating GL_CLIP_DISTANCE0..7 all in one go (and erroring out for ones that are too high). That doesn't seem to apply here. You don't appear to use extra_valid_clip_and_cull_distance either, so I guess that makes sense... should remove the whole lot. will do! +break; case EXTRA_GLSL_130: api_check = GL_TRUE; if (ctx-Const.GLSLVersion = 130) diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py index 41cb2c1..a63aba7 100644 --- a/src/mesa/main/get_hash_params.py +++ b/src/mesa/main/get_hash_params.py @@ -798,6 +798,10 @@ descriptor=[ [ MIN_FRAGMENT_INTERPOLATION_OFFSET, CONTEXT_FLOAT(Const.MinFragmentInterpolationOffset), extra_ARB_gpu_shader5 ], [ MAX_FRAGMENT_INTERPOLATION_OFFSET, CONTEXT_FLOAT(Const.MaxFragmentInterpolationOffset), extra_ARB_gpu_shader5 ], [ FRAGMENT_INTERPOLATION_OFFSET_BITS, CONST(FRAGMENT_INTERPOLATION_OFFSET_BITS), extra_ARB_gpu_shader5 ], + +# GL_ARB_cull_distance + [ MAX_CULL_DISTANCES, CONTEXT_INT(Const.MaxClipPlanes), extra_ARB_cull_distance ], + [ MAX_COMBINED_CLIP_AND_CULL_DISTANCES, CONTEXT_INT(Const.MaxClipPlanes), extra_ARB_cull_distance ], ]}, # Enums restricted to OpenGL Core profile diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 8342517..6425c06 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main
Re: [Nouveau] [Mesa-dev] [PATCH 04/11] mesa/st: add support for GL_ARB_cull_distance
On 24.05.2015 20:12, Marek Olšák wrote: On Sun, May 24, 2015 at 7:58 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/mesa/state_tracker/st_extensions.c | 4 src/mesa/state_tracker/st_program.c| 34 ++ 2 files changed, 38 insertions(+) diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c index 23a4588..63f3334 100644 --- a/src/mesa/state_tracker/st_extensions.c +++ b/src/mesa/state_tracker/st_extensions.c @@ -466,6 +466,7 @@ void st_init_extensions(struct pipe_screen *screen, { o(ARB_conditional_render_inverted), PIPE_CAP_CONDITIONAL_RENDER_INVERTED }, { o(ARB_texture_view), PIPE_CAP_SAMPLER_VIEW_TARGET }, { o(ARB_clip_control), PIPE_CAP_CLIP_HALFZ }, + { o(ARB_cull_distance),PIPE_CAP_CULL_DISTANCE }, { o(EXT_polygon_offset_clamp), PIPE_CAP_POLYGON_OFFSET_CLAMP }, }; @@ -678,6 +679,9 @@ void st_init_extensions(struct pipe_screen *screen, if (glsl_feature_level = 410) extensions-ARB_shader_precision = GL_TRUE; + if (glsl_feature_level = 130) + extensions-ARB_cull_distance = GL_TRUE; + This hunk is wrong and seems to be completely unnecessary. Also, the patch which adds PIPE_CAP_CULL_DISTANCE should be before this patch. Marek removing and noted, thanks! ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [Mesa-dev] [PATCH 2/2] nv30: fix clip plane uploads and enable changes
On 24.05.2015 17:42, Ilia Mirkin wrote: On Sun, May 24, 2015 at 10:56 AM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 24.05.2015 16:15, Pierre Moreau wrote: On 24 May 2015, at 16:03, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 24.05.2015 10:38, Samuel Pitoiset wrote: On 05/24/2015 06:58 AM, Ilia Mirkin wrote: nv30_validate_clip depends on the rasterizer state. Also we should upload all the new clip planes on change since next time the plane data won't have changed, but the enables might. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/drivers/nouveau/nv30/nv30_state_validate.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c index 86ac4f7..a954dcc 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_state_validate.c @@ -272,15 +272,13 @@ nv30_validate_clip(struct nv30_context *nv30) uint32_t clpd_enable = 0; for (i = 0; i 6; i++) { - if (nv30-rast-pipe.clip_plane_enable (1 i)) { - if (nv30-dirty NV30_NEW_CLIP) { -BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5); -PUSH_DATA (push, i); -PUSH_DATAp(push, nv30-clip.ucp[i], 4); - } - - clpd_enable |= 1 (1 + 4*i); + if (nv30-dirty NV30_NEW_CLIP) { + BEGIN_NV04(push, NV30_3D(VP_UPLOAD_CONST_ID), 5); + PUSH_DATA (push, i); + PUSH_DATAp(push, nv30-clip.ucp[i], 4); } + if (nv30-rast-pipe.clip_plane_enable (1 i)) + clpd_enable |= 2 (4*i); Can you explain why did you change this line? This does bother me as well :) It should be the same as before but using one less addition: shifting 1 by 5 or 2 by 4 is similar. *dang* you are right. maybe we should change those lines along in nv50 and nvc0, save the additional addition :-) What lines? With this sorted out, series is: Not sure what you mean here... what do you want me to sort out? The 2 back into a +1? I was just looking at the defines like Nah, i meant that it _is_ allright the way you did it and that we should change similar lines for clip in nv50/nvc0 the way you did here #define NV30_3D_VP_CLIP_PLANES_ENABLE_PLANE1 0x0020 and the 2 4i seemed more obviously correct. Although they're obviously identical. Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de } BEGIN_NV04(push, NV30_3D(VP_CLIP_PLANES_ENABLE), 1); @@ -389,7 +387,7 @@ static struct state_validate hwtnl_validate_list[] = { { nv30_validate_stipple, NV30_NEW_STIPPLE }, { nv30_validate_scissor, NV30_NEW_SCISSOR | NV30_NEW_RASTERIZER }, { nv30_validate_viewport, NV30_NEW_VIEWPORT }, -{ nv30_validate_clip, NV30_NEW_CLIP }, +{ nv30_validate_clip, NV30_NEW_CLIP | NV30_NEW_RASTERIZER }, { nv30_fragprog_validate, NV30_NEW_FRAGPROG | NV30_NEW_FRAGCONST }, { nv30_vertprog_validate, NV30_NEW_VERTPROG | NV30_NEW_VERTCONST | NV30_NEW_FRAGPROG | NV30_NEW_RASTERIZER }, ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance
On 24.05.2015 20:25, Ilia Mirkin wrote: I'm having a bit of trouble tracing through this. What happens if I have a shader that just does: gl_ClipDistance[0] = 1; gl_CullDistance[0] = 1; what does the resulting TGSI look like? (Assuming that clip plane 0 is enabled.) What about the generated nvc0 code (for the vertex shader)? (hack up a patch for this, run it without DRI_PRIME=1, see i pass and forget to check it again) yeah those are equal, sorry for wasting your time on this :/ On Sun, May 24, 2015 at 1:57 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: This patch series adds the needed support for this extension to the various parts of mesa to finally enable it for nvc0. Dave Airlie (1): glsl: lower cull_distance into cull_distance_mesa Tobias Klausmann (10): glapi: add GL_ARB_cull_distance mesa/main: add support for GL_ARB_cull_distance mesa/prog: Add varyings for arb_cull_distance mesa/st: add support for GL_ARB_cull_distance glsl: Add a helper to see if an array was unsize in the shader glsl: Add arb_cull_distance support i965: rename UsesClipDistanceOut to UsesClipCullDistanceOut gallium: add support for arb_cull_distance nouveau/codegen: sort in galliums cull_distance semantic into the drivers bitmask nouveau/nvc0: implement cull_distance as a special form of clip distance docs/GL3.txt | 2 +- docs/relnotes/10.7.0.html | 4 +- src/gallium/auxiliary/cso_cache/cso_context.c | 3 + src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 2 + .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 5 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 6 +- src/gallium/drivers/nouveau/nvc0/nvc0_program.h| 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + .../drivers/nouveau/nvc0/nvc0_state_validate.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 2 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + src/glsl/Makefile.sources | 1 + src/glsl/ast_to_hir.cpp| 14 + src/glsl/builtin_variables.cpp | 13 +- src/glsl/glcpp/glcpp-parse.y | 3 + src/glsl/glsl_parser_extras.cpp| 1 + src/glsl/glsl_parser_extras.h | 3 + src/glsl/glsl_types.cpp| 8 +- src/glsl/glsl_types.h | 10 +- src/glsl/ir_optimization.h | 1 + src/glsl/link_varyings.cpp | 17 +- src/glsl/link_varyings.h | 3 +- src/glsl/linker.cpp| 124 +++-- src/glsl/lower_cull_distance.cpp | 549 + src/glsl/standalone_scaffolding.cpp| 1 + src/glsl/tests/varyings_test.cpp | 27 + src/mapi/glapi/gen/gl_API.xml | 7 +- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +- src/mesa/drivers/dri/i965/brw_gs.c | 2 +- src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vs.c | 2 +- src/mesa/main/extensions.c | 1 + src/mesa/main/get.c| 26 + src/mesa/main/get_hash_params.py | 4 + src/mesa/main/mtypes.h | 22 +- src/mesa/main/shaderapi.c | 4 +- src/mesa/main/tests/enum_strings.cpp | 2 + src/mesa/program/prog_print.c | 4 + src/mesa/state_tracker/st_extensions.c | 4 + src/mesa/state_tracker/st_program.c| 34 ++ 51 files changed, 859 insertions(+), 72 deletions(-) create mode 100644 src/glsl/lower_cull_distance.cpp -- 2.4.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance
On 24.05.2015 21:36, Ilia Mirkin wrote: On Sun, May 24, 2015 at 3:30 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 24.05.2015 20:25, Ilia Mirkin wrote: I'm having a bit of trouble tracing through this. What happens if I have a shader that just does: gl_ClipDistance[0] = 1; gl_CullDistance[0] = 1; what does the resulting TGSI look like? (Assuming that clip plane 0 is enabled.) What about the generated nvc0 code (for the vertex shader)? (hack up a patch for this, run it without DRI_PRIME=1, see i pass and forget to check it again) yeah those are equal, sorry for wasting your time on this :/ Not a waste at all... let's ignore any shortcomings of your patches for a second, and think it through -- what do you want the TGSI to look like? I'm not even sure. Do you want to have a separate 2x CLIPDIST and 2x CULLDIST and let the driver worry about figuring out the max clip dist used and sticking the cull dists above it? Or do you want to work it out at a lower level where they share a single CLIPORCULLDIST semantic and get a separate e.g. shader property that gives them the mask? I don't know how other hardware works, but nv50/nvc0 hw has 8 clip_or_cull distances, and a mask that selects whether each is a clip or a cull distance. But perhaps other hw has them totally separate, dunno. With my limited experience about other hardware i'd go with seperate clip/cull and let the drivers figure out the right way to place them. That gives us the freedom to have it the way nv50/nvc0 works and other ways, like seperated clip/cull_distances if needed. Maybe we should just consider nouveau and radeon and decide by the hw of these often used drivers. Marek any comments on how the various radeons work? On Sun, May 24, 2015 at 1:57 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: This patch series adds the needed support for this extension to the various parts of mesa to finally enable it for nvc0. Dave Airlie (1): glsl: lower cull_distance into cull_distance_mesa Tobias Klausmann (10): glapi: add GL_ARB_cull_distance mesa/main: add support for GL_ARB_cull_distance mesa/prog: Add varyings for arb_cull_distance mesa/st: add support for GL_ARB_cull_distance glsl: Add a helper to see if an array was unsize in the shader glsl: Add arb_cull_distance support i965: rename UsesClipDistanceOut to UsesClipCullDistanceOut gallium: add support for arb_cull_distance nouveau/codegen: sort in galliums cull_distance semantic into the drivers bitmask nouveau/nvc0: implement cull_distance as a special form of clip distance docs/GL3.txt | 2 +- docs/relnotes/10.7.0.html | 4 +- src/gallium/auxiliary/cso_cache/cso_context.c | 3 + src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 2 + .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 5 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 6 +- src/gallium/drivers/nouveau/nvc0/nvc0_program.h| 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + .../drivers/nouveau/nvc0/nvc0_state_validate.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 2 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + src/glsl/Makefile.sources | 1 + src/glsl/ast_to_hir.cpp| 14 + src/glsl/builtin_variables.cpp | 13 +- src/glsl/glcpp/glcpp-parse.y | 3 + src/glsl/glsl_parser_extras.cpp| 1 + src/glsl/glsl_parser_extras.h | 3 + src/glsl/glsl_types.cpp| 8 +- src/glsl/glsl_types.h | 10 +- src/glsl/ir_optimization.h | 1 + src/glsl/link_varyings.cpp | 17 +- src/glsl/link_varyings.h | 3 +- src/glsl/linker.cpp| 124 +++-- src/glsl/lower_cull_distance.cpp | 549 + src/glsl/standalone_scaffolding.cpp| 1 + src/glsl/tests/varyings_test.cpp | 27 + src/mapi/glapi/gen/gl_API.xml | 7 +- src/mesa
Re: [Nouveau] [PATCH v2] nv50/ir: avoid messing up arg1 of PFETCH
Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de On 23.05.2015 18:56, Ilia Mirkin wrote: There can be scenarios where the indirect arg of a PFETCH becomes known, and so the code will attempt to propagate it. Use this opportunity to just fold it into the first argument, and prevent the load propagation pass from touching PFETCH further. This fixes gs-input-array-vec4-index-rd.shader_test and vs-output-array-vec4-index-wr-before-gs.shader_test on nvc0 at least. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu Cc: 10.5 10.6 mesa-sta...@lists.freedesktop.org --- v1 - v2: - redo final section of ConstantFolding::expr using a switch, per tobijk .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 72dd31e..b7fcd56 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -236,6 +236,9 @@ LoadPropagation::visit(BasicBlock *bb) if (i-op == OP_CALL) // calls have args as sources, they must be in regs continue; + if (i-op == OP_PFETCH) // pfetch expects arg1 to be a reg + continue; + if (i-srcExists(1)) checkSwapSrc01(i); @@ -581,6 +584,11 @@ ConstantFolding::expr(Instruction *i, case OP_POPCNT: res.data.u32 = util_bitcount(a-data.u32 b-data.u32); break; + case OP_PFETCH: + // The two arguments to pfetch are logically added together. Normally + // the second argument will not be constant, but that can happen. + res.data.u32 = a-data.u32 + b-data.u32; + break; default: return; } @@ -595,7 +603,9 @@ ConstantFolding::expr(Instruction *i, i-getSrc(0)-reg.data = res.data; - if (i-op == OP_MAD || i-op == OP_FMA) { + switch (i-op) { + case OP_MAD: + case OP_FMA: { i-op = OP_ADD; i-setSrc(1, i-getSrc(0)); @@ -610,8 +620,14 @@ ConstantFolding::expr(Instruction *i, bld.setPosition(i, false); i-setSrc(1, bld.loadImm(NULL, res.data.u32)); } - } else { + break; + } + case OP_PFETCH: + // Leave PFETCH alone... we just folded its 2 args into 1. + break; + default: i-op = i-saturate ? OP_SAT : OP_MOV; /* SAT handled by unary() */ + break; } i-subOp = 0; } ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [Mesa-dev] [PATCH] nv50/ir: avoid messing up arg1 of PFETCH
On 23.05.2015 08:06, Ilia Mirkin wrote: There can be scenarios where the indirect arg of a PFETCH becomes known, and so the code will attempt to propagate it. Use this opportunity to just fold it into the first argument, and prevent the load propagation pass from touching PFETCH further. This fixes gs-input-array-vec4-index-rd.shader_test and vs-output-array-vec4-index-wr-before-gs.shader_test on nvc0 at least. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu Cc: 10.5 10.6 mesa-sta...@lists.freedesktop.org --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 10 ++ 1 file changed, 10 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 72dd31e..98e3d1f 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -236,6 +236,9 @@ LoadPropagation::visit(BasicBlock *bb) if (i-op == OP_CALL) // calls have args as sources, they must be in regs continue; + if (i-op == OP_PFETCH) // pfetch expects arg1 to be a reg + continue; + if (i-srcExists(1)) checkSwapSrc01(i); @@ -581,6 +584,11 @@ ConstantFolding::expr(Instruction *i, case OP_POPCNT: res.data.u32 = util_bitcount(a-data.u32 b-data.u32); break; + case OP_PFETCH: + // The two arguments to pfetch are logically added together. Normally + // the second argument will not be constant, but that can happen. + res.data.u32 = a-data.u32 + b-data.u32; + break; default: return; } @@ -610,6 +618,8 @@ ConstantFolding::expr(Instruction *i, bld.setPosition(i, false); i-setSrc(1, bld.loadImm(NULL, res.data.u32)); } + } else if (i-op == OP_PFETCH) { + // Leave PFETCH alone... we just folded its 2 args into 1. } else { i-op = i-saturate ? OP_SAT : OP_MOV; /* SAT handled by unary() */ } this last part sure works, but it gets ugly, while you are at it, can you change it to a switch statement? ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 4/9] nvkm/fb/ramnv50: Ressurect timing code, use proper timing/rammap handlers
On 23.05.2015 00:33, Roy Spliet wrote: Might need some generalisation to GT200. For those: use at your own risk! Signed-off-by: Roy Spliet rspl...@eclipso.eu --- .../drm/nouveau/include/nvkm/subdev/bios/ramcfg.h | 16 ++ .../drm/nouveau/include/nvkm/subdev/bios/rammap.h | 2 + drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.c | 29 drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramnv50.c | 167 + 4 files changed, 181 insertions(+), 33 deletions(-) diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/ramcfg.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/ramcfg.h index c6fb6aa..f09b6bf 100644 --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/ramcfg.h +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/ramcfg.h @@ -35,6 +35,22 @@ struct nvbios_ramcfg { unsigned ramcfg_DLLoff; union { struct { + unsigned ramcfg_00_03_01:1; + unsigned ramcfg_00_03_02:1; + unsigned ramcfg_00_03_08:1; + unsigned ramcfg_00_03_10:1; + unsigned ramcfg_00_04_02:1; + unsigned ramcfg_00_04_04:1; + unsigned ramcfg_00_04_20:1; + unsigned ramcfg_00_05:8; + unsigned ramcfg_00_06:8; + unsigned ramcfg_00_07:8; + unsigned ramcfg_00_08:8; + unsigned ramcfg_00_09:8; + unsigned ramcfg_00_0a_0f:4; + unsigned ramcfg_00_0a_f0:4; + }; + struct { unsigned ramcfg_10_02_01:1; unsigned ramcfg_10_02_02:1; unsigned ramcfg_10_02_04:1; diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/rammap.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/rammap.h index 609a905..2044fc9 100644 --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/rammap.h +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/rammap.h @@ -15,6 +15,8 @@ u32 nvbios_rammapEm(struct nvkm_bios *, u16 mhz, u32 nvbios_rammapSe(struct nvkm_bios *, u32 data, u8 ever, u8 ehdr, u8 ecnt, u8 elen, int idx, u8 *ver, u8 *hdr); +u32 nvbios_rammapSp_from_perf(struct nvkm_bios *bios, u32 data, u8 size, int idx, + struct nvbios_ramcfg *p); u32 nvbios_rammapSp(struct nvkm_bios *, u32 data, u8 ever, u8 ehdr, u8 ecnt, u8 elen, int idx, u8 *ver, u8 *hdr, struct nvbios_ramcfg *); diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.c index a688d3b..4ec376a 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/rammap.c @@ -141,6 +141,35 @@ nvbios_rammapSe(struct nvkm_bios *bios, u32 data, } u32 +nvbios_rammapSp_from_perf(struct nvkm_bios *bios, u32 data, u8 size, int idx, + struct nvbios_ramcfg *p) +{ + data += (idx * size); + + if (size 11) + return NULL; + + p-ramcfg_timing = nv_ro08(bios, data + 0x01); + p-ramcfg_00_03_01 = (nv_ro08(bios, data + 0x03) 0x01) 0; + p-ramcfg_00_03_02 = (nv_ro08(bios, data + 0x03) 0x02) 1; + p-ramcfg_DLLoff = (nv_ro08(bios, data + 0x03) 0x04) 2; + p-ramcfg_00_03_08 = (nv_ro08(bios, data + 0x03) 0x08) 3; + p-ramcfg_00_03_10 = (nv_ro08(bios, data + 0x03) 0x10) 4; + p-ramcfg_00_04_02 = (nv_ro08(bios, data + 0x04) 0x02) 1; + p-ramcfg_00_04_04 = (nv_ro08(bios, data + 0x04) 0x04) 2; + p-ramcfg_00_04_20 = (nv_ro08(bios, data + 0x04) 0x20) 5; + p-ramcfg_00_05= (nv_ro08(bios, data + 0x05) 0xff) 0; + p-ramcfg_00_06= (nv_ro08(bios, data + 0x06) 0xff) 0; + p-ramcfg_00_07= (nv_ro08(bios, data + 0x07) 0xff) 0; + p-ramcfg_00_08= (nv_ro08(bios, data + 0x08) 0xff) 0; + p-ramcfg_00_09= (nv_ro08(bios, data + 0x09) 0xff) 0; + p-ramcfg_00_0a_0f = (nv_ro08(bios, data + 0x0a) 0x0f) 0; + p-ramcfg_00_0a_f0 = (nv_ro08(bios, data + 0x0a) 0xf0) 4; + + return data; +} + +u32 nvbios_rammapSp(struct nvkm_bios *bios, u32 data, u8 ever, u8 ehdr, u8 ecnt, u8 elen, int idx, u8 *ver, u8 *hdr, struct nvbios_ramcfg *p) diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramnv50.c b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramnv50.c index a96e512..51f93a0 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramnv50.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramnv50.c @@ -29,6 +29,7 @@ #include subdev/bios.h #include subdev/bios/perf.h #include subdev/bios/pll.h +#include subdev/bios/rammap.h #include subdev/bios/timing.h #include subdev/clk/pll.h @@ -55,6 +56,84 @@ struct nv50_ram { struct nv50_ramseq hwsq; }; +#define T(t)
Re: [Nouveau] [PATCH] fix a wrong use of a logical operator in drmmode_output_dpms()
looks good to me! :) Feel free to add my R-b. On 20.05.2015 17:08, Samuel Pitoiset wrote: This is probably a typo error which has been introduced in 2009... This fixes the following warning detected by Clang : drmmode_display.c:907:30: warning: use of logical '' with constant operand [-Wconstant-logical-operand] if (props (props-flags DRM_MODE_PROP_ENUM)) { Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com --- src/drmmode_display.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/drmmode_display.c b/src/drmmode_display.c index 7c1d2bb..161bccd 100644 --- a/src/drmmode_display.c +++ b/src/drmmode_display.c @@ -904,7 +904,7 @@ drmmode_output_dpms(xf86OutputPtr output, int mode) for (i = 0; i koutput-count_props; i++) { props = drmModeGetProperty(drmmode-fd, koutput-props[i]); - if (props (props-flags DRM_MODE_PROP_ENUM)) { + if (props (props-flags DRM_MODE_PROP_ENUM)) { if (!strcmp(props-name, DPMS)) { mode_id = koutput-props[i]; drmModeFreeProperty(props); ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] ram/gf100-: error out if a ridiculous amount of vram is detected
Any idea on how to solve the problem. other than just reporting it? But for now this adds a helpful error message... you may add my R-b. On 20.05.2015 22:01, Ilia Mirkin wrote: Some newer chips have trouble coming up, and we get bad MMIO reads from them, like 0xbadf100. This ends up translating into crazy amounts of VRAM, which destroys all sorts of other logic down the line. Instead, fail device init. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu Cc: sta...@kernel.org --- drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c b/drm/nouveau/nvkm/subdev/fb/ramgf100.c index de9f395..9d4d196 100644 --- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c +++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c @@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct nvkm_object *engine, } } + /* if over 1TB of VRAM is reported, something went very wrong, bail */ + if (ram-size (1ULL 40)) { + nv_error(pfb, invalid vram size: %llx\n, ram-size); + return -EINVAL; + } + /* if all controllers have the same amount attached, there's no holes */ if (uniform) { offset = rsvd_head; ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [Mesa-dev] [PATCH 00/12] Tessellation support for nvc0
as far as i can evaluate this without deeper insight into tess, this patchseries looks good to me! Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de On 17.05.2015 07:07, Ilia Mirkin wrote: This is enough to enable tessellation support on nvc0. It seems to work a lot better on my GF108 than GK208. I suspect that there's some sort of scheduling shenanigans that need to be adjusted for kepler+. Or perhaps some shader header things. Even with the GF108, I still get occasional blue triangles in Heaven, but I get a *ton* of them on the GK208 -- seemingly the same issue, but it's much worse on there. Also there's about a 100% chance that gl_PrimitiveID doesn't work. In any case, I plan on pushing this semi-soon unless there are any loud objections. I don't think it's going to do too much good sitting in my tree, or too much evil sitting upstream while core + st/mesa are worked out. Ilia Mirkin (12): nvc0: preliminary tess support nvc0: add support for setting patch vertices at draw time nvc0: add handling for set_tess_state callback nvc0: TESSCOORD comes in as a sysval, not an input nvc0/ir: mark varyings as per-patch based on semantic name nv50/ir: populate info structure based on new tess properties nv50/ir: set perPatch flag on load/stores to per-patch varyings nv50/ir: add support for reading outputs in tess control shaders nvc0/ir: patch vertex count is stored in the upper bits nvc0/ir: handle loads from outputs in control shaders nvc0/ir: allow tess eval output loads to be CSE'd nv50/ir: cleanup private enums that have graduated to gallium src/gallium/drivers/nouveau/codegen/nv50_ir.cpp| 4 +- .../drivers/nouveau/codegen/nv50_ir_driver.h | 12 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 56 +++-- .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 7 +++ .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 2 + src/gallium/drivers/nouveau/nvc0/nvc0_context.h| 8 ++- src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 56 +++-- src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 7 +-- src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 1 + .../drivers/nouveau/nvc0/nvc0_shader_state.c | 3 - src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 71 ++ .../drivers/nouveau/nvc0/nvc0_state_validate.c | 11 src/gallium/drivers/nouveau/nvc0/nvc0_tex.c| 34 +-- src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c| 9 ++- .../drivers/nouveau/nvc0/nvc0_vbo_translate.c | 3 +- 15 files changed, 200 insertions(+), 84 deletions(-) ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] nvc0: fix context destruction for partly implemented tesselation
Backtrace: ... 0x757fc392 in __assert_fail () from /lib64/libc.so.6 0x70cf5bec in nvc0_shader_stage (pipe=optimized out) at ./nvc0/nvc0_context.h:204 nvc0_set_constant_buffer (pipe=0x631080, shader=optimized out, index=optimized out, cb=0x0) at nvc0/nvc0_state.c:771 0x70a2cf63 in st_destroy_context (st=0x68a9f0) at state_tracker/st_context.c:382 ... Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h index 09d08e4..f910541 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h @@ -195,8 +195,8 @@ nvc0_shader_stage(unsigned pipe) { switch (pipe) { case PIPE_SHADER_VERTEX: return 0; -/* case PIPE_SHADER_TESSELLATION_CONTROL: return 1; */ -/* case PIPE_SHADER_TESSELLATION_EVALUATION: return 2; */ + case PIPE_SHADER_TESS_CTRL: return 1; + case PIPE_SHADER_TESS_EVAL: return 2; case PIPE_SHADER_GEOMETRY: return 3; case PIPE_SHADER_FRAGMENT: return 4; case PIPE_SHADER_COMPUTE: return 5; -- 2.4.1 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nvc0: switch mechanism for shader eviction to be a while loop
On 10.05.2015 07:57, Ilia Mirkin wrote: This aligns it to work similarly to nv50. However there's no library code there, so the whole thing can be freed. Here we end up with an allocated node that's not attached to a specific program. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86792 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu Cc: mesa-sta...@lists.freedesktop.org --- src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c index c156e91..5589695 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c @@ -683,11 +683,12 @@ nvc0_program_upload_code(struct nvc0_context *nvc0, struct nvc0_program *prog) ret = nouveau_heap_alloc(screen-text_heap, size, prog, prog-mem); if (ret) { struct nouveau_heap *heap = screen-text_heap; - struct nouveau_heap *iter; - for (iter = heap; iter iter-next != heap; iter = iter-next) { - struct nvc0_program *evict = iter-priv; - if (evict) -nouveau_heap_free(evict-mem); + /* Note that the code library, which is allocated before anything else, + * does not have a priv pointer. We can stop once we hit it. + */ + while (heap-next heap-next-priv) { + struct nvc0_program *evict = heap-next-priv; + nouveau_heap_free(evict-mem); } debug_printf(WARNING: out of code space, evicting all shaders.\n); ret = nouveau_heap_alloc(heap, size, prog, prog-mem); The new comment is a bit upside down, but thats not really a problem R-b here as well ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 4/4] nv50/ir: allow OP_SET to merge with OP_SET_AND/etc as well as a neg
On 09.05.2015 07:35, Ilia Mirkin wrote: This covers the pattern where a KILL_IF is used, which triggers a comparison of -x to 0. This can usually be folded into the comparison whose result is being compared to 0, however it may, itself, have already been combined with another comparison. That shouldn't impact the logic of this pass however. With this and the 1.0 change, code like 0020: 001c0001 80081df4 set b32 $r0 lt f32 $r0 0x3e80 0028: 001c 201fc000 and b32 $r0 $r0 0x3f80 0030: 7f9c001e dd885c00 set $p0 0x1 lt f32 neg $r0 0x0 0038: 003c 1980 $p0 discard becomes 0020: 001c001d b5881df4 set $p0 0x1 lt f32 $r0 0x3e80 0028: 003c 1980 $p0 discard Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 51 ++ 1 file changed, 33 insertions(+), 18 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index d8af19a..43a2fe9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -278,7 +278,6 @@ private: void tryCollapseChainedMULs(Instruction *, const int s, ImmediateValue); - // TGSI 'true' is converted to -1 by F2I(NEG(SET)), track back to SET CmpInstruction *findOriginForTestWithZero(Value *); unsigned int foldCount; @@ -337,16 +336,10 @@ ConstantFolding::findOriginForTestWithZero(Value *value) return NULL; Instruction *insn = value-getInsn(); - while (insn insn-op != OP_SET) { + while (insn insn-op != OP_SET insn-op != OP_SET_AND + insn-op != OP_SET_OR insn-op != OP_SET_XOR) { Instruction *next = NULL; switch (insn-op) { - case OP_NEG: - case OP_ABS: - case OP_CVT: - next = insn-getSrc(0)-getInsn(); - if (insn-sType != next-dType) -return NULL; - break; case OP_MOV: next = insn-getSrc(0)-getInsn(); break; @@ -946,29 +939,51 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue imm0, int s) case OP_SET: // TODO: SET_AND,OR,XOR delete this comment?! { + /* This optimizes the case where the output of a set is being compared + * to zero. Since the set can only produce 0/-1 (int) or 0/1 (float), we + * can be a lot cleverer in our comparison. + */ CmpInstruction *si = findOriginForTestWithZero(i-getSrc(t)); CondCode cc, ccZ; - if (i-src(t).mod != Modifier(0)) - return; - if (imm0.reg.data.u32 != 0 || !si || si-op != OP_SET) + if (imm0.reg.data.u32 != 0 || !si) return; cc = si-setCond; ccZ = (CondCode)((unsigned int)i-asCmp()-setCond ~CC_U); + // We do everything assuming var (cmp) 0, reverse the condition if 0 is + // first. if (s == 0) ccZ = reverseCondCode(ccZ); + // If there is a negative modifier, we need to undo that, by flipping + // the comparison to zero. + if (i-src(t).mod.neg()) + ccZ = reverseCondCode(ccZ); + // If this is a signed comparison, we expect the input to be a regular + // boolean, i.e. 0/-1. However the rest of the logic assumes that true + // is positive, so just flip the sign. + if (i-sType == TYPE_S32) { + assert(!isFloatType(si-dType)); + ccZ = reverseCondCode(ccZ); + } can both this and the previous condition evaluate to true? if yes, this double-flips ccZ... switch (ccZ) { - case CC_LT: cc = CC_FL; break; - case CC_GE: cc = CC_TR; break; - case CC_EQ: cc = inverseCondCode(cc); break; - case CC_LE: cc = inverseCondCode(cc); break; - case CC_GT: break; - case CC_NE: break; + case CC_LT: cc = CC_FL; break; // bool 0 -- this is never true + case CC_GE: cc = CC_TR; break; // bool = 0 -- this is always true + case CC_EQ: cc = inverseCondCode(cc); break; // bool == 0 -- !bool + case CC_LE: cc = inverseCondCode(cc); break; // bool = 0 -- !bool + case CC_GT: break; // bool 0 -- bool + case CC_NE: break; // bool != 0 -- bool default: return; } + + // Update the condition of this SET to be identical to the origin set, + // but with the updated condition code. The original SET should get + // DCE'd, ideally. + i-op = si-op; i-asCmp()-setCond = cc; i-setSrc(0, si-src(0)); i-setSrc(1, si-src(1)); + if (si-srcExists(2)) + i-setSrc(2, si-src(2)); i-sType = si-sType; } break; ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 3/4] nvc0/ir: optimize set 1.0 to produce boolean-float sets
On 09.05.2015 07:35, Ilia Mirkin wrote: This has started to happen more now that the backend is producing KILL_IF more often. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 29 ++ .../nouveau/codegen/nv50_ir_target_nv50.cpp| 2 ++ 2 files changed, 31 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 14446b6..d8af19a 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -973,6 +973,35 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue imm0, int s) } break; + case OP_AND: + { + CmpInstruction *cmp = i-getSrc(t)-getInsn()-asCmp(); + if (!cmp || cmp-op == OP_SLCT) how about if (cmp == NULL || ...) and kill the same condition later? + return; + if (!prog-getTarget()-isOpSupported(cmp-op, TYPE_F32)) + return; + if (imm0.reg.data.f32 != 1.0) + return; + if (cmp == NULL) + return; + if (i-getSrc(t)-getInsn()-dType != TYPE_U32) + return; + + i-getSrc(t)-getInsn()-dType = TYPE_F32; + if (i-src(t).mod != Modifier(0)) { + assert(i-src(t).mod == Modifier(NV50_IR_MOD_NOT)); + i-src(t).mod = Modifier(0); + cmp-setCond = reverseCondCode(cmp-setCond); + } + i-op = OP_MOV; + i-setSrc(s, NULL); + if (t) { + i-setSrc(0, i-getSrc(t)); + i-setSrc(t, NULL); + } + } + break; + case OP_SHL: { if (s != 1 || i-src(0).mod != Modifier(0)) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp index 178a167..70180eb 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp @@ -413,6 +413,8 @@ TargetNV50::isOpSupported(operation op, DataType ty) const return false; case OP_SAD: return ty == TYPE_S32; + case OP_SET: + return !isFloatType(ty); default: return true; } ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nv50/ir: only enable mul saturate on G200+
Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de On 09.05.2015 09:31, Ilia Mirkin wrote: Commit 44673512a84 enabled support for saturating fmul. However experimentally this does not seem to work on the older chips. Restrict the feature to G200 (NVA0) and later. Reported-by: Pierre Moreau pierre.mor...@free.fr Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90350 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu Cc: mesa-sta...@lists.freedesktop.org --- src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp index 70180eb..ca545a6 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp @@ -84,7 +84,7 @@ static const struct opProperties _initProps[] = // neg abs not sat c[] s[], a[], imm { OP_ADD,0x3, 0x0, 0x0, 0x8, 0x2, 0x1, 0x1, 0x2 }, { OP_SUB,0x3, 0x0, 0x0, 0x8, 0x2, 0x1, 0x1, 0x2 }, - { OP_MUL,0x3, 0x0, 0x0, 0x8, 0x2, 0x1, 0x1, 0x2 }, + { OP_MUL,0x3, 0x0, 0x0, 0x0, 0x2, 0x1, 0x1, 0x2 }, { OP_MAX,0x3, 0x3, 0x0, 0x0, 0x2, 0x1, 0x1, 0x0 }, { OP_MIN,0x3, 0x3, 0x0, 0x0, 0x2, 0x1, 0x1, 0x0 }, { OP_MAD,0x7, 0x0, 0x0, 0x8, 0x6, 0x1, 0x1, 0x0 }, // special constraint @@ -188,6 +188,9 @@ void TargetNV50::initOpInfo() if (prop-mSat 8) opInfo[prop-op].dstMods = NV50_IR_MOD_SAT; } + + if (chipset = 0xa0) + opInfo[OP_MUL].dstMods = NV50_IR_MOD_SAT; } unsigned int ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 3/4] nvc0/ir: optimize set 1.0 to produce boolean-float sets
On 09.05.2015 19:53, Ilia Mirkin wrote: On Sat, May 9, 2015 at 11:27 AM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 09.05.2015 07:35, Ilia Mirkin wrote: This has started to happen more now that the backend is producing KILL_IF more often. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 29 ++ .../nouveau/codegen/nv50_ir_target_nv50.cpp| 2 ++ 2 files changed, 31 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 14446b6..d8af19a 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -973,6 +973,35 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue imm0, int s) } break; + case OP_AND: + { + CmpInstruction *cmp = i-getSrc(t)-getInsn()-asCmp(); + if (!cmp || cmp-op == OP_SLCT) how about if (cmp == NULL || ...) and kill the same condition later? I just killed the other one. I think the usual style tends to be if (!ptr) rather than if (ptr == NULL) in codegen. Both are acceptable though. it was mainly about the dead code you killed now. With this it looks fine to me, so feel free to add Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de + return; + if (!prog-getTarget()-isOpSupported(cmp-op, TYPE_F32)) + return; + if (imm0.reg.data.f32 != 1.0) + return; + if (cmp == NULL) + return; + if (i-getSrc(t)-getInsn()-dType != TYPE_U32) + return; + + i-getSrc(t)-getInsn()-dType = TYPE_F32; + if (i-src(t).mod != Modifier(0)) { + assert(i-src(t).mod == Modifier(NV50_IR_MOD_NOT)); + i-src(t).mod = Modifier(0); + cmp-setCond = reverseCondCode(cmp-setCond); + } + i-op = OP_MOV; + i-setSrc(s, NULL); + if (t) { + i-setSrc(0, i-getSrc(t)); + i-setSrc(t, NULL); + } + } + break; + case OP_SHL: { if (s != 1 || i-src(0).mod != Modifier(0)) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp index 178a167..70180eb 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp @@ -413,6 +413,8 @@ TargetNV50::isOpSupported(operation op, DataType ty) const return false; case OP_SAD: return ty == TYPE_S32; + case OP_SET: + return !isFloatType(ty); default: return true; } ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Problem with GTX 970 under Fedora 21
On 25.01.2015 21:40, super_7b wrote: Hi, I took this issue to the Fedora forum initially, but no-one there has been able to offer any guidance, so I have decided to come to the nouveau community directly. I was running a KDE desktop under Fedora 21 successfully, including the ability to use a direct text login on an older card (GXT 560 Ti) with an entirely stock Fedora, fully updated. I simply replaced the old card with an Asus GTX 970 and re-booted. I run with the rhgb quiet option removed from grub2, so I can see all that happens up to the graphical login screen and I noticed that the re-boot was different at the point at which the screen changed from a basic (80x25 ?) text to a higher resolution (128X50 ?). This no longer happened and I got a simple blank screen. If I wait, the graphical boot screen eventually appears and I can login to KDE successfully and run my desktop apps as normal. If I switch consoles (e.g. Ctrl-Alt-F2) for a text login, I get a black screen. The last visible thing on my boot screen before the black is something like fb: switching to nouveaufb from EFI VGA. When I looked at the dmesg output, lsmod and lshw, it appears that the nouveau driver does not correctly detect and initialise the GTX 970. Have I a configuration error, or is there something not working in nouveau? I attach logs from dmesg, lshw and lsmod and I can supply more data if needed. BR Mick Hi, the 9xx series is rather new and is not supported by nouveau for now (modesetting is in 3.19rcX imho). Concerning acceleration: Nvidia needs to provide signed firmwares for those cards under a license appropriate to include in an open source project, until that happens, there wont be substantial improvements. Greetings Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] fuse/gm107: simplify the return logic
On 25.01.2015 20:35, Martin Peres wrote: Spotted by coccinelle: drivers/gpu/drm/nouveau/core/subdev/fuse/gm107.c:50:5-8: WARNING: end returns can be simpified Signed-off-by: Martin Peres martin.pe...@free.fr --- drm/nouveau/nvkm/subdev/fuse/gm107.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drm/nouveau/nvkm/subdev/fuse/gm107.c b/drm/nouveau/nvkm/subdev/fuse/gm107.c index ba19158..0b256aa 100644 --- a/drm/nouveau/nvkm/subdev/fuse/gm107.c +++ b/drm/nouveau/nvkm/subdev/fuse/gm107.c @@ -45,10 +45,8 @@ gm107_fuse_ctor(struct nvkm_object *parent, struct nvkm_object *engine, ret = nvkm_fuse_create(parent, engine, oclass, priv); *pobject = nv_object(priv); - if (ret) - return ret; - return 0; + return ret; } struct nvkm_oclass Reviewed-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de If it is helping :) ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
On 11.01.2015 23:53, Ilia Mirkin wrote: On Sun, Jan 11, 2015 at 5:48 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 11.01.2015 23:12, Ilia Mirkin wrote: On Sun, Jan 11, 2015 at 5:08 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 11.01.2015 22:54, Ilia Mirkin wrote: On Sun, Jan 11, 2015 at 4:40 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: Folding for conversions: F32-(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})-F32 Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- V2: Split out F64 parts V3: remove handling of saturate for (U/S)32, .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 73 ++ 1 file changed, 73 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 21d20ca..aaf0d0d 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -997,6 +997,79 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue imm0, int s) i-op = OP_MOV; break; } + case OP_CVT: { + Storage res; + bld.setPosition(i, true); /* make sure bld is init'ed */ + switch(i-dType) { + case TYPE_U16: + switch (i-sType) { + case TYPE_F32: + if (i-saturate) + res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0, + UINT16_MAX)); + else + res.data.u16 = util_iround(imm0.reg.data.f32); + break; + default: + return; + } This won't get hit for the U32 - U16 conversion though right? Did you test that case? Am I misreading/misunderstanding perhaps? A complete piglit run did not hit i-saturate for U32 or S32. That said, i kept the assert() there on purpose for now to actually make sure we are no hitting such a case. Do i misread you now? :) From my read of the code, we'd hit that case now with TXF on a 2D_ARRAY with a constant as the array element. i.e. a piglit with uniform sampler2DArray foo; texelFetch(foo, ivec3(1, 2, 3)); Tested this (hope i did the right thing) and the assert did not get triggered, but i am still uncertain of this. - move the assert into the F32 case for U32/S32 just to make sure... switch (i-sType) case TYPE_F32: assert(...) ... other than that, we are not even going to fold U32 - U16 ;-) Right, and that's the problem. Try it with a piglit that has the code I suggest... if you don't end up collapsing it, include the TGSI that's generated (and also the shader test source) and we'll go from there. -ilia Haven't found a piglit test triggering that, but i have created a TGSI shader on my own. That's the only reason i am writing this email and not just posting the patch. This one is collapsed just fine though. FRAG DCL OUT[0..2], COLOR DCL CONST[0..2] DCL TEMP[0..2], LOCAL IMM[0] FLT32 { 1, 2, 3, 4} IMM[1] UINT32 { 5, 6, 7, 8} 0: TEX TEMP[0], IMM[0], SAMP[0], 2D_ARRAY 1: TXF TEMP[1], IMM[1], SAMP[0], 2D_ARRAY 2: MOV OUT[0], TEMP[0] 3: MOV OUT[1], TEMP[1] 4: END Greetings, Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH v4] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32-(U{16/32}, S{16/32}) (U{16/32}, {S16/32})-F32 U32 - U16 Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- V2: Split out F64 parts V3: remove handling of saturate for (U/S)32 V4: handle U32-U16 for OP_TXF .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 79 ++ 1 file changed, 79 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 21d20ca..235aed9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -997,6 +997,85 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue imm0, int s) i-op = OP_MOV; break; } + case OP_CVT: { + Storage res; + bld.setPosition(i, true); /* make sure bld is init'ed */ + switch(i-dType) { + case TYPE_U16: + switch (i-sType) { + case TYPE_F32: +if (i-saturate) + res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0, +UINT16_MAX)); +else + res.data.u16 = util_iround(imm0.reg.data.f32); +break; + case TYPE_U32: +if (i-saturate) + res.data.u16 = CLAMP(imm0.reg.data.u32, 0, UINT16_MAX); +else + res.data.u16 = imm0.reg.data.u32; +break; + default: +return; + } + i-setSrc(0, bld.mkImm(res.data.u16)); + break; + case TYPE_U32: + assert(!i-saturate); + switch (i-sType) { + case TYPE_F32: +res.data.u32 = util_iround(imm0.reg.data.f32); +break; + default: +return; + } + i-setSrc(0, bld.mkImm(res.data.u32)); + break; + case TYPE_S16: + switch (i-sType) { + case TYPE_F32: +if (i-saturate) + res.data.s16 = util_iround(CLAMP(imm0.reg.data.f32, INT16_MIN, +INT16_MAX)); +else + res.data.s16 = util_iround(imm0.reg.data.f32); +break; + default: +return; + } + i-setSrc(0, bld.mkImm(res.data.s16)); + break; + case TYPE_S32: + assert(!i-saturate); + switch (i-sType) { + case TYPE_F32: +res.data.s32 = util_iround(imm0.reg.data.f32); +break; + default: +return; + } + i-setSrc(0, bld.mkImm(res.data.s32)); + break; + case TYPE_F32: + switch (i-sType) { + case TYPE_U16: res.data.f32 = (float) imm0.reg.data.u16; break; + case TYPE_U32: res.data.f32 = (float) imm0.reg.data.u32; break; + case TYPE_S16: res.data.f32 = (float) imm0.reg.data.s16; break; + case TYPE_S32: res.data.f32 = (float) imm0.reg.data.s32; break; + default: +return; + } + i-setSrc(0, bld.mkImm(res.data.f32)); + break; + default: + return; + } + i-setType(i-dType); /* Remove i-sType, which we don't need anymore */ + i-op = OP_MOV; + i-src(0).mod = Modifier(0); /* Clear the already applied modifier */ + break; + } default: return; } -- 2.2.2 ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH v2] nv50/ir: Handle OP_CVT when folding constant expressions
On 11.01.2015 20:19, Ilia Mirkin wrote: On Sun, Jan 11, 2015 at 12:27 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 11.01.2015 01:58, Ilia Mirkin wrote: On Fri, Jan 9, 2015 at 8:24 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: Folding for conversions: F32-(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})-F32 Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- V2: beat me, whip me, split out F64 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 81 ++ 1 file changed, 81 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 9a0bb60..741c74f 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -997,6 +997,87 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue imm0, int s) i-op = OP_MOV; break; } + case OP_CVT: { + Storage res; + bld.setPosition(i, true); /* make sure bld is init'ed */ + switch(i-dType) { + case TYPE_U16: + switch (i-sType) { + case TYPE_F32: +if (i-saturate) + res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0, +UINT16_MAX)); Where did this saturate stuff come from? It doesn't make sense to saturate to a non-float dtype. I'd go ahead and just assert(!i-saturate) in the int dtype cases. One does wonder what the hw does if the float doesn't fit in the destination... whether it saturates or not. I don't hugely care though. Actually i can't remember why that was added in the first place, i'll go ahead and follow your advice here. Oh wait... this was to support saturating an array access into a u16... const int sat = (i-op == OP_TXF) ? 1 : 0; DataType sTy = (i-op == OP_TXF) ? TYPE_U32 : TYPE_F32; bld.mkCvt(OP_CVT, TYPE_U16, layer, sTy, src)-saturate = sat; So... basically if the source is a U32 and the dest is a U16, we want to saturate there? IMO this is such a minor use-case that it doesn't really matter. However I guess you can keep the saturate bits around if you like. We can do it with or without the saturate if we rely on the test, assert(!i-saturate)'ing is the only thing that breaks the test you sure meant: glsl-resource-not-bound 1DArray glsl-resource-not-bound 2DArray glsl-resource-not-bound 2DMSArray -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] Re: [RFC] mesa/st: Avoid passing a NULL buffer to the drivers
On 11.01.2015 06:05, Ilia Mirkin wrote: Can you elaborate a bit as to why that's the right thing to do? On Wed, Jan 7, 2015 at 1:52 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: If we capture transform feedback from n stream in (n-1) buffers we face a NULL buffer, use the buffer (n-1) to capture the output of stream n. This fixes one piglit test with nvc0: arb_gpu_shader5-xfb-streams-without-invocations Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- src/mesa/state_tracker/st_cb_xformfb.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/mesa/state_tracker/st_cb_xformfb.c b/src/mesa/state_tracker/st_cb_xformfb.c index 8f75eda..5a12da4 100644 --- a/src/mesa/state_tracker/st_cb_xformfb.c +++ b/src/mesa/state_tracker/st_cb_xformfb.c @@ -123,6 +123,11 @@ st_begin_transform_feedback(struct gl_context *ctx, GLenum mode, struct st_buffer_object *bo = st_buffer_object(sobj-base.Buffers[i]); if (bo) { + if (!bo-buffer) +/* If we capture transform feedback from n streams into (n-1) + * buffers we have to write to buffer (n-1) for stream n. + */ +bo = st_buffer_object(sobj-base.Buffers[i-1]); /* Check whether we need to recreate the target. */ if (!sobj-targets[i] || sobj-targets[i] == sobj-draw_count || -- 2.2.1 Quoted from Ilia Mirkin, to specify what shall be elaborated: Can you explain (on-list) why using buffer n - 1 is the right thing to do to capture output of stream n? I would have thought that the output for that stream should be discarded or something. Like with a spec quotation or some other justification. i.e. why is the code you wrote correct? Why is it better than, say, bo = buffers[0], or some other thing entirely? Yeah thats the most concerning point i see as well. The problem is that there is a interaction between arb_gpu_shader5 and arb_transform_feedback3, but after a bit of reading i think the patch is actually what we should do: From the arb_transfrom_feedback3 spec: (3) How might you use transform feedback with geometry shaders and multiple vertex streams? RESOLVED: As a simple example, let's say you are processing triangles and capture both processed triangle vertices and some values that are computed per-primitive (e.g., facet normal). The geometry shader might declare its outputs like the following: layout(stream = 0) out vec4 position; layout(stream = 0) out vec4 texcoord; layout(stream = 1) out vec4 normal; position and texcoord would be per-vertex attributes written to vertex stream 0; normal would be a per-triangle facet normal. The geometry shader would emit three vertices to stream zero (the processed input vertices) and a single vertex to stream one (the per-triangle data). The transform feedback API usage for this case would be something like: // Set up buffer objects 21 and 22 to capture data for per-vertex and // per primitive values. glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, 21); glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 1, 22); // Set up XFB to capture position and texcoord to buffer binding // point 0 (buffer 21 bound), and normal to binding point 1 (buffer // 22 bound). char *strings[] = { position, texcoord, gl_NextBuffer, normal }; - Especially the comments are enlightening as to where the outputs should go. Thats what happens with the arb_gpu_shader5-xfb-streams-without-invocations test, where two stream(outputs) are captured into one buffer. One might argue now if we have to count .Buffers[i-1] for all buffers after this... Comments and additional feedback is always appreciated! Greetings, Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
On 11.01.2015 22:54, Ilia Mirkin wrote: On Sun, Jan 11, 2015 at 4:40 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: Folding for conversions: F32-(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})-F32 Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- V2: Split out F64 parts V3: remove handling of saturate for (U/S)32, .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 73 ++ 1 file changed, 73 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 21d20ca..aaf0d0d 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -997,6 +997,79 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue imm0, int s) i-op = OP_MOV; break; } + case OP_CVT: { + Storage res; + bld.setPosition(i, true); /* make sure bld is init'ed */ + switch(i-dType) { + case TYPE_U16: + switch (i-sType) { + case TYPE_F32: +if (i-saturate) + res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0, +UINT16_MAX)); +else + res.data.u16 = util_iround(imm0.reg.data.f32); +break; + default: +return; + } This won't get hit for the U32 - U16 conversion though right? Did you test that case? Am I misreading/misunderstanding perhaps? A complete piglit run did not hit i-saturate for U32 or S32. That said, i kept the assert() there on purpose for now to actually make sure we are no hitting such a case. Do i misread you now? :) ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
On 11.01.2015 23:12, Ilia Mirkin wrote: On Sun, Jan 11, 2015 at 5:08 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: On 11.01.2015 22:54, Ilia Mirkin wrote: On Sun, Jan 11, 2015 at 4:40 PM, Tobias Klausmann tobias.johannes.klausm...@mni.thm.de wrote: Folding for conversions: F32-(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})-F32 Signed-off-by: Tobias Klausmann tobias.johannes.klausm...@mni.thm.de --- V2: Split out F64 parts V3: remove handling of saturate for (U/S)32, .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 73 ++ 1 file changed, 73 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 21d20ca..aaf0d0d 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -997,6 +997,79 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue imm0, int s) i-op = OP_MOV; break; } + case OP_CVT: { + Storage res; + bld.setPosition(i, true); /* make sure bld is init'ed */ + switch(i-dType) { + case TYPE_U16: + switch (i-sType) { + case TYPE_F32: +if (i-saturate) + res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0, +UINT16_MAX)); +else + res.data.u16 = util_iround(imm0.reg.data.f32); +break; + default: +return; + } This won't get hit for the U32 - U16 conversion though right? Did you test that case? Am I misreading/misunderstanding perhaps? A complete piglit run did not hit i-saturate for U32 or S32. That said, i kept the assert() there on purpose for now to actually make sure we are no hitting such a case. Do i misread you now? :) From my read of the code, we'd hit that case now with TXF on a 2D_ARRAY with a constant as the array element. i.e. a piglit with uniform sampler2DArray foo; texelFetch(foo, ivec3(1, 2, 3)); Tested this (hope i did the right thing) and the assert did not get triggered, but i am still uncertain of this. - move the assert into the F32 case for U32/S32 just to make sure... switch (i-sType) case TYPE_F32: assert(...) ... other than that, we are not even going to fold U32 - U16 ;-) Greetings, Tobias ___ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau