[Nouveau] [PATCH] nouveau: expose BO domain in the public API

2010-08-12 Thread Luca Barbieri
This can allow drivers to make better choices. Since it is just a field appended to a struct, compatibility is preserved. --- nouveau/nouveau_bo.c |4 ++-- nouveau/nouveau_bo.h |3 +++ nouveau/nouveau_private.h |1 - nouveau/nouveau_pushbuf.c |2 +- 4 files changed, 6

Re: [Nouveau] nvfx

2010-07-24 Thread Luca Barbieri
On Fri, Jul 23, 2010 at 7:01 PM, Patrice Mandin mandin.patr...@orange.fr wrote: On Fri, 18 Jun 2010 18:43:27 +0200, Marek Olšák mar...@gmail.com wrote: On Fri, Jun 18, 2010 at 6:05 PM, Patrice Mandin mandin.patr...@orange.fr wrote: On Thu, 17 Jun 2010 03:35:19 +0200 Marek Olšák

Re: [Nouveau] [PATCH] Support writing out the pushbuffer in renouveau trace format (v2)

2010-04-13 Thread Luca Barbieri
Simply putting the dump in the renouveau directory where a renouveau dump was taken previously seems to work for me (probably because we use the same handle values as nVidia?). But yes, the tools should be improved here and dumping the objclass of the grobjs would be necessary for that.

[Nouveau] [PATCH] Support writing out the pushbuffer in renouveau trace format

2010-04-12 Thread Luca Barbieri
This patch causes libdrm, when NOUVEAU_DUMP=1 is set, to write the pushbuffer to stdout instead of submitting it to the card. renouveau-parse can then be used to parse it and obtain a readable trace. This is very useful for debugging and optimizing the Gallium driver. ---

[Nouveau] [PATCH] Support writing out the pushbuffer in renouveau trace format (v2)

2010-04-12 Thread Luca Barbieri
Changes in v2: - Unmap buffers we mapped, avoid assertion - Silence warnings This patch causes libdrm, when NOUVEAU_DUMP=1 is set, to write the pushbuffer to stdout instead of submitting it to the card. renouveau-parse can then be used to parse it and obtain a readable trace. This is very

Re: [Nouveau] [Mesa3d-dev] [radeonhd] Re: Status of s3tc patent in respect to open-source drivers and workarounds

2010-03-29 Thread Luca Barbieri
Interestingly, the post-trial judge opinion at http://wi.findacase.com/research/wfrmDocViewer.aspx/xq/fac.%5CFDCT%5CWWI%5C2008%5C20080801_734.WWI.htm/qx contains the following text: Plaintiff’s expert, Dr. Stevenson, testified that the ‘327 patent is directed to “a special purpose hardware

Re: [Nouveau] [Mesa3d-dev] Status of s3tc patent in respect to open-source drivers and workarounds

2010-03-28 Thread Luca Barbieri
If the application provides s3tc-encoded data through glCompressedTexImage (usually loaded from a pre-compressed texture stored on disk), Mesa will pass it unaltered to the graphics card (as long as the driver/card supports DXT* format ids) and will not need to use any encoding or decoding

[Nouveau] [PATCH] nv40: remove leftover nv40_transfer.c from unification into nvfx

2010-03-15 Thread Luca Barbieri
--- src/gallium/drivers/nv40/nv40_transfer.c | 181 -- 1 files changed, 0 insertions(+), 181 deletions(-) delete mode 100644 src/gallium/drivers/nv40/nv40_transfer.c diff --git a/src/gallium/drivers/nv40/nv40_transfer.c b/src/gallium/drivers/nv40/nv40_transfer.c

Re: [Nouveau] Interrupt setting

2010-03-13 Thread Luca Barbieri
So a GPU itself updates the sequence # of each fence in a specific register, and we can let the Nouveau driver wait for a target value to be written. Do you know when the value is actually written? When the FIFO command instructing the GPU to do the write is executed. If it is written when

Re: [Nouveau] Interrupt setting

2010-03-13 Thread Luca Barbieri
Since you create one fence object for each pushbuf, I thought that we can synchronize only with the last command. Not sure if my assumption is correct... All the commands in the pushbuffer are executed sequentially and the fence setting command is written at the end of the pushbuffer, so when

[Nouveau] [PATCH] nv30/nv40 Gallium drivers unification

2010-03-13 Thread Luca Barbieri
Currently the nv30 and nv40 Gallium drivers are very similar, and contain about 5000 lines of essentially duplicate code. I prepared a patchset (which can be found at http://repo.or.cz/w/mesa/mesa-lb.git/shortlog/refs/heads/unification+fixes) which gradually unifies the drivers, one file per the

Re: [Nouveau] Gallium driver and compatibility issues

2010-03-12 Thread Luca Barbieri
It is not surprising that some (or most) 3D applications don't actually work correctly with nouveau on nv3x right now. The driver will probably improve in the future. ___ Nouveau mailing list Nouveau@lists.freedesktop.org

[Nouveau] Lost R300/NV40 development work you may have

2010-03-11 Thread Luca Barbieri
regards, Luca Barbieri

Re: [Nouveau] making 0.0.16 into 1.0.0

2010-03-05 Thread Luca Barbieri
Another possible reason for breaking ABI that hasn't yet been mentioned is the fact that right now any DRM client can trivially lock up the GPU and/or corrupt GPU/GART memory belonging to other clients. This happens often with GL driver bugs and is quite annoying for developers and testers of

Re: [Nouveau] [PATCH] renouveau/nv10: remove duplicate vertex buffer registers

2010-03-01 Thread Luca Barbieri
On Mon, Mar 1, 2010 at 2:34 AM, Francisco Jerez curroje...@riseup.net wrote: Luca Barbieri l...@luca-barbieri.com writes: NV10TCL defines the vertex buffer registers both as arrays and as individual named registers. This causes duplicate register definitions and the individual registers

[Nouveau] [PATCH 3/5] renouveau/nv40: set NV40TCL_LINE_STIPPLE_PATTERN to hexa like nv30

2010-02-26 Thread Luca Barbieri
--- renouveau.xml |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/renouveau.xml b/renouveau.xml index 2b6e8d7..d305a8e 100644 --- a/renouveau.xml +++ b/renouveau.xml @@ -4271,7 +4271,7 @@ reg32 offset=0x1db4 name=LINE_STIPPLE_ENABLE type=boolean/ reg32

[Nouveau] [PATCH 4/5] renouveau/nv30: remove clip planes #6 and #7

2010-02-26 Thread Luca Barbieri
These are defined for nv30 and not nv40, and they probably don't exist in the hardware. Both DirectX and OpenGL nVidia drivers support only 6 clip planes on pre-nv50 hardware. Neither the DDX nor the Gallium driver supports user clip planes at all on nv30. This makes the definition the same as

[Nouveau] [PATCH] drm/nouveau: fix missing spin_unlock in failure path

2010-02-20 Thread Luca Barbieri
Found by sparse. Signed-off-by: Luca Barbieri l...@luca-barbieri.com --- drivers/gpu/drm/nouveau/nouveau_gem.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index 03d8935..d7ace31 100644

[Nouveau] [PATCH 1/3] Introduce nouveau_bo_wait for waiting on a BO with a GPU channel (v2)

2010-02-09 Thread Luca Barbieri
Changes in v2: - Addressed review comments nouveau_bo_wait will make the GPU channel wait for fence if possible, otherwise falling back to waiting with the CPU using ttm_bo_wait. The nouveau_fence_sync function currently returns -ENOSYS, and is the focus of the next patch. Signed-off-by: Luca

[Nouveau] [PATCH 2/3] drm/nouveau: add lockless dynamic semaphore allocator (v2)

2010-02-09 Thread Luca Barbieri
modified by the GPU. This is performed by storing a bitmask that allows alternating between the values 0 and 1 for a given semaphore. Signed-off-by: Luca Barbieri l...@luca-barbieri.com --- drivers/gpu/drm/nouveau/nouveau_drv.h |9 + drivers/gpu/drm/nouveau/nouveau_fence.c | 265

[Nouveau] [PATCH 3/3] Use semaphores for fully on-GPU interchannel synchronization (v2)

2010-02-09 Thread Luca Barbieri
ACQUIRE or RELEASE is used. On the waiting channel, a fence is also emitted. Once that fence expires, the semaphore is released and can be reused for any purpose. This results in synchronization taking place fully on the GPU, with no CPU waiting necessary. Signed-off-by: Luca Barbieri l...@luca

Re: [Nouveau] [PATCH 1/2] libdrm/nouveau: new optimized libdrm pushbuffer ABI

2010-02-08 Thread Luca Barbieri
IMO, the changes are good.  However, DRM_NOUVEAU_HEADER_PATCHLEVEL is used to indicate the version of the kernel interface that's supported, and not the libdrm API version. OK. Perhaps it would be useful to add a libdrm API version number as well?

[Nouveau] [PATCH] drm/nouveau: enlarge GART aperture (v2)

2010-02-08 Thread Luca Barbieri
sophisticated approach may be preferable. Could anyone with an nv04 test whether this doesn't break there? Signed-off-by: Luca Barbieri l...@luca-barbieri.com --- drivers/gpu/drm/nouveau/nouveau_sgdma.c | 14 -- 1 files changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm

[Nouveau] [PATCH 1/3] Introduce nouveau_bo_wait for waiting on a BO with a GPU channel

2010-02-01 Thread Luca Barbieri
nouveau_bo_wait will make the GPU channel wait for fence if possible, otherwise falling back to waiting with the CPU using ttm_bo_wait. The nouveau_fence_sync function currently returns -ENOSYS, and is the focus of the next patch. Signed-off-by: Luca Barbieri l...@luca-barbieri.com --- drivers
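The fallback described in this message might look roughly like the sketch below. All names here (`sketch_fence_sync`, `sketch_cpu_wait`, `sketch_bo_wait`) are hypothetical stand-ins rather than the kernel functions; the only details taken from the message are that the GPU-side sync currently returns -ENOSYS and that waiting with the CPU is the fallback.

```c
#include <errno.h>

/* Hypothetical stand-in for nouveau_fence_sync: GPU-side waiting is not
 * implemented yet, so it reports -ENOSYS as the message describes. */
static int sketch_fence_sync(void)
{
	return -ENOSYS;
}

/* Hypothetical stand-in for the CPU wait (ttm_bo_wait in the message).
 * The counter only exists so the fallback can be observed. */
static int cpu_waits;

static int sketch_cpu_wait(void)
{
	cpu_waits++;
	return 0;
}

/* Try to make the GPU channel wait; fall back to the CPU only when the
 * GPU path is unavailable. Any other result is passed to the caller. */
static int sketch_bo_wait(void)
{
	int ret = sketch_fence_sync();

	if (ret != -ENOSYS)
		return ret;
	return sketch_cpu_wait();
}
```

Once the GPU path stops returning -ENOSYS, a wrapper of this shape would stop touching the CPU wait without any change to its callers.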

[Nouveau] [PATCH 2/3] drm/nouveau: add lockless dynamic semaphore allocator

2010-02-01 Thread Luca Barbieri
. This is done by adding fields to nouveau_fence. Semaphore values are zeroed when the semaphore BO is allocated, and are afterwards only modified by the GPU. This is performed by storing a bitmask that allows alternating between the values 0 and 1 for a given semaphore. Signed-off-by: Luca

Re: [Nouveau] [PATCH 2/3] drm/nouveau: add lockless dynamic semaphore allocator

2010-02-01 Thread Luca Barbieri
How often do we expect cross-channel sync to kick in? Maybe 2-3 times per frame? I suspect contentions will be rare enough to make spinlocks as fast as atomics for all real-life cases, and they don't have such a high maintainability cost. What do you guys think? For the case of a single (or a

Re: [Nouveau] [PATCH 2/3] drm/nouveau: add lockless dynamic semaphore allocator

2010-02-01 Thread Luca Barbieri
Sounds like premature optimization to me. I'm just stating my personal view here, but I have a feeling a patch with 60% of lines could do very well the same for most realistic cases. Perhaps, but really, the only thing you would probably save by using spinlocks in the fast path is retrying in

[Nouveau] [PATCH] drm/nouveau: dehexify nv50_fifo.c

2010-01-30 Thread Luca Barbieri
--- drivers/gpu/drm/nouveau/nv50_fifo.c | 68 +- 1 files changed, 34 insertions(+), 34 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nv50_fifo.c b/drivers/gpu/drm/nouveau/nv50_fifo.c index 32b244b..f0cba1e 100644 --- a/drivers/gpu/drm/nouveau/nv50_fifo.c

[Nouveau] [PATCH] drm/nouveau: dehexify nv50_fifo.c (v2)

2010-01-30 Thread Luca Barbieri
Merged the two patches and added signoff. Signed-off-by: Luca Barbieri l...@luca-barbieri.com --- drivers/gpu/drm/nouveau/nv50_fifo.c | 84 +- 1 files changed, 42 insertions(+), 42 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nv50_fifo.c b/drivers/gpu/drm

[Nouveau] [PATCH 1/2] libdrm/nouveau: new optimized libdrm pushbuffer ABI

2010-01-29 Thread Luca Barbieri
This patch changes the pushbuffer ABI to: 1. No longer use/expose nouveau_pushbuffer. Everything is directly in nouveau_channel. This saves the extra pushbuf pointer dereference. 2. Use cur/end pointers instead of tracking the remaining size. Pushing data now only needs to alter cur and
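A minimal sketch of the cur/end idea from item 2, assuming a plain array of 32-bit words as the pushbuffer. The struct and function names are illustrative and are not the libdrm ABI; only the use of two pointers instead of a remaining-size counter comes from the description.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical channel layout: only the cur/end idea comes from the patch
 * description; the struct and function names are illustrative. */
struct sketch_channel {
	uint32_t *cur;	/* next free word in the pushbuffer */
	uint32_t *end;	/* one past the last usable word */
};

/* Number of words that can still be pushed. */
static size_t sketch_space(const struct sketch_channel *chan)
{
	return (size_t)(chan->end - chan->cur);
}

/* Push one word; returns 0 on success, -1 if the buffer is full. */
static int sketch_push(struct sketch_channel *chan, uint32_t data)
{
	if (chan->cur == chan->end)
		return -1;
	*chan->cur++ = data;
	return 0;
}
```

With this layout the free space is derived from the two pointers on demand, so a push writes one word and advances one pointer, with no separate counter to keep in sync.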

[Nouveau] [PATCH 2/2] libdrm/nouveau: support writing out the pushbuffer in renouveau trace format

2010-01-29 Thread Luca Barbieri
This patch causes libdrm, when NOUVEAU_DUMP=1 is set, to write the pushbuffer to stdout instead of submitting it to the card. renouveau-parse can then be used to parse it and obtain a readable trace. This is very useful for debugging and optimizing the Gallium driver. ---

Re: [Nouveau] [PATCH] drm/nouveau: call ttm_bo_wait with the bo lock held to prevent hang

2010-01-28 Thread Luca Barbieri
Please apply or state objections to this patch. Thanks.

[Nouveau] [PATCH] drm/nouveau: enlarge GART aperture

2010-01-28 Thread Luca Barbieri
-off-by: Luca Barbieri l...@luca-barbieri.com --- drivers/gpu/drm/nouveau/nouveau_sgdma.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c index 4c7f1e4..2ca44cc 100644 --- a/drivers/gpu/drm

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Luca Barbieri
At a first glance: 1) We probably *will* need a delayed destroyed workqueue to avoid wasting memory that otherwise should be freed to the system. At the very least, the delayed delete process should optionally be run by a system shrinker. You are right. For VRAM we don't care since we are the

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Luca Barbieri
Nvidia cards have a synchronization primitive that could be used to synchronize several FIFOs in hardware (AKA semaphores, see [1] for an example). Does this operate wholly on the GPU on all nVidia cards? It seems that at least on some GPUs this will trigger software methods that are

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Luca Barbieri
I'm not sure I understand your proposal correctly. It seems your proposal is similar to mine, replacing the term fence nodes with ttm transactions, but I'm not sure if I understand it correctly Here is some pseudocode for an improved, simplified version of my proposal. It is modified so that

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Luca Barbieri
If not, it could possibly be hacked around by reading from a DMA object at the address of the fence sequence number and then resizing the DMA object so that addresses from a certain point on would trigger a protection fault interrupt. I don't think you can safely modify a DMA object without

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-20 Thread Luca Barbieri
Yes it's fine. I sent your patch to Dave with an expanded commit comment for merging. Here is a possible redesign of the mechanism inspired by this issue. It seems that what we are racing against is buffer eviction, due to delayed deletion buffers being still kept on the LRU list. I'm wondering

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-20 Thread Luca Barbieri
Also note that the delayed delete list is not in fence order but in deletion-time order, which perhaps gives room for more optimizations. You are right. I think then that ttm_bo_delayed_delete may still need to be changed, because it stops when ttm_bo_cleanup_refs returns -EBUSY, which happens

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-20 Thread Luca Barbieri
When designing this, we should also keep in mind that some drivers (e.g. nouveau) have multiple FIFO channels, and thus we would like a buffer to be referenced for reading by multiple channels at once (and be destroyed only when all fences are expired, obviously). Also, hardware may support on-GPU

[Nouveau] [PATCH 1/2] nv30-nv40: Rewrite primitive splitting and emission

2010-01-18 Thread Luca Barbieri
The current code for primitive splitting and emission on pre-nv50 is severely broken. In particular: 1. Quads and lines are totally broken because "&= 3" should be "&= ~3" and similar for lines 2. Triangle fans and polygons are broken because the first vertex must be repeated for each split chunk 3.
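Assuming item 1 refers to masking a vertex count so that only whole quads are emitted, the difference between the two masks can be sketched as follows. The helper names are hypothetical and this is not the driver's code.

```c
/* Round a vertex count down to whole quads (4 vertices each).
 * Hypothetical helper, not the driver's code. */
static unsigned sketch_whole_quads(unsigned count)
{
	return count & ~3u;
}

/* The form item 1 calls broken: this keeps only the leftover vertices
 * instead of discarding them. */
static unsigned sketch_buggy_quads(unsigned count)
{
	return count & 3u;
}
```

For 10 vertices the first helper gives 8 (two whole quads), while the second gives 2, which is the remainder rather than the usable count.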

[Nouveau] [PATCH 2/2] nv40: output relocations on draw calls and not on flushes

2010-01-18 Thread Luca Barbieri
Currently we emit relocations on pushbuffer flushes. However, this is wrong, because the pushbuffer flushes may be due to 2D calls. In particular, this leads to "-22: validating while mapped" errors in dmesg, since the current vertex buffer can be mapped while a non-draw (e.g. surface_copy) call is

[Nouveau] [PATCH] nv40: add support for ARB_half_float_vertex

2010-01-18 Thread Luca Barbieri
This requires the arb_half_float_vertex Mesa branch, plus some unreleased gallium support work by Dave Airlie. You may need to fix an assertion in st_pipe_vertex_format too. --- src/gallium/drivers/nv40/nv40_vbo.c | 14 ++ 1 files changed, 14 insertions(+), 0 deletions(-) diff

[Nouveau] [PATCH] nv40: add missing vertprog setcond instructions

2010-01-18 Thread Luca Barbieri
Trivially adds SEQ, SGT, SLE, SNE, SFL, STR and SSG which were missing. --- src/gallium/drivers/nv40/nv40_vertprog.c | 21 + 1 files changed, 21 insertions(+), 0 deletions(-) diff --git a/src/gallium/drivers/nv40/nv40_vertprog.c b/src/gallium/drivers/nv40/nv40_vertprog.c

[Nouveau] [PATCH] nv40: add missing vertprog setcond instructions (v2)

2010-01-18 Thread Luca Barbieri
Trivially adds SEQ, SGT, SLE, SNE, SFL, STR and SSG which were missing. Changed to preserve alphabetical order of cases. --- src/gallium/drivers/nv40/nv40_vertprog.c | 21 + 1 files changed, 21 insertions(+), 0 deletions(-) diff --git

[Nouveau] [PATCH] nv30-nv40: support unlimited queries (v2)

2010-01-18 Thread Luca Barbieri
Currently on NV30/NV40 an assert will be triggered once 32 queries are outstanding. This violates the OpenGL/Gallium interface, which requires support for an unlimited number of fences. This patch fixes the problem by putting queries in a linked list and waiting on the oldest one if allocation

Re: [Nouveau] [Mesa3d-dev] [PATCH 2/2] st: don't assert on empty fragment program

2010-01-18 Thread Luca Barbieri
Breakpoint 3, _mesa_ProgramStringARB (target=34820, format=34933, len=70, string=0x85922ba) at shader/arbprogram.c:434 434 GET_CURRENT_CONTEXT(ctx); $31 = 0x85922ba "!!ARBfp1.0\n\nOPTION ARB_precision_hint_fastest;\n\n\n\nEND\n" Not sure why Sauerbraten does this, but it

Re: [Nouveau] [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

2010-01-18 Thread Luca Barbieri
If you get this patch in, then you'll still have to fight with every other state tracker that doesn't prettify their TGSI. It would be a much better approach to attempt to RE the routing tables. I don't think there are any users of the Gallium interface that need more than 8 vertex

[Nouveau] [PATCH] libdrm/nouveau: Support nested bo mapping

2010-01-17 Thread Luca Barbieri
Most Gallium drivers support nested mapping by using a reference count. We don't, and swtnl fallback triggers an error due to this. This patch adds this support in libdrm. --- nouveau/nouveau_bo.c |8 +++- nouveau/nouveau_private.h |1 + 2 files changed, 8 insertions(+), 1
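Reference-counted nested mapping of the kind mentioned here can be sketched as below. The struct layout and the names `sketch_bo`, `sketch_bo_map` and `sketch_bo_unmap` are assumptions made for illustration; the actual libdrm change is not shown in this excerpt.

```c
#include <stddef.h>

/* Hypothetical buffer object; field and function names are illustrative. */
struct sketch_bo {
	void *map;		/* CPU pointer while mapped, NULL otherwise */
	unsigned map_refcnt;	/* number of outstanding map calls */
	char storage[64];	/* stands in for the real backing memory */
};

/* Map the buffer; nested calls just bump the count and reuse the pointer. */
static void *sketch_bo_map(struct sketch_bo *bo)
{
	if (bo->map_refcnt++ == 0)
		bo->map = bo->storage;
	return bo->map;
}

/* Unmap; the mapping is only torn down when the last user is done. */
static void sketch_bo_unmap(struct sketch_bo *bo)
{
	if (bo->map_refcnt == 0)
		return;		/* unbalanced unmap: ignored in this sketch */
	if (--bo->map_refcnt == 0)
		bo->map = NULL;
}
```

A caller that maps a buffer already mapped by an outer caller gets the same pointer back, and the outer caller's pointer stays valid until its own unmap.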

[Nouveau] [PATCH 1/2] nv40: don't crash on empty fragment program

2010-01-17 Thread Luca Barbieri
--- src/gallium/drivers/nv40/nv40_fragprog.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/src/gallium/drivers/nv40/nv40_fragprog.c b/src/gallium/drivers/nv40/nv40_fragprog.c index 1237066..209d211 100644 --- a/src/gallium/drivers/nv40/nv40_fragprog.c +++

[Nouveau] [PATCH 2/2] st: don't assert on empty fragment program

2010-01-17 Thread Luca Barbieri
Sauerbraten triggers this assert. --- src/mesa/state_tracker/st_atom_shader.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_atom_shader.c b/src/mesa/state_tracker/st_atom_shader.c index 176f3ea..fce533a 100644 ---

[Nouveau] [PATCH 1/2] nv30-nv40: support unlimited queries

2010-01-17 Thread Luca Barbieri
Currently on NV30/NV40 an assert will be triggered once 32 queries are outstanding. This violates the OpenGL/Gallium interface, which requires support for an unlimited number of fences. This patch fixes the problem by putting queries in a linked list and waiting on the oldest one if allocation

[Nouveau] [PATCH 2/2] nv30/nv40: allocate a bigger block for queries

2010-01-17 Thread Luca Barbieri
This patch allocates a bigger chunk of memory to store queries in, increasing the (hidden) outstanding query limit from 32 to 125. It also tries to make use of a 16KB notifier block if the kernel supports that. The blob supports 1024 queries due to their 16KB query block and 16-byte rather than

[Nouveau] [PATCH] drm/nouveau: Evict buffers in VRAM before freeing sgdma

2010-01-16 Thread Luca Barbieri
happen because there aren't any buffers on close. However, if the GPU is locked up, this condition is easily triggered. This patch fixes it in the simplest way possible by cleaning VRAM right before cleaning SGDMA memory. Signed-off-by: Luca Barbieri l...@luca-barbieri.com --- drivers/gpu/drm

[Nouveau] More on GART vertex buffer corruption

2010-01-14 Thread Luca Barbieri
I looked a bit more into the problem of vertex corruption with GART vertex buffers that I'm experiencing on my card, which disappears when the buffers are put in VRAM. The system I'm seeing this on is a Dell Inspiron 9400 notebook with a GeForce Go 7900 GS on a PCI Express Intel i945 chipset. First,

Re: [Nouveau] [PATCH] drm/nouveau: Check pushbuffer bounds in system call

2010-01-13 Thread Luca Barbieri
Any issues with this patch?

[Nouveau] [PATCH] nv40: Correct zsa so_new size

2010-01-13 Thread Luca Barbieri
Triggered by Doom 3. --- src/gallium/drivers/nv40/nv40_state.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/src/gallium/drivers/nv40/nv40_state.c b/src/gallium/drivers/nv40/nv40_state.c index ed0ca9e..4e3a61f 100644 --- a/src/gallium/drivers/nv40/nv40_state.c +++

[Nouveau] [PATCH] nv20-nv40: Add support for two sided color

2010-01-13 Thread Luca Barbieri
This patch adds support for two-sided vertex color to NV20, NV30 and NV40. When set, the COLOR0/1 fs inputs on back faces will be wired to vs outputs BCOLOR0/1. This makes OpenGL two sided lighting work, which can be tested with progs/demos/projtex. This is already supported on NV50 and seems

Re: [Nouveau] [Discussion] User controls for PowerManagement

2010-01-09 Thread Luca Barbieri
How about taking inspiration from the cpufreq sysfs interface? There are sysfs objects for drm cards at /sys/class/drm/cardnumber. Mine, for instance, is at /sys/class/drm/card0, which links to /sys/devices/pci:00/:00:01.0/:01:00.0/drm/card0. A simple scaling approach could just look

[Nouveau] Findings on pre-NV50 miptree layout

2010-01-08 Thread Luca Barbieri
I wrote a tool for automatically finding out the texture layout for Gallium drivers. You can find it attached to http://sourceforge.net/mailarchive/forum.php?thread_name=ff13bc9a1001081140y18450c3ejdfac25c9260fd367%40mail.gmail.comforum_name=mesa3d-dev . Here are the findings from running it. The

[Nouveau] [PATCH] Fix null deref in ttm_bo_mem_space caused by forgetting to set placement.busy_placement

2010-01-05 Thread Luca Barbieri
Set it to the same value as placement.placement. Triggered by running etracer under compiz. Signed-off-by: Luca Barbieri l...@luca-barbieri.com --- drivers/gpu/drm/nouveau/nouveau_bo.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau

[Nouveau] [PATCH] Fix null deref in nouveau_fence_emit due to deleted fence

2010-01-05 Thread Luca Barbieri
Currently Nouveau will unvalidate all buffers if it is forced to wait on one, and then start revalidating from the beginning. While doing so, it destroys the operation fence, causing nouveau_fence_emit to crash. This patch fixes this bug by taking the fence object out of validate_op and

[Nouveau] [PATCH] Print NOUVEAU_NO_SWIZZLE and NOUVEAU_NO_TRANSFER messages only once

2009-12-31 Thread Luca Barbieri
Currently we are continuously spewing messages about these variables since we call debug_get_bool_option every time we want to check their value. This is annoying, slows things down due to terminal rerendering and obscures useful messages. This patch only calls debug_get_bool_option once
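The "query once, then reuse" pattern described here can be sketched with a function-local static. `sketch_get_bool_option` is a hypothetical stand-in built on `getenv`; it is not Gallium's debug_get_bool_option, and its parsing rule (anything other than "0" counts as true) is an assumption.

```c
#include <stdlib.h>
#include <string.h>

static int sketch_lookups;	/* counts real environment lookups */

/* Hypothetical stand-in for an option query that is expensive or noisy. */
static int sketch_get_bool_option(const char *name, int dfault)
{
	const char *val = getenv(name);

	sketch_lookups++;
	if (val == NULL)
		return dfault;
	return strcmp(val, "0") != 0;
}

/* Query the option on first use only and reuse the cached value. */
static int sketch_no_swizzle(void)
{
	static int cached = -1;	/* -1 means "not queried yet" */

	if (cached < 0)
		cached = sketch_get_bool_option("NOUVEAU_NO_SWIZZLE", 0);
	return cached;
}
```

However often `sketch_no_swizzle` is called, the underlying lookup (and any message it would print) happens a single time. The unsynchronized static is only adequate for single-threaded use.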

[Nouveau] [PATCH] Autogenerate uureg opcode macros

2009-12-31 Thread Luca Barbieri
Also some missing _src()s and cosmetic changes. --- src/gallium/programs/galliumut/Makefile|5 + .../programs/galliumut/gen_uureg_opcodes.sh| 29 +++ src/gallium/programs/galliumut/uureg.h | 196 3 files changed, 71 insertions(+), 159

Re: [Nouveau] [PATCH] Autogenerate uureg opcode macros

2009-12-31 Thread Luca Barbieri
This was supposed to go to mesa3d. On Thu, Dec 31, 2009 at 6:24 PM, Luca Barbieri l...@luca-barbieri.com wrote: Also some missing _src()s and cosmetic changes. --- src/gallium/programs/galliumut/Makefile|5 + .../programs/galliumut/gen_uureg_opcodes.sh| 29 +++ src

[Nouveau] [PATCH] Correct miptree layout for cubemaps on NV20-NV40

2009-12-30 Thread Luca Barbieri
It seems that the current miptree layout is incorrect because the size of all the levels of each cube map face must be 64-byte aligned. This patch fixes piglit cubemap and fbo-cubemap which were broken. This makes sense since otherwise all the levels would no longer be 64-byte aligned, which the

[Nouveau] [PATCH] Correct swizzled surfaces patch

2009-12-29 Thread Luca Barbieri
My swizzling fix incorrectly used the dimensions of the copy rectangle instead of that of the destination surface. This patch fixes that. diff --git a/src/gallium/drivers/nv04/nv04_surface_2d.c b/src/gallium/drivers/nv04/nv04_surface_2d.c index ca0c433..481315e 100644 ---

[Nouveau] [PATCH] Fix glTexSubImage on swizzled surfaces on <=NV40

2009-12-29 Thread Luca Barbieri
Currently in nvXX_transfer_new a temporary as large as the surface is created. If the subrectangle is not the whole texture we would need to read back the whole texture, but we aren't. Thus, everything but the subrectangle specified is loaded as garbage. This can be seen in progs/demos/ray. This

Re: [Nouveau] [PATCH] Fix glTexSubImage on swizzled surfaces on <=NV40

2009-12-29 Thread Luca Barbieri
Ignore that patch. It's broken because we must set the offset for the up to 1024x1024 chunk we are copying instead of the whole image. The corrected patch is attached. diff --git a/src/gallium/drivers/nv04/nv04_surface_2d.c b/src/gallium/drivers/nv04/nv04_surface_2d.c index ca0c433..3193086 100644

Re: [Nouveau] [PATCH] Fix glTexSubImage on swizzled surfaces on <=NV40

2009-12-29 Thread Luca Barbieri
Third attempt, as the second one was logically wrong. The problem in the first patch was actually that the source point register has a 1024 limit. This one leaves the way the source is set up alone, and sets the whole surface as the destination like in the first version, using the point registers

[Nouveau] [PATCH] Fix surface_fill alpha

2009-12-29 Thread Luca Barbieri
Currently surface_fill sets alpha incorrectly to 1.0 when drawing to A8R8G8B8 instead of the correct value. xf86-video-nouveau has the following comment confirming the issue: /* When SURFACE_FORMAT_A8R8G8B8 is used with GDI_RECTANGLE_TEXT, the * alpha channel gets forced to 0xFF

Re: [Nouveau] Synchronization mostly missing?

2009-12-28 Thread Luca Barbieri
It looks like there are two bugs. One seems related to some kind of GPU cache of GART memory which does not get flushed, causes significant corruption and is worked around by putting buffers in VRAM, software TNL or immediate submission. It may be related to the NV40TCL_VTX_CACHE_INVALIDATE which

Re: [Nouveau] Synchronization mostly missing?

2009-12-28 Thread Luca Barbieri
It looks like there are two bugs. One seems related to some kind of cache of GART memory which does not get flushed, causes significant corruption and is worked around by putting buffers in VRAM. For some reason, adding syncing instead of putting buffers in VRAM does seem to greatly reduce the

[Nouveau] Synchronization mostly missing?

2009-12-27 Thread Luca Barbieri
It seems that Nouveau is assuming that once the FIFO pointer is past a command, that command has finished executing, and all the buffers it used are no longer needed. However, this seems to be false at least on G71. In particular, the card may not have even finished reading the input vertex

Re: [Nouveau] Synchronization mostly missing?

2009-12-27 Thread Luca Barbieri
I figured out the registers. There is a fence/sync mechanism which apparently triggers after rendering is finished. There are two ways to use it, but they trigger at the same time (spinning in a loop on the CPU checking them, they trigger at the same iteration or in two successive iterations).

Re: [Nouveau] Synchronization mostly missing?

2009-12-27 Thread Luca Barbieri
Can you reproduce this with your vertex buffers in VRAM instead of GART? (to rule out that it's a fencing issue). Putting the vertex buffers in VRAM makes things almost perfect, but still with rare artifacts. In particular, the yellow arrow in dinoshade sometimes becomes a yellow polygon on the

Re: [Nouveau] [MESA PATCH] Fix nv40_miptree_layout pitch

2009-12-26 Thread Luca Barbieri
On Sun, Dec 27, 2009 at 2:25 AM, Younes Manton youne...@gmail.com wrote: On Sat, Dec 26, 2009 at 1:22 AM, Luca Barbieri l...@luca-barbieri.com wrote: I just coded a patch that does this and seems to work fine. It must be fixed since it breaks OpenGL (or the state tracker can be changed

Re: [Nouveau] Fix swizzling for copies to rectangular textures

2009-12-26 Thread Luca Barbieri
Patch was mangled, resent attached. diff --git a/src/gallium/drivers/nv04/nv04_surface_2d.c b/src/gallium/drivers/nv04/nv04_surface_2d.c index 12df7fd..40b538f 100644 --- a/src/gallium/drivers/nv04/nv04_surface_2d.c +++ b/src/gallium/drivers/nv04/nv04_surface_2d.c @@ -77,7 +77,7 @@

[Nouveau] [MESA PATCH] Fix nv40_miptree_layout pitch

2009-12-25 Thread Luca Barbieri
This patch fixes two issues in nv40_miptree_layout. First, pt->width0 is used, which is the size of the whole texture, while width, which is the size of the mipmap level, should be used. Second, the current code does not 64-byte align the pitch of swizzled textures. However, on my NV40 this

Re: [Nouveau] [MESA PATCH] Fix nv40_miptree_layout pitch

2009-12-25 Thread Luca Barbieri
You are right. The patch is wrong. Both changes fix my program, but do break OpenGL (e.g. redbook/mipmap). I managed to reproduce the problem with perf/genmipmap. When run, it causes several instances of one of these 3 errors (using swizzled textures): [12949.125732] [drm] nouveau :01:00.0:

Re: [Nouveau] [MESA PATCH] Fix nv40_miptree_layout pitch

2009-12-25 Thread Luca Barbieri
I just coded a patch that does this and seems to work fine. It must be fixed since it breaks OpenGL (or the state tracker can be changed, but it seems better to do it in the driver). The patch also fixes NV20 and NV30 in the same way. They compile but are untested. I would guess that using the

[Nouveau] Fix swizzling for copies to rectangular textures

2009-12-25 Thread Luca Barbieri
nVidia hardware seems to swizzle rectangular texture (with width != height) coordinates by swizzling the lower bits and then adding the higher bits from the larger dimension. However, nv04_swizzle_bits ignores width and height and just interleaves everything. This causes problems with rectangular
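One way to express the scheme described above is sketched below, assuming power-of-two dimensions and an interleave order with x in the even bit positions. Both assumptions, and all function names, are for illustration only; the excerpt does not show the actual patch.

```c
/* Interleave the bits of x and y (x in even positions, y in odd ones).
 * The bit order is an assumption made for this sketch. */
static unsigned sketch_swizzle_square(unsigned x, unsigned y)
{
	unsigned out = 0;
	unsigned bit;

	for (bit = 0; bit < 16; bit++) {
		out |= ((x >> bit) & 1u) << (2 * bit);
		out |= ((y >> bit) & 1u) << (2 * bit + 1);
	}
	return out;
}

/* Swizzle (x, y) in a w-by-h texture, both powers of two: interleave the
 * bits below the smaller dimension, then append the remaining high bits
 * of the coordinate along the larger dimension. */
static unsigned sketch_swizzle_rect(unsigned x, unsigned y,
				    unsigned w, unsigned h)
{
	unsigned s = w < h ? w : h;	/* smaller dimension */
	unsigned m = s - 1;		/* mask for the interleaved low bits */

	return (((x | y) & ~m) * s) | sketch_swizzle_square(x & m, y & m);
}
```

For a 4x2 texture this maps the eight texels to offsets 0 through 7 with no gaps, whereas interleaving every bit of both coordinates would place texel (2, 0) at offset 16.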

[Nouveau] [PATCH] NV30/NV40 CMP and SCS src == dst handling

2009-12-25 Thread Luca Barbieri
CMP and SCS can produce incorrect results if the source and destination are the same. This patch should fix the issues. CMP is fixed by predicating both moves. SCS by changing the order if the source component is X. diff --git a/src/gallium/drivers/nv30/nv30_fragprog.c