This can allow drivers to make better choices.
Since it is just a field appended to a struct, compatibility is preserved.
---
nouveau/nouveau_bo.c | 4 ++--
nouveau/nouveau_bo.h | 3 +++
nouveau/nouveau_private.h | 1 -
nouveau/nouveau_pushbuf.c | 2 +-
4 files changed, 6
On Fri, Jul 23, 2010 at 7:01 PM, Patrice Mandin
mandin.patr...@orange.fr wrote:
On Fri, 18 Jun 2010 18:43:27 +0200,
Marek Olšák mar...@gmail.com wrote:
On Fri, Jun 18, 2010 at 6:05 PM, Patrice Mandin
mandin.patr...@orange.fr wrote:
On Thu, 17 Jun 2010 03:35:19 +0200,
Marek Olšák
Simply putting the dump in the renouveau directory where a renouveau
dump was taken previously seems to work for me (probably because we
use the same handle values as nVidia?).
But yes, the tools should be improved here and dumping the objclass of
the grobjs would be necessary for that.
This patch causes libdrm, when NOUVEAU_DUMP=1 is set, to write the
pushbuffer to stdout instead of submitting it to the card.
renouveau-parse can then be used to parse it and obtain a readable
trace.
This is very useful for debugging and optimizing the Gallium driver.
---
Changes in v2:
- Unmap buffers we mapped, avoid assertion
- Silence warnings
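The behaviour described in this patch can be sketched roughly as follows. This is only an illustration, not the code from the patch: nouveau_dump_enabled and dump_pushbuf are hypothetical names, and the one-hex-word-per-line output is an assumption, since the format renouveau-parse actually expects is not stated here.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true when NOUVEAU_DUMP=1 is set in the environment. */
static int nouveau_dump_enabled(void)
{
    const char *value = getenv("NOUVEAU_DUMP");
    return value != NULL && strcmp(value, "1") == 0;
}

/*
 * Hypothetical dump routine: writes each 32-bit pushbuffer word as one
 * hexadecimal line (assumed layout). Returns the number of words written.
 */
static size_t dump_pushbuf(FILE *out, const uint32_t *words, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (fprintf(out, "%08" PRIx32 "\n", words[i]) < 0)
            break; /* stop on the first write error */
    }
    return i;
}
```

A flush path would then call dump_pushbuf(stdout, ...) when nouveau_dump_enabled() returns true and submit to the card otherwise.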
This patch causes libdrm, when NOUVEAU_DUMP=1 is set, to write the
pushbuffer to stdout instead of submitting it to the card.
renouveau-parse can then be used to parse it and obtain a readable
trace.
This is very
Interestingly, the post-trial judge opinion at
http://wi.findacase.com/research/wfrmDocViewer.aspx/xq/fac.%5CFDCT%5CWWI%5C2008%5C20080801_734.WWI.htm/qx
contains the following text:
Plaintiff’s expert, Dr. Stevenson, testified that the ‘327 patent is
directed to “a special
purpose hardware
If the application provides s3tc-encoded data through
glCompressedTexImage (usually loaded from a pre-compressed texture
stored on disk), Mesa will pass it unaltered to the graphics card (as
long as the driver/card supports DXT* format ids) and will not need to
use any encoding or decoding
---
src/gallium/drivers/nv40/nv40_transfer.c | 181 --
1 files changed, 0 insertions(+), 181 deletions(-)
delete mode 100644 src/gallium/drivers/nv40/nv40_transfer.c
diff --git a/src/gallium/drivers/nv40/nv40_transfer.c
b/src/gallium/drivers/nv40/nv40_transfer.c
So a GPU itself updates the sequence # of each fence in a specific register,
and we can let the Nouveau driver wait for a target
value to be written.
Do you know when the value is actually written?
When the FIFO command instructing the GPU to do the write is executed.
If it is written when
Since you create one fence object for each pushbuf, I thought that we can
synchronize only with the last command.
Not sure if my assumption is correct...
All the commands in the pushbuffer are executed sequentially and the
fence setting command is written at the end of the pushbuffer, so when
Currently the nv30 and nv40 Gallium drivers are very similar, and
contain about 5000 lines of essentially duplicate code.
I prepared a patchset (which can be found at
http://repo.or.cz/w/mesa/mesa-lb.git/shortlog/refs/heads/unification+fixes)
which gradually unifies the drivers, one file per the
It is not surprising that some (or most) 3D applications don't
actually work correctly with nouveau on nv3x right now.
The driver will probably improve in the future.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
regards,
Luca Barbieri
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau
Another possible reason for breaking ABI that hasn't yet been
mentioned is the fact that right now any DRM client can trivially lock
up the GPU and/or corrupt GPU/GART memory belonging to other clients.
This happens often with GL driver bugs and is quite annoying for
developers and testers of
On Mon, Mar 1, 2010 at 2:34 AM, Francisco Jerez curroje...@riseup.net wrote:
Luca Barbieri l...@luca-barbieri.com writes:
NV10TCL defines the vertex buffer registers both as arrays and as
individual named registers.
This causes duplicate register definitions and the individual registers
---
renouveau.xml | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/renouveau.xml b/renouveau.xml
index 2b6e8d7..d305a8e 100644
--- a/renouveau.xml
+++ b/renouveau.xml
@@ -4271,7 +4271,7 @@
<reg32 offset="0x1db4" name="LINE_STIPPLE_ENABLE" type="boolean"/>
<reg32
These are defined for nv30 and not nv40, and they probably don't
exist in the hardware.
Both DirectX and OpenGL nVidia drivers support only 6 clip planes on
pre-nv50 hardware.
Neither the DDX nor the Gallium driver supports user clip planes at all
on nv30.
This makes the definition the same as
Found by sparse.
Signed-off-by: Luca Barbieri l...@luca-barbieri.com
---
drivers/gpu/drm/nouveau/nouveau_gem.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 03d8935..d7ace31 100644
Changes in v2:
- Addressed review comments
nouveau_bo_wait will make the GPU channel wait for fence if possible,
otherwise falling back to waiting with the CPU using ttm_bo_wait.
The nouveau_fence_sync function currently returns -ENOSYS, and is
the focus of the next patch.
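The fallback described above can be sketched as below. This is a simplified illustration, not the kernel code: bo_wait_sketch and the stub functions are hypothetical, and falling back only on -ENOSYS (rather than on any failure) is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature shared by the GPU-side and CPU-side wait paths. */
typedef int (*wait_fn)(void *bo);

/*
 * Ask the GPU channel to wait first. If that path reports that it is not
 * implemented (-ENOSYS), fall back to waiting on the CPU.
 */
static int bo_wait_sketch(void *bo, wait_fn gpu_sync, wait_fn cpu_wait)
{
    int ret = gpu_sync(bo);

    if (ret != -ENOSYS)
        return ret; /* GPU sync succeeded, or failed for another reason */
    return cpu_wait(bo);
}

/* Illustrative stubs standing in for the real wait paths. */
static int stub_gpu_unsupported(void *bo) { (void)bo; return -ENOSYS; }
static int stub_gpu_ok(void *bo)          { (void)bo; return 0; }
static int stub_cpu_ok(void *bo)          { (void)bo; return 0; }
static int stub_cpu_busy(void *bo)        { (void)bo; return -EBUSY; }
```

With stub_gpu_unsupported the CPU path decides the result, which matches the state described in the message, where nouveau_fence_sync still returns -ENOSYS.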
Signed-off-by: Luca
modified by the GPU.
This is performed by storing a bitmask that makes it possible to
alternate between the values 0 and 1 for a given semaphore.
Signed-off-by: Luca Barbieri l...@luca-barbieri.com
---
drivers/gpu/drm/nouveau/nouveau_drv.h | 9 +
drivers/gpu/drm/nouveau/nouveau_fence.c | 265
ACQUIRE
or RELEASE is used.
On the waiting channel, a fence is also emitted. Once that fence
expires, the semaphore is released and can be reused for any purpose.
This results in synchronization taking place fully on the GPU, with
no CPU waiting necessary.
Signed-off-by: Luca Barbieri l...@luca
IMO, the changes are good. However, DRM_NOUVEAU_HEADER_PATCHLEVEL is
used to indicate the version of the kernel interface that's supported,
and not the libdrm API version.
OK.
Perhaps it would be useful to add a libdrm API version number as well?
sophisticated
approach may be preferable.
Could anyone with an nv04 test whether this doesn't break there?
Signed-off-by: Luca Barbieri l...@luca-barbieri.com
---
drivers/gpu/drm/nouveau/nouveau_sgdma.c | 14 --
1 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm
nouveau_bo_wait will make the GPU channel wait for fence if possible,
otherwise falling back to waiting with the CPU using ttm_bo_wait.
The nouveau_fence_sync function currently returns -ENOSYS, and is
the focus of the next patch.
Signed-off-by: Luca Barbieri l...@luca-barbieri.com
---
drivers
.
This is done by adding fields to nouveau_fence.
Semaphore values are zeroed when the semaphore BO is allocated, and
are afterwards only modified by the GPU.
This is performed by storing a bitmask that makes it possible to
alternate between the values 0 and 1 for a given semaphore.
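One plausible reading of the bitmask scheme is sketched below. It is an assumption about the mechanism, not the patch itself: struct sem_state and sem_next_value are hypothetical names, and the sketch only models the CPU-side bookkeeping of which value each semaphore currently holds.

```c
#include <stdint.h>

/*
 * Hypothetical bookkeeping: bit i records the value (0 or 1) that
 * semaphore i currently holds. All bits start at 0, matching semaphores
 * that are zeroed when the semaphore BO is allocated.
 */
struct sem_state {
    uint32_t current;
};

/*
 * Flip the recorded value for semaphore `index` (must be < 32) and return
 * the new value, i.e. the one the next release should write and the next
 * acquire should wait for.
 */
static uint32_t sem_next_value(struct sem_state *s, unsigned index)
{
    s->current ^= (uint32_t)1 << index;
    return (s->current >> index) & 1u;
}
```

Because the target value alternates, the CPU never has to reset a semaphore between uses; only the GPU writes to it.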
Signed-off-by: Luca
How often do we expect cross-channel sync to kick in? Maybe 2-3 times
per frame? I suspect contentions will be rare enough to make spinlocks
as fast as atomics for all real-life cases, and they don't have such a
high maintainability cost. What do you guys think?
For the case of a single (or a
Sounds like premature optimization to me. I'm just stating my personal
view here, but I have a feeling a patch with 60% of the lines could do
just as well for most realistic cases.
Perhaps, but really, the only thing you would probably save by using
spinlocks in the fast path is retrying in
---
drivers/gpu/drm/nouveau/nv50_fifo.c | 68 +-
1 files changed, 34 insertions(+), 34 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nv50_fifo.c
b/drivers/gpu/drm/nouveau/nv50_fifo.c
index 32b244b..f0cba1e 100644
--- a/drivers/gpu/drm/nouveau/nv50_fifo.c
Merged the two patches and added signoff.
Signed-off-by: Luca Barbieri l...@luca-barbieri.com
---
drivers/gpu/drm/nouveau/nv50_fifo.c | 84 +-
1 files changed, 42 insertions(+), 42 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nv50_fifo.c
b/drivers/gpu/drm
This patch changes the pushbuffer ABI to:
1. No longer use/expose nouveau_pushbuffer. Everything is directly
in nouveau_channel. This saves the extra pushbuf pointer dereference.
2. Use cur/end pointers instead of tracking the remaining size.
Pushing data now only needs to alter cur and
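The cur/end scheme from point 2 can be sketched as follows. This is a reduced illustration, not the libdrm structure: struct chan_sketch, chan_space and chan_push are hypothetical, and returning -1 when the buffer is full stands in for whatever flush logic the real code uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced channel holding only the two pointers mentioned. */
struct chan_sketch {
    uint32_t *cur; /* next free word */
    uint32_t *end; /* one past the last usable word */
};

/* Remaining space is derived from the pointers instead of being tracked. */
static size_t chan_space(const struct chan_sketch *c)
{
    return (size_t)(c->end - c->cur);
}

/* Returns 0 on success, -1 if the buffer is full (a real driver would flush). */
static int chan_push(struct chan_sketch *c, uint32_t word)
{
    if (c->cur == c->end)
        return -1;
    *c->cur++ = word; /* the only state that changes is cur */
    return 0;
}
```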
This patch causes libdrm, when NOUVEAU_DUMP=1 is set, to write the
pushbuffer to stdout instead of submitting it to the card.
renouveau-parse can then be used to parse it and obtain a readable
trace.
This is very useful for debugging and optimizing the Gallium driver.
---
Please apply or state objections to this patch.
Thanks.
-off-by: Luca Barbieri l...@luca-barbieri.com
---
drivers/gpu/drm/nouveau/nouveau_sgdma.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index 4c7f1e4..2ca44cc 100644
--- a/drivers/gpu/drm
At a first glance:
1) We probably *will* need a delayed destroyed workqueue to avoid wasting
memory that otherwise should be freed to the system. At the very least, the
delayed delete process should optionally be run by a system shrinker.
You are right. For VRAM we don't care since we are the
Nvidia cards have a synchronization primitive that could be used to
synchronize several FIFOs in hardware (AKA semaphores, see [1] for an
example).
Does this operate wholly on the GPU on all nVidia cards?
It seems that at least on some GPUs this will trigger software
methods that are
I'm not sure I understand your proposal correctly.
It seems your proposal is similar to mine, replacing the term fence
nodes with ttm transactions, but I'm not sure if I understand it
correctly
Here is some pseudocode for an improved, simplified version of my proposal.
It is modified so that
If not, it could possibly be hacked around by reading from a DMA
object at the address of the fence sequence number and then resizing
the DMA object so that addresses from a certain point on would trigger
a protection fault interrupt.
I don't think you can safely modify a DMA object without
Yes it's fine. I sent your patch to Dave with an expanded commit
comment for merging.
Here is a possible redesign of the mechanism inspired by this issue.
It seems that what we are racing against is buffer eviction, due to
delayed deletion buffers being still kept on the LRU list.
I'm wondering
Also note that the delayed delete list is not in fence order but in
deletion-time order, which perhaps gives room for more optimizations.
You are right.
I think then that ttm_bo_delayed_delete may still need to be changed,
because it stops when ttm_bo_cleanup_refs returns -EBUSY, which
happens
When designing this, we should also keep in mind that some drivers
(e.g. nouveau) have multiple FIFO channels, and thus we would like a
buffer to be referenced for reading by multiple channels at once (and
be destroyed only when all fences are expired, obviously).
Also, hardware may support on-GPU
The current code for primitive splitting and emission on pre-nv50 is
severely broken.
In particular:
1. Quads and lines are totally broken because &= 3 should be &= ~3
and similar for lines
2. Triangle fans and polygons are broken because the first vertex
must be repeated for each split chunk
3.
Currently we emit relocations on pushbuffer flushes.
However, this is wrong, because the pushbuffer flushes may be due to 2D
calls.
In particular, this leads to -22: validating while mapped errors in
dmesg, since the current vertex buffer can be mapped while a non-draw
(e.g. surface_copy) call is
This requires the arb_half_float_vertex Mesa branch, plus some unreleased
gallium support work by Dave Airlie.
You may need to fix an assertion in st_pipe_vertex_format too.
---
src/gallium/drivers/nv40/nv40_vbo.c | 14 ++
1 files changed, 14 insertions(+), 0 deletions(-)
diff
Trivially adds SEQ, SGT, SLE, SNE, SFL, STR and SSG which were missing.
---
src/gallium/drivers/nv40/nv40_vertprog.c | 21 +
1 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/src/gallium/drivers/nv40/nv40_vertprog.c
b/src/gallium/drivers/nv40/nv40_vertprog.c
Trivially adds SEQ, SGT, SLE, SNE, SFL, STR and SSG which were missing.
Changed to preserve alphabetical order of cases.
---
src/gallium/drivers/nv40/nv40_vertprog.c | 21 +
1 files changed, 21 insertions(+), 0 deletions(-)
diff --git
Currently on NV30/NV40 an assert will be triggered once 32 queries are
outstanding.
This violates the OpenGL/Gallium interface, which requires support for
an unlimited number of fences.
This patch fixes the problem by putting queries in a linked list and
waiting on the oldest one if allocation
Breakpoint 3, _mesa_ProgramStringARB (target=34820, format=34933,
len=70, string=0x85922ba) at shader/arbprogram.c:434
434         GET_CURRENT_CONTEXT(ctx);
$31 = 0x85922ba "!!ARBfp1.0\n\nOPTION ARB_precision_hint_fastest;\n\n\n\nEND\n"
Not sure why Sauerbraten does this, but it
If you get this patch in, then you'll still have to fight with every
other state tracker that doesn't prettify their TGSI. It would be a
much better approach to attempt to RE the routing tables.
I don't think there are any users of the Gallium interface that need more
than 8 vertex
Most Gallium drivers support nested mapping by using a reference count.
We don't, and swtnl fallback triggers an error due to this.
This patch adds this support in libdrm.
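Reference-counted mapping of the kind mentioned above can be sketched like this. It is a minimal illustration, not the libdrm change: struct bo_sketch, bo_map and bo_unmap are hypothetical, and malloc/free stand in for the real map and unmap operations.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer object with a map reference count. */
struct bo_sketch {
    void *map;           /* CPU address while mapped, NULL otherwise */
    unsigned map_refcnt; /* number of outstanding bo_map() calls */
    size_t size;
};

/* The first map creates the mapping; nested maps reuse it. */
static void *bo_map(struct bo_sketch *bo)
{
    if (bo->map_refcnt == 0) {
        bo->map = malloc(bo->size); /* stand-in for the real mapping */
        if (bo->map == NULL)
            return NULL;
    }
    bo->map_refcnt++;
    return bo->map;
}

/* Only the last unmap tears the mapping down. */
static void bo_unmap(struct bo_sketch *bo)
{
    if (bo->map_refcnt == 0)
        return; /* unbalanced unmap: ignore in this sketch */
    if (--bo->map_refcnt == 0) {
        free(bo->map);
        bo->map = NULL;
    }
}
```

With this scheme a software TNL fallback can map a buffer that is already mapped and get the same pointer back.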
---
nouveau/nouveau_bo.c | 8 +++-
nouveau/nouveau_private.h | 1 +
2 files changed, 8 insertions(+), 1
---
src/gallium/drivers/nv40/nv40_fragprog.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/src/gallium/drivers/nv40/nv40_fragprog.c
b/src/gallium/drivers/nv40/nv40_fragprog.c
index 1237066..209d211 100644
--- a/src/gallium/drivers/nv40/nv40_fragprog.c
+++
Sauerbraten triggers this assert.
---
src/mesa/state_tracker/st_atom_shader.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/src/mesa/state_tracker/st_atom_shader.c
b/src/mesa/state_tracker/st_atom_shader.c
index 176f3ea..fce533a 100644
---
Currently on NV30/NV40 an assert will be triggered once 32 queries are
outstanding.
This violates the OpenGL/Gallium interface, which requires support for
an unlimited number of fences.
This patch fixes the problem by putting queries in a linked list and
waiting on the oldest one if allocation
This patch allocates a bigger chunk of memory to store queries in,
increasing the (hidden) outstanding query limit from 32 to 125.
It also tries to make use of a 16KB notifier block if the kernel
supports that.
The blob supports 1024 queries due to their 16KB query block and
16-byte rather than
happen because there aren't any buffers on close.
However, if the GPU is locked up, this condition is easily triggered.
This patch fixes it in the simplest way possible by cleaning VRAM
right before cleaning SGDMA memory.
Signed-off-by: Luca Barbieri l...@luca-barbieri.com
---
drivers/gpu/drm
I looked a bit more into the vertex corruption I'm experiencing on my
card with GART vertex buffers, which disappears when the buffers are
put in VRAM.
The system I'm seeing this on is a Dell Inspiron 9400 notebook with a
GeForce Go 7900 GS on a PCI Express Intel i945 chipset.
First,
Any issues with this patch?
Triggered by Doom 3.
---
src/gallium/drivers/nv40/nv40_state.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/src/gallium/drivers/nv40/nv40_state.c
b/src/gallium/drivers/nv40/nv40_state.c
index ed0ca9e..4e3a61f 100644
--- a/src/gallium/drivers/nv40/nv40_state.c
+++
This patch adds support for two-sided vertex color to NV20, NV30 and NV40.
When set, the COLOR0/1 fs inputs on back faces will be wired to vs outputs
BCOLOR0/1.
This makes OpenGL two sided lighting work, which can be tested with
progs/demos/projtex.
This is already supported on NV50 and seems
How about taking inspiration from the cpufreq sysfs interface?
There are sysfs objects for drm cards at /sys/class/drm/card<number>.
Mine, for instance, is at /sys/class/drm/card0, which links to
/sys/devices/pci:00/:00:01.0/:01:00.0/drm/card0.
A simple scaling approach could just look
I wrote a tool for automatically finding out the texture layout for Gallium
drivers.
You can find it attached to
http://sourceforge.net/mailarchive/forum.php?thread_name=ff13bc9a1001081140y18450c3ejdfac25c9260fd367%40mail.gmail.com&forum_name=mesa3d-dev
.
Here are the findings from running it.
The
Set it to the same value as placement.placement.
Triggered by running etracer under compiz.
Signed-off-by: Luca Barbieri l...@luca-barbieri.com
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau
Currently Nouveau will unvalidate all buffers if it is forced to wait on one,
and then start revalidating from the beginning.
While doing so, it destroys the operation fence, causing nouveau_fence_emit to
crash.
This patch fixes this bug by taking the fence object out of validate_op and
Currently we are continuously spewing messages about these variables
since we call debug_get_bool_option every time we want to check their value.
This is annoying, slows things down due to terminal rerendering and obscures
useful messages.
This patch only calls debug_get_bool_option once
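The idea of looking an option up once and caching the result can be sketched as below. This is not the Mesa code: slow_bool_option is a hypothetical stand-in for debug_get_bool_option built on getenv, SKETCH_DEBUG_OPTION is a made-up variable name, and the static cache is not thread-safe.

```c
#include <stdlib.h>
#include <string.h>

/* Counts environment lookups, only to make the caching visible. */
static unsigned lookup_count;

/* Hypothetical stand-in for an option lookup that logs on every call. */
static int slow_bool_option(const char *name, int dfault)
{
    const char *value = getenv(name);

    lookup_count++;
    if (value == NULL)
        return dfault;
    return strcmp(value, "0") != 0;
}

/* Looks the option up on the first call only and caches the answer. */
static int sketch_option_cached(void)
{
    static int cached = -1; /* -1 means "not looked up yet" */

    if (cached < 0)
        cached = slow_bool_option("SKETCH_DEBUG_OPTION", 0) ? 1 : 0;
    return cached;
}
```

The trade-off is that changes to the environment variable after the first call are no longer seen, which is normally acceptable for debug options.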
Also some missing _src()s and cosmetic changes.
---
src/gallium/programs/galliumut/Makefile | 5 +
.../programs/galliumut/gen_uureg_opcodes.sh | 29 +++
src/gallium/programs/galliumut/uureg.h | 196
3 files changed, 71 insertions(+), 159
This was supposed to go to mesa3d.
On Thu, Dec 31, 2009 at 6:24 PM, Luca Barbieri l...@luca-barbieri.com wrote:
Also some missing _src()s and cosmetic changes.
---
src/gallium/programs/galliumut/Makefile | 5 +
.../programs/galliumut/gen_uureg_opcodes.sh | 29 +++
src
It seems that the current miptree layout is incorrect because the size
of all the levels of each cube map face must be 64-byte aligned.
This patch fixes piglit cubemap and fbo-cubemap which were broken.
This makes sense since otherwise all the levels would no longer be
64-byte aligned, which the
My swizzling fix incorrectly used the dimensions of the copy rectangle
instead of that of the destination surface. This patch fixes that.
diff --git a/src/gallium/drivers/nv04/nv04_surface_2d.c b/src/gallium/drivers/nv04/nv04_surface_2d.c
index ca0c433..481315e 100644
---
Currently in nvXX_transfer_new a temporary as large as the surface is created.
If the subrectangle is not the whole texture, we would need to read
back the whole texture first, but we don't.
Thus, everything but the subrectangle specified is loaded as garbage.
This can be seen in progs/demos/ray.
This
Ignore that patch. It's broken because we must set the offset for the
up to 1024x1024 chunk we are copying instead of the whole image.
The corrected patch is attached.
diff --git a/src/gallium/drivers/nv04/nv04_surface_2d.c b/src/gallium/drivers/nv04/nv04_surface_2d.c
index ca0c433..3193086 100644
Third attempt, as the second one was logically wrong.
The problem in the first patch was actually that the source point
register has a 1024 limit.
This one leaves the way the source is set up alone, and sets the whole
surface as the destination like in the first version, using the point
registers
Currently surface_fill sets alpha incorrectly to 1.0 when drawing to
A8R8G8B8 instead of the correct value.
xf86-video-nouveau has the following comment confirming the issue:
/* When SURFACE_FORMAT_A8R8G8B8 is used with GDI_RECTANGLE_TEXT, the
* alpha channel gets forced to 0xFF
It looks like there are two bugs.
One seems related to some kind of GPU cache of GART memory which does
not get flushed, causes significant corruption and is worked around by
putting buffers in VRAM, software TNL or immediate submission.
It may be related to the NV40TCL_VTX_CACHE_INVALIDATE which
It looks like there are two bugs.
One seems related to some kind of cache of GART memory which does not
get flushed, causes significant corruption and is worked around by
putting buffers in VRAM.
For some reason, adding syncing instead of putting buffers in VRAM
does seem to greatly reduce the
It seems that Nouveau is assuming that once the FIFO pointer is past a
command, that command has finished executing, and all the buffers it
used are no longer needed.
However, this seems to be false at least on G71.
In particular, the card may not have even finished reading the input
vertex
I figured out the registers.
There is a fence/sync mechanism which apparently triggers after
rendering is finished.
There are two ways to use it, but they trigger at the same time
(spinning in a loop on the CPU checking them, they trigger at the same
iteration or in two successive iterations).
Can you reproduce this with your vertex buffers in VRAM instead of GART?
(to rule out that it's a fencing issue).
Putting the vertex buffers in VRAM makes things almost perfect, but
still with rare artifacts.
In particular, the yellow arrow in dinoshade sometimes becomes a
yellow polygon on the
On Sun, Dec 27, 2009 at 2:25 AM, Younes Manton youne...@gmail.com wrote:
On Sat, Dec 26, 2009 at 1:22 AM, Luca Barbieri l...@luca-barbieri.com wrote:
I just coded a patch that does this and seems to work fine. It must be
fixed since it breaks OpenGL (or the state tracker can be changed
Patch was mangled, resent attached.
diff --git a/src/gallium/drivers/nv04/nv04_surface_2d.c b/src/gallium/drivers/nv04/nv04_surface_2d.c
index 12df7fd..40b538f 100644
--- a/src/gallium/drivers/nv04/nv04_surface_2d.c
+++ b/src/gallium/drivers/nv04/nv04_surface_2d.c
@@ -77,7 +77,7 @@
This patch fixes two issues in nv40_miptree_layout.
First, pt->width0 is used, which is the size of the whole texture,
while width, which is the size of the mipmap level, should be used.
Second, the current code does not 64-byte align the pitch of swizzled
textures. However, on my NV40 this
You are right. The patch is wrong. Both changes fix my program, but do
break OpenGL (e.g. redbook/mipmap).
I managed to reproduce the problem with perf/genmipmap.
When run, it causes several instances of one of these 3 errors (using
swizzled textures):
[12949.125732] [drm] nouveau :01:00.0:
I just coded a patch that does this and seems to work fine. It must be
fixed since it breaks OpenGL (or the state tracker can be changed, but
it seems better to do it in the driver).
The patch also fixes NV20 and NV30 in the same way. They compile but
are untested.
I would guess that using the
nVidia hardware seems to swizzle rectangular texture (with width !=
height) coordinates by swizzling the lower bits and then adding the
higher bits from the larger dimension.
However, nv04_swizzle_bits ignores width and height and just
interleaves everything.
This causes problems with rectangular
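The swizzling rule described in this message can be sketched as below. This is an interpretation, not the patch: swizzle_rect and its helpers are hypothetical, width and height are assumed to be powers of two, and placing x in the even bit positions is an assumption about the bit order.

```c
#include <stdint.h>

/* Interleave the low `bits` bits of x and y: x in even bits, y in odd bits. */
static uint32_t interleave_bits(uint32_t x, uint32_t y, unsigned bits)
{
    uint32_t out = 0;
    unsigned i;

    for (i = 0; i < bits; i++) {
        out |= ((x >> i) & 1u) << (2 * i);
        out |= ((y >> i) & 1u) << (2 * i + 1);
    }
    return out;
}

/* log2 of a power of two. */
static unsigned log2_pow2(uint32_t v)
{
    unsigned n = 0;

    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Swizzled texel index for a w x h texture: the low bits (as many as the
 * smaller dimension has) are interleaved, and the remaining high bits of
 * the coordinate along the larger dimension are added on top.
 */
static uint32_t swizzle_rect(uint32_t x, uint32_t y, uint32_t w, uint32_t h)
{
    unsigned bits = log2_pow2(w < h ? w : h);
    uint32_t mask = (1u << bits) - 1u;
    uint32_t low = interleave_bits(x & mask, y & mask, bits);
    uint32_t high = (w > h ? x : y) >> bits;

    return low + (high << (2 * bits));
}
```

For a square texture the high part is always zero and the result reduces to plain bit interleaving, which is what a function that ignores width and height would compute.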
CMP and SCS can produce incorrect results if the source and
destination are the same.
This patch should fix the issues.
CMP is fixed by predicating both moves.
SCS by changing the order if the source component is X.
diff --git a/src/gallium/drivers/nv30/nv30_fragprog.c