According to ISA docs, the range is 1..64, so effectively
bytes_to_fetch-1.
---
src/gallium/drivers/r600/r600_shader.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/gallium/drivers/r600/r600_shader.c
b/src/gallium/drivers/r600/r600_shader.c
index 81ed3ce..0444579
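
A minimal sketch of the off-by-one encoding described in this commit message, assuming the fetch size is stored in a 6-bit field as (size - 1). The helper names and the field width are illustrative assumptions, not taken from r600_shader.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a fetch size in the documented range 1..64 is
 * stored as (size - 1). The 6-bit mask is an assumption chosen because
 * 0..63 is the smallest field that covers that range. */
static inline uint32_t encode_bytes_to_fetch(uint32_t bytes_to_fetch)
{
    assert(bytes_to_fetch >= 1 && bytes_to_fetch <= 64);
    return (bytes_to_fetch - 1) & 0x3f;
}

/* Inverse of the encoding above: recover the size from the field value. */
static inline uint32_t decode_bytes_to_fetch(uint32_t field)
{
    return (field & 0x3f) + 1;
}
```

With this bias, a 16-byte fetch is written as 15, and the full 64-byte fetch still fits in the field.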
On 31.05.2013 14:37, Vadim Girlin wrote:
There are no regressions on evergreen with piglit tests or any
other apps that I tested, with and without llvm backend.
(Issue with Unigine Heaven that I mentioned on #dri-devel
yesterday was in fact caused by my own well-hidden bug, which is now fixed).
This is my first try to contribute anything useful to Mesa, so please
bear with me. This is not finished, but I'd like feedback to make sure
the code's quality and style is in line with what is expected in Mesa.
Allows MSAA colorbuffers, which have a CMASK automatically and don't
need any further special handling, to be fast cleared. Instead
of clearing the buffer, set the clear color and the CMASK to the
cleared state.
---
src/gallium/drivers/r600/evergreen_state.c | 8 +++-
On 08.06.2013 00:40, Marek Olšák wrote:
Also the fast clear
shouldn't be used for array, cube, and 3D textures unless all layers
are cleared together.
OK. I hadn't really thought about these.
One more thing. If you don't use piglit, I recommend using it before
sending patches to the mailing
Allows MSAA colorbuffers, which have a CMASK automatically and don't
need any further special handling, to be fast cleared. Instead
of clearing the buffer, set the clear color and the CMASK to the
cleared state.
Fast clear is used only when all bound colorbuffers fulfill certain
conditions: a
On 11.06.2013 02:41, Marek Olšák wrote:
+
+ /* cannot pack color, needs support in u_format */
+ if (desc->pack_rgba_float == NULL) {
+ return false;
+ }
Hi Grigori,
Is this for disallowing integer textures? You probably
Allows MSAA colorbuffers, which have a CMASK automatically and don't
need any further special handling, to be fast cleared. Instead
of clearing the buffer, set the clear color and the CMASK to the
cleared state.
Fast clear is used only when all bound colorbuffers fulfill certain
conditions: a
On 12.06.2013 00:04, Grigori Goronzy wrote:
Allows MSAA colorbuffers, which have a CMASK automatically and don't
need any further special handling, to be fast cleared. Instead
of clearing the buffer, set the clear color and the CMASK to the
cleared state.
Fast clear is used only when all bound
This interface is used to expand fast-cleared window system
colorbuffers.
---
src/gallium/include/pipe/p_context.h | 8
src/gallium/state_trackers/dri/common/dri_drawable.c | 4
src/gallium/state_trackers/dri/drm/dri2.c| 8 ++--
3 files changed, 18
---
src/gallium/drivers/r600/evergreen_state.c | 24 +++-
src/gallium/drivers/r600/r600_hw_context.c | 12 +---
src/gallium/drivers/r600/r600_resource.h | 3 +++
src/gallium/drivers/r600/r600_texture.c| 25 -
4 files changed, 55
Allocate a CMASK on demand and use it to fast clear single-sample
colorbuffers. Both FBOs and window system colorbuffers are fast
cleared. Expand as needed when colorbuffers are mapped or displayed
on screen.
---
src/gallium/drivers/r600/evergreen_state.c | 11
On 12.07.2013 16:19, Jose Fonseca wrote:
I admit I haven't fully understood what's being proposed yet. But just a few
quick words.
I always wanted to have a present method that ensures that the contents of a
resource is made visible to whatever the consumer is (full-screen flip, blit to
On 16.07.2013 19:26, Marek Olšák wrote:
Surprisingly all drivers supporting MSAA can already do this (r300g and r600g
for sure) and I think Christoph wanted to have this feature for his Nouveau
drivers anyway.
OK, they can do it, but is it actually any faster than doing a resolve
and regular
On 17.07.2013 02:05, Marek Olšák wrote:
No, it's not faster, but it's not slower either.
Now that I think about it, I can't come up with a good shader-based
algorithm for the resolve operation.
I don't think Christoph's approach that an MSAA texture can be viewed
as a larger single-sample
From: Marek Olšák mar...@gmail.com
r600g needs explicit flushing before DRI2 buffers are presented on the screen.
v2: add (stub) implementations for all drivers, fix frontbuffer flushing
---
src/gallium/docs/source/context.rst | 13 +
---
src/gallium/drivers/r600/evergreen_state.c | 24 +++-
src/gallium/drivers/r600/r600_hw_context.c | 12 +---
src/gallium/drivers/r600/r600_resource.h | 3 +++
src/gallium/drivers/r600/r600_texture.c| 25 -
4 files changed, 55
Allocate a CMASK on demand and use it to fast clear single-sample
colorbuffers. Both FBOs and window system colorbuffers are fast
cleared. Expand as needed when colorbuffers are mapped or displayed
on screen.
---
src/gallium/drivers/r600/evergreen_state.c | 11 +
On 09.09.2013 16:09, Marek Olšák wrote:
/* Check colorbuffers. */
for (i = 0; i < rctx->framebuffer.state.nr_cbufs; i++) {
+ struct r600_texture *tex =
+ (struct r600_texture*)rctx->framebuffer.state.cbufs[i]->texture;
+
Please check if
Allocate a CMASK on demand and use it to fast clear single-sample
colorbuffers. Both FBOs and window system colorbuffers are fast
cleared. Expand as needed when colorbuffers are mapped or displayed
on screen.
v2: cosmetics, move transfer expansion into dma_blit
---
v2: check for NULL cbufs
---
src/gallium/drivers/r600/evergreen_state.c | 24 +++-
src/gallium/drivers/r600/r600_hw_context.c | 18 ++
src/gallium/drivers/r600/r600_resource.h | 3 +++
src/gallium/drivers/r600/r600_texture.c| 25 -
From: Marek Olšák mar...@gmail.com
r600g needs explicit flushing before DRI2 buffers are presented on the screen.
v2: add (stub) implementations for all drivers, fix frontbuffer flushing
v3: fix galahad
---
src/gallium/docs/source/context.rst | 13 +
On 30.09.2013 10:06, Michel Dänzer wrote:
On Sun, 2013-09-29 at 22:34 +0200, Dieter Nützel wrote:
after latest git pull I've only MPEG1, MPEG2_SIMPLE and MPEG2_MAIN with
my RV730 (AGP).
Same problem on PALM. Bisection shows that it is caused by commit
68f6dec32. The initialization order
UVD was checked before the info fields were initialized. Introduced
by commit 68f6dec32.
---
src/gallium/drivers/r600/r600_pipe.c | 13 +++--
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/src/gallium/drivers/r600/r600_pipe.c
b/src/gallium/drivers/r600/r600_pipe.c
index
Fixes regression on r600g due to fast clear introduced by commit
edbbfac6.
---
src/gallium/state_trackers/egl/x11/native_dri2.c | 11 +++
1 file changed, 11 insertions(+)
diff --git a/src/gallium/state_trackers/egl/x11/native_dri2.c
b/src/gallium/state_trackers/egl/x11/native_dri2.c
UVD can only support NV12 in the case of hardware decoding, but we
can still use all other formats for software decoding. Use the UNKNOWN
entrypoint to signal that we're not interested in hardware decoding.
---
src/gallium/drivers/radeon/radeon_uvd.c | 7 +--
MPEG-2 and later video standards align the chroma sample position
horizontally with the leftmost luma sample position. Add a half-texel
offset to the chroma texture sampling coordinate to sample at
this position instead of sampling in the center between the luma
texels. This avoids minor color
All texture instructions can use offsets, not just TXF. Offsets into
the literals array were wrong, too.
---
src/gallium/drivers/r600/r600_shader.c | 20 ++--
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/gallium/drivers/r600/r600_shader.c
On 03.10.2013 00:12, Grigori Goronzy wrote:
All texture instructions can use offsets, not just TXF. Offsets into
the literals array were wrong, too.
BTW, I just noticed it now: this fixes the fs-textureOffset-2D piglit
test, which unfortunately does not appear to be part of any of the test
On 07.10.2013 11:25, Christian König wrote:
On 01.10.2013 21:12, Ilia Mirkin wrote:
On Tue, Oct 1, 2013 at 3:06 PM, Grigori Goronzy g...@chown.ath.cx
wrote:
UVD can only support NV12 in the case of hardware decoding, but we
can still use all other formats for software decoding. Use
UVD can only support NV12 in the case of hardware decoding, but we
can still use all other formats for software decoding. Use the UNKNOWN
profile to signal that we're not interested in hardware decoding.
v2: use profile instead of entrypoint
---
src/gallium/drivers/radeon/radeon_uvd.c | 7
The DPB size calculations seem to be off; there is various random
corruption happening, even with advanced profile. Always assuming
a minimum number of references appears to fix it, similarly to
H.264. This might overallocate the DPB. Also clean up the SPS/PPS
field setup so that it matches VC-1
As per API specification, it is legal to supply a NULL procamp. In this
case, a CSC matrix according to the colorspace should be generated,
but no further adjustments are made.
Addresses:
https://trac.videolan.org/vlc/ticket/9281
https://bugs.freedesktop.org/show_bug.cgi?id=68792
---
OutputSurfaces have simple YCbCr rendering functionality built in,
but so far only 4:2:0 subsampling worked correctly. This fixes 4:2:2
and 4:4:4 formats.
---
src/gallium/state_trackers/vdpau/output.c| 2 +-
src/gallium/state_trackers/vdpau/vdpau_private.h | 23 +++
2
pipe_screen::fence_finish with zero timeout returns quickly and
doesn't wait at all. Fix that, and also delete the fence afterwards,
so that QuerySurfaceStatus returns the right state later.
Addresses:
https://trac.videolan.org/vlc/ticket/9281
https://bugs.freedesktop.org/show_bug.cgi?id=68792
Add simple plain C routines for NV12-YV12 and YUYV-UYVY
conversions. The NV12-YV12 conversion is commonly used, for instance
by VLC.
---
src/gallium/state_trackers/vdpau/surface.c | 125 +++--
1 file changed, 117 insertions(+), 8 deletions(-)
diff --git
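
The chroma half of an NV12-YV12 conversion like the one this patch describes can be sketched as follows. This is an illustration only, not the routine from surface.c: the function name and parameters are hypothetical, and it assumes the usual layouts (NV12 stores one interleaved U/V plane with U first, YV12 stores separate V and U planes). The luma plane is identical in both formats and can be copied row by row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split the interleaved UV plane of an NV12 image
 * into the separate V and U planes that YV12 expects. Pitches are in
 * bytes; chroma_width and chroma_height are in chroma samples. */
static void nv12_chroma_to_yv12(const uint8_t *uv, size_t uv_pitch,
                                uint8_t *v, size_t v_pitch,
                                uint8_t *u, size_t u_pitch,
                                size_t chroma_width, size_t chroma_height)
{
    assert(uv != NULL && v != NULL && u != NULL);

    for (size_t y = 0; y < chroma_height; y++) {
        const uint8_t *src = uv + y * uv_pitch;
        uint8_t *dst_u = u + y * u_pitch;
        uint8_t *dst_v = v + y * v_pitch;

        for (size_t x = 0; x < chroma_width; x++) {
            dst_u[x] = src[2 * x];     /* even bytes carry U (Cb) */
            dst_v[x] = src[2 * x + 1]; /* odd bytes carry V (Cr) */
        }
    }
}
```

A plain loop like this is easy to verify; whether it is fast enough depends on where the source memory lives, since reads from uncached mappings dominate the cost.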
R600_RESOURCE_FLAG_TRANSFER forces direct mapping, and reading from
VRAM is simply too slow. VDPAU GetBitsYCbCr is unusable. Change to
the new PIPE_BIND_LINEAR and adjust r600_transfer_map so that it uses
a staging texture.
---
src/gallium/drivers/r600/r600_uvd.c | 6 +++---
On 10.10.2013 11:41, Christian König wrote:
On 09.10.2013 22:19, Grigori Goronzy wrote:
R600_RESOURCE_FLAG_TRANSFER forces direct mapping, and reading from
VRAM is simply too slow. VDPAU GetBitsYCbCr is unusable. Change to
the new PIPE_BIND_LINEAR and adjust r600_transfer_map so that it uses
We should be able to safely set the framebuffer state without a
fragment shader bound. bind_ps_state will take care of updating the
necessary state bits later.
---
src/gallium/drivers/r600/evergreen_state.c | 4 +++-
src/gallium/drivers/r600/r600_state.c | 4 +++-
2 files changed, 6
We should be able to safely set the framebuffer state without a
fragment shader bound. bind_ps_state will take care of updating the
necessary state bits later.
v2: check in update_db_shader_control
---
src/gallium/drivers/r600/evergreen_state.c | 23 +++
Textures that likely reside in VRAM, are mapped for reading and
don't require direct mapping should be staged into GTT, to avoid bad
performance. This fixes readback performance of VDPAU surfaces.
---
src/gallium/drivers/radeon/r600_texture.c | 6 ++
1 file changed, 6 insertions(+)
diff
This new bind flag forces linear storage, but does not have other
side effects like R600_RESOURCE_FLAG_TRANSFER.
---
src/gallium/drivers/r600/r600_uvd.c | 6 +++---
src/gallium/drivers/radeonsi/radeonsi_uvd.c | 8
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git
Add simple plain C routines for NV12-YV12 and YUYV-UYVY
conversions. The NV12-YV12 conversion is commonly used, for instance
by VLC.
---
src/gallium/state_trackers/vdpau/surface.c | 125 +++--
1 file changed, 117 insertions(+), 8 deletions(-)
diff --git
On 26.10.2013 16:31, Peter Frühberger wrote:
Hi,
I looked at the openmax decoder posted yesterday and have seen that
only two fields are missing to also decode hi10p with the current
vdpau uvd infrastructure in place.
I mailed two patches to the vdpau mailing list in order to get the API
Otherwise OutputSurface interop has funny results sometimes.
This fixes interop with the mpv media player.
---
src/gallium/state_trackers/vdpau/output.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/gallium/state_trackers/vdpau/output.c
b/src/gallium/state_trackers/vdpau/output.c
index
, but I don't
know if this is a realistic goal.
Best regards
Grigori
Thanks for the help,
Christian.
On 06.11.2013 00:35, Grigori Goronzy wrote:
Otherwise OutputSurface interop has funny results sometimes.
This fixes interop with the mpv media player.
---
src/gallium/state_trackers/vdpau
From: Vadim Girlin vadimgir...@gmail.com
v2: make it actually work, improve condition
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68503
Cc: 10.0 mesa-sta...@lists.freedesktop.org
Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
src/gallium/drivers/r600/sb/sb_bc.h|
Grigori
From 386dc4f201a65a2a8740c8c9f4a039d5c8209a9c Mon Sep 17 00:00:00 2001
From: Grigori Goronzy g...@chown.ath.cx
Date: Sun, 24 Nov 2013 20:24:58 +0100
Subject: [PATCH] WIP: fix unnamed struct type conflicts
If two shader stages define the same unnamed struct type, they will
conflict
---
src/glsl/glsl_types.cpp | 61 +++--
src/glsl/glsl_types.h | 7 ++
2 files changed, 41 insertions(+), 27 deletions(-)
diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
index f740130..6c9727e 100644
--- a/src/glsl/glsl_types.cpp
Unnamed record types are assigned to separate types per stage, e.g.
uniform struct { ... } a;
if defined in both vertex and fragment shader, will result in two
separate types of different name. When linking the shader, this
results in a type conflict. However, there is no reason why this
should
Ping? Can anyone review this, please?
Grigori
On 27.11.2013 00:15, Grigori Goronzy wrote:
---
src/glsl/glsl_types.cpp | 61 +++--
src/glsl/glsl_types.h | 7 ++
2 files changed, 41 insertions(+), 27 deletions(-)
diff --git a/src/glsl
On 04.02.2014 00:53, Dave Airlie wrote:
From: Dave Airlie airl...@redhat.com
attempt to calculate a better value for array size to avoid breaking apps.
Signed-off-by: Dave Airlie airl...@redhat.com
---
src/gallium/drivers/r600/r600_shader.c | 2 +-
1 file changed, 1 insertion(+), 1
On 05.02.2014 18:08, Jose Fonseca wrote:
I honestly hope that GL_AMD_pinned_memory doesn't become popular. It would have
been alright if it wasn't for this bit in
http://www.opengl.org/registry/specs/AMD/pinned_memory.txt which says:
2) Can the application still use the buffer using the
---
src/gallium/drivers/freedreno/freedreno_screen.c | 5 +
src/gallium/drivers/i915/i915_screen.c | 5 +
src/gallium/drivers/ilo/ilo_screen.c | 3 +++
src/gallium/drivers/llvmpipe/lp_screen.c | 3 +++
src/gallium/drivers/nouveau/nv30/nv30_screen.c | 2 ++
On 06.02.2014 02:46, Michel Dänzer wrote:
+ case PIPE_CAP_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS:
+ return 16384;
radeonsi currently can't handle more than 4095 total output components,
as the buffer resource for writing to the GSVS ring only has 14 bits for
the stride in
v2: adjust limits for radeonsi and llvmpipe
---
src/gallium/drivers/freedreno/freedreno_screen.c | 5 +
src/gallium/drivers/i915/i915_screen.c | 5 +
src/gallium/drivers/ilo/ilo_screen.c | 3 +++
src/gallium/drivers/llvmpipe/lp_screen.c | 3 +++
v2: adjust limits for radeonsi and llvmpipe
v3: add documentation
Cc: 10.1 mesa-sta...@lists.freedesktop.org
---
src/gallium/docs/source/screen.rst | 6 ++
src/gallium/drivers/freedreno/freedreno_screen.c | 5 +
src/gallium/drivers/i915/i915_screen.c | 5 +
/vl_deint_filter.c
b/src/gallium/auxiliary/vl/vl_deint_filter.c
new file mode 100644
index 000..9b05154
--- /dev/null
+++ b/src/gallium/auxiliary/vl/vl_deint_filter.c
@@ -0,0 +1,491 @@
+/**
+ *
+ * Copyright 2013 Grigori Goronzy g
---
src/gallium/state_trackers/vdpau/mixer.c | 69 ++--
src/gallium/state_trackers/vdpau/query.c | 1 +
src/gallium/state_trackers/vdpau/vdpau_private.h | 7 +++
3 files changed, 73 insertions(+), 4 deletions(-)
diff --git
On 15.02.2014 13:14, Andy Furniss wrote:
Thanks Grigori for doing this - looks really good on HD stuff I've
tested and of course is easily fast enough, unlike anything on the CPU
at high res.
Any plans for the future?
Well, adding edge-guided spatial interpolation for the temporal-spatial
---
src/gallium/state_trackers/vdpau/mixer.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/gallium/state_trackers/vdpau/mixer.c
b/src/gallium/state_trackers/vdpau/mixer.c
index 996fd8e..e6bfb8c 100644
--- a/src/gallium/state_trackers/vdpau/mixer.c
+++
The spec incorrectly used void as return type, when it should have
been GLboolean. This has now been fixed. According to Nvidia, their
implementation always used GLboolean.
---
include/GL/glext.h | 2 +-
src/mapi/glapi/gen/NV_vdpau_interop.xml | 1 +
src/mesa/main/vdpau.c
On 10.04.2014 11:23, Michel Dänzer wrote:
From: Michel Dänzer michel.daen...@amd.com
---
This is just an RFC; if other developers approve of this approach, I can
make a more extensive patch removing the use_reusable_pool parameters.
The x11perf numbers below compare ShmGet/PutImage before and
On 20.04.2014 03:02, Marek Olšák wrote:
It looks like the check is not needed with SB, because SB performs
register allocation. What happens if you comment out the conditional
which fails?
SB takes the machine code generated by the classic compiler as input,
so the check is still needed. The
We need this for radeonsi, and it might be useful for other drivers,
too.
---
src/gallium/auxiliary/util/u_format.c | 11 +++
src/gallium/auxiliary/util/u_format.h | 3 +++
src/gallium/drivers/r600/r600_blit.c | 12 +---
3 files changed, 15 insertions(+), 11 deletions(-)
diff
This makes 4:2:2 video surfaces work in VDPAU.
---
src/gallium/drivers/radeon/r600_texture.c | 5 +-
src/gallium/drivers/radeonsi/si_blit.c| 91 ++-
src/gallium/drivers/radeonsi/si_state.c | 15 +
3 files changed, 71 insertions(+), 40 deletions(-)
diff
It's about as broken as on later UVD revisions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66452
Cc: 10.1 10.2 mesa-sta...@lists.freedesktop.org
---
src/gallium/drivers/radeon/radeon_video.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git
Ping? I'm not sure if this is completely correct, but this code path is
only exercised by VDPAU and it seems to work fine on SI.
Grigori
On 04.06.2014 18:54, Grigori Goronzy wrote:
This makes 4:2:2 video surfaces work in VDPAU.
---
src/gallium/drivers/radeon/r600_texture.c | 5 +-
src
Olšák marek.ol...@amd.com
Marek
On Wed, Jun 4, 2014 at 6:54 PM, Grigori Goronzy g...@chown.ath.cx
wrote:
This makes 4:2:2 video surfaces work in VDPAU.
---
src/gallium/drivers/radeon/r600_texture.c | 5 +-
src/gallium/drivers/radeonsi/si_blit.c| 91
On 02.07.2014 22:18, Andy Furniss wrote:
Before I knew how to get field sync to use my TVs deinterlacer I had to
modify mesa so that I could use the vdpau de-interlacer(s), when I did
this I noticed that 422 didn't work and looked the same as it does now
this has gone in with my si.
Are
On 17.07.2014 12:01, Michel Dänzer wrote:
From: Michel Dänzer michel.daen...@amd.com
This is hopefully safe: The kernel makes sure writes to these mappings
finish before the GPU might start reading from them, and the GPU caches
are invalidated at the start of a command stream.
Aren't CPU
Accuracy of some operations was recently improved in the R600 backend,
at the cost of slower code. This is required for compute shaders,
but not for graphics shaders. Add unsafe-fp-math hint to make LLVM
generate faster but possibly less accurate code.
Piglit didn't indicate any regressions.
---
Use K&R style and the same indentation as most other code. No functional change
intended.
---
src/gallium/drivers/radeon/radeon_llvm_emit.c | 24 ++--
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c
On 18.07.2014 13:45, Marek Olšák wrote:
If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the
patch is okay.
Apart from correctness, I still wonder how this will affect performance,
most notably CPU reads. This change unconditionally uses write-combined,
uncached memory for
On 17.07.2014 21:24, Tom Stellard wrote:
On Thu, Jul 17, 2014 at 06:44:25PM +0200, Grigori Goronzy wrote:
Accuracy of some operations was recently improved in the R600 backend,
at the cost of slower code. This is required for compute shaders,
but not for graphics shaders. Add unsafe-fp-math
Passes corrected piglit test and should also handle signed vs unsigned
float correctly.
---
src/gallium/drivers/radeonsi/si_state.c | 20
1 file changed, 20 insertions(+)
diff --git a/src/gallium/drivers/radeonsi/si_state.c
b/src/gallium/drivers/radeonsi/si_state.c
index
Passes all piglit tests.
v2: rebased
---
src/gallium/drivers/radeonsi/si_state.c | 20
1 file changed, 20 insertions(+)
diff --git a/src/gallium/drivers/radeonsi/si_state.c
b/src/gallium/drivers/radeonsi/si_state.c
index 6e9a60a..4f7adea 100644
---
On 04.07.2014 01:24, Andy Furniss wrote:
Maybe not 1/frame but anyway the first couple of a run have numbers
rather than s
[27977.386795] radeon :01:00.0: GPU fault detected: 146 0x0c035014
[27977.386800] radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR
0x15E0
On 29.08.2014 10:19, Christian König wrote:
That sounds like something doesn't work correctly.
The resources are created with the subsampled formats R8G8_R8B8 or
G8R8_B8R8, but since this can't be accessed by the CB we need to use
R8G8B8A8 as surface format for writing to them.
If that
On 29.08.2014 12:31, Andy Furniss wrote:
As for that 4:2:2 doesn't work, AFAICT it absolutely does, but
there is no linear interpolation for chroma, so quality isn't ideal.
This seems to be a hardware restriction, unfortunately.
Hmm, we may have to disagree on the definition of working here
On 08.09.2014 14:50, Axel Davy wrote:
Hi,
When reading si_dma.c code, it looks like the requested width of the
copy is ignored except for PIPE_BUFFER.
Perhaps that explains the bugs observed ?
It isn't ignored. Partial DMA copies (i.e. operations that do not copy
whole lines) are simply
On 08.09.2014 21:07, Axel Davy wrote:
On 08/09/2014 20:21, Grigori Goronzy wrote :
On 08.09.2014 14:50, Axel Davy wrote:
Hi,
When reading si_dma.c code, it looks like the requested width of the
copy is ignored except for PIPE_BUFFER.
Perhaps that explains the bugs observed ?
It isn't
LGTM, but I have some comments below.
Grigori
On 10.09.2014 10:54, Michel Dänzer wrote:
From: Michel Dänzer michel.daen...@amd.com
Signed-off-by: Michel Dänzer michel.daen...@amd.com
---
This might help for investigating DMA related bugs.
src/gallium/drivers/radeonsi/si_dma.c | 103
On 30.09.2014 05:58, Michel Dänzer wrote:
diff --git a/src/gallium/drivers/radeonsi/si_dma.c
b/src/gallium/drivers/radeonsi/si_dma.c
index ff64722..643ce3f 100644
--- a/src/gallium/drivers/radeonsi/si_dma.c
+++ b/src/gallium/drivers/radeonsi/si_dma.c
@@ -251,7 +251,9 @@ void
Reviewed-by: Grigori Goronzy g...@chown.ath.cx
I've been using a similar patch to fix stability issues on my machine
for quite a while. Still, it's a pity we have to go that far to get
everything stable again.
On 13.11.2014 07:52, Michel Dänzer wrote:
From: Michel Dänzer michel.daen...@amd.com
Hi,
AFAIR not enabling this makes LLVM generate really slow code in some
common cases. Maybe this is just a bug in LLVM/R600 triggered by unsafe
FP math optimization or some optimization is too eager. Other drivers do
fine with these types of optimization.
What's the impact on performance with
On 2015-02-18 09:13, Michel Dänzer wrote:
On 18.02.2015 16:52, Grigori Goronzy wrote:
Hi,
AFAIR not enabling this makes LLVM generate really slow code in some
common cases. Maybe this is just a bug in LLVM/R600 triggered by
unsafe
FP math optimization or some optimization is too eager
This flag is typically used to request pinned host memory, to avoid
any copies between GPU and CPU.
This improves throughput with an older OpenCL app which I unfortunately
can't publish due to its licensing.
---
src/gallium/state_trackers/clover/core/resource.cpp | 4
1 file changed, 4
According to spec, CL_MEM_USE_HOST_PTR should directly use host memory,
if possible. This is just what userptr is for, so use it.
In case the memory cannot be mapped, a fallback similar to
CL_MEM_COPY_HOST_PTR is used.
---
src/gallium/state_trackers/clover/core/memory.cpp | 2 +-
On 28.05.2015 13:04, Grigori Goronzy wrote:
We need this to implement OpenCL's
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
---
Ping?
src/gallium/docs/source/screen.rst | 2 ++
src/gallium/drivers/ilo/ilo_screen.c | 8
src/gallium/drivers/nouveau/nvc0
On 28.05.2015 10:10, Grigori Goronzy wrote:
Wrap MapBuffer and MapImage as hard_event actions, like other
operations. This enables correct profiling. Also make sure to wait
for events to finish when blocking is requested by the caller.
---
Ping?
src/gallium/state_trackers/clover/api
On 2015-06-09 22:52, Francisco Jerez wrote:
+
+ if (blocking)
+ hev().wait();
+
hard_event::wait() may fail, so this should probably be done before the
ret_object() call to avoid leaks.
Alright... C++ exceptions are a minefield. :)
Is there any reason you didn't make
the same change
On 2015-05-28 13:04, Grigori Goronzy wrote:
Work-group size should always be aligned to subgroup size; this is a
basic requirement, otherwise some work-items will be no-operation.
It might make sense to refine the value according to a kernel's
resource usage, but that's a possible optimization
Wrap MapBuffer and MapImage as hard_event actions, like other
operations. This enables correct profiling. Also make sure to wait
for events to finish when blocking is requested by the caller.
---
src/gallium/state_trackers/clover/api/transfer.cpp | 50 --
1 file changed, 46
Mapping can fail, and this should be handled. Return the proper error
code and abort the associated event in this case.
---
src/gallium/state_trackers/clover/api/transfer.cpp | 16 ++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git
We need this to implement OpenCL's
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
---
src/gallium/docs/source/screen.rst | 2 ++
src/gallium/drivers/ilo/ilo_screen.c | 8
src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 4
Work-group size should always be aligned to subgroup size; this is a
basic requirement, otherwise some work-items will be no-operation.
It might make sense to refine the value according to a kernel's
resource usage, but that's a possible optimization for the future.
---
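
The alignment requirement above can be illustrated with a small sketch. Both helpers are hypothetical and are not clover or driver code; the subgroup size would come from the device, and 64 is used below only as an example value:

```c
#include <assert.h>

/* Hypothetical helper: round a work-group size up to the next multiple
 * of the subgroup size. */
static inline unsigned align_to_subgroup(unsigned work_group_size,
                                         unsigned subgroup_size)
{
    assert(subgroup_size > 0);
    return (work_group_size + subgroup_size - 1) / subgroup_size
           * subgroup_size;
}

/* Hypothetical helper: number of lanes that do no useful work when an
 * unaligned work-group size still occupies whole subgroups. */
static inline unsigned idle_lanes(unsigned work_group_size,
                                  unsigned subgroup_size)
{
    return align_to_subgroup(work_group_size, subgroup_size)
           - work_group_size;
}
```

For example, a work-group of 100 items on a device with 64-wide subgroups occupies two subgroups, so 28 lanes sit idle; reporting the subgroup size as the preferred multiple lets applications avoid that waste.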
On 23.05.2015 15:53, Francisco Jerez wrote:
diff --git a/src/gallium/state_trackers/clover/core/resource.cpp
b/src/gallium/state_trackers/clover/core/resource.cpp
index 8ed4c42..8e51b3c 100644
--- a/src/gallium/state_trackers/clover/core/resource.cpp
+++
the same issues as SI? We should really
try to figure out what's wrong with tiled DMA copies.
Anyway,
Reviewed-by: Grigori Goronzy g...@chown.ath.cx
Signed-off-by: Michel Dänzer michel.daen...@amd.com
---
src/gallium/drivers/radeonsi/Makefile.sources | 1 +
src/gallium/drivers/radeonsi
Hi,
On 23.09.2015 10:11, Christian König wrote:
> From: Boyuan Zhang
>
> Signed-off-by: Boyuan Zhang
> Reviewed-by: Christian König
> ---
Thanks, nice to see this finally getting fixed, and it was a pretty
simple thing