from:"Matt Arsenault"

Re: [Mesa-dev] [PATCH] ac, radv: fix removing the vec3 restriction on SI

2019-06-03 Thread Matt Arsenault

> On Jun 3, 2019, at 9:13 AM, Samuel Pitoiset wrote: > > I thought LLVM was able to handle that itself but actually it > does not. That means we shouldn't try to emit vec3 on SI because > it's unsupported. > It should. Can you file a bug with an example that doesn’t work? > Fixes:

Re: [Mesa-dev] [PATCH] radv: enable denorms for 64-bit and 16-bit floats

2017-12-28 Thread Matt Arsenault

> On Dec 28, 2017, at 16:55, Samuel Pitoiset wrote: > > Similar to RadeonSI. > > This fixes: > dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat > dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat > > Signed-off-by:

Re: [Mesa-dev] [PATCH 0/5] Volatile and invariant LDS memory ops

2017-11-09 Thread Matt Arsenault

> On Nov 10, 2017, at 07:41, Marek Olšák wrote: > > Hi, > > This fixes the TCS gl_ClipDistance piglit failure that was uncovered > by a recent LLVM change. The solution is to set volatile on loads > and stores to enforce proper ordering. > > Please review. > Every LDS

Re: [Mesa-dev] [PATCH] radv: emit fmuladd instead of fma to llvm.

2017-10-04 Thread Matt Arsenault

> On Oct 4, 2017, at 12:50, Marek Olšák wrote: > > The LLVM backend selects MAD (unfused) for fmuladd, and FMA (fused) for fma. For f64 and f16 by default it will emit an FMA since mad doesn’t support denorms.

Re: [Mesa-dev] [PATCH] radv: lower ffma in nir.

2017-10-03 Thread Matt Arsenault

> On Oct 3, 2017, at 13:58, Dave Airlie wrote: > > From: Dave Airlie > > So it appears the Vulkan SPIR-V fma opcode can be equivalent to a > mad operation, and the fma hw opcode on AMD hw is issued like a double > opcode so is slower. Also the radeonsi

Re: [Mesa-dev] [PATCH 3/6] ac/nir: rewrite local variable handling

2017-07-07 Thread Matt Arsenault

> On Jul 6, 2017, at 19:02, Connor Abbott <cwabbo...@gmail.com> wrote: > > On Thu, Jul 6, 2017 at 6:36 PM, Matt Arsenault <arse...@gmail.com> wrote: >> >> On Jul 6, 2017, at 18:31, Connor Abbott <cwabbo...@gmail.com> wrote: >> >> After lo

Re: [Mesa-dev] [PATCH 3/6] ac/nir: rewrite local variable handling

2017-07-06 Thread Matt Arsenault

> On Jul 6, 2017, at 18:31, Connor Abbott wrote: > > After looking into it some more, I think LLVM won't promote allocas to > registers at all when there are non-constant indices in the mix, and > fixing it seems kinda involved. I guess a better solution for now

Re: [Mesa-dev] [PATCH 3/4] ac/llvm: set xnack like radeonsi does.

2017-07-06 Thread Matt Arsenault

> On Jul 6, 2017, at 13:08, Dave Airlie <airl...@gmail.com> wrote: > > On 7 July 2017 at 05:07, Matt Arsenault <arse...@gmail.com> wrote: >> >>> On Jul 5, 2017, at 19:09, Dave Airlie <airl...@gmail.com> wrote: >>> >>> From: Dave A

Re: [Mesa-dev] [PATCH 3/4] ac/llvm: set xnack like radeonsi does.

2017-07-06 Thread Matt Arsenault

> On Jul 5, 2017, at 19:09, Dave Airlie wrote: > > From: Dave Airlie > > Use family, but only set xnack+ for gfx9. > The driver shouldn’t be explicitly setting this. This should be set as part of the subtarget chosen -Matt

Re: [Mesa-dev] [PATCH] radeonsi/gfx9: compile shaders with +xnack

2017-05-18 Thread Matt Arsenault

> On May 18, 2017, at 22:46, Marek Olšák wrote: > > From: Marek Olšák > > so that LLVM doesn't allocate SGPRs where XNACK is. > > Cc: 17.1 You shouldn’t be explicitly enabling xnack. This sounds like a workaround for

Re: [Mesa-dev] [PATCH] radv: flush f32->f16 conversion denormals to zero.

2017-03-16 Thread Matt Arsenault

> On Mar 16, 2017, at 20:02, Dave Airlie <airl...@gmail.com> wrote: > > From: Dave Airlie <airl...@redhat.com> > > SPIR-V defines the f32->f16 operation as flushing denormals to 0, > this compares the class using amd class opcode. > > Thank

Re: [Mesa-dev] [PATCH 10/24] radeonsi: replace SI.packf16 with amdgcn.cvt.pkrtz

2017-02-25 Thread Matt Arsenault

> On Feb 25, 2017, at 15:58, Marek Olšák wrote: > > } > + > +LLVMValueRef ac_emit_cvt_pkrtz_f16(struct ac_llvm_context *ctx, > +LLVMValueRef args[2]) > +{ > + if (HAVE_LLVM >= 0x0500) { > + LLVMTypeRef v2f16 = > +

Re: [Mesa-dev] [PATCH] radv/ac: enable loop unrolling.

2017-02-24 Thread Matt Arsenault

> On Feb 24, 2017, at 14:39, Marek Olšák <mar...@gmail.com> wrote: > > On Fri, Feb 24, 2017 at 7:20 PM, Matt Arsenault <arse...@gmail.com> wrote: >> >> On Feb 24, 2017, at 01:45, Marek Olšák <mar...@gmail.com> wrote: >> >> The main requi

Re: [Mesa-dev] [PATCH] radv/ac: enable loop unrolling.

2017-02-24 Thread Matt Arsenault

> On Feb 24, 2017, at 01:45, Marek Olšák wrote: > > The main requirement is that if there is indirect indexing inside a > loop, we always want to unroll the whole loop to get rid of the > indexing, which can decrease scratch usage. > > Marek We boost the unroll thresholds

Re: [Mesa-dev] [PATCH] radv/ac: enable loop unrolling.

2017-02-23 Thread Matt Arsenault

> On Feb 23, 2017, at 19:44, Dave Airlie <airl...@gmail.com> wrote: > > On 24 February 2017 at 13:36, Matt Arsenault <arse...@gmail.com> wrote: >> >> On Feb 23, 2017, at 19:27, Dave Airlie <airl...@gmail.com> wrote

Re: [Mesa-dev] [PATCH] radv/ac: enable loop unrolling.

2017-02-23 Thread Matt Arsenault

> On Feb 23, 2017, at 19:27, Dave Airlie wrote: > > +static void set_unroll_metadata(struct nir_to_llvm_context *ctx, > +LLVMValueRef br) > +{ > + unsigned kind = LLVMGetMDKindIDInContext(ctx->context, "llvm.loop", 9); > + LLVMValueRef

Re: [Mesa-dev] [PATCH 1/1] clover: Dump linked module to a different file

2017-02-22 Thread Matt Arsenault

> On Feb 22, 2017, at 07:51, Jan Vesely wrote: > > This allows passing the generated files directly to llc or bugpoint. > Note that if a program links multiple binaries they will still be in the same > file; the module name is "link". Can you add a counter ID or

Re: [Mesa-dev] [PATCH] radeonsi: allow unaligned vertex buffer offsets and strides on CIK-VI

2017-02-14 Thread Matt Arsenault

> On Feb 13, 2017, at 09:01, Marek Olšák wrote: > > So that we can disable u_vbuf for GL core profiles. > > This is a v2 of the previous VI-only patch. > It requires SH_MEM_CONFIG.ALIGNMENT_MODE = UNALIGNED on CIK-VI. Is this enabled? I wasn’t sure, so currently LLVM assumes

Re: [Mesa-dev] GLSL IR & TGSI on-disk shader cache

2017-02-13 Thread Matt Arsenault

> On Feb 6, 2017, at 19:42, Timothy Arceri wrote: > > This series does not include the patch that adds cache support > to the radeonsi backend, the main reason for this is that llvm > currently doesn't allow the version to be queried at runtime > (as far as I'm aware)

Re: [Mesa-dev] Mesa (master): Revert "radeon/llvm: Use alloca instructions for larger arrays"

2016-07-26 Thread Matt Arsenault

> On Jul 26, 2016, at 14:37, Marek Olšák <mar...@gmail.com> wrote: > > On Sat, Jul 23, 2016 at 4:07 PM, Nicolai Hähnle <nhaeh...@gmail.com> wrote: >> On 22.07.2016 12:08, Michel Dänzer wrote: >>> >>> On 21.07

Re: [Mesa-dev] Mesa (master): Revert "radeon/llvm: Use alloca instructions for larger arrays"

2016-07-21 Thread Matt Arsenault

> On Jul 21, 2016, at 01:03, Michel Dänzer wrote: > > On 21.07.2016 00:04, Michel Dänzer wrote: >> On 15.07.2016 05:15, Marek Olšák wrote: >>> Module: Mesa >>> Branch: master >>> Commit: f84e9d749fbb6da73a60fb70e6725db773c9b8f8 >>> URL: >>>

Re: [Mesa-dev] [PATCH 2/5] radeonsi: set dereferenceable attribute on descriptor arrays

2016-07-13 Thread Matt Arsenault

> On Jul 13, 2016, at 12:36, Marek Olšák wrote: > > On Wed, Jul 13, 2016 at 9:25 PM, Tom Stellard > wrote: >> On Wed, Jul 13, 2016 at 03:20:55PM -0400, Tom Stellard wrote: >>> On Tue, Jul 12, 2016 at 10:52:35PM +0200, Marek Olšák

Re: [Mesa-dev] [PATCH] radeonsi: add a debug flag for unsafe math LLVM optimizations

2016-06-13 Thread Matt Arsenault

> On Jun 13, 2016, at 09:27, Marek Olšák wrote: > > + { "unsafemath", DBG_UNSAFE_MATH, "Enable unsafe math shader > optimizations" }, Perhaps one for each of the individual fast math options as well (no nans, no signed zeros

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-09 Thread Matt Arsenault

> On Feb 9, 2016, at 11:23, Tom Stellard wrote: > > We should still add +fp64-denormals even if the backend doesn't do > anything with it now. This is the default, so it doesn’t really matter anyway. -Matt

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Matt Arsenault

> On Feb 8, 2016, at 08:08, Tom Stellard wrote: > > Do SI/CI support fp64 denorms? If so, won't this hurt performance? This is the only mode that should ever be used. I’m not sure why these are options. There technically are separate flush on input or flush on output

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Matt Arsenault

> On Feb 8, 2016, at 08:08, Tom Stellard wrote: > > Do SI/CI support fp64 denorms? If so, won't this hurt performance? > > We should tell the compiler we are enabling fp-64 denorms by adding > +fp64-denormals to the feature string. It would also be better to > read the

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Matt Arsenault

> On Feb 8, 2016, at 12:38, Marek Olšák wrote: > >> >> We should tell the compiler we are enabling fp-64 denorms by adding >> +fp64-denormals to the feature string. It would also be better to >> read the float_mode value from the config registers emitted by the >> compiler.

Re: [Mesa-dev] [PATCH 2/3] radeon/llvm: Set the target triple on the module

2016-02-04 Thread Matt Arsenault

> On Feb 4, 2016, at 13:02, Tom Stellard wrote: > > + LLVMSetTarget(ctx->gallivm.module, > + > +#if HAVE_LLVM < 0x0306 > + "r600--"); > +#else > + triple); > +#endif This alone does not set the datalayout, which should also

Re: [Mesa-dev] [PATCH 2/2] radeonsi: Allow dumping LLVM IR before optimization passes

2016-02-04 Thread Matt Arsenault

> On Feb 4, 2016, at 00:15, Nicolai Hähnle wrote: > > From: Nicolai Hähnle > > Set R600_DEBUG=preoptir to dump the LLVM IR before optimization passes, > to allow diagnosing problems caused by optimization passes. > > Note that in order to compile

Re: [Mesa-dev] [PATCH shader-db] si-report: Track max waves per CU

2016-01-05 Thread Matt Arsenault

> On Jan 5, 2016, at 07:28, Marek Olšák wrote: > > Hi, > > I'd like us to do this computation in Mesa, because it can be more > accurate there. The pixel shader wave count depends heavily on LDS, > because each interpolated input occupies 12 dwords of LDS per > primitive and

Re: [Mesa-dev] [PATCH 09/10] radeonsi: don't emit AMDGPU intrinsics for RSQ opcodes

2015-10-11 Thread Matt Arsenault

> On Oct 10, 2015, at 6:29 PM, Marek Olšák wrote: > > +/* This requires "unsafe-fp-math" for LLVM to convert it to RSQ. */ > +static void emit_rsq(const struct lp_build_tgsi_action *action, > + struct lp_build_tgsi_context *bld_base, > +

Re: [Mesa-dev] [PATCH 07/10] radeonsi: don't use the AMDGPU intrinsic for CMP

2015-10-11 Thread Matt Arsenault

> On Oct 10, 2015, at 6:29 PM, Marek Olšák wrote: > > The increase in VGPRs is unfortunate, but the decrease in the scratch size > is always welcome. Do you have a specific example where this happens you can post?

Re: [Mesa-dev] [PATCH] clover: Return the minimum required value for CL_DEVICE_SINGLE_FP_CONFIG

2015-03-06 Thread Matt Arsenault

On Mar 6, 2015, at 8:56 AM, Francisco Jerez curroje...@riseup.net wrote: Tom Stellard t...@stellard.net writes: On Thu, Mar 05, 2015 at 08:42:25PM +0200, Francisco Jerez wrote: Tom Stellard thomas.stell...@amd.com writes: This means dropping CL_FP_DENORM from

Re: [Mesa-dev] [PATCH] clover: Return the minimum required value for CL_DEVICE_SINGLE_FP_CONFIG

2015-03-05 Thread Matt Arsenault

On Mar 5, 2015, at 10:42 AM, Francisco Jerez curroje...@riseup.net wrote: Could you add that this is according to the OpenCL 1.1 specification? OpenCL 1.2 is even weaker (CL_FP_INF_NAN is not required, only one of CL_FP_ROUND_TO_ZERO or CL_FP_ROUND_TO_NEAREST is required, and no FP

Re: [Mesa-dev] [PATCH 2/9] radeonsi: use V_BFE for extracting a sample index

2015-03-05 Thread Matt Arsenault

On Mar 5, 2015, at 6:50 AM, Tom Stellard t...@stellard.net wrote: On Mon, Mar 02, 2015 at 02:09:29PM -0800, Matt Arsenault wrote: On Mar 2, 2015, at 1:19 PM, Tom Stellard t...@stellard.net wrote: On Mon, Mar 02, 2015 at 10:14:00PM +0100, Marek Olšák wrote: On Mon, Mar 2, 2015 at 10:05

Re: [Mesa-dev] [PATCH 2/9] radeonsi: use V_BFE for extracting a sample index

2015-03-02 Thread Matt Arsenault

On Mar 2, 2015, at 1:19 PM, Tom Stellard t...@stellard.net wrote: On Mon, Mar 02, 2015 at 10:14:00PM +0100, Marek Olšák wrote: On Mon, Mar 2, 2015 at 10:05 PM, Tom Stellard t...@stellard.net wrote: On Mon, Mar 02, 2015 at 12:54:16PM +0100, Marek Olšák wrote: From: Marek Olšák

Re: [Mesa-dev] [PATCH 2/3] clover: Enable cl_khr_fp64 for devices that support doubles v2

2015-02-26 Thread Matt Arsenault

On Feb 26, 2015, at 5:06 PM, Tom Stellard thomas.stell...@amd.com wrote: v2: - Report correct values for CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE and CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE. - Only define cl_khr_fp64 if the extension is supported. - Remove trailing space from extension

Re: [Mesa-dev] [PATCH] Revert radeon/llvm: enable unsafe math for graphics shaders

2015-02-18 Thread Matt Arsenault

On Feb 17, 2015, at 11:52 PM, Grigori Goronzy g...@chown.ath.cx wrote: Hi, AFAIR not enabling this makes LLVM generate really slow code in some common cases. Maybe this is just a bug in LLVM/R600 triggered by unsafe FP math optimization or some optimization is too eager. Other drivers do

Re: [Mesa-dev] [PATCH] Revert radeon/llvm: enable unsafe math for graphics shaders

2015-02-18 Thread Matt Arsenault

On Feb 18, 2015, at 1:15 AM, Michel Dänzer mic...@daenzer.net wrote: On 18.02.2015 17:13, Michel Dänzer wrote: On 18.02.2015 16:52, Grigori Goronzy wrote: What's the impact on performance with unsafe FP math disabled at this time? I don't know. Correctness trumps performance. FWIW,

Re: [Mesa-dev] [PATCH] radeonsi: force NaNs to 0

2014-12-10 Thread Matt Arsenault

On Dec 10, 2014, at 5:08 PM, Marek Olšák mar...@gmail.com wrote: From: Marek Olšák marek.ol...@amd.com This fixes incorrect rendering in Unreal Engine demos. I don't know why it's called dx10 clamp mode. MSDN doesn't mention it. Bugzilla:

Re: [Mesa-dev] [PATCH] radeonsi: use minnum and maxnum LLVM intrinsics for MIN and MAX opcodes

2014-11-22 Thread Matt Arsenault

On Nov 22, 2014, at 7:35 AM, Marek Olšák mar...@gmail.com wrote: AFAICS, the R600 backend doesn't implement the intrinsics for R600. Marek Should it? It’s trivial to switch to these for it, but I wasn’t sure what the actual semantics of its instructions were. There’s MAX and MAX_DX10,

Re: [Mesa-dev] [PATCH 05/10] clover: Add environment variables for dumping kernel code

2014-10-08 Thread Matt Arsenault

On Oct 6, 2014, at 12:44 PM, Tom Stellard thomas.stell...@amd.com wrote: --- .../state_trackers/clover/llvm/invocation.cpp | 74 ++ 1 file changed, 63 insertions(+), 11 deletions(-) diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp

Re: [Mesa-dev] [PATCH 5/5] clover: Enable cl_khr_fp64 for devices that support doubles v2

2014-08-13 Thread Matt Arsenault

On Jun 26, 2014, at 7:15 AM, Francisco Jerez curroje...@riseup.net wrote: Tom Stellard thomas.stell...@amd.com writes: v2: - Report correct values for CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE and CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE. - Only define cl_khr_fp64 if the extension is

Re: [Mesa-dev] [PATCH 0/2] clover: add clCompileProgram

2014-08-04 Thread Matt Arsenault

On Aug 4, 2014, at 8:03 AM, EdB edb+m...@sigluy.net wrote: Hello I'm done with the clCompile part of OpenCL 1.2. As you can see I use char* data to transfer data from core to llvm. At first I was thinking of using std class but we need to be binary safe when data are transferred

Re: [Mesa-dev] [PATCH 2/2] r600g: Pass dimension parameter to compute shader.

2014-07-31 Thread Matt Arsenault

On 07/31/2014 03:58 PM, Jan Vesely wrote: Would that work with things like one kernel calling another kernel? If we had a function called from two kernels how would it know where to look? I don't think this case can be handled as 2 separate kernels with the same calling convention. If a kernel

Re: [Mesa-dev] [PATCH 1/1] r600: Use llvm intrinsic to read work dimension information

2014-07-30 Thread Matt Arsenault

On Jul 30, 2014, at 4:11 PM, Jan Vesely jan.ves...@rutgers.edu wrote: +define i32 @get_work_dim() nounwind readnone alwaysinline { + %x = call i32 @llvm.r600.read.workdim() nounwind readnone + ret i32 %x +} -- Maybe this should have range metadata attached now that it applies to

Re: [Mesa-dev] [PATCH 1/1] R600: Add new intrinsic to read work dimensions

2014-07-30 Thread Matt Arsenault

On 07/30/2014 04:11 PM, Jan Vesely wrote: CC: Tom Stellard t...@stellard.net CC: Matt Arsenault matthew.arsena...@amd.com Signed-off-by: Jan Vesely jan.ves...@rutgers.edu --- include/llvm/IR/IntrinsicsR600.td| 2 ++ lib/Target/R600/R600ISelLowering.cpp | 6 -- 2 files changed, 6

Re: [Mesa-dev] [PATCH] R600/SI: Use i32 vectors for resources and samplers

2014-07-07 Thread Matt Arsenault

On Jul 7, 2014, at 8:28 AM, Marek Olšák mar...@gmail.com wrote: From: Marek Olšák marek.ol...@amd.com This affects new intrinsics only. What surprises me is that v32i8 still works. --- lib/Target/R600/SIInstructions.td | 4 +- lib/Target/R600/SIIntrinsics.td | 6

Re: [Mesa-dev] [PATCH 1/2] clover: Report a default value for CL_DEVICE_SINGLE_FP_CONFIG

2014-07-02 Thread Matt Arsenault

On Jul 2, 2014, at 12:48 PM, Tom Stellard thomas.stell...@amd.com wrote: --- src/gallium/state_trackers/clover/api/device.cpp | 3 +-- src/gallium/state_trackers/clover/core/device.cpp | 6 ++ src/gallium/state_trackers/clover/core/device.hpp | 1 + 3 files changed, 8 insertions(+), 2

Re: [Mesa-dev] [PATCH 1/2] clover: Report a default value for CL_DEVICE_SINGLE_FP_CONFIG

2014-07-02 Thread Matt Arsenault

On Jul 2, 2014, at 12:52 PM, Matt Arsenault arse...@gmail.com wrote: On Jul 2, 2014, at 12:48 PM, Tom Stellard thomas.stell...@amd.com wrote: --- src/gallium/state_trackers/clover/api/device.cpp | 3 +-- src/gallium/state_trackers/clover/core/device.cpp | 6 ++ src/gallium

Re: [Mesa-dev] [PATCH 5/5] clover: Enable cl_khr_fp64 for devices that support doubles

2014-06-21 Thread Matt Arsenault

On Jun 21, 2014, at 9:37 AM, Francisco Jerez curroje...@riseup.net wrote: Tom Stellard thomas.stell...@amd.com writes: --- src/gallium/state_trackers/clover/api/device.cpp | 4 +++- src/gallium/state_trackers/clover/core/device.cpp | 6 ++

Re: [Mesa-dev] [PATCH 5/5] clover: Enable cl_khr_fp64 for devices that support doubles

2014-06-21 Thread Matt Arsenault

On Jun 21, 2014, at 12:32 PM, Francisco Jerez curroje...@riseup.net wrote: Matt Arsenault arse...@gmail.com writes: On Jun 21, 2014, at 9:37 AM, Francisco Jerez curroje...@riseup.net wrote: Tom Stellard thomas.stell...@amd.com writes: [...] case CL_DEVICE_EXTENSIONS

Re: [Mesa-dev] [PATCH] radeon/llvm: Adapt to AMDGPU.rsq intrinsic change in LLVM 3.5

2014-06-19 Thread Matt Arsenault

On Jun 18, 2014, at 11:53 PM, Michel Dänzer mic...@daenzer.net wrote: From: Michel Dänzer michel.daen...@amd.com Signed-off-by: Michel Dänzer michel.daen...@amd.com --- src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 4 1 file changed, 4 insertions(+) diff --git

Re: [Mesa-dev] [PATCH 5/5] clover: Enable cl_khr_fp64 for devices that support doubles

2014-06-17 Thread Matt Arsenault

On Jun 17, 2014, at 3:11 PM, Bruno Jimenez brunoji...@gmail.com wrote: Hi, I have a couple of questions about this patch: 1) Could you please also change how the results of the 'CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE' and 'CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE' queries are generated?

Re: [Mesa-dev] [PATCH 1/2] R600/SI: add Gather4 intrinsics (v2)

2014-06-16 Thread Matt Arsenault

On 06/16/2014 08:45 AM, Tom Stellard wrote: You don't need to add new SDNodes for all these instructions, you can just use the intrinsic directly in the pattern. The only reason to add SDNodes, is if there are optimizations / special lowering we can do for these instructions. I kind of like

Re: [Mesa-dev] [PATCH] R600/SI: add Gather4 intrinsics

2014-06-08 Thread Matt Arsenault

On 06/06/2014 02:57 PM, Marek Olšák wrote: DMASK was repurposed for GATHER4, so all passes which modify DMASK are disabled by setting MIMG=0 and hasPostISelHook=0. See my Mesa patches for how DMASK works with GATHER4, because this is not documented anywhere. Can you add a comment explaining

Re: [Mesa-dev] [cfe-dev] 3 element vectors in opencl 1.1+

2014-04-22 Thread Matt Arsenault

On 04/22/2014 02:35 PM, Tom Stellard wrote: On Mon, Apr 21, 2014 at 10:02:27PM -0400, Jan Vesely wrote: Hi, I ran into a problem caused by this part of the OCL specs (6.1.5 Alignment of Types): For 3-component vector data types, the size of the data type is 4 * sizeof(component). and the
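
The size rule quoted above (a 3-component vector occupies 4 * sizeof(component)) can be sketched in host C. This is a minimal illustration assuming a C11 compiler; `float3_like` and `float3_array_stride` are hypothetical names introduced here and are not the OpenCL types themselves, and the sketch does not cover the rest of the truncated message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical host-side stand-in for a 3-component float vector.
 * Aligning the payload to 4 * sizeof(float) makes the compiler pad the
 * struct to the size of a 4-component vector. */
typedef struct {
    _Alignas(4 * sizeof(float)) float s[3];
} float3_like;

/* The type holds three components but takes the space of four. */
_Static_assert(sizeof(float3_like) == 4 * sizeof(float),
               "3-component vector must be padded to 4 components");

/* Distance in bytes between consecutive elements of an array. */
static size_t float3_array_stride(void)
{
    float3_like pair[2];
    return (size_t)((char *)&pair[1] - (char *)&pair[0]);
}
```

One consequence of this layout is that an array of such vectors cannot be treated as a tightly packed run of floats: every fourth slot is padding, so code indexing the data by component has to step by four rather than three.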

Re: [Mesa-dev] [cfe-dev] 3 element vectors in opencl 1.1+

2014-04-22 Thread Matt Arsenault

On 04/22/2014 05:22 PM, Jan Vesely wrote: On Tue, 2014-04-22 at 14:40 -0700, Matt Arsenault wrote: On 04/22/2014 02:35 PM, Tom Stellard wrote: On Mon, Apr 21, 2014 at 10:02:27PM -0400, Jan Vesely wrote: Hi, I ran into a problem caused by this part of the OCL specs (6.1.5 Alignment of Types

Re: [Mesa-dev] [PATCH] R600: Verify all instructions in the AsmPrinter on debug builds

2014-02-25 Thread Matt Arsenault

, Tom Stellard t...@stellard.net wrote: On Tue, Feb 25, 2014 at 01:47:17PM -0800, Matt Arsenault wrote: On 02/25/2014 01:42 PM, Tom Stellard wrote: +errs() Please file a bug a bugs.freedesktop.org\n; Typo, s/a/at/ Thanks, I will fix this before I commit. -Tom

Re: [Mesa-dev] [PATCH] R600: Verify all instructions in the AsmPrinter on debug builds

2014-02-25 Thread Matt Arsenault

On 02/25/2014 01:42 PM, Tom Stellard wrote: +errs() Please file a bug a bugs.freedesktop.org\n; Typo, s/a/at/

Re: [Mesa-dev] [PATCH] R600/SI: Custom select 64-bit ADD

2014-02-13 Thread Matt Arsenault

On Feb 7, 2014, at 7:46 AM, Tom Stellard t...@stellard.net wrote: From: Tom Stellard thomas.stell...@amd.com --- lib/Target/R600/AMDGPUISelDAGToDAG.cpp | 48 ++ lib/Target/R600/SIISelLowering.cpp | 29 lib/Target/R600/SIISelLowering.h

Re: [Mesa-dev] [PATCH] R600/SI: Split global vector loads with more than 4 elements

2014-02-10 Thread Matt Arsenault

Why would you want to do this for the small types? You should be able to load those in fewer loads and then promote them. On 02/10/2014 01:32 PM, Tom Stellard wrote: From: Tom Stellard thomas.stell...@amd.com --- lib/Target/R600/SIISelLowering.cpp | 8 +- test/CodeGen/R600/load.ll

Re: [Mesa-dev] [PATCH] R600/SI: Custom select 64-bit ADD

2014-02-08 Thread Matt Arsenault

I didn't think to try this. Where is the address folding happening? On 02/07/2014 07:46 AM, Tom Stellard wrote: From: Tom Stellard thomas.stell...@amd.com --- lib/Target/R600/AMDGPUISelDAGToDAG.cpp | 48 ++ lib/Target/R600/SIISelLowering.cpp | 29

Re: [Mesa-dev] PATCH: R600 + SI Private memory fixes; Use more SALU instructions on SI

2013-10-10 Thread Matt Arsenault

On 10/10/2013 10:55 AM, Tom Stellard wrote: Hi, The attached patches simplify the handling of OpenCL private memory space for VLIW4/VLIW5 GPUs and should fix a crash with pyrit on r600g. Also included in the series is private memory support on SI as well as an optimization to prefer selecting

Re: [Mesa-dev] [PATCH 2/2] R600/SI: FMA is faster than fmul and fadd for f64

2013-08-09 Thread Matt Arsenault

On 08/09/2013 05:59 AM, Niels Ole Salscheider wrote: +bool SITargetLowering::isFMAFasterThanFMulAndFAdd(EVT VT) const { + VT = VT.getScalarType(); + + if (!VT.isSimple()) +return false; + + switch (VT.getSimpleVT().SimpleTy) { + case MVT::f32: +return false; /* There is V_MAD_F32 for

Re: [Mesa-dev] R600 Patches: Add support for the local address space

2013-06-12 Thread Matt Arsenault

On 06/12/2013 05:42 PM, Tom Stellard wrote: Hi, The attached patches add support for local address space on Evergreen / Northern Islands GPUs. Please Review. -Tom + def int_AMDGPU_barrier_local : Intrinsic[], [], []; You probably want to mark this as IntrReadMem to try to avoid reordering

66 matches
