RE: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with __a * __b

2021-06-30 Thread Kyrylo Tkachov via Gcc-patches
Hi Prathamesh,

> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 29 June 2021 08:22
> To: gcc Patches 
> Cc: Kyrylo Tkachov 
> Subject: Re: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) 
> intrinsics
> with __a * __b
> 
> On Mon, 21 Jun 2021 at 14:04, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 14 Jun 2021 at 13:27, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 7 Jun 2021 at 12:45, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 31 May 2021 at 16:01, Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Mon, 31 May 2021 at 15:22, Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, 26 May 2021 at 14:07, Marc Glisse 
> wrote:
> > > > > > >
> > > > > > > On Wed, 26 May 2021, Prathamesh Kulkarni via Gcc-patches
> wrote:
> > > > > > >
> > > > > > > > The attached patch removes calls to builtins in vmul_n* (a, b)
> with __a * __b.
> > > > > > >
> > > > > > > I am not familiar with neon, but are __a and __b unsigned here?
> Otherwise,
> > > > > > > is vmul_n already undefined in case of overflow?
> > > > > > Hi Marc,
> > > > > > Sorry for late reply, for vmul_n_s*, I think they are signed
> > > > > > (intx_t).
> > > > > Oops, I meant intx_t.
> > > > > > I am not sure how should the intrinsic behave in case of signed
> overflow,
> > > > > > but I am assuming it's OK since vmul_s* intrinsics leave it 
> > > > > > undefined
> too.
> > > > > > Kyrill, is it OK to leave vmul_s* and vmul_n_s* undefined in case of
> overflow ?
> > > > The attached version fixes one fallout I missed earlier.
> > > > Is this OK to commit ?
> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html
> > ping * 2 https://gcc.gnu.org/pipermail/gcc-patches/2021-
> June/572037.html
> ping * 3 https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html

I'm a bit wary of leaving them undefined for signed overflow. I see that 
aarch64 leaves them too so maybe it's not such a big deal, but I'd like to 
consider that separately.
Can you please split this into the unsigned and floating-point parts, followed 
by the signed intrinsics? The unsigned and FP parts are okay, lets' review the 
signed intrinsics separately.

Thanks,
Kyrill

> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > --
> > > > > > > Marc Glisse


Re: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with __a * __b

2021-06-29 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 21 Jun 2021 at 14:04, Prathamesh Kulkarni
 wrote:
>
> On Mon, 14 Jun 2021 at 13:27, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 7 Jun 2021 at 12:45, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 31 May 2021 at 16:01, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 31 May 2021 at 15:22, Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Wed, 26 May 2021 at 14:07, Marc Glisse  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, 26 May 2021, Prathamesh Kulkarni via Gcc-patches wrote:
> > > > > >
> > > > > > > The attached patch removes calls to builtins in vmul_n* (a, b) 
> > > > > > > with __a * __b.
> > > > > >
> > > > > > I am not familiar with neon, but are __a and __b unsigned here? 
> > > > > > Otherwise,
> > > > > > is vmul_n already undefined in case of overflow?
> > > > > Hi Marc,
> > > > > Sorry for late reply, for vmul_n_s*, I think they are signed
> > > > > (intx_t).
> > > > Oops, I meant intx_t.
> > > > > I am not sure how should the intrinsic behave in case of signed 
> > > > > overflow,
> > > > > but I am assuming it's OK since vmul_s* intrinsics leave it undefined 
> > > > > too.
> > > > > Kyrill, is it OK to leave vmul_s* and vmul_n_s* undefined in case of 
> > > > > overflow ?
> > > The attached version fixes one fallout I missed earlier.
> > > Is this OK to commit ?
> > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html
> ping * 2 https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html
ping * 3 https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > --
> > > > > > Marc Glisse


Re: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with __a * __b

2021-06-21 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 14 Jun 2021 at 13:27, Prathamesh Kulkarni
 wrote:
>
> On Mon, 7 Jun 2021 at 12:45, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 31 May 2021 at 16:01, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 31 May 2021 at 15:22, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Wed, 26 May 2021 at 14:07, Marc Glisse  wrote:
> > > > >
> > > > > On Wed, 26 May 2021, Prathamesh Kulkarni via Gcc-patches wrote:
> > > > >
> > > > > > The attached patch removes calls to builtins in vmul_n* (a, b) with 
> > > > > > __a * __b.
> > > > >
> > > > > I am not familiar with neon, but are __a and __b unsigned here? 
> > > > > Otherwise,
> > > > > is vmul_n already undefined in case of overflow?
> > > > Hi Marc,
> > > > Sorry for late reply, for vmul_n_s*, I think they are signed
> > > > (intx_t).
> > > Oops, I meant intx_t.
> > > > I am not sure how should the intrinsic behave in case of signed 
> > > > overflow,
> > > > but I am assuming it's OK since vmul_s* intrinsics leave it undefined 
> > > > too.
> > > > Kyrill, is it OK to leave vmul_s* and vmul_n_s* undefined in case of 
> > > > overflow ?
> > The attached version fixes one fallout I missed earlier.
> > Is this OK to commit ?
> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html
ping * 2 https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > --
> > > > > Marc Glisse


Re: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with __a * __b

2021-06-14 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 7 Jun 2021 at 12:45, Prathamesh Kulkarni
 wrote:
>
> On Mon, 31 May 2021 at 16:01, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 31 May 2021 at 15:22, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Wed, 26 May 2021 at 14:07, Marc Glisse  wrote:
> > > >
> > > > On Wed, 26 May 2021, Prathamesh Kulkarni via Gcc-patches wrote:
> > > >
> > > > > The attached patch removes calls to builtins in vmul_n* (a, b) with 
> > > > > __a * __b.
> > > >
> > > > I am not familiar with neon, but are __a and __b unsigned here? 
> > > > Otherwise,
> > > > is vmul_n already undefined in case of overflow?
> > > Hi Marc,
> > > Sorry for late reply, for vmul_n_s*, I think they are signed
> > > (intx_t).
> > Oops, I meant intx_t.
> > > I am not sure how should the intrinsic behave in case of signed overflow,
> > > but I am assuming it's OK since vmul_s* intrinsics leave it undefined too.
> > > Kyrill, is it OK to leave vmul_s* and vmul_n_s* undefined in case of 
> > > overflow ?
> The attached version fixes one fallout I missed earlier.
> Is this OK to commit ?
ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > --
> > > > Marc Glisse


Re: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with __a * __b

2021-06-07 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 31 May 2021 at 16:01, Prathamesh Kulkarni
 wrote:
>
> On Mon, 31 May 2021 at 15:22, Prathamesh Kulkarni
>  wrote:
> >
> > On Wed, 26 May 2021 at 14:07, Marc Glisse  wrote:
> > >
> > > On Wed, 26 May 2021, Prathamesh Kulkarni via Gcc-patches wrote:
> > >
> > > > The attached patch removes calls to builtins in vmul_n* (a, b) with __a 
> > > > * __b.
> > >
> > > I am not familiar with neon, but are __a and __b unsigned here? Otherwise,
> > > is vmul_n already undefined in case of overflow?
> > Hi Marc,
> > Sorry for late reply, for vmul_n_s*, I think they are signed
> > (intx_t).
> Oops, I meant intx_t.
> > I am not sure how should the intrinsic behave in case of signed overflow,
> > but I am assuming it's OK since vmul_s* intrinsics leave it undefined too.
> > Kyrill, is it OK to leave vmul_s* and vmul_n_s* undefined in case of 
> > overflow ?
The attached version fixes one fallout I missed earlier.
Is this OK to commit ?

Thanks,
Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > --
> > > Marc Glisse
2021-06-07  Prathamesh Kulkarni  

PR target/66791
* config/arm/arm_neon.h (vmul_n_s16): Replace call to builtin
with __a * __b.
(vmul_n_s32): Likewise.
(vmul_n_u16): Likewise.
(vmul_n_u32): Likewise.
(vmulq_n_s16): Likewise.
(vmulq_n_s32): Likewise.
(vmulq_n_u16): Likewise.
(vmulq_n_u32): Likewise.
(vmul_n_f32): Gate __a * __b conditionally on __FAST_MATH__.
(vmulq_n_f32): Likewise.
(vmul_n_f16): Likewise.
(vmulq_n_f16): Likewise.

testsuite/
* gcc.target/arm/armv8_2-fp16-neon-2.c: Adjust.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index dcd533fd003..8ac00774e6c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -8331,70 +8331,78 @@ __extension__ extern __inline int16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_s16 (int16x4_t __a, int16_t __b)
 {
-  return (int16x4_t)__builtin_neon_vmul_nv4hi (__a, (__builtin_neon_hi) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline int32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_s32 (int32x2_t __a, int32_t __b)
 {
-  return (int32x2_t)__builtin_neon_vmul_nv2si (__a, (__builtin_neon_si) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline float32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_f32 (float32x2_t __a, float32_t __b)
 {
+#ifdef __FAST_MATH__
+  return __a * __b;
+#else
   return (float32x2_t)__builtin_neon_vmul_nv2sf (__a, (__builtin_neon_sf) __b);
+#endif
 }
 
 __extension__ extern __inline uint16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_u16 (uint16x4_t __a, uint16_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vmul_nv4hi ((int16x4_t) __a, 
(__builtin_neon_hi) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline uint32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_u32 (uint32x2_t __a, uint32_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vmul_nv2si ((int32x2_t) __a, 
(__builtin_neon_si) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmulq_n_s16 (int16x8_t __a, int16_t __b)
 {
-  return (int16x8_t)__builtin_neon_vmul_nv8hi (__a, (__builtin_neon_hi) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmulq_n_s32 (int32x4_t __a, int32_t __b)
 {
-  return (int32x4_t)__builtin_neon_vmul_nv4si (__a, (__builtin_neon_si) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline float32x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmulq_n_f32 (float32x4_t __a, float32_t __b)
 {
+#ifdef __FAST_MATH__
+  return __a * __b;
+#else
   return (float32x4_t)__builtin_neon_vmul_nv4sf (__a, (__builtin_neon_sf) __b);
+#endif
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmulq_n_u16 (uint16x8_t __a, uint16_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vmul_nv8hi ((int16x8_t) __a, 
(__builtin_neon_hi) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmulq_n_u32 (uint32x4_t __a, uint32_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vmul_nv4si ((int32x4_t) __a, 
(__builtin_neon_si) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline int32x4_t
@@ -17661,7 +17669,11 @@ __extension__ extern __inline float16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_f16 (float16x4_t __a, float16_t __b)
 {
+#ifdef __FAST_MATH__
+  return __a * __b;
+#else
   return __builtin_neon_vmul_nv4hf (__a, __b);
+#endif
 }
 
 __extension__ extern __inline float16x8_t
@@ -17686,7 

Re: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with __a * __b

2021-05-31 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 31 May 2021 at 15:22, Prathamesh Kulkarni
 wrote:
>
> On Wed, 26 May 2021 at 14:07, Marc Glisse  wrote:
> >
> > On Wed, 26 May 2021, Prathamesh Kulkarni via Gcc-patches wrote:
> >
> > > The attached patch removes calls to builtins in vmul_n* (a, b) with __a * 
> > > __b.
> >
> > I am not familiar with neon, but are __a and __b unsigned here? Otherwise,
> > is vmul_n already undefined in case of overflow?
> Hi Marc,
> Sorry for late reply, for vmul_n_s*, I think they are signed
> (intx_t).
Oops, I meant intx_t.
> I am not sure how should the intrinsic behave in case of signed overflow,
> but I am assuming it's OK since vmul_s* intrinsics leave it undefined too.
> Kyrill, is it OK to leave vmul_s* and vmul_n_s* undefined in case of overflow 
> ?
>
> Thanks,
> Prathamesh
> >
> > --
> > Marc Glisse


Re: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with __a * __b

2021-05-31 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 26 May 2021 at 14:07, Marc Glisse  wrote:
>
> On Wed, 26 May 2021, Prathamesh Kulkarni via Gcc-patches wrote:
>
> > The attached patch removes calls to builtins in vmul_n* (a, b) with __a * 
> > __b.
>
> I am not familiar with neon, but are __a and __b unsigned here? Otherwise,
> is vmul_n already undefined in case of overflow?
Hi Marc,
Sorry for late reply, for vmul_n_s*, I think they are signed
(intx_t).
I am not sure how should the intrinsic behave in case of signed overflow,
but I am assuming it's OK since vmul_s* intrinsics leave it undefined too.
Kyrill, is it OK to leave vmul_s* and vmul_n_s* undefined in case of overflow ?

Thanks,
Prathamesh
>
> --
> Marc Glisse


Re: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with __a * __b

2021-05-26 Thread Marc Glisse

On Wed, 26 May 2021, Prathamesh Kulkarni via Gcc-patches wrote:


The attached patch removes calls to builtins in vmul_n* (a, b) with __a * __b.


I am not familiar with neon, but are __a and __b unsigned here? Otherwise, 
is vmul_n already undefined in case of overflow?


--
Marc Glisse


[ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with __a * __b

2021-05-26 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
The attached patch removes calls to builtins in vmul_n* (a, b) with __a * __b.
For float variants, it gates multiplication on __FAST_MATH__.
Since we are not removing all calls to builtins, I am not sure if we
should remove
entry for vmul_n from arm_neon_builtins.def ?

Testing the patch showed fallout for armv8_2-fp16-neon-2.c, because the patch
generates better code.
Code-gen diff:

--- armv8_2-fp16-neon-2.s   2021-05-26 11:34:30.870304900 +0530
+++ armv8_2-fp16-neon-2-after.s 2021-05-26 11:19:13.990304900 +0530
@@ -84,21 +84,9 @@
 test_vmul_n_16x4:
-   @ args = 0, pretend = 0, frame = 8
+   @ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
-   sub sp, sp, #8
-   vldrd7, .L8
-   add r3, sp, #6
-   vst1.16 {d1[0]}, [r3]
-   vld1.16 {d7[0]}, [r3]
-   vmul.f16d0, d0, d7[0]
-   add sp, sp, #8
-   @ sp needed
+   vmov.f16r3, s2  @ __fp16
+   vdup.16 d16, r3
+   vmul.f16d0, d16, d0
bx  lr
-.L9:
-   .align  3
-.L8:
-   .short  0
-   .short  0
-   .short  0
-   .short  0
.size   test_vmul_n_16x4, .-test_vmul_n_16x4
@@ -113,21 +101,9 @@
 test_vmul_n_16x8:
-   @ args = 0, pretend = 0, frame = 8
+   @ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
-   sub sp, sp, #8
-   vldrd7, .L12
-   add r3, sp, #6
-   vst1.16 {d2[0]}, [r3]
-   vld1.16 {d7[0]}, [r3]
-   vmul.f16q0, q0, d7[0]
-   add sp, sp, #8
-   @ sp needed
+   vmov.f16r3, s4  @ __fp16
+   vdup.16 q8, r3
+   vmul.f16q0, q8, q0
bx  lr
-.L13:
-   .align  3
-.L12:
-   .short  0
-   .short  0
-   .short  0
-   .short  0
.size   test_vmul_n_16x8, .-test_vmul_n_16x8

Adjusted the test, to fix the failing tests.
OK to commit if testing passes ?

Thanks,
Prathamesh
2021-26-05  Prathamesh Kulkarni  

PR target/66791
* config/arm/arm_neon.h (vmul_n_s16): Replace call to builtin with __a 
* __b.
(vmul_n_s32): Likewise.
(vmul_n_u16): Likewise.
(vmul_n_u32): Likewise.
(vmulq_n_s16): Likewise.
(vmulq_n_s32): Likewise.
(vmulq_n_u16): Likewise.
(vmulq_n_u32): Likewise.
(vmul_n_f32): Gate __a * __b conditionally on __FAST_MATH__.
(vmulq_n_f32): Likewise.
(vmul_n_f16): Likewise.
(vmulq_n_f16): Likewise.

testsuite/
* gcc.target/arm/armv8_2-fp16-neon-2.c: Adjust.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index dcd533fd003..8ac00774e6c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -8331,70 +8331,78 @@ __extension__ extern __inline int16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_s16 (int16x4_t __a, int16_t __b)
 {
-  return (int16x4_t)__builtin_neon_vmul_nv4hi (__a, (__builtin_neon_hi) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline int32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_s32 (int32x2_t __a, int32_t __b)
 {
-  return (int32x2_t)__builtin_neon_vmul_nv2si (__a, (__builtin_neon_si) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline float32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_f32 (float32x2_t __a, float32_t __b)
 {
+#ifdef __FAST_MATH__
+  return __a * __b;
+#else
   return (float32x2_t)__builtin_neon_vmul_nv2sf (__a, (__builtin_neon_sf) __b);
+#endif
 }
 
 __extension__ extern __inline uint16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_u16 (uint16x4_t __a, uint16_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vmul_nv4hi ((int16x4_t) __a, 
(__builtin_neon_hi) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline uint32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmul_n_u32 (uint32x2_t __a, uint32_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vmul_nv2si ((int32x2_t) __a, 
(__builtin_neon_si) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmulq_n_s16 (int16x8_t __a, int16_t __b)
 {
-  return (int16x8_t)__builtin_neon_vmul_nv8hi (__a, (__builtin_neon_hi) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmulq_n_s32 (int32x4_t __a, int32_t __b)
 {
-  return (int32x4_t)__builtin_neon_vmul_nv4si (__a, (__builtin_neon_si) __b);
+  return __a * __b;
 }
 
 __extension__ extern __inline float32x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vmulq_n_f32 (float32x4_t __a, float32_t __b)
 {
+#ifdef __FAST_MATH__
+  return __a * __b;
+#else
   return