Re: [Patch]Bug 89057 - [8/9/10 Regression] AArch64 ld3 st4 less optimized

2019-04-29 Thread Jaydeep Chauhan
Hello Ramana,

Description for bug-89057:
--
This issue is occurs specifically with vcombine* method with optimization.
Assembly generated by vcombine* is less optimize. But individual
vcombine* method
can combining instruction properly and giving optimized output as per
below case.


Testcase 01:
---
int16x8_t
foo ()
{
  return vcombine_s16 (vdup_n_s16 (0), vdup_n_s16 (8));
}

Output with trunk ("-O2"):(Same for with patch)
-
foo:
.LFB3987:
.cfi_startproc
adrpx0, .LC0
ldr q0, [x0, #:lo12:.LC0]
ret
.cfi_endproc
.LFE3987:
.size   foo, .-foo
.section.rodata.cst16,"aM",@progbits,16
.align  4
.LC0:
.hword  0
.hword  0
.hword  0
.hword  0
.hword  8
.hword  8
.hword  8
.hword  8


But if vcombine call throught intrinsic function vst4_u8 combining is
not perform properly.
testcase is given below.

Testcase 02:
---

#include 
#include 

void qt_convert_rgb888_to_rgb32_neon(unsigned *dst, const unsigned
char *src, int len)
{
const unsigned *const end = dst + len;

const unsigned *const simdEnd = end - 7;
// non-inline asm version (uses more moves)
uint8x8x4_t dstVector;
dstVector.val[3] = vdup_n_u8(0xff);
do {
uint8x8x3_t srcVector = vld3_u8(src);
src += 3 * 8;
dstVector.val[0] = srcVector.val[2];
dstVector.val[1] = srcVector.val[1];
dstVector.val[2] = srcVector.val[0];
///Internally vst4_u8 is calling vcombine
vst4_u8((uint8_t*)dst, dstVector);
dst += 8;
} while (dst < simdEnd);

}

Output with trunk ("-O2"):
-
.L2:
ld3 {v4.8b - v6.8b}, [x1]
adrpx3, .LC1
add x1, x1, 24
ldr q3, [x3, #:lo12:.LC1]
mov v16.8b, v6.8b
mov v7.8b, v5.8b
mov v4.8b, v4.8b
ins v16.d[1], v17.d[0]
ins v7.d[1], v17.d[0]
ins v4.d[1], v17.d[0]
mov v0.16b, v16.16b
mov v1.16b, v7.16b
mov v2.16b, v4.16b
st4 {v0.8b - v3.8b}, [x0]
add x0, x0, 32
cmp x2, x0
bhi .L2
ret
.cfi_endproc

Output with trunk ("-O2") with patch:
-
.L2:
ld3 {v4.8b - v6.8b}, [x1]
adrpx3, .LC1
add x1, x1, 24
ldr q3, [x3, #:lo12:.LC1]
mov v0.8b, v6.8b
mov v1.8b, v5.8b
mov v2.8b, v4.8b
st4 {v0.8b - v3.8b}, [x0]
add x0, x0, 32
cmp x2, x0
bhi .L2
ret

For testcase 2 combine instruction is not done properly with
aarch64_split_simd_combine
method. Internally it is combine by complex way so that it is
generating more assembly
instruction in output. So to do combining instruction with optimal way
i have added
"aarch64_combine_internal"  created for this.

We can optimize output of both testcases with only
"aarch64_combine_internal" ,But when we
run without optimization it is create problem in split instruction. So
that i have
handle both cases with optimization ("aarch64_combine_internal")and
without optimizations (aarch64_split_simd_combine) separately.

Testing:
---
I have tested both testcase by executing below command in build directory
make -k check-gcc RUNTESTFLAGS="aarch64.exp=pr* -v" &>test.txt
and verified output of test.txt and also gcc.log.
In gcc.log status is for both case is PASS.

Thanks,
Jaydeep.

On Mon, Apr 29, 2019 at 4:19 PM Umesh Kalappa  wrote:
>
> >>Before getting started with reviewing the patch , the first question
> is whether you have a copyright assignment on file or does your
> employer have one on record with the FSF ?
>
> Ramana, We asked for copyright assignment form ,with details asked by
> copyright-cl...@fsf.org(craig)
> @ ass...@gnu.org.
>
> Waiting for the reply .
> ~Umesh
>
> On Mon, Apr 29, 2019 at 2:55 PM Ramana Radhakrishnan
>  wrote:
> >
> > On Mon, Apr 29, 2019 at 8:44 AM Jaydeep Chauhan
> >  wrote:
> > >
> > > Hi All,
> > >
> > > The attached patch (89057.patch) fixes the subjected issue.
> > > Please let me know your thoughts on the patch.
> >
> > Thanks for your patch.
> >
> > Before getting started with reviewing the patch , the first question
> > is whether you have a copyright assignment on file or does your
> > employer have one on record with the FSF ?
> >
> > Further could you elaborate more on what you have done to fix this by
> > providing some description other than "fix the subjected issue" and
> > why you have chosen the approach that you have. Finally one of the
> > things required for any patch to be considered is a full test run. Can
> > you tell us how you tested this patch ?

Re: [Patch]Bug 89057 - [8/9/10 Regression] AArch64 ld3 st4 less optimized

2019-04-29 Thread Umesh Kalappa
>>Before getting started with reviewing the patch , the first question
is whether you have a copyright assignment on file or does your
employer have one on record with the FSF ?

Ramana, We asked for copyright assignment form ,with details asked by
copyright-cl...@fsf.org(craig)
@ ass...@gnu.org.

Waiting for the reply .
~Umesh

On Mon, Apr 29, 2019 at 2:55 PM Ramana Radhakrishnan
 wrote:
>
> On Mon, Apr 29, 2019 at 8:44 AM Jaydeep Chauhan
>  wrote:
> >
> > Hi All,
> >
> > The attached patch (89057.patch) fixes the subjected issue.
> > Please let me know your thoughts on the patch.
>
> Thanks for your patch.
>
> Before getting started with reviewing the patch , the first question
> is whether you have a copyright assignment on file or does your
> employer have one on record with the FSF ?
>
> Further could you elaborate more on what you have done to fix this by
> providing some description other than "fix the subjected issue" and
> why you have chosen the approach that you have. Finally one of the
> things required for any patch to be considered is a full test run. Can
> you tell us how you tested this patch ?
>
>
> Ramana
>
>
>
> >
> > Thanks,
> > Jaydeep.


Re: [Patch]Bug 89057 - [8/9/10 Regression] AArch64 ld3 st4 less optimized

2019-04-29 Thread Ramana Radhakrishnan
On Mon, Apr 29, 2019 at 8:44 AM Jaydeep Chauhan
 wrote:
>
> Hi All,
>
> The attached patch (89057.patch) fixes the subjected issue.
> Please let me know your thoughts on the patch.

Thanks for your patch.

Before getting started with reviewing the patch , the first question
is whether you have a copyright assignment on file or does your
employer have one on record with the FSF ?

Further could you elaborate more on what you have done to fix this by
providing some description other than "fix the subjected issue" and
why you have chosen the approach that you have. Finally one of the
things required for any patch to be considered is a full test run. Can
you tell us how you tested this patch ?


Ramana



>
> Thanks,
> Jaydeep.


[Patch]Bug 89057 - [8/9/10 Regression] AArch64 ld3 st4 less optimized

2019-04-29 Thread Jaydeep Chauhan
Hi All,

The attached patch (89057.patch) fixes the subjected issue.
Please let me know your thoughts on the patch.

Thanks,
Jaydeep.


89057.patch
Description: Binary data