[Bug target/106340] flag set from SVE svwhilelt intrinsic not reused in loop

2022-07-20 Thread yyc1992 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106340 Yichao Yu changed:
What       |Removed     |Added
Resolution |---         |INVALID
Status     |UNCONFIRMED

[Bug target/106324] ptrue not reused between vector instructions and predicate instructions

2022-07-18 Thread yyc1992 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106324 --- Comment #3 from Yichao Yu --- Actually, I just realized that the not instruction used the .d version as requested and the vector instruction didn't. I got it reversed in the original post.

[Bug target/106340] flag set from SVE svwhilelt intrinsic not reused in loop

2022-07-18 Thread yyc1992 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106340 --- Comment #1 from Yichao Yu --- Also note that this is for code I've tweaked to match the final code as much as possible. For a complete implementation of this, I expect the loop transformation done for normal loops should move the

[Bug target/106340] New: flag set from SVE svwhilelt intrinsic not reused in loop

2022-07-18 Thread yyc1992 at gmail dot com via Gcc-bugs
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I'm experimenting with manually writing VLA loops and trying to match the assembly code I expect or get from the autovectorizer. One of the main areas I

[Bug target/106329] New: No optimization for SVE pfalse predicate

2022-07-16 Thread yyc1992 at gmail dot com via Gcc-bugs
: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- If a known-all-false predicate is used on an SVE intrinsic, the result should be fully no-op, undefined, zeroing and no actual instruction (other than potentially returning

[Bug target/106327] New: side-effect-free _x variance not optimized to unpredicated instruction

2022-07-16 Thread yyc1992 at gmail dot com via Gcc-bugs
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106326 . According to the Arm C Language Extension for SVE, when

[Bug target/106326] New: _m and _z version of SVE instrinsics not optimized to predicate-free version

2022-07-16 Thread yyc1992 at gmail dot com via Gcc-bugs
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The following code should generate a predicate-free fadd instruction since all the predicates are true. ``` svfloat64_t

[Bug target/106324] New: ptrue not reused between vector instructions and predicate instructions

2022-07-16 Thread yyc1992 at gmail dot com via Gcc-bugs
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The following code has two uses of `svptrue_b64()` and none of the instructions using them should be clearing it, so only one

[Bug c++/100161] New: Impossible to suppress Wtype-limits warning involving template parameter.

2021-04-20 Thread yyc1992 at gmail dot com via Gcc-bugs
: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- If a comparison involving a template parameter is always true or false, it should not raise a warning if it could take other
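A minimal sketch of the kind of code this report appears to describe. The helper `at_least` and its template parameter `Min` are hypothetical and not taken from the report; the assumption is that the comparison is trivially true for one instantiation and a real check for another, so the compiler may emit a `-Wtype-limits` diagnostic for a source line that cannot simply be deleted.

```cpp
#include <cassert>

// Hypothetical helper, not from the bug report: checks a runtime value
// against a lower bound supplied as a template parameter.
template <unsigned Min>
bool at_least(unsigned value)
{
    // For Min == 0 this comparison is always true, which is the situation
    // -Wtype-limits is designed to flag. For any other Min it is a
    // meaningful check, so the line has to stay in the template.
    return value >= Min;
}

// Small self-check exercising both instantiations of the same source line.
inline void check_at_least()
{
    assert(at_least<0>(0u));   // trivially true instantiation
    assert(!at_least<4>(3u));  // real check, fails
    assert(at_least<4>(4u));   // real check, passes
}
```

Whether a diagnostic is actually emitted depends on the compiler version and the warning flags in use (for example `-Wextra`).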

[Bug tree-optimization/100088] New: ymm store split into two xmm stores

2021-04-14 Thread yyc1992 at gmail dot com via Gcc-bugs
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The following code ``` __attribute__((target("avx2"))) void fill_avx2(double *__restrict__ data, int n, double value) { for (int i = 0; i <

[Bug c/96990] New: Regression in aarch64 struct vector member initialization

2020-09-08 Thread yyc1992 at gmail dot com
Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The following code used to work on gcc 9.3 but stops working with 10.2 with an error ``` a.c: In function ‘test_aa64_vec_2’: a.c:19:24: error

[Bug c/96629] spurious maybe uninitialized variable warning with difficult control-flow analysis

2020-09-03 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96629 --- Comment #3 from Yichao Yu --- Just curious, is it some particular structure that is upsetting it, or did it simply hit some depth limit?

[Bug c/96629] New: spurious uninitialized variable warning with branches at -O1 and higher

2020-08-16 Thread yyc1992 at gmail dot com
: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Reduced test code: ``` int mem(char *data); int cond(void); void f(char *data, unsigned idx, unsigned inc) { char *d2; int c

[Bug rtl-optimization/96539] Unnecessary no-op copy with Os and tail call with struct argument

2020-08-11 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96539 --- Comment #4 from Yichao Yu --- Wow that was fast... thx.

[Bug rtl-optimization/96539] New: Unnecessary no-op copy with Os and tail call with struct argument

2020-08-08 Thread yyc1992 at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Test C code, ``` struct A { int a; int b; int c; int d; int e; int f; void *p1; void *p2

[Bug preprocessor/96069] -ffile-prefix-map does not affect print in gfortran

2020-07-08 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96069 --- Comment #8 from Yichao Yu --- OK, done. It would be nice to mention it on https://gcc.gnu.org/contribute.html#patches

[Bug preprocessor/96069] -ffile-prefix-map does not affect print in gfortran

2020-07-08 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96069 --- Comment #6 from Yichao Yu --- https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549411.html and https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549413.html

[Bug preprocessor/96069] -ffile-prefix-map does not affect print in gfortran

2020-07-08 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96069 --- Comment #4 from Yichao Yu --- > Apparently it is. Yes, but my question is about why should this be "WONTFIX". This feature (reproducible build) is certainly as useful in fortran as it is in C family. > Let move the component to

[Bug fortran/96069] -ffile-prefix-map does not affect print in gfortran

2020-07-08 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96069 --- Comment #2 from Yichao Yu --- Why should this feature be c only?

[Bug fortran/96069] New: -ffile-prefix-map does not affect print in gfortran

2020-07-05 Thread yyc1992 at gmail dot com
Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Compiling the following code `a.f` ``` subroutine f(name) implicit none character*(*) name print *,name return end

[Bug ipa/95775] Command line argument for target_clones?

2020-06-23 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95775 --- Comment #4 from Yichao Yu --- > Hey. My opinion is similar to Richi's. If you really want a highly optimized > library, you should rather use a dlopen mechanism with pre-built set of > options. Well, a few things, 1. That sounds like an

[Bug c/95777] Allow specifying more than one target options at the same time in target and target_clones attribute

2020-06-22 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95777 --- Comment #3 from Yichao Yu --- And for backward compatibility maybe `target_clones("(sse4.1,arch=core2),default")` would work?

[Bug c/95777] Allow specifying more than one target options at the same time in target and target_clones attribute

2020-06-22 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95777 --- Comment #2 from Yichao Yu --- I only tested this with `target_clones` and it seems that I misread the documentation for `target`. So this is only an issue with the `target_clones` attribute. `target` supports this just fine. So to be more clear,

[Bug ipa/95775] Command line argument for target_clones?

2020-06-22 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95775 --- Comment #2 from Yichao Yu --- > But it will blow up code-size considerably. > So without some major work I don't think simply slapping target_clones on > each function is going to fly in practice. I mean, it'll blow up not much more than

[Bug ipa/95796] New: Inlining works between functions with the same target attribute but not target_clones

2020-06-20 Thread yyc1992 at gmail dot com
Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- If two functions with the same target attribute call each other, GCC

[Bug ipa/95790] Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790 --- Comment #8 from Yichao Yu --- And the reason I reported this as a mis-optimization rather than something completely unsupported is that the following code. ``` #include // #define disable_opt __attribute__((flatten)) #define disable_opt

[Bug ipa/95790] Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790 --- Comment #7 from Yichao Yu --- > Your testcase has nested function multi-versioning. I don't think it works at all. I opened PR 95793. I'm sorry but what is nested function multi-versioning? and what's the difference between the test case

[Bug ipa/95790] Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790 --- Comment #5 from Yichao Yu --- It’s wrong when running on a target that has avx512f. The unoptimized version will call the correct foo but the optimized one won’t. As I said, this is an issue when the total targets are different between

[Bug ipa/95790] Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790 --- Comment #3 from Yichao Yu --- And the assembly showing the correct dispatch is .file "a.c" .text .p2align 4 .type _ZL3fooPKcj, @function _ZL3fooPKcj: .LFB0: .cfi_startproc movl $1,

[Bug ipa/95790] Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790 --- Comment #2 from Yichao Yu --- The C++ code attached above produces the following incorrect code with `g++ -O2 -S` .file "a.c" .text .p2align 4 .globl _Z3barv .type _Z3barv, @function _Z3barv:

[Bug other/95778] target_clones indirection eliminates requires noinline

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95778 --- Comment #4 from Yichao Yu --- Yeah, after digging further the two issues are indeed the same. I initially didn't think they were since I didn't realize PR95786 (that the visibility attribute is simply ignored completely...) and thought static

[Bug ipa/95790] New: Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- The indirection elimination code currently only checks for a match of the target for the specific version but doesn't check if all the targets

[Bug tree-optimization/95786] New: Too aggressive target indirection elimination

2020-06-20 Thread yyc1992 at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I realized this issue when debugging PR95778 and PR95780 (ref https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548631.html). It seems that the indirection

[Bug other/95778] target_clones indirection eliminates requires noinline

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95778 --- Comment #2 from Yichao Yu --- Also, the original code example had an error, the code that works properly was ``` static __attribute__((noinline,target_clones("default,avx2"))) int f2(int *p) { asm volatile ("" :: "r"(p) : "memory");

[Bug other/95778] target_clones indirection eliminates requires noinline

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95778 --- Comment #1 from Yichao Yu --- Ah, I think this might be the fix for both this issue and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95780 . I'll test more and will try to submit it later. ``` diff --git a/gcc/multiple_target.c

[Bug other/95781] New: Missing dead code elimination when a recursive function is inlined.

2020-06-19 Thread yyc1992 at gmail dot com
Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Code, ``` static int 2(int *p, int k) { int res = 0; if (k > 0) res += 2(p, k - 1); return *p +

[Bug other/95780] New: target_clones treats internal visibility different from static functions

2020-06-19 Thread yyc1992 at gmail dot com
: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Again using the code in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95778. If the static function `f2` is changed

[Bug other/95779] New: Unnecessary dispatch function for static target_clones function.

2020-06-19 Thread yyc1992 at gmail dot com
Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Using the code in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95778 the full assembly generated (the version with both noinline

[Bug other/95778] New: target_clones indirection eliminates requires noinline

2020-06-19 Thread yyc1992 at gmail dot com
Component: other Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Compiling ``` static __attribute__((noinline,target_clones("default,avx2"))) int f2(int *p) { asm volatile ("" :: "r"

[Bug c/95777] New: Allow specifying more than one target options at the same time in target and target_clones attribute

2020-06-19 Thread yyc1992 at gmail dot com
: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Currently it seems (from the documentation and my own tests) that only a single option is allowed for each version

[Bug lto/95776] New: Reduce indirection with target_clones at link time (with LTO)

2020-06-19 Thread yyc1992 at gmail dot com
Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- Currently, if a function is not visible outside the final library (static, or internal

[Bug target/95775] New: Command line argument for target_clones?

2020-06-19 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Would it make sense to add a command line argument that is roughly equivalent to adding `target_clones` to all functions? In terms of usefulness, I believe it will be a very

[Bug lto/94659] New: Missing symbol with LTO and target_clones

2020-04-19 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- This is basically the same as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80732 except now it only happens with LTO enabled. It seems

[Bug ipa/94656] New: target_clones on alias leads to segfault in the compiler

2020-04-18 Thread yyc1992 at gmail dot com
Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- Compiling the following code with `gcc -c` leads to a segfault in the compiler targetclone pass

[Bug libstdc++/92759] New: Typo in libstdcxx/v6/xmethods.py

2019-12-02 Thread yyc1992 at gmail dot com
++ Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I get the following warning when running gdb/rr. ``` /usr/lib/../share/gcc-9.2.0/python/libstdcxx/v6/xmethods.py:731: SyntaxWarning: list indices must be integers or slices, not str

[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows

2019-08-25 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 --- Comment #29 from Yichao Yu --- See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412#c25 GCC is fully capable of aligning the stack. It just seems that different parts of it disagree on what the current stack alignment is and whether a

[Bug target/90826] Weak symbol does not work reliably on windows

2019-06-10 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90826 --- Comment #2 from Yichao Yu --- Also, I just upgraded the compiler on this computer from 7.x to 9.1.0. The issue appeared before the upgrade as well but I didn't investigate until the upgrade finished.

[Bug target/90826] Weak symbol does not work reliably on windows

2019-06-10 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90826 --- Comment #1 from Yichao Yu --- Oh, forgot to mention that the first assembly was generated with -O3 and adding `.weak f` to the generated file fixes the issue as well.

[Bug target/90826] New: Weak symbol does not work reliably on windows

2019-06-10 Thread yyc1992 at gmail dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The following code does not link correctly with all optimization levels on windows with the mingw-w64-x86_64-g++ compiler. ``` #include extern "C" void f() __a

[Bug c/90728] New: False positive Wmemset-elt-size with zero size array

2019-06-03 Thread yyc1992 at gmail dot com
Component: c Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The code below comes from a template expansion (when a certain cache feature is disabled) and all the operations on the `buff` member are no-ops. ``` #include struct

[Bug tree-optimization/89582] Suboptimal code generated for floating point struct in -O3 compare to -O2

2019-04-04 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89582 --- Comment #6 from Yichao Yu --- For the vfloat test case, isn't the optimum code just ``` addps %xmm2, %xmm0 addps %xmm3, %xmm1 retq ``` It's not making full use of the vector but I assume not having to spill is a

[Bug target/89606] Extra mov after structure load instructions on aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89606 --- Comment #1 from Yichao Yu --- Compiled a GCC 9 snapshot for pr89607 and the issue is still present.

[Bug target/89607] Missing optimization for store of multiple registers on aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89607 --- Comment #8 from Yichao Yu --- I see. I don't imagine this to cause a major local speed up though I assume it should at least not be slower? That's also why I mentioned that this should at least be done for `-Os`.

[Bug target/89607] Missing optimization for store of multiple registers on aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89607 --- Comment #6 from Yichao Yu --- > For aarch64, there was talk about adding stp for q registers. What do you mean? I was initially unsure about it too but I assume it already exists since clang (and now GCC 9) emits it and the arm arch

[Bug target/89607] Missing optimization for store of multiple registers on aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89607 --- Comment #5 from Yichao Yu --- I just compiled the 9-20190303 snapshot and this indeed seems to be fixed. Should this be closed now or after GCC 9 is released?

[Bug target/89607] Missing optimization for store of multiple registers on arm and aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89607 --- Comment #3 from Yichao Yu --- Done pr89614

[Bug target/89614] New: Missing optimization for store of multiple registers on arm

2019-03-06 Thread yyc1992 at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Separated from pr89607 as requested. Test code and result compiled with any non-zero optimization levels, ``` #include void f4

[Bug target/89607] Missing optimization for store of multiple registers on arm and aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89607 --- Comment #2 from Yichao Yu --- Sure. I'll do that.

[Bug target/89607] New: Missing optimization for store of multiple registers on arm and aarch64

2019-03-06 Thread yyc1992 at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Test code, Compiled for arm/aarch64 with -O1/-O2/-O3/-Os/-Ofast ``` #include void f4(float32x4x2_t *p, const float *p1

[Bug target/89606] New: Extra mov after structure load instructions on aarch64

2019-03-06 Thread yyc1992 at gmail dot com
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Code to reproduce, ``` #include #ifdef __aarch64__ float64x2x2_t f(const double *p1, const double *p2) { float64x2x2_t v = vld2q_f64(p1); return

[Bug target/89597] New: Inconsistent vector calling convention on windows with Clang and MSVC

2019-03-05 Thread yyc1992 at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- For 256-bit and 512-bit vector return values, Clang and MSVC always return them in the corresponding registers even without

[Bug target/89581] Unneeded stack alignment on windows x86

2019-03-04 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89581 --- Comment #1 from Yichao Yu --- The problem is still there when compiled with -O2 ``` f: pushq %rbp vmovq (%r8), %xmm1 movq %rcx, %rax vmovq 8(%r8), %xmm0 vaddsd (%rdx), %xmm1, %xmm1

[Bug target/89582] New: Suboptimal code generated for floating point struct in -O3 compare to -O2

2019-03-04 Thread yyc1992 at gmail dot com
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- When testing the code for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89581 on linux, I noticed that the code seems suboptimum

[Bug target/89581] New: Unneeded stack alignment on windows x86

2019-03-04 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- On windows, when compiling the following code with ` gcc -mavx2 a.c -o - -S -O3 -g0 -fno-asynchronous-unwind-tables -fomit-frame-pointer -Wall -Wextra` ``` typedef struct

[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows

2019-02-27 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 --- Comment #24 from Yichao Yu --- Oh, and the test case above was compiled with -O3 (and -g -Wall -Wextra).

[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows

2019-02-27 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 Yichao Yu changed:
What |Removed |Added
CC   |        |yyc1992 at gmail dot com
--- Comment #23

[Bug c/89485] New: Support vectorcall calling convention on windows

2019-02-24 Thread yyc1992 at gmail dot com
: c Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I'm very surprised that I didn't find an issue for this, so sorry if this is discussed/rejected somewhere else. It appears that both MSVC and clang support a vectorcall calling

[Bug target/82641] Unable to enable crc32 for a certain function with target attribute on ARM (aarch32)

2018-01-30 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82641 --- Comment #20 from Yichao Yu --- Just want to mention that the lack of a way to locally change the arch settings without lying to the compiler is exactly why I reported this issue.

[Bug target/83110] Relocation error when taking address of protected function in shared library.

2017-11-23 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83110 --- Comment #2 from Yichao Yu --- What might be invalid about the source?

[Bug target/83110] New: Relocation error when taking address of protected function in shared library.

2017-11-22 Thread yyc1992 at gmail dot com
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- This is very similar to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 although that one is marked as fixed

[Bug target/82641] Unable to enable crc32 for a certain function with target attribute on ARM (aarch32)

2017-11-02 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82641 --- Comment #7 from Yichao Yu --- It would be great if `+crc` can work if it's not ambiguous. Requiring `arch=armv8-a+crc` works for me too, and it'll just require more preprocessor checks.

[Bug target/82641] Unable to enable crc32 for a certain function with target attribute on ARM (aarch32)

2017-10-24 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82641 --- Comment #3 from Yichao Yu --- > ARMv8-a is the only architecture variant where the CRC extension is optional Not really. There's also armv8-r and armv8-m. Also, I believe code compiled for armv7-a can run on armv8-a hardware and can also

[Bug target/82641] Unable to enable crc32 for a certain function with target attribute on ARM

2017-10-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82641 --- Comment #1 from Yichao Yu --- I've found a workaround in https://sourceware.org/ml/binutils/2017-04/msg00171.html but it's extremely ugly (albeit also very clever...).

[Bug target/82641] New: Unable to enable crc32 for a certain function with target attribute on ARM

2017-10-20 Thread yyc1992 at gmail dot com
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The assembler complains about the target not supporting CRC32 instructions for certain (generic) targets on ARM and AArch64

[Bug target/80732] target_clones does not work with dlsym

2017-06-19 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80732 --- Comment #9 from Yichao Yu --- Thanks for the fix! Does it fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78366 at the same time?

[Bug target/80732] target_clones does not work with dlsym

2017-05-17 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80732 --- Comment #6 from Yichao Yu --- Good to know. Thanks.

[Bug target/80732] target_clones does not work with dlsym

2017-05-17 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80732 --- Comment #4 from Yichao Yu --- `double (*pf1)(double, double, double) = dlsym(hdl, "f1.ifunc");` Wouldn't it be better if GCC generates local functions `f1.default`, `f1.fma` as implementation and `f1` to replace `f1.ifunc`? It's quite

[Bug target/80732] New: target_clones does not work with dlsym

2017-05-12 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Compiling the code below to an executable with `gcc -Wall -Wextra -O3 -fPIC -ldl -rdynamic`. On a haswell+ system, the output is ``` 1: 0, 4.93038e-32, 0 2: 4.93038e-32, 4.93038e-32

[Bug target/77728] [5/6 Regression] Miscompilation multiple vector iteration on ARM

2017-04-25 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 --- Comment #48 from Yichao Yu --- Thanks for fixing this. I didn't follow all the comments since I'm not familiar with the C++ ABI, so just to make sure I understand what's happening: is it that the bug is caused by an inconsistency in C++ ABI for

[Bug target/77728] [5/6/7 Regression] Miscompilation multiple vector iteration on ARM

2017-03-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 --- Comment #6 from Yichao Yu --- Anything new here?

[Bug target/77728] [5/6/7 Regression] Miscompilation multiple vector iteration on ARM

2017-01-13 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 --- Comment #5 from Yichao Yu --- Ping again? Anything new, or anything I can help with here?

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #12 from Yichao Yu --- Since the LLVM miscompilation isn't fixed, is there any way to check the alias assumptions more programmatically? (I can see that the TrailingObject might easily introduce something like this but given the

[Bug target/77728] [5/6/7 Regression] Miscompilation multiple vector iteration on ARM

2016-10-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 --- Comment #4 from Yichao Yu --- Ping. Anything I can help with debugging this?

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-16 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #11 from Yichao Yu --- The case pointed out is fixed in https://reviews.llvm.org/rL284336 although as expected that doesn't fix the error. Still not sure whose bug this is...

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #10 from Yichao Yu --- That does look like a violation (this particular one should be hidden behind a shared library boundary in the reduced case though). Reported to LLVM at https://llvm.org/bugs/show_bug.cgi?id=30711 .

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #8 from Yichao Yu --- > Can you try with -fno-strict-aliasing ? That seems to fix it for both the original case (LLVM) and the reduced case (the linked tarball). Is there a way to figure out the problematic (either bug in LLVM's
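As a hypothetical illustration of the kind of assumption that `-fno-strict-aliasing` relaxes (this is not code from LLVM or from the reduced test case), the sketch below contrasts a type-punning read that breaks the strict-aliasing rule with a well-defined `memcpy` alternative. The function name `float_bits` is made up for the example, and the checked bit pattern assumes IEEE 754 binary32 floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reading a float object through a uint32_t lvalue violates strict aliasing,
// so with optimization enabled the compiler may assume the two accesses
// never refer to the same memory:
//     std::uint32_t bits = *reinterpret_cast<const std::uint32_t *>(&f);
// Compiling with -fno-strict-aliasing disables that assumption.

// Well-defined alternative: copy the object representation with memcpy.
inline std::uint32_t float_bits(float f)
{
    static_assert(sizeof(std::uint32_t) == sizeof(float), "size mismatch");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Small self-check; the constant assumes IEEE 754 binary32 floats.
inline void check_float_bits()
{
    assert(float_bits(0.0f) == 0u);
    assert(float_bits(1.0f) == 0x3f800000u);
}
```

If a miscompilation disappears under `-fno-strict-aliasing`, code of the first form somewhere in the program is one plausible cause, although a compiler bug can produce the same symptom.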

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #6 from Yichao Yu --- I've compiled a gcc at 951db45 using the same configuration as archlinux arm PKGBUILD and I can reproduce the problem using the `code/` in

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #5 from Yichao Yu --- Compiling current llvm trunk (r284322) still shows the same error. The script I used to compile LLVM is here https://github.com/yuyichao/arch-pkg/blob/master/pkg/all/llvm-svn/PKGBUILD. Compiling gcc 951db45

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #3 from Yichao Yu --- > What exact version of LLVM are you trying to compile? Revision of the LLVM > sources including revision of clang, etc. I was compiling the trunk version. The version I started reducing from was

[Bug lto/77997] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77997 --- Comment #2 from Yichao Yu --- Sorry, the first submission gave me a timeout so I submitted it again.

[Bug lto/77997] New: Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I'm seeing a miscompilation of LLVM's tablegen on AArch64 by gcc 6.2.1 when LTO is enabled. I've tried very hard to reduce it but unfortunately it wasn't very successful this time

[Bug lto/77996] New: Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I'm seeing a miscompilation of LLVM's tablegen on AArch64 by gcc 6.2.1 when LTO is enabled. I've tried very hard to reduce it but unfortunately it wasn't very successful this time

[Bug target/77728] [5/6/7 Regression] Miscompilation multiple vector iteration on ARM

2016-09-26 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 --- Comment #2 from Yichao Yu --- I should add that turning on LTO works around the issue both in the simple code attached and for the original issue I was having in Julia (i.e. compiling LLVM with LTO makes the issue go away).

[Bug target/77728] New: Miscompilation multiple vector iteration on ARM

2016-09-24 Thread yyc1992 at gmail dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Code to reproduce is at https://gist.github.com/yuyichao/a66edb9d05d18755fb7587b12e021a8a. The two cpp files are ```c++ #include #include typedef std::vector<std::p

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-06-28 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #4 from Yichao Yu --- Thanks for the explanation. I didn't realize that the load is the problem. Just curious (since I somehow can't find documentation about it), would `ldaxp` provide the right semantics without the corresponding

[Bug tree-optimization/71414] 2x slower than clang summing small float array, GCC should consider larger vectorization factor for "unrolling" reductions

2016-06-07 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414 --- Comment #7 from Yichao Yu --- If I add `-fvariable-expansion-in-unroller` (omg this option is like half the command line ;-p ...), the performance matches the clang one after the clang 3.8 regression. ``` % gcc -funroll-loops

[Bug other/71414] 2x slower than clang summing small float array

2016-06-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414 --- Comment #4 from Yichao Yu --- The C code is in the linked gist. `a` is a cacheline-aligned pointer and `n` is 1024, so `a` should even fit in L1d, which is 32kB on both processors I benchmarked. More precise timing (ns per loop) 6700K ```

[Bug other/71414] New: 2x slower than clang summing small float array

2016-06-04 Thread yyc1992 at gmail dot com
: other Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Ref https://llvm.org/bugs/show_bug.cgi?id=28002 C source code. ```c __attribute__((noinline)) float sum32(float *a, size_t n) { /* a = (float*)__builtin_assume_aligned

[Bug target/71056] [6/7 Regression] __builtin_bswap32 NEON instruction error with -O3

2016-05-21 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71056 --- Comment #4 from Yichao Yu --- (Sorry I'm not sure how to understand that cross link). Is the fix merged?

[Bug target/71056] New: __builtin_bswap32 NEON instruction error with -O3

2016-05-10 Thread yyc1992 at gmail dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The following code generates a NEON instruction not available error when compiling with `gcc -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O3 -o /dev/null -c a.c` on ARM
