[PATCH v3] LoongArch: Add support code model extreme.
v1 -> v2: - Modify some description information. - Add options -W[no]extreme-plt, warn about code model extreme not support plt mode, and then disable plt. v2 -> v3: - When -mcmodel=extreme, default set to -fno-plt mode, if the user forces to use '-mcmodel=extreme -fplt', an error will be reported. - Delete -Wextreme-plt. - Fix bug when compiling with '-mcmodel=normal -fno-plt -mno-explicit-relocs'. Use five instructions to calculate a signed 64-bit offset relative to the pc. gcc/ChangeLog: * config/loongarch/loongarch-opts.cc: Allow cmodel to be extreme. * config/loongarch/loongarch.cc (loongarch_call_tls_get_addr): Add extreme support for TLS GD and LD types. (loongarch_legitimize_tls_address): Add extreme support for TLS LE and IE. (loongarch_split_symbol): When compiling with -mcmodel=extreme, the symbol address will be obtained through five instructions. (loongarch_print_operand_reloc): Add support. (loongarch_print_operand): Add support. (loongarch_print_operand_address): Add support. (loongarch_option_override_internal): Set '-mcmodel=extreme' option incompatible with '-mno-explicit-relocs'. * config/loongarch/loongarch.md (@lui_l_hi20): Loads bits 12-31 of data into registers. (lui_h_lo20): Load bits 32-51 of the data and spell bits 0-31 of the source register. (lui_h_hi12): Load bits 52-63 of the data and spell bits 0-51 of the source register. * config/loongarch/predicates.md: Symbols need to be decomposed when defining the macro TARGET_CMODEL_EXTREME * doc/invoke.texi: Modify the description information of cmodel in the document. Document -W[no-]extreme-plt. gcc/testsuite/ChangeLog: * gcc.target/loongarch/func-call-1.c: Add option '-mcmodel=normal'. * gcc.target/loongarch/func-call-2.c: Likewise. * gcc.target/loongarch/func-call-3.c: Likewise. * gcc.target/loongarch/func-call-4.c: Likewise. * gcc.target/loongarch/func-call-5.c: Likewise. * gcc.target/loongarch/func-call-6.c: Likewise. * gcc.target/loongarch/func-call-7.c: Likewise. * gcc.target/loongarch/func-call-8.c: Likewise. * gcc.target/loongarch/relocs-symbol-noaddend.c: Likewise. * gcc.target/loongarch/func-call-extreme-1.c: New test. * gcc.target/loongarch/func-call-extreme-2.c: New test. --- gcc/config/loongarch/loongarch-opts.cc| 3 +- gcc/config/loongarch/loongarch.cc | 223 +++--- gcc/config/loongarch/loongarch.md | 34 ++- gcc/config/loongarch/predicates.md| 9 +- gcc/doc/invoke.texi | 50 +--- .../gcc.target/loongarch/func-call-1.c| 2 +- .../gcc.target/loongarch/func-call-2.c| 2 +- .../gcc.target/loongarch/func-call-3.c| 2 +- .../gcc.target/loongarch/func-call-4.c| 2 +- .../gcc.target/loongarch/func-call-5.c| 2 +- .../gcc.target/loongarch/func-call-6.c| 2 +- .../gcc.target/loongarch/func-call-7.c| 2 +- .../gcc.target/loongarch/func-call-8.c| 2 +- .../loongarch/func-call-extreme-1.c | 32 +++ .../loongarch/func-call-extreme-2.c | 32 +++ .../loongarch/relocs-symbol-noaddend.c| 2 +- 16 files changed, 319 insertions(+), 82 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-1.c create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-2.c diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc index 3f70943ded6..2ae89f23443 100644 --- a/gcc/config/loongarch/loongarch-opts.cc +++ b/gcc/config/loongarch/loongarch-opts.cc @@ -376,14 +376,13 @@ fallback: /* 5. Target code model */ t.cmodel = constrained.cmodel ? opt_cmodel : CMODEL_NORMAL; - if (t.cmodel != CMODEL_NORMAL) + if (t.cmodel != CMODEL_NORMAL && t.cmodel != CMODEL_EXTREME) { warning (0, "%qs is not supported, now cmodel is set to %qs", loongarch_cmodel_strings[t.cmodel], "normal"); t.cmodel = CMODEL_NORMAL; } - /* Cleanup and return. */ obstack_free (&msg_obstack, NULL); *target = t; diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 24378143641..c5667f5ae3c 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -2436,7 +2436,19 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0) /* Split tls symbol to high and low. */ rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc)); high = loongarch_force_temporary (tmp, high); - emit_insn (gen_tls_low (Pmode, a0, high, loc)); + + if (TARGET_CMODEL_EXTREME) + { + gcc_assert (TARGET_EXPLICIT_RELOCS);
Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
After deep analysis and several tries. I have made a analysis and conclusion as follows. I really appreciate if you could spend some time take a look at the following analysis that I made: According to the codes in test_vector_subregs_fore_back: ~~~ poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); unsigned int min_nunits = constant_lower_bound (nunits); scalar_mode int_mode = GET_MODE_INNER (inner_mode); unsigned int count = gcd (min_nunits, 4); rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); rtx x = builder.build (); Analyze the codes and tried several patterns: For poly_uint16 (2,2): x = (const_vector:VNx1DI [ (const_int 0 [0]) (const_int 1 [0x1]) repeat [ (const_int 0 [0]) (const_int -1 [0x]) ] ]) For poly_uint16 (4,4): x = (const_vector:VNx1DI [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3]) repeat [ (const_int 0 [0]) (const_int -1 [0x]) (const_int -2 [0xfffe]) (const_int -3 [0xfffd]) ] ]) So, I think I can conclude the pattern rule as follows: poly_uint16 (n,n): x = (const_vector:VNx1DI [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3]) .. (const_int n [hex(n)]) repeat [ (const_int 0 [0]) (const_int -1 [0x]) (const_int -2 [0xfffe]) (const_int -3 [0xfffd]) . (const_int -n [hex(-n)]) ] ]) Am I understanding right? Let's first take a look at the codes you write: rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); rtx x = builder.build (); So for poly_uint (1,1): Ideally according to the codes, you want to generate the pattern like this: x = (const_vector:VNx1DI [ (const_int 0 [0]) repeat [ (const_int 0 [0]) ] ]) In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0. In this case, i = -i = 0. So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();". Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector: x = (const_vector:VNx1DI repeat [ (const_int 0 [0]) ]) This is a const_vector duplicate vector. This is not the correct pattern you want so it fails in the following check. I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong. To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1): Original codes: rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); 1st solution: rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); x = (const_vector:VNx1DI [ (const_int 0 [0]) repeat [ (const_int 1 [0x1]) ] ]) 2nd solution: rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode)); x= (const_vector:VNx1DI [ (const_int 0 [0]) repeat [ (const_int -1 [0x]) ] ]) Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much. juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2022-08-19 20:52 To: juzhe.zhong\@rivai.ai CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) "juzhe.zh...@rivai.ai" writes: >> Ah, right, sorry for the bogus suggestion. >> In
[PATCH v3] c++: Implement -Wself-move warning [PR81159]
On Thu, Aug 18, 2022 at 08:33:47PM -0400, Jason Merrill wrote: > On 8/18/22 13:19, Marek Polacek wrote: > > On Mon, Aug 15, 2022 at 03:54:05PM -0400, Jason Merrill wrote: > > > On 8/9/22 09:37, Marek Polacek wrote: > > > > + /* We're looking for *std::move ((T &) &arg), or > > > > + *std::move ((T &) (T *) r) if the argument it a reference. */ > > > > + if (!REFERENCE_REF_P (rhs) > > > > + || TREE_CODE (TREE_OPERAND (rhs, 0)) != CALL_EXPR) > > > > +return; > > > > + tree fn = TREE_OPERAND (rhs, 0); > > > > + if (!is_std_move_p (fn)) > > > > +return; > > > > + tree arg = CALL_EXPR_ARG (fn, 0); > > > > + if (TREE_CODE (arg) != NOP_EXPR) > > > > +return; > > > > + /* Strip the (T &). */ > > > > + arg = TREE_OPERAND (arg, 0); > > > > + /* Strip the (T *) or &. */ > > > > + arg = TREE_OPERAND (arg, 0); > > > > > > Are you sure these are the only two expressions that can make it here? > > > What > > > if the argument to move is *Tptr? > > > > Not 100% sure but I couldn't find any other form. For *Tptr we get > > *std::move ((int * &) &Tptr) > > That likes like what you'd get when the argument is Tptr, not when it's > *Tptr. And indeed that's what I see in the testcase: > > > + Tptr = std::move (Tptr); // { dg-warning "moving a variable to itself" } > > is missing the * Duh, sorry. The previous patch didn't handle the *Tptr case. Further poking revealed that we need special care to handle (*(Tptr)) and **Tptr etc. So in this patch I'm stripping all *s and V_C_Es. Sigh. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- About 5 years ago we got a request to implement -Wself-move, which warns about useless moves like this: int x; x = std::move (x); This patch implements that warning. PR c++/81159 gcc/c-family/ChangeLog: * c.opt (Wself-move): New option. gcc/cp/ChangeLog: * typeck.cc (maybe_warn_self_move): New. (cp_build_modify_expr): Call maybe_warn_self_move. gcc/ChangeLog: * doc/invoke.texi: Document -Wself-move. gcc/testsuite/ChangeLog: * g++.dg/warn/Wself-move1.C: New test. --- gcc/c-family/c.opt | 4 + gcc/cp/typeck.cc| 51 +- gcc/doc/invoke.texi | 23 - gcc/testsuite/g++.dg/warn/Wself-move1.C | 122 4 files changed, 198 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/warn/Wself-move1.C diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index dfdebd596ef..f776efd39d8 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1229,6 +1229,10 @@ Wselector ObjC ObjC++ Var(warn_selector) Warning Warn if a selector has multiple methods. +Wself-move +C++ ObjC++ Var(warn_self_move) Warning LangEnabledBy(C++ ObjC++, Wall) +Warn when a value is moved to itself with std::move. + Wsequence-point C ObjC C++ ObjC++ Var(warn_sequence_point) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) Warn about possible violations of sequence point rules. diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc index 992ebfd99fb..b2bd13db1b6 100644 --- a/gcc/cp/typeck.cc +++ b/gcc/cp/typeck.cc @@ -8897,7 +8897,54 @@ cp_build_c_cast (location_t loc, tree type, tree expr, return error_mark_node; } - + +/* Warn when a value is moved to itself with std::move. LHS is the target, + RHS may be the std::move call, and LOC is the location of the whole + assignment. */ + +static void +maybe_warn_self_move (location_t loc, tree lhs, tree rhs) +{ + if (!warn_self_move) +return; + + /* C++98 doesn't know move. */ + if (cxx_dialect < cxx11) +return; + + if (processing_template_decl) +return; + + if (!REFERENCE_REF_P (rhs) + || TREE_CODE (TREE_OPERAND (rhs, 0)) != CALL_EXPR) +return; + tree fn = TREE_OPERAND (rhs, 0); + if (!is_std_move_p (fn)) +return; + + /* Just a little helper to strip * and various NOPs. */ + auto extract_op = [] (tree &op) { +STRIP_NOPS (op); +while (INDIRECT_REF_P (op)) + op = TREE_OPERAND (op, 0); +op = maybe_undo_parenthesized_ref (op); +STRIP_ANY_LOCATION_WRAPPER (op); + }; + + tree arg = CALL_EXPR_ARG (fn, 0); + extract_op (arg); + if (TREE_CODE (arg) == ADDR_EXPR) +arg = TREE_OPERAND (arg, 0); + extract_op (lhs); + + if (cp_tree_equal (lhs, arg)) +{ + auto_diagnostic_group d; + if (warning_at (loc, OPT_Wself_move, "moving a variable to itself")) + inform (loc, "remove % call"); +} +} + /* For use from the C common bits. */ tree build_modify_expr (location_t location, @@ -9101,6 +9148,8 @@ cp_build_modify_expr (location_t loc, tree lhs, enum tree_code modifycode, if (modifycode == NOP_EXPR) { + maybe_warn_self_move (loc, lhs, rhs); + if (c_dialect_objc ()) { result = objc_maybe_build_modify_expr (lhs, rhs); diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index f65d351a5fc..5dea3fee124 1006
Ping [PATCH V2] libcpp: Optimize #pragma once with a hash table [PR58770]
Hi all, Would love some feedback on this patch! Thanks, Paul On Mon, Aug 01, 2022 at 05:18:40AM +, Paul Hollinsky wrote: > Rather than traversing the all_files linked list for every include, > this factors out the quick idempotency checks (modification time > and size) to be the keys in a hash table so we can find matching > files quickly. > > The hash table value type is a linked list, in case more than one > file matches the quick check. > > The table is only built if a once-only file is seen, so include > guard performance is not affected. > > My laptop would previously complete Ricardo's benchmark from the > PR in ~1.1s using #pragma once, and ~0.35s using include guards. > > After this change, both benchmarks now complete in ~0.35s. I did > have to randomize the modification dates on the benchmark headers > so the files did not all end up in the same hash table list, but > that would likely not come up outside of the contrived benchmark. > > I bootstrapped and ran the testsuite on x86_64 Darwin, as well as > ppc64le and aarch64 Linux. > > libcpp/ChangeLog: > > PR preprocessor/58770 > * internal.h: Add hash table for #pragma once > * files.cc: Optimize #pragma once with the hash table > > Signed-off-by: Paul Hollinsky > --- > libcpp/files.cc | 116 +++--- > libcpp/internal.h | 3 ++ > 2 files changed, 112 insertions(+), 7 deletions(-) > > diff --git a/libcpp/files.cc b/libcpp/files.cc > index 24208f7b0f8..d4ffd77578e 100644 > --- a/libcpp/files.cc > +++ b/libcpp/files.cc > @@ -167,6 +167,33 @@ struct file_hash_entry_pool >struct cpp_file_hash_entry pool[FILE_HASH_POOL_SIZE]; > }; > > +/* A set of attributes designed to quickly identify obviously different files > + in a hashtable. Just in case there are collisions, we still maintain a > + list. These sub-lists can then be checked for #pragma once rather than > + interating through all_files. */ > +struct file_quick_idempotency_attrs > +{ > + file_quick_idempotency_attrs(const _cpp_file *f) > +: mtime(f->st.st_mtime), size(f->st.st_size) {} > + > + time_t mtime; > + off_t size; > + > + static hashval_t hash (/* _cpp_file* */ const void *p); > +}; > + > +/* Sub-list of very similar files kept in a hashtable to check for #pragma > + once. */ > +struct file_sublist > +{ > + _cpp_file *f; > + file_sublist *next; > + > + static int eq (/* _cpp_file* */ const void *p, > + /* file_sublist* */ const void *q); > + static void del (/* file_sublist* */ void *p); > +}; > + > static bool open_file (_cpp_file *file); > static bool pch_open_file (cpp_reader *pfile, _cpp_file *file, > bool *invalid_pch); > @@ -849,17 +876,17 @@ has_unique_contents (cpp_reader *pfile, _cpp_file > *file, bool import, >if (!pfile->seen_once_only) > return true; > > - /* We may have read the file under a different name. Look > - for likely candidates and compare file contents to be sure. */ > - for (_cpp_file *f = pfile->all_files; f; f = f->next_file) > + /* We may have read the file under a different name. We've kept > + similar looking files in this lists under this hash table, so > + check those more thoroughly. */ > + void* ent = htab_find(pfile->pragma_once_files, file); > + for (file_sublist *e = static_cast (ent); e; e = e->next) > { > + _cpp_file *f = e->f; >if (f == file) > continue; /* It'sa me! */ > > - if ((import || f->once_only) > - && f->err_no == 0 > - && f->st.st_mtime == file->st.st_mtime > - && f->st.st_size == file->st.st_size) > + if ((import || f->once_only) && f->err_no == 0) > { > _cpp_file *ref_file; > > @@ -895,6 +922,38 @@ has_unique_contents (cpp_reader *pfile, _cpp_file *file, > bool import, >return true; > } > > +/* Add the given file to the #pragma once table so it can be > + quickly identified and excluded the next time it's seen. */ > +static void > +update_pragma_once_table (cpp_reader *pfile, _cpp_file *file) > +{ > + void **slot = htab_find_slot (pfile->pragma_once_files, file, INSERT); > + if (slot) > +{ > + if (!*slot) > + *slot = xcalloc (1, sizeof (file_sublist)); > + > + file_sublist *e = static_cast (*slot); > + while (e->f) > + { > + if (!e->next) > + { > + void *new_sublist = xcalloc(1, sizeof (file_sublist)); > + e->next = static_cast (new_sublist); > + } > + e = e->next; > + } > + > + e->f = file; > +} > + else > +{ > + cpp_error (pfile, CPP_DL_ERROR, > + "Unable to create #pragma once table space for %s", > + _cpp_get_file_name (file)); > +} > +} > + > /* Place the file referenced by FILE into a new buffer on the buffer > stack if possible. Returns true if a buffer is stacked. Use LOC > for any diagnostics. */ > @@ -950,6 +1009,9 @@ _cpp_stac
Re: [PATCH] Add ABI test for __bf16 type
On Thu, Aug 18, 2022 at 5:56 PM Hongtao Liu via Gcc-patches wrote: > > On Thu, Aug 18, 2022 at 3:36 PM Haochen Jiang via Gcc-patches > wrote: > > > > Hi all, > > > > This patch aims to add bf16 abi test after the whole __bf16 type is added. > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk? > Ok. All BF16 ABI tests failed due to missing __m128bf16/__m256bf16/__m512bf16. When will __bf16 types be added? > > > > BRs, > > Haochen > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/x86_64/abi/bf16/abi-bf16.exp: New test. > > * gcc.target/x86_64/abi/bf16/args.h: Ditto. > > * gcc.target/x86_64/abi/bf16/asm-support.S: Ditto. > > * gcc.target/x86_64/abi/bf16/bf16-check.h: Ditto. > > * gcc.target/x86_64/abi/bf16/bf16-helper.h: Ditto. > > * gcc.target/x86_64/abi/bf16/defines.h: Ditto. > > * gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp: Ditto. > > * gcc.target/x86_64/abi/bf16/m256bf16/args.h: Ditto. > > * gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S: Ditto. > > * gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h: Ditto. > > * gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c: Ditto. > > * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c: Ditto. > > * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c: Ditto. > > * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c: Ditto. > > * gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c: Ditto. > > * gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp: Ditto. > > * gcc.target/x86_64/abi/bf16/m512bf16/args.h: Ditto. > > * gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S: Ditto. > > * gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h: Ditto. > > * gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c: Ditto. > > * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c: Ditto. > > * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c: Ditto. > > * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c: Ditto. > > * gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c: Ditto. > > * gcc.target/x86_64/abi/bf16/macros.h: Ditto. > > * gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c: > > Ditto. > > * gcc.target/x86_64/abi/bf16/test_basic_alignment.c: Ditto. > > * gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c: > > Ditto. > > * gcc.target/x86_64/abi/bf16/test_basic_returning.c: Ditto. > > * gcc.target/x86_64/abi/bf16/test_basic_sizes.c: Ditto. > > * gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c: > > Ditto. > > * gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c: > > Ditto. > > * gcc.target/x86_64/abi/bf16/test_m128_returning.c: Ditto. > > * gcc.target/x86_64/abi/bf16/test_passing_floats.c: Ditto. > > * gcc.target/x86_64/abi/bf16/test_passing_m128.c: Ditto. > > * gcc.target/x86_64/abi/bf16/test_passing_structs.c: Ditto. > > * gcc.target/x86_64/abi/bf16/test_passing_unions.c: Ditto. > > * gcc.target/x86_64/abi/bf16/test_struct_returning.c: Ditto. > > * gcc.target/x86_64/abi/bf16/test_varargs-m128.c: Ditto. -- H.J.
[Patch] lto-wrapper.cc: Delete offload_names temp files in case of error [PR106686]
I saw that files such as /tmp/ccxFKYeS.target.o kept accumulating. Not a high number, but still several. I turned out that when compiling an offloading program successfully, they were removed. But when it failed, those remained. (Example: See PR.) This patch fixes this by storing the file name earlier and process them during cleanup, unless they have been taken care of before. (Usual way; as they are printf'ed to stdout, I assume the caller takes care of the tmp files.) OK for mainline? Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 lto-wrapper.cc: Delete offload_names temp files in case of error [PR106686] Usually, the caller takes care of the .o files for the offload compilers (suffix: ".target.o"). However, if an error occurs during processing (e.g. fatal error by lto1), they were not deleted. gcc/ChangeLog: PR lto/106686 * lto-wrapper.cc (free_array_of_ptrs): Move before tool_cleanup. (tool_cleanup): Unlink offload_names. (compile_offload_image): Take filename argument to set it early. (compile_images_for_offload_targets): Update call; set offload_names to NULL after freeing the array. diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc index 1e8eba1..9a76470 100644 --- a/gcc/lto-wrapper.cc +++ b/gcc/lto-wrapper.cc @@ -89,6 +89,25 @@ static bool xassembler_options_error = false; const char tool_name[] = "lto-wrapper"; +/* Auxiliary function that frees elements of PTR and PTR itself. + N is number of elements to be freed. If PTR is NULL, nothing is freed. + If an element is NULL, subsequent elements are not freed. */ + +static void ** +free_array_of_ptrs (void **ptr, unsigned n) +{ + if (!ptr) +return NULL; + for (unsigned i = 0; i < n; i++) +{ + if (!ptr[i]) + break; + free (ptr[i]); +} + free (ptr); + return NULL; +} + /* Delete tempfiles. Called from utils_cleanup. */ void @@ -114,6 +133,12 @@ tool_cleanup (bool) if (output_names[i]) maybe_unlink (output_names[i]); } + if (offload_names) +{ + for (i = 0; offload_names[i]; i++) + maybe_unlink (offload_names[i]); + free_array_of_ptrs ((void **) offload_names, i); +} } static void @@ -626,25 +651,6 @@ merge_and_complain (vec &decoded_options, } } -/* Auxiliary function that frees elements of PTR and PTR itself. - N is number of elements to be freed. If PTR is NULL, nothing is freed. - If an element is NULL, subsequent elements are not freed. */ - -static void ** -free_array_of_ptrs (void **ptr, unsigned n) -{ - if (!ptr) -return NULL; - for (unsigned i = 0; i < n; i++) -{ - if (!ptr[i]) - break; - free (ptr[i]); -} - free (ptr); - return NULL; -} - /* Parse STR, saving found tokens into PVALUES and return their number. Tokens are assumed to be delimited by ':'. If APPEND is non-null, append it to every token we find. */ @@ -908,13 +914,13 @@ access_check (const char *name, int mode) /* Prepare a target image for offload TARGET, using mkoffload tool from COMPILER_PATH. Return the name of the resultant object file. */ -static char * +static const char * compile_offload_image (const char *target, const char *compiler_path, unsigned in_argc, char *in_argv[], vec compiler_opts, - vec linker_opts) + vec linker_opts, + char **filename) { - char *filename = NULL; char *dumpbase; char **argv; char *suffix @@ -922,6 +928,7 @@ compile_offload_image (const char *target, const char *compiler_path, strcpy (suffix, "/accel/"); strcat (suffix, target); strcat (suffix, "/mkoffload"); + *filename = NULL; char **paths = NULL; unsigned n_paths = parse_env_var (compiler_path, &paths, suffix); @@ -950,9 +957,9 @@ compile_offload_image (const char *target, const char *compiler_path, /* Generate temporary output file name. */ if (save_temps) -filename = concat (dumpbase, ".o", NULL); +*filename = concat (dumpbase, ".o", NULL); else -filename = make_temp_file (".target.o"); +*filename = make_temp_file (".target.o"); struct obstack argv_obstack; obstack_init (&argv_obstack); @@ -962,7 +969,7 @@ compile_offload_image (const char *target, const char *compiler_path, if (verbose) obstack_ptr_grow (&argv_obstack, "-v"); obstack_ptr_grow (&argv_obstack, "-o"); - obstack_ptr_grow (&argv_obstack, filename); + obstack_ptr_grow (&argv_obstack, *filename); /* Append names of input object files. */ for (unsigned i = 0; i < in_argc; i++) @@ -986,7 +993,7 @@ compile_offload_image (const char *target, const char *compiler_path, obstack_free (&argv_obstack, NULL); free_array_of_ptrs ((void **) paths, n_paths); - return filename; + return *filename; } @@ -1016,10 +1023,9
Re: [PATCH] wwwdocs: Add D language changes and caveats to gcc-12/changes.html
On Wed, 17 Aug 2022, Iain Buclaw wrote: > This patch belatedly adds the new features and changes to the D > front-end during the GCC 12 development cycle, as well as a bullet in > the caveat section for D's new bootstrapping requirements. Nice! > +D: > +Building and bootstrapping GDC, the D compiler, now requires a working > GDC > +compiler (GCC version 9.1 or later) and D runtime library, libphobos, as > +the D front end is written in D. Might we be able to omit the "and bootstrapping" reference, which can be seen as a specific flavor of builing? And put "libphobos" in parentheses, in line with "GCC version 9.1..."? > +by default, but compiles and works if --enable-libphobos is > +used. Other targets may require a more recent minimum version of GCC to > +bootstrap. Specifics are documented for affected targets in the Might we be able to omit "for affected targets"? How do you feel about +https://gcc.gnu.org/install/specific.html";>installation +instructions. instead of +https://gcc.gnu.org/install/specific.html";>manual for +installing GCC. Genuine questions, all of them. > + On supported targets, the __traits(compiles) > expression No comma between "targets" and "the". > + -fcheck=, enables or disables the code generation of > + specific run-time contract checks. No comma (I think)? ALso for the following entries. > + -fcheckaction=, controls the run-time behaviour on an We generally use US English for consistency. :) > + -fdump-c++-spec=, dumps all compiled > + extern(C++) declarations as C++ code to a file. "to a file as C++ code"? > + The supplimentary option -fdump-c++-spec-verbose turns on > + emission of comments for ignored declaration in the generated C++ spec. "declarations" > + -fextern-std=, controls which C++ standard > + extern(C++) declarations are compiled to be compatible > + with. It feels something is missing here (in terms of grammar)? > + -fsave-mixins=, saves mixins expanded at compile-time > to > + a file. > + Will it be clear to everyone what a mixin is? (It's not to me, but I do not know D.) > + Deprecated and removed features: : > + The -ftransition=dip25 and > + -ftransition=dip1000 compiler switches have been renamed > + to -fpreview=dip25 and -fprefix=dip1000. > + Hmm, so the feature as such has not been deprecated or removed; just how it's been invoked has changed. I hope this is useful feedback. As maintainer for everything D in GCC you do not need approval to commit, I'm just happy to help (from a consistency and "naive user" perspective if nothing else). Cheers, Gerald
Re: [PATCH] Refactor back_threader_profitability
On 8/17/2022 1:31 AM, Aldy Hernandez via Gcc-patches wrote: I just have a few high level comments. On Tue, Aug 16, 2022 at 4:05 PM Richard Biener wrote: The following refactors profitable_path_p in the backward threader, splitting out parts that can be computed once the exit block is known, parts that contiguously update and that can be checked allowing for the path to be later identified as FSM with larger limits, possibly_profitable_path_p, and final checks done when the whole path is known, profitable_path_p. I thought we were removing references to FSM, as they were leftovers from some previous incarnation. For that matter, I don't think I ever understood what they are, so if we're gonna keep them, could you comment what makes FSM threads different from other threads? FSM refers to the initial implementation from Steve E. IIRC. It was designed to handle threading backedges in a loop where the path out of the current iteration would tell us where a multi-way branch in the next loop iteration would go. During the integration of Steve's work it was recognized that the backwards walk was generally a better model and we started moving to the backwards based threader with the goal of removing the forward threader. There should be tests in the testsuite which validate that we haven't lost the key transformation. ssa-thread-backedge.c and pr77445-2.c The DOM threader has limits? I thought most of those limits were just due to the fact that it couldn't determine long enough paths? Either way, I like that we're merging the necessary forward threader bits here, in preparation for its demise ;-). The forward threader has structural limitations due to its custom block copier and CFG update code as well as profitibility limitations. jeff
Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
I rewrite test_vector_subregs_fore_back as follows: static void test_vector_subregs_fore_back (machine_mode inner_mode) { poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); unsigned int min_nunits = constant_lower_bound (nunits); scalar_mode int_mode = GET_MODE_INNER (inner_mode); unsigned int count = gcd (min_nunits, 4); rtx_vector_builder builder (inner_mode, count, 2); unsigned int nelts_per_pattern = count == 1 ? 2 : count; for (unsigned int i = 0; i < nelts_per_pattern; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); rtx x = builder.build (); test_vector_subregs_modes (x); if (!nunits.is_constant ()) test_vector_subregs_modes (x, nunits - min_nunits, count); } I add the code: unsigned int nelts_per_pattern = count == 1 ? 2 : count; then replace the "count" into "nelts_per_pattern " in the first loop. It can pass now. And "x" value I print out seems to be correct: (const_vector:VNx1DI [ (const_int 0 [0]) repeat [ (const_int 1 [0x1]) ] ]) Is this correct solution ? Thanks. juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2022-08-19 20:52 To: juzhe.zhong\@rivai.ai CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) "juzhe.zh...@rivai.ai" writes: >> Ah, right, sorry for the bogus suggestion. >> In that case, what specifically goes wrong? Which test in >> test_vector_subregs_modes fails, and what does the ICE look like? > > Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute > test_vector_subregs_modes. > The fail ICE: > ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: > ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) > expected: (nil) > > actual: (const_int 0 [0]) > cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 > 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, > rtx_def*, rtx_def*) > ../../../riscv-gcc/gcc/selftest-rtl.cc:57 > 0x1332504 test_vector_subregs_modes > ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 > 0x1332988 test_vector_subregs_fore_back > ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 > 0x1332ae7 test_vector_subregs > ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 > 0x1332c57 test_vector_ops > ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 > 0x1332c7b selftest::simplify_rtx_cc_tests() > ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 > 0x21318fa selftest::run_tests() > ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 > 0x1362a76 toplev::run_self_tests() > ../../../riscv-gcc/gcc/toplev.cc:2205 > > I analyzed the codes: > In test_vector_subregs_fore_back, when nunits = (1,1). The expected = > NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. > So the assertion fails. Hmm, ok, so the subreg operation is unexpected succeeding. > This is the test for stepped vector using 2 element per pattern. For > poly_uint16 (1,1), it's true it is possible only has 1 element. The stepped case is 3 elements per pattern rather than 2. In a stepped pattern: a, b, b+n are represented explicitly, then the rest are implicitly b+n*2, b+n*3, etc. The case being handled by this code is instead the 2-element case: a, b are represented explicitly, then the rest are implicitly all b. Why is (1,1) different though? The test is constructing: nunits: 1 + 1x shape: nelts_per_pattern == 2, npatterns == 1 elements: a, b[, b, b, b, b, ...] It then tests subregs starting at 0 + 1x (so starting at the first b). But for (2,2) we should have: nunits: 2 + 2x shape: nelts_per_pattern == 2, npatterns == 2 elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] and we should test subregs starting at 0 + 2x (so starting at the first b1). The two cases should be very similar, it's just that the (2,2) case doubles the number of patterns. > I think it makes sense to fail the test. However for poly (1,1) machine mode, > can we have the chance that some target define this > kind of machine mode only used for intrinsics? I already developed full RVV > support in GCC (including intrinsc and auto-vectorization). > I only enable auto-vectorization with mode larger than (2,2) and test it > fully. > From my experience, it seems the stepped vector only created during VLA > auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics > will > not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ Following on from what I said above, it doesn't look like this particular case is related to stepped vectors. (1,1) shouldn't (need to) be a special case though. Any potentital problems that would occur for (1,1) with npatterns==1 would also occur for (n
Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
As you mentioned. For poly_uint16 (1, 1), the pattern should be: shape: nelts_per_pattern == 2, npatterns == 1 elements: a, b[, b, b, b, b, ...] I tried to print out the rtx that tests create to test: static void test_vector_subregs_fore_back (machine_mode inner_mode) { poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); unsigned int min_nunits = constant_lower_bound (nunits); scalar_mode int_mode = GET_MODE_INNER (inner_mode); unsigned int count = gcd (min_nunits, 4); rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); rtx x = builder.build (); test_vector_subregs_modes (x); print_rtl_single(stdout,x); if (!nunits.is_constant ()) test_vector_subregs_modes (x, nunits - min_nunits, count); } the x value: (const_vector:VNx1DI repeat [ (const_int 0 [0]) ]) It seems that it doesn't match the pattern you said. Am I understanding correctly? And would you mind taking a look at the codes which generate x: static void test_vector_subregs_fore_back (machine_mode inner_mode) { poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); unsigned int min_nunits = constant_lower_bound (nunits); scalar_mode int_mode = GET_MODE_INNER (inner_mode); unsigned int count = gcd (min_nunits, 4); rtx_vector_builder builder (inner_mode, count, 2); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (i, int_mode)); for (unsigned int i = 0; i < count; ++i) builder.quick_push (gen_int_mode (-(int) i, int_mode)); rtx x = builder.build (); The nunits = (1,1), min_nunits = 1, count = 1. I print out these variable value may helpful for you to check. Thanks! juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2022-08-19 20:52 To: juzhe.zhong\@rivai.ai CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) "juzhe.zh...@rivai.ai" writes: >> Ah, right, sorry for the bogus suggestion. >> In that case, what specifically goes wrong? Which test in >> test_vector_subregs_modes fails, and what does the ICE look like? > > Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute > test_vector_subregs_modes. > The fail ICE: > ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: > ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) > expected: (nil) > > actual: (const_int 0 [0]) > cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 > 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, > rtx_def*, rtx_def*) > ../../../riscv-gcc/gcc/selftest-rtl.cc:57 > 0x1332504 test_vector_subregs_modes > ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 > 0x1332988 test_vector_subregs_fore_back > ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 > 0x1332ae7 test_vector_subregs > ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 > 0x1332c57 test_vector_ops > ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 > 0x1332c7b selftest::simplify_rtx_cc_tests() > ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 > 0x21318fa selftest::run_tests() > ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 > 0x1362a76 toplev::run_self_tests() > ../../../riscv-gcc/gcc/toplev.cc:2205 > > I analyzed the codes: > In test_vector_subregs_fore_back, when nunits = (1,1). The expected = > NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. > So the assertion fails. Hmm, ok, so the subreg operation is unexpected succeeding. > This is the test for stepped vector using 2 element per pattern. For > poly_uint16 (1,1), it's true it is possible only has 1 element. The stepped case is 3 elements per pattern rather than 2. In a stepped pattern: a, b, b+n are represented explicitly, then the rest are implicitly b+n*2, b+n*3, etc. The case being handled by this code is instead the 2-element case: a, b are represented explicitly, then the rest are implicitly all b. Why is (1,1) different though? The test is constructing: nunits: 1 + 1x shape: nelts_per_pattern == 2, npatterns == 1 elements: a, b[, b, b, b, b, ...] It then tests subregs starting at 0 + 1x (so starting at the first b). But for (2,2) we should have: nunits: 2 + 2x shape: nelts_per_pattern == 2, npatterns == 2 elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] and we should test subregs starting at 0 + 2x (so starting at the first b1). The two cases should be very similar, it's just that the (2,2) case doubles the number of patterns. > I think it makes sense to fail the test. However for poly (1,1) machine mode, > can we have the chance that some target define this > kind of machine mode only used for intrinsics? I already developed
Re: [Patch] mkoffload: Cleanup temporary omp_requires_file
On Fri, Aug 19, 2022 at 04:01:10PM +0200, Tobias Burnus wrote: > Rather obvious, once found. I forgot to add some cleanup, cluttering > /tmp with ".mkoffload.omp_requires". > > The same issue exists for GCN also for ".mkoffload.dbg" > > OK for mainline? – For the dbg issue, OK also for GCC 11 and 12 backport? > > Follow-up to the 'OpenMP: Move omp requires checks to libgomp" commit > https://gcc.gnu.org/r13-1458-g683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 > > For GCN, the issues exists since > https://gcc.gnu.org/r11-6683-g505caa7295b93ecdec8ac9b31595eb34dbd48c9f > > Tobias > > PS: In case you wonder why it is also added with -save-temps: The > internally called maybe_unlink removed the file, unless -save-temps. For > the latter, with verbose flag, it also shows them as '[Leaving %s]'. > > PPS: If you see some ...target.o files in /tmp, that's because of > https://gcc.gnu.org/PR106686 > - > Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 > München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas > Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht > München, HRB 106955 > mkoffload: Cleanup temporary omp_requires_file > > The file (suffix ".mkoffload.omp_requires") used to save the 'omp requires' > data has to be passed to maybe_unlink for cleanup or -v -save-temps stderr > diagnostic. That was missed before. - For GCN, the same has to be done for > the files with suffix ".mkoffload.dbg.o". > > gcc/ChangeLog: > > * config/gcn/mkoffload.cc (main): Add omp_requires_file and dbgobj to > files_to_cleanup. > * config/i386/intelmic-mkoffload.cc (prepare_target_image): Add > omp_requires_file to temp_files. > * config/nvptx/mkoffload.cc (omp_requires_file): New global static var. > (main): Remove local omp_requires_file var. > (tool_cleanup): Handle omp_requires_file. Ok, thanks. Jakub
[Patch] mkoffload: Cleanup temporary omp_requires_file
Rather obvious, once found. I forgot to add some cleanup, cluttering /tmp with ".mkoffload.omp_requires". The same issue exists for GCN also for ".mkoffload.dbg" OK for mainline? – For the dbg issue, OK also for GCC 11 and 12 backport? Follow-up to the 'OpenMP: Move omp requires checks to libgomp" commit https://gcc.gnu.org/r13-1458-g683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 For GCN, the issues exists since https://gcc.gnu.org/r11-6683-g505caa7295b93ecdec8ac9b31595eb34dbd48c9f Tobias PS: In case you wonder why it is also added with -save-temps: The internally called maybe_unlink removed the file, unless -save-temps. For the latter, with verbose flag, it also shows them as '[Leaving %s]'. PPS: If you see some ...target.o files in /tmp, that's because of https://gcc.gnu.org/PR106686 - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 mkoffload: Cleanup temporary omp_requires_file The file (suffix ".mkoffload.omp_requires") used to save the 'omp requires' data has to be passed to maybe_unlink for cleanup or -v -save-temps stderr diagnostic. That was missed before. - For GCN, the same has to be done for the files with suffix ".mkoffload.dbg.o". gcc/ChangeLog: * config/gcn/mkoffload.cc (main): Add omp_requires_file and dbgobj to files_to_cleanup. * config/i386/intelmic-mkoffload.cc (prepare_target_image): Add omp_requires_file to temp_files. * config/nvptx/mkoffload.cc (omp_requires_file): New global static var. (main): Remove local omp_requires_file var. (tool_cleanup): Handle omp_requires_file. diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc index d2464332275..4206448703a 100644 --- a/gcc/config/gcn/mkoffload.cc +++ b/gcc/config/gcn/mkoffload.cc @@ -1030,6 +1030,7 @@ main (int argc, char **argv) } else dbgobj = make_temp_file (".mkoffload.dbg.o"); + obstack_ptr_grow (&files_to_cleanup, dbgobj); /* If the copy fails then just ignore it. */ if (copy_early_debug_info (argv[ix], dbgobj)) @@ -1085,6 +1086,7 @@ main (int argc, char **argv) omp_requires_file = concat (dumppfx, ".mkoffload.omp_requires", NULL); else omp_requires_file = make_temp_file (".mkoffload.omp_requires"); + obstack_ptr_grow (&files_to_cleanup, omp_requires_file); /* Run the compiler pass. */ xputenv (concat ("GCC_OFFLOAD_OMP_REQUIRES_FILE=", omp_requires_file, NULL)); diff --git a/gcc/config/i386/intelmic-mkoffload.cc b/gcc/config/i386/intelmic-mkoffload.cc index 596f6f107b8..5deddff6ca2 100644 --- a/gcc/config/i386/intelmic-mkoffload.cc +++ b/gcc/config/i386/intelmic-mkoffload.cc @@ -526,6 +526,7 @@ prepare_target_image (const char *target_compiler, int argc, char **argv, uint32 omp_requires_file = concat (dumppfx, ".mkoffload.omp_requires", NULL); else omp_requires_file = make_temp_file (".mkoffload.omp_requires"); + temp_files[num_temps++] = omp_requires_file; xputenv (concat ("GCC_OFFLOAD_OMP_REQUIRES_FILE=", omp_requires_file, NULL)); compile_for_target (&argv_obstack); diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc index 0fa5f4423bf..7557681ad07 100644 --- a/gcc/config/nvptx/mkoffload.cc +++ b/gcc/config/nvptx/mkoffload.cc @@ -55,6 +55,7 @@ static id_map *var_ids, **vars_tail = &var_ids; /* Files to unlink. */ static const char *ptx_name; static const char *ptx_cfile_name; +static const char *omp_requires_file; static const char *ptx_dumpbase; enum offload_abi offload_abi = OFFLOAD_ABI_UNSET; @@ -68,6 +69,8 @@ tool_cleanup (bool from_signal ATTRIBUTE_UNUSED) maybe_unlink (ptx_cfile_name); if (ptx_name) maybe_unlink (ptx_name); + if (omp_requires_file) +maybe_unlink (omp_requires_file); } static void @@ -571,7 +689,6 @@ main (int argc, char **argv) unsetenv ("COMPILER_PATH"); unsetenv ("LIBRARY_PATH"); - char *omp_requires_file; if (save_temps) omp_requires_file = concat (dumppfx, ".mkoffload.omp_requires", NULL); else
[PATCH] tree-optimization/105937 - avoid uninit diagnostics crossing iterations
The following avoids adding PHIs to the worklist for uninit processing if we reach them following backedges. That confuses predicate analysis because it assumes the use is happening in the same iteration as the the definition. For the testcase in the PR the situation is like void foo (int val) { int uninit; # val = PHI <..> (B) for (..) { if (..) { .. = val; (C) val = uninit; } # val = PHI <..> (A) } } and starting from (A) with 'uninit' as argument we arrive at (B) and from there at (C). Predicate analysis then tries to prove the predicate of (B) (not the backedge) can prove that the path from (B) to (C) is unreachable which isn't really what it necessary - that's what we'd need to do when the preheader edge of the loop were the edge with the uninitialized def. So the following makes those cases intentionally false negatives. Bootstrapped and tested on x86_64-unknown-linux-gnu, will push on Monday if there are no comments. Thanks, Richard. PR tree-optimization/105937 * tree-ssa-uninit.cc (find_uninit_use): Do not queue PHIs on backedges. (execute_late_warn_uninitialized): Mark backedges. * g++.dg/uninit-pr105937.C: New testcase. --- gcc/testsuite/g++.dg/uninit-pr105937.C | 235 + gcc/tree-ssa-uninit.cc | 14 +- 2 files changed, 247 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/uninit-pr105937.C diff --git a/gcc/testsuite/g++.dg/uninit-pr105937.C b/gcc/testsuite/g++.dg/uninit-pr105937.C new file mode 100644 index 000..26b4f74c5e1 --- /dev/null +++ b/gcc/testsuite/g++.dg/uninit-pr105937.C @@ -0,0 +1,235 @@ +// { dg-do compile } +// { dg-require-effective-target c++17 } +// { dg-options "-O2 -Wall" } + +#include +#include +#include + +using utf8 = char; + +enum +{ +FONT_SIZE_TINY = 2, +FONT_SIZE_SMALL = 0, +FONT_SIZE_MEDIUM = 1, +FONT_SIZE_COUNT = 3 +}; + +constexpr const uint16_t FONT_SPRITE_GLYPH_COUNT = 224; + +enum class FontSpriteBase : int16_t +{ +MEDIUM_EXTRA_DARK = -2, +MEDIUM_DARK = -1, + +TINY = FONT_SIZE_TINY * FONT_SPRITE_GLYPH_COUNT, +SMALL = FONT_SIZE_SMALL * FONT_SPRITE_GLYPH_COUNT, +MEDIUM = FONT_SIZE_MEDIUM * FONT_SPRITE_GLYPH_COUNT, +}; + +struct TTFSurface; + +class CodepointView +{ +private: +std::string_view _str; + +public: +class iterator +{ +private: +std::string_view _str; +size_t _index; + +public: +iterator(std::string_view str, size_t index) +: _str(str) +, _index(index) +{ +} + +bool operator==(const iterator& rhs) const +{ +return _index == rhs._index; +} +bool operator!=(const iterator& rhs) const +{ +return _index != rhs._index; +} +char32_t operator*() const +{ +return GetNextCodepoint(&_str[_index], nullptr); +} +iterator& operator++() +{ +return *this; +} +iterator operator++(int) +{ +auto result = *this; +if (_index < _str.size()) +{ +const utf8* nextch; +GetNextCodepoint(&_str[_index], &nextch); +_index = nextch - _str.data(); +} +return result; +} + +size_t GetIndex() const +{ +return _index; +} + +static char32_t GetNextCodepoint(const char* ch, const char** next); +}; + +CodepointView(std::string_view str) +: _str(str) +{ +} + +iterator begin() const +{ +return iterator(_str, 0); +} + +iterator end() const +{ +return iterator(_str, _str.size()); +} +}; + +struct InternalTTFFont; +using TTF_Font = InternalTTFFont; +struct TTFFontDescriptor +{ +const utf8* filename; +const utf8* font_name; +int32_t ptSize; +int32_t offset_x; +int32_t offset_y; +int32_t line_height; +int32_t hinting_threshold; +TTF_Font* font; +}; +using codepoint_t = uint32_t; + +#define abstract = 0 + +struct ITTF +{ +virtual ~ITTF() = default; +virtual TTFFontDescriptor* ttf_get_font_from_sprite_base(FontSpriteBase spriteBase) abstract; +virtual TTFSurface* ttf_surface_cache_get_or_add(TTF_Font* font, std::string_view text) abstract; +}; + +namespace OpenRCT2 { +struct IContext +{ +virtual ~IContext() = default; + +virtual ITTF* GetTTF() abstract; +}; +} + +static void ttf_draw_string_raw_ttf(OpenRCT2::IContext* context, std::string_view text) +{ +TTFFontDescriptor* fontDesc = context->GetTTF()->ttf_get_font_from_sprite_base(FontSpriteBase::MEDIUM_EXTRA_DARK); +if (fontDesc->font == nullptr) +{ +return; +} + +TTFSurface* surface = context->GetTTF()->ttf_surface_cache_get_or_add(fontDesc->font, text); +if (surface
Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zh...@rivai.ai" writes: >> Ah, right, sorry for the bogus suggestion. >> In that case, what specifically goes wrong? Which test in >> test_vector_subregs_modes fails, and what does the ICE look like? > > Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute > test_vector_subregs_modes. > The fail ICE: > ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: > ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) > expected: (nil) > > actual: (const_int 0 [0]) > cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 > 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, > rtx_def*, rtx_def*) > ../../../riscv-gcc/gcc/selftest-rtl.cc:57 > 0x1332504 test_vector_subregs_modes > ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 > 0x1332988 test_vector_subregs_fore_back > ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 > 0x1332ae7 test_vector_subregs > ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 > 0x1332c57 test_vector_ops > ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 > 0x1332c7b selftest::simplify_rtx_cc_tests() > ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 > 0x21318fa selftest::run_tests() > ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 > 0x1362a76 toplev::run_self_tests() > ../../../riscv-gcc/gcc/toplev.cc:2205 > > I analyzed the codes: > In test_vector_subregs_fore_back, when nunits = (1,1). The expected = > NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. > So the assertion fails. Hmm, ok, so the subreg operation is unexpected succeeding. > This is the test for stepped vector using 2 element per pattern. For > poly_uint16 (1,1), it's true it is possible only has 1 element. The stepped case is 3 elements per pattern rather than 2. In a stepped pattern: a, b, b+n are represented explicitly, then the rest are implicitly b+n*2, b+n*3, etc. The case being handled by this code is instead the 2-element case: a, b are represented explicitly, then the rest are implicitly all b. Why is (1,1) different though? The test is constructing: nunits: 1 + 1x shape: nelts_per_pattern == 2, npatterns == 1 elements: a, b[, b, b, b, b, ...] It then tests subregs starting at 0 + 1x (so starting at the first b). But for (2,2) we should have: nunits: 2 + 2x shape: nelts_per_pattern == 2, npatterns == 2 elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] and we should test subregs starting at 0 + 2x (so starting at the first b1). The two cases should be very similar, it's just that the (2,2) case doubles the number of patterns. > I think it makes sense to fail the test. However for poly (1,1) machine mode, > can we have the chance that some target define this > kind of machine mode only used for intrinsics? I already developed full RVV > support in GCC (including intrinsc and auto-vectorization). > I only enable auto-vectorization with mode larger than (2,2) and test it > fully. > From my experience, it seems the stepped vector only created during VLA > auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics > will > not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ Following on from what I said above, it doesn't look like this particular case is related to stepped vectors. (1,1) shouldn't (need to) be a special case though. Any potentital problems that would occur for (1,1) with npatterns==1 would also occur for (n,n) with npatterns==n. E.g. if stepped vectors are problematic for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) would be problematic for (2,2). So yeah, preventing a mode being used for autovectorisation would allow the target to have a bit more control over which constants are actually generated. But it shouldn't be necessary to do that for correctness. Thanks, Richard > juzhe.zh...@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 17:35 > To: juzhe.zhong\@rivai.ai > CC: rguenther; gcc-patches; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) > and allow the machine_mode definition with poly_uint16 (1, 1) > "juzhe.zh...@rivai.ai" writes: >> Hi, Richard. I tried the codes: >> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> >> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. > > Ah, right, sorry for the bogus suggestion. > > In that case, what specifically goes wrong? Which test in > test_vector_subregs_modes fails, and what does the ICE look like? > > Thanks, > Richard > >> But I tried: >> if (!nunits.is_constant () && known_gt (nunits, 1)) >> test_vector_subregs_modes (x, nunits - min_nunits, count); >> It pass. But it report a warning: "warning: comparison between signed and >> unsigned integer expressions [-Wsign-compare]" during the compilation. >> >>
Re: [GCC 13/15][PATCH v3] arm: Add support for dwarf debug directives and pseudo hard-register for PAC feature.
On 19/08/2022 11:04, Srinath Parvathaneni via Gcc-patches wrote: Hello, This patch teaches the DWARF support in gcc about RA_AUTH_CODE pseudo hard-register and also .save {ra_auth_code} and .cfi_offset ra_auth_code dwarf directives for the PAC feature in Armv8.1-M architecture. RA_AUTH_CODE register number is 107 and it's dwarf register number is 143. When compiled with " -march=armv8.1-m.main -mbranch-protection=pac-ret+leaf+bti -mthumb -mfloat-abi=soft -fasynchronous-unwind-tables -g -O2 -S" command line options, the assembly output after this patch looks like below: ... .cfi_startproc pacbti ip, lr, sp movsr1, #40 push{ip, lr} .save {ra_auth_code, lr} .cfi_def_cfa_offset 8 .cfi_offset 143, -8 .cfi_offset 14, -4 ... pop {ip, lr} .cfi_restore 14 .cfi_restore 143 .cfi_def_cfa_offset 0 movsr0, #0 aut ip, lr, sp bx lr .cfi_endproc ... Regression tested on arm-none-eabi target and found no regressions. Ok for master? Regards, Srinath. gcc/ChangeLog: 2022-08-17 Srinath Parvathaneni * config/arm/aout.h (ra_auth_code): Add to enum. * config/arm/arm.cc (emit_multi_reg_push): Add RA_AUTH_CODE register to dwarf frame expression. (arm_emit_multi_reg_pop): Restore RA_AUTH_CODE register. (arm_expand_prologue): Mark as frame related insn. (arm_regno_class): Check for pac pseudo reigster. (arm_dbx_register_number): Assign ra_auth_code register number in dwarf. (arm_unwind_emit_sequence): Print .save directive with ra_auth_code register. (arm_conditional_register_usage): Mark ra_auth_code in fixed reigsters. * config/arm/arm.h (FIRST_PSEUDO_REGISTER): Modify. (IS_PAC_Pseudo_REGNUM): Define. (enum reg_class): Add PAC_REG entry. * config/arm/arm.md (RA_AUTH_CODE): Define. gcc/testsuite/ChangeLog: 2022-08-17 Srinath Parvathaneni * g++.target/arm/pac-1.C: New test. * gcc.target/arm/pac-9.c: Likewise. The general boiler-plate to add the RA register is OK, but the code that tweaks the generation of the push instructions is fixing the wrong problem. The dwarf code already knows how to to track reg->reg copies and put out the right frame information, but this isn't working because you've not augmented the PAC instruction correctly. What you need is a frame-related augmentation to that and that essentially does (set (IP_REGNUM) (RA_AUTH_CODE)) The generic dwarf code should then handle all the rest for emitting CFI directives. The code in arm_unwind_emit_sequence (and technically in arm_unwind_emit_set as well, but we probably never reach that code as of today) then needs updating to handle the special cases when IP appears in the list of registers and PAC is enabled. That's a bit of a hack, but I can't immediately think of a better way of handling it. R. ### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h index b918ad3782fbee82320febb8b6e72ad615780261..ffeed45a678f17c63d5b42c21f020ca416cbf23f 100644 --- a/gcc/config/arm/aout.h +++ b/gcc/config/arm/aout.h @@ -74,7 +74,8 @@ "wr8", "wr9", "wr10", "wr11", \ "wr12", "wr13", "wr14", "wr15", \ "wcgr0", "wcgr1", "wcgr2", "wcgr3", \ - "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0" \ + "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0", \ + "ra_auth_code" \ } #endif diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 3495ab857eac38ecdf37e55f1d201b1c35cbde0b..c7067819f6785e44d30d8e5365505ab98682 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -816,7 +816,8 @@ extern const int arm_arch_cde_coproc_bits[]; s16-s31 S VFP variable (aka d8-d15). vfpcc Not a real register. Represents the VFP condition code flags. - vpr Used to represent MVE VPR predication. */ + vpr Used to represent MVE VPR predication. + ra_auth_codePseudo register to save PAC. */ /* The stack backtrace structure is as follows: fp points to here: | save code pointer | [fp] @@ -857,7 +858,7 @@ extern const int arm_arch_cde_coproc_bits[]; 1,1,1,1,1,1,1,1,\ 1,1,1,1,\ /* Specials. */\ - 1,1,1,1,1,1,1\ + 1,1,1,1,1,1,1,1 \ } /* 1 for registers not available across function calls. @@ -887,7 +888,7 @@ extern const int arm_arch_cde_coproc_bits[]; 1,1,1,1,1,1,1,1,\ 1,1,1,1,\ /* Spec
Re: [PATCH v2] LoongArch: Add support code model extreme.
On Fri, 2022-08-19 at 09:56 +0800, Lulu Cheng wrote: > v1 -> v2: > - Modify some description information. > - Add options -W[no]extreme-plt, warn about code model extreme not support > plt mode, > and then disable plt. I think we can make -mcmodel=large imply -fno-plt, and reject if user uses "-fplt -mcmodel=large" explicitly: diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 114ff8ec836..25483da0f44 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -5961,9 +5961,8 @@ loongarch_option_override_internal (struct gcc_options *opts) if (opts->x_flag_plt) { - warning (OPT_Wextreme_plt, "code model %qs not support %s mode," -"now set to %s", -"extreme", "plt", "noplt"); + if (global_options_set.x_flag_plt) + error ("code model %qs is not compatible with %s", "-fplt"); opts->x_flag_plt = 0; } break; -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[GCC 13/15][PATCH v3] arm: Add support for dwarf debug directives and pseudo hard-register for PAC feature.
Hello, This patch teaches the DWARF support in gcc about RA_AUTH_CODE pseudo hard-register and also .save {ra_auth_code} and .cfi_offset ra_auth_code dwarf directives for the PAC feature in Armv8.1-M architecture. RA_AUTH_CODE register number is 107 and it's dwarf register number is 143. When compiled with " -march=armv8.1-m.main -mbranch-protection=pac-ret+leaf+bti -mthumb -mfloat-abi=soft -fasynchronous-unwind-tables -g -O2 -S" command line options, the assembly output after this patch looks like below: ... .cfi_startproc pacbti ip, lr, sp movsr1, #40 push{ip, lr} .save {ra_auth_code, lr} .cfi_def_cfa_offset 8 .cfi_offset 143, -8 .cfi_offset 14, -4 ... pop {ip, lr} .cfi_restore 14 .cfi_restore 143 .cfi_def_cfa_offset 0 movsr0, #0 aut ip, lr, sp bx lr .cfi_endproc ... Regression tested on arm-none-eabi target and found no regressions. Ok for master? Regards, Srinath. gcc/ChangeLog: 2022-08-17 Srinath Parvathaneni * config/arm/aout.h (ra_auth_code): Add to enum. * config/arm/arm.cc (emit_multi_reg_push): Add RA_AUTH_CODE register to dwarf frame expression. (arm_emit_multi_reg_pop): Restore RA_AUTH_CODE register. (arm_expand_prologue): Mark as frame related insn. (arm_regno_class): Check for pac pseudo reigster. (arm_dbx_register_number): Assign ra_auth_code register number in dwarf. (arm_unwind_emit_sequence): Print .save directive with ra_auth_code register. (arm_conditional_register_usage): Mark ra_auth_code in fixed reigsters. * config/arm/arm.h (FIRST_PSEUDO_REGISTER): Modify. (IS_PAC_Pseudo_REGNUM): Define. (enum reg_class): Add PAC_REG entry. * config/arm/arm.md (RA_AUTH_CODE): Define. gcc/testsuite/ChangeLog: 2022-08-17 Srinath Parvathaneni * g++.target/arm/pac-1.C: New test. * gcc.target/arm/pac-9.c: Likewise. ### Attachment also inlined for ease of reply### diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h index b918ad3782fbee82320febb8b6e72ad615780261..ffeed45a678f17c63d5b42c21f020ca416cbf23f 100644 --- a/gcc/config/arm/aout.h +++ b/gcc/config/arm/aout.h @@ -74,7 +74,8 @@ "wr8", "wr9", "wr10", "wr11", \ "wr12", "wr13", "wr14", "wr15", \ "wcgr0", "wcgr1", "wcgr2", "wcgr3", \ - "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0" \ + "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0",\ + "ra_auth_code" \ } #endif diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 3495ab857eac38ecdf37e55f1d201b1c35cbde0b..c7067819f6785e44d30d8e5365505ab98682 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -816,7 +816,8 @@ extern const int arm_arch_cde_coproc_bits[]; s16-s31 S VFP variable (aka d8-d15). vfpcc Not a real register. Represents the VFP condition code flags. - vpr Used to represent MVE VPR predication. */ + vpr Used to represent MVE VPR predication. + ra_auth_codePseudo register to save PAC. */ /* The stack backtrace structure is as follows: fp points to here: | save code pointer | [fp] @@ -857,7 +858,7 @@ extern const int arm_arch_cde_coproc_bits[]; 1,1,1,1,1,1,1,1, \ 1,1,1,1, \ /* Specials. */ \ - 1,1,1,1,1,1,1\ + 1,1,1,1,1,1,1,1 \ } /* 1 for registers not available across function calls. @@ -887,7 +888,7 @@ extern const int arm_arch_cde_coproc_bits[]; 1,1,1,1,1,1,1,1, \ 1,1,1,1, \ /* Specials. */ \ - 1,1,1,1,1,1,1\ + 1,1,1,1,1,1,1,1 \ } #ifndef SUBTARGET_CONDITIONAL_REGISTER_USAGE @@ -1063,10 +1064,10 @@ extern const int arm_arch_cde_coproc_bits[]; && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1)) /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP - + 1 APSRQ + 1 APSRGE + 1 VPR. */ + + 1 APSRQ + 1 APSRGE + 1 VPR + 1 Pseudo register to save PAC. */ /* Intel Wireless MMX Technology registers add 16 + 4 more. */ /* VFP (VFP3) adds 32 (64) + 1 VFPCC. */ -#define FIRST_PSEUDO_REGISTER 107 +#define FIRST_PSEUDO_REGISTER 108 #define DBX_REGISTER_NUMBER(REGNO) arm_dbx_register_number (REGNO) @@ -1253,12 +1254,15 @@ extern int arm_regs_in_sequence[]; CC_REGNUM, VFPCC_REGNUM, \ FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,\ SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, \ - APSRGE_REGNUM, VPR_REGNUM\ + APSRGE_REGNUM, VPR_REGNUM, RA_AUTH_CODE
Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> Ah, right, sorry for the bogus suggestion. > In that case, what specifically goes wrong? Which test in > test_vector_subregs_modes fails, and what does the ICE look like? Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes. The fail ICE: ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) expected: (nil) actual: (const_int 0 [0]) cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*) ../../../riscv-gcc/gcc/selftest-rtl.cc:57 0x1332504 test_vector_subregs_modes ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 0x1332988 test_vector_subregs_fore_back ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 0x1332ae7 test_vector_subregs ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 0x1332c57 test_vector_ops ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 0x1332c7b selftest::simplify_rtx_cc_tests() ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 0x21318fa selftest::run_tests() ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 0x1362a76 toplev::run_self_tests() ../../../riscv-gcc/gcc/toplev.cc:2205 I analyzed the codes: In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. So the assertion fails. This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element. I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization). I only enable auto-vectorization with mode larger than (2,2) and test it fully. From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~ juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2022-08-19 17:35 To: juzhe.zhong\@rivai.ai CC: rguenther; gcc-patches; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) "juzhe.zh...@rivai.ai" writes: > Hi, Richard. I tried the codes: > if (!nunits.is_constant () && maybe_gt (nunits, 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > > It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. Ah, right, sorry for the bogus suggestion. In that case, what specifically goes wrong? Which test in test_vector_subregs_modes fails, and what does the ICE look like? Thanks, Richard > But I tried: > if (!nunits.is_constant () && known_gt (nunits, 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > It pass. But it report a warning: "warning: comparison between signed and > unsigned integer expressions [-Wsign-compare]" during the compilation. > > Finally, I tried: > if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > It passed with no warning. > > Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? > Thanks! > > > juzhe.zh...@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 16:03 > To: juzhe.zhong > CC: gcc-patches; rguenther; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) > and allow the machine_mode definition with poly_uint16 (1, 1) > juzhe.zh...@rivai.ai writes: >> From: zhongjuzhe >> >> Hello. This patch is preparing for following RVV support. >> >> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of >> ARM SVE is always 128-bit blocks. >> However, the minimum vector length of RVV can be 32bit in 'Zve32*' >> sub-extension and 64bit in 'Zve*' sub-extension. >> >> So I define the machine_mode as follows: >> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >> ADJUST_NUNITS (MODE, riscv_vector_chunks); >> The riscv_vector_chunks = poly_uint16 (1, 1) >> >> The compilation is failed for the stepped vector test: >> (const_vector:VNx1DI repeat [ >> (const_int 8 [0x8]) >> (const_int 7 [0x7]) >> ]) >> >> I understand for stepped vector should always have aleast 2 elements and >> stepped vector initialization is common >> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that >> report fail for stepped vector of poly_uint16 (1, 1). >> >> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in >> intrinsics. And I would like to enable RVV auto-vectoriz
Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zh...@rivai.ai" writes: > Hi, Richard. I tried the codes: > if (!nunits.is_constant () && maybe_gt (nunits, 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > > It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. Ah, right, sorry for the bogus suggestion. In that case, what specifically goes wrong? Which test in test_vector_subregs_modes fails, and what does the ICE look like? Thanks, Richard > But I tried: > if (!nunits.is_constant () && known_gt (nunits, 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > It pass. But it report a warning: "warning: comparison between signed and > unsigned integer expressions [-Wsign-compare]" during the compilation. > > Finally, I tried: > if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) > test_vector_subregs_modes (x, nunits - min_nunits, count); > It passed with no warning. > > Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? > Thanks! > > > juzhe.zh...@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 16:03 > To: juzhe.zhong > CC: gcc-patches; rguenther; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) > and allow the machine_mode definition with poly_uint16 (1, 1) > juzhe.zh...@rivai.ai writes: >> From: zhongjuzhe >> >> Hello. This patch is preparing for following RVV support. >> >> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of >> ARM SVE is always 128-bit blocks. >> However, the minimum vector length of RVV can be 32bit in 'Zve32*' >> sub-extension and 64bit in 'Zve*' sub-extension. >> >> So I define the machine_mode as follows: >> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >> ADJUST_NUNITS (MODE, riscv_vector_chunks); >> The riscv_vector_chunks = poly_uint16 (1, 1) >> >> The compilation is failed for the stepped vector test: >> (const_vector:VNx1DI repeat [ >> (const_int 8 [0x8]) >> (const_int 7 [0x7]) >> ]) >> >> I understand for stepped vector should always have aleast 2 elements and >> stepped vector initialization is common >> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that >> report fail for stepped vector of poly_uint16 (1, 1). >> >> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in >> intrinsics. And I would like to enable RVV auto-vectorization >> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V >> backend. I think it will not create issue if we define >> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or >> offer me some other better solutions. Thanks! >> >> >> >> gcc/ChangeLog: >> >> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for >> poly_uint16 (1, 1). >> >> --- >> gcc/simplify-rtx.cc | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >> index 7d09bf7103d..61e0dfa00d0 100644 >> --- a/gcc/simplify-rtx.cc >> +++ b/gcc/simplify-rtx.cc >> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >>rtx x = builder.build (); >> >>test_vector_subregs_modes (x); >> - if (!nunits.is_constant ()) >> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >> test_vector_subregs_modes (x, nunits - min_nunits, count); > > I think instead we should use maybe_gt (nunits, 1), on the basis that > the fore_back tests require vectors that have a minimum of 2 elements. > Something like poly_uint16 (1, 2) would have the same problem as > poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in > principle.) > > This corresponds to the minimum of 3 elements for the stepped tests: > > if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT > && maybe_gt (GET_MODE_NUNITS (mode), 2)) > { > test_vector_ops_series (mode, scalar_reg); > test_vector_subregs (mode); > } > > Thanks, > Richard >
Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
Hi, Richard. I tried the codes: if (!nunits.is_constant () && maybe_gt (nunits, 1)) test_vector_subregs_modes (x, nunits - min_nunits, count); It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value. But I tried: if (!nunits.is_constant () && known_gt (nunits, 1)) test_vector_subregs_modes (x, nunits - min_nunits, count); It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation. Finally, I tried: if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) test_vector_subregs_modes (x, nunits - min_nunits, count); It passed with no warning. Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? Thanks! juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2022-08-19 16:03 To: juzhe.zhong CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) juzhe.zh...@rivai.ai writes: > From: zhongjuzhe > > Hello. This patch is preparing for following RVV support. > > Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. > The minimum vector length of ARM SVE is 128-bit and the runtime invariant of > ARM SVE is always 128-bit blocks. > However, the minimum vector length of RVV can be 32bit in 'Zve32*' > sub-extension and 64bit in 'Zve*' sub-extension. > > So I define the machine_mode as follows: > VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); > ADJUST_NUNITS (MODE, riscv_vector_chunks); > The riscv_vector_chunks = poly_uint16 (1, 1) > > The compilation is failed for the stepped vector test: > (const_vector:VNx1DI repeat [ > (const_int 8 [0x8]) > (const_int 7 [0x7]) > ]) > > I understand for stepped vector should always have aleast 2 elements and > stepped vector initialization is common > for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that > report fail for stepped vector of poly_uint16 (1, 1). > > machine mode with nunits = poly_uint16 (1, 1) needs to implemented in > intrinsics. And I would like to enable RVV auto-vectorization > with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V > backend. I think it will not create issue if we define > vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or > offer me some other better solutions. Thanks! > > > > gcc/ChangeLog: > > * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for > poly_uint16 (1, 1). > > --- > gcc/simplify-rtx.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index 7d09bf7103d..61e0dfa00d0 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >rtx x = builder.build (); > >test_vector_subregs_modes (x); > - if (!nunits.is_constant ()) > + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) > test_vector_subregs_modes (x, nunits - min_nunits, count); I think instead we should use maybe_gt (nunits, 1), on the basis that the fore_back tests require vectors that have a minimum of 2 elements. Something like poly_uint16 (1, 2) would have the same problem as poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in principle.) This corresponds to the minimum of 3 elements for the stepped tests: if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT && maybe_gt (GET_MODE_NUNITS (mode), 2)) { test_vector_ops_series (mode, scalar_reg); test_vector_subregs (mode); } Thanks, Richard
GCC 12.2.1 Status Report (2022-08-19)
Status == The gcc-12 branch is again open for regression and documentation fixes. Quality Data Priority # Change from last report --- --- P1 0 P2 434 + 2 P3 64+ 2 P4 239 P5 25 --- --- Total P1-P3 498 + 4 Total 762 + 4 Previous Report === https://gcc.gnu.org/pipermail/gcc/2022-August/239270.html
Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
Thank you so much. Address your comment. I think "maybe_gt (nunits, 1)" is a more solid solution than I do. I will send a patch to fix this. juzhe.zh...@rivai.ai From: Richard Sandiford Date: 2022-08-19 16:03 To: juzhe.zhong CC: gcc-patches; rguenther; kito.cheng Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) juzhe.zh...@rivai.ai writes: > From: zhongjuzhe > > Hello. This patch is preparing for following RVV support. > > Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. > The minimum vector length of ARM SVE is 128-bit and the runtime invariant of > ARM SVE is always 128-bit blocks. > However, the minimum vector length of RVV can be 32bit in 'Zve32*' > sub-extension and 64bit in 'Zve*' sub-extension. > > So I define the machine_mode as follows: > VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); > ADJUST_NUNITS (MODE, riscv_vector_chunks); > The riscv_vector_chunks = poly_uint16 (1, 1) > > The compilation is failed for the stepped vector test: > (const_vector:VNx1DI repeat [ > (const_int 8 [0x8]) > (const_int 7 [0x7]) > ]) > > I understand for stepped vector should always have aleast 2 elements and > stepped vector initialization is common > for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that > report fail for stepped vector of poly_uint16 (1, 1). > > machine mode with nunits = poly_uint16 (1, 1) needs to implemented in > intrinsics. And I would like to enable RVV auto-vectorization > with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V > backend. I think it will not create issue if we define > vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or > offer me some other better solutions. Thanks! > > > > gcc/ChangeLog: > > * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for > poly_uint16 (1, 1). > > --- > gcc/simplify-rtx.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index 7d09bf7103d..61e0dfa00d0 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >rtx x = builder.build (); > >test_vector_subregs_modes (x); > - if (!nunits.is_constant ()) > + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) > test_vector_subregs_modes (x, nunits - min_nunits, count); I think instead we should use maybe_gt (nunits, 1), on the basis that the fore_back tests require vectors that have a minimum of 2 elements. Something like poly_uint16 (1, 2) would have the same problem as poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in principle.) This corresponds to the minimum of 3 elements for the stepped tests: if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT && maybe_gt (GET_MODE_NUNITS (mode), 2)) { test_vector_ops_series (mode, scalar_reg); test_vector_subregs (mode); } Thanks, Richard
Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
juzhe.zh...@rivai.ai writes: > From: zhongjuzhe > > Hello. This patch is preparing for following RVV support. > > Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. > The minimum vector length of ARM SVE is 128-bit and the runtime invariant of > ARM SVE is always 128-bit blocks. > However, the minimum vector length of RVV can be 32bit in 'Zve32*' > sub-extension and 64bit in 'Zve*' sub-extension. > > So I define the machine_mode as follows: > VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); > ADJUST_NUNITS (MODE, riscv_vector_chunks); > The riscv_vector_chunks = poly_uint16 (1, 1) > > The compilation is failed for the stepped vector test: > (const_vector:VNx1DI repeat [ > (const_int 8 [0x8]) > (const_int 7 [0x7]) > ]) > > I understand for stepped vector should always have aleast 2 elements and > stepped vector initialization is common > for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that > report fail for stepped vector of poly_uint16 (1, 1). > > machine mode with nunits = poly_uint16 (1, 1) needs to implemented in > intrinsics. And I would like to enable RVV auto-vectorization > with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V > backend. I think it will not create issue if we define > vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or > offer me some other better solutions. Thanks! > > > > gcc/ChangeLog: > > * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for > poly_uint16 (1, 1). > > --- > gcc/simplify-rtx.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index 7d09bf7103d..61e0dfa00d0 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode) >rtx x = builder.build (); > >test_vector_subregs_modes (x); > - if (!nunits.is_constant ()) > + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) > test_vector_subregs_modes (x, nunits - min_nunits, count); I think instead we should use maybe_gt (nunits, 1), on the basis that the fore_back tests require vectors that have a minimum of 2 elements. Something like poly_uint16 (1, 2) would have the same problem as poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in principle.) This corresponds to the minimum of 3 elements for the stepped tests: if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT && maybe_gt (GET_MODE_NUNITS (mode), 2)) { test_vector_ops_series (mode, scalar_reg); test_vector_subregs (mode); } Thanks, Richard
Re: [PATCH] jobserver: detect properly O_NONBLOCK
On Thu, Aug 18, 2022 at 2:43 PM Martin Liška wrote: > > That handles systems that don't have O_NONBLOCK, in that case > WPA streaming is not using jobserver if --jobserver-auth uses 'fifo'. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > Tested with mingw cross compiler as well. > > Ready to be installed? libstdc++ uses # For Networking TS. AC_CHECK_HEADERS([fcntl.h sys/ioctl.h sys/socket.h sys/uio.h poll.h netdb.h arpa/inet.h netinet/in.h netinet/tcp.h]) AC_CHECK_DECL(F_GETFL,,,[#include ]) AC_CHECK_DECL(F_SETFL,,,[#include ]) if test "$ac_cv_have_decl_F_GETFL$ac_cv_have_decl_F_SETFL" = yesyes ; then AC_CHECK_DECL(O_NONBLOCK,,,[#include ]) fi I'd rather not invent sth fancy with /dev/null (that seems very unixy) > Thanks, > Martin > > gcc/ChangeLog: > > * configure.ac: Detect O_NONBLOCK flag for open. > * config.in: Regenerate. > * configure: Regenerate. > * opts-common.cc (jobserver_info::connect): Set is_connected > properly based on O_NONBLOCK. > * opts-jobserver.h (struct jobserver_info): Add is_connected > member variable. > > gcc/lto/ChangeLog: > > * lto.cc (wait_for_child): Ask if we are connected to jobserver. > (stream_out_partitions): Likewise. > --- > gcc/config.in| 6 ++ > gcc/configure| 39 +-- > gcc/configure.ac | 11 +++ > gcc/lto/lto.cc | 12 ++-- > gcc/opts-common.cc | 11 ++- > gcc/opts-jobserver.h | 2 ++ > 6 files changed, 72 insertions(+), 9 deletions(-) > > diff --git a/gcc/config.in b/gcc/config.in > index 413b2bd36cb..abab9bf5024 100644 > --- a/gcc/config.in > +++ b/gcc/config.in > @@ -2148,6 +2148,12 @@ > #endif > > > +/* Define if O_NONBLOCK supported by fcntl. */ > +#ifndef USED_FOR_TARGET > +#undef HOST_HAS_O_NONBLOCK > +#endif > + > + > /* Define which stat syscall is able to handle 64bit indodes. */ > #ifndef USED_FOR_TARGET > #undef HOST_STAT_FOR_64BIT_INODES > diff --git a/gcc/configure b/gcc/configure > index da7a45066b5..8b416c1a142 100755 > --- a/gcc/configure > +++ b/gcc/configure > @@ -12460,6 +12460,41 @@ $as_echo "#define HOST_HAS_O_CLOEXEC 1" >>confdefs.h > > fi > > +# Check if O_NONBLOCK is defined by fcntl > +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for O_NONBLOCK" >&5 > +$as_echo_n "checking for O_NONBLOCK... " >&6; } > +if ${ac_cv_o_nonblock+:} false; then : > + $as_echo_n "(cached) " >&6 > +else > + > +cat confdefs.h - <<_ACEOF >conftest.$ac_ext > +/* end confdefs.h. */ > + > +#include > +int > +main () > +{ > + > +return open ("/dev/null", O_RDONLY | O_NONBLOCK); > + ; > + return 0; > +} > +_ACEOF > +if ac_fn_cxx_try_compile "$LINENO"; then : > + ac_cv_o_nonblock=yes > +else > + ac_cv_o_nonblock=no > +fi > +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext > +fi > +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_o_nonblock" >&5 > +$as_echo "$ac_cv_o_nonblock" >&6; } > +if test $ac_cv_o_nonblock = yes; then > + > +$as_echo "#define HOST_HAS_O_NONBLOCK 1" >>confdefs.h > + > +fi > + > # C++ Modules would like some networking features to provide the mapping > # server. You can still use modules without them though. > # The following network-related checks could probably do with some > @@ -19678,7 +19713,7 @@ else >lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 >lt_status=$lt_dlunknown >cat > conftest.$ac_ext <<_LT_EOF > -#line 19681 "configure" > +#line 19716 "configure" > #include "confdefs.h" > > #if HAVE_DLFCN_H > @@ -19784,7 +19819,7 @@ else >lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 >lt_status=$lt_dlunknown >cat > conftest.$ac_ext <<_LT_EOF > -#line 19787 "configure" > +#line 19822 "configure" > #include "confdefs.h" > > #if HAVE_DLFCN_H > diff --git a/gcc/configure.ac b/gcc/configure.ac > index f70b6c24fda..4ebdad38b9b 100644 > --- a/gcc/configure.ac > +++ b/gcc/configure.ac > @@ -1707,6 +1707,17 @@ if test $ac_cv_o_cloexec = yes; then >[Define if O_CLOEXEC supported by fcntl.]) > fi > > +# Check if O_NONBLOCK is defined by fcntl > +AC_CACHE_CHECK(for O_NONBLOCK, ac_cv_o_nonblock, [ > +AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[ > +#include ]], [[ > +return open ("/dev/null", O_RDONLY | O_NONBLOCK);]])], > +[ac_cv_o_nonblock=yes],[ac_cv_o_nonblock=no])]) > +if test $ac_cv_o_nonblock = yes; then > + AC_DEFINE(HOST_HAS_O_NONBLOCK, 1, > + [Define if O_NONBLOCK supported by fcntl.]) > +fi > + > # C++ Modules would like some networking features to provide the mapping > # server. You can still use modules without them though. > # The following network-related checks could probably do with some > diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc > index c82307f4f7e..3a9147b01b5 100644 > --- a/gcc/lto/lto.cc > +++ b/gcc/lto/lto.cc > @@ -213,11 +213,11 @@ wait_for_child () > } >while (!WIFEXITED (status) && !WIFSIGNALED (status)); > > ---nruns; > + --nruns; > > -
Re: [PATCH] Make path_range_query standalone and add reset_path.
On Thu, Aug 18, 2022 at 1:10 PM Aldy Hernandez wrote: > > On Thu, Aug 18, 2022, 11:34 Richard Biener wrote: > > > > On Thu, Aug 18, 2022 at 9:58 AM Aldy Hernandez wrote: > > > > > > FWIW, here is a rebased version with the latest trunk. > > > > > > There are no regressions, or differences in threading counts, and the > > > performance change is negligible. > > > > +{ > > + mark_dfs_back_edges (); > > + > > + if (flag_checking) > > + verify_marked_backedges (fun); > > > > that looks redundant. > > Do you suggest somewhere else, or should we nuke > verify_marked_backedges altogether since that's its only use? Not sure. I'd leave the function around in any case. > > > > Otherwise looks good - it might be possible to get rid of the reset_path use > > as well? > > > That's what I mentioned wrt DOM threading. There's a 14% degradation > from not reusing the path_range_query in the forward threader because > it really abuses the query to try to simplify its statements. So I've > left reset_path for it to use, but we can get rid of it when we nuke > the old threader. Ah, I see. > Aldy > > > Aldy > > > > > Richard. > > > > > Aldy > > > > > > On Wed, Aug 17, 2022 at 1:59 PM Aldy Hernandez wrote: > > > > > > > > These are a bunch of cleanups inspired by Richi's suggestion of making > > > > path_range_query standalone, instead of having to call > > > > compute_ranges() for each path. > > > > > > > > I've made the ranger need explicit, and moved the responsibility for > > > > its creation to the caller. > > > > > > > > I've also investigated and documented why the forward threader needs its > > > > own compute exit dependencies variant. I can't wait for it to go away > > > > :-/. > > > > > > > > I've also added constructors that take a path and dependencies, and > > > > made compute_ranges() private. Unfortunately, reinstantiating > > > > path_range_query in the forward threader caused a 14% performance > > > > regression in DOM, because the old threader calls it over and over on > > > > the same path to simplify statements (some of which not even in the > > > > IL, but that's old news). > > > > > > > > In the meantime, I've left the ability to reset a path, but this time > > > > appropriately called reset_path(). > > > > > > > > I've moved the verify_marked_backedges call to the threader. Is this > > > > ok? > > > > > > > > Interestingly, comparing the thread counts for the patch yielded more > > > > threads. I narrowed this down to a bug in the path oracle reset code > > > > that's not cleaning things up as expected. I'll fix that before > > > > committing this, but figured I'd post for comments in the meantime. > > > > > > > > Thoughts? > > > > > > > > gcc/ChangeLog: > > > > > > > > * gimple-range-path.cc (path_range_query::path_range_query): Add > > > > various constructors to take a path. > > > > (path_range_query::~path_range_query): Remove m_alloced_ranger. > > > > (path_range_query::range_on_path_entry): Adjust for m_ranger > > > > being > > > > a reference. > > > > (path_range_query::set_path): Rename to... > > > > (path_range_query::reset_path): ...this and call compute_ranges. > > > > (path_range_query::ssa_range_in_phi): Adjust for m_ranger > > > > reference. > > > > (path_range_query::range_defined_in_block): Same. > > > > (path_range_query::compute_ranges_in_block): Same. > > > > (path_range_query::adjust_for_non_null_uses): Same. > > > > (path_range_query::compute_exit_dependencies): Use m_path > > > > instead > > > > of argument. > > > > (path_range_query::compute_ranges): Remove path argument. > > > > (path_range_query::range_of_stmt): Adjust for m_ranger > > > > reference. > > > > (path_range_query::compute_outgoing_relations): Same. > > > > * gimple-range-path.h (class path_range_query): Add various > > > > constructors. > > > > Make compute_ranges and compute_exit_dependencies private. > > > > Rename set_path to reset_path. > > > > Make m_ranger a reference. > > > > Remove m_alloced_ranger. > > > > * tree-ssa-dom.cc (pass_dominator::execute): Adjust constructor > > > > to > > > > path_range_query. > > > > * tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Take a > > > > ranger and instantiate a new path_range_query every time. > > > > (ch_base::copy_headers): Pass ranger instead of > > > > path_range_query. > > > > * tree-ssa-threadbackward.cc (class back_threader): Remove > > > > m_solver. > > > > (back_threader::~back_threader): Remove m_solver. > > > > (back_threader::find_taken_edge_switch): Adjust for m_ranger > > > > reference. > > > > (back_threader::find_taken_edge_cond): Same. > > > > (back_threader::dump): Remove m_solver. > > > > (back_threader::back_threader): Move v
Re: [PATCH] Remove path_range_query constructor that takes an edge.
On Thu, Aug 18, 2022 at 6:13 PM Aldy Hernandez wrote: > > The path_range_query constructor that takes an edge is really a > convenience function for the loop-ch pass. It feels wrong to pollute > the API with such a specialized function that could be done with > a small inline function closer to its user. > > As an added benefit, we remove one use of reset_path. The last > remaining one is the forward threader one. > > OK? OK. > gcc/ChangeLog: > > * gimple-range-path.cc (path_range_query::path_range_query): > Remove constructor that takes edge. > * gimple-range-path.h (class path_range_query): Same. > * tree-ssa-loop-ch.cc (edge_range_query): New. > (entry_loop_condition_is_static): Call edge_range_query. > --- > gcc/gimple-range-path.cc | 15 --- > gcc/gimple-range-path.h | 1 - > gcc/tree-ssa-loop-ch.cc | 17 +++-- > 3 files changed, 15 insertions(+), 18 deletions(-) > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc > index ba7c2ed9b47..bc2879c0c57 100644 > --- a/gcc/gimple-range-path.cc > +++ b/gcc/gimple-range-path.cc > @@ -59,21 +59,6 @@ path_range_query::path_range_query (gimple_ranger &ranger, > bool resolve) >m_oracle = new path_oracle (m_ranger.oracle ()); > } > > -path_range_query::path_range_query (gimple_ranger &ranger, > - edge e, > - bool resolve) > - : m_cache (new ssa_global_cache), > -m_has_cache_entry (BITMAP_ALLOC (NULL)), > -m_ranger (ranger), > -m_resolve (resolve) > -{ > - m_oracle = new path_oracle (m_ranger.oracle ()); > - auto_vec bbs (2); > - bbs.quick_push (e->dest); > - bbs.quick_push (e->src); > - reset_path (bbs, NULL); > -} > - > path_range_query::~path_range_query () > { >delete m_oracle; > diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h > index 483fde0d431..9f2d6d92dab 100644 > --- a/gcc/gimple-range-path.h > +++ b/gcc/gimple-range-path.h > @@ -37,7 +37,6 @@ public: > const bitmap_head *dependencies = NULL, > bool resolve = true); >path_range_query (gimple_ranger &ranger, bool resolve = true); > - path_range_query (gimple_ranger &ranger, edge e, bool resolve = true); >virtual ~path_range_query (); >void reset_path (const vec &, const bitmap_head > *dependencies); >bool range_of_expr (vrange &r, tree name, gimple * = NULL) override; > diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc > index 96816b89287..9c316887d5b 100644 > --- a/gcc/tree-ssa-loop-ch.cc > +++ b/gcc/tree-ssa-loop-ch.cc > @@ -45,6 +45,20 @@ along with GCC; see the file COPYING3. If not see > increases effectiveness of code motion optimizations, and reduces the need > for loop preconditioning. */ > > +/* Given a path through edge E, whose last statement is COND, return > + the range of the solved conditional in R. */ > + > +static void > +edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger) > +{ > + auto_vec path (2); > + path.safe_push (e->dest); > + path.safe_push (e->src); > + path_range_query query (ranger, path); > + if (!query.range_of_stmt (r, cond)) > +r.set_varying (boolean_type_node); > +} > + > /* Return true if the condition on the first iteration of the loop can > be statically determined. */ > > @@ -72,8 +86,7 @@ entry_loop_condition_is_static (class loop *l, > gimple_ranger *ranger) > desired_static_value = boolean_true_node; > >int_range<2> r; > - path_range_query query (*ranger, e); > - query.range_of_stmt (r, last); > + edge_range_query (r, e, last, *ranger); >return r == int_range<2> (desired_static_value, desired_static_value); > } > > -- > 2.37.1 >
Re: [PATCH] Fix bogus -Wstringop-overflow warning in Ada
On Thu, Aug 18, 2022 at 4:46 PM Eric Botcazou wrote: > > > Hmm :/ But that means we _should_ force a sign extension but only > > from ptrofftype_p ()? That is, your test above should maybe read > > > >signop sgn = TYPE_SIGN (type); > >if (ptrofftype_p (type)) > > sgn = SIGNED; > > > > assuming 'type' is the type of lowbnd > > Yes, that's essentially equivalent to what get_offset_range does, but I'm not > sure why having two slightly different ways of doing it would be better than a > single one here, Maybe replace the call to get_precision in both places with > TYPE_PRECSION (type) then? I wasn't aware of the copy in get_offset_range. To cite: wide_int wr[2]; if (!get_range (x, stmt, wr, rvals)) return false; signop sgn = SIGNED; /* Only convert signed integers or unsigned sizetype to a signed offset and avoid converting large positive values in narrower types to negative offsets. */ if (TYPE_UNSIGNED (type) && wr[0].get_precision () < TYPE_PRECISION (sizetype)) sgn = UNSIGNED; r[0] = offset_int::from (wr[0], sgn); r[1] = offset_int::from (wr[1], sgn); I guess the main issue here is that the machinery converts to offset_int prematurely and thus needs to do it even when it's not clear in what context (POINTER_PLUS_EXPR offset or not) it is used. The code unfortunately is a bit of a mess and I'm not too familiar with it. I'm OK with your original patch, given the above it's consistent (even if maybe broken). Thanks, Richard. > > -- > Eric Botcazou > > >