[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2023-06-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dilfridge at gentoo dot org

--- Comment #51 from Andrew Pinski  ---
*** Bug 84377 has been marked as a duplicate of this bug. ***

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-29 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |7.4

--- Comment #50 from Uroš Bizjak  ---
Fixed for 7.4+.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-29 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #49 from uros at gcc dot gnu.org ---
Author: uros
Date: Mon Jan 29 16:03:17 2018
New Revision: 257154

URL: https://gcc.gnu.org/viewcvs?rev=257154&root=gcc&view=rev
Log:
Backport from mainline
2018-01-26  Uros Bizjak  

PR target/81763
* config/i386/i386.md (*andndi3_doubleword): Add earlyclobber
to (=&r,r,rm) alternative. Add (=r,0,rm) and (=r,r,0) alternatives.


Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/i386/i386.md

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-26 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #48 from Uroš Bizjak  ---
(In reply to Mike Lothian from comment #47)
> With the patch you've committed to gcc master, applied on top of GCC 7.3 I'm
> now seeing the following error building Clang:

I don't think this is related to the original bugreport involving BMI
instructions.  Could be Clang's source incompatibility with 32bit target.

In any case, please open a new PR.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-26 Thread mike at fireburn dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #47 from Mike Lothian  ---
With the patch you've committed to gcc master, applied on top of GCC 7.3 I'm
now seeing the following error building Clang:

[294/954] /usr/bin/x86_64-pc-linux-gnu-g++ -m32 -D_FILE_OFFSET_BITS=64
-D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Serialization
-I/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/lib/Serialization
-I/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/include -Iinclude
-I/usr/lib/llvm/7/include  -DNDEBUG -O2 -march=native -pipe
-mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register
-fpermissive -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11
-Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual
-fno-strict-aliasing -pedantic -Wno-long-long -fPIC -MD -MT
lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -MF
lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o.d -o
lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -c
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/lib/Serialization/ASTReader.cpp
FAILED: lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o
/usr/bin/x86_64-pc-linux-gnu-g++ -m32 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
-D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
-D__STDC_LIMIT_MACROS -Ilib/Serialization
-I/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/lib/Serialization
-I/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/include -Iinclude
-I/usr/lib/llvm/7/include  -DNDEBUG -O2 -march=native -pipe
-mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register
-fpermissive -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11
-Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual
-fno-strict-aliasing -pedantic -Wno-long-long -fPIC -MD -MT
lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -MF
lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o.d -o
lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -c
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/lib/Serialization/ASTReader.cpp
In file included from
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/include/clang/AST/APValue.h:20:0,
 from
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/include/clang/AST/Decl.h:17,
 from
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/include/clang/AST/DeclObjC.h:17,
 from
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/include/clang/Serialization/ASTReader.h:17,
 from
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/lib/Serialization/ASTReader.cpp:14:
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h: In instantiation of ‘struct
llvm::PointerIntPairInfo*>, 2,
llvm::PointerLikeTypeTraits*> > >’:
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:56:57:   required from
‘PointerTy llvm::PointerIntPair::getPointer() const [with PointerTy =
llvm::PointerUnion*>; unsigned int IntBits = 2; IntType = unsigned int; PtrTraits
= llvm::PointerLikeTypeTraits*> >; Info =
llvm::PointerIntPairInfo*>, 2,
llvm::PointerLikeTypeTraits*> > >]’
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/include/clang/AST/Decl.h:2859:39:
  required from here
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:132:3: error: static
assertion failed: PointerIntPair with integer size too large for pointer
   static_assert(IntBits <= PtrTraits::NumLowBitsAvailable,
   ^
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:147:42: warning: left shift
count >= width of type [-Wshift-count-overflow]
 ShiftedIntMask = (uintptr_t)(IntMask << IntShift)
 ~^~~~
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:147:42: warning: right
operand of shift expression ‘(3 << 4294967295)’ is >= than the precision of the
left operand [-fpermissive]
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h: In 

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-26 Thread manuel.lauss at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #46 from Manuel Lauss  ---
Created attachment 43257
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43257&action=edit
reduced testcase

reduced testcase for Jakub's patch in comment #36 and the build failure it
causes in comment #42:

g++ (Gentoo 7.2.0-r1 p1.1) 7.2.0

g++ -m32 -O2 -mbmi -fPIC -fno-strict-aliasing -c testcase.i
testcase.i:126:12: warning: ‘unsigned int {anonymous}::bc::bn()’ used but never
defined
   unsigned bn();
^~
testcase.i:98:8: warning: ‘void {anonymous}::bc::bo({anonymous}::bc&)’ used but
never defined
   void bo(bc &);
^~
testcase.i:154:3: warning: ‘{anonymous}::bw::bw(unsigned int)’ used but never
defined
   bw(unsigned);
   ^~
testcase.i: In function ‘bool bz({anonymous}::bw&, ai::as, by&, bool,
ai::av&)’:
testcase.i:178:1: internal compiler error: in gen_split_178, at
config/i386/i386.md:8623
 }
 ^

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-26 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #45 from uros at gcc dot gnu.org ---
Author: uros
Date: Fri Jan 26 15:36:32 2018
New Revision: 257096

URL: https://gcc.gnu.org/viewcvs?rev=257096&root=gcc&view=rev
Log:
PR target/81763
* config/i386/i386.md (*andndi3_doubleword): Add earlyclobber
to (=&r,r,rm) alternative. Add (=r,0,rm) and (=r,r,0) alternatives.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.md

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-26 Thread manuel.lauss at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #44 from Manuel Lauss  ---
Created attachment 43252
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43252&action=edit
compressed preprocessed source

g++ -m32 -O3 -ggdb -march=znver1 -mtune=broadwell -pipe -fPIC
-fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W
-Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual
-fno-strict-aliasing -pedantic -Wno-long-long -fPIC -o test.o -c
PPExpressions.i

/tmp-ram/portage/sys-devel/clang-/work/x/y/clang-/lib/Lex/PPExpressions.cpp:
In function ‘bool EvaluateValue({anonymous}::PPValue&, clang::Token&,
DefinedTracker&, bool, clang::Preprocessor&)’:
/tmp-ram/portage/sys-devel/clang-/work/x/y/clang-/lib/Lex/PPExpressions.cpp:492:1:
internal compiler error: in gen_split_178, at config/i386/i386.md:8623
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #43 from Jakub Jelinek  ---
(In reply to Uroš Bizjak from comment #41)
> Let's go forward with this pattern:
> 
> (define_insn "*andndi3_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=&r,r,r,&r")
>   (and:DI
> (not:DI (match_operand:DI 1 "register_operand" "r,0,r,0"))
> (match_operand:DI 2 "nonimmediate_operand" "rm,rm,0,rm")))
>(clobber (reg:CC FLAGS_REG))]
>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>   "#"
>   [(set_attr "isa" "bmi,bmi,bmi,*")])

This looks reasonable to me.

(In reply to Mike Lothian from comment #42)
> With the patch in Comment 36 I get the following error compiling Clang

Just out of interest, could you attach the preprocessed source here, I'd like
to see the case when the RA assigns such a partial overlap.  As you're using
-march=native, will need g++ output with -v added to the command too (what
flags it expands to).

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-26 Thread mike at fireburn dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #42 from Mike Lothian  ---
With the patch in Comment 36 I get the following error compiling Clang

FAILED: lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o
/usr/bin/x86_64-pc-linux-gnu-g++ -m32 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
-D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
-D__STDC_LIMIT_MACROS -Ilib/Lex
-I/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/lib/Lex
-I/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/include -Iinclude
-I/usr/lib/llvm/7/include  -DNDEBUG -O2 -march=native -pipe
-mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register
-fpermissive -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11
-Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual
-fno-strict-aliasing -pedantic -Wno-long-long -fPIC -MD -MT
lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o -MF
lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o.d -o
lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o -c
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/lib/Lex/PPExpressions.cpp
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/lib/Lex/PPExpressions.cpp:
In function ‘bool EvaluateValue({anonymous}::PPValue&, clang::Token&,
DefinedTracker&, bool, clang::Preprocessor&)’:
/var/tmp/portage/sys-devel/clang-/work/x/y/clang-/lib/Lex/PPExpressions.cpp:492:1:
internal compiler error: in gen_split_178, at config/i386/i386.md:8623
 }
 ^

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-26 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #41 from Uroš Bizjak  ---
Let's go forward with this pattern:

(define_insn "*andndi3_doubleword"
  [(set (match_operand:DI 0 "register_operand" "=&r,r,r,&r")
(and:DI
  (not:DI (match_operand:DI 1 "register_operand" "r,0,r,0"))
  (match_operand:DI 2 "nonimmediate_operand" "rm,rm,0,rm")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
  "#"
  [(set_attr "isa" "bmi,bmi,bmi,*")])

(=&r,r,rm) alternative avoids matching of output to all other overlapped and
partial overlapped operands.

(=r,0,rm) alternative allows matching with op1, so we are sure output won't
*partially* overlap with op2. This is true for register and memory operands. It
can still fully overlap with register op2 in case the same reg is allocated for
op0, op1 and op2, which is OK for BMI.

(=r,r,0) alternative will prevent *partial* overlap of output with op1 in a
similar way.

(=&r,0,rm) is non-bmi alternative. Earlyclobber is needed, otherwise RA can
match op2 with op0 and op1, so the same reg is allocated for op0, op1 and op2.
If this is the case, after the split, NOT/AND sequence won't match ANDN
instruction, since NOT will change both input operands to a follow-up AND insn.
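
The earlyclobber argument for the non-BMI alternative can be checked with a toy model. The following is a minimal Python sketch, not GCC code; the helper names `andn` and `not_then_and_same_reg` are made up for illustration, and a register is modelled as a 32-bit integer. It compares the intended ANDN result with what a NOT followed by an AND computes when op0, op1 and op2 all live in the same register.

```python
MASK = 0xFFFFFFFF  # model a 32-bit register


def andn(src1, src2):
    """Intended semantics of the operation: (~src1) & src2."""
    return ~src1 & src2 & MASK


def not_then_and_same_reg(value):
    """Non-BMI NOT/AND sequence when op0, op1 and op2 share one register."""
    reg = value
    reg = ~reg & MASK  # NOT on op0 (tied to op1); this also overwrites "op2"
    reg = reg & reg    # AND with op2, which no longer holds its input value
    return reg
```

With all three operands equal to `x`, the intended result `andn(x, x)` is always 0, while the NOT/AND sequence returns `~x`. That mismatch is what the earlyclobber on this alternative is meant to rule out.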

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #40 from Jakub Jelinek  ---
(In reply to Uroš Bizjak from comment #37)
> (In reply to Jakub Jelinek from comment #33)
> 
> > and it should work.  The last case would be right now:
> >   SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3;
> > and is again wrong, but we could again swap:
> >   SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2;
> > and all is fine.
> 
> Whoops, it looks that SI:N+2 is clobbered in the swapped case.

You're right.  So the question is if IRA/LRA can ever allow that case where
there is partial overlap with both registers.  I've tried hard to simulate that
case with:
unsigned long long
foo (unsigned long long x, unsigned long long y)
{
  unsigned long long z;
  asm ("" : "+A" (x), "+Q" (y));
  z = x & ~y;
  asm ("" : "+Q" (z) : "a" (0), "b" (0));
  return z;
}
where IRA indeed allocates the used pseudos such that x is in ax:dx, y in cx:bx
and z in dx:cx.  Now, if I try this and testcase with ~x & y instead of x & ~y
with GCC patched with #c36, I get:
        andn    %eax, %ecx, %ecx
        xorl    %eax, %eax
        andn    %edx, %ebx, %ebx
        movl    %ecx, %edx
        movl    %ebx, %ecx
        movl    %eax, %ebx
resp.
        andn    %ecx, %eax, %ecx
        xorl    %eax, %eax
        andn    %ebx, %edx, %ebx
        movl    %ecx, %edx
        movl    %ebx, %ecx
        movl    %eax, %ebx
between the two inline asms, and if I leave just the =r <- (r, r) alternative
and nothing else, LRA ICEs on it (on both variants).  All is with -O2 -m32
-mbmi -mstv -msse2.  So, is there something in LRA that prevents these partial
overlaps?

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread mike at fireburn dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #39 from Mike Lothian  ---
I can confirm it fixes things for me too. 

Is that the final patch in Comment 36? If so I'll try and get the Gentoo devs
to include it in the GCC ebuilds

Will this be added to GCC 8.1 and 7.4?

Thanks again, this bug has been plaguing me for the last 8 months

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread manuel.lauss at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #38 from Manuel Lauss  ---
(In reply to Jakub Jelinek from comment #36)

Your patch does fix my llvm issue. Thank you!

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #37 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #33)

> and it should work.  The last case would be right now:
>   SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3;
> and is again wrong, but we could again swap:
>   SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2;
> and all is fine.

Whoops, it looks that SI:N+2 is clobbered in the swapped case.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #36 from Jakub Jelinek  ---
Ah, bmi, not bmi2.  And the =r <- (r, r) alternative might be best first.
--- gcc/config/i386/i386.md.jj  2018-01-16 09:28:19.721432394 +0100
+++ gcc/config/i386/i386.md 2018-01-25 20:58:18.382378827 +0100
@@ -9250,14 +9250,14 @@ (define_split
 })

 (define_insn "*andndi3_doubleword"
-  [(set (match_operand:DI 0 "register_operand" "=r,&r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,&r")
(and:DI
- (not:DI (match_operand:DI 1 "register_operand" "r,0"))
- (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
+ (not:DI (match_operand:DI 1 "register_operand" "r,0,r"))
+ (match_operand:DI 2 "nonimmediate_operand" "r,rm,m")))
(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
   "#"
-  [(set_attr "isa" "bmi,*")])
+  [(set_attr "isa" "bmi,*,bmi")])

 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -9273,7 +9273,22 @@ (define_split
(parallel [(set (match_dup 3)
   (and:SI (not:SI (match_dup 4)) (match_dup 5)))
  (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);")
+{
+  split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);
+  /* For the =r <- (r, r) alternative of *andndi3_doubleword, there could
+ be overlap between the output and input registers.  If the output
+ is equal to one of the input operands, this is fine, if there is
+ partial overlap, we can resolve it by swapping the two instructions.  */
+  if (reg_overlap_mentioned_p (operands[0], operands[4])
+  || reg_overlap_mentioned_p (operands[0], operands[5]))
+{
+  std::swap (operands[0], operands[3]);
+  std::swap (operands[1], operands[4]);
+  std::swap (operands[2], operands[5]);
+  gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[4])
+ && !reg_overlap_mentioned_p (operands[0], operands[5]));
+}
+})

 (define_split
   [(set (match_operand:DI 0 "register_operand")

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #35 from Jakub Jelinek  ---
So, what about following?
--- gcc/config/i386/i386.md.jj  2018-01-16 09:28:19.721432394 +0100
+++ gcc/config/i386/i386.md 2018-01-25 20:58:18.382378827 +0100
@@ -9250,14 +9250,14 @@ (define_split
 })

 (define_insn "*andndi3_doubleword"
-  [(set (match_operand:DI 0 "register_operand" "=r,&r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,&r")
(and:DI
- (not:DI (match_operand:DI 1 "register_operand" "r,0"))
- (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
+ (not:DI (match_operand:DI 1 "register_operand" "0,r,r"))
+ (match_operand:DI 2 "nonimmediate_operand" "rm,r,m")))
(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
   "#"
-  [(set_attr "isa" "bmi,*")])
+  [(set_attr "isa" "*,bmi2,bmi2")])

 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -9273,7 +9273,22 @@ (define_split
(parallel [(set (match_dup 3)
   (and:SI (not:SI (match_dup 4)) (match_dup 5)))
  (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);")
+{
+  split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);
+  /* For the =r <- (r, r) alternative of *andndi3_doubleword, there could
+ be overlap between the output and input registers.  If the output
+ is equal to one of the input operands, this is fine, if there is
+ partial overlap, we can resolve it by swapping the two instructions.  */
+  if (reg_overlap_mentioned_p (operands[0], operands[4])
+  || reg_overlap_mentioned_p (operands[0], operands[5]))
+{
+  std::swap (operands[0], operands[3]);
+  std::swap (operands[1], operands[4]);
+  std::swap (operands[2], operands[5]);
+  gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[4])
+ && !reg_overlap_mentioned_p (operands[0], operands[5]));
+}
+})

 (define_split
   [(set (match_operand:DI 0 "register_operand")

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #34 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #33)

> At least with a smarter splitter we don't really need to avoid no overlap at
> all for the r <- (r, r) bmi case, we can choose which of the two 32-bit
> andn's we do first depending on the overlap, all we need to guarantee is
> that the splitter is not impossible and ideally doesn't need any
> instructions but the two.

This would work nicely. I have seen a couple of places with the attached source
(using only -mbmi -m32) where RA satisfied & or 0 constraint only with spills
and would benefit from relaxed constraints.

BTW: AFAICS, "andn" is the only double-word three-operand instruction that is
split after reload. So, andn r <- (r,r) post-reload splitter is a precedent in
x86 world.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #33 from Jakub Jelinek  ---
(In reply to Uroš Bizjak from comment #31)
> (In reply to Uroš Bizjak from comment #30)
> > So, I'll bootstrap:
> 
> Maybe we can also allow &r <- (r,r) for BMI, to be safe (c.f. comment #23):
> 
> (define_insn "*andndi3_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=r,&r")
>   (and:DI
> (not:DI (match_operand:DI 1 "register_operand" "0,r"))
> (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
>(clobber (reg:CC FLAGS_REG))]
>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>   "#"
>   [(set_attr "isa" "*,bmi")])
> 
> Manuel, can you please test this pattern?

At least with a smarter splitter we don't really need to avoid no overlap at
all for the r <- (r, r) bmi case, we can choose which of the two 32-bit andn's
we do first depending on the overlap, all we need to guarantee is that the
splitter is not impossible and ideally doesn't need any instructions but the
two.
Hard registers for DImode must be consecutive because we identify them by the
(lowest) register number and mode and for r <- (r, r) there can't be any
overlap between the two input operands.  So, even if DImode registers can start
at any GPR number other than the last, not just even ones, either there is no
overlap at all in between output and inputs, or the output is the same as the
first, or as the second input (all these cases are fine), or there is a partial
overlap with one or both of the operands.
For the partial operand I can think of DI:N = DI:N+1 &~ DI:unrelated, or
DI:N+1 = DI:N &~ DI:unrelated, or DI:N+1 = DI:N &~ DI:N+2 (or swapped
operands),
the last case is partial overlap with both inputs.  So, right now we'd split
those into:
  SI:N = SI:N+1 &~ SI:unrelated; SI:N+1 = SI:N+2 &~ SI:unrelated+1
which is fine, or
  SI:N+1 = SI:N &~ SI:unrelated; SI:N+2 = SI:N+1 &~ SI:unrelated+1
which is wrong but we can swap those:
  SI:N+2 = SI:N+1 &~ SI:unrelated+1; SI:N+1 = SI:N &~ SI:unrelated
and it should work.  The last case would be right now:
  SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3;
and is again wrong, but we could again swap:
  SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2;
and all is fine.
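
The overlap cases above can be checked mechanically. The sketch below is a hypothetical Python model, not part of GCC: `split_is_correct`, `andn` and the six-register file are assumptions made for illustration. It treats every hard register as a single bit (the operation is bitwise, so one bit per register is enough), models DI:k as the pair (k, k+1), and tries every initial register state for a given split order. It uses the pattern's form, dst = ~src1 & src2; the overlap analysis is the same for the x &~ y notation used above.

```python
from itertools import product

NREGS = 6  # size of the toy register file (an assumption of this sketch)


def andn(a, b):
    """One-bit model of the operation: (~a) & b."""
    return (1 - a) & b


def split_is_correct(dst, src1, src2, high_first):
    """Check DI:dst = ~DI:src1 & DI:src2 split into two word operations.

    DI:k occupies registers k (low word) and k+1 (high word).  Returns True
    only if executing the two halves in the chosen order gives the right
    result for every initial register state.
    """
    order = (1, 0) if high_first else (0, 1)
    for initial in product((0, 1), repeat=NREGS):
        want_lo = andn(initial[src1], initial[src2])
        want_hi = andn(initial[src1 + 1], initial[src2 + 1])
        regs = list(initial)
        for half in order:
            regs[dst + half] = andn(regs[src1 + half], regs[src2 + half])
        if (regs[dst], regs[dst + 1]) != (want_lo, want_hi):
            return False
    return True
```

In this model, `split_is_correct(1, 0, 4, high_first=False)` is False and the swapped order is True, which matches the single partial overlap case. For the double partial overlap, `split_is_correct(1, 0, 2, ...)` is False in both orders, which agrees with the objection raised in comment #37 that swapping still clobbers SI:N+2.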

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread manuel.lauss at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #32 from Manuel Lauss  ---
(In reply to Uroš Bizjak from comment #31)
> (In reply to Uroš Bizjak from comment #30)
> > So, I'll bootstrap:
> 
> Maybe we can also allow &r <- (r,r) for BMI, to be safe (c.f. comment #23):
> 
> (define_insn "*andndi3_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=r,&r")
>   (and:DI
> (not:DI (match_operand:DI 1 "register_operand" "0,r"))
> (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
>(clobber (reg:CC FLAGS_REG))]
>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>   "#"
>   [(set_attr "isa" "*,bmi")])
> 
> Manuel, can you please test this pattern?

Seems to work as well:

[...]
  b1:   c4 e2 73 f7 d2          shrx   %ecx,%edx,%edx
  b6:   0f 45 c2                cmovne %edx,%eax
  b9:   0f 45 d5                cmovne %ebp,%edx
  bc:   c4 e2 68 f2 54 df 04    andn   0x4(%edi,%ebx,8),%edx,%edx
  c3:   89 d1                   mov    %edx,%ecx
  c5:   c4 e2 78 f2 04 df       andn   (%edi,%ebx,8),%eax,%eax
  cb:   09 c1                   or     %eax,%ecx
[...]

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #31 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #30)
> So, I'll bootstrap:

Maybe we can also allow &r <- (r,r) for BMI, to be safe (c.f. comment #23):

(define_insn "*andndi3_doubleword"
  [(set (match_operand:DI 0 "register_operand" "=r,&r")
(and:DI
  (not:DI (match_operand:DI 1 "register_operand" "0,r"))
  (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
  "#"
  [(set_attr "isa" "*,bmi")])

Manuel, can you please test this pattern?

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #30 from Uroš Bizjak  ---
So, I'll bootstrap:

(define_insn "*andndi3_doubleword"
  [(set (match_operand:DI 0 "register_operand" "=r,&r")
(and:DI
  (not:DI (match_operand:DI 1 "register_operand" "0,r"))
  (match_operand:DI 2 "nonimmediate_operand" "rm,m")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
  "#"
  [(set_attr "isa" "*,bmi")])

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #29 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #25)
> I believe for double-word pseudos the RA will not do that, CCing Vlad about
> it.

I started to worry about this because of the following allocation:

   0x080a96c3 <+1347>:  andn   (%eax,%ebx,8),%edx,%eax
=> 0x080a96c9 <+1353>:  andn   0x4(%eax,%ebx,8),%ecx,%edx

In addition to memory input operand, (dx,cx) regpair is used and (ax,dx) is the
output operand. These regpairs do in fact interleave!
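
To make the effect of this allocation concrete, here is a small Python sketch of the two instructions. It is a toy model rather than an emulator: `run_pair`, the register values and the memory contents are invented for illustration, and unmapped addresses simply read as 0. It records which address each `andn` loads from.

```python
MASK = 0xFFFFFFFF  # model 32-bit registers


def andn(src1, src2):
    """AT&T 'andn src2, src1, dst' computes dst = (~src1) & src2."""
    return ~src1 & src2 & MASK


def run_pair(regs, mem):
    """Run the two andn instructions in order; return both load addresses."""
    r = dict(regs)
    # andn (%eax,%ebx,8),%edx,%eax
    addr_lo = (r["eax"] + r["ebx"] * 8) & MASK
    r["eax"] = andn(r["edx"], mem.get(addr_lo, 0))
    # andn 0x4(%eax,%ebx,8),%ecx,%edx  (%eax was overwritten just above)
    addr_hi = (r["eax"] + r["ebx"] * 8 + 4) & MASK
    r["edx"] = andn(r["ecx"], mem.get(addr_hi, 0))
    return addr_lo, addr_hi


# Hypothetical starting state: the 64-bit memory operand lives at 0x1010.
regs = {"eax": 0x1000, "ebx": 2, "ecx": 0xF0, "edx": 0x0F}
mem = {0x1010: 0xFFFF, 0x1014: 0x1234}
addr_lo, addr_hi = run_pair(regs, mem)
```

With these made-up values the low word is loaded from 0x1010 as intended, but the high word is loaded from 0x10004 instead of 0x1014, because the first instruction has already replaced the base register %eax with its result.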

> Anyway, by having all of r <- (r, r), r <- (0, rm) and &r <- (r, m)
> alternatives I'd think the RA has more choices than when it just has the
> first 2.
> If it sees it as beneficial to have the middle operand in the destination,
> it can due to the second alternative even if third one is a memory, if it
> wants some other, it can, just needs to make sure the destination doesn't
> overlap with mem's address.

Taking all those facts into account, I think we can allow two alternatives:

r <- (0, rm)

and additionally

&r <- (r, m) for BMI

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #28 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #23)
> (The earlyclobber of non-BMI case is needed due to separate not insn).

It is not needed. I have added earlyclobber in r243202 without much thought.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread manuel.lauss at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #27 from Manuel Lauss  ---
(In reply to Uroš Bizjak from comment #21)
> Following patch should fix the problem:
> 
> --cut here--
> Index: i386.md
> ===
> --- i386.md (revision 256935)
> +++ i386.md (working copy)
> @@ -9250,10 +9250,10 @@
>  })
>  
>  (define_insn "*andndi3_doubleword"
> -  [(set (match_operand:DI 0 "register_operand" "=r,&r")
> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
> (and:DI
>   (not:DI (match_operand:DI 1 "register_operand" "r,0"))
> - (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
> + (match_operand:DI 2 "nonimmediate_operand" "r,rm")))
> (clobber (reg:CC FLAGS_REG))]
>"!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>"#"
> --cut here--
> 
> So, the pattern now reads:
> 
> (define_insn "*andndi3_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=r,r")
>   (and:DI
> (not:DI (match_operand:DI 1 "register_operand" "r,0"))
> (match_operand:DI 2 "nonimmediate_operand" "r,rm")))
>(clobber (reg:CC FLAGS_REG))]
>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>   "#"
>   [(set_attr "isa" "bmi,*")])


I rebuilt gcc-7.3 with this applied, now the generated code looks much better:
  ab:   0f ad d0                shrd   %cl,%edx,%eax
  ae:   f6 c1 20                test   $0x20,%cl
  b1:   c4 e2 73 f7 d2          shrx   %ecx,%edx,%edx
  b6:   0f 45 c2                cmovne %edx,%eax
  b9:   0f 45 d5                cmovne %ebp,%edx
  bc:   c4 e2 68 f2 54 df 04    andn   0x4(%edi,%ebx,8),%edx,%edx
  c3:   89 d1                   mov    %edx,%ecx
  c5:   c4 e2 78 f2 04 df       andn   (%edi,%ebx,8),%eax,%eax
  cb:   09 c1                   or     %eax,%ecx

And it seems to work as well, 32bit llvm now built successfully.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread mike at fireburn dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #26 from Mike Lothian  ---
Is this the patch you want us to test then:

diff -ur a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
--- a/gcc/config/i386/i386.md   2018-01-16 11:17:49.509247000 +
+++ b/gcc/config/i386/i386.md   2018-01-25 18:21:25.562225621 +
@@ -8586,7 +8586,7 @@
 (define_insn "*andndi3_doubleword"
   [(set (match_operand:DI 0 "register_operand" "=r,&r")
(and:DI
- (not:DI (match_operand:DI 1 "register_operand" "r,0"))
+ (not:DI (match_operand:DI 1 "register_operand" "0,0"))
  (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #25 from Jakub Jelinek  ---
I believe for double-word pseudos the RA will not do that, CCing Vlad about it.
Anyway, by having all of r <- (r, r), r <- (0, rm) and &r <- (r, m)
alternatives I'd think the RA has more choices than when it just has the first
2.
If it sees it as beneficial to have the middle operand in the destination, it
can due to the second alternative even if third one is a memory, if it wants
some other, it can, just needs to make sure the destination doesn't overlap
with mem's address.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #24 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #22)
> Wonder though if it wouldn't give the RA more choices by also including
> another
> &r <- (r, m) alternative with bmi2 isa attribute.

This would be worse than r <- (0, m) alternative on register starved x86_32
architecture. The above approach can use up to 6 registers, while r <- (0, m)
uses up to 4.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #23 from Uroš Bizjak  ---
The above patch builds on the premise that, with the (=r,r,r) alternative, the
register allocator won't allocate (=r1,=r2) = ~(r0,r1) & (r2,r3). This would
again clobber r1 too early:

r1 = ~r0 & r2
r2 = ~r1 & r3

The safest choice of the pattern would be:

(define_insn "*andndi3_doubleword"
  [(set (match_operand:DI 0 "register_operand" "=r,&r")
(and:DI
  (not:DI (match_operand:DI 1 "register_operand" "0,0"))
  (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
  "#"
  [(set_attr "isa" "bmi,*")])

(The earlyclobber of non-BMI case is needed due to separate not insn).

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #22 from Jakub Jelinek  ---
Will do. For now I'm including it in my bootstraps with normal options (I'm
testing other patches and need the same baseline), then I will try some
--with-arch/--with-tune.
Wonder though if it wouldn't give the RA more choices by also including another
&r <- (r, m) alternative with bmi2 isa attribute.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #21 from Uroš Bizjak  ---
Following patch should fix the problem:

--cut here--
Index: i386.md
===
--- i386.md (revision 256935)
+++ i386.md (working copy)
@@ -9250,10 +9250,10 @@
 })

 (define_insn "*andndi3_doubleword"
-  [(set (match_operand:DI 0 "register_operand" "=r,&r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
(and:DI
  (not:DI (match_operand:DI 1 "register_operand" "r,0"))
- (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
+ (match_operand:DI 2 "nonimmediate_operand" "r,rm")))
(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
   "#"
--cut here--

So, the pattern now reads:

(define_insn "*andndi3_doubleword"
  [(set (match_operand:DI 0 "register_operand" "=r,r")
(and:DI
  (not:DI (match_operand:DI 1 "register_operand" "r,0"))
  (match_operand:DI 2 "nonimmediate_operand" "r,rm")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
  "#"
  [(set_attr "isa" "bmi,*")])

Instead of using earlyclobber on the output operand (which guarantees that no
input register will be clobbered), we can use a little trick and in case of
memory operand 2, match the output with a register operand 1. This will keep
output registers separate from registers in the memory operand (and is in fact
what we do in all other _doubleword patterns).

Jakub, I don't have Haswell target to test BMI instructions via native
bootstrap, can you perhaps bootstrap the compiler with the above patch?

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread manuel.lauss at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #20 from Manuel Lauss  ---
Created attachment 43242
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43242&action=edit
preprocessed source

preprocessed source of file that contains the function
"llvm::TypeInfer::EnforceSmallerThan", compressed due to it being 2.5MB.

The Makefile builds it with these parameters: (stripped the includes)
g++ -m32 -O3 -ggdb -march=znver1 -mtune=broadwell -pipe -fPIC
-fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W
-Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -pedantic -Wno-long-long
-Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #19 from Jakub Jelinek  ---
Haven't managed to reproduce it e.g. with
long long
foo (long long *p, int q, unsigned r1, unsigned r2)
{
  int t, u;
  asm ("" : "+a" (p), "+b" (q), "+d" (r1), "+c" (r2), "=S" (t), "=D" (u));
  unsigned long long r = ((unsigned long long) r2 << 32) | r1;
  long long a = p[q] & ~r;
  asm volatile ("" : "+A" (a) : "S" (t), "D" (u));
  return a;
}

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #18 from Jakub Jelinek  ---
(In reply to Uroš Bizjak from comment #17)
> (In reply to Manuel Lauss from comment #16)
> >0x080a96c3 <+1347>:  andn   (%eax,%ebx,8),%edx,%eax
> > => 0x080a96c9 <+1353>:  andn   0x4(%eax,%ebx,8),%ecx,%edx
> 
> This looks like double-word andn is clobbering %eax too early.

Yeah.  In that case, can you please attach the preprocessed source of whatever
source contains that and g++ command line options used to compile that?
Thanks.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #17 from Uroš Bizjak  ---
(In reply to Manuel Lauss from comment #16)
>0x080a96c3 <+1347>:  andn   (%eax,%ebx,8),%edx,%eax
> => 0x080a96c9 <+1353>:  andn   0x4(%eax,%ebx,8),%ecx,%edx

This looks like double-word andn is clobbering %eax too early.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-25 Thread manuel.lauss at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

Manuel Lauss  changed:

   What|Removed |Added

 CC||manuel.lauss at googlemail dot com

--- Comment #16 from Manuel Lauss  ---
I've seen this as well.
Build llvm-7 with gcc-7.3, -m32 -march=znver1 -mtune=broadwell
-O1 / -Og: all good
-O2: llvm-tblgen runs in an infinite(?) loop
-O3: llvm-tblgen segfaults

-mno-bmi: all good, at all optimization levels

In case it helps, here's gdb info:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x080a96c9 in llvm::MachineValueTypeSet::const_iterator::find_from_pos
(P=, this=)
at
/tmp-ram/x1/portage/sys-devel/llvm-/work/llvm-/utils/TableGen/CodeGenDAGPatterns.h:148
148 W &= maskLeadingOnes(WordWidth-SkipBits);

(gdb) list
143
144   // If P is in the middle of a word, process it manually here, because
145   // the trailing bits need to be masked off to use findFirstSet.
146   if (SkipBits != 0) {
147 WordType W = Set->Words[SkipWords];
148 W &= maskLeadingOnes(WordWidth-SkipBits);
149 if (W != 0)
150   return Count + findFirstSet(W);
151 Count += WordWidth;
152 SkipWords++;

[...]
   0x080a968d <+1293>:  shr    $0x6,%ebx
   0x080a9690 <+1296>:  and    $0xffc0,%esi
   0x080a9693 <+1299>:  and    $0x3f,%eax
   0x080a9696 <+1302>:  je     0x80a96e0
   0x080a9698 <+1304>:  mov    $0x40,%ecx
   0x080a969d <+1309>:  mov    $0x,%edx
   0x080a96a2 <+1314>:  xor    %edi,%edi
   0x080a96a4 <+1316>:  sub    %eax,%ecx
   0x080a96a6 <+1318>:  mov    $0x,%eax
   0x080a96ab <+1323>:  shrd   %cl,%edx,%eax
   0x080a96ae <+1326>:  test   $0x20,%cl
   0x080a96b1 <+1329>:  shrx   %ecx,%edx,%edx
   0x080a96b6 <+1334>:  cmovne %edx,%eax
   0x080a96b9 <+1337>:  cmovne %edi,%edx
   0x080a96bc <+1340>:  mov    %edx,%ecx
   0x080a96be <+1342>:  mov    %eax,%edx
   0x080a96c0 <+1344>:  mov    -0x3c(%ebp),%eax
   0x080a96c3 <+1347>:  andn   (%eax,%ebx,8),%edx,%eax
=> 0x080a96c9 <+1353>:  andn   0x4(%eax,%ebx,8),%ecx,%edx
   0x080a96d0 <+1360>:  mov    %edx,%edi
   0x080a96d2 <+1362>:  or     %eax,%edi
   0x080a96d4 <+1364>:  jne    0x80a9e70
   0x080a96da <+1370>:  add    $0x40,%esi
   0x080a96dd <+1373>:  add    $0x1,%ebx
   0x080a96e0 <+1376>:  cmp    $0x4,%ebx
   0x080a96e3 <+1379>:  je     0x80a9e60
   0x080a96e9 <+1385>:  mov    -0x3c(%ebp),%eax
   0x080a96ec <+1388>:  mov    0x4(%eax,%ebx,8),%edx
   0x080a96f0 <+1392>:  mov    (%eax,%ebx,8),%eax
   0x080a96f3 <+1395>:  mov    %edx,%ecx
   0x080a96f5 <+1397>:  or     %eax,%ecx
   0x080a96f7 <+1399>:  jne    0x80a9e42
   0x080a96fd <+1405>:  lea    0x1(%ebx),%eax
[...]


(gdb) info registers
eax            0x18000040   402653248
ecx            0x0          0
edx            0x3f         63
ebx            0x0          0
esp            0xff8f9660   0xff8f9660
ebp            0xff8f96c8   0xff8f96c8
esi            0x0          0
edi            0x0          0
eip            0x80a96c9    0x80a96c9
eflags         0x10202      [ IF RF ]
cs             0x23         35
ss             0x2b         43
ds             0x2b         43
es             0x2b         43
fs             0x0          0
gs             0x63         99

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-05 Thread mike at fireburn dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #15 from Mike Lothian  ---
Created attachment 43041
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43041&action=edit
llvm-tblgen Working and Broken binaries

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-05 Thread mike at fireburn dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #14 from Mike Lothian  ---
I've been playing around with GCC 7.2.0 again

When compiling a 32bit LLVM with -O2 -march=native, llvm-tblgen runs
indefinitely during the build so it never completes

Doing the same with -O2 -march=native -mno-bmi or using GCC 6.4.0 allows it to
build 

From what I can see llvm-tblgen is a C++ program

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2017-08-09 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #13 from H.J. Lu  ---
(In reply to Mike Lothian from comment #12)
> Created attachment 41960 [details]
> si_shader objdumps

We need a small testcase in C.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2017-08-09 Thread mike at fireburn dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #12 from Mike Lothian  ---
Created attachment 41960
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41960&action=edit
si_shader objdumps

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2017-08-09 Thread mike at fireburn dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #11 from Mike Lothian  ---
So a lot of the segfaults I see are in si_shader so I thought I'd compile Mesa
with and without BMI and compare the objdumps of the two .o files

CFLAGS="-O2 -march=native -pipe -mno-bmi -m32" CXXFLAGS="-O2 -march=native
-pipe -mno-bmi -m32" LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed
-fuse-ld=bfd -m32" ./autogen.sh --prefix=/usr --build=i686-pc-linux-gnu
--host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info
--datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib
--disable-dependency-tracking --disable-silent-rules
--docdir=/usr/share/doc/mesa- --htmldir=/usr/share/doc/mesa-/html
--libdir=/usr/lib32 --enable-dri --enable-glx --enable-shared-glapi
--enable-texture-float --enable-nine --disable-debug --enable-dri3 --enable-egl
--enable-gbm --enable-gles1 --enable-gles2 --enable-glx-tls --disable-libunwind
--enable-valgrind=no --enable-llvm-shared-libs --with-dri-drivers=,swrast,i965
--with-gallium-drivers=,swrast,radeonsi --with-vulkan-drivers=,intel,radeon
PYTHON2=/usr/bin/python2.7 --with-platforms=x11,surfaceless,wayland,drm
--enable-nine --enable-llvm --enable-omx --enable-va --enable-vdpau
--disable-xa --disable-xvmc --with-va-libdir=/usr/lib32/va/drivers
--disable-glx-read-only-text --disable-gallium-osmesa


CFLAGS="-O2 -march=native -pipe -m32" CXXFLAGS="-O2 -march=native -pipe -m32"
LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -fuse-ld=bfd -m32"
./autogen.sh --prefix=/usr --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu
--mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share
--sysconfdir=/etc --localstatedir=/var/lib --disable-dependency-tracking
--disable-silent-rules --docdir=/usr/share/doc/mesa-
--htmldir=/usr/share/doc/mesa-/html --libdir=/usr/lib32 --enable-dri
--enable-glx --enable-shared-glapi --enable-texture-float --enable-nine
--disable-debug --enable-dri3 --enable-egl --enable-gbm --enable-gles1
--enable-gles2 --enable-glx-tls --disable-libunwind --enable-valgrind=no
--enable-llvm-shared-libs --with-dri-drivers=,swrast,i965
--with-gallium-drivers=,swrast,radeonsi --with-vulkan-drivers=,intel,radeon
PYTHON2=/usr/bin/python2.7 --with-platforms=x11,surfaceless,wayland,drm
--enable-nine --enable-llvm --enable-omx --enable-va --enable-vdpau
--disable-xa --disable-xvmc --with-va-libdir=/usr/lib32/va/drivers
--disable-glx-read-only-text --disable-gallium-osmesa

Those were my configure options, and LLVM was compiled with -mno-bmi.

I'm attaching the dumps

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2017-08-09 Thread mike at fireburn dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #10 from Mike Lothian  ---
Unfortunately it also depends on LLVM not just Mesa which makes it a much
bigger target for figuring this out

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2017-08-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #9 from Alexander Monakov  ---
A (potentially simpler) alternative is to use sequential builds (make without
-j) and bisect by index of compiled source file, i.e. have a wrapper script
around gcc that uses some global counter to pass -mno-bmi to first N compiler
invocations.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2017-08-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov  ---
Mike, you can start by preparing two build trees, one with faulty mesa compiled
with -march=native, the other with working-fine mesa compiled with
-march=native -mno-bmi. You should be able to collect:

- .o files from each tree, and
- link commands from build logs

and be able to re-link mesa libraries by hand and verify they still exhibit the
same behavior (one fails, the other doesn't).

From there you can proceed by building "hybrid" libraries by taking a set of
good .o files and a complementary set of bad .o files. This will allow you to
find a single .o file that makes the library behave wrongly. More explanation
and a helper script for bisecting is available at
https://gcc.gnu.org/wiki/Analysing_Large_Testcases

At that point please share your status (once you're down to one file there's no
generic recipe, you'd have to get creative).

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2017-08-08 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #7 from H.J. Lu  ---
(In reply to Mike Lothian from comment #6)
> I tried the test case with
> 
> gcc -O2 -march=native test.c -o test
> 
> and 
> 
> gcc -O2 -march=native -mno-bmi test.c -o test
> 
> Both executables seem to run with no output
> 
> I've only seen the issue in radeonsi in Mesa with LLVM, I'm not sure what's
> happening as the problem doesn't manifest when debugging is enabled so I'm
> not sure how to create a smaller test case

Compile radeonsi one function at a time with and without -mbmi to
find out the smallest function which causes the problem with -mbmi.
If the function is small enough, we can take a guess.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2017-08-08 Thread mike at fireburn dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #6 from Mike Lothian  ---
I tried the test case with

gcc -O2 -march=native test.c -o test

and 

gcc -O2 -march=native -mno-bmi test.c -o test

Both executables seem to run with no output

I've only seen the issue in radeonsi in Mesa with LLVM, I'm not sure what's
happening as the problem doesn't manifest when debugging is enabled so I'm not
sure how to create a smaller test case

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2017-08-08 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2017-08-08
 CC||hjl.tools at gmail dot com
 Ever confirmed|0   |1

--- Comment #5 from H.J. Lu  ---
Please extract a small testcase.

[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

2017-08-08 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #4 from Andrew Pinski  ---
This might be PR 53399.