Re: [PR libfortran/62768] Handle filenames with embedded nulls

2014-09-18 Thread Hans-Peter Nilsson
On Wed, 17 Sep 2014, Hans-Peter Nilsson wrote:
 On Thu, 18 Sep 2014, Janne Blomqvist wrote:
  On Thu, Sep 18, 2014 at 12:57 AM, Hans-Peter Nilsson h...@bitrange.com 
  wrote:
   'k so we'll track the regressions in a PR.
 
  Do you prefer to tack on to 62768 or a new PR?

 Hijacking 62768 for the purposes of reporting a regression for
 its fix would not be proper.

 Will tell in a new PR, unless I see a really obvious fix.

False alarm.  If you look back at the patch I posted, there's a
typo. :-}  It was duly warned about, but I'd have expected the build to
fail.

Apparently libgfortran is not compiled with -Werror, at least
not for crosses.  Maybe -Werror is there for native builds, but I'm not
sure, as I see some "warning: array subscript has type 'char'
[-Wchar-subscripts]" which seems generic, and also some others.
Though no more than can be fixed or excepted, IMHO.
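For reference, a minimal example of the -Wchar-subscripts warning quoted
above -- not taken from libgfortran, just an illustration:

    int
    lookup (const int *table, char c)
    {
      /* warning: array subscript has type 'char' [-Wchar-subscripts],
         because a plain 'char' may be signed and thus negative.  */
      return table[c];
    }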

brgds, H-P


Re: [PATCH AArch64]: Add constraint letter for stack_protect_test pattern)

2014-09-18 Thread Marcus Shawcroft
On 17 September 2014 15:43, James Greenhalgh james.greenha...@arm.com wrote:

 On Wed, Sep 17, 2014 at 09:30:31AM +0100, Richard Earnshaw wrote:
 "=&r" is correct for an early-clobbered scratch.

 R.

 In that case...

 How is the attached patch for trunk? I've bootstrapped it on AArch64
 with -fstack-protector-strong and -frename-registers in the BOOT_CFLAGS
 without seeing any issues.

 OK?

 Thanks,
 James

 ---
 gcc/

 2014-09-15  James Greenhalgh  james.greenha...@arm.com

 * config/aarch64/aarch64.md (stack_protect_test_<mode>): Mark
 scratch register as an output to placate register renaming.


OK for this part.


 gcc/testsuite/

 2014-09-15  James Greenhalgh  james.greenha...@arm.com

 * gcc.target/aarch64/stack_protector_set_1.c: New.
 * gcc.target/aarch64/stack_protector_set_2.c: Likewise.

I agree with Andrew, these don't need to be aarch64 specific.

/Marcus


Re: [PR libfortran/62768] Handle filenames with embedded nulls

2014-09-18 Thread Janne Blomqvist
On Thu, Sep 18, 2014 at 11:14 AM, Hans-Peter Nilsson h...@bitrange.com wrote:
 On Wed, 17 Sep 2014, Hans-Peter Nilsson wrote:
 On Thu, 18 Sep 2014, Janne Blomqvist wrote:
  On Thu, Sep 18, 2014 at 12:57 AM, Hans-Peter Nilsson h...@bitrange.com 
  wrote:
   'k so we'll track the regressions in a PR.
 
  Do you prefer to tack on to 62768 or a new PR?

 Hijacking 62768 for the purposes of reporting a regression for
 its fix would not be proper.

 Will tell in a new PR, unless I see a really obvious fix.

 False alarm.

Ok, good; I was a bit perplexed as to what could be wrong.

  If you look back at the patch I posted, there's a
 typo. :-}  It was duly warned about, but I'd have expected the build to
 fail.

Yes, strange that it didn't fail. There's no prototype for cf_fstrcpy,
and since we use -std=gnu11, prototypes should be mandatory. Also, since
there's no symbol called cf_fstrcpy, at least the linking should
fail. Unless the link picked up some old inquire.o file?
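As a rough sketch of what one would have expected to see (the function
shape below is made up; only the misspelled name cf_fstrcpy comes from the
patch):

    /* schematically, somewhere in inquire.c: */
    static void
    copy_string (char *dst, const char *src, int len)
    {
      cf_fstrcpy (dst, src, len);
      /* -std=gnu11: warning: implicit declaration of function 'cf_fstrcpy',
         and, since no such symbol exists anywhere, an undefined-reference
         error at link time.  */
    }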

 Apparently libgfortran is not compiled with -Werror, at least
 not for crosses.  Maybe -Werror is there for native builds, but I'm not
 sure, as I see some "warning: array subscript has type 'char'
 [-Wchar-subscripts]" which seems generic, and also some others.
 Though no more than can be fixed or excepted, IMHO.

No, -Werror isn't used. It was tried, but apparently caused issues.
From the changelog:

2008-06-13  Tobias Burnus  bur...@net-b.de

* configure.ac (AM_CFLAGS): Remove -Werror again.
* configure: Regenerate.

2008-06-13  Tobias Burnus  bur...@net-b.de

PR libgfortran/36518
* configure.ac (AM_CFLAGS): Add -Werror.
* configure: Regenerate.
* m4/ifunction_logical.m4: Cast n to (int).
* generated/any_l16.c: Regenerate.
* generated/any_l2.c: Regenerate.
* generated/all_l1.c: Regenerate.
* generated/all_l2.c: Regenerate.
* generated/all_l16.c: Regenerate.
* generated/any_l4.c: Regenerate.
* generated/count_4_l.c: Regenerate.
* generated/count_8_l.c: Regenerate.
* generated/all_l4.c: Regenerate.
* generated/count_1_l.c: Regenerate.
* generated/count_16_l.c: Regenerate.
* generated/any_l8.c: Regenerate.
* generated/count_2_l.c: Regenerate.
* generated/any_l1.c: Regenerate.
* generated/all_l8.c: Regenerate.


I have a vague recollection that there were issues with system headers
on non-glibc targets. It would be nice if -Werror were used by default;
I think we've had a few cases where bugs have slipped past because it isn't.

-- 
Janne Blomqvist


Re: [PATCHv4] Vimrc config with GNU formatting

2014-09-18 Thread Yury Gribov

On 09/18/2014 07:52 AM, Segher Boessenkool wrote:
 +# Local Vim config
 +
 +vimrc:
 +	(cd $(srcdir); $(LN_S) contrib/vimrc .local.vimrc; $(LN_S) contrib/vimrc .lvimrc)
 +

 This is another target than what the doc (in the script itself) mentions.

Right, I forgot to fix it before sending the patch.
Too much experimenting in the evening...

 It is not marked as phony.

Noted.

 It should not _be_ phony; the two files should
 be separate targets.

I did that initially, but it may look weird to the user.
When typing 'make .local.vimrc' in the GCC build directory one would expect
.local.vimrc to be created at the root of the build directory, not in srcdir.

 Why make links instead of copies?  A user will likely
 want to edit his config.

I see your point. On the other hand, a fix for a bug in contrib/vimrc
would not be propagated to .local.vimrc, which looks like a major disadvantage
(to me at least).

 The way you use ; is wrong (it continues if there
 is an error).

Agreed. Current Makefiles do use ; inside backticks, and that led me astray.

 You don't need the cd anyway, come to that.

Noted.

 It's pretty silly to have a makefile target that only copies a file (that
 is never used by the makefile itself); just tell in the doc where to copy
 the file.

I personally prefer a Makefile target to simplify things.
But let's wait for other people's opinions on this.

 --- /dev/null
 +++ b/contrib/vimrc
 @@ -0,0 +1,45 @@
 + Code formatting settings for Vim.
 +
 + To enable this for GCC files by default, install thinca's vim-localrc
 + plugin and do
 +   $ make .local.vimrc

 No, we should *not* advertise an "enough rope" solution without
 mentioning it *will* kill you.

How about adding a disclaimer? E.g. "beware that Vim plugins are a
GAPING SECURITY HOLE, so use them at YOUR OWN RISK". (And note that
Braun's plugin does use sandboxes.)


 Or not mention it at all.  Esp. since your next option
 has all the same functionality and more.

It lacks very important functionality: the user has to specify the path
to a concrete GCC source tree when adding the autocmd.
I have a dozen trees on my box and I regularly rename, move or copy them.
With plugins one doesn't have to bother fixing paths in ~/.vimrc,
which is important for productivity.

 + Or if you dislike plugins, add autocmd in your ~/.vimrc:
  +   :au BufNewFile,BufReadPost path/to/gcc/* :so path/to/gcc/contrib/vimrc


 There are many more reasons than just dislike of plugins to prefer
 something like this.  For one thing, many Vim users will have many
 similar statements in their config _already_.

So if you don't want to use plugins?

 + Or just source file manually every time if you are masochist:
 +   :so path/to/gcc/contrib/vimrc

 How is that masochist?  Typing that cino by hand though, now that would
 qualify ;-)

Note that the user has to type the source command for every newly opened
file. This indeed looks inconvenient (to me, again).

 Just keep things neutral please.

Trying to salt the boring docs a bit to attract the reader's attention ;)

  +setlocal cindent
  +setlocal shiftwidth=2
  +setlocal softtabstop=2
  +setlocal cinoptions=>2s,n-s,{s,^-s,:s,=s,g0,f0,hs,p2s,t0,+s,(0,u0,w1,m0


  If you write this as absolute numbers instead of as shift widths, you do not
  need to force sw and sts settings down people's throat.  It might also be
  easier to read?  Well I doubt that, but it will be slightly shorter at least.


IMHO matching shiftwidth with GNU indent may be useful.
E.g. Vim won't reindent when you start editing an empty line, and the user
will have to insert indents manually.

Also replacing offsets with numbers hides the fact
that they are based on GNU shiftwidth.

 +setlocal textwidth=79

 The coding conventions say maximum line length is 80.

From https://www.gnu.org/prep/standards/html_node/Formatting.html :
"Please keep the length of source lines to 79 characters or less, for
maximum readability in the widest range of environments."


 'tw' is a user preference as well

The config just follows the GNU coding standard. We do occasionally
violate textwidth in our code, though; that's why I do formatoptions+=l below.


 +setlocal formatoptions-=ro formatoptions+=cql

  Yet another user preference.  Also mostly the default, except l -- which
  won't do anything if tw=0 as it should be.  And you do not enable t (also
  on by default), so you do not want to wrap text anyway?  Confused now.

Me too; the original config author did it that way. IMHO +t makes
sense here.


-Y


Re: [PR libfortran/62768] Handle filenames with embedded nulls

2014-09-18 Thread N.M. Maclaren

On Sep 18 2014, Janne Blomqvist wrote:



Apparently libgfortran is not compiled with -Werror, at least
not for crosses.  Maybe -Werror is there for native builds, but I'm not
sure, as I see some "warning: array subscript has type 'char'
[-Wchar-subscripts]" which seems generic, and also some others.
Though no more than can be fixed or excepted, IMHO.


No, -Werror isn't used. It was tried, but apparently caused issues.
From the changelog:

2008-06-13  Tobias Burnus  bur...@net-b.de

   * configure.ac (AM_CFLAGS): Remove -Werror again.

I have a vague recollection that there were issues with system headers
on non-glibc targets. It would be nice if -Werror were used by default;
I think we've had a few cases where bugs have slipped past because it isn't.


I wasn't involved, but that sounds more than just likely!  I have had
that experience with several options, including -Werror, -pedantic and
specific standards ones.  My experience is that most vendors clean up
at least the standard C headers with time, and usually the more basic
POSIX ones, but any others often remain beyond redemption.

And what is not going to help is the ongoing incompatibilities
in de jure and de facto standards.  I have certainly seen standard
headers that would compile only with specific language selection
options.  Oh, yes, their COMPILER supported other ones - you just
couldn't use some important system headers with them :-(

If I get time, I will look at the libfortran header use and see if I
can make any useful specific comments.


Regards,
Nick Maclaren.



Re: [PATCH AArch64]: Add constraint letter for stack_protect_test pattern)

2014-09-18 Thread Jakub Jelinek
On Thu, Sep 18, 2014 at 09:18:53AM +0100, Marcus Shawcroft wrote:
  gcc/testsuite/
 
  2014-09-15  James Greenhalgh  james.greenha...@arm.com
 
  * gcc.target/aarch64/stack_protector_set_1.c: New.
  * gcc.target/aarch64/stack_protector_set_2.c: Likewise.
 
 I agree with Andrew, these don't need to be aarch64 specific.

Well, I guess the 16 needs to be replaced with sizeof buffer, because
sizeof (unsigned int) is not 4 on all architectures.
And
/* { dg-require-effective-target fstack_protector } */
is needed, as not all targets support -fstack-protector*.
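To make that concrete, a minimal sketch of both adjustments (buffer and
resbuf are the names used in the test later in this thread; everything else
is elided):

    /* { dg-require-effective-target fstack_protector } */
    ...
    uint32_t buffer[4];
    __builtin_memcpy (resbuf, buffer, sizeof (buffer));  /* rather than a literal 16 */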

Jakub


Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-09-18 Thread Alan Lawrence

Thanks for the reply - and the in-depth investigation. I agree that the
correctness of the compiler is critical rather than particular platforms such as
Ada / Alpha.

Moreover, I think we both agree that if result_mode==shift_mode, the
transformation is correct. Just putting that check in, achieves what I'm
trying for here, so I'd be happy to go with the attached patch and call it a
day. However, I'm a little concerned about the other cases - i.e. where
shift_mode is wider than result_mode.

If I understand correctly (and I'm not sure about that, let's see how far I
get), this means we'll perform the shift in (say) DImode, when we're only really
concerned about the lower (say) 32-bits (for an originally-SImode shift).
try_widen_shift_mode will in this case check that the result of the operation
*inside* the shift (in our case an XOR) has 33 sign bit copies (via
num_sign_bit_copies), i.e. that its *top* 32-bits are all equal to the original
SImode sign bit. count of these bits may be fed into the top of the desired
SImode result by the DImode shift. Right so far?

AFAICT, num_sign_bit_copies for an XOR conservatively returns the minimum of
the num_sign_bit_copies of its two operands. I'm not sure whether this is
behaviour we should rely on in its callers, or for the sake of abstraction we
should treat num_sign_bit_copies as a black box (which does what it says on the,
erm, tin).

If the former, doesn't having num_sign_bit_copies = the difference in size
between result_mode and shift_mode, of both operands to the XOR, guarantee
safety of the commutation (whether the constant is positive or negative)? We
could perform the shift (in the larger mode) on both of the XOR operands safely,
then XOR together their lower parts.

If, however, we want to play safe and ensure that we deal safely with any XOR
whose top (mode size difference + 1) bits were the same, then I think the
restriction that the XOR constant is positive is neither necessary nor
sufficient; rather (mirroring try_widen_shift_mode) I think we need that
num_sign_bit_copies of the constant in shift_mode, is more than the size
difference between result_mode and shift_mode.

Hmmm. I might try that patch at some point, I think it is the right check to
make. (Meta-comment: this would be *so* much easier if we could write unit tests
more easily!) In the meantime I'd be happy to settle for the attached...

(tests are as they were posted
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01233.html .)

--Alan

Jeff Law wrote:

On 07/17/14 10:56, Alan Lawrence wrote:

Ok, the attached tests are passing on x86_64-none-linux-gnu,
aarch64-none-elf, arm-none-eabi, and a bunch of smaller platforms for
which I've only built a stage 1 compiler (i.e. as far as necessary to
assemble). That's with either change to simplify_shift_const_1.

As to the addition of result_mode != shift_mode, or removing the whole
check against XOR - I've now tested the latter:

bootstrapped on x86_64-none-linux-gnu, check-gcc and check-ada;
bootstrapped on arm-none-linux-gnueabihf;
bootstrapped on aarch64-none-linux-gnu;
cross-tested check-gcc on aarch64-none-elf;
cross-tested on arm-none-eabi;
(and Uros has bootstrapped on alpha and done a suite of tests, as per
https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg01236.html).

 From a perspective of paranoia, I'd lean towards adding result_mode !=
shift_mode, but for neatness removing the whole check against XOR is
nicer. So I'd defer to the maintainers as to whether one might be
preferable to the other...(but my unproven suspicion is that the two are
equivalent, and no case where result_mode != shift_mode is possible!)
So first, whether or not someone cares about Alpha-VMS isn't the issue 
at hand.  It's whether or not the new code is correct or not. 
Similarly the fact that the code generation paths are radically 
different now when compared to 2004 and we can't currently trigger the 
cases where the modes are different isn't the issue, again, it's whether 
or not your code is correct or not.


I think the key is to look at try_widen_shift_mode and realize that it 
can return a mode larger than the original mode of the operations. 
However, it only does so when it presented with a case where it knows 
the sign bit being shifted in from the left will be the same as the sign 
bit in the original mode.


In the case of an XOR when the sign bit set in shift_mode, that's not 
going to be the case.  We would violate the assumption made when we 
decided to widen the shift to shift_mode.


So your relaxation is safe when shift_mode == result_mode, but unsafe 
otherwise -- even though we don't currently have a testcase for the 
shift_mode != result_mode case, we don't want to break that.


So your updated patch is correct.  However, I would ask that you make 
one additional change.  Namely the comment before the two fragments of 
code you changed needs updating.  Something like


... and the constant has its sign bit set in shift_mode and shift_mode
  

[AArch64] Auto-generate the BUILTIN_ macros for aarch64-builtins.c

2014-09-18 Thread James Greenhalgh

Hi,

A possible source of errors is in keeping the iterators.md file and the
iterator macros in aarch64-builtins.c synchronized.

Clearly this shouldn't be a problem given standard unix tools, it is just a
text processing job.

This patch adds geniterators.sh to the AArch64 backend, which takes the
iterators.md file and generates aarch64-builtin-iterators.h; this replaces
the definitions in aarch64-builtins.c, which now just includes this file.
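For illustration (the argument triple below is hypothetical; the expansion
follows mechanically from the generated header): a builtin declared as

    BUILTIN_VDQF (UNOP, sqrt, 2)

now expands to

    VAR3 (UNOP, sqrt, 2, v2sf, v4sf, v2df)

so adding a mode to an iterator in iterators.md is automatically reflected
in every builtin that uses the corresponding BUILTIN_ macro.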

Bootstrapped for aarch64-none-linux-gnueabi, and regression tested for
aarch64-none-elf with no issues.

OK?

Thanks,
James

---
gcc/

2014-09-18  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64-builtin-iterators.h: New.
* config/aarch64/geniterators.sh: New.
* config/aarch64/iterators.md (VDQF_DF): New.
* config/aarch64/t-aarch64: Add dependencies on new build script.
* config/aarch64/aarch64-builtins.c (BUILTIN_*) Remove.
diff --git a/gcc/config/aarch64/aarch64-builtin-iterators.h b/gcc/config/aarch64/aarch64-builtin-iterators.h
new file mode 100644
index 000..ae579e8
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-builtin-iterators.h
@@ -0,0 +1,113 @@
+/* -*- buffer-read-only: t -*- */
+/* Generated automatically by geniterators.sh from iterators.md.  */
+#ifndef GCC_AARCH64_ITERATORS_H
+#define GCC_AARCH64_ITERATORS_H
+#define BUILTIN_GPI(T, N, MAP) \
+  VAR2 (T, N, MAP, si, di)
+#define BUILTIN_SHORT(T, N, MAP) \
+  VAR2 (T, N, MAP, qi, hi)
+#define BUILTIN_ALLI(T, N, MAP) \
+  VAR4 (T, N, MAP, qi, hi, si, di)
+#define BUILTIN_SDQ_I(T, N, MAP) \
+  VAR4 (T, N, MAP, qi, hi, si, di)
+#define BUILTIN_ALLX(T, N, MAP) \
+  VAR3 (T, N, MAP, qi, hi, si)
+#define BUILTIN_GPF(T, N, MAP) \
+  VAR2 (T, N, MAP, sf, df)
+#define BUILTIN_VDQ(T, N, MAP) \
+  VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
+#define BUILTIN_VDQ_I(T, N, MAP) \
+  VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
+#define BUILTIN_VSDQ_I(T, N, MAP) \
+  VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, qi, hi, si, di)
+#define BUILTIN_VSDQ_I_DI(T, N, MAP) \
+  VAR8 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, di)
+#define BUILTIN_VD(T, N, MAP) \
+  VAR4 (T, N, MAP, v8qi, v4hi, v2si, v2sf)
+#define BUILTIN_VD_BHSI(T, N, MAP) \
+  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
+#define BUILTIN_VDQ_BHSI(T, N, MAP) \
+  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
+#define BUILTIN_VQ(T, N, MAP) \
+  VAR6 (T, N, MAP, v16qi, v8hi, v4si, v2di, v4sf, v2df)
+#define BUILTIN_VQ_NO2E(T, N, MAP) \
+  VAR4 (T, N, MAP, v16qi, v8hi, v4si, v4sf)
+#define BUILTIN_VQ_2E(T, N, MAP) \
+  VAR2 (T, N, MAP, v2di, v2df)
+#define BUILTIN_VQ_S(T, N, MAP) \
+  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
+#define BUILTIN_VSDQ_I_BHSI(T, N, MAP) \
+  VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, qi, hi, si)
+#define BUILTIN_VDQM(T, N, MAP) \
+  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
+#define BUILTIN_VDQF(T, N, MAP) \
+  VAR3 (T, N, MAP, v2sf, v4sf, v2df)
+#define BUILTIN_VDQF_DF(T, N, MAP) \
+  VAR4 (T, N, MAP, v2sf, v4sf, v2df, df)
+#define BUILTIN_VDQSF(T, N, MAP) \
+  VAR2 (T, N, MAP, v2sf, v4sf)
+#define BUILTIN_VDQF_COND(T, N, MAP) \
+  VAR6 (T, N, MAP, v2sf, v2si, v4sf, v4si, v2df, v2di)
+#define BUILTIN_VALLF(T, N, MAP) \
+  VAR5 (T, N, MAP, v2sf, v4sf, v2df, sf, df)
+#define BUILTIN_V2F(T, N, MAP) \
+  VAR2 (T, N, MAP, v2sf, v2df)
+#define BUILTIN_VALL(T, N, MAP) \
+  VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, v2df)
+#define BUILTIN_VALLDI(T, N, MAP) \
+  VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, v2df, di)
+#define BUILTIN_VALLDIF(T, N, MAP) \
+  VAR12 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, v2df, di, df)
+#define BUILTIN_VDQV(T, N, MAP) \
+  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si, v2di)
+#define BUILTIN_VDQV_S(T, N, MAP) \
+  VAR5 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si)
+#define BUILTIN_VDN(T, N, MAP) \
+  VAR3 (T, N, MAP, v4hi, v2si, di)
+#define BUILTIN_VQN(T, N, MAP) \
+  VAR3 (T, N, MAP, v8hi, v4si, v2di)
+#define BUILTIN_VDW(T, N, MAP) \
+  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
+#define BUILTIN_VSQN_HSDI(T, N, MAP) \
+  VAR6 (T, N, MAP, v8hi, v4si, v2di, hi, si, di)
+#define BUILTIN_VQW(T, N, MAP) \
+  VAR3 (T, N, MAP, v16qi, v8hi, v4si)
+#define BUILTIN_VDC(T, N, MAP) \
+  VAR6 (T, N, MAP, v8qi, v4hi, v2si, v2sf, di, df)
+#define BUILTIN_VDIC(T, N, MAP) \
+  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
+#define BUILTIN_VD1(T, N, MAP) \
+  VAR5 (T, N, MAP, v8qi, v4hi, v2si, v2sf, v1df)
+#define BUILTIN_VDQIF(T, N, MAP) \
+  VAR9 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2sf, v4sf, v2df)
+#define BUILTIN_VDQQH(T, N, MAP) \
+  VAR4 (T, N, MAP, v8qi, v16qi, v4hi, v8hi)
+#define BUILTIN_VDQHS(T, N, MAP) \
+  VAR4 (T, N, MAP, v4hi, v8hi, v2si, v4si)
+#define BUILTIN_VDQHSD(T, N, MAP) \
+  VAR5 (T, N, MAP, v4hi, v8hi, v2si, v4si, v2di)
+#define BUILTIN_VDQQHS(T, N, MAP) \
+  VAR6 (T, 

[PATCH][ARM] Fix insn type of movmisalign neon load pattern

2014-09-18 Thread Kyrill Tkachov

Hi all,

While browsing the code I noticed that the pattern in the patch has a 
store type when it is really a vld1 operation. Looking at the patterns 
around it, I think it was just a copy-pasto.


The patch corrects that.

Tested arm-none-eabi.

Ok for trunk?

2014-09-18  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/arm/neon.md (*movmisalign<mode>_neon_load): Change type
to neon_load1_1regq.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 354a105..69b7cfa 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -296,7 +296,7 @@ (define_insn "*movmisalign<mode>_neon_load"
 		UNSPEC_MISALIGNED_ACCESS))]
   "TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access"
   "vld1.<V_sz_elem>\t{%q0}, %A1"
-  [(set_attr "type" "neon_store1_1regq")])
+  [(set_attr "type" "neon_load1_1regq")])
 
 (define_insn vec_setmode_internal
   [(set (match_operand:VD 0 s_register_operand =w,w)

Re: [PATCH][ARM] Fix insn type of movmisalign neon load pattern

2014-09-18 Thread Richard Earnshaw
On 18/09/14 11:01, Kyrill Tkachov wrote:
 Hi all,
 
 While browsing the code I noticed that the pattern in the patch has a 
 store type when it is really a vld1 operation. Looking at the patterns 
 around it, I think it was just a copy-pasto.
 
 The patch corrects that.
 
 Tested arm-none-eabi.
 
 Ok for trunk?
 
 2014-09-18  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
  * config/arm/neon.md (*movmisalignmode_neon_load): Change type
  to neon_load1_1regq.
 
 
 arm-movmisalign-type.patch
 
 
 diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
 index 354a105..69b7cfa 100644
 --- a/gcc/config/arm/neon.md
 +++ b/gcc/config/arm/neon.md
 @@ -296,7 +296,7 @@ (define_insn *movmisalignmode_neon_load
   UNSPEC_MISALIGNED_ACCESS))]
TARGET_NEON  !BYTES_BIG_ENDIAN  unaligned_access
vld1.V_sz_elem\t{%q0}, %A1
 -  [(set_attr type neon_store1_1regq)])
 +  [(set_attr type neon_load1_1regq)])
  
  (define_insn vec_setmode_internal
[(set (match_operand:VD 0 s_register_operand =w,w)
 

OK.

R.




[PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-18 Thread Richard Sandiford
This series is a cleaned-up version of:

https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html

The underlying problem is that the semantics of subregs depend on the
word size.  You can't have a subreg for byte 2 of a 4-byte word, say,
but you can have a subreg for word 2 of a 4-word value (as well as lowpart
subregs of that word, etc.).  This causes problems when an architecture has
wider-than-word registers, since the addressability of a word can then depend
on which register class is used.

The register allocators need to fix up cases where a subreg turns out to
be invalid for a particular class.  This is really an extension of what
we need to do for CANNOT_CHANGE_MODE_CLASS.

Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.

Thanks,
Richard



[PATCH 1/5] Allow *_HARD_REG_SET arguments to be const

2014-09-18 Thread Richard Sandiford
Patch 4 needs to pass a const HARD_REG_SET to AND/COPY_HARD_REG_SET.
This patch allows that for all intent-in arguments.
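As a rough sketch of the kind of caller this enables (the helper and its
names are hypothetical; only the macro behaviour comes from the patch):

    /* Remove from *SET everything not in ALLOWED.  Before this change the
       macros assigned FROM to a non-const HARD_REG_ELT_TYPE *, so a const
       ALLOWED argument did not compile.  */
    static void
    restrict_to_allowed (HARD_REG_SET *set, const HARD_REG_SET allowed)
    {
      AND_HARD_REG_SET (*set, allowed);
    }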


gcc/
* hard-reg-set.h (COPY_HARD_REG_SET, COMPL_HARD_REG_SET)
(AND_HARD_REG_SET, AND_COMPL_HARD_REG_SET, IOR_HARD_REG_SET)
(IOR_COMPL_HARD_REG_SET): Allow the from set to be constant.


Index: gcc/hard-reg-set.h
===
--- gcc/hard-reg-set.h  2014-09-15 10:00:12.133398136 +0100
+++ gcc/hard-reg-set.h  2014-09-15 10:00:12.129398185 +0100
@@ -168,32 +168,38 @@ do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);
  scan_tp_[1] = -1; } while (0)
 
 #define COPY_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] = scan_fp_[0];\
  scan_tp_[1] = scan_fp_[1]; } while (0)
 
 #define COMPL_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] = ~ scan_fp_[0];  \
  scan_tp_[1] = ~ scan_fp_[1]; } while (0)
 
 #define AND_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] = scan_fp_[0];   \
  scan_tp_[1] = scan_fp_[1]; } while (0)
 
 #define AND_COMPL_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] = ~ scan_fp_[0]; \
  scan_tp_[1] = ~ scan_fp_[1]; } while (0)
 
 #define IOR_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] |= scan_fp_[0];   \
  scan_tp_[1] |= scan_fp_[1]; } while (0)
 
 #define IOR_COMPL_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] |= ~ scan_fp_[0]; \
  scan_tp_[1] |= ~ scan_fp_[1]; } while (0)
 
@@ -236,37 +242,43 @@ do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);
  scan_tp_[2] = -1; } while (0)
 
 #define COPY_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] = scan_fp_[0];\
  scan_tp_[1] = scan_fp_[1];\
  scan_tp_[2] = scan_fp_[2]; } while (0)
 
 #define COMPL_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] = ~ scan_fp_[0];  \
  scan_tp_[1] = ~ scan_fp_[1];  \
  scan_tp_[2] = ~ scan_fp_[2]; } while (0)
 
 #define AND_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] = scan_fp_[0];   \
  scan_tp_[1] = scan_fp_[1];   \
  scan_tp_[2] = scan_fp_[2]; } while (0)
 
 #define AND_COMPL_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] = ~ scan_fp_[0]; \
  scan_tp_[1] = ~ scan_fp_[1]; \
  scan_tp_[2] = ~ scan_fp_[2]; } while (0)
 
 #define IOR_HARD_REG_SET(TO, FROM)  \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM);   \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);   \
+ const HARD_REG_ELT_TYPE *scan_fp_ = (FROM);   \
  scan_tp_[0] |= scan_fp_[0];   \
  scan_tp_[1] |= scan_fp_[1];   \
  scan_tp_[2] |= scan_fp_[2]; } while (0)
 
 #define 

Re: Fix i386 FP_TRAPPING_EXCEPTIONS

2014-09-18 Thread Uros Bizjak
On Wed, Sep 17, 2014 at 9:47 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 The i386 sfp-machine.h defines FP_TRAPPING_EXCEPTIONS in a way that is
 always wrong: it treats a set bit as indicating the exception is
 trapping, when actually a set bit (both for 387 and SSE floating
 point) indicates it is masked, and a clear bit indicates it is
 trapping.  This patch fixes this bug.
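To sketch the semantics being fixed (illustrative only, not the exact
libgcc macro): both the 387 and SSE control words use a set bit to mask an
exception, so the set of trapping exceptions is the complement of the mask
bits, roughly:

    /* Hypothetical helper: mask_bits is the exception-mask field of the
       control word, all_exceptions the bits covering the IEEE exceptions.  */
    static unsigned int
    trapping_exceptions (unsigned int mask_bits, unsigned int all_exceptions)
    {
      return ~mask_bits & all_exceptions;   /* clear mask bit => trapping  */
    }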

 Bootstrapped with no regressions on x86_64-unknown-linux-gnu.  OK to
 commit?

 Note to ia64 maintainers: it would be a good idea to add a definition
 of FP_TRAPPING_EXCEPTIONS for ia64, and I expect the new test to fail
 on ia64 until you do so.

 libgcc:
 2014-09-17  Joseph Myers  jos...@codesourcery.com

 * config/i386/sfp-machine.h (FP_TRAPPING_EXCEPTIONS): Treat clear
 bits not set bits as indicating trapping exceptions.

 gcc/testsuite:
 2014-09-17  Joseph Myers  jos...@codesourcery.com

 * gcc.dg/torture/float128-exact-underflow.c: New test.

My brown paperbag bug :(

OK for mainline and release branches.

Thanks,
Uros.


[PATCH 2/5] Tweak subreg_get_info documentation

2014-09-18 Thread Richard Sandiford
Try to clarify what subreg_get_info does and doesn't check.


gcc/
* rtl.h (subreg_info): Expand commentary
* rtlanal.c (subreg_get_info): Likewise.

Index: gcc/rtl.h
===
--- gcc/rtl.h   2014-09-15 10:00:14.693366097 +0100
+++ gcc/rtl.h   2014-09-15 10:00:14.689366147 +0100
@@ -2866,10 +2866,13 @@ struct subreg_info
 {
   /* Offset of first hard register involved in the subreg.  */
   int offset;
-  /* Number of hard registers involved in the subreg.  */
+  /* Number of hard registers involved in the subreg.  In the case of
+ a paradoxical subreg, this is the number of registers that would
+ be modified by writing to the subreg; some of them may be don't-care
+ when reading from the subreg.  */
   int nregs;
   /* Whether this subreg can be represented as a hard reg with the new
- mode.  */
+ mode (by adding OFFSET to the original hard register).  */
   bool representable_p;
 };
 
Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c   2014-09-15 10:00:14.693366097 +0100
+++ gcc/rtlanal.c   2014-09-15 10:00:14.689366147 +0100
@@ -3411,7 +3411,20 @@ subreg_lsb (const_rtx x)
xmode  - The mode of xregno.
offset - The byte offset.
ymode  - The mode of a top level SUBREG (or what may become one).
-   info   - Pointer to structure to fill in.  */
+   info   - Pointer to structure to fill in.
+
+   Rather than considering one particular inner register (and thus one
+   particular outer register) in isolation, this function really uses
+   XREGNO as a model for a sequence of isomorphic hard registers.  Thus the
+   function does not check whether adding INFO-offset to XREGNO gives
+   a valid hard register; even if INFO-offset + XREGNO is out of range,
+   there might be another register of the same type that is in range.
+   Likewise it doesn't check whether HARD_REGNO_MODE_OK accepts the new
+   register, since that can depend on things like whether the final
+   register number is even or odd.  Callers that want to check whether
+   this particular subreg can be replaced by a simple (reg ...) should
+   use simplify_subreg_regno.  */
+
 void
 subreg_get_info (unsigned int xregno, enum machine_mode xmode,
 unsigned int offset, enum machine_mode ymode,



Re: [AArch64] Auto-generate the BUILTIN_ macros for aarch64-builtins.c

2014-09-18 Thread Richard Earnshaw
On 18/09/14 10:53, James Greenhalgh wrote:
 
 Hi,
 
 A possible source of errors is in keeping the iterators.md file and the
 iterator macros in aarch64-builtin.c synchronized.
 
 Clearly this shouldn't be a problem given standard unix tools, it is just a
 text processing job.
 
 This patch adds geniterators.sh to the AArch64 backend which takes the
 iterators.md file and generates aarch64-builtin-iterators.h, this replaces
 the definitions from aarch64-builtins.c, which now just include this file.
 
 Bootstrapped for aarch64-none-linux-gnueabi, and regression tested for
 aarch64-none-elf with no issues.
 
 OK?
 
 Thanks,
 James
 
 ---
 gcc/
 
 2014-09-18  James Greenhalgh  james.greenha...@arm.com
 
   * config/aarch64/aarch64-builtin-iterators.h: New.
   * config/aarch64/geniterators.sh: New.
   * config/aarch64/iterators.md (VDQF_DF): New.
   * config/aarch64/t-aarch64: Add dependencies on new build script.
   * config/aarch64/aarch64-builtins.c (BUILTIN_*) Remove.
 
 
 0001-AArch64-Auto-generate-the-BUILTIN_-macros-for-aarch6.patch
 
 
 diff --git a/gcc/config/aarch64/aarch64-builtin-iterators.h 
 b/gcc/config/aarch64/aarch64-builtin-iterators.h
 new file mode 100644
 index 000..ae579e8
 --- /dev/null
 +++ b/gcc/config/aarch64/aarch64-builtin-iterators.h
 @@ -0,0 +1,113 @@
 +/* -*- buffer-read-only: t -*- */
 +/* Generated automatically by geniterators.sh from iterators.md.  */
 +#ifndef GCC_AARCH64_ITERATORS_H
 +#define GCC_AARCH64_ITERATORS_H
 +#define BUILTIN_GPI(T, N, MAP) \
 +  VAR2 (T, N, MAP, si, di)
 +#define BUILTIN_SHORT(T, N, MAP) \
 +  VAR2 (T, N, MAP, qi, hi)
 +#define BUILTIN_ALLI(T, N, MAP) \
 +  VAR4 (T, N, MAP, qi, hi, si, di)
 +#define BUILTIN_SDQ_I(T, N, MAP) \
 +  VAR4 (T, N, MAP, qi, hi, si, di)
 +#define BUILTIN_ALLX(T, N, MAP) \
 +  VAR3 (T, N, MAP, qi, hi, si)
 +#define BUILTIN_GPF(T, N, MAP) \
 +  VAR2 (T, N, MAP, sf, df)
 +#define BUILTIN_VDQ(T, N, MAP) \
 +  VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
 +#define BUILTIN_VDQ_I(T, N, MAP) \
 +  VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
 +#define BUILTIN_VSDQ_I(T, N, MAP) \
 +  VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, qi, hi, si, 
 di)
 +#define BUILTIN_VSDQ_I_DI(T, N, MAP) \
 +  VAR8 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, di)
 +#define BUILTIN_VD(T, N, MAP) \
 +  VAR4 (T, N, MAP, v8qi, v4hi, v2si, v2sf)
 +#define BUILTIN_VD_BHSI(T, N, MAP) \
 +  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
 +#define BUILTIN_VDQ_BHSI(T, N, MAP) \
 +  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
 +#define BUILTIN_VQ(T, N, MAP) \
 +  VAR6 (T, N, MAP, v16qi, v8hi, v4si, v2di, v4sf, v2df)
 +#define BUILTIN_VQ_NO2E(T, N, MAP) \
 +  VAR4 (T, N, MAP, v16qi, v8hi, v4si, v4sf)
 +#define BUILTIN_VQ_2E(T, N, MAP) \
 +  VAR2 (T, N, MAP, v2di, v2df)
 +#define BUILTIN_VQ_S(T, N, MAP) \
 +  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
 +#define BUILTIN_VSDQ_I_BHSI(T, N, MAP) \
 +  VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, qi, hi, si)
 +#define BUILTIN_VDQM(T, N, MAP) \
 +  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si)
 +#define BUILTIN_VDQF(T, N, MAP) \
 +  VAR3 (T, N, MAP, v2sf, v4sf, v2df)
 +#define BUILTIN_VDQF_DF(T, N, MAP) \
 +  VAR4 (T, N, MAP, v2sf, v4sf, v2df, df)
 +#define BUILTIN_VDQSF(T, N, MAP) \
 +  VAR2 (T, N, MAP, v2sf, v4sf)
 +#define BUILTIN_VDQF_COND(T, N, MAP) \
 +  VAR6 (T, N, MAP, v2sf, v2si, v4sf, v4si, v2df, v2di)
 +#define BUILTIN_VALLF(T, N, MAP) \
 +  VAR5 (T, N, MAP, v2sf, v4sf, v2df, sf, df)
 +#define BUILTIN_V2F(T, N, MAP) \
 +  VAR2 (T, N, MAP, v2sf, v2df)
 +#define BUILTIN_VALL(T, N, MAP) \
 +  VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, 
 v2df)
 +#define BUILTIN_VALLDI(T, N, MAP) \
 +  VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, 
 v2df, di)
 +#define BUILTIN_VALLDIF(T, N, MAP) \
 +  VAR12 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, 
 v2df, di, df)
 +#define BUILTIN_VDQV(T, N, MAP) \
 +  VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si, v2di)
 +#define BUILTIN_VDQV_S(T, N, MAP) \
 +  VAR5 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si)
 +#define BUILTIN_VDN(T, N, MAP) \
 +  VAR3 (T, N, MAP, v4hi, v2si, di)
 +#define BUILTIN_VQN(T, N, MAP) \
 +  VAR3 (T, N, MAP, v8hi, v4si, v2di)
 +#define BUILTIN_VDW(T, N, MAP) \
 +  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
 +#define BUILTIN_VSQN_HSDI(T, N, MAP) \
 +  VAR6 (T, N, MAP, v8hi, v4si, v2di, hi, si, di)
 +#define BUILTIN_VQW(T, N, MAP) \
 +  VAR3 (T, N, MAP, v16qi, v8hi, v4si)
 +#define BUILTIN_VDC(T, N, MAP) \
 +  VAR6 (T, N, MAP, v8qi, v4hi, v2si, v2sf, di, df)
 +#define BUILTIN_VDIC(T, N, MAP) \
 +  VAR3 (T, N, MAP, v8qi, v4hi, v2si)
 +#define BUILTIN_VD1(T, N, MAP) \
 +  VAR5 (T, N, MAP, v8qi, v4hi, v2si, v2sf, v1df)
 +#define BUILTIN_VDQIF(T, N, MAP) \
 +  VAR9 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2sf, v4sf, v2df)
 +#define BUILTIN_VDQQH(T, N, MAP) \
 +  VAR4 (T, N, MAP, 

[PATCH 3/5] Use simplify_subreg_regno in combine.c:subst

2014-09-18 Thread Richard Sandiford
combine.c:subst should refuse to substitute a hard register
into a subreg if the new subreg would not be simplified to a
simple hard register, since the result would have to be reloaded.
This is more for optimisation than correctness, since in theory
the RA should be able to fix up any unsimplified subregs.
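For reference, the interface being relied on (the modes and register below
are hypothetical; the -1 convention is what the hunk below tests with '< 0'):

    /* Would (subreg:SI (reg:DI hard_regno) 4) simplify to a plain hard reg?  */
    int new_regno = simplify_subreg_regno (hard_regno, DImode, 4, SImode);
    if (new_regno < 0)
      /* No: substituting here would force a reload, so punt.  */;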


gcc/
* combine.c (subst): Use simplify_subreg_regno rather than
REG_CANNOT_CHANGE_MODE_P to detect invalid mode changes.

Index: gcc/combine.c
===
--- gcc/combine.c   2014-09-15 10:00:17.545330404 +0100
+++ gcc/combine.c   2014-09-15 10:00:17.545330404 +0100
@@ -5121,15 +5121,13 @@ #define COMBINE_RTX_EQUAL_P(X,Y)
\
  )
return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
 
-#ifdef CANNOT_CHANGE_MODE_CLASS
   if (code == SUBREG
       && REG_P (to)
       && REGNO (to) < FIRST_PSEUDO_REGISTER
-      && REG_CANNOT_CHANGE_MODE_P (REGNO (to),
-				   GET_MODE (to),
-				   GET_MODE (x)))
+      && simplify_subreg_regno (REGNO (to), GET_MODE (to),
+				SUBREG_BYTE (x),
+				GET_MODE (x)) < 0)
 	return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
-#endif
 
  new_rtx = (unique_copy  n_occurrences ? copy_rtx (to) : to);
  n_occurrences++;



Re: [PATCH AArch64]: Add constraint letter for stack_protect_test pattern)

2014-09-18 Thread James Greenhalgh

On Wed, Sep 17, 2014 at 03:50:55PM +0100, pins...@gmail.com wrote:


  On Sep 17, 2014, at 7:43 AM, James Greenhalgh james.greenha...@arm.com 
  wrote:
 
 
  On Wed, Sep 17, 2014 at 09:30:31AM +0100, Richard Earnshaw wrote:
   "=&r" is correct for an early-clobbered scratch.
 
  R.
 
  In that case...
 
  How is the attached patch for trunk? I've bootstrapped it on AArch64
  with -fstack-protector-strong and -frename-registers in the BOOT_CFLAGS
  without seeing any issues.

 There is nothing aarch64 specific about this testcase so I would place them
 under gcc.dg and add the extra marker which says this testcase requires stack
 protector.

That sounds reasonable to me. Updated as attached, along with Jakub's
suggestions.

   And maybe even use compile instead of just assemble too.

Compile is weaker than assemble. Assemble takes you up to an object file,
which is as far as we need to go.

Thanks,
James

---

gcc/

2014-09-18  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64.md (stack_protect_test_<mode>): Mark
scratch register as an output to placate register renaming.

gcc/testsuite/

2014-09-18  James Greenhalgh  james.greenha...@arm.com

* gcc.dg/ssp-3.c: New.
* gcc.dg/ssp-4.c: Likewise.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c60038a9015d614f40f6d9e3fd228ad3e2b247a8..f15a516bb0559c86bea7512f91d60dc179ec9149 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4031,7 +4031,7 @@ (define_insn "stack_protect_test_<mode>"
 	(unspec:PTR [(match_operand:PTR 1 "memory_operand" "m")
 		     (match_operand:PTR 2 "memory_operand" "m")]
 	 UNSPEC_SP_TEST))
-   (clobber (match_scratch:PTR 3 "&r"))]
+   (clobber (match_scratch:PTR 3 "=&r"))]
   ""
   "ldr\t%w3, %x1\;ldr\t%w0, %x2\;eor\t%w0, %w3, %w0"
   [(set_attr "length" "12")
diff --git a/gcc/testsuite/gcc.dg/ssp-3.c b/gcc/testsuite/gcc.dg/ssp-3.c
new file mode 100644
index 000..98c12da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ssp-3.c
@@ -0,0 +1,16 @@
+/* { dg-do assemble } */
+/* { dg-options "-fstack-protector-strong -O1 -frename-registers" } */
+/* { dg-require-effective-target fstack_protector } */
+
+extern int bar (const char *s, int *argc);
+extern int baz (const char *s);
+
+char
+foo (const char *s)
+{
+  int argc;
+  int ret;
+  if ( !bar (s, &argc))
+ret = baz (s);
+  return *s;
+}
diff --git a/gcc/testsuite/gcc.dg/ssp-4.c b/gcc/testsuite/gcc.dg/ssp-4.c
new file mode 100644
index 000..402033c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ssp-4.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble } */
+/* { dg-options "-fstack-protector-strong -O1 -frename-registers" } */
+/* { dg-require-effective-target fstack_protector } */
+
+typedef unsigned int uint32_t;
+struct ctx
+{
+  uint32_t A;
+};
+
+void *
+buffer_copy (const struct ctx *ctx, void *resbuf)
+{
+  uint32_t buffer[4];
+  buffer[0] = (ctx->A);
+  __builtin_memcpy (resbuf, buffer, sizeof (buffer));
+  return resbuf;
+}

[Patch] Teach genrecog/genoutput that scratch registers require write constraint modifiers

2014-09-18 Thread James Greenhalgh

Hi,

As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01334.html
The construct

  (clobber (match_scratch 0 "&r"))

is invalid - operand 0 must be marked either write or read/write.

Likewise

  (match_* 0 "&r")

is invalid; marking an operand earlyclobber does not remove the need to
also mark it write or read/write.

This patch adds checking for these two error conditions to the generator
programs and documents the restriction.

Bootstrapped on x86, ARM and AArch64 with no new issues.

Ok?

Thanks,
James

---
2014-09-17  James Greenhalgh  james.greenha...@arm.com

* doc/md.texi (Modifiers): Consistently use read/write
nomenclature rather than input/output.
* genrecog.c (constraints_supported_in_insn_p): New.
(validate_pattern): If needed, also check constraints on
MATCH_SCRATCH operands.
* genoutput.c (validate_insn_alternatives): Catch earlyclobber
operands with no '=' or '+' modifier.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 80e8bd6..435d850 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -1546,18 +1546,18 @@ Here are constraint modifier characters.
 @table @samp
 @cindex @samp{=} in constraint
 @item =
-Means that this operand is write-only for this instruction: the previous
-value is discarded and replaced by output data.
+Means that this operand is written to by this instruction:
+the previous value is discarded and replaced by new data.
 
 @cindex @samp{+} in constraint
 @item +
 Means that this operand is both read and written by the instruction.
 
 When the compiler fixes up the operands to satisfy the constraints,
-it needs to know which operands are inputs to the instruction and
-which are outputs from it.  @samp{=} identifies an output; @samp{+}
-identifies an operand that is both input and output; all other operands
-are assumed to be input only.
+it needs to know which operands are read by the instruction and
+which are written by it.  @samp{=} identifies an operand which is only
+written; @samp{+} identifies an operand that is both read and written; all
+other operands are assumed to only be read.
 
 If you specify @samp{=} or @samp{+} in a constraint, you put it in the
 first character of the constraint string.
@@ -1566,9 +1566,9 @@ first character of the constraint string.
 @cindex earlyclobber operand
 @item &
 Means (in a particular alternative) that this operand is an
-@dfn{earlyclobber} operand, which is modified before the instruction is
+@dfn{earlyclobber} operand, which is written before the instruction is
 finished using the input operands.  Therefore, this operand may not lie
-in a register that is used as an input operand or as part of any memory
+in a register that is read by the instruction or as part of any memory
 address.
 
 @samp{&} applies only to the alternative in which it is written.  In
@@ -1576,16 +1576,19 @@ constraints with multiple alternatives, sometimes one alternative
 requires @samp{&} while others do not.  See, for example, the
 @samp{movdf} insn of the 68000.
 
-An input operand can be tied to an earlyclobber operand if its only
-use as an input occurs before the early result is written.  Adding
-alternatives of this form often allows GCC to produce better code
-when only some of the inputs can be affected by the earlyclobber.
-See, for example, the @samp{mulsi3} insn of the ARM@.
+An operand which is read by the instruction can be tied to an earlyclobber
+operand if its only use as an input occurs before the early result is
+written.  Adding alternatives of this form often allows GCC to produce
+better code when only some of the read operands can be affected by the
+earlyclobber. See, for example, the @samp{mulsi3} insn of the ARM@.
 
-Furthermore, if the @dfn{earlyclobber} operand is also read/write operand, then
-that operand is modified only after it's used.
+Furthermore, if the @dfn{earlyclobber} operand is also a read/write
+operand, then that operand is written only after it's used.
 
-@samp{&} does not obviate the need to write @samp{=} or @samp{+}.
+@samp{&} does not obviate the need to write @samp{=} or @samp{+}.  As
+@dfn{earlyclobber} operands are always written, a read-only
+@dfn{earlyclobber} operand is ill-formed and will be rejected by the
+compiler.
 
 @cindex @samp{%} in constraint
 @item %
@@ -1593,7 +1596,7 @@ Declares the instruction to be commutative for this operand and the
 following operand.  This means that the compiler may interchange the
 two operands if that is the cheapest way to make all operands fit the
 constraints.  @samp{%} applies to all alternatives and must appear as
-the first character in the constraint.  Only input operands can use
+the first character in the constraint.  Only read-only operands can use
 @samp{%}.
 
 @ifset INTERNALS
diff --git a/gcc/genoutput.c b/gcc/genoutput.c
index 69d5ab0..8094288 100644
--- a/gcc/genoutput.c
+++ b/gcc/genoutput.c
@@ -769,6 +769,7 @@ validate_insn_alternatives (struct data *d)
 	char c;
 	int 

Re: Fix ARM ICE for register var asm (pc) (PR target/60606)

2014-09-18 Thread Alan Lawrence
It seems to be the change to arm_regno_class relating to PC_REGNUM. I see 
scal-to-vec1.c failing with just that, or that in combination with the changes 
to cfgexpand.c+varasm.c.


And scal-to-vec1.c is OK on -fPIC if I apply the changes to cfgexpand.c, 
varasm.c, and arm.c (arm_hard_regno_ok), i.e. all bar the change to arm_regno_class.


A change relating to the program counter affecting -fPIC does sound plausible, I 
haven't looked any further than that...


--Alan

Joseph S. Myers wrote:

On Wed, 17 Sep 2014, Alan Lawrence wrote:


We've just noticed this patch causes an ICE in
gcc.c-torture/execute/scal-to-vec1.c at -O3 when running with -fPIC on
arm-none-linux-gnueabi and arm-none-linux-gnueabihf; test logs:


Which part causes the ICE?  The arm_hard_regno_mode_ok change relating to 
modes assigned to CC_REGNUM, the arm_regno_class change relating to 
PC_REGNUM, or something else?  Either of those would indicate something 
very strange going on in LRA (maybe something else needs to change 
somewhere as well to stop attempts to use CC_REGNUM or PC_REGNUM 
inappropriately?).







[PATCH 4/5] Generalise invalid_mode_change_p

2014-09-18 Thread Richard Sandiford
This is the main patch for the bug.  We should treat a register as invalid
for a mode change if simplify_subreg_regno cannot provide a new register
number for the result.  We should treat a class as invalid for a mode change
if all registers in the class are invalid.  This is an extension of the old
CANNOT_CHANGE_MODE_CLASS-based check (simplify_subreg_regno checks C_C_C_M).

I forgot to say that the patch is a prerequisite to removing aarch64's
C_C_C_M.  There are other prerequisites too, but removing C_C_C_M without
this patch caused regressions in the existing testsuite, which is why no
new tests are needed.
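As a sketch of how the new structure is populated (mirroring the field
comments in the rtl.h hunk below; x stands for some existing SUBREG rtx,
and this is presumably what the new shape_of_subreg helper wraps):

    subreg_shape shape (GET_MODE (SUBREG_REG (x)),  /* inner_mode */
                        SUBREG_BYTE (x),            /* offset */
                        GET_MODE (x));              /* outer_mode */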


gcc/
* hard-reg-set.h: Include hash-table.h.
(target_hard_regs): Add a finalize method and a x_simplifiable_subregs
field.
* target-globals.c (target_globals::~target_globals): Handle
hard_regs-finalize.
* rtl.h (subreg_shape): New structure.
(shape_of_subreg): New function.
(simplifiable_subregs): Declare.
* reginfo.c (simplifiable_subreg): New structure.
(simplifiable_subregs_hasher): Likewise.
(simplifiable_subregs): New function.
(invalid_mode_changes): Delete.
(valid_mode_changes, valid_mode_changes_obstack): New variables.
(record_subregs_of_mode): Remove subregs_of_mode parameter.
Record valid mode changes in valid_mode_changes.
(find_subregs_of_mode): Remove subregs_of_mode parameter.
Update calls to record_subregs_of_mode.
(init_subregs_of_mode): Remove invalid_mode_changes and bitmap
handling.  Initialize new variables.  Update call to
find_subregs_of_mode.
(invalid_mode_change_p): Check new variables instead of
invalid_mode_changes.
(finish_subregs_of_mode): Finalize new variables instead of
invalid_mode_changes.
(target_hard_regs::finalize): New function.
* ira-costs.c (print_allocno_costs): Call invalid_mode_change_p
even when CLASS_CANNOT_CHANGE_MODE is undefined.

Index: gcc/hard-reg-set.h
===
--- gcc/hard-reg-set.h  2014-09-15 11:55:40.459855161 +0100
+++ gcc/hard-reg-set.h  2014-09-15 11:55:40.455855210 +0100
@@ -20,6 +20,8 @@ Software Foundation; either version 3, o
 #ifndef GCC_HARD_REG_SET_H
 #define GCC_HARD_REG_SET_H
 
+#include hash-table.h
+
 /* Define the type of a set of hard registers.  */
 
 /* HARD_REG_ELT_TYPE is a typedef of the unsigned integral type which
@@ -613,7 +615,11 @@ #define EXECUTE_IF_SET_IN_HARD_REG_SET(S
 
 extern char global_regs[FIRST_PSEUDO_REGISTER];
 
+struct simplifiable_subregs_hasher;
+
 struct target_hard_regs {
+  void finalize ();
+
   /* The set of registers that actually exist on the current target.  */
   HARD_REG_SET x_accessible_reg_set;
 
@@ -688,6 +694,10 @@ struct target_hard_regs {
 
   /* Vector indexed by hardware reg giving its name.  */
   const char *x_reg_names[FIRST_PSEUDO_REGISTER];
+
+  /* Records which registers can form a particular subreg, with the subreg
+ being identified by its outer mode, inner mode and offset.  */
+  hash_table <simplifiable_subregs_hasher> *x_simplifiable_subregs;
 };
 
 extern struct target_hard_regs default_target_hard_regs;
Index: gcc/target-globals.c
===
--- gcc/target-globals.c2014-09-15 11:55:40.459855161 +0100
+++ gcc/target-globals.c2014-09-15 11:55:40.459855161 +0100
@@ -125,6 +125,7 @@ target_globals::~target_globals ()
   /* default_target_globals points to static data so shouldn't be freed.  */
   if (this != default_target_globals)
 {
+  hard_regs-finalize ();
   XDELETE (flag_state);
   XDELETE (regs);
   XDELETE (recog);
Index: gcc/rtl.h
===
--- gcc/rtl.h   2014-09-15 11:55:40.459855161 +0100
+++ gcc/rtl.h   2014-09-15 12:26:21.249077760 +0100
@@ -1822,6 +1822,64 @@ costs_add_n_insns (struct full_rtx_costs
   c-size += COSTS_N_INSNS (n);
 }
 
+/* Describes the shape of a subreg:
+
+   inner_mode == the mode of the SUBREG_REG
+   offset == the SUBREG_BYTE
+   outer_mode == the mode of the SUBREG itself.  */
+struct subreg_shape {
+  subreg_shape (enum machine_mode, unsigned int, enum machine_mode);
+  bool operator == (const subreg_shape ) const;
+  bool operator != (const subreg_shape ) const;
+  unsigned int unique_id () const;
+
+  enum machine_mode inner_mode;
+  unsigned int offset;
+  enum machine_mode outer_mode;
+};
+
+inline
+subreg_shape::subreg_shape (enum machine_mode inner_mode_in,
+   unsigned int offset_in,
+   enum machine_mode outer_mode_in)
+  : inner_mode (inner_mode_in), offset (offset_in), outer_mode (outer_mode_in)
+{}
+
+inline bool
+subreg_shape::operator == (const subreg_shape other) const
+{
+  return (inner_mode == other.inner_mode
+	  && offset == other.offset
+ 

[PATCH 5/5] Remove CANNOT_CHANGE_MODE_CLASS workaround in i386.c

2014-09-18 Thread Richard Sandiford
Patch 4 should make it possible to relax i386'a CANNOT_CHANGE_MODE_CLASS,
solving the missed optimisation that triggered the original thread.


gcc/
* config/i386/i386.c (ix86_cannot_change_mode_class): Remove
GET_MODE_SIZE (to) < GET_MODE_SIZE (from) test.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  2014-09-15 09:48:11.310438531 +0100
+++ gcc/config/i386/i386.c  2014-09-15 09:48:11.310438531 +0100
@@ -37526,13 +37526,6 @@ ix86_cannot_change_mode_class (enum mach
 the vec_dupv4hi pattern.  */
   if (GET_MODE_SIZE (from) < 4)
return true;
-
-      /* Vector registers do not support subreg with nonzero offsets, which
-	 are otherwise valid for integer registers.  Since we can't see
-	 whether we have a nonzero offset from here, prohibit all
-	 nonparadoxical subregs changing size.  */
-      if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
-	return true;
 }
 
   return false;



[patch] powerpc-vxworksmils port, variant of powerpc-vxworksae

2014-09-18 Thread Olivier Hainque
Hello,

We have been maintaining a port to VxWorks MILS for powerpc for a while now
and thought others might be interested.

VxWorksMILS is very close to VxWorksAE, so the patch is pretty small. The main
noticeable difference is that only the vthreads environment headers are
available, so we arrange to build the libgcc variants all with -mvthreads.

We have been using this with a gcc-4.7 based toolchain for a couple of
years and moved to gcc-4.9 recently. The 4.9 patch applies as-is on mainline.

OK to commit ?

Thanks in advance for your feedback,

With Kind Regards,

Olivier

2014-09-18  Olivier Hainque  hain...@adacore.com

gcc/
* config.gcc (powerpc-wrs-vxworksmils): New configuration.
* config/rs6000/t-vxworksmils: New file.
* config/rs6000/vxworksmils.h: New file.

libgcc/
* config.host (powerpc-wrs-vxworksmils): New configuration,
same as vxworksae.

contrib/
* config-list.mk (LIST): Add powerpc-wrs-vxworksmils.



vxmils.diff
Description: Binary data


Re: [PATCH][PING] Enable -fsanitize-recover for KASan

2014-09-18 Thread Jakub Jelinek
On Mon, Sep 15, 2014 at 01:38:42PM +0400, Yury Gribov wrote:
 --- a/gcc/builtins.def
 +++ b/gcc/builtins.def
 @@ -176,7 +176,7 @@ along with GCC; see the file COPYING3.  If not see
    DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,	\
 		true, true, true, ATTRS, true, \
 	       (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
 -				 | SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT)))
 +				 | SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT)))

This line is too long after the change.

 --- a/gcc/gcc.c
 +++ b/gcc/gcc.c
 @@ -8236,7 +8236,7 @@ sanitize_spec_function (int argc, const char **argv)
    if (strcmp (argv[0], "thread") == 0)
      return (flag_sanitize & SANITIZE_THREAD) ? "" : NULL;
    if (strcmp (argv[0], "undefined") == 0)
 -    return ((flag_sanitize & (SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT))
 +    return ((flag_sanitize & (SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT))

Likewise.

 --- a/gcc/opts.c
 +++ b/gcc/opts.c
 @@ -1551,6 +1551,12 @@ common_handle_option (struct gcc_options *opts,
 			       | SANITIZE_RETURNS_NONNULL_ATTRIBUTE))
 	  opts->x_flag_delete_null_pointer_checks = 0;
 
 +	  /* UBSan and KASan enable recovery by default.  */
 +	  opts->x_flag_sanitize_recover
 +	    = !!(flag_sanitize & (SANITIZE_UNDEFINED
 +				  | SANITIZE_UNDEFINED_NONDEFAULT
 +				  | SANITIZE_KERNEL_ADDRESS));
 +

Doesn't this override even user supplied -fsanitize-recover or
-fno-sanitize-recover ?  Have you tried both
-fno-sanitize-recover -fsanitize=kernel-address
and
-fsanitize=kernel-address -fno-sanitize-recover
option orders?

Seems for -fdelete-null-pointer-checks we got it wrong too,
IMHO for -fsanitize={null,{,returns-}nonnull-attribute,undefined}
we want to disable it unconditionally, regardless of whether
that option appears on the command line or not.
And we handle it right for 
-fdelete-null-pointer-checks -fsanitize=undefined
but not for
-fsanitize=undefined -fdelete-null-pointer-checks
Joseph, thoughts where to override it instead (I mean, after all
options are processed)?

In the -fsanitize-recover case, I'd on the other hand think that
it should just override the default and not override the user's explicit
decision.  That could be done here, but presumably guarded
with if (!opts_set->x_flag_sanitize_recover)?
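A sketch of what that guard would look like (untested; it is just the
quoted opts.c hunk wrapped in the check described above):

	  /* UBSan and KASan enable recovery by default, but only if the
	     user did not pass -f{,no-}sanitize-recover explicitly.  */
	  if (!opts_set->x_flag_sanitize_recover)
	    opts->x_flag_sanitize_recover
	      = !!(flag_sanitize & (SANITIZE_UNDEFINED
				    | SANITIZE_UNDEFINED_NONDEFAULT
				    | SANITIZE_KERNEL_ADDRESS));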

I don't think your proposal will work properly though:
if one compiles with
-fsanitize=undefined -fsanitize=address
you'll just get userland asan with error recovery, which is highly
undesirable (not just that it changes the behavior from how it
behaved before, but especially because libasan doesn't contain
such entrypoints at all).
-fsanitize=undefined,address
or
-fsanitize=address,undefined
is a normal supported mode, and thus I think you either can't reuse the
-fsanitize-recover option for what you want to do, or
asan.c needs to limit it to flag_sanitize & SANITIZE_KERNEL_ADDRESS
mode only.  Depends if you ever want to add recovery for userland
sanitization.

Jakub


Re: [kyukhin/gomp4-offload] DESTDIR issues

2014-09-18 Thread Ilya Verbin
Ok, the approach with additional --enable-offload-targets arguments seems to be
more appropriate, so I will fix offloading infrastructure patch #1.

Thanks,
  -- Ilya


Re: [PATCH][Kasan][PING] Allow to override Asan shadow offset from command line

2014-09-18 Thread Jakub Jelinek
On Mon, Sep 15, 2014 at 01:46:14PM +0400, Yury Gribov wrote:
 On 09/08/2014 06:29 PM, Yury Gribov wrote:
 Kasan developers have asked for an option to override the offset of the Asan
 shadow memory region. This should simplify experimenting with memory
 layouts on 64-bit architectures.
 
 I've bootstrapped and regtested this on x64.
 
 Ok to commit?

I don't like it at all.  For the kernel-address perhaps it might make sense
as a param, but for userland, as it is an ABI changing option, I'm afraid
people would start to create objects/shared libraries/binaries with
ABI incompatible values.
So, if you need it for kernel, use a param that can be eventually dropped,
and limit it to kernel-address mode only.

Jakub


Re: [PATCH][Kasan][PING] Allow to override Asan shadow offset from command line

2014-09-18 Thread Yury Gribov

On 09/18/2014 03:01 PM, Jakub Jelinek wrote:

On Mon, Sep 15, 2014 at 01:46:14PM +0400, Yury Gribov wrote:

On 09/08/2014 06:29 PM, Yury Gribov wrote:

Kasan developers have asked for an option to override the offset of the Asan
shadow memory region. This should simplify experimenting with memory
layouts on 64-bit architectures.

I've bootstrapped and regtested this on x64.

Ok to commit?


I don't like it at all.  For the kernel-address perhaps it might make sense
as a param, but for userland, as it is an ABI changing option, I'm afraid
people would start to create objects/shared libraries/binaries with
ABI incompatible values.
So, if you need it for kernel, use a param that can be eventually dropped,
and limit it to kernel-address mode only.


The problem with params is that they are ints, so they won't work for 64-bit
platforms. How about aborting if -fasan-shadow-offset is supplied
without -fsanitize=kernel-address?
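
A rough, runnable sketch of that check (illustrative only; the option name -fasan-shadow-offset comes from this thread, everything else is a hypothetical stand-in for the real option handling):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Reject the shadow-offset override unless kernel-address sanitization is
   active, and parse it as a full 64-bit value -- which is exactly what an
   int-typed --param cannot represent.  */
static uint64_t
handle_asan_shadow_offset (const char *arg, int sanitize_kernel_address)
{
  if (!sanitize_kernel_address)
    {
      fprintf (stderr, "error: -fasan-shadow-offset requires "
                       "-fsanitize=kernel-address\n");
      exit (1);
    }
  return strtoull (arg, NULL, 0);
}

int
main (void)
{
  /* Example 64-bit offset value only.  */
  uint64_t off = handle_asan_shadow_offset ("0xdffffc0000000000", 1);
  printf ("shadow offset = 0x%llx\n", (unsigned long long) off);
  return 0;
}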


-Y


[PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114

2014-09-18 Thread Alan Lawrence
The end goal here is to remove this code from tree-vect-loop.c 
(vect_create_epilog_for_reduction):


  if (BYTES_BIG_ENDIAN)
bitpos = size_binop (MULT_EXPR,
 bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
 TYPE_SIZE (scalar_type));
  else

as this is the root cause of PR/61114 (see testcase there, failing on all 
bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener, "all 
code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is suspicious". The 
code snippet above is used on two paths:


(Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR = 
reduc_[us](plus|min|max)_optab.
The optab is documented as "the scalar result is stored in the least significant 
bits of operand 0", but the tree code as "the first element in the vector 
holding the result of the reduction of all elements of the operand". This 
mismatch means that when the tree code is folded, the code snippet above reads 
the result from the wrong end of the vector.


The strategy (as per https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) 
is to define new tree codes and optabs that produce scalar results directly; 
this seems better than tying (the element of the vector into which the result is 
placed) to (the endianness of the target), and avoids generating extra moves on 
current bigendian targets. However, the previous optabs are retained for now as 
a migration strategy so as not to break existing backends; moving individual 
platforms over will follow.


A complication here is on AArch64, where we directly generate REDUC_PLUS_EXPRs 
from intrinsics in gimple_fold_builtin; I temporarily remove this folding in 
order to decouple the midend and AArch64 backend.


(Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e. 
VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab is 
defined in an endianness-dependent way, leading to significant complication in 
fold-const.c. (Moreover, the equivalent vec_shl_optab is never used!). Few 
platforms appear to handle vec_shr_optab (and fewer bigendian - I see only 
PowerPC and MIPS), so it seems pertinent to change the existing optab to be 
endianness-neutral.


Patch 10 defines vec_shr for AArch64, for the old specification; patch 13 
updates that implementation to fit the new endianness-neutral specification, 
serving as a guide for other existing backends. Patches/RFCs 15 and 16 are 
equivalents for MIPS and PowerPC; I haven't tested these but hope they act as 
useful pointers for the port maintainers.


Finally patch 14 cleans up the affected part of tree-vect-loop.c 
(vect_create_epilog_for_reduction).


--Alan



[PATCH 1/14][AArch64] Temporarily remove aarch64_gimple_fold_builtin code for reduction operations

2014-09-18 Thread Alan Lawrence
The gimple folding ties the AArch64 backend to the tree representation of the 
midend via the neon intrinsics. This code enables constant folding of Neon 
reduction intrinsics, so it improves performance, but is not necessary for 
correctness. By temporarily removing it (here), we can then change the midend 
representation independently of the AArch64 backend + intrinsics.


However, I'm leaving the code in place, as a later patch will bring it all back 
in a very similar form (but enabled for bigendian).


Bootstrapped on aarch64-none-linux; tested aarch64.exp on aarch64-none-elf and 
aarch64_be-none-elf. (The removed code was already disabled for bigendian; and 
this is solely a __builtin-folding mechanism, i.e. used only for Neon/ACLE 
intrinsics.)


gcc/ChangeLog:
* config/aarch64/aarch64.c (TARGET_GIMPLE_FOLD_BUILTIN): Comment out.
* config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin):
Remove using preprocessor directives.diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 5217f4a5f39224dbf8029542ad33790ef2c191be..15eb7c686d95b1d66cbd514500ec29ba074eaa3f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1333,6 +1333,9 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args,
   return NULL_TREE;
 }
 
+/* Handling of reduction operations temporarily removed so as to decouple
+   changes to tree codes from AArch64 NEON Intrinsics.  */
+#if 0
 bool
 aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 {
@@ -1404,6 +1407,7 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 
   return changed;
 }
+#endif
 
 void
 aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e7946fc0b70ced70a4e98caa0a33121f29242aad..9197ec038b7d40a601c886b846113c50a29cf5e2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9925,8 +9925,8 @@ aarch64_expand_movmem (rtx *operands)
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
 
-#undef TARGET_GIMPLE_FOLD_BUILTIN
-#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
+//#undef TARGET_GIMPLE_FOLD_BUILTIN
+//#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
 #undef TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR aarch64_gimplify_va_arg_expr

[PATCH i386 AVX512] [42/n] Add masked vunpck[lh]pd.

2014-09-18 Thread Kirill Yukhin
Hello,
The patch below extends and adds patterns for masked unpack
instructions.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_insn avx_unpckhpd256mask_name): Add masking.
(define_insn avx512vl_unpckhpd128_mask): New.
(define_expand avx_movddup256mask_name): Add masking.
(define_expand avx_unpcklpd256mask_name): Ditto.
(define_insn *avx_unpcklpd256mask_name): Ditto.
(define_insn avx512vl_unpcklpd128_mask): New.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 533308b..ab2d3b1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -7081,16 +7081,16 @@
(set_attr mode V8DF)])
 
 ;; Recall that the 256-bit unpck insns only shuffle within their lanes.
-(define_insn avx_unpckhpd256
-  [(set (match_operand:V4DF 0 register_operand =x)
+(define_insn avx_unpckhpd256mask_name
+  [(set (match_operand:V4DF 0 register_operand =v)
(vec_select:V4DF
  (vec_concat:V8DF
-   (match_operand:V4DF 1 register_operand x)
-   (match_operand:V4DF 2 nonimmediate_operand xm))
+   (match_operand:V4DF 1 register_operand v)
+   (match_operand:V4DF 2 nonimmediate_operand vm))
  (parallel [(const_int 1) (const_int 5)
 (const_int 3) (const_int 7)])))]
-  TARGET_AVX
-  vunpckhpd\t{%2, %1, %0|%0, %1, %2}
+  TARGET_AVX  mask_avx512vl_condition
+  vunpckhpd\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2}
   [(set_attr type sselog)
(set_attr prefix vex)
(set_attr mode V4DF)])
@@ -7124,6 +7124,22 @@
 })
 
 
+(define_insn avx512vl_unpckhpd128_mask
+  [(set (match_operand:V2DF 0 register_operand =v)
+   (vec_merge:V2DF
+ (vec_select:V2DF
+   (vec_concat:V4DF
+ (match_operand:V2DF 1 register_operand v)
+ (match_operand:V2DF 2 nonimmediate_operand vm))
+   (parallel [(const_int 1) (const_int 3)]))
+ (match_operand:V2DF 3 vector_move_operand 0C)
+ (match_operand:QI 4 register_operand Yk)))]
+  TARGET_AVX512VL
+  vunpckhpd\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}
+  [(set_attr type sselog)
+   (set_attr prefix evex)
+   (set_attr mode V2DF)])
+
 (define_expand vec_interleave_highv2df
   [(set (match_operand:V2DF 0 register_operand)
(vec_select:V2DF
@@ -7204,7 +7220,7 @@
(set_attr mode V8DF)])
 
 ;; Recall that the 256-bit unpck insns only shuffle within their lanes.
-(define_expand avx_movddup256
+(define_expand avx_movddup256mask_name
   [(set (match_operand:V4DF 0 register_operand)
(vec_select:V4DF
  (vec_concat:V8DF
@@ -7212,9 +7228,9 @@
(match_dup 1))
  (parallel [(const_int 0) (const_int 4)
 (const_int 2) (const_int 6)])))]
-  TARGET_AVX)
+  TARGET_AVX  mask_avx512vl_condition)
 
-(define_expand avx_unpcklpd256
+(define_expand avx_unpcklpd256mask_name
   [(set (match_operand:V4DF 0 register_operand)
(vec_select:V4DF
  (vec_concat:V8DF
@@ -7222,20 +7238,20 @@
(match_operand:V4DF 2 nonimmediate_operand))
  (parallel [(const_int 0) (const_int 4)
 (const_int 2) (const_int 6)])))]
-  TARGET_AVX)
+  TARGET_AVX  mask_avx512vl_condition)
 
-(define_insn *avx_unpcklpd256
-  [(set (match_operand:V4DF 0 register_operand =x,x)
+(define_insn *avx_unpcklpd256mask_name
+  [(set (match_operand:V4DF 0 register_operand =v,v)
(vec_select:V4DF
  (vec_concat:V8DF
-   (match_operand:V4DF 1 nonimmediate_operand  x,m)
-   (match_operand:V4DF 2 nonimmediate_operand xm,1))
+   (match_operand:V4DF 1 nonimmediate_operand  v,m)
+   (match_operand:V4DF 2 nonimmediate_operand vm,1))
  (parallel [(const_int 0) (const_int 4)
 (const_int 2) (const_int 6)])))]
-  TARGET_AVX
+  TARGET_AVX  mask_avx512vl_condition
   @
-   vunpcklpd\t{%2, %1, %0|%0, %1, %2}
-   vmovddup\t{%1, %0|%0, %1}
+   vunpcklpd\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2}
+   vmovddup\t{%1, %0mask_operand3|%0mask_operand3, %1}
   [(set_attr type sselog)
(set_attr prefix vex)
(set_attr mode V4DF)])
@@ -7268,6 +7284,22 @@
   operands[4] = gen_reg_rtx (V4DFmode);
 })
 
+(define_insn avx512vl_unpcklpd128_mask
+  [(set (match_operand:V2DF 0 register_operand =v)
+   (vec_merge:V2DF
+ (vec_select:V2DF
+   (vec_concat:V4DF
+ (match_operand:V2DF 1 register_operand v)
+ (match_operand:V2DF 2 nonimmediate_operand vm))
+   (parallel [(const_int 0) (const_int 2)]))
+ (match_operand:V2DF 3 vector_move_operand 0C)
+ (match_operand:QI 4 register_operand Yk)))]
+  TARGET_AVX512VL
+  vunpcklpd\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}
+  [(set_attr type sselog)
+   (set_attr prefix evex)
+   (set_attr mode V2DF)])
+
 (define_expand vec_interleave_lowv2df
   [(set (match_operand:V2DF 0 

[PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

2014-09-18 Thread Alan Lawrence

This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.

These are presently documented as producing a vector with the result in element 
0, and this is inconsistent with their use in tree-vect-loop.c (which on 
bigendian targets pulls the bits out of the wrong end of the vector result). 
This leads to bugs on bigendian targets - see also 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114.


I discounted fixing the vectorizer (to read from element 0) and then making 
bigendian targets (whose architectural insn produces the result in lane N-1) 
permute the result vector, as optimization of vectors in RTL seems unlikely to 
remove such a permute and would lead to a performance regression.


Instead it seems more natural for the tree code to produce a scalar result 
(producing a vector with the result in lane 0 has already caused confusion, e.g. 
https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).


However, this patch preserves the meaning of the optab (producing a result in 
lane 0 on little-endian architectures or N-1 on bigendian), thus generally 
avoiding the need to change backends. Thus, expr.c extracts an 
endianness-dependent element from the optab result to give the result expected 
for the tree code.


Previously posted as an RFC 
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra 
VIEW_CONVERT_EXPR if the types of the reduction/result do not match.


Testing:
x86_64-none-linux-gnu: bootstrap, check-gcc, check-g++
aarch64-none-linux-gnu: bootstrap
aarch64-none-elf:  check-gcc, check-g++
arm-none-eabi: check-gcc

aarch64_be-none-elf: check-gcc, showing
FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
Passes the (previously-failing) reduced testcase on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114

Have also assembler/stage-1 tested that testcase on PowerPC, also fixed.

gcc/ChangeLog:

* expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
extract_bit_field around optab result.

* fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR, produce
scalar not vector.

* tree-cfg.c (verify_gimple_assign_unary): Check result vs operand type
for REDUC_{MIN,MAX,PLUS}_EXPR.

* tree-vect-loop.c (vect_analyze_loop): Update comment.
(vect_create_epilog_for_reduction): For direct vector reduction, use
result of tree code directly without extract_bit_field.

* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
comment.
diff --git a/gcc/expr.c b/gcc/expr.c
index 58b87ba7ed7eee156b9730b61679af946694e8df..a293c06489f09586ed56dff1381467401687be45 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9020,7 +9020,17 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
   {
 op0 = expand_normal (treeop0);
 this_optab = optab_for_tree_code (code, type, optab_default);
-temp = expand_unop (mode, this_optab, op0, target, unsignedp);
+enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
+temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
+gcc_assert (temp);
+/* The tree code produces a scalar result, but (somewhat by convention)
+   the optab produces a vector with the result in element 0 if
+   little-endian, or element N-1 if big-endian.  So pull the scalar
+   result out of that element.  */
+int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
+int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
+temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
+  target, mode, mode);
 gcc_assert (temp);
 return temp;
   }
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index d44476972158b125aecd8c4a5c8d6176ad3b0e5c..b8baa94d37a74ebb824e2a4d03f2a10befcdf749 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -8475,12 +8475,13 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
 case REDUC_MAX_EXPR:
 case REDUC_PLUS_EXPR:
   {
-	unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i;
+	unsigned int nelts, i;
 	tree *elts;
 	enum tree_code subcode;
 
 	if (TREE_CODE (op0) != VECTOR_CST)
 	  return NULL_TREE;
+nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0));
 
 	elts = XALLOCAVEC (tree, nelts);
 	if (!vec_cst_ctor_to_array (op0, elts))
@@ -8499,10 +8500,9 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
 	elts[0] = const_binop (subcode, elts[0], elts[i]);
 	if (elts[0] == NULL_TREE || !CONSTANT_CLASS_P (elts[0]))
 	  return NULL_TREE;
-	elts[i] = build_zero_cst (TREE_TYPE (type));
 	  }
 
-	return build_vector (type, elts);
+	return elts[0];
   }
 
 default:
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c

Re: [PATCH] RTEMS: Update contrib/config-list.mk

2014-09-18 Thread Jan-Benedict Glaw
On Wed, 2014-09-17 10:52:34 -0500, Joel Sherrill joel.sherr...@oarcorp.com 
wrote:
 On 9/17/2014 10:41 AM, Sebastian Huber wrote:
  On 09/17/2014 04:45 PM, Jan-Benedict Glaw wrote:
   On Wed, 2014-09-17 15:37:32 +0200, Sebastian 
   Hubersebastian.hu...@embedded-brains.de  wrote:
 contrib/ChangeLog
 2014-09-17  Sebastian Hubersebastian.hu...@embedded-brains.de

   * config-list.mk (LIST): Add arm-rtems.
   Add nios2-rtems.  Remove extra option from powerpc-rtems.
   What's the rationale for removing --enable-threads=yes here, as well
   as the specific version number?
[...]
 And is this the input to your buildbot? :)

Yes, the target list in contrib/config-list.mk is what'll be built
using the config-list.mk-building backend. (The robot has another
backend using a different build strategy, which has a separate target
list, though one could argue that I'd also include all the
config-list.mk targets in that other list as well.)

  And to tell the whole story, Sebastian approached me about extending
the target lists in use with the targets he sent a patch for; I just
asked him to go this route, because I guess that'd be beneficial for
other folks as well.

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:  The course of history shows that as a government grows, liberty
the second  : decreases.  (Thomas Jefferson)




[PATCH i386 AVX512] [43/n] Add rest of vunpck[lh]ps.

2014-09-18 Thread Kirill Yukhin
Hello,
This patch adds the rest of the unpack insn patterns.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_insn avx_unpckhps256mask_name): Add masking.
(define_insn vec_interleave_highv4sfmask_name): Ditto.
(define_insn avx_unpcklps256mask_name): Ditto.
(define_insn unpcklps128_mask): New.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ab2d3b1..295f11a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5525,18 +5525,18 @@
(set_attr mode V16SF)])
 
 ;; Recall that the 256-bit unpck insns only shuffle within their lanes.
-(define_insn avx_unpckhps256
-  [(set (match_operand:V8SF 0 register_operand =x)
+(define_insn avx_unpckhps256mask_name
+  [(set (match_operand:V8SF 0 register_operand =v)
(vec_select:V8SF
  (vec_concat:V16SF
-   (match_operand:V8SF 1 register_operand x)
-   (match_operand:V8SF 2 nonimmediate_operand xm))
+   (match_operand:V8SF 1 register_operand v)
+   (match_operand:V8SF 2 nonimmediate_operand vm))
  (parallel [(const_int 2) (const_int 10)
 (const_int 3) (const_int 11)
 (const_int 6) (const_int 14)
 (const_int 7) (const_int 15)])))]
-  TARGET_AVX
-  vunpckhps\t{%2, %1, %0|%0, %1, %2}
+  TARGET_AVX  mask_avx512vl_condition
+  vunpckhps\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2}
   [(set_attr type sselog)
(set_attr prefix vex)
(set_attr mode V8SF)])
@@ -5575,18 +5575,18 @@
   operands[4] = gen_reg_rtx (V8SFmode);
 })
 
-(define_insn vec_interleave_highv4sf
-  [(set (match_operand:V4SF 0 register_operand =x,x)
+(define_insn vec_interleave_highv4sfmask_name
+  [(set (match_operand:V4SF 0 register_operand =x,v)
(vec_select:V4SF
  (vec_concat:V8SF
-   (match_operand:V4SF 1 register_operand 0,x)
-   (match_operand:V4SF 2 nonimmediate_operand xm,xm))
+   (match_operand:V4SF 1 register_operand 0,v)
+   (match_operand:V4SF 2 nonimmediate_operand xm,vm))
  (parallel [(const_int 2) (const_int 6)
 (const_int 3) (const_int 7)])))]
-  TARGET_SSE
+  TARGET_SSE  mask_avx512vl_condition
   @
unpckhps\t{%2, %0|%0, %2}
-   vunpckhps\t{%2, %1, %0|%0, %1, %2}
+   vunpckhps\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2}
   [(set_attr isa noavx,avx)
(set_attr type sselog)
(set_attr prefix orig,vex)
@@ -5613,22 +5613,39 @@
(set_attr mode V16SF)])
 
 ;; Recall that the 256-bit unpck insns only shuffle within their lanes.
-(define_insn avx_unpcklps256
-  [(set (match_operand:V8SF 0 register_operand =x)
+(define_insn avx_unpcklps256mask_name
+  [(set (match_operand:V8SF 0 register_operand =v)
(vec_select:V8SF
  (vec_concat:V16SF
-   (match_operand:V8SF 1 register_operand x)
-   (match_operand:V8SF 2 nonimmediate_operand xm))
+   (match_operand:V8SF 1 register_operand v)
+   (match_operand:V8SF 2 nonimmediate_operand vm))
  (parallel [(const_int 0) (const_int 8)
 (const_int 1) (const_int 9)
 (const_int 4) (const_int 12)
 (const_int 5) (const_int 13)])))]
-  TARGET_AVX
-  vunpcklps\t{%2, %1, %0|%0, %1, %2}
+  TARGET_AVX  mask_avx512vl_condition
+  vunpcklps\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2}
   [(set_attr type sselog)
(set_attr prefix vex)
(set_attr mode V8SF)])
 
+(define_insn unpcklps128_mask
+  [(set (match_operand:V4SF 0 register_operand =v)
+   (vec_merge:V4SF
+ (vec_select:V4SF
+   (vec_concat:V8SF
+ (match_operand:V4SF 1 register_operand v)
+ (match_operand:V4SF 2 nonimmediate_operand vm))
+   (parallel [(const_int 0) (const_int 4)
+ (const_int 1) (const_int 5)]))
+ (match_operand:V4SF 3 vector_move_operand 0C)
+ (match_operand:QI 4 register_operand Yk)))]
+  TARGET_AVX512VL
+  vunpcklps\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}
+  [(set_attr type sselog)
+   (set_attr prefix evex)
+   (set_attr mode V4SF)])
+
 (define_expand vec_interleave_lowv8sf
   [(set (match_dup 3)
(vec_select:V8SF


[PATCH 3/14] Add new optabs for reducing vectors to scalars

2014-09-18 Thread Alan Lawrence
These match their corresponding tree codes, by taking a vector and returning a 
scalar; this is more architecturally neutral than the (somewhat loosely defined) 
previous optab that took a vector and returned a vector with the result in the 
least significant bits (i.e. element 0 for little-endian or N-1 for bigendian). 
However, the old optabs are preserved so as not to break existing backends, so 
clients check for both old + new optabs.
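
A toy model of that client-side dispatch (plain C, not GCC source; the handler names, the endianness flag and the lane layout are illustrative assumptions based on the description above):

#include <stdio.h>

#define NUNITS 4

static int have_scal_optab  = 0;   /* pretend the target only has the legacy optab */
static int bytes_big_endian = 1;   /* pretend we target a big-endian machine */

/* New-style handler: returns the scalar result directly.  */
static int
reduc_plus_scal (const int *v)
{
  int s = 0;
  for (int i = 0; i < NUNITS; i++)
    s += v[i];
  return s;
}

/* Legacy handler: whole-vector result, sum placed in the "least
   significant" lane (0 on little-endian, NUNITS-1 on big-endian).  */
static void
reduc_splus (const int *v, int *out)
{
  int s = 0;
  for (int i = 0; i < NUNITS; i++)
    s += v[i];
  for (int i = 0; i < NUNITS; i++)
    out[i] = 0;
  out[bytes_big_endian ? NUNITS - 1 : 0] = s;
}

int
main (void)
{
  int v[NUNITS] = { 1, 2, 3, 4 }, res;

  if (have_scal_optab)
    res = reduc_plus_scal (v);
  else
    {
      /* Fallback: legacy optab plus the endian-dependent lane read
         (the BIT_FIELD_REF the expander has to emit).  */
      int tmp[NUNITS];
      reduc_splus (v, tmp);
      res = tmp[bytes_big_endian ? NUNITS - 1 : 0];
    }
  printf ("%d\n", res);   /* 10 either way */
  return 0;
}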


Bootstrap, check-gcc and check-g++ on x86_64-none-linux-gnu.
aarch64.exp + vect.exp on aarch64{,_be}-none-elf.
(of course at this point in the series all these are using the old optab + 
migration path.)


gcc/ChangeLog:

* doc/md.texi (Standard Names): Add reduc_(plus,[us](min|max))_scal
optabs, and note in reduc_[us](plus|min|max) to prefer the former.

* expr.c (expand_expr_real_2): Use reduc_..._scal if available, fall
back to old reduc_... + BIT_FIELD_REF only if not.

* optabs.c (optab_for_tree_code): for REDUC_(MAX,MIN,PLUS)_EXPR,
return the reduce-to-scalar (reduc_..._scal) optab.
(scalar_reduc_to_vector): New.

* optabs.def (reduc_smax_scal_optab, reduc_smin_scal_optab,
reduc_plus_scal_optab, reduc_umax_scal_optab, reduc_umin_scal_optab):
New.

* optabs.h (scalar_reduc_to_vector): Declare.

* tree-vect-loop.c (vectorizable_reduction): Look for optabs reducing
to either scalar or vector.diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index dd7861188afb8afd01971f9f75f0e32da9f9f826..3f5fd6f0e3ac3fcc30f6c961e3e2709a35f4d413 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4811,29 +4811,48 @@ it is unspecified which of the two operands is returned as the result.
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
 Find the signed minimum/maximum of the elements of a vector. The vector is
-operand 1, and the scalar result is stored in the least significant bits of
+operand 1, and the result is stored in the least significant bits of
 operand 0 (also a vector). The output and input vector should have the same
-modes.
+modes. These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_smin_scal_@var{m}} and @samp{reduc_smax_scal_@var{m}}.
 
 @cindex @code{reduc_umin_@var{m}} instruction pattern
 @cindex @code{reduc_umax_@var{m}} instruction pattern
 @item @samp{reduc_umin_@var{m}}, @samp{reduc_umax_@var{m}}
 Find the unsigned minimum/maximum of the elements of a vector. The vector is
-operand 1, and the scalar result is stored in the least significant bits of
+operand 1, and the result is stored in the least significant bits of
 operand 0 (also a vector). The output and input vector should have the same
-modes.
+modes. These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_umin_scal_@var{m}} and @samp{reduc_umax_scal_@var{m}}.
 
 @cindex @code{reduc_splus_@var{m}} instruction pattern
-@item @samp{reduc_splus_@var{m}}
-Compute the sum of the signed elements of a vector. The vector is operand 1,
-and the scalar result is stored in the least significant bits of operand 0
-(also a vector). The output and input vector should have the same modes.
-
 @cindex @code{reduc_uplus_@var{m}} instruction pattern
-@item @samp{reduc_uplus_@var{m}}
-Compute the sum of the unsigned elements of a vector. The vector is operand 1,
-and the scalar result is stored in the least significant bits of operand 0
+@item @samp{reduc_splus_@var{m}}, @samp{reduc_uplus_@var{m}}
+Compute the sum of the signed/unsigned elements of a vector. The vector is
+operand 1, and the result is stored in the least significant bits of operand 0
 (also a vector). The output and input vector should have the same modes.
+These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_plus_scal_@var{m}}.
+
+@cindex @code{reduc_smin_scal_@var{m}} instruction pattern
+@cindex @code{reduc_smax_scal_@var{m}} instruction pattern
+@item @samp{reduc_smin_scal_@var{m}}, @samp{reduc_smax_scal_@var{m}}
+Find the signed minimum/maximum of the elements of a vector. The vector is
+operand 1, and operand 0 is the scalar result, with mode equal to the mode of
+the elements of the input vector.
+
+@cindex @code{reduc_umin_scal_@var{m}} instruction pattern
+@cindex @code{reduc_umax_scal_@var{m}} instruction pattern
+@item @samp{reduc_umin_scal_@var{m}}, @samp{reduc_umax_scal_@var{m}}
+Find the unsigned minimum/maximum of the elements of a vector. The vector is
+operand 1, and operand 0 is the scalar result, with mode equal to the mode of
+the elements of the input vector.
+
+@cindex @code{reduc_plus_scal_@var{m}} instruction pattern
+@item @samp{reduc_plus_scal_@var{m}}
+Compute the sum of the elements of a vector. The vector is operand 1, and
+operand 0 is the scalar result, with mode equal to the mode of the elements of
+the input vector.
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item 

[PATCH 4/14][AArch64] Use new reduc_plus_scal optabs, inc. for __builtins

2014-09-18 Thread Alan Lawrence
This migrates AArch64 over to the new optab for 'plus' reductions, i.e. so the 
define_expands produce scalars by generating a MOV to a GPR. Effectively, this 
moves the vget_lane inside every arm_neon.h intrinsic, into the inside of the 
define_expand.


Tested: aarch64.exp vect.exp on aarch64-none-elf and aarch64_be-none-elf (full 
check-gcc on next patch for reduc_min/max)


gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def
(reduc_splus_mode/VDQF, reduc_uplus_mode/VDQF, reduc_splus_v4sf):
Remove.
(reduc_plus_scal_mode, reduc_plus_scal_v4sf): New.

* config/aarch64/aarch64-simd.md (reduc_surplus_mode): Remove.
(reduc_splus_mode, reduc_uplus_mode, reduc_plus_scal_mode): New.

(reduc_surplus_mode): Change SUADDV -> UNSPEC_ADDV, rename to...
(aarch64_reduc_plus_internalmode): ...this.

(reduc_surplus_v2si): Change SUADDV -> UNSPEC_ADDV, rename to...
(aarch64_reduc_plus_internalv2si): ...this.

(reduc_splus_mode/V2F): Rename to...
(aarch64_reduc_plus_internalmode): ...this.

* config/aarch64/iterators.md
(UNSPEC_SADDV, UNSPEC_UADDV, SUADDV): Remove.
(UNSPEC_ADDV): New.
(sur): Remove elements for UNSPEC_SADDV and UNSPEC_UADDV.

* config/aarch64/arm_neon.h (vaddv_s8, vaddv_s16, vaddv_s32, vaddv_u8,
vaddv_u16, vaddv_u32, vaddvq_s8, vaddvq_s16, vaddvq_s32, vaddvq_s64,
vaddvq_u8, vaddvq_u16, vaddvq_u32, vaddvq_u64, vaddv_f32, vaddvq_f32,
vaddvq_f64): Change __builtin_aarch64_reduc_[us]plus_... to
__builtin_aarch64_reduc_plus_scal, remove vget_lane wrapper.diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 4f3bd12c8447e7125dfeba3f06536cdf9acc2440..ae4ab42e3e3df7de4e4b2c5e46a1476a2ed64175 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -248,9 +248,8 @@
   BUILTIN_VSDQ_I_DI (BINOP, cmgtu, 0)
   BUILTIN_VSDQ_I_DI (BINOP, cmtst, 0)
 
-  /* Implemented by reduc_surplus_mode.  */
-  BUILTIN_VALL (UNOP, reduc_splus_, 10)
-  BUILTIN_VDQ (UNOP, reduc_uplus_, 10)
+  /* Implemented by aarch64_reduc_plus_mode.  */
+  BUILTIN_VALL (UNOP, reduc_plus_scal_, 10)
 
   /* Implemented by reduc_maxmin_uns_mode.  */
   BUILTIN_VDQIF (UNOP, reduc_smax_, 10)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f5fa4aebe4cafe1430b31ca3a89ec5f3698d23bd..23b89584d9ba1d88ff49bfa28d210b325e7dea7f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1719,25 +1719,74 @@
 
 ;; 'across lanes' add.
 
-(define_insn reduc_surplus_mode
+(define_expand reduc_splus_mode
+  [(match_operand:VALL 0 register_operand =w)
+   (match_operand:VALL 1 register_operand w)]
+  TARGET_SIMD
+  {
+/* Old optab/standard name, should not be used since we are providing
+   newer reduc_plus_scal_mode.  */
+gcc_unreachable ();
+  }
+)
+
+(define_expand reduc_uplus_mode
+  [(match_operand:VALL 0 register_operand =w)
+   (match_operand:VALL 1 register_operand w)]
+  TARGET_SIMD
+  {
+/* Old optab/standard name, should not be used since we are providing
+   newer reduc_plus_scal_mode.  */
+gcc_unreachable ();
+  }
+)
+
+(define_expand reduc_plus_scal_mode
+  [(match_operand:VEL 0 register_operand =w)
+   (unspec:VDQ [(match_operand:VDQ 1 register_operand w)]
+	   UNSPEC_ADDV)]
+  TARGET_SIMD
+  {
+rtx elt = GEN_INT (ENDIAN_LANE_N (MODEmode, 0));
+rtx scratch = gen_reg_rtx (MODEmode);
+emit_insn (gen_aarch64_reduc_plus_internalmode (scratch, operands[1]));
+emit_insn (gen_aarch64_get_lanemode (operands[0], scratch, elt));
+DONE;
+  }
+)
+
+(define_expand reduc_plus_scal_mode
+  [(match_operand:VEL 0 register_operand =w)
+   (match_operand:V2F 1 register_operand w)]
+  TARGET_SIMD
+  {
+rtx elt = GEN_INT (ENDIAN_LANE_N (MODEmode, 0));
+rtx scratch = gen_reg_rtx (MODEmode);
+emit_insn (gen_aarch64_reduc_plus_internalmode (scratch, operands[1]));
+emit_insn (gen_aarch64_get_lanemode (operands[0], scratch, elt));
+DONE;
+  }
+)
+
+(define_insn aarch64_reduc_plus_internalmode
  [(set (match_operand:VDQV 0 register_operand =w)
(unspec:VDQV [(match_operand:VDQV 1 register_operand w)]
-		SUADDV))]
+		UNSPEC_ADDV))]
  TARGET_SIMD
  addVDQV:vp\\t%Vetype0, %1.Vtype
   [(set_attr type neon_reduc_addq)]
 )
 
-(define_insn reduc_surplus_v2si
+(define_insn aarch64_reduc_plus_internalv2si
  [(set (match_operand:V2SI 0 register_operand =w)
(unspec:V2SI [(match_operand:V2SI 1 register_operand w)]
-		SUADDV))]
+		UNSPEC_ADDV))]
  TARGET_SIMD
  addp\\t%0.2s, %1.2s, %1.2s
   [(set_attr type neon_reduc_add)]
 )
 
-(define_insn reduc_splus_mode
+(define_insn aarch64_reduc_plus_internalmode
  [(set (match_operand:V2F 0 register_operand =w)
(unspec:V2F [(match_operand:V2F 1 register_operand w)]
 		   UNSPEC_FADDV))]
@@ 

[PATCH i386 AVX512] [44/n] Add vshufps insn patterns.

2014-09-18 Thread Kirill Yukhin
Hello,
The patch below extends AVX-512 shufps.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_expand avx_shufps256mask_expand4_name): Add masking.
(define_insn avx_shufps256_1mask_name): Ditto.
(define_expand sse_shufpsmask_expand4_name): Ditto.
(define_insn sse_shufps_v4sf_mask): New.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 295f11a..9151063 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5805,7 +5805,7 @@
(set_attr prefix evex)
(set_attr mode V16SF)])
 
-(define_expand avx_shufps256
+(define_expand avx_shufps256mask_expand4_name
   [(match_operand:V8SF 0 register_operand)
(match_operand:V8SF 1 register_operand)
(match_operand:V8SF 2 nonimmediate_operand)
@@ -5813,25 +5813,28 @@
   TARGET_AVX
 {
   int mask = INTVAL (operands[3]);
-  emit_insn (gen_avx_shufps256_1 (operands[0], operands[1], operands[2],
- GEN_INT ((mask  0)  3),
- GEN_INT ((mask  2)  3),
- GEN_INT (((mask  4)  3) + 8),
- GEN_INT (((mask  6)  3) + 8),
- GEN_INT (((mask  0)  3) + 4),
- GEN_INT (((mask  2)  3) + 4),
- GEN_INT (((mask  4)  3) + 12),
- GEN_INT (((mask  6)  3) + 12)));
+  emit_insn (gen_avx_shufps256_1mask_expand4_name (operands[0],
+operands[1],
+operands[2],
+GEN_INT ((mask  0)  3),
+GEN_INT ((mask  2)  3),
+GEN_INT (((mask  4)  3) 
+ 8),
+GEN_INT (((mask  6)  3) 
+ 8),
+GEN_INT (((mask  0)  3) 
+ 4),
+GEN_INT (((mask  2)  3) 
+ 4),
+GEN_INT (((mask  4)  3) 
+ 12),
+GEN_INT (((mask  6)  3) 
+ 12)
+mask_expand4_args));
   DONE;
 })
 
 ;; One bit in mask selects 2 elements.
-(define_insn avx_shufps256_1
-  [(set (match_operand:V8SF 0 register_operand =x)
+(define_insn avx_shufps256_1mask_name
+  [(set (match_operand:V8SF 0 register_operand =v)
(vec_select:V8SF
  (vec_concat:V16SF
-   (match_operand:V8SF 1 register_operand x)
-   (match_operand:V8SF 2 nonimmediate_operand xm))
+   (match_operand:V8SF 1 register_operand v)
+   (match_operand:V8SF 2 nonimmediate_operand vm))
  (parallel [(match_operand 3  const_0_to_3_operand  )
 (match_operand 4  const_0_to_3_operand  )
 (match_operand 5  const_8_to_11_operand )
@@ -5841,6 +5844,7 @@
 (match_operand 9  const_12_to_15_operand)
 (match_operand 10 const_12_to_15_operand)])))]
   TARGET_AVX
+mask_avx512vl_condition
 (INTVAL (operands[3]) == (INTVAL (operands[7]) - 4)
 INTVAL (operands[4]) == (INTVAL (operands[8]) - 4)
 INTVAL (operands[5]) == (INTVAL (operands[9]) - 4)
@@ -5853,14 +5857,14 @@
   mask |= (INTVAL (operands[6]) - 8)  6;
   operands[3] = GEN_INT (mask);
 
-  return vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3};
+  return vshufps\t{%3, %2, %1, %0mask_operand11|%0mask_operand11, %1, %2, 
%3};
 }
   [(set_attr type sseshuf)
(set_attr length_immediate 1)
-   (set_attr prefix vex)
+   (set_attr prefix mask_prefix)
(set_attr mode V8SF)])
 
-(define_expand sse_shufps
+(define_expand sse_shufpsmask_expand4_name
   [(match_operand:V4SF 0 register_operand)
(match_operand:V4SF 1 register_operand)
(match_operand:V4SF 2 nonimmediate_operand)
@@ -5868,14 +5872,46 @@
   TARGET_SSE
 {
   int mask = INTVAL (operands[3]);
-  emit_insn (gen_sse_shufps_v4sf (operands[0], operands[1], operands[2],
-  GEN_INT ((mask  0)  3),
-  GEN_INT ((mask  2)  3),
-  GEN_INT (((mask  4)  3) + 4),
-  GEN_INT (((mask  6)  3) + 4)));
+  emit_insn (gen_sse_shufps_v4sfmask_expand4_name (operands[0],
+operands[1],
+operands[2],
+GEN_INT ((mask  0)  3),
+GEN_INT ((mask  2)  3),
+GEN_INT (((mask  4)  3) 
+ 4),
+

[PATCH 5/14][AArch64] Use new reduc_[us](min|max)_scal optabs, inc. for builtins

2014-09-18 Thread Alan Lawrence
Similarly to the previous patch (r/2205), this migrates AArch64 to the new 
reduce-to-scalar optabs for min and max. For consistency we apply the same 
treatment to the smax_nan and smin_nan patterns (used for __builtins), even 
though reduc_smin_nan_scal (etc.) is not a standard name.


Tested: check-gcc on aarch64-none-elf and aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (reduc_smax_, reduc_smin_,
reduc_umax_, reduc_umin_, reduc_smax_nan_, reduc_smin_nan_): Remove.
(reduc_smax_scal_, reduc_smin_scal_, reduc_umax_scal_,
reduc_umin_scal_, reduc_smax_nan_scal_, reduc_smin_nan_scal_): New.

* config/aarch64/aarch64-simd.md
(reduc_maxmin_uns_mode): Rename VDQV_S variant to...
(reduc_maxmin_uns_internalmode): ...this.
(reduc_maxmin_uns_mode): New (VDQ_BHSI).
(reduc_maxmin_uns_scal_mode): New (*2).

(reduc_maxmin_uns_v2si): Combine with below, renaming...
(reduc_maxmin_uns_mode): Combine V2F with above, renaming...
(reduc_maxmin_uns_internal_mode): ...to this (VDQF).

* config/aarch64/arm_neon.h (vmaxv_f32, vmaxv_s8, vmaxv_s16,
vmaxv_s32, vmaxv_u8, vmaxv_u16, vmaxv_u32, vmaxvq_f32, vmaxvq_f64,
vmaxvq_s8, vmaxvq_s16, vmaxvq_s32, vmaxvq_u8, vmaxvq_u16, vmaxvq_u32,
vmaxnmv_f32, vmaxnmvq_f32, vmaxnmvq_f64, vminv_f32, vminv_s8,
vminv_s16, vminv_s32, vminv_u8, vminv_u16, vminv_u32, vminvq_f32,
vminvq_f64, vminvq_s8, vminvq_s16, vminvq_s32, vminvq_u8, vminvq_u16,
vminvq_u32, vminnmv_f32, vminnmvq_f32, vminnmvq_f64): Update to use
__builtin_aarch64_reduc_..._scal; remove vget_lane wrapper.diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index ae4ab42e3e3df7de4e4b2c5e46a1476a2ed64175..e213b9ce3adfc0c4c50b4dc34f4f1b995d5e8042 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -251,13 +251,13 @@
   /* Implemented by aarch64_reduc_plus_mode.  */
   BUILTIN_VALL (UNOP, reduc_plus_scal_, 10)
 
-  /* Implemented by reduc_maxmin_uns_mode.  */
-  BUILTIN_VDQIF (UNOP, reduc_smax_, 10)
-  BUILTIN_VDQIF (UNOP, reduc_smin_, 10)
-  BUILTIN_VDQ_BHSI (UNOP, reduc_umax_, 10)
-  BUILTIN_VDQ_BHSI (UNOP, reduc_umin_, 10)
-  BUILTIN_VDQF (UNOP, reduc_smax_nan_, 10)
-  BUILTIN_VDQF (UNOP, reduc_smin_nan_, 10)
+  /* Implemented by reduc_maxmin_uns_scal_mode (producing scalar).  */
+  BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10)
+  BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10)
+  BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10)
+  BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10)
+  BUILTIN_VDQF (UNOP, reduc_smax_nan_scal_, 10)
+  BUILTIN_VDQF (UNOP, reduc_smin_nan_scal_, 10)
 
   /* Implemented by maxminmode3.
  smax variants map to fmaxnm,
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 23b89584d9ba1d88ff49bfa28d210b325e7dea7f..d4a745be59897b4cb2a0de23adb56b5d79203592 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1828,7 +1828,64 @@
 
 ;; 'across lanes' max and min ops.
 
-(define_insn reduc_maxmin_uns_mode
+(define_expand reduc_maxmin_uns_mode
+  [(match_operand:VDQ_BHSI 0 register_operand)
+   (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 register_operand)]
+		MAXMINV)]
+  TARGET_SIMD
+  {
+/* Old optab/standard name, should not be used since we are providing
+newer reduc_..._scal_mode.  */
+gcc_unreachable ();
+  }
+)
+
+(define_expand reduc_maxmin_uns_mode
+  [(match_operand:VDQF 0 register_operand)
+   (unspec:VDQF [(match_operand:VDQF 1 register_operand)]
+		FMAXMINV)]
+  TARGET_SIMD
+  {
+/* Old optab/standard name, should not be used since we are providing
+newer reduc_..._scal_mode.  */
+gcc_unreachable ();
+  }
+)
+
+;; Template for outputting a scalar, so we can create __builtins which can be
+;; gimple_fold'd to the REDUC_(MAX|MIN)_EXPR tree code.  (This is FP smax/smin).
+(define_expand reduc_maxmin_uns_scal_mode
+  [(match_operand:VEL 0 register_operand)
+   (unspec:VDQF [(match_operand:VDQF 1 register_operand)]
+		FMAXMINV)]
+  TARGET_SIMD
+  {
+rtx elt = GEN_INT (ENDIAN_LANE_N (MODEmode, 0));
+rtx scratch = gen_reg_rtx (MODEmode);
+emit_insn (gen_aarch64_reduc_maxmin_uns_internalmode (scratch,
+			  operands[1]));
+emit_insn (gen_aarch64_get_lanemode (operands[0], scratch, elt));
+DONE;
+  }
+)
+
+;; Likewise for integer cases, signed and unsigned.
+(define_expand reduc_maxmin_uns_scal_mode
+  [(match_operand:VEL 0 register_operand)
+   (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 register_operand)]
+		MAXMINV)]
+  TARGET_SIMD
+  {
+rtx elt = GEN_INT (ENDIAN_LANE_N (MODEmode, 0));
+rtx scratch = gen_reg_rtx (MODEmode);
+emit_insn (gen_aarch64_reduc_maxmin_uns_internalmode (scratch,
+			  operands[1]));
+emit_insn (gen_aarch64_get_lanemode (operands[0], 

[PATCH i386 AVX512] [45/n] Add vshufpd insn patterns.

2014-09-18 Thread Kirill Yukhin
Hello,
This patch supports AVX-512's vshufpd insns.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_expand avx_shufpd256mask_expand4_name): Add masking.
(define_insn avx_shufpd256_1mask_name): Ditto.
(define_expand sse2_shufpdmask_expand4_name): Ditto.
(define_insn sse2_shufpd_v2df_mask): New.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9151063..9e0c0e8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -7790,7 +7790,7 @@
(set_attr prefix evex)
(set_attr mode V8DF)])
 
-(define_expand avx_shufpd256
+(define_expand avx_shufpd256mask_expand4_name
   [(match_operand:V4DF 0 register_operand)
(match_operand:V4DF 1 register_operand)
(match_operand:V4DF 2 nonimmediate_operand)
@@ -7798,25 +7798,28 @@
   TARGET_AVX
 {
   int mask = INTVAL (operands[3]);
-  emit_insn (gen_avx_shufpd256_1 (operands[0], operands[1], operands[2],
-  GEN_INT (mask  1),
-  GEN_INT (mask  2 ? 5 : 4),
-  GEN_INT (mask  4 ? 3 : 2),
-  GEN_INT (mask  8 ? 7 : 6)));
+  emit_insn (gen_avx_shufpd256_1mask_expand4_name (operands[0],
+operands[1],
+operands[2],
+GEN_INT (mask  1),
+GEN_INT (mask  2 ? 5 : 4),
+GEN_INT (mask  4 ? 3 : 2),
+GEN_INT (mask  8 ? 7 : 6)
+mask_expand4_args));
   DONE;
 })
 
-(define_insn avx_shufpd256_1
-  [(set (match_operand:V4DF 0 register_operand =x)
+(define_insn avx_shufpd256_1mask_name
+  [(set (match_operand:V4DF 0 register_operand =v)
(vec_select:V4DF
  (vec_concat:V8DF
-   (match_operand:V4DF 1 register_operand x)
-   (match_operand:V4DF 2 nonimmediate_operand xm))
+   (match_operand:V4DF 1 register_operand v)
+   (match_operand:V4DF 2 nonimmediate_operand vm))
  (parallel [(match_operand 3 const_0_to_1_operand)
 (match_operand 4 const_4_to_5_operand)
 (match_operand 5 const_2_to_3_operand)
 (match_operand 6 const_6_to_7_operand)])))]
-  TARGET_AVX
+  TARGET_AVX  mask_avx512vl_condition
 {
   int mask;
   mask = INTVAL (operands[3]);
@@ -7825,14 +7828,14 @@
   mask |= (INTVAL (operands[6]) - 6)  3;
   operands[3] = GEN_INT (mask);
 
-  return vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3};
+  return vshufpd\t{%3, %2, %1, %0mask_operand7|%0mask_operand7, %1, %2, 
%3};
 }
   [(set_attr type sseshuf)
(set_attr length_immediate 1)
(set_attr prefix vex)
(set_attr mode V4DF)])
 
-(define_expand sse2_shufpd
+(define_expand sse2_shufpdmask_expand4_name
   [(match_operand:V2DF 0 register_operand)
(match_operand:V2DF 1 register_operand)
(match_operand:V2DF 2 nonimmediate_operand)
@@ -7840,12 +7843,38 @@
   TARGET_SSE2
 {
   int mask = INTVAL (operands[3]);
-  emit_insn (gen_sse2_shufpd_v2df (operands[0], operands[1], operands[2],
-   GEN_INT (mask  1),
-   GEN_INT (mask  2 ? 3 : 2)));
+  emit_insn (gen_sse2_shufpd_v2dfmask_expand4_name (operands[0], operands[1],
+ operands[2], GEN_INT 
(mask  1),
+ GEN_INT (mask  2 ? 3 : 2)
+ mask_expand4_args));
   DONE;
 })
 
+(define_insn sse2_shufpd_v2df_mask
+  [(set (match_operand:V2DF 0 register_operand =v)
+(vec_merge:V2DF
+ (vec_select:V2DF
+   (vec_concat:V4DF
+ (match_operand:V2DF 1 register_operand v)
+ (match_operand:V2DF 2 nonimmediate_operand vm))
+   (parallel [(match_operand 3 const_0_to_1_operand)
+  (match_operand 4 const_2_to_3_operand)]))
+  (match_operand:V2DF 5 vector_move_operand 0C)
+  (match_operand:QI 6 register_operand Yk)))]
+  TARGET_AVX512VL
+{
+  int mask;
+  mask = INTVAL (operands[3]);
+  mask |= (INTVAL (operands[4]) - 2)  1;
+  operands[3] = GEN_INT (mask);
+
+  return vshufpd\t{%3, %2, %1, %0%{%6%}%N5|%0%{6%}%N5, %1, %2, %3};
+}
+  [(set_attr type sseshuf)
+   (set_attr length_immediate 1)
+   (set_attr prefix evex)
+   (set_attr mode V2DF)])
+
 ;; punpcklqdq and punpckhqdq are shorter than shufpd.
 (define_insn avx2_interleave_highv4dimask_name
   [(set (match_operand:V4DI 0 register_operand =v)


[PATCH 6/14][AArch64] Restore gimple_folding of reduction intrinsics

2014-09-18 Thread Alan Lawrence
This gives us back the constant-folding of the neon-intrinsics that was removed 
in the first patch, but is now OK for bigendian too.


bootstrapped on aarch64-none-linux-gnu.
check-gcc on aarch64-none-elf and aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64.c (TARGET_GIMPLE_FOLD_BUILTIN): Define again.
* config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin):
Restore, enable for bigendian, update to use __builtin..._scal...diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 15eb7c686d95b1d66cbd514500ec29ba074eaa3f..0432d3aa1a515a15b051ba89afec7c0306cb5803 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1333,9 +1333,6 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args,
   return NULL_TREE;
 }
 
-/* Handling of reduction operations temporarily removed so as to decouple
-   changes to tree codes from AArch64 NEON Intrinsics.  */
-#if 0
 bool
 aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 {
@@ -1345,19 +1342,6 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   tree fndecl;
   gimple new_stmt = NULL;
 
-  /* The operations folded below are reduction operations.  These are
- defined to leave their result in the 0'th element (from the perspective
- of GCC).  The architectural instruction we are folding will leave the
- result in the 0'th element (from the perspective of the architecture).
- For big-endian systems, these perspectives are not aligned.
-
- It is therefore wrong to perform this fold on big-endian.  There
- are some tricks we could play with shuffling, but the mid-end is
- inconsistent in the way it treats reduction operations, so we will
- end up in difficulty.  Until we fix the ambiguity - just bail out.  */
-  if (BYTES_BIG_ENDIAN)
-return false;
-
   if (call)
 {
   fndecl = gimple_call_fndecl (stmt);
@@ -1369,23 +1353,28 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 			? gimple_call_arg_ptr (stmt, 0)
 			: error_mark_node);
 
+	  /* We use gimple's REDUC_(PLUS|MIN|MAX)_EXPRs for float, signed int
+	 and unsigned int; it will distinguish according to the types of
+	 the arguments to the __builtin.  */
 	  switch (fcode)
 	{
-	  BUILTIN_VALL (UNOP, reduc_splus_, 10)
-		new_stmt = gimple_build_assign_with_ops (
+	  BUILTIN_VALL (UNOP, reduc_plus_scal_, 10)
+	new_stmt = gimple_build_assign_with_ops (
 		REDUC_PLUS_EXPR,
 		gimple_call_lhs (stmt),
 		args[0],
 		NULL_TREE);
 		break;
-	  BUILTIN_VDQIF (UNOP, reduc_smax_, 10)
+	  BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10)
+	  BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10)
 		new_stmt = gimple_build_assign_with_ops (
 		REDUC_MAX_EXPR,
 		gimple_call_lhs (stmt),
 		args[0],
 		NULL_TREE);
 		break;
-	  BUILTIN_VDQIF (UNOP, reduc_smin_, 10)
+	  BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10)
+	  BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10)
 		new_stmt = gimple_build_assign_with_ops (
 		REDUC_MIN_EXPR,
 		gimple_call_lhs (stmt),
@@ -1407,7 +1396,6 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 
   return changed;
 }
-#endif
 
 void
 aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9197ec038b7d40a601c886b846113c50a29cf5e2..e7946fc0b70ced70a4e98caa0a33121f29242aad 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9925,8 +9925,8 @@ aarch64_expand_movmem (rtx *operands)
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
 
-//#undef TARGET_GIMPLE_FOLD_BUILTIN
-//#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
+#undef TARGET_GIMPLE_FOLD_BUILTIN
+#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
 #undef TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR aarch64_gimplify_va_arg_expr

Re: Fix i386 FP_TRAPPING_EXCEPTIONS

2014-09-18 Thread Joseph S. Myers
On Thu, 18 Sep 2014, Uros Bizjak wrote:

 OK for mainline and release branches.

I've omitted ia64 from the targets in the testcase in the release branch 
version, given the lack of any definition of FP_TRAPPING_EXCEPTIONS at all 
there.

(I think a definition as (~_fcw  0x3f) should work for ia64, but haven't 
tested that.)

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 7/14][Testsuite] Add tests of reductions using whole-vector-shifts (multiplication)

2014-09-18 Thread Alan Lawrence
For reduction operations (e.g. multiply) that don't have such a tree code, or 
where the target platform doesn't define an optab handler for the tree code, we 
can perform the reduction using a series of log(N) shifts (where N = #elements 
in vector), using the VEC_RSHIFT_EXPR=whole-vector-shift tree code (if the 
platform handles the vec_shr_optab).
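
As a concrete illustration of the log(N) idea, here is a small, runnable C sketch in which a plain array stands in for the vector register; it mirrors the shift-and-combine epilogue conceptually and is not the code the vectorizer emits:

#include <stdio.h>

#define N 16   /* number of vector elements */

int
main (void)
{
  unsigned char v[N], expected = 1;

  for (int i = 0; i < N; i++)
    {
      v[i] = 2 * i + 1;
      expected *= v[i];          /* scalar reference result */
    }

  /* log2(16) = 4 steps: each step shifts the "vector" down by half the
     remaining elements and multiplies, so the answer ends up in v[0].  */
  for (int step = N / 2; step >= 1; step /= 2)
    for (int i = 0; i < step; i++)
      v[i] = (unsigned char) (v[i] * v[i + step]);

  printf ("%d %d\n", v[0], expected);   /* both print 33 for this input */
  return 0;
}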


First stage is to add some tests of non-(min/max/plus) reductions; here, 
multiplies. The first is designed to be non-foldable, so we make sure the 
architectural instructions line up with what the tree codes specify. The second 
is designed to be easily constant-propagated, to test the (currently 
endianness-dependent) constant folding code.


In lib/target-supports.exp, I've defined a new 
check_effective_target_whole_vector_shift, which I intended to define to true 
for platforms with the vec_shr optab. However, I've not managed to make this 
test pass on PowerPC - even with -maltivec, -fdump-tree-vect-details gives me a 
message about the target not supporting vector multiplication - so I've omitted 
PowerPC from the whole_vector_shift. This doesn't feel right, suggestions 
welcomed from PowerPC maintainers?


Tests passing on arm-none-eabi and x86_64-none-linux-gnu;
also verified the scan-tree-dump part works on ia64-none-linux-gnu (by compiling 
to assembly only).
(Tests are not run on AArch64, because we have no vec_shr_optab at this point; 
PowerPC, as above; or MIPS, as check_effective_target_vect_int_mult yields 0.)


gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_whole_vector_shift):
New.

* gcc.dg/vect/vect-reduc-mul_1.c: New test.
* gcc.dg/vect/vect-reduc-mul_2.c: New test.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_1.c
new file mode 100644
index ..44f026ff9b561bcf314224c44d51bdd19448851b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_1.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_int_mult } */
+/* { dg-require-effective-target whole_vector_shift } */
+
+/* Write a reduction loop to be reduced using vector shifts.  */
+
+extern void abort(void);
+
+unsigned char in[16];
+
+int
+main (unsigned char argc, char **argv)
+{
+  unsigned char i = 0;
+  unsigned char sum = 1;
+
+  for (i = 0; i < 16; i++)
+    in[i] = i + i + 1;
+
+  /* Prevent constant propagation of the entire loop below.  */
+  asm volatile ("" : : : "memory");
+
+  for (i = 0; i < 16; i++)
+sum *= in[i];
+
+  if (sum != 33)
+{
+      __builtin_printf("Failed %d\n", sum);
+  abort();
+}
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_2.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_2.c
new file mode 100644
index ..414fba7a5c96c4dd89030682492edb57ebba3b16
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_2.c
@@ -0,0 +1,32 @@
+/* { dg-require-effective-target vect_int_mult } */
+/* { dg-require-effective-target whole_vector_shift } */
+
+/* Write a reduction loop to be reduced using vector shifts and folded.  */
+
+extern void abort(void);
+
+int
+main (unsigned char argc, char **argv)
+{
+  unsigned char in[16];
+  unsigned char i = 0;
+  unsigned char sum = 1;
+
+  for (i = 0; i < 16; i++)
+    in[i] = i + i + 1;
+
+  for (i = 0; i < 16; i++)
+sum *= in[i];
+
+  if (sum != 33)
+{
+      __builtin_printf("Failed %d\n", sum);
+  abort();
+}
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index fa5137ea472e1773be60759caad32bbc7ab4c551..0f4bebd533c9268adfcd4ed250f06fca825c92b1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3320,6 +3320,22 @@ proc check_effective_target_vect_shift { } {
 return $et_vect_shift_saved
 }
 
+proc check_effective_target_whole_vector_shift { } {
+if { [istarget x86_64-*-*]
+	 || [istarget ia64-*-*]
+	 || ([check_effective_target_arm32]
+	     && [check_effective_target_arm_little_endian])
+	 || ([istarget mips*-*-*]
+	     && [check_effective_target_mips_loongson]) } {
+	set answer 1
+} else {
+	set answer 0
+}
+
+    verbose "check_effective_target_whole_vector_shift: returning $answer" 2
+return $answer
+}
+
 # Return 1 if the target supports vector bswap operations.
 
 proc check_effective_target_vect_bswap { } {

[PATCH 8/14][Testsuite] Add tests of reductions using whole-vector-shifts (ior)

2014-09-18 Thread Alan Lawrence
These are like the previous patch, but using | rather than * - I was unable to 
get the previous test to pass on PowerPC and MIPS.


I note there is no inherent vector operation here - a bitwise OR across a word, 
and a reduction via shifts using scalar (not vector) ops would be all that's 
necessary. However, GCC doesn't exploit this possibility at present, and I don't 
have any plans at present to add such myself.
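
For what it's worth, a runnable sketch of that observation (purely illustrative, not what GCC generates): OR the two 64-bit halves of the 16 bytes, then fold the result with plain scalar shifts:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int
main (void)
{
  unsigned char in[16], ref = 0;
  uint64_t w0, w1, x;

  for (int i = 0; i < 16; i++)
    {
      in[i] = (unsigned char) ((2 * i + 1) & 0xfd);
      ref |= in[i];                       /* element-wise reference */
    }

  memcpy (&w0, in, 8);
  memcpy (&w1, in + 8, 8);
  x = w0 | w1;                            /* 16 lanes -> 8 */
  x |= x >> 32;                           /* 8 -> 4 */
  x |= x >> 16;                           /* 4 -> 2 */
  x |= x >> 8;                            /* 2 -> 1 */

  printf ("%u %u\n", (unsigned) (x & 0xff), (unsigned) ref);  /* both 29 */
  return 0;
}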


Passing on x86_64-linux-gnu, aarch64-none-elf, aarch64_be-none-elf, 
arm-none-eabi.
The 'scan-tree-dump' part passes on mips64 and powerpc (although the latter is 
disabled as check_effective_target_whole_vector_shift gives 0, as per previous 
patch)


gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-reduc-or_1.c: New test.
* gcc.dg/vect/vect-reduc-or_2.c: Likewise.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
new file mode 100644
index ..4e1a8577ce21aad539fca7cf07700b99575dfab0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
@@ -0,0 +1,35 @@
+/* { dg-require-effective-target whole_vector_shift } */
+
+/* Write a reduction loop to be reduced using vector shifts.  */
+
+extern void abort(void);
+
+unsigned char in[16] __attribute__((__aligned__(16)));
+
+int
+main (unsigned char argc, char **argv)
+{
+  unsigned char i = 0;
+  unsigned char sum = 1;
+
+  for (i = 0; i < 16; i++)
+    in[i] = (i + i + 1) & 0xfd;
+
+  /* Prevent constant propagation of the entire loop below.  */
+  asm volatile ("" : : : "memory");
+
+  for (i = 0; i < 16; i++)
+sum |= in[i];
+
+  if (sum != 29)
+{
+      __builtin_printf("Failed %d\n", sum);
+  abort();
+}
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c
new file mode 100644
index ..e25467e59221adc09cbe0bb7548842902a4bf6da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c
@@ -0,0 +1,31 @@
+/* { dg-require-effective-target whole_vector_shift } */
+
+/* Write a reduction loop to be reduced using vector shifts and folded.  */
+
+extern void abort(void);
+
+int
+main (unsigned char argc, char **argv)
+{
+  unsigned char in[16] __attribute__((aligned(16)));
+  unsigned char i = 0;
+  unsigned char sum = 1;
+
+  for (i = 0; i < 16; i++)
+    in[i] = (i + i + 1) & 0xfd;
+
+  for (i = 0; i < 16; i++)
+sum |= in[i];
+
+  if (sum != 29)
+{
+      __builtin_printf("Failed %d\n", sum);
+  abort();
+}
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+

[PATCH 9/14] Enforce whole-vector-shifts to always be by a whole number of elements

2014-09-18 Thread Alan Lawrence
The VEC_RSHIFT_EXPR is only ever used by the vectorizer in tree-vect-loop.c 
(vect_create_epilog_for_reduction), to shift the vector by a whole number of 
elements. The tree code allows more general shifts but only for integral types. 
This only causes pain and difficulty for backends (particularly for backends 
with different endiannesses), and enforcing that restriction for integral types 
too does no harm.


bootstrapped on aarch64-none-linux-gnu and x86-64-none-linux-gnu
check-gcc on aarch64-none-elf and x86_64-none-linux-gnu

gcc/ChangeLog:

* tree-cfg.c (verify_gimple_assign_binary): for VEC_RSHIFT_EXPR (and
VEC_LSHIFT_EXPR), require shifts to be by a whole number of elements
for all types, rather than only non-integral types.

* tree.def (VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR): Update comment.

* doc/md.texi (vec_shl_m, vec_shr_m): Update comment.

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 3f5fd6f0e3ac3fcc30f6c961e3e2709a35f4d413..a78aea2f3f6e35b0d89719a42d734e62a2f5bd65 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4888,7 +4888,8 @@ of a wider mode.)
 @item @samp{vec_shl_@var{m}}, @samp{vec_shr_@var{m}}
 Whole vector left/right shift in bits.
 Operand 1 is a vector to be shifted.
-Operand 2 is an integer shift amount in bits.
+Operand 2 is an integer shift amount in bits, which must be a multiple of the
+element size.
 Operand 0 is where the resulting shifted vector is stored.
 The output and input vectors should have the same modes.
 
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 49986cc40758bb5998e395c727142e75f7d6e9f4..1ea2e256b09b25331810a57a9c35e5cc875d0404 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3667,14 +3667,11 @@ verify_gimple_assign_binary (gimple stmt)
 	debug_generic_expr (rhs2_type);
 	return true;
 	  }
-	/* For shifting a vector of non-integral components we
-	   only allow shifting by a constant multiple of the element size.  */
-	if (!INTEGRAL_TYPE_P (TREE_TYPE (rhs1_type))
-	    && (TREE_CODE (rhs2) != INTEGER_CST
-		|| !div_if_zero_remainder (rhs2,
-					   TYPE_SIZE (TREE_TYPE (rhs1_type)))))
+	/* All shifts must be by a constant multiple of the element size.  */
+	if (TREE_CODE (rhs2) != INTEGER_CST
+	    || !div_if_zero_remainder (rhs2, TYPE_SIZE (TREE_TYPE (rhs1_type))))
 	  {
-	    error ("non-element sized vector shift of floating point vector");
+	    error ("non-element sized vector shift");
 	return true;
 	  }
 
diff --git a/gcc/tree.def b/gcc/tree.def
index e9af52e554babb100d49ea14f47c805cd5024949..5406ffe67c53ff3f12920ca8c965cf0740a079c2 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1240,7 +1240,8 @@ DEFTREECODE (FMA_EXPR, fma_expr, tcc_expression, 3)
 
 /* Whole vector left/right shift in bits.
Operand 0 is a vector to be shifted.
-   Operand 1 is an integer shift amount in bits.  */
+   Operand 1 is an integer shift amount in bits, which must be a multiple of the
+   element size.  */
 DEFTREECODE (VEC_LSHIFT_EXPR, vec_lshift_expr, tcc_binary, 2)
 DEFTREECODE (VEC_RSHIFT_EXPR, vec_rshift_expr, tcc_binary, 2)
 

v3 PATCH to __dynamic_cast to handle undefined behavior

2014-09-18 Thread Jason Merrill
A while back a customer complained about this program crashing; I explained
that this was because the behavior is undefined. Specifically, it crashes
because when we try to do the dynamic_cast the F vptr is pointing to a
construction vtable for E-in-F, which doesn't have a vbase offset entry
for C, so when dynamic_cast goes looking for that entry it instead loads some
random value from whatever happens to be just before that vtable.  But
while looking at Jakub's -fsanitize=vptr work, it occurred to me that it
would be easy and pretty cheap to catch this situation in dynamic_cast:
if the whole object disagrees with the original subobject about what
type it is, just fail.


Jakub, I think ubsan could use the same approach to check the argument.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit e5f1ca3b03c352380dda95474e06f525ff9849be
Author: Jason Merrill ja...@redhat.com
Date:   Wed Sep 17 14:47:20 2014 -0400

	* libsupc++/dyncast.cc (__dynamic_cast): Handle mid-destruction
	dynamic_cast more gracefully.

diff --git a/gcc/testsuite/g++.dg/rtti/dyncast7.C b/gcc/testsuite/g++.dg/rtti/dyncast7.C
new file mode 100644
index 000..deb4397
--- /dev/null
+++ b/gcc/testsuite/g++.dg/rtti/dyncast7.C
@@ -0,0 +1,28 @@
+// I think this dynamic_cast has undefined behavior when destroying E::o
+// because the F period of destruction has started and ap doesn't
+// point to the object currently being destroyed--but the reasonable
+// options are success or failure, not SEGV.
+
+// { dg-do run }
+
+extern "C" void abort();
+
+struct A { virtual ~A(); };
+struct B { virtual ~B() { } };
+struct C : B, A { };
+struct E : virtual B { A o; };
+struct F : virtual C, virtual E { };
+
+A* ap;
+C* cp;
+
+A::~A() {
+  C* cp2 = dynamic_cast<C*>(ap);
+  if (cp2 != cp && cp2 != 0)
+abort();
+}
+
+int main() {
+  F f;
+  ap = cp = &f;
+}
diff --git a/libstdc++-v3/libsupc++/dyncast.cc b/libstdc++-v3/libsupc++/dyncast.cc
index 2bcb7dd..9f6adef 100644
--- a/libstdc++-v3/libsupc++/dyncast.cc
+++ b/libstdc++-v3/libsupc++/dyncast.cc
@@ -55,6 +55,18 @@ __dynamic_cast (const void *src_ptr,// object started from
   adjust_pointer <void> (src_ptr, prefix->whole_object);
   const __class_type_info *whole_type = prefix->whole_type;
   __class_type_info::__dyncast_result result;
+
+  // If the whole object vptr doesn't refer to the whole object type, we're
+  // in the middle of constructing a primary base, and src is a separate
+  // base.  This has undefined behavior and we can't find anything outside
+  // of the base we're actually constructing, so fail now rather than
+  // segfault later trying to use a vbase offset that doesn't exist.
+  const void *whole_vtable = *static_cast <const void *const *> (whole_ptr);
+  const vtable_prefix *whole_prefix =
+    adjust_pointer <vtable_prefix> (whole_vtable,
+				    -offsetof (vtable_prefix, origin));
+  if (whole_prefix->whole_type != whole_type)
+    return NULL;
   
   whole_type->__do_dyncast (src2dst, __class_type_info::__contained_public,
 dst_type, whole_ptr, src_type, src_ptr, result);


[PATCH 10/14][AArch64] Implement vec_shr optab

2014-09-18 Thread Alan Lawrence
This allows reduction of non-(plus|min|max) operations using log_2(N) shifts 
rather than N vec_extracts; e.g. for example code


int
main (unsigned char argc, char **argv)
{
  unsigned char in[16] = { 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31 };
  unsigned char i = 0;
  unsigned char sum = 1;

  /* Prevent constant propagation of the entire loop below.  */
  asm volatile ("" : : : "memory");

  for (i = 0; i < 16; i++)
    sum *= in[i];

  if (sum != 33)
    __builtin_printf("Failed %d\n", sum);
}

(a simplified, less-general version of vect-reduc-mul_1.c) this gives

main:
ldr q0, .LC0
sub sp, sp, #16
str q0, [sp]
ldr q1, [sp]
movi v0.4s, 0
ext v2.16b, v1.16b, v0.16b, #8
mul v1.16b, v1.16b, v2.16b
ext v2.16b, v1.16b, v0.16b, #4
mul v1.16b, v2.16b, v1.16b
ext v2.16b, v1.16b, v0.16b, #2
mul v1.16b, v2.16b, v1.16b
ext v0.16b, v1.16b, v0.16b, #1
mul v0.16b, v0.16b, v1.16b
umov w1, v0.b[0]
cmp w1, 33
beq .L2
...

rather than previously:

main:
ldr q0, .LC0
sub sp, sp, #16
str q0, [sp]
ldr d1, [sp]
ldr d0, [sp, 8]
mul v0.8b, v0.8b, v1.8b
umov w0, v0.b[1]
umov w3, v0.b[0]
umov w2, v0.b[2]
umov w7, v0.b[3]
umov w6, v0.b[4]
mul w3, w0, w3
umov w5, v0.b[5]
umov w4, v0.b[6]
umov w1, v0.b[7]
mul w3, w3, w2
mul w2, w3, w7
mul w2, w2, w6
mul w0, w2, w5
mul w0, w0, w4
mul w1, w0, w1
uxtbw1, w1
cmp w1, 33
beq .L2
...
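
For reference, the ext/mul sequence in the first listing above is the vector
analogue of the following scalar sketch (illustrative only, not taken from the
patch or the testsuite); each step folds the upper half onto the lower half,
so only log2(16) = 4 multiply steps are needed:

unsigned char
mul_reduce_16 (unsigned char *in)
{
  unsigned char tmp[16];
  __builtin_memcpy (tmp, in, 16);
  /* Each iteration "shifts" the upper half down and multiplies it into
     the lower half, much like ext + mul on the vector registers.  */
  for (int half = 8; half >= 1; half /= 2)
    for (int i = 0; i < half; i++)
      tmp[i] = (unsigned char) (tmp[i] * tmp[i + half]);
  return tmp[0];
}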


Tested check-gcc on aarch64-none-elf and aarch64_be-none-elf. (Including new 
tests from previous patches.)


gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (vec_shrmode): New (*2).

gcc/testsuite/ChangeLog:
* lib/target_supports.exp (check_effective_target_whole_vector_shift):
Add aarch64*-*-*.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d4a745be59897b4cb2a0de23adb56b5d79203592..3fcf809113d73b37a95653b8c2be432478d2bc1e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -770,6 +770,45 @@
   }
 )
 
+;; For 64-bit modes we use ushl/r, as this does not require a SIMD zero.
+(define_insn "vec_shr_<mode>"
+  [(set (match_operand:VD 0 "register_operand" "=w")
+        (lshiftrt:VD (match_operand:VD 1 "register_operand" "w")
+		     (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD"
+  "ushr %d0, %d1, %2"
+  [(set_attr "type" "neon_shift_imm")]
+)
+
+(define_expand "vec_shr_<mode>"
+  [(set (match_operand:VQ 0 "register_operand" "=w")
+        (lshiftrt:VQ (match_operand:VQ 1 "register_operand" "w")
+		     (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD"
+{
+  HOST_WIDE_INT num_bits = INTVAL (operands[2]);
+  HOST_WIDE_INT elem_bits = GET_MODE_BITSIZE (GET_MODE_INNER (<MODE>mode));
+  rtx zero_reg = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
+
+  gcc_assert (GET_MODE_BITSIZE (<MODE>mode) == 128);
+  gcc_assert (num_bits % elem_bits == 0);
+
+  if (num_bits == 0)
+    {
+      emit_move_insn (operands[0], operands[1]);
+      DONE;
+    }
+  else if (num_bits == 128)
+    {
+      emit_move_insn (operands[0], CONST0_RTX (<MODE>mode));
+      DONE;
+    }
+
+  emit_insn (gen_aarch64_ext<mode> (operands[0], operands[1], zero_reg,
+				    GEN_INT (num_bits / elem_bits)));
+  DONE;
+})
+
 (define_insn aarch64_simd_vec_setv2di
   [(set (match_operand:V2DI 0 register_operand =w,w)
 (vec_merge:V2DI
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5e40f5fcdfc95e41e804075bb5daa7030eb9bc66..720cc345bf6a76470cc85116d7b3365be07caa97 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3323,6 +3323,7 @@ proc check_effective_target_vect_shift { } {
 proc check_effective_target_whole_vector_shift { } {
 if { [istarget x86_64-*-*]
 	 || [istarget ia64-*-*]
+	 || [istarget aarch64*-*-*]
 	 || ([check_effective_target_arm32]
 	     && [check_effective_target_arm_little_endian])
 	 || ([istarget mips*-*-*]

[PATCH 11/14] Remove VEC_LSHIFT_EXPR and vec_shl_optab

2014-09-18 Thread Alan Lawrence
The VEC_LSHIFT_EXPR tree code, and the corresponding vec_shl_optab, seem to have 
been added for completeness, providing a counterpart to VEC_RSHIFT_EXPR and 
vec_shr_optab. However, whereas VEC_RSHIFT_EXPRs are generated (only) by the 
vectorizer, VEC_LSHIFT_EXPR expressions are not generated at all, so there seems 
little point in maintaining it.


Bootstrapped on x86_64-unknown-linux-gnu.
aarch64.exp+vect.exp on aarch64-none-elf and aarch64_be-none-elf.

gcc/ChangeLog:

* expr.c (expand_expr_real_2): Remove code handling VEC_LSHIFT_EXPR.
* fold-const.c (const_binop): Likewise.
* cfgexpand.c (expand_debug_expr): Likewise.
* tree-inline.c (estimate_operator_cost, dump_generic_node,
op_code_prio, op_symbol_code): Likewise.
* tree-vect-generic.c (expand_vector_operations_1): Likewise.
* optabs.c (optab_for_tree_code): Likewise.
(expand_vec_shift_expr): Likewise, update comment.
* tree.def: Delete VEC_LSHIFT_EXPR, remove comment.
* optabs.h (expand_vec_shift_expr): Remove comment re. VEC_LSHIFT_EXPR.
* optabs.def: Remove vec_shl_optab.
* doc/md.texi: Remove references to vec_shl_m.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index f6da5d632f441544fdacafc266e9cf17083a825a..6b46b08538c01190215a174773dfcb1109134873 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -4592,7 +4592,6 @@ expand_debug_expr (tree exp)
 case REDUC_MIN_EXPR:
 case REDUC_PLUS_EXPR:
 case VEC_COND_EXPR:
-case VEC_LSHIFT_EXPR:
 case VEC_PACK_FIX_TRUNC_EXPR:
 case VEC_PACK_SAT_EXPR:
 case VEC_PACK_TRUNC_EXPR:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index a78aea2f3f6e35b0d89719a42d734e62a2f5bd65..f94e0f62c622d43e2df0d0619fb1eba74c415165 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4883,10 +4883,9 @@ operand 1. Add operand 1 to operand 2 and place the widened result in
 operand 0. (This is used express accumulation of elements into an accumulator
 of a wider mode.)
 
-@cindex @code{vec_shl_@var{m}} instruction pattern
 @cindex @code{vec_shr_@var{m}} instruction pattern
-@item @samp{vec_shl_@var{m}}, @samp{vec_shr_@var{m}}
-Whole vector left/right shift in bits.
+@item @samp{vec_shr_@var{m}}
+Whole vector right shift in bits.
 Operand 1 is a vector to be shifted.
 Operand 2 is an integer shift amount in bits, which must be a multiple of the
 element size.
diff --git a/gcc/expr.c b/gcc/expr.c
index 11930ca121e4e1f3807261a2e5b0ca4f6723176d..30ea87af3ef102d7071c6c29db37df875af316f5 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9053,7 +9053,6 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 	return temp;
   }
 
-case VEC_LSHIFT_EXPR:
 case VEC_RSHIFT_EXPR:
   {
 	target = expand_vec_shift_expr (ops, target);
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index b8baa94d37a74ebb824e2a4d03f2a10befcdf749..bd4ba5f0c64c710df9fa36d4059f7b08e949fae0 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -1406,8 +1406,7 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
   int count = TYPE_VECTOR_SUBPARTS (type), i;
   tree *elts = XALLOCAVEC (tree, count);
 
-  if (code == VEC_LSHIFT_EXPR
-	  || code == VEC_RSHIFT_EXPR)
+  if (code == VEC_RSHIFT_EXPR)
 	{
 	  if (!tree_fits_uhwi_p (arg2))
 	return NULL_TREE;
@@ -1419,11 +1418,10 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
 	  if (shiftc >= outerc || (shiftc % innerc) != 0)
 	return NULL_TREE;
 	  int offset = shiftc / innerc;
-	  /* The direction of VEC_[LR]SHIFT_EXPR is endian dependent.
-	 For reductions, compiler emits VEC_RSHIFT_EXPR always,
-	 for !BYTES_BIG_ENDIAN picks first vector element, but
-	 for BYTES_BIG_ENDIAN last element from the vector.  */
-	  if ((code == VEC_RSHIFT_EXPR) ^ (!BYTES_BIG_ENDIAN))
+	  /* The direction of VEC_RSHIFT_EXPR is endian dependent.
+	 For reductions, if !BYTES_BIG_ENDIAN then compiler picks first
+	 vector element, but last element if BYTES_BIG_ENDIAN.  */
+	  if (BYTES_BIG_ENDIAN)
 	offset = -offset;
 	  tree zero = build_zero_cst (TREE_TYPE (type));
 	  for (i = 0; i  count; i++)
diff --git a/gcc/optabs.c b/gcc/optabs.c
index e422bcce18d06a39b26547b510c35858efc2303e..9c5b5daa6f2b51bda5ba92fcd61534f1dd55e646 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -515,9 +515,6 @@ optab_for_tree_code (enum tree_code code, const_tree type,
 case REDUC_PLUS_EXPR:
   return reduc_plus_scal_optab;
 
-case VEC_LSHIFT_EXPR:
-  return vec_shl_optab;
-
 case VEC_RSHIFT_EXPR:
   return vec_shr_optab;
 
@@ -765,7 +762,7 @@ force_expand_binop (enum machine_mode mode, optab binoptab,
   return true;
 }
 
-/* Generate insns for VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR.  */
+/* Generate insns for VEC_RSHIFT_EXPR.  */
 
 rtx
 expand_vec_shift_expr (sepops ops, rtx target)
@@ -776,21 +773,10 @@ expand_vec_shift_expr (sepops ops, rtx target)
   enum machine_mode mode = TYPE_MODE (ops->type);
   tree vec_oprnd = 

[PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral

2014-09-18 Thread Alan Lawrence
The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the 
general principles of tree. This patch updates fold-const and the vectorizer 
(the only place where such expressions are created), such that VEC_RSHIFT_EXPR 
always shifts towards element 0.
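
To illustrate the new rule (this example is not part of the patch): for a
four-element vector v, a whole-vector right shift by one element now always
produces { v[1], v[2], v[3], 0 }, on both little- and big-endian targets.  In
scalar terms:

/* Scalar model of the endianness-neutral VEC_RSHIFT_EXPR semantics:
   element i of the result is element i+shift of the input, or zero once
   we run off the end - i.e. data always moves towards element 0.  */
void
vec_rshift_elements (int *v, int nelts, int shift)
{
  for (int i = 0; i < nelts; i++)
    v[i] = (i + shift < nelts) ? v[i + shift] : 0;
}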


The tree code still maps directly onto the vec_shr_optab, and so this patch 
*will break any bigendian platform defining the vec_shr optab*.

-- For AArch64_be, patch follows next in series;
-- For PowerPC, I think patch/rfc 15 should fix, please inspect;
-- For MIPS, I think patch/rfc 16 should fix, please inspect.

gcc/ChangeLog:

* fold-const.c (const_binop): VEC_RSHIFT_EXPR always shifts towards
element 0.

* tree-vect-loop.c (vect_create_epilog_for_reduction): always extract
the result of a reduction with vector shifts from element 0.

* tree.def (VEC_RSHIFT_EXPR, VEC_LSHIFT_EXPR): Comment shift direction.

* doc/md.texi (vec_shr_m, vec_shl_m): Document shift direction.

Testing Done:

Bootstrap and check-gcc on x86_64-none-linux-gnu; check-gcc on aarch64-none-elf.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f94e0f62c622d43e2df0d0619fb1eba74c415165..a2e8f297fbdd69dfec23e6e0769a21917b06b5c7 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4885,7 +4885,7 @@ of a wider mode.)
 
 @cindex @code{vec_shr_@var{m}} instruction pattern
 @item @samp{vec_shr_@var{m}}
-Whole vector right shift in bits.
+Whole vector right shift in bits, i.e. towards element 0.
 Operand 1 is a vector to be shifted.
 Operand 2 is an integer shift amount in bits, which must be a multiple of the
 element size.
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index bd4ba5f0c64c710df9fa36d4059f7b08e949fae0..2a4fafa1b0634edd7a56f2484dec3a51a4699222 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -1418,15 +1418,10 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
 	  if (shiftc = outerc || (shiftc % innerc) != 0)
 	return NULL_TREE;
 	  int offset = shiftc / innerc;
-	  /* The direction of VEC_RSHIFT_EXPR is endian dependent.
-	 For reductions, if !BYTES_BIG_ENDIAN then compiler picks first
-	 vector element, but last element if BYTES_BIG_ENDIAN.  */
-	  if (BYTES_BIG_ENDIAN)
-	offset = -offset;
 	  tree zero = build_zero_cst (TREE_TYPE (type));
 	  for (i = 0; i < count; i++)
 	    {
-	      if (i + offset < 0 || i + offset >= count)
+	      if (i + offset >= count)
 		elts[i] = zero;
 	  else
 		elts[i] = VECTOR_CST_ELT (arg1, i + offset);
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index d0a29d312bfd9a7eb552d937e3c64cf9b30d558a..016e2c1fc839fc4d1c97caaa38064fb8bbb510d8 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3860,7 +3860,7 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt,
   gimple epilog_stmt = NULL;
   enum tree_code code = gimple_assign_rhs_code (stmt);
   gimple exit_phi;
-  tree bitsize, bitpos;
+  tree bitsize;
   tree adjustment_def = NULL;
   tree vec_initial_def = NULL;
   tree reduction_op, expr, def;
@@ -4371,14 +4371,8 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt,
 dump_printf_loc (MSG_NOTE, vect_location,
 			 "extract scalar result\n");
 
-  if (BYTES_BIG_ENDIAN)
-bitpos = size_binop (MULT_EXPR,
- bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
- TYPE_SIZE (scalar_type));
-  else
-bitpos = bitsize_zero_node;
-
-  rhs = build3 (BIT_FIELD_REF, scalar_type, new_temp, bitsize, bitpos);
+  rhs = build3 (BIT_FIELD_REF, scalar_type,
+		new_temp, bitsize, bitsize_zero_node);
   epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
   new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
   gimple_assign_set_lhs (epilog_stmt, new_temp);
diff --git a/gcc/tree.def b/gcc/tree.def
index ff56bfc18bc00e8dac2dfc072fd4fa878a0f2a04..90bc27fde303e1606baac858738a7a86a517573b 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1238,7 +1238,7 @@ DEFTREECODE (WIDEN_LSHIFT_EXPR, widen_lshift_expr, tcc_binary, 2)
before adding operand three.  */
 DEFTREECODE (FMA_EXPR, fma_expr, tcc_expression, 3)
 
-/* Whole vector right shift in bits.
+/* Whole vector right shift in bits, i.e. towards element 0.
Operand 0 is a vector to be shifted.
Operand 1 is an integer shift amount in bits, which must be a multiple of the
element size.  */

Re: [C++ Patch] PR 62232

2014-09-18 Thread Jason Merrill

OK.

Jason


[PATCH 13/14][AArch64_be] Fix vec_shr pattern to correctly implement endianness-neutral optab

2014-09-18 Thread Alan Lawrence
The previous patch broke aarch64_be by redefining VEC_RSHIFT_EXPR / 
vec_shr_optab to always shift the vector towards gcc's element 0. This fixes 
aarch64_be to do that.


check-gcc on aarch64-none-elf (no changes) and aarch64_be-none-elf (fixes all 
regressions produced by previous patch, i.e. no regressions from before 
redefining vec_shr).



gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (vec_shr_mode *2): Fix bigendian.


diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 3fcf809113d73b37a95653b8c2be432478d2bc1e..e45eddbda7528cfbb4b0953b2c9934c5408d2f6d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -776,7 +776,12 @@
         (lshiftrt:VD (match_operand:VD 1 "register_operand" "w")
 		     (match_operand:SI 2 "immediate_operand" "i")))]
   "TARGET_SIMD"
-  "ushr %d0, %d1, %2"
+  {
+    if (BYTES_BIG_ENDIAN)
+      return "ushl %d0, %d1, %2";
+    else
+      return "ushr %d0, %d1, %2";
+  }
   [(set_attr "type" "neon_shift_imm")]
 )
 
@@ -804,6 +809,14 @@
   DONE;
 }
 
+  if (BYTES_BIG_ENDIAN)
+{
+  rtx temp = operands[1];
+  operands[1] = zero_reg;
+  zero_reg = temp;
+  num_bits = 128 - num_bits;
+}
+
   emit_insn (gen_aarch64_ext<mode> (operands[0], operands[1], zero_reg,
 				    GEN_INT (num_bits / elem_bits)));
   DONE;

[PATCH 14/14][Vectorizer] Tidy up vect_create_epilog / use_scalar_result

2014-09-18 Thread Alan Lawrence
Following earlier patches, vect_create_epilog_for_reduction contains exactly one 
case where extract_scalar_result==true. Hence, move the code 'if 
(extract_scalar_result)' there, and tidy-up/remove some variables.


bootstrapped on x86_64-none-linux-gnu + check-gcc + check-g++.

gcc/ChangeLog:

* tree-vect-loop.c (vect_create_epilog_for_reduction): Move code for
'if (extract_scalar_result)' to the only place that it is true.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 016e2c1fc839fc4d1c97caaa38064fb8bbb510d8..62b279e4d29d1fdfbfbd4e606fc8be9d608d3707 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3867,7 +3867,6 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt,
   tree orig_name, scalar_result;
   imm_use_iterator imm_iter, phi_imm_iter;
   use_operand_p use_p, phi_use_p;
-  bool extract_scalar_result = false;
   gimple use_stmt, orig_stmt, reduction_phi = NULL;
   bool nested_in_vect_loop = false;
   auto_vecgimple new_phis;
@@ -4235,6 +4234,8 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt,
   Create:  va = vop va, va'
 }  */
 
+  tree rhs;
+
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
 			 "Reduce using vector shifts\n");
@@ -4260,7 +4261,20 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt,
   gsi_insert_before (exit_gsi, epilog_stmt, GSI_SAME_STMT);
 }
 
-  extract_scalar_result = true;
+	  /* 2.4  Extract the final scalar result.  Create:
+	 s_out3 = extract_field v_out2, bitpos  */
+
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "extract scalar result\n");
+
+	  rhs = build3 (BIT_FIELD_REF, scalar_type, new_temp,
+			bitsize, bitsize_zero_node);
+	  epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
+	  new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
+	  gimple_assign_set_lhs (epilog_stmt, new_temp);
+	  gsi_insert_before (exit_gsi, epilog_stmt, GSI_SAME_STMT);
+	  scalar_results.safe_push (new_temp);
 }
   else
 {
@@ -4355,30 +4369,8 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt,
   else
 /* Not SLP - we have one scalar to keep in SCALAR_RESULTS.  */
 scalar_results.safe_push (new_temp);
-
-  extract_scalar_result = false;
 }
 }
-
-  /* 2.4  Extract the final scalar result.  Create:
-  s_out3 = extract_field v_out2, bitpos  */
-
-  if (extract_scalar_result)
-{
-  tree rhs;
-
-  if (dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location,
-			 "extract scalar result\n");
-
-  rhs = build3 (BIT_FIELD_REF, scalar_type,
-		new_temp, bitsize, bitsize_zero_node);
-  epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
-  new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
-  gimple_assign_set_lhs (epilog_stmt, new_temp);
-  gsi_insert_before (exit_gsi, epilog_stmt, GSI_SAME_STMT);
-  scalar_results.safe_push (new_temp);
-}
   
 vect_finalize_reduction:
 

parallel check output changes?

2014-09-18 Thread Andrew MacLeod
Have the changes that have gone into the check parallelization made the 
.sum file non-deterministic?
I'm seeing a lot of small hunks in different orders which cause my 
comparison scripts to show big differences.
I haven't been paying attention to the nature of the make check changes, 
so I'm not sure if this is expected...


Or is this something else?  It's the same code base between runs, just 
with a few changes made to some include files.


ie: the  order of the  options  -mstackrealign and -mno-stackrealign are 
swapped in this output:


 Running 
/gcc/2014-09-16/gcc/gcc/testsuite/gcc.target/i386/stackalign/stackalign.exp 
...

- UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mno-stackrealign
! UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mno-stackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mno-stackrealign
  PASS: gcc.target/i386/stackalign/pr39146.c -mstackrealign (test for 
excess errors)

--- 110393,110402 
  PASS: gcc.target/i386/math-torture/trunc.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (test for excess errors)
  PASS: gcc.target/i386/math-torture/trunc.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
  Running 
/gcc/2014-09-16/gcc/gcc/testsuite/gcc.target/i386/stackalign/stackalign.exp 
...

  UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mno-stackrealign
! UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mno-stackrealign
+ UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mno-stackrealign
  PASS: gcc.target/i386/stackalign/pr39146.c -mstackrealign (test for 
excess errors)



Andrew


[PATCH/RFC 15 / 14+2][RS6000] Remove vec_shl and (hopefully) fix vec_shr

2014-09-18 Thread Alan Lawrence
Patch 12 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html) will 
break bigendian targets implementing vec_shr. This is a PowerPC parallel of 
patch 13 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01477.html) for 
AArch64. I've checked I can build a stage 1 compiler for powerpc-none-eabi and 
that the assembly output looks plausible but no further than that.


In fact I find BYTES_BIG_ENDIAN is defined to true on powerpcle-none-eabi as 
well as powerpc-none-eabi (and also on ppc64-none-elf, but to false on 
ppc64le-none-elf), so I'm not quite sure how your backend works in this regard - 
nonetheless I hope this is a helpful starting point even if not definitive.


gcc/ChangeLog:

* config/rs6000/vector.md (vec_shl_mode): Remove.
(vec_shr_mode): Reverse shift if BYTES_BIG_ENDIAN.
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index edbb83161d142b1a562735635fe90ef65b09fbbf..8bc010eb26526e2997d02ea7aef655e60eca8707 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -972,53 +972,11 @@
  VECTOR_MEM_VSX_P (MODEmode)  TARGET_ALLOW_MOVMISALIGN
  )
 
-
-;; Vector shift left in bits.  Currently supported ony for shift
-;; amounts that can be expressed as byte shifts (divisible by 8).
-;; General shift amounts can be supported using vslo + vsl. We're
-;; not expecting to see these yet (the vectorizer currently
-;; generates only shifts divisible by byte_size).
-(define_expand "vec_shl_<mode>"
-  [(match_operand:VEC_L 0 "vlogical_operand" "")
-   (match_operand:VEC_L 1 "vlogical_operand" "")
-   (match_operand:QI 2 "reg_or_short_operand" "")]
-  "TARGET_ALTIVEC"
-  "
-{
-  rtx bitshift = operands[2];
-  rtx shift;
-  rtx insn;
-  HOST_WIDE_INT bitshift_val;
-  HOST_WIDE_INT byteshift_val;
-
-  if (! CONSTANT_P (bitshift))
-    FAIL;
-  bitshift_val = INTVAL (bitshift);
-  if (bitshift_val & 0x7)
-    FAIL;
-  byteshift_val = bitshift_val >> 3;
-  if (TARGET_VSX && (byteshift_val & 0x3) == 0)
-    {
-      shift = gen_rtx_CONST_INT (QImode, byteshift_val >> 2);
-      insn = gen_vsx_xxsldwi_<mode> (operands[0], operands[1], operands[1],
-                                     shift);
-    }
-  else
-    {
-      shift = gen_rtx_CONST_INT (QImode, byteshift_val);
-      insn = gen_altivec_vsldoi_<mode> (operands[0], operands[1], operands[1],
-                                        shift);
-    }
-
-  emit_insn (insn);
-  DONE;
-}")
-
 ;; Vector shift right in bits. Currently supported ony for shift
 ;; amounts that can be expressed as byte shifts (divisible by 8).
 ;; General shift amounts can be supported using vsro + vsr. We're
 ;; not expecting to see these yet (the vectorizer currently
-;; generates only shifts divisible by byte_size).
+;; generates only shifts by a whole number of vector elements).
 (define_expand vec_shr_mode
   [(match_operand:VEC_L 0 vlogical_operand )
(match_operand:VEC_L 1 vlogical_operand )
@@ -1037,7 +995,9 @@
   bitshift_val = INTVAL (bitshift);
   if (bitshift_val & 0x7)
     FAIL;
-  byteshift_val = 16 - (bitshift_val >> 3);
+  byteshift_val = (bitshift_val >> 3);
+  if (!BYTES_BIG_ENDIAN)
+    byteshift_val = 16 - byteshift_val;
   if (TARGET_VSX && (byteshift_val & 0x3) == 0)
     {
       shift = gen_rtx_CONST_INT (QImode, byteshift_val >> 2);

[PATCH 16 / 14+2][MIPS] Remove vec_shl and (hopefully) fix vec_shr

2014-09-18 Thread Alan Lawrence

Patch 12 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html) will
break bigendian targets implementing vec_shr. This is a MIPS parallel of
patch 13 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01477.html) for
AArch64; the idea is that vec_shr should be unaffected on little-endian, but 
reversed (to be the same as the old vec_shl) if big-endian.


Manual inspection of assembler output looks to do the right sort of thing on 
mips and mips64, but I haven't been able to run any testcases so this is not 
definitive. I'm hoping it is nonetheless helpful as a starting point!


gcc/ChangeLog:

* config/mips/loongson.md (unspec): Remove UNSPEC_LOONGSON_DSLL.
(vec_shl_<mode>): Remove.
(vec_shr_<mode>): Reverse shift if BYTES_BIG_ENDIAN.

diff --git a/gcc/config/mips/loongson.md b/gcc/config/mips/loongson.md
index 474033d1e2c244d3b70ad5ed630ab9f29d5fd5f6..dcba23440a5cb8cf0f2063ee15fbcf9d2a579714 100644
--- a/gcc/config/mips/loongson.md
+++ b/gcc/config/mips/loongson.md
@@ -39,7 +39,6 @@
   UNSPEC_LOONGSON_PUNPCKL
   UNSPEC_LOONGSON_PADDD
   UNSPEC_LOONGSON_PSUBD
-  UNSPEC_LOONGSON_DSLL
   UNSPEC_LOONGSON_DSRL
 ])
 
@@ -834,22 +833,18 @@
 })
 
 ;; Whole vector shifts, used for reduction epilogues.
-(define_insn "vec_shl_<mode>"
-  [(set (match_operand:VWHBDI 0 "register_operand" "=f")
-        (unspec:VWHBDI [(match_operand:VWHBDI 1 "register_operand" "f")
-                        (match_operand:SI 2 "register_operand" "f")]
-                       UNSPEC_LOONGSON_DSLL))]
-  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "dsll\t%0,%1,%2"
-  [(set_attr "type" "fcvt")])
-
 (define_insn "vec_shr_<mode>"
   [(set (match_operand:VWHBDI 0 "register_operand" "=f")
         (unspec:VWHBDI [(match_operand:VWHBDI 1 "register_operand" "f")
                         (match_operand:SI 2 "register_operand" "f")]
                        UNSPEC_LOONGSON_DSRL))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "dsrl\t%0,%1,%2"
+  {
+    if (BYTES_BIG_ENDIAN)
+      return "dsll\t%0,%1,%2";
+    else
+      return "dsrl\t%0,%1,%2";
+  }
   [(set_attr "type" "fcvt")])
 
 (define_expand reduc_uplus_mode

Re: parallel check output changes?

2014-09-18 Thread Andrew MacLeod

On 09/18/2014 09:01 AM, Jakub Jelinek wrote:

On Thu, Sep 18, 2014 at 08:56:50AM -0400, Andrew MacLeod wrote:

Has the changes that have gone into the check parallelization made the .sum
file non-deterministic?
I'm seeing a lot of small hunks in different orders which cause my
comparison scripts to show big differences.
I haven't been paying attention to the nature of the make check changes so
Im not sure if this is expected...

Or is this something else?  Its the same code base between runs, just with a
few changes made to some include files.

I'm using contrib/test_summary and haven't seen any non-determinisms in the
output of that command.  As for dg-extract-results.sh, we have two versions
of that, one if you have python 2.6 or newer, another one if you don't.
Perhaps the behavior of those two (I'm using the python version probably)
differs?

Jakub
Not sure, although I do have python 2.7.5 installed, for what it's 
worth...  I'll try another run in a bit.


Andrew


Re: parallel check output changes?

2014-09-18 Thread Jakub Jelinek
On Thu, Sep 18, 2014 at 08:56:50AM -0400, Andrew MacLeod wrote:
 Has the changes that have gone into the check parallelization made the .sum
 file non-deterministic?
 I'm seeing a lot of small hunks in different orders which cause my
 comparison scripts to show big differences.
 I haven't been paying attention to the nature of the make check changes so
 Im not sure if this is expected...
 
 Or is this something else?  Its the same code base between runs, just with a
 few changes made to some include files.

I'm using contrib/test_summary and haven't seen any non-determinisms in the
output of that command.  As for dg-extract-results.sh, we have two versions
of that, one if you have python 2.6 or newer, another one if you don't.
Perhaps the behavior of those two (I'm using the python version probably)
differs?

Jakub


Re: [PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral

2014-09-18 Thread David Edelsohn
On Thu, Sep 18, 2014 at 8:42 AM, Alan Lawrence alan.lawre...@arm.com wrote:
 The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the
 general principles of tree. This patch updates fold-const and the vectorizer
 (the only place where such expressions are created), such that
 VEC_RSHIFT_EXPR always shifts towards element 0.

 The tree code still maps directly onto the vec_shr_optab, and so this patch
 *will break any bigendian platform defining the vec_shr optab*.
 -- For AArch64_be, patch follows next in series;
 -- For PowerPC, I think patch/rfc 15 should fix, please inspect;
 -- For MIPS, I think patch/rfc 16 should fix, please inspect.

 gcc/ChangeLog:

 * fold-const.c (const_binop): VEC_RSHIFT_EXPR always shifts towards
 element 0.

 * tree-vect-loop.c (vect_create_epilog_for_reduction): always
 extract
 the result of a reduction with vector shifts from element 0.

 * tree.def (VEC_RSHIFT_EXPR, VEC_LSHIFT_EXPR): Comment shift
 direction.

 * doc/md.texi (vec_shr_m, vec_shl_m): Document shift direction.

 Testing Done:

 Bootstrap and check-gcc on x86_64-none-linux-gnu; check-gcc on
 aarch64-none-elf.

Why wasn't this tested on the PowerLinux system in the GCC Compile Farm?

Also, Bill Schmidt can help check the PPC parts fo the patches.

Thanks, David


Re: [PATCH][PING] Enable -fsanitize-recover for KASan

2014-09-18 Thread Yury Gribov

Added Marek to comment on proposed UBSan option change.

On 09/18/2014 02:52 PM, Jakub Jelinek wrote:

--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1551,6 +1551,12 @@ common_handle_option (struct gcc_options *opts,
 | SANITIZE_RETURNS_NONNULL_ATTRIBUTE))
  opts-x_flag_delete_null_pointer_checks = 0;

+   /* UBSan and KASan enable recovery by default.  */
+   opts-x_flag_sanitize_recover
+ = !!(flag_sanitize  (SANITIZE_UNDEFINED
+   | SANITIZE_UNDEFINED_NONDEFAULT
+   | SANITIZE_KERNEL_ADDRESS));
+


Doesn't this override even user supplied -fsanitize-recover or
-fno-sanitize-recover ?  Have you tried both
-fno-sanitize-recover -fsanitize=kernel-address
and
-fsanitize=kernel-address -fno-sanitize-recover
option orders?


I did and this worked in a seemingly logical way:
* -fsanitize=address (disable recovery)
* -fsanitize-recover -fsanitize=address (disable recovery)
* -fsanitize=address -fsanitize-recover (enable recovery)
* -fsanitize=kernel-address (enable recovery)
* -fno-sanitize-recover -fsanitize=kernel-address (enable recovery)
* -fsanitize=kernel-address -fno-sanitize-recover (enable recovery)


Seems for -fdelete-null-pointer-checks we got it wrong too,
IMHO for -fsanitize={null,{,returns-}nonnull-attribute,undefined}
we want to disable it unconditionally, regardless of whether
that option appears on the command line or not.


My understanding is that all 
-fsanitize=(address|kernel-address|undefined|you-name-it) are simply 
packs of options to enable. User may override any selected option from 
the pack if he so desires.



I don't think your proposal will work properly though,
if one compiles with
-fsanitize=undefined -fsanitize=address
you'll just get userland asan with error recovery, which is highly
undesirable


Now that's a problem. Looks like I'll need a separate flag to achieve 
what I need (-fasan-recover? And maybe then rename -fsanitize-recover to 
-fubsan-recover for consistency?).



or asan.c needs to limit it to flag_sanitize  SANITIZE_KERNEL_ADDRESS
mode only.


We may want to UBsanitize kernel in future and this may cause the same 
problem as for userspace Asan/UBSan interaction you described above.


 Depends if you ever want to add recovery for userland
 sanitization.

Also kernel developers want both recoverable (more user-friendly) and 
non-recoverable (faster) Asan error handling.


-Y



Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target

2014-09-18 Thread Ilya Enkovich
On 17 Sep 20:06, Uros Bizjak wrote:
 On Wed, Sep 17, 2014 at 6:31 PM, Ilya Enkovich enkovich@gmail.com wrote:
 
   I don't like the way arguments are prepared. For the case above,
   bnd_ldx should have index_register_operand predicate in its pattern,
   and this predicate (and its mode) should be checked in the expander
   code. There are many examples of argument expansion in
   ix86_expand_builtin function, including how Pmode is handled.
  
   Also, please see how target is handled there. Target can be null, so
   REG_P predicate will crash.
  
   You should also select insn patterns depending on BNDmode, not 
   TARGET_64BIT.
  
   Please use assign_386_stack_local so stack slots can be shared.
   SLOT_TEMP is intended for short-lived temporaries, you can introduce
   new slots if you need more live values at once.
  
   Uros.
  
   Thanks for comments!  Here is a new version in which I addressed all 
   your concerns.
 
  Unfortunately, it doesn't. The patch only fixed one instance w.r.t to
  target handling, the one I referred as an example. You still have
  unchecked target, at least in IX86_BUILTIN_BNDMK.
 
  However, you have a general problems in your builtin expansion code,
  so please look at how other builtins are handled. E.g.:
 
if (optimize || !target
|| GET_MODE (target) != tmode
|| !register_operand(target, tmode))
  target = gen_reg_rtx (tmode);
 
  also, here is an example how input operands are prepared:
 
op0 = expand_normal (arg0);
op1 = expand_normal (arg1);
op2 = expand_normal (arg2);
if (!register_operand (op0, Pmode))
  op0 = ix86_zero_extend_to_Pmode (op0);
if (!register_operand (op1, SImode))
  op1 = copy_to_mode_reg (SImode, op1);
if (!register_operand (op2, SImode))
  op2 = copy_to_mode_reg (SImode, op2);
 
  So, Pmode is handled in a special way, even when x32 is not considered.
 
  BTW: I wonder if word_mode is needed here, Pmode can be SImode with
  address prefix (x32).
 
  Inside the expanders, please use expand_simple_binop and expand_unop
  on RTX, not tree expressions. Again, please see many examples.
 
  Thank you for additional explanations.  Hope this time I answer your 
  concerns correctly :)
 
 Yes, this version is MUCH better. There are further comments down the code.
 
  2014-09-17  Ilya Enkovich  ilya.enkov...@intel.com
 
  * config/i386/i386-builtin-types.def (BND): New.
  (ULONG): New.
  (BND_FTYPE_PCVOID_ULONG): New.
  (VOID_FTYPE_BND_PCVOID): New.
  (VOID_FTYPE_PCVOID_PCVOID_BND): New.
  (BND_FTYPE_PCVOID_PCVOID): New.
  (BND_FTYPE_PCVOID): New.
  (BND_FTYPE_BND_BND): New.
  (PVOID_FTYPE_PVOID_PVOID_ULONG): New.
  (PVOID_FTYPE_PCVOID_BND_ULONG): New.
  (ULONG_FTYPE_VOID): New.
  (PVOID_FTYPE_BND): New.
  * config/i386/i386.c: Include tree-chkp.h, rtl-chkp.h.
  (ix86_builtins): Add
  IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX,
  IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL,
  IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET,
  IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT,
  IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER,
  IX86_BUILTIN_BNDUPPER.
  (builtin_isa): Add leaf_p and nothrow_p fields.
  (def_builtin): Initialize leaf_p and nothrow_p.
  (ix86_add_new_builtins): Handle leaf_p and nothrow_p
  flags.
  (bdesc_mpx): New.
  (bdesc_mpx_const): New.
  (ix86_init_mpx_builtins): New.
  (ix86_init_builtins): Call ix86_init_mpx_builtins.
  (ix86_emit_cmove): New.
  (ix86_emit_move_max): New.
  (ix86_expand_builtin): Expand IX86_BUILTIN_BNDMK,
  IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX,
  IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU,
  IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW,
  IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF,
  IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER.
  * config/i386/i386.h (ix86_stack_slot): Added SLOT_BND_STORED.
 
 ..
 
  +   /* We need to move bounds to memory before any computations.  */
  +   if (!MEM_P (op1))
  + {
  +   m1 = assign_386_stack_local (BNDmode, SLOT_TEMP);
  +   emit_move_insn (m1, op1);
  + }
  +   else
  + m1 = op1;
 
 No negative conditions, please. Just swap the arms of if sentence. It
 is much more readable.
 
  +
  +   /* Generate mem expression to be used for access to LB and UB.  */
  +   m1h1 = gen_rtx_MEM (Pmode, XEXP (m1, 0));
  +   m1h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m1, 0),
  + GET_MODE_SIZE (Pmode)));
 
 Please use adjust_address  instead of manually producing MEMs.
 
  +
  +   t1 = gen_reg_rtx (Pmode);
  +
  +   /* Compute LB.  */
  +   emit_move_insn (t1, m1h1);
  +   ix86_emit_move_max (t1, lb);
  +   

[committed] Fix up pr59594.c testcase (PR testsuite/63292)

2014-09-18 Thread Jakub Jelinek
Hi!

This testsuite contains a buffer overflow, I've fixed it thusly,
tested that it still fails with the fix reverted and works with current
trunk and committed as obvious to 4.9 and trunk.

2014-09-18  Jakub Jelinek  ja...@redhat.com

PR testsuite/63292
* gcc.dg/vect/pr59594.c (b): Increase size to N + 2 elements.

--- gcc/testsuite/gcc.dg/vect/pr59594.c.jj  2014-01-29 10:26:34.0 
+0100
+++ gcc/testsuite/gcc.dg/vect/pr59594.c 2014-09-18 15:42:38.628739317 +0200
@@ -3,7 +3,7 @@
 #include "tree-vect.h"
 
 #define N 1024
-int b[N + 1];
+int b[N + 2];
 
 int
 main ()

Jakub


Re: [PATCH, i386, Pointer Bounds Checker 30/x] Size relocation

2014-09-18 Thread Ilya Enkovich
On 17 Sep 20:51, Uros Bizjak wrote:
 On Wed, Sep 17, 2014 at 8:35 PM, Ilya Enkovich enkovich@gmail.com wrote:
  On 16 Sep 12:22, Uros Bizjak wrote:
  On Tue, Sep 16, 2014 at 11:37 AM, Ilya Enkovich enkovich@gmail.com 
  wrote:
   2014-09-16 13:08 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
  
   Can x86_64_immediate_operand predicate be used here?
  
   I think it cannot be used because of TLS symbols not counting as 
   immediate.
 
  OK, please introduce a new predicate, similar to
  x86_64_immediate_operand, perhaps x86_64_immediate_size_operand, so we
  can add some comments there. This will also help to macroize the insn,
  x86_64_immediate_operand has !TARGET_64BIT shortcut for this case.
 
  Uros.
 
  I don't see how new predicate would help to macroize insn.  Single template 
  may look as following patch.
 
 You put early return for !TARGET_64BITS. Please see
 x86_64_immediate_operand predicate.
 
 So,
 
 /* Here comes comment. */
 (define_predicate "x86_64_immediate_size_operand"
   (match_code "symbol_ref")
 {
   if (!TARGET_64BIT)
     return true;
 
   /* Comment here explaining these conditions.  */
   return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL);
 }
 
 And then in the pattern itself:
 
 if (x86_64_immediate_size_operand (operands[1], VOIDmode))
   return "mov{l}\t{%1@SIZE, %k0|%k0, %1@SIZE}";
 else
   return "movabs{q}\t{%1@SIZE, %0|%0, %1@SIZE}";
 
 Uros.

Here is a version with check in a form you suggest.

Thanks,
Ilya
--
2014-09-18  Ilya Enkovich  ilya.enkov...@intel.com

* config/i386/i386.md (UNSPEC_SIZEOF): New.
(move_size_reloc_mode): New.
* config/i386/predicates.md (symbol_operand): New.
(x86_64_immediate_size_operand): New.


diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2c367b2..db22b06 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -79,6 +79,7 @@
   UNSPEC_PLTOFF
   UNSPEC_MACHOPIC_OFFSET
   UNSPEC_PCREL
+  UNSPEC_SIZEOF
 
   ;; Prologue support
   UNSPEC_STACK_ALLOC
@@ -18554,6 +18555,21 @@
   "bndstx\t{%2, %3|%3, %2}"
   [(set_attr "type" "mpxst")])
 
+(define_insn "move_size_reloc_<mode>"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+	(unspec:SWI48
+	 [(match_operand:SWI48 1 "symbol_operand")]
+	 UNSPEC_SIZEOF))]
+  "TARGET_MPX"
+{
+  if (x86_64_immediate_size_operand (operands[1], VOIDmode))
+    return "mov{l}\t{%1@SIZE, %k0|%k0, %1@SIZE}";
+  else
+    return "movabs{q}\t{%1@SIZE, %0|%0, %1@SIZE}";
+}
+  [(set_attr "type" "imov")
+   (set_attr "mode" "<MODE>")])
+
 (include mmx.md)
 (include sse.md)
 (include sync.md)
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index cd542b7..da01c9a 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -124,6 +124,10 @@
       (match_test "TARGET_64BIT")
       (match_test "REGNO (op) > BX_REG")))
 
+;; Return true if VALUE is symbol reference
+(define_predicate "symbol_operand"
+  (match_code "symbol_ref"))
+
 ;; Return true if VALUE can be stored in a sign extended immediate field.
 (define_predicate "x86_64_immediate_operand"
   (match_code "const_int,symbol_ref,label_ref,const")
@@ -336,6 +340,19 @@
   return false;
 })
 
+;; Return true if size of VALUE can be stored in a sign
+;; extended immediate field.
+(define_predicate "x86_64_immediate_size_operand"
+  (match_code "symbol_ref")
+{
+  if (!TARGET_64BIT)
+    return true;
+
+  /* For 64 bit target we may assume size of object fits
+     immediate only when code model guarantees that.  */
+  return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL);
+})
+
 ;; Return true if OP is general operand representable on x86_64.
 (define_predicate x86_64_general_operand
   (if_then_else (match_test TARGET_64BIT)


[committed] Don't instrument clobbers with asan (PR c++/62017)

2014-09-18 Thread Jakub Jelinek
Hi!

Clobber stmts, being artificial statements, were certainly never
meant to be instrumented.  In 4.8 when asan has been introduced into gcc,
the lhs of clobber could be only a decl and as a whole decl store would not
be really instrumented, but with *this clobbers in 4.9 that is no longer the
case.

Fixed thusly, tested on x86_64-linux, committed to trunk and 4.9 as obvious.

2014-09-18  Jakub Jelinek  ja...@redhat.com

PR c++/62017
* asan.c (transform_statements): Don't instrument clobber statements.

* g++.dg/asan/pr62017.C: New test.

--- gcc/asan.c.jj   2014-09-08 22:12:52.0 +0200
+++ gcc/asan.c  2014-09-18 14:34:30.023446693 +0200
@@ -2072,6 +2072,7 @@ transform_statements (void)
 	  if (has_stmt_been_instrumented_p (s))
 	    gsi_next (&i);
 	  else if (gimple_assign_single_p (s)
+		   && !gimple_clobber_p (s)
 		   && maybe_instrument_assignment (&i))
 	    /*  Nothing to do as maybe_instrument_assignment advanced
 		the iterator I.  */;
--- gcc/testsuite/g++.dg/asan/pr62017.C.jj  2014-09-18 14:44:03.964525585 
+0200
+++ gcc/testsuite/g++.dg/asan/pr62017.C 2014-09-18 14:43:52.0 +0200
@@ -0,0 +1,17 @@
+// PR c++/62017
+// { dg-do run }
+
+struct A
+{
+  int x;
+  virtual ~A () {}
+};
+struct B : public virtual A {};
+struct C : public virtual A {};
+struct D : public B, virtual public C {};
+
+int
+main ()
+{
+  D d;
+}

Jakub


Re: [PATCH] Fix PR 58867: asan and ubsan tests not run for installed testing.

2014-09-18 Thread Maxim Ostapenko

Hi Andrew!

What is the status of this patch? Enabling the ASan and UBSan testsuites is 
useful for testing an installed toolchain, so I wonder if you are going to 
commit this.


-Maxim


[PATCH][PING] Keep patch file permissions in mklog

2014-09-18 Thread Yury Gribov

On 08/04/2014 12:14 PM, Tom de Vries wrote:

On 04-08-14 08:45, Yury Gribov wrote:

Thanks! My 2 (actually 4) cents below.



Hi Yuri,

thanks for the review.


  +if ($#ARGV == 1 && ($ARGV[0] eq "-i" || $ARGV[0] eq
"--inline")) {
  +$diff = $ARGV[1];

Can we shift here and then just set $diff to $ARGV[0] unconditionally?



Done.


  +if ($diff eq "-") {
  +die "Reading from - and using -i are not compatible";
  +}

Hm, can't we dump ChangeLog to stdout in that case?
The limitation looks rather strange.



My original idea here was that --inline means 'in the patch file', which
is not possible if the patch comes from stdin.

I've now interpreted it such that --inline prints to stdout what it
would print to the patch file otherwise, that is, both log and patch.
Printing just the log to stdout can be already be achieved by not using
--inline.


  +open (FILE1, '', $tmp) or die Could not open temp file;

Could we use more descriptive name?



I've used the slightly more descriptive 'OUTPUTFILE'.


  +system ("cat $diff >> $tmp") == 0
  +or die "Could not append patch to temp file";
  ...
  +unlink ($tmp) == 1 or die "Could not remove temp file";

The checks look like an overkill given that we don't check for result
of mktemp...



I've added a check for the result of mktemp, and removed the unlink
result check. I've left in the Could not append patch to temp file
check because the patch file might be read-only.

OK for trunk?

Thanks,
- Tom



Pinging the patch for Tom.


Re: [PATCH] RTEMS: Update contrib/config-list.mk

2014-09-18 Thread Joel Sherrill
I committed this to 4.9 and head.

Sebastian.. please double check that it is OK please.
I had some issues with applying it to the head and
manually did it.

--joel
On 9/17/2014 8:37 AM, Sebastian Huber wrote:
 contrib/ChangeLog
 2014-09-17  Sebastian Huber  sebastian.hu...@embedded-brains.de

   * config-list.mk (LIST): Add arm-rtems.
   Add nios2-rtems.  Remove extra option from powerpc-rtems.
 ---
  contrib/config-list.mk | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 diff --git a/contrib/config-list.mk b/contrib/config-list.mk
 index 4345487..056fbf0 100644
 --- a/contrib/config-list.mk
 +++ b/contrib/config-list.mk
 @@ -17,7 +17,7 @@ LIST = aarch64-elf aarch64-linux-gnu \
arc-elf32OPT-with-cpu=arc600 arc-elf32OPT-with-cpu=arc700 \
arc-linux-uclibcOPT-with-cpu=arc700 arceb-linux-uclibcOPT-with-cpu=arc700 \
arm-wrs-vxworks arm-netbsdelf \
 -  arm-linux-androideabi arm-uclinux_eabi arm-eabi \
 +  arm-linux-androideabi arm-uclinux_eabi arm-eabi arm-rtems \
arm-symbianelf avr-rtems avr-elf \
bfin-elf bfin-uclinux bfin-linux-uclibc bfin-rtems bfin-openbsd \
c6x-elf c6x-uclinux cr16-elf cris-elf cris-linux crisv32-elf crisv32-linux 
 \
 @@ -48,13 +48,13 @@ LIST = aarch64-elf aarch64-linux-gnu \
moxie-uclinux moxie-rtems \
msp430-elf \
nds32le-elf nds32be-elf \
 -  nios2-elf nios2-linux-gnu \
 +  nios2-elf nios2-linux-gnu nios2-rtems \
pdp11-aout picochip-elfOPT-enable-obsolete \
powerpc-darwin8 \
powerpc-darwin7 powerpc64-darwin powerpc-freebsd6 powerpc-netbsd \
powerpc-eabispe powerpc-eabisimaltivec powerpc-eabisim ppc-elf \
powerpc-eabialtivec powerpc-xilinx-eabi powerpc-eabi \
 -  powerpc-rtems4.11OPT-enable-threads=yes powerpc-linux_spe \
 +  powerpc-rtems powerpc-linux_spe \
powerpc-linux_paired powerpc64-linux_altivec \
powerpc-wrs-vxworks powerpc-wrs-vxworksae powerpc-lynxos powerpcle-elf \
powerpcle-eabisim powerpcle-eabi rs6000-ibm-aix4.3 rs6000-ibm-aix5.1.0 \

-- 
Joel Sherrill, Ph.D. Director of Research  Development
joel.sherr...@oarcorp.comOn-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available(256) 722-9985



Re: [PATCH] RTEMS: Update contrib/config-list.mk

2014-09-18 Thread Joel Sherrill

On 9/18/2014 6:51 AM, Jan-Benedict Glaw wrote:
 On Wed, 2014-09-17 10:52:34 -0500, Joel Sherrill joel.sherr...@oarcorp.com 
 wrote:
 On 9/17/2014 10:41 AM, Sebastian Huber wrote:
 On 09/17/2014 04:45 PM, Jan-Benedict Glaw wrote:
 On Wed, 2014-09-17 15:37:32 +0200, Sebastian 
 Hubersebastian.hu...@embedded-brains.de  wrote:
 contrib/ChangeLog
 2014-09-17  Sebastian Hubersebastian.hu...@embedded-brains.de

  * config-list.mk (LIST): Add arm-rtems.
  Add nios2-rtems.  Remove extra option from powerpc-rtems.
 What's the rationale for removing --enable-threads=yes here, as well
 as the specific version number?
 [...]
 And is this the input to your buildbot? :)
 Yes, the target list in contrib/config-list.mk is what'll be built
 using the config-list.mk-building backend. (The robot has another
 backend using a different build strategy, which has a separate target
 list, though one could argue that I'd also include all the
 config-list.mk targets in that other list as well.)

   And to tell the whole story, Sebastian approached me with extending
 the target lists in use by those targets he sent a patch for; I just
 asked him to go this route, because I guess that'd be beneficial for
 other folks as well.
OK. Thanks for clarifying that. I suspected there was a link.

And it is committed.  And I will post a follow up patch to
add v850-elf and v850-rtems.

--joel
 MfG, JBG





[PATCH] Add v850-rtems to contrib/config-list.mk

2014-09-18 Thread Joel Sherrill
OK to commit?

2014-09-18  Joel Sherrill joel.sherr...@oarcorp.com

* config-list.mk (LIST): Add v850-rtems.

Index: contrib/config-list.mk
===
--- contrib/config-list.mk  (revision 215357)
+++ contrib/config-list.mk  (working copy)
@@ -68,7 +68,7 @@
   sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux
sparc64-freebsd6 \
   sparc64-netbsd sparc64-openbsd spu-elf \
   tilegx-linux-gnu tilegxbe-linux-gnu tilepro-linux-gnu \
-  v850e-elf v850-elf vax-linux-gnu \
+  v850e-elf v850-elf v850-rtems vax-linux-gnu \
   vax-netbsdelf vax-openbsd x86_64-apple-darwin \
   x86_64-pc-linux-gnuOPT-with-fpmath=avx \
   x86_64-elfOPT-with-fpmath=sse x86_64-freebsd6 x86_64-netbsd \

-- 
Joel Sherrill, Ph.D. Director of Research  Development
joel.sherr...@oarcorp.comOn-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available(256) 722-9985



Re: [jit] Add sphinx-based documentation for libgccjit

2014-09-18 Thread Mike Stump
On Sep 17, 2014, at 6:22 PM, David Malcolm dmalc...@redhat.com wrote:
 I greatly prefer to use Sphinx over Texinfo, both for the ease of
 editing, and the quality of the generated HTML; I already use it for
 both the Python bindings to libgccjit, and for gcc-python-plugin.

 Hence I've used Sphinx for these docs.  It's trivial to build texinfo
 and info files from it (assuming you have sphinx installed).

So, I’d recommend for you to additionally generate and build the texinfo and 
possibly the html source from it and check that in as a generated file.  That 
way, no one has to have the package and all the builds and installs just work 
as normal.  People that want to edit those file, which will be few, would then 
just install that package and regenerate and check it in.  People doing 
spelling corrections and the like, could just edit the source and let someone 
else regenerate as well.

Re: [PATCHv4] Vimrc config with GNU formatting

2014-09-18 Thread Mike Stump
On Sep 18, 2014, at 1:40 AM, Yury Gribov y.gri...@samsung.com wrote:
 How about adding a disclaimer? E.g. beware that Vim plugins are a GAPING 
 SECURITY HOLE
 so use them at YOUR OWN RISK. (And note that Braun's plugin does use 
 sandboxes).

Building gcc features a security risk at least as big as a plugin for vim.  
And, yes, I’ve built gcc in a sandbox before.

[PING] [PATCH 2/2] Add patch for debugging compiler ICEs.

2014-09-18 Thread Maxim Ostapenko

Ping.
On 09/11/2014 08:20 PM, Maxim Ostapenko wrote:

Hi, Joseph,

Thanks for your review! I've added comments for new functions and 
replaced POSIX subprocess interfaces with libiberty's ones.


In general, when cc1 or cc1plus ICEs, we try to reproduce the bug by 
running the compiler 3 times and comparing the stderr and stdout of each 
attempt with those of the previous compiler run (we use temporary dump 
files to do this). If these files are identical, we add the GCC 
configuration (e.g. target, configure options and version), the compiler 
command line and the preprocessed source code to the last dump file, 
which contains the backtrace. Following Jakub's approach, we trigger 
ICE_EXIT_CODE instead of FATAL_EXIT_CODE in case of a DK_FATAL error to 
distinguish ICEs from other fatal errors, so the try_generate_repro 
routine will be able to run even if fatal_error occurred in the compiler.
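
A rough standalone model of that retry-and-compare logic is sketched below.
The helper names and temporary file names here are made up for illustration;
the real implementation is in gcc.c (run_attempt, files_equal_p, check_repro,
try_generate_repro).

#include <stdio.h>
#include <stdlib.h>

static int
files_identical_p (const char *a, const char *b)
{
  /* Assumes a POSIX "cmp" is available; purely illustrative.  */
  char cmd[512];
  snprintf (cmd, sizeof cmd, "cmp -s %s %s", a, b);
  return system (cmd) == 0;
}

/* CMD is a full compiler invocation whose stdout/stderr are redirected to
   the file OUT.  Returns 1 if three runs produce byte-identical output,
   i.e. the ICE looks reproducible.  */
static int
ice_reproducible_p (const char *cmd, const char *out)
{
  system (cmd);                          /* attempt 1 */
  rename (out, "attempt.prev");
  for (int attempt = 2; attempt <= 3; attempt++)
    {
      system (cmd);                      /* attempts 2 and 3 */
      if (!files_identical_p (out, "attempt.prev"))
	return 0;                        /* output differs - give up */
      rename (out, "attempt.prev");
    }
  /* At this point the real code appends the GCC configuration, the command
     line and the preprocessed source to the last dump file.  */
  return 1;
}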


We've noticed that on rare occasions a particularly severe segfault can 
cause GCC to abort without ICE-ing. These (hopefully rare) errors will 
be missed by our patch, because the SIGSEGV handler is not able to catch 
the signal due to the corrupted stack. It could make sense to allocate a 
separate stack for the SIGSEGV handler to resolve this situation.
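
For reference, installing the handler on an alternate stack would look roughly
like the sketch below (assuming a POSIX host; this is not code from the patch):

#include <signal.h>
#include <string.h>
#include <unistd.h>

static char segv_stack[64 * 1024];       /* dedicated stack for the handler */

static void
segv_handler (int sig)
{
  static const char msg[] = "internal compiler error: Segmentation fault\n";
  (void) sig;
  write (2, msg, sizeof msg - 1);
  _exit (4);                             /* 4 is GCC's ICE_EXIT_CODE */
}

static void
install_segv_handler (void)
{
  stack_t ss;
  struct sigaction sa;

  ss.ss_sp = segv_stack;
  ss.ss_flags = 0;
  ss.ss_size = sizeof segv_stack;
  sigaltstack (&ss, NULL);

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = segv_handler;
  sa.sa_flags = SA_ONSTACK;              /* run the handler on the alt stack */
  sigemptyset (&sa.sa_mask);
  sigaction (SIGSEGV, &sa, NULL);
}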


-Maxim
On 09/10/2014 08:37 PM, Joseph S. Myers wrote:

On Wed, 10 Sep 2014, Jakub Jelinek wrote:


On Tue, Sep 09, 2014 at 10:51:23PM +, Joseph S. Myers wrote:

On Thu, 28 Aug 2014, Maxim Ostapenko wrote:


diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 0cc7593..67b8c5b 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -492,7 +492,7 @@ diagnostic_action_after_output 
(diagnostic_context *context,

  real_abort ();
diagnostic_finish (context);
   fnotice (stderr, "compilation terminated.\n");
-  exit (FATAL_EXIT_CODE);
+  exit (ICE_EXIT_CODE);
Why?  This is the case for fatal_error.  FATAL_EXIT_CODE seems right
for this, and ICE_EXIT_CODE wrong.
So that the driver can understand the difference between an ICE and other
fatal errors (e.g. sorry etc.).
Users are typically using the driver and for them it matters what exit
code is returned from the driver, not from cc1/cc1plus etc.

Well, I think the next revision of the patch submission needs more
explanation in this area.  What exit codes do cc1 and the driver now
return for (normal error, fatal error, ICE), and what do they return after
the patch, and how does the change to the fatal_error case avoid incorrect
changes if either cc1 or the driver called fatal_error (as opposed to
either cc1 or the driver having an ICE)?  Maybe that explanation should be
in the form of a comment on this exit call, explaining why the
counterintuitive use of ICE_EXIT_CODE in the DK_FATAL case is correct.





2014-09-04  Jakub Jelinek  ja...@redhat.com
	Max Ostapenko  m.ostape...@partner.samsung.com

	* common.opt: New option.
	* doc/invoke.texi: Describe new option.
	* diagnostic.c (diagnostic_action_after_output): Exit with
	ICE_EXIT_CODE instead of FATAL_EXIT_CODE.
	* gcc.c (execute): Don't free first string early, but at the end
	of the function.  Call retry_ice if compiler exited with
	ICE_EXIT_CODE.
	(main): Factor out common code.
	(print_configuration): New function.
	(try_fork): Likewise.
	(redirect_stdout_stderr): Likewise.
	(files_equal_p): Likewise.
	(check_repro): Likewise.
	(run_attempt): Likewise.
	(do_report_bug): Likewise.
	(append_text): Likewise.
	(try_generate_repro): Likewise.

diff --git a/gcc/common.opt b/gcc/common.opt
index 7d78803..ce71f09 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1120,6 +1120,11 @@ fdump-noaddr
 Common Report Var(flag_dump_noaddr)
 Suppress output of addresses in debugging dumps
 
+freport-bug
+Common Driver Var(flag_report_bug)
+Collect and dump debug information into temporary file if ICE in C/C++
+compiler occured.
+
 fdump-passes
 Common Var(flag_dump_passes) Init(0)
 Dump optimization passes
diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 73666d6..dbc928b 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -494,7 +494,10 @@ diagnostic_action_after_output (diagnostic_context *context,
 	real_abort ();
   diagnostic_finish (context);
   fnotice (stderr, compilation terminated.\n);
-  exit (FATAL_EXIT_CODE);
+  /* Exit with ICE_EXIT_CODE rather then FATAL_EXIT_CODE so the driver
+ understands the difference between an ICE and other fatal errors
+ (DK_SORRY and DK_ERROR).  */
+  exit (ICE_EXIT_CODE);
 
 default:
   gcc_unreachable ();
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 863b382..565421c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6336,6 +6336,11 @@ feasible to use diff on debugging dumps for compiler invocations with
 different compiler binaries and/or different
 text / bss / data / heap / stack / dso start locations.
 
+@item -freport-bug
+@opindex freport-bug
+Collect and dump debug information into temporary file if ICE in C/C++
+compiler 

Re: parallel check output changes?

2014-09-18 Thread Andrew MacLeod

On 09/18/2014 09:05 AM, Andrew MacLeod wrote:

On 09/18/2014 09:01 AM, Jakub Jelinek wrote:

On Thu, Sep 18, 2014 at 08:56:50AM -0400, Andrew MacLeod wrote:
Have the changes that have gone into the check parallelization made
the .sum file non-deterministic?
I'm seeing a lot of small hunks in different orders which cause my
comparison scripts to show big differences.
I haven't been paying attention to the nature of the make check changes
so I'm not sure if this is expected...

Or is this something else?  It's the same code base between runs,
just with a few changes made to some include files.

I'm using contrib/test_summary and haven't seen any non-determinisms
in the output of that command.  As for dg-extract-results.sh, we have
two versions of that, one if you have python 2.6 or newer, another one
if you don't.  Perhaps the behavior of those two (I'm using the python
version probably) differs?

Jakub
Not sure, although I do have python 2.7.5 installed for what it's
worth...  I'll try another run in a bit.


Andrew


hum. My 3rd run (which has no compilation change from the 2nd one) is 
different from both other runs  :-P.   I did tweak my -j parameter in 
the make check, but that is it.


Andrew


Re: [AArch64] Auto-generate the BUILTIN_ macros for aarch64-builtins.c

2014-09-18 Thread Mike Stump
On Sep 18, 2014, at 3:12 AM, Richard Earnshaw rearn...@arm.com wrote:
 
 Is there any real need to write this into the source directory and have
 the built file checked in?  Ie. can't we always write to the build
 directory and use it from there.

I build part of my .md file from a C++ program, so I have to build that 
program, then generate the md file:

s-mddeps: abi.md

abi.md: s-abi; @true
s-abi: genmd $(srcdir)/config/port/port-assist.h
./genmd tmp-abi.md
$(SHELL) $(srcdir)/../move-if-change tmp-abi.md abi.md
$(STAMP) s-abi

genmd: $(srcdir)/config/port/genmd.c $(OPTIONS_H) s-genbuiltin
touch insn-constants.h
touch insn-flags.h
$(COMPILER_FOR_BUILD) $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) 
$(srcdir)/config/port/genmd.c -o genmd


Certainly bash source is portable enough to just build on demand.  This lets
me take content that is in .h files concerning the abi and generate md
constants from them.
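
For what it's worth, the generator side can stay tiny; something like this
hypothetical sketch (the macro names pulled from port-assist.h are made up
here for illustration):

/* Hypothetical sketch of a genmd-style generator: read a couple of
   values from the port's C headers and print them as define_constants
   for the generated .md file.  */
#include <stdio.h>
#include "config/port/port-assist.h"	/* assumed to define ABI_ARG_REGS etc. */

int
main (void)
{
  printf (";; Generated file -- do not edit.\n");
  printf ("(define_constants\n");
  printf ("  [(ABI_ARG_REGS %d)\n", ABI_ARG_REGS);
  printf ("   (ABI_RET_REG  %d)])\n", ABI_RET_REG);
  return 0;
}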

Patch to fix PR61360

2014-09-18 Thread Vladimir Makarov
The following patch fixes the PR.  The details can be found on

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360

The patch was bootstrapped and tested on x86/x86-64.

Committed as rev. 215358.

2014-09-18  Vladimir Makarov  vmaka...@redhat.com

PR target/61360
* lra.c (lra): Call recog_init.

2014-09-18  Vladimir Makarov  vmaka...@redhat.com

PR target/61360
* gcc.target/i386/pr61360.c: New.

Index: lra.c
===
--- lra.c	(revision 215337)
+++ lra.c	(working copy)
@@ -2135,6 +2135,11 @@ lra (FILE *f)
 
   lra_in_progress = 1;
 
+  /* The enable attributes can change their values as LRA starts
+ although it is a bad practice.  To prevent reuse of the outdated
+ values, clear them.  */
+  recog_init ();
+
   lra_live_range_iter = lra_coalesce_iter = 0;
   lra_constraint_iter = lra_constraint_iter_after_spill = 0;
   lra_inheritance_iter = lra_undo_inheritance_iter = 0;
Index: testsuite/gcc.target/i386/pr61360.c
===
--- testsuite/gcc.target/i386/pr61360.c	(revision 0)
+++ testsuite/gcc.target/i386/pr61360.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options -march=amdfam10 -O2 } */
+int a, b, c, e, f, g, h;
+long *d;
+__attribute__((cold)) void fn1() {
+  int i = g | 1;
+  for (; g; h++) {
+for (; a; e++) d[0] = c;
+if (0.002 * i) break;
+for (; b; f++) d[h] = 0;
+  }
+}


[jit] Build the example files from the documentation when running the testsuite

2014-09-18 Thread David Malcolm
Doing this caught missing return statements in the examples.

This brings the result I see in jit.sum to: # of expected passes  4286

Committed to branch dmalcolm/jit:

gcc/jit/ChangeLog.jit:
* docs/examples/install-hello-world.c (main): Fix missing
return.
* docs/examples/tut01-square.c (main): Likewise.
* docs/examples/tut02-sum-of-squares.c (main): Likewise.

gcc/testsuite/ChangeLog.jit:
* jit.dg/jit.exp: When constructing tests, add the example files
from the documentation, to ensure that they compile.

(I accidentally committed this to gcc/testsuite/ChangeLog; I've fixed
it up to use ChangeLog.jit in a subsequent commit)

---
 gcc/jit/ChangeLog.jit| 7 +++
 gcc/jit/docs/examples/install-hello-world.c  | 1 +
 gcc/jit/docs/examples/tut01-square.c | 1 +
 gcc/jit/docs/examples/tut02-sum-of-squares.c | 1 +
 gcc/testsuite/ChangeLog  | 5 +
 gcc/testsuite/jit.dg/jit.exp | 7 +++
 6 files changed, 22 insertions(+)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index b42038e..7ee7ebf 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,10 @@
+2014-09-18  David Malcolm  dmalc...@redhat.com
+
+   * docs/examples/install-hello-world.c (main): Fix missing
+   return.
+   * docs/examples/tut01-square.c (main): Likewise.
+   * docs/examples/tut02-sum-of-squares.c (main): Likewise.
+
 2014-09-17  David Malcolm  dmalc...@redhat.com
 
* docs/Makefile: New file.
diff --git a/gcc/jit/docs/examples/install-hello-world.c 
b/gcc/jit/docs/examples/install-hello-world.c
index c75543d..29afad9 100644
--- a/gcc/jit/docs/examples/install-hello-world.c
+++ b/gcc/jit/docs/examples/install-hello-world.c
@@ -100,4 +100,5 @@ main (int argc, char **argv)
 
   gcc_jit_context_release (ctxt);
   gcc_jit_result_release (result);
+  return 0;
 }
diff --git a/gcc/jit/docs/examples/tut01-square.c 
b/gcc/jit/docs/examples/tut01-square.c
index ddb218e..ea07b92 100644
--- a/gcc/jit/docs/examples/tut01-square.c
+++ b/gcc/jit/docs/examples/tut01-square.c
@@ -84,4 +84,5 @@ main (int argc, char **argv)
  error:
   gcc_jit_context_release (ctxt);
   gcc_jit_result_release (result);
+  return 0;
 }
diff --git a/gcc/jit/docs/examples/tut02-sum-of-squares.c 
b/gcc/jit/docs/examples/tut02-sum-of-squares.c
index e2811ac..1970a36 100644
--- a/gcc/jit/docs/examples/tut02-sum-of-squares.c
+++ b/gcc/jit/docs/examples/tut02-sum-of-squares.c
@@ -149,4 +149,5 @@ main (int argc, char **argv)
  error:
   gcc_jit_context_release (ctxt);
   gcc_jit_result_release (result);
+  return 0;
 }
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 448a7ef..942e219 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-09-18  David Malcolm  dmalc...@redhat.com
+
+   * jit.dg/jit.exp: When constructing tests, add the example files
+   from the documentation, to ensure that they compile.
+
 2014-09-09  Bill Schmidt  wschm...@linux.vnet.ibm.com
 
* gcc.target/powerpc/swaps-p8-15.c: Remove scan-assembler-not for
diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp
index 878ff4b..7986185 100644
--- a/gcc/testsuite/jit.dg/jit.exp
+++ b/gcc/testsuite/jit.dg/jit.exp
@@ -33,7 +33,14 @@ if ![info exists GCC_UNDER_TEST] {
 dg-init
 
 # Gather a list of all tests.
+
+# Tests within the testsuite: gcc/testsuite/jit.dg/test-*.c
 set tests [lsort [find $srcdir/$subdir test-*.c]]
+
+# We also test the examples within the documentation, to ensure that
+# they compile:
+set tests [lsort [concat $tests [find $srcdir/../jit/docs/examples *.c]]]
+
 verbose tests: $tests
 
 proc jit-dg-test { prog do_what extra_tool_flags } {
-- 
1.7.11.7



Re: Patch to fix PR61360

2014-09-18 Thread Jakub Jelinek
On Thu, Sep 18, 2014 at 12:04:30PM -0400, Vladimir Makarov wrote:
 The following patch fixes the PR.  The details can be found on
 
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360
 
 The patch was bootstrapped and tested on x86/x86-64.
 
 Committed as rev. 215358.

What effect does this have on compile time?

 2014-09-18  Vladimir Makarov  vmaka...@redhat.com
 
 PR target/61360
 * lra.c (lra): Call recog_init.
 
 2014-09-18  Vladimir Makarov  vmaka...@redhat.com
 
 PR target/61360
 * gcc.target/i386/pr61360.c: New.

Jakub


[patch] rename DECL_ABSTRACT to DECL_ABSTRACT_P

2014-09-18 Thread Aldy Hernandez
Similarly named DECL_ABSTRACT, DECL_ABSTRACT_ORIGIN, and DECL_ORIGIN are 
somewhat confusing to my poor brain.  Particularly annoying is 
DECL_ABSTRACT which is actually a boolean, unlike the other two.


Would it be OK to rename it to something more sensible like
DECL_ABSTRACT_P?  I know this is a longstanding name, but the proposed
name is clearer and virtually the same.


OK for mainline?
commit 2705197662689e354fe397abe907ebf3763eae2d
Author: Aldy Hernandez al...@redhat.com
Date:   Thu Sep 18 10:06:43 2014 -0600

* cgraph.h, dbxout.c, dwarf2out.c, gimple-fold.c,
lto-streamer-out.c, print-tree.c, symtab.c, tree-inline.c,
tree-streamer-in.c, tree-streamer-out.c, tree.c, tree.h,
varpool.c: Rename all instances of DECL_ABSTRACT to
DECL_ABSTRACT_P.

cp/
* class.c, decl.c, optimize.c: Rename all instances of
DECL_ABSTRACT to DECL_ABSTRACT_P.

lto/
* lto-symtab.c, lto.c: Rename all instances of DECL_ABSTRACT to
DECL_ABSTRACT_P.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index dd76758..9c4ec45 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2014-09-18  Aldy Hernandez  al...@redhat.com
+
+   * cgraph.h, dbxout.c, dwarf2out.c, gimple-fold.c,
+   lto-streamer-out.c, print-tree.c, symtab.c, tree-inline.c,
+   tree-streamer-in.c, tree-streamer-out.c, tree.c, tree.h,
+   varpool.c: Rename all instances of DECL_ABSTRACT to
+   DECL_ABSTRACT_P.
+
 2014-09-05  David Malcolm  dmalc...@redhat.com
 
* config/arc/arc.c (arc_print_operand): Use insn method of
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 030a1c7..0902fe9 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1970,7 +1970,7 @@ symtab_node::real_symbol_p (void)
 {
   cgraph_node *cnode;
 
-  if (DECL_ABSTRACT (decl))
+  if (DECL_ABSTRACT_P (decl))
 return false;
  if (!is_a <cgraph_node *> (this))
 return true;
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 3d87231..a236569 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,8 @@
+2014-09-18  Aldy Hernandez  al...@redhat.com
+
+   * class.c, decl.c, optimize.c: Rename all instances of
+   DECL_ABSTRACT to DECL_ABSTRACT_P.
+
 2014-09-05  Jason Merrill  ja...@redhat.com
 
PR c++/62659
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 09f946f..1c802b4 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -4580,7 +4580,7 @@ clone_function_decl (tree fn, int update_method_vec_p)
 }
 
   /* Note that this is an abstract function that is never emitted.  */
-  DECL_ABSTRACT (fn) = 1;
+  DECL_ABSTRACT_P (fn) = 1;
 }
 
 /* DECL is an in charge constructor, which is being defined. This will
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index d8fb35e..775e057 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -2262,7 +2262,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
newdecl_is_friend)
}
 
   /* Preserve abstractness on cloned [cd]tors.  */
-  DECL_ABSTRACT (newdecl) = DECL_ABSTRACT (olddecl);
+  DECL_ABSTRACT_P (newdecl) = DECL_ABSTRACT_P (olddecl);
 
   /* Update newdecl's parms to point at olddecl.  */
   for (parm = DECL_ARGUMENTS (newdecl); parm;
@@ -10272,7 +10272,7 @@ grokdeclarator (const cp_declarator *declarator,
   clones.  The decloning optimization (for space) may
revert this subsequently if it determines that
the clones should share a common implementation.  */
-   DECL_ABSTRACT (decl) = 1;
+   DECL_ABSTRACT_P (decl) = 1;
}
   else if (current_class_type
	    && constructor_name_p (unqualified_id, current_class_type))
diff --git a/gcc/cp/optimize.c b/gcc/cp/optimize.c
index 31acb07..f37515ec2 100644
--- a/gcc/cp/optimize.c
+++ b/gcc/cp/optimize.c
@@ -270,7 +270,7 @@ maybe_thunk_body (tree fn, bool force)
  (for non-vague linkage ctors) or the COMDAT group (otherwise).  */
 
   populate_clone_array (fn, fns);
-  DECL_ABSTRACT (fn) = false;
+  DECL_ABSTRACT_P (fn) = false;
   if (!DECL_WEAK (fn))
 {
   TREE_PUBLIC (fn) = false;
diff --git a/gcc/dbxout.c b/gcc/dbxout.c
index d856bdd..91cedf7 100644
--- a/gcc/dbxout.c
+++ b/gcc/dbxout.c
@@ -1618,7 +1618,7 @@ dbxout_type_methods (tree type)
 
  /* Also ignore abstract methods; those are only interesting to
 the DWARF backends.  */
- if (DECL_IGNORED_P (fndecl) || DECL_ABSTRACT (fndecl))
+ if (DECL_IGNORED_P (fndecl) || DECL_ABSTRACT_P (fndecl))
continue;
 
  /* Redundantly output the plain name, since that's what gdb
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 23a80d8..e3c4f98 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3679,7 +3679,7 @@ decl_ultimate_origin (const_tree decl)
   /* output_inline_function sets DECL_ABSTRACT_ORIGIN for all the
  nodes in the function to point to themselves; ignore that if
  we're trying to output the abstract instance of this function.  */
-  if 

Re: [patch] rename DECL_ABSTRACT to DECL_ABSTRACT_P

2014-09-18 Thread Marek Polacek
On Thu, Sep 18, 2014 at 10:11:24AM -0600, Aldy Hernandez wrote:
 Similarly named DECL_ABSTRACT, DECL_ABSTRACT_ORIGIN, and DECL_ORIGIN are
 somewhat confusing to my poor brain.  Particularly annoying is DECL_ABSTRACT
 which is actually a boolean, unlike the other two.
 
 Would it be OK to rename it to something more sensible like DECL_ABSTRACT_P?  I
 know this is a longstanding name, but the proposed name is clearer and virtually
 the same.
 
 OK for mainline?

IMHO a good idea.

 --- a/gcc/cp/class.c
 +++ b/gcc/cp/class.c
 @@ -4580,7 +4580,7 @@ clone_function_decl (tree fn, int update_method_vec_p)
  }
  
/* Note that this is an abstract function that is never emitted.  */
 -  DECL_ABSTRACT (fn) = 1;
 +  DECL_ABSTRACT_P (fn) = 1;

It'd probably make sense to use 'true' now.

 @@ -10272,7 +10272,7 @@ grokdeclarator (const cp_declarator *declarator,
  clones.  The decloning optimization (for space) may
 revert this subsequently if it determines that
 the clones should share a common implementation.  */
 - DECL_ABSTRACT (decl) = 1;
 + DECL_ABSTRACT_P (decl) = 1;

Likewise.

 --- a/gcc/tree-inline.c
 +++ b/gcc/tree-inline.c
 @@ -5095,7 +5095,7 @@ copy_decl_no_change (tree decl, copy_body_data *id)
copy = copy_node (decl);
  
/* The COPY is not abstract; it will be generated in DST_FN.  */
 -  DECL_ABSTRACT (copy) = 0;
 +  DECL_ABSTRACT_P (copy) = 0;
lang_hooks.dup_lang_specific_decl (copy);

And false here.

 --- a/gcc/varpool.c
 +++ b/gcc/varpool.c
 @@ -704,7 +704,7 @@ add_new_static_var (tree type)
TREE_STATIC (new_decl) = 1;
TREE_USED (new_decl) = 1;
DECL_CONTEXT (new_decl) = NULL_TREE;
 -  DECL_ABSTRACT (new_decl) = 0;
 +  DECL_ABSTRACT_P (new_decl) = 0;

And here.

Marek


Re: [PATCH] Add header guard to several header files.

2014-09-18 Thread Kito Cheng
Hi Joseph:

Here is the updated patch and ChangeLog.

However, I don't have write access yet; can you help me commit it? Thanks.

Btw, I have already signed the FSF agreement :)


2014-09-19  Kito Cheng  k...@0xlab.org

except.h: Fix header guard.
addresses.h: Add missing header guard.
cfghooks.h: Likewise.
collect-utils.h: Likewise.
collect2-aix.h: Likewise.
conditions.h: Likewise.
cselib.h: Likewise.
dwarf2asm.h: Likewise.
graphds.h: Likewise.
graphite-scop-detection.h: Likewise.
gsyms.h: Likewise.
hw-doloop.h: Likewise.
incpath.h: Likewise.
ipa-inline.h: Likewise.
ipa-ref.h: Likewise.
ira-int.h: Likewise.
ira.h: Likewise.
lra-int.h: Likewise.
lra.h: Likewise.
lto-section-names.h: Likewise.
read-md.h: Likewise.
reload.h: Likewise.
rtl-error.h: Likewise.
sdbout.h: Likewise.
targhooks.h: Likewise.
tree-affine.h: Likewise.
xcoff.h: Likewise.
xcoffout.h: Likewise.

 OK except for the changes to target-def.h and target-hooks-macros.h.
 (Those aren't exactly normal headers that could reasonably be included
 more than once in a source file; they have dependencies on where they get
 included and what's defined before/after inclusion.  So while I suspect
 the include guards would not cause any problems in those headers, it's not
 obvious they're desirable either.)
From 8c7e08c00526265f21830f72c7b266fd48ddea17 Mon Sep 17 00:00:00 2001
From: Kito Cheng kito.ch...@gmail.com
Date: Fri, 22 Aug 2014 17:34:49 +0800
Subject: [PATCH] Add header guard to several header files.

2014-09-19  Kito Cheng  k...@0xlab.org

	except.h: Fix header guard.
	addresses.h: Add missing header guard.
	cfghooks.h: Likewise.
	collect-utils.h: Likewise.
	collect2-aix.h: Likewise.
	conditions.h: Likewise.
	cselib.h: Likewise.
	dwarf2asm.h: Likewise.
	graphds.h: Likewise.
	graphite-scop-detection.h: Likewise.
	gsyms.h: Likewise.
	hw-doloop.h: Likewise.
	incpath.h: Likewise.
	ipa-inline.h: Likewise.
	ipa-ref.h: Likewise.
	ira-int.h: Likewise.
	ira.h: Likewise.
	lra-int.h: Likewise.
	lra.h: Likewise.
	lto-section-names.h: Likewise.
	read-md.h: Likewise.
	reload.h: Likewise.
	rtl-error.h: Likewise.
	sdbout.h: Likewise.
	targhooks.h: Likewise.
	tree-affine.h: Likewise.
	xcoff.h: Likewise.
	xcoffout.h: Likewise.
---
 gcc/addresses.h   | 5 +
 gcc/cfghooks.h| 4 
 gcc/collect-utils.h   | 5 +
 gcc/collect2-aix.h| 4 
 gcc/conditions.h  | 5 +
 gcc/cselib.h  | 5 +
 gcc/dwarf2asm.h   | 4 
 gcc/except.h  | 5 +++--
 gcc/graphds.h | 5 +
 gcc/graphite-scop-detection.h | 4 
 gcc/gsyms.h   | 4 
 gcc/hw-doloop.h   | 5 +
 gcc/incpath.h | 5 +
 gcc/ipa-inline.h  | 5 +
 gcc/ipa-ref.h | 5 +
 gcc/ira-int.h | 5 +
 gcc/ira.h | 5 +
 gcc/lra-int.h | 5 +
 gcc/lra.h | 5 +
 gcc/lto-section-names.h   | 5 +
 gcc/read-md.h | 5 +
 gcc/reload.h  | 4 
 gcc/rtl-error.h   | 5 +
 gcc/sdbout.h  | 5 +
 gcc/targhooks.h   | 5 +
 gcc/tree-affine.h | 5 +
 gcc/xcoff.h   | 5 +
 gcc/xcoffout.h| 4 
 28 files changed, 131 insertions(+), 2 deletions(-)

diff --git a/gcc/addresses.h b/gcc/addresses.h
index e323b58..3f0089a 100644
--- a/gcc/addresses.h
+++ b/gcc/addresses.h
@@ -21,6 +21,9 @@ along with GCC; see the file COPYING3.  If not see
MODE_BASE_REG_REG_CLASS, MODE_BASE_REG_CLASS and BASE_REG_CLASS.
Arguments as for the MODE_CODE_BASE_REG_CLASS macro.  */
 
+#ifndef GCC_ADDRESSES_H
+#define GCC_ADDRESSES_H
+
 static inline enum reg_class
 base_reg_class (enum machine_mode mode ATTRIBUTE_UNUSED,
 		addr_space_t as ATTRIBUTE_UNUSED,
@@ -82,3 +85,5 @@ regno_ok_for_base_p (unsigned regno, enum machine_mode mode, addr_space_t as,
 
   return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
 }
+
+#endif /* GCC_ADDRESSES_H */
diff --git a/gcc/cfghooks.h b/gcc/cfghooks.h
index 8ff808c..1b8587a 100644
--- a/gcc/cfghooks.h
+++ b/gcc/cfghooks.h
@@ -18,6 +18,9 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 http://www.gnu.org/licenses/.  */
 
+#ifndef GCC_CFGHOOKS_H
+#define GCC_CFGHOOKS_H
+
 /* Only basic-block.h includes this.  */
 
 struct cfg_hooks
@@ -221,3 +224,4 @@ extern void gimple_register_cfg_hooks (void);
 extern struct cfg_hooks get_cfg_hooks (void);
 extern void set_cfg_hooks (struct cfg_hooks);
 
+#endif /* GCC_CFGHOOKS_H */
diff --git a/gcc/collect-utils.h b/gcc/collect-utils.h
index 2989c6b..ba1985e 100644
--- a/gcc/collect-utils.h
+++ b/gcc/collect-utils.h
@@ -17,6 +17,9 @@ You should have received a copy of the 

Re: [patch] rename DECL_ABSTRACT to DECL_ABSTRACT_P

2014-09-18 Thread Aldy Hernandez



-  DECL_ABSTRACT (fn) = 1;
+  DECL_ABSTRACT_P (fn) = 1;


It'd probably make sense to use 'true' now.


I thought about it, but I wanted to change as little as possible, plus I 
wanted to follow the same style as what we've been doing for a lot of 
the _P macros:


DECL_HAS_VALUE_EXPR_P (t) = 1;
DECL_HAS_DEBUG_ARGS_P (from) = 1;
DECL_IGNORED_P (lab) = 1;
TREE_PUBLIC (decl) = 1;
CONSTANT_POOL_ADDRESS_P (symbol) = 1;

etc, etc.

But I am happy to change it, if people feel strongly about it.  (Though 
I'm not volunteering to change the other umteenhundred _P macros that 
currently use 0/1 ;-)).


Aldy


Re: [patch] rename DECL_ABSTRACT to DECL_ABSTRACT_P

2014-09-18 Thread Marek Polacek
On Thu, Sep 18, 2014 at 10:30:30AM -0600, Aldy Hernandez wrote:
 
 -  DECL_ABSTRACT (fn) = 1;
 +  DECL_ABSTRACT_P (fn) = 1;
 
 It'd probably make sense to use 'true' now.
 
 I thought about it, but I wanted to change as little as possible, plus I
 wanted to follow the same style as what we've been doing for a lot of the _P
 macros:
 
 DECL_HAS_VALUE_EXPR_P (t) = 1;
 DECL_HAS_DEBUG_ARGS_P (from) = 1;
 DECL_IGNORED_P (lab) = 1;
 TREE_PUBLIC (decl) = 1;
 CONSTANT_POOL_ADDRESS_P (symbol) = 1;
 
 etc, etc.
 
 But I am happy to change it, if people feel strongly about it.  (Though I'm
 not volunteering to change the other umteenhundred _P macros that currently
 use 0/1 ;-)).

Yeah, sure, either way it's a good cleanup ;).

Marek


[patch] update comments on *_ultimate_origin

2014-09-18 Thread Aldy Hernandez
output_inline_function was removed in tree-ssa times; there's no sense in
referencing it a decade later.


I still see DECL_ABSTRACT_ORIGIN pointing to itself in some instances,
though I haven't tracked down where, so I assume we still need the
functionality, just not the comment :).


OK for mainline?

Aldy
commit d51576de0a8450634ff7622e4688fd02fc8fcee9
Author: Aldy Hernandez al...@redhat.com
Date:   Thu Sep 18 10:35:30 2014 -0600

* dwarf2out.c (decl_ultimate_origin): Update comment.
* tree.c (block_ultimate_origin): Same.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 23a80d8..c65c756 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3676,8 +3676,7 @@ decl_ultimate_origin (const_tree decl)
   if (!CODE_CONTAINS_STRUCT (TREE_CODE (decl), TS_DECL_COMMON))
 return NULL_TREE;
 
-  /* output_inline_function sets DECL_ABSTRACT_ORIGIN for all the
- nodes in the function to point to themselves; ignore that if
+  /* DECL_ABSTRACT_ORIGIN can point to itself; ignore that if
  we're trying to output the abstract instance of this function.  */
   if (DECL_ABSTRACT (decl) && DECL_ABSTRACT_ORIGIN (decl) == decl)
 return NULL_TREE;
diff --git a/gcc/tree.c b/gcc/tree.c
index d1d67ef..fc544de 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -11554,8 +11554,7 @@ block_ultimate_origin (const_tree block)
 {
   tree immediate_origin = BLOCK_ABSTRACT_ORIGIN (block);
 
-  /* output_inline_function sets BLOCK_ABSTRACT_ORIGIN for all the
- nodes in the function to point to themselves; ignore that if
+  /* BLOCK_ABSTRACT_ORIGIN can point to itself; ignore that if
  we're trying to output the abstract instance of this function.  */
   if (BLOCK_ABSTRACT (block) && immediate_origin == block)
 return NULL_TREE;


Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check

2014-09-18 Thread Alan Lawrence
We've been seeing errors using aarch64-none-linux-gnu gcc to build the 403.gcc
benchmark from spec2k6, which we've traced back to this patch. The error looks like:


/home/alalaw01/bootstrap_richie/gcc/xgcc   -B/home/alalaw01/bootstrap_richie/gcc 
-O3 -mcpu=cortex-a57.cortex-a53  -DSPEC_CPU_LP64alloca.o asprintf.o 
vasprintf.o c-parse.o c-lang.o attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o 
c-typeck.o c-convert.o c-aux-info.o c-common.o c-format.o c-semantics.o 
c-objc-common.o main.o cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o 
cpphash.o cpperror.o cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o 
version.o mbchar.o alias.o bb-reorder.o bitmap.o builtins.o caller-save.o 
calls.o cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o 
combine.o conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o 
diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o emit-rtl.o 
except.o explow.o expmed.o expr.o final.o flow.o fold-const.o function.o gcse.o 
genrtl.o ggc-common.o global.o graph.o haifa-sched.o hash.o hashtable.o hooks.o 
ifcvt.o insn-attrtab.o insn-emit.o insn-extract.o insn-opinit.o insn-output.o 
insn-peep.o insn-recog.o integrate.o intl.o jump.o langhooks.o lcm.o lists.o 
local-alloc.o loop.o obstack.o optabs.o params.o predict.o print-rtl.o 
print-tree.o profile.o real.o recog.o reg-stack.o regclass.o regmove.o 
regrename.o reload.o reload1.o reorg.o resource.o rtl.o rtlanal.o rtl-error.o 
sbitmap.o sched-deps.o sched-ebb.o sched-rgn.o sched-vis.o sdbout.o sibcall.o 
simplify-rtx.o ssa.o ssa-ccp.o ssa-dce.o stmt.o stor-layout.o stringpool.o 
timevar.o toplev.o tree.o tree-dump.o tree-inline.o unroll.o varasm.o varray.o 
vmsdbgout.o xcoffout.o ggc-page.o i386.o xmalloc.o xexit.o hashtab.o 
safe-ctype.o splay-tree.o xstrdup.o md5.o fibheap.o xstrerror.o concat.o 
partition.o hex.o lbasename.o getpwd.o ucbqsort.o -lm-o gcc

emit-rtl.o: In function `gen_rtx_REG':
emit-rtl.c:(.text+0x12f8): relocation truncated to fit: 
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section 
in regclass.o

emit-rtl.o: In function `gen_rtx':
emit-rtl.c:(.text+0x1824): relocation truncated to fit: 
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section 
in regclass.o

collect2: error: ld returned 1 exit status
specmake: *** [gcc] Error 1
Error with make 'specmake -j7 build': check file 
'/home/alalaw01/spectest/benchspec/CPU2006/403.gcc/build/build_base_test./make.err'

  Command returned exit code 2
  Error with make!
*** Error building 403.gcc

Inspecting the compiled emit-rtl.o shows:

$ readelf --relocs good/emit-rtl.o | grep fixed_regs
12a8 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
12ac 005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
1800 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
1804 005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0

(that's compiled with a gcc from just before this patch); in contrast, using a
gcc with that patch:


$ readelf --relocs bad/emit-rtl.o | grep fixed_regs
12a8 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
12ac 005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
12f8 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 

12fc 005d0116 R_AARCH64_LDST8_A  fixed_regs + 

1824 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 

1828 005d0116 R_AARCH64_LDST8_A  fixed_regs + 

186c 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
1870 005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0

I attach a candidate 'fix', which allows building of 403.gcc on
aarch64-none-linux-gnu; full regression testing etc. is ongoing. (I admit
there may be better options in terms of canonicalizing if you want to!)


--Alan


Richard Biener wrote:

The following makes tree_swap_operands_p put all constants in 2nd place,
also looks through sign-changes when considering further canonicalizations
and removes the odd -Os guard for those.  That was put in with
https://gcc.gnu.org/ml/gcc-patches/2003-10/msg01208.html just
motivated by CSiBE numbers - but rather than disabling canonicalization
this should have disabled the actual harmful transforms.
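
(A trivial illustration of the canonicalization being discussed, sketch
only: with constants placed last in commutative expressions, both ways of
writing the source fold to the same internal form.)

/* Illustration only: with the constant operand canonicalized to second
   place, both functions are represented internally as x + 1.  */
int f (int x) { return 1 + x; }
int g (int x) { return x + 1; }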

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

Richard.

2014-08-15  Richard Biener  rguent...@suse.de

* fold-const.c (tree_swap_operands_p): Put all constants
last, also strip sign-changing NOPs when considering further
canonicalization.  Canonicalize also when optimizing for size.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 214007)
+++ gcc/fold-const.c(working copy)
@@ -6642,37 +6650,19 @@ reorder_operands_p (const_tree arg0, 

[PATCH, rs6000] Rename GCC version in warning messages

2014-09-18 Thread Ulrich Weigand
Hello,

the ABI warning messages I introduced in recent patches refer to a GCC
version 4.10.  As GCC has since adopted a new version naming scheme,
this patch updates those messages to refer to GCC 5 instead.

Tested on powerpc64le-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

* config/rs6000/rs6000.c (rs6000_special_adjust_field_align_p):
Update GCC version name to GCC 5.
(rs6000_function_arg_boundary): Likewise.
(rs6000_function_arg): Likewise.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 215355)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5939,7 +5939,7 @@
  warned = true;
  inform (input_location,
  the layout of aggregates containing vectors with
-  %d-byte alignment has changed in GCC 4.10,
+  %d-byte alignment has changed in GCC 5,
  computed / BITS_PER_UNIT);
}
}
@@ -9307,7 +9307,7 @@
  warned = true;
  inform (input_location,
  the ABI of passing aggregates with %d-byte alignment
-  has changed in GCC 4.10,
+  has changed in GCC 5,
  (int) TYPE_ALIGN (type) / BITS_PER_UNIT);
}
}
@@ -10428,7 +10428,7 @@
  warned = true;
  inform (input_location,
  the ABI of passing homogeneous float aggregates
-  has changed in GCC 4.10);
+  has changed in GCC 5);
}
}
 
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[committed] Fix Crash when OpenMP target's array section handling is used with templates (PR c++/63248)

2014-09-18 Thread Jakub Jelinek
On Wed, Sep 10, 2014 at 05:21:07PM +0200, Thomas Schwinge wrote:
 Hi!
 
 On Wed, 10 Sep 2014 12:23:04 +0200, Jakub Jelinek ja...@redhat.com wrote:
  On Wed, Sep 10, 2014 at 12:12:03PM +0200, Thomas Schwinge wrote:
   Are the following issues known?
  
  No, please file a PR.
 
 Will do tomorrow.

Here is a fix I've committed to trunk/4.9 after testing on x86_64-linux.

2014-09-18  Jakub Jelinek  ja...@redhat.com

PR c++/63248
* semantics.c (finish_omp_clauses): Don't call cp_omp_mappable_type
on type of type dependent expressions, and don't call it if
handle_omp_array_sections has kept TREE_LIST because something
was type dependent.
* pt.c (tsubst_expr) <case OMP_TARGET, case OMP_TARGET_DATA>:
Use keep_next_level, begin_omp_structured_block and
finish_omp_structured_block instead of push_stmt_list and
pop_stmt_list.
libgomp/
* testsuite/libgomp.c++/pr63248.C: New test.

--- gcc/cp/semantics.c.jj   2014-09-17 21:01:11.0 +0200
+++ gcc/cp/semantics.c  2014-09-18 17:05:19.785988633 +0200
@@ -5668,7 +5668,9 @@ finish_omp_clauses (tree clauses)
  else
{
  t = OMP_CLAUSE_DECL (c);
- if (!cp_omp_mappable_type (TREE_TYPE (t)))
+ if (TREE_CODE (t) != TREE_LIST
+  !type_dependent_expression_p (t)
+  !cp_omp_mappable_type (TREE_TYPE (t)))
{
  error_at (OMP_CLAUSE_LOCATION (c),
array section does not have mappable type 
@@ -5708,6 +5710,7 @@ finish_omp_clauses (tree clauses)
remove = true;
  else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
	     && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER)
+	   && !type_dependent_expression_p (t)
	   && !cp_omp_mappable_type ((TREE_CODE (TREE_TYPE (t))
  == REFERENCE_TYPE)
 ? TREE_TYPE (TREE_TYPE (t))
--- gcc/cp/pt.c.jj  2014-09-16 10:00:35.0 +0200
+++ gcc/cp/pt.c 2014-09-18 18:12:09.804850925 +0200
@@ -14089,8 +14089,6 @@ tsubst_expr (tree t, tree args, tsubst_f
 case OMP_SECTIONS:
 case OMP_SINGLE:
 case OMP_TEAMS:
-case OMP_TARGET_DATA:
-case OMP_TARGET:
   tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false,
args, complain, in_decl);
   stmt = push_stmt_list ();
@@ -14099,6 +14097,22 @@ tsubst_expr (tree t, tree args, tsubst_f
 
   t = copy_node (t);
   OMP_BODY (t) = stmt;
+  OMP_CLAUSES (t) = tmp;
+  add_stmt (t);
+  break;
+
+case OMP_TARGET_DATA:
+case OMP_TARGET:
+  tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false,
+   args, complain, in_decl);
+  keep_next_level (true);
+  stmt = begin_omp_structured_block ();
+
+  RECUR (OMP_BODY (t));
+  stmt = finish_omp_structured_block (stmt);
+
+  t = copy_node (t);
+  OMP_BODY (t) = stmt;
   OMP_CLAUSES (t) = tmp;
   add_stmt (t);
   break;
--- libgomp/testsuite/libgomp.c++/pr63248.C.jj  2014-09-18 18:19:49.806529990 
+0200
+++ libgomp/testsuite/libgomp.c++/pr63248.C 2014-09-18 18:18:58.0 
+0200
@@ -0,0 +1,62 @@
+// PR c++/63248
+// { dg-do run }
+
+int *v;
+
+template <typename T>
+T
+foo (T A, T B)
+{
+  T a = 2;
+  T b = 4;
+
+#pragma omp target map(v[a:b])
+  v[a] = 1;
+
+#pragma omp target map(v[A:B])
+  v[a] = 2;
+
+#pragma omp target map(A)
+  A = 19;
+  return A;
+}
+
+template <int N>
+int
+bar (int A, int B)
+{
+#pragma omp target map(A)
+  A = 8;
+  if (A != 8)
+__builtin_abort ();
+#pragma omp target map(A, B)
+  {
+A = 1;
+B = 2;
+  }
+  return A + B;
+}
+
+int
+baz (int A, int B)
+{
+#pragma omp target map(A)
+  A = 8;
+  if (A != 8)
+__builtin_abort ();
+#pragma omp target map(A, B)
+  {
+A = 1;
+B = 2;
+  }
+  return A + B;
+}
+
+int
+main ()
+{
+  int a[10] = { 0 };
+  v = a;
+  if (foo (1, 5) != 19 || v[2] != 2 || bar<0> (5, 7) != 3 || baz (5, 7) != 3)
+__builtin_abort ();
+}


Jakub


Re: Patch to fix PR61360

2014-09-18 Thread Vladimir Makarov
On 09/18/2014 12:10 PM, Jakub Jelinek wrote:
 On Thu, Sep 18, 2014 at 12:04:30PM -0400, Vladimir Makarov wrote:
 The following patch fixes the PR.  The details can be found on

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360

 The patch was bootstrapped and tested on x86/x86-64.

 Committed as rev. 215358.
 What effect does this have on compile time?


It is hard to measure real time but < 0.05% according to valgrind lackey
on combine.i for -O2.



Re: [PATCH i386 AVX512] [42/n] Add masked vunpck[lh]pd.

2014-09-18 Thread Uros Bizjak
On Thu, Sep 18, 2014 at 1:47 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 Patch in the bottom extends/adds patterns for masked unpack
 instructions.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_insn avx_unpckhpd256<mask_name>): Add masking.
 (define_insn avx512vl_unpckhpd128_mask): New.
 (define_expand avx_movddup256<mask_name>): Ditto.
 (define_expand avx_unpcklpd256<mask_name>): Ditto.
 (define_insn *avx_unpcklpd256<mask_name>): Ditto.
 (define_insn avx512vl_unpcklpd128_mask): New.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [43/n] Add rest of vunpck[lh]ps.

2014-09-18 Thread Uros Bizjak
On Thu, Sep 18, 2014 at 1:54 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch adds rest of unpack insn patterns.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_insn avx_unpckhps256<mask_name>): Add masking.
 (define_insn vec_interleave_highv4sf<mask_name>): Ditto.
 (define_insn avx_unpcklps256<mask_name>): Ditto.
 (define_insn unpcklps128_mask): New.

OK.

Thanks,
Uros.


[patch] normalize the x86-vxworks port organization

2014-09-18 Thread Olivier Hainque
Hello,

VxWorks ports typically come in two flavors: regular VxWorks and VxWorksAE
(653). In most cases, cpu/vxworks.h is used as a common configuration file
for the two flavors and cpu/vxworksae.h overrides/adds on top of that.
There are also config/vx*.h shared by everybody.

The x86 port departs from this scheme, with an i386/vx-common.h file.

The attached patch is a proposal to bring the x86 port organization
in line with what is done for other CPUs. It essentially 

- moves the contents of i386/vx-common.h within i386/vxworks.h,
- removes i386/vx-common.h
- adjusts config.gcc accordingly

The patch takes the opportunity to

- clean up i386/vxworksae.h, removing redundant or obsolete
  definitions and adding the definition we use for stack-checking
  support on this platform.

We (AdaCore) have been using this successfully for a while on gcc-4.7
and recently on gcc-4.9, for both VxWorks6 and VxWorksAE targets.

The patch attached here applies on mainline and passes
make all-gcc for --target=i686-wrs-vxworksae --enable-languages=c

OK to commit ?

Thanks in advance for your feedback,

With Kind Regards,

Olivier

2014-09-18  Olivier Hainque  hain...@adacore.com

* config/i386/vxworksae.h: Remove obsolete definitions.
(STACK_CHECK_PROTECT): Define.
* config/i386/vx-common.h: Remove.  Merge contents within
config/i386/vxworks.h.
* config.gcc (i?86-vxworks*): Use i386/vxworks.h instead of
i386/vx-common.h.



cleanup-x86vx653.diff
Description: Binary data


Re: [PATCH i386 AVX512] [44/n] Add vsgufps insn patterns.

2014-09-18 Thread Uros Bizjak
On Thu, Sep 18, 2014 at 1:59 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 Patch in the bottom extends AVX-512 shufps.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_expand avx_shufps256<mask_expand4_name>): Add masking.
 (define_insn avx_shufps256_1<mask_name>): Ditto.
 (define_expand sse_shufps<mask_expand4_name>): Ditto.
 (define_insn sse_shufps_v4sf_mask): New.

OK.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [45/n] Add vshufpd insn patterns.

2014-09-18 Thread Uros Bizjak
On Thu, Sep 18, 2014 at 2:02 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch supports AVX-512's vshufpd insns.

 Bootstrapped.
 AVX-512* tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_expand avx_shufpd256<mask_expand4_name>): Add masking.
 (define_insn avx_shufpd256_1<mask_name>): Ditto.
 (define_expand sse2_shufpd<mask_expand4_name>): Ditto.
 (define_insn sse2_shufpd_v2df_mask): New.

OK.

Thanks,
Uros.


Re: [PATCHv4] Vimrc config with GNU formatting

2014-09-18 Thread Segher Boessenkool
On Thu, Sep 18, 2014 at 12:40:08PM +0400, Yury Gribov wrote:
 When typing 'make .local.vimrc' in GCC build directory one would expect
 .local.vimrc to be created at the root of build directory, not srcdir.

Yes, you would not expect it to do anything to your source dir, ever :-)

  + To enable this for GCC files by default, install thinca's vim-localrc
  + plugin and do
  +   $ make .local.vimrc
 
  No, we should *not* advertise an enough rope solution without 
 mentioning
  it *will* kill you.
 
 How about adding a disclaimer? E.g. beware that Vim plugins are a
 GAPING SECURITY HOLE so use them at YOUR OWN RISK. (And note that
 Braun's plugin does use sandboxes).

This *particular* plugin is suicidal.  Most plugins are just fine.

  Or not mention it at all.  Esp. since your next option
  has all the same functionality and more.
 
 It lacks very important functionality: user has to specify path
 to concrete GCC source tree when adding the autocmd.

I was talking about mbr's plugin here :-)

 I have a dozen of trees on my box and I regularly rename, move or copy them.
 With plugins one doesn't have to bother fixing paths in ~/.vimrc
 which is important for productivity.

And  :au bufread ~/src/gcc/* ...  works for me.  To each their own.

  + Or if you dislike plugins, add autocmd in your ~/.vimrc:
  +   :au BufNewFile,BufReadPost path/to/gcc/* :so 
 path/to/gcc/contrib/vimrc
 
  There are many more reasons than just dislike of plugins to prefer
  something like this.  For one thing, many Vim users will have many
  similar statements in their config _already_.
 
 So if you don't want to use plugins?

Just mention it as another option?  Something like
You can add these options to your .vimrc; or you can :source this script
file; or do either with an :autocmd; or use e.g. the <name of plugin here>
plugin <some vim.org url>.  Don't say do X if Y; let people decide for
themselves what fits their situation best.

  + Or just source file manually every time if you are masochist:
  +   :so path/to/gcc/contrib/vimrc
 
  How is that masochist?  Typing that cino by hand though, now that would
  qualify ;-)
 
 Note that user has to type source command for every newly opened file.
 This indeed looks inconvenient (to me again).

Well for most people it is just :so contrib/vimrc.  Or just :lo if
you're talking about crazy people with views.

  +setlocal 
 cinoptions=2s,n-s,{s,^-s,:s,=s,g0,f0,hs,p2s,t0,+s,(0,u0,w1,m0
 
  If you write this as absolute numbers instead of as shift widths, you 
 do not
  need to force sw and sts settings down people's throat.  It might also be
  easier to read?  Well I doubt that, but it will be slightly shorter 
 at least.
 
 IMHO matching shiftwidth with GNU indent may be useful.
 E.g. Vim won't reindent when you start editing an empty line and user
 will have to insert indents manually.
 
 Also replacing offsets with numbers hides the fact
 that they are based on GNU shiftwidth.

I have no idea what you mean with matching with GNU indent, sorry.
I was suggesting you could write it as
:set cino=4,n-2,{2,^-2,:2,=2,g0,f0,h2,p4,t0,+2,(0,u0,w1,m0
and you'd be independent of sw setting.  The coding standard says
indent two spaces etc. anyway.

And yeah sw=2 does make sense for editing GCC, if you are used to sw=2
that is.  The point is that the sw setting has nothing to do with what
your text will look like, only with what keys you press.

  +setlocal textwidth=79
 
  The coding conventions say maximum line length is 80.
 
 From https://www.gnu.org/prep/standards/html_node/Formatting.html : 
 Please keep the length of source lines to 79 characters or less, for 
 maximum readability in the widest range of environments.

There is a doc on gcc.gnu.org as well, which describes many more details.

 Now we rarely do violate textwidth in our codes,

rarely?  Ho hum.  There are many worse formatting errors, of course.
And how much do those matter _really_.

  And you do not enable t (also
  on by default), so you do not want to wrap text anyway?  Confused now.
 
 Me as well, the original config author did it that way. IMHO +t makes 
 sense here.

It is certainly more consistent.


Segher


Re: [PATCH, i386, Pointer Bounds Checker 30/x] Size relocation

2014-09-18 Thread Uros Bizjak
On Thu, Sep 18, 2014 at 4:00 PM, Ilya Enkovich enkovich@gmail.com wrote:
 On 17 Sep 20:51, Uros Bizjak wrote:
 On Wed, Sep 17, 2014 at 8:35 PM, Ilya Enkovich enkovich@gmail.com 
 wrote:
  On 16 Sep 12:22, Uros Bizjak wrote:
  On Tue, Sep 16, 2014 at 11:37 AM, Ilya Enkovich enkovich@gmail.com 
  wrote:
   2014-09-16 13:08 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
  
   Can x86_64_immediate_operand predicate be used here?
  
   I think it cannot be used because of TLS symbols not counting as 
   immediate.
 
  OK, please introduce a new predicate, similar to
  x86_64_immediate_operand, perhaps x86_64_immediate_size_operand, so we
  can add some comments there. This will also help to macroize the insn,
  x86_64_immediate_operand has !TARGET_64BIT shortcut for this case.
 
  Uros.
 
  I don't see how new predicate would help to macroize insn.  Single 
  template may look as following patch.

 You put early return for !TARGET_64BITS. Please see
 x86_64_immediate_operand predicate.

 So,

 /* Here comes comment. */
 (define_predicate "x86_64_immediate_size_operand"
   (match_code "symbol_ref")
 {
   if (!TARGET_64BIT)
 return true;

   /* Comment here explaining these conditions.  */
   return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL);
 }

 And then in the pattern itself:

 if (x86_64_immediate_size_operand (operands[1], VOIDmode))
   return "mov{l}\t{%1@SIZE, %k0|%k0, %1@SIZE}";
 else
   return "movabs{q}\t{%1@SIZE, %0|%0, %1@SIZE}";

 Uros.

 Here is a version with check in a form you suggest.

 Thanks,
 Ilya
 --
 2014-09-18  Ilya Enkovich  ilya.enkov...@intel.com

 * config/i386/i386.md (UNSPEC_SIZEOF): New.
 (move_size_reloc_<mode>): New.
 * config/i386/predicates.md (symbol_operand): New.
 (x86_64_immediate_size_operand): New.

OK.

We are always on the safe side now, movl is an optimization exception.
I wonder if we can also add something like

  || (ix86_cmodel == CM_MEDIUM && !SYMBOL_REF_FAR_ADDR_P (op)));

as is the case with x86_64_immediate_operand, but I am not sure that
object size is guaranteed to fit in 31bits. Maybe Honza (CC'd) can
confirm this.

Uros.

 diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
 index 2c367b2..db22b06 100644
 --- a/gcc/config/i386/i386.md
 +++ b/gcc/config/i386/i386.md
 @@ -79,6 +79,7 @@
UNSPEC_PLTOFF
UNSPEC_MACHOPIC_OFFSET
UNSPEC_PCREL
 +  UNSPEC_SIZEOF

;; Prologue support
UNSPEC_STACK_ALLOC
 @@ -18554,6 +18555,21 @@
bndstx\t{%2, %3|%3, %2}
[(set_attr type mpxst)])

 +(define_insn "move_size_reloc_<mode>"
 +  [(set (match_operand:SWI48 0 "register_operand" "=r")
 +   (unspec:SWI48
 +    [(match_operand:SWI48 1 "symbol_operand")]
 +    UNSPEC_SIZEOF))]
 +  "TARGET_MPX"
 +{
 +  if (x86_64_immediate_size_operand (operands[1], VOIDmode))
 +    return "mov{l}\t{%1@SIZE, %k0|%k0, %1@SIZE}";
 +  else
 +    return "movabs{q}\t{%1@SIZE, %0|%0, %1@SIZE}";
 +}
 +  [(set_attr "type" "imov")
 +   (set_attr "mode" "<MODE>")])
 +
  (include mmx.md)
  (include sse.md)
  (include sync.md)
 diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
 index cd542b7..da01c9a 100644
 --- a/gcc/config/i386/predicates.md
 +++ b/gcc/config/i386/predicates.md
 @@ -124,6 +124,10 @@
 (match_test TARGET_64BIT)
 (match_test REGNO (op)  BX_REG)))

 +;; Return true if VALUE is symbol reference
 +(define_predicate symbol_operand
 +  (match_code symbol_ref))
 +
  ;; Return true if VALUE can be stored in a sign extended immediate field.
  (define_predicate x86_64_immediate_operand
(match_code const_int,symbol_ref,label_ref,const)
 @@ -336,6 +340,19 @@
return false;
  })

 +;; Return true if size of VALUE can be stored in a sign
 +;; extended immediate field.
 +(define_predicate x86_64_immediate_size_operand
 +  (match_code symbol_ref)
 +{
 +  if (!TARGET_64BIT)
 +return true;
 +
 +  /* For 64 bit target we may assume size of object fits
 + immediate only when code model guarantees that.  */
 +  return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL);
 +})
 +
  ;; Return true if OP is general operand representable on x86_64.
  (define_predicate x86_64_general_operand
(if_then_else (match_test TARGET_64BIT)


Re: parallel check output changes?

2014-09-18 Thread Bernd Schmidt

On 09/18/2014 05:03 PM, Andrew MacLeod wrote:

On 09/18/2014 09:05 AM, Andrew MacLeod wrote:

On 09/18/2014 09:01 AM, Jakub Jelinek wrote:

On Thu, Sep 18, 2014 at 08:56:50AM -0400, Andrew MacLeod wrote:

Has the changes that have gone into the check parallelization made
the .sum
file non-deterministic?
I'm seeing a lot of small hunks in different orders which cause my
comparison scripts to show big differences.
I haven't been paying attention to the nature of the make check
changes so
Im not sure if this is expected...

Or is this something else?  Its the same code base between runs,
just with a
few changes made to some include files.

I'm using contrib/test_summary and haven't seen any non-determinisms
in the
output of that command.  As for dg-extract-results.sh, we have two
versions
of that, one if you have python 2.6 or newer, another one if you don't.
Perhaps the behavior of those two (I'm using the python version
probably)
differs?

Jakub

Not sure, although I do have python 2.7.5 installed for what its
worth...  I'll try another run in a bit.

Andrew


hum. My 3rd run (which has no compilation change from the 2nd one) is
different from both other runs  :-P.   I did tweak my -j parameter in
the make check, but that is it.


I'm also seeing this. Python 3.3.5 here.


Bernd




[jit] Markup fixes within documentation

2014-09-18 Thread David Malcolm
Committed to branch dmalcolm/jit:

gcc/jit/ChangeLog.jit:
* docs/intro/install.rst: Markup fixes.
* docs/intro/tutorial01.rst: Likewise.
* docs/intro/tutorial02.rst: Likewise.
* docs/topics/contexts.rst: Likewise.
* docs/topics/expressions.rst: Likewise.
* docs/topics/functions.rst: Likewise.
* docs/topics/locations.rst: Likewise.
* docs/topics/types.rst: Likewise.
---
 gcc/jit/ChangeLog.jit   | 11 +++
 gcc/jit/docs/intro/install.rst  | 32 +++-
 gcc/jit/docs/intro/tutorial01.rst   |  8 +---
 gcc/jit/docs/intro/tutorial02.rst   |  4 +++-
 gcc/jit/docs/topics/contexts.rst| 16 +---
 gcc/jit/docs/topics/expressions.rst | 34 +-
 gcc/jit/docs/topics/functions.rst   | 28 +---
 gcc/jit/docs/topics/locations.rst   |  8 ++--
 gcc/jit/docs/topics/types.rst   | 16 
 9 files changed, 119 insertions(+), 38 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 7ee7ebf..11c9298 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,5 +1,16 @@
 2014-09-18  David Malcolm  dmalc...@redhat.com
 
+   * docs/intro/install.rst: Markup fixes.
+   * docs/intro/tutorial01.rst: Likewise.
+   * docs/intro/tutorial02.rst: Likewise.
+   * docs/topics/contexts.rst: Likewise.
+   * docs/topics/expressions.rst: Likewise.
+   * docs/topics/functions.rst: Likewise.
+   * docs/topics/locations.rst: Likewise.
+   * docs/topics/types.rst: Likewise.
+
+2014-09-18  David Malcolm  dmalc...@redhat.com
+
* docs/examples/install-hello-world.c (main): Fix missing
return.
* docs/examples/tut01-square.c (main): Likewise.
diff --git a/gcc/jit/docs/intro/install.rst b/gcc/jit/docs/intro/install.rst
index fc2e96e..1a39192 100644
--- a/gcc/jit/docs/intro/install.rst
+++ b/gcc/jit/docs/intro/install.rst
@@ -41,12 +41,14 @@ your system.  Having done this,
   sudo yum install libgccjit-devel
 
 should give you both the JIT library (`libgccjit`) and the header files
-needed to develop against it (`libgccjit-devel`)::
+needed to develop against it (`libgccjit-devel`):
 
-  [david@c64 ~]$ rpm -qlv libgccjit
+.. code-block:: console
+
+  $ rpm -qlv libgccjit
   lrwxrwxrwx1 rootroot   18 Aug 12 07:56 
/usr/lib64/libgccjit.so.0 - libgccjit.so.0.0.1
   -rwxr-xr-x1 rootroot 14463448 Aug 12 07:57 
/usr/lib64/libgccjit.so.0.0.1
-  [david@c64 ~]$ rpm -qlv libgccjit-devel
+  $ rpm -qlv libgccjit-devel
   -rwxr-xr-x1 rootroot37654 Aug 12 07:56 
/usr/include/libgccjit++.h
   -rwxr-xr-x1 rootroot28967 Aug 12 07:56 
/usr/include/libgccjit.h
   lrwxrwxrwx1 rootroot   14 Aug 12 07:56 
/usr/lib64/libgccjit.so - libgccjit.so.0
@@ -103,7 +105,9 @@ To build it (within the jit/build subdirectory, 
installing to
 On my 4-core laptop this takes 17 minutes and 1.1G of disk space
 (it's much faster with many cores and a corresponding -j setting).
 
-This should build a libgccjit.so within jit/build/gcc::
+This should build a libgccjit.so within jit/build/gcc:
+
+.. code-block:: console
 
  [build] $ file gcc/libgccjit.so*
  gcc/libgccjit.so:   symbolic link to `libgccjit.so.0'
@@ -126,14 +130,18 @@ earlier) via:
 On my laptop this uses a further 0.4G of disk space.
 
 You should be able to see the header files within the `include`
-subdirectory of the installation prefix::
+subdirectory of the installation prefix:
+
+.. code-block:: console
 
   $ find $PREFIX/include
   /home/david/gcc-jit/install/include
   /home/david/gcc-jit/install/include/libgccjit.h
   /home/david/gcc-jit/install/include/libgccjit++.h
 
-and the library within the `lib` subdirectory::
+and the library within the `lib` subdirectory:
+
+.. code-block:: console
 
   $ find $PREFIX/lib/libgccjit.*
   /home/david/gcc-jit/install/lib/libgccjit.so
@@ -152,7 +160,9 @@ a call to `printf` and use it to write a message to stdout.
 
 Copy it to `jit-hello-world.c`.
 
-To build it with prebuilt packages, use::
+To build it with prebuilt packages, use:
+
+.. code-block:: console
 
   $ gcc \
   jit-hello-world.c \
@@ -165,7 +175,9 @@ To build it with prebuilt packages, use::
 
 
 If building against an locally-built install (to $PREFIX), specify the
-include and library paths with -I and -L::
+include and library paths with -I and -L:
+
+.. code-block:: console
 
   $ gcc \
   jit-hello-world.c \
@@ -173,7 +185,9 @@ include and library paths with -I and -L::
   -lgccjit \
   -I$PREFIX/include -L$PREFIX/lib
 
-and when running, specify the dynamic linkage path via LD_LIBRARY_PATH::
+and when running, specify the dynamic linkage path via LD_LIBRARY_PATH:
+
+.. code-block:: console
 
   $ LD_LIBRARY_PATH=$PREFIX/lib ./jit-hello-world
   hello world
diff --git 

Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target

2014-09-18 Thread Uros Bizjak
On Thu, Sep 18, 2014 at 3:47 PM, Ilya Enkovich enkovich@gmail.com wrote:

 Thanks for your comments.  Below is a fixed version.

 Ilya
 --
 2014-09-17  Ilya Enkovich  ilya.enkov...@intel.com

 * config/i386/i386-builtin-types.def (BND): New.
 (ULONG): New.
 (BND_FTYPE_PCVOID_ULONG): New.
 (VOID_FTYPE_BND_PCVOID): New.
 (VOID_FTYPE_PCVOID_PCVOID_BND): New.
 (BND_FTYPE_PCVOID_PCVOID): New.
 (BND_FTYPE_PCVOID): New.
 (BND_FTYPE_BND_BND): New.
 (PVOID_FTYPE_PVOID_PVOID_ULONG): New.
 (PVOID_FTYPE_PCVOID_BND_ULONG): New.
 (ULONG_FTYPE_VOID): New.
 (PVOID_FTYPE_BND): New.
 * config/i386/i386.c: Include tree-chkp.h, rtl-chkp.h.
 (ix86_builtins): Add
 IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX,
 IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL,
 IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET,
 IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT,
 IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER,
 IX86_BUILTIN_BNDUPPER.
 (builtin_isa): Add leaf_p and nothrow_p fields.
 (def_builtin): Initialize leaf_p and nothrow_p.
 (ix86_add_new_builtins): Handle leaf_p and nothrow_p
 flags.
 (bdesc_mpx): New.
 (bdesc_mpx_const): New.
 (ix86_init_mpx_builtins): New.
 (ix86_init_builtins): Call ix86_init_mpx_builtins.
 (ix86_emit_cmove): New.
 (ix86_emit_move_max): New.
 (ix86_expand_builtin): Expand IX86_BUILTIN_BNDMK,
 IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX,
 IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU,
 IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW,
 IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF,
 IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER.

OK with a few nits below.

Thanks,
Uros.


 diff --git a/gcc/config/i386/i386-builtin-types.def 
 b/gcc/config/i386/i386-builtin-types.def
 index 35c0035..989297a 100644
 --- a/gcc/config/i386/i386-builtin-types.def
 +++ b/gcc/config/i386/i386-builtin-types.def
 @@ -47,6 +47,7 @@ DEF_PRIMITIVE_TYPE (UCHAR, unsigned_char_type_node)
  DEF_PRIMITIVE_TYPE (QI, char_type_node)
  DEF_PRIMITIVE_TYPE (HI, intHI_type_node)
  DEF_PRIMITIVE_TYPE (SI, intSI_type_node)
 +DEF_PRIMITIVE_TYPE (BND, pointer_bounds_type_node)
  # ??? Logically this should be intDI_type_node, but that maps to long
  # with 64-bit, and that's not how the emmintrin.h is written.  Again,
  # changing this would change name mangling.
 @@ -60,6 +61,7 @@ DEF_PRIMITIVE_TYPE (USHORT, short_unsigned_type_node)
  DEF_PRIMITIVE_TYPE (INT, integer_type_node)
  DEF_PRIMITIVE_TYPE (UINT, unsigned_type_node)
  DEF_PRIMITIVE_TYPE (UNSIGNED, unsigned_type_node)
 +DEF_PRIMITIVE_TYPE (ULONG, long_unsigned_type_node)
  DEF_PRIMITIVE_TYPE (LONGLONG, long_long_integer_type_node)
  DEF_PRIMITIVE_TYPE (ULONGLONG, long_long_unsigned_type_node)
  DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
 @@ -806,3 +808,15 @@ DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_V2DI, TF)
  DEF_FUNCTION_TYPE_ALIAS (V4SF_FTYPE_V4SF_V4SF, TF)
  DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V4SI_V4SI, TF)
  DEF_FUNCTION_TYPE_ALIAS (V8HI_FTYPE_V8HI_V8HI, TF)
 +
 +# MPX builtins
 +DEF_FUNCTION_TYPE (BND, PCVOID, ULONG)
 +DEF_FUNCTION_TYPE (VOID, PCVOID, BND)
 +DEF_FUNCTION_TYPE (VOID, PCVOID, BND, PCVOID)
 +DEF_FUNCTION_TYPE (BND, PCVOID, PCVOID)
 +DEF_FUNCTION_TYPE (BND, PCVOID)
 +DEF_FUNCTION_TYPE (BND, BND, BND)
 +DEF_FUNCTION_TYPE (PVOID, PVOID, PVOID, ULONG)
 +DEF_FUNCTION_TYPE (PVOID, PCVOID, BND, ULONG)
 +DEF_FUNCTION_TYPE (ULONG, VOID)
 +DEF_FUNCTION_TYPE (PVOID, BND)
 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index d0f58b1..6082f86 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -85,6 +85,8 @@ along with GCC; see the file COPYING3.  If not see
  #include tree-vectorizer.h
  #include shrink-wrap.h
  #include builtins.h
 +#include tree-chkp.h
 +#include rtl-chkp.h

  static rtx legitimize_dllimport_symbol (rtx, bool);
  static rtx legitimize_pe_coff_extern_decl (rtx, bool);
 @@ -28775,6 +28777,19 @@ enum ix86_builtins
IX86_BUILTIN_XABORT,
IX86_BUILTIN_XTEST,

 +  /* MPX */
 +  IX86_BUILTIN_BNDMK,
 +  IX86_BUILTIN_BNDSTX,
 +  IX86_BUILTIN_BNDLDX,
 +  IX86_BUILTIN_BNDCL,
 +  IX86_BUILTIN_BNDCU,
 +  IX86_BUILTIN_BNDRET,
 +  IX86_BUILTIN_BNDNARROW,
 +  IX86_BUILTIN_BNDINT,
 +  IX86_BUILTIN_SIZEOF,
 +  IX86_BUILTIN_BNDLOWER,
 +  IX86_BUILTIN_BNDUPPER,
 +
/* BMI instructions.  */
IX86_BUILTIN_BEXTR32,
IX86_BUILTIN_BEXTR64,
 @@ -28848,6 +28863,8 @@ struct builtin_isa {
enum ix86_builtin_func_type tcode; /* type to use in the declaration */
HOST_WIDE_INT isa;   /* isa_flags this builtin is defined for */
 bool const_p;/* true if the declaration is constant */
 +  bool leaf_p; /* true if the declaration has leaf attribute */
 +  bool nothrow_p;  /* true if the declaration has nothrow attribute */
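
The new flags are consumed when the deferred decls are eventually
materialized; roughly along the following lines (a sketch of what
ix86_add_new_builtins can do with the saved fields, not a quote of the
committed code; DECL_ATTRIBUTES, TREE_NOTHROW, build_tree_list and
get_identifier are existing tree interfaces):

  /* Sketch: decorate the freshly created decl from the flags that
     def_builtin recorded in ix86_builtins_isa[i].  */
  if (ix86_builtins_isa[i].leaf_p)
    DECL_ATTRIBUTES (decl)
      = build_tree_list (get_identifier ("leaf"), NULL_TREE);
  if (ix86_builtins_isa[i].nothrow_p)
    TREE_NOTHROW (decl) = 1;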

Re: parallel check output changes?

2014-09-18 Thread Jakub Jelinek
On Thu, Sep 18, 2014 at 07:32:00PM +0200, Bernd Schmidt wrote:
 hum. My 3rd run (which has no compilation change from the 2nd one) is
 different from both other runs  :-P.   I did tweak my -j parameter in
 the make check, but that is it.
 
 I'm also seeing this. Python 3.3.5 here.

Segher on IRC mentioned that changing result_re in dg-extract-results.py
should help here (or disabling the python version; the *.sh version should
sort everything out).

Jakub


Re: Patch to fix PR61360

2014-09-18 Thread Richard Sandiford
Jakub Jelinek ja...@redhat.com writes:
 On Thu, Sep 18, 2014 at 12:04:30PM -0400, Vladimir Makarov wrote:
 The following patch fixes the PR.  The details can be found on
 
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360
 
 The patch was bootstrapped and tested on x86/x86-64.
 
 Committed as rev. 215358.

 What effect does this have on compile time?

Regardless of compile time, I strongly object to this kind of hack.

(a) it will in practice never go away.

(b) (more importantly) it makes no conceptual sense.  It means that
passes before lra use the old, cached enabled attribute while
lra and after will use fresh values.

The only reason the call has been put here is because lra was the
only pass that checks for and asserted on inconsistent values.
Passes before lra will still see the same inconsistent values but
they happen not to assert.

I.e. we've put the call here to shut up a valid assert rather than
because it's the right place to do it.

(c) the enabled attribute was never supposed to be used in this way.

I really think the patch should be reverted.

Thanks,
Richard


Re: [PATCH][PING] Keep patch file permissions in mklog

2014-09-18 Thread Diego Novillo
On Thu, Sep 18, 2014 at 10:56 AM, Yury Gribov y.gri...@samsung.com wrote:

 On 08/04/2014 12:14 PM, Tom de Vries wrote:

 On 04-08-14 08:45, Yury Gribov wrote:

 Thanks! My 2 (actually 4) cents below.


 Hi Yuri,

 thanks for the review.

   +if ($#ARGV == 1 && ($ARGV[0] eq "-i" || $ARGV[0] eq "--inline")) {
   +$diff = $ARGV[1];

 Can we shift here and then just set $diff to $ARGV[0] unconditionally?


 Done.

   +if ($diff eq "-") {
   +die "Reading from - and using -i are not compatible";
   +}

 Hm, can't we dump ChangeLog to stdout in that case?
 The limitation looks rather strange.


 My original idea here was that --inline means 'in the patch file', which
 is not possible if the patch comes from stdin.

 I've now interpreted it such that --inline prints to stdout what it
 would print to the patch file otherwise, that is, both log and patch.
 Printing just the log to stdout can be already be achieved by not using
 --inline.

   +open (FILE1, '>', $tmp) or die "Could not open temp file";

 Could we use more descriptive name?


 I've used the slightly more descriptive 'OUTPUTFILE'.

   +system ("cat $diff >> $tmp") == 0
   +or die "Could not append patch to temp file";
   ...
   +unlink ($tmp) == 1 or die "Could not remove temp file";

 The checks look like an overkill given that we don't check for result
 of mktemp...


 I've added a check for the result of mktemp, and removed the unlink
 result check. I've left in the "Could not append patch to temp file"
 check because the patch file might be read-only.

 OK for trunk?

 Thanks,
 - Tom


 Pinging the patch for Tom.


Apologies for the delay. Could someone post the latest patch? I see
it's gone through a cycle of reviews and changes.


Thanks. Diego.


[gomp4] OpenACC acc_on_device (was: various OpenACC/PTX built-ins and a reduction tweak)

2014-09-18 Thread Thomas Schwinge
Hi!

Here is my OpenACC acc_on_device patch, in a more complete form, with
test cases and all that.  Thanks, Cesar, for getting the ball rolling!

On Wed, 17 Sep 2014 10:49:54 +0200, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Sep 17, 2014 at 10:44:12AM +0200, Tobias Burnus wrote:
  Cesar Philippidis wrote:
   The patch introduces the following OpenACC/PTX-specific built-ins:
  ...
  
  It is not completely clear how they are supposed to get used. Should the
  user call them directly in some cases? Or are they only used internally?
  
  acc_on_device sounds like a function which would, in C/C++, be made available
  to the user via #define acc_on_device __builtin_acc_on_device.
 
 And not just providing acc_on_device prototype in some header?

Yes, just a prototype.  And next to DEF_GOACC_BUILTIN (configured the
same as DEF_GOMP_BUILTIN), I add a new DEF_GOACC_BUILTIN_COMPILER that is
configured to always provide the __builtin_[...] variant, but the
un-prefixed [...]  only if -fopenacc is in effect.  Does that look
alright?
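
So user code just calls the un-prefixed name when built with -fopenacc; a
minimal sketch, assuming the usual int acc_on_device (acc_device_t)
prototype from <openacc.h>:

  #include <stdio.h>
  #include <openacc.h>

  int
  main (void)
  {
    /* Outside of any compute region this reports the host.  */
    if (acc_on_device (acc_device_host))
      printf ("running on the host\n");
    if (!acc_on_device (acc_device_not_host))
      printf ("not running on an accelerator\n");
    return 0;
  }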

 Without
 looking at the OpenACC standard, it sounds like this function could be
 similar to omp_is_initial_device, so can and should be handled supposedly
 similarly.

I think we've been talking about this at the Cauldron, where you agreed
that omp_is_initial_device should also be implemented as a builtin.  (Or
am I confusing things?)

  However, the rest looks as if it should rather be an internal function
  instead of a builtin. Or should the user really ever call the builtin
  directly?
 
 GOMP_* functions are builtins and not internal functions too, all those
 functions are library functions, while the user typically doesn't call them
 directly, they still are implemented in the library.  Internal functions are
 used for something that doesn't have a library implementation and is not
 something user can call directly.

  Regarding Fortran: Builtins aren't directly available to the user. You have to
  wrap them into an intrinsic to make them available. If they have to be made
  available via a module (e.g. via module acc) - you have to create a virtual
  module, which provides the intrinsic. If you don't want to convert the whole
  module, you could create an auxiliary module (e.g. acc_internal_) which provides
  only those bits - and then include it (use, intrinsic :: ...) in the
  main module - written in normal Fortran.

This I have not yet addressed -- please see the TODO comments in the
gcc/fortran/ files as well as Fortran test cases.

 For the user callable fortran functions, for OpenMP libgomp just provides
 *_ entrypoints to * functions.  Perhaps acc_on_device_ could be provided
 too.

This is what I had done already.
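
(The wrapper itself is little more than a by-reference shim around the C
entry point; the following is a hypothetical sketch, not the actual
libgomp/fortran.c code, and the int32_t argument kind is an assumption:)

  #include <stdint.h>
  #include <openacc.h>

  /* Hypothetical Fortran-callable entry point; argument and return
     kinds are assumptions, not taken from the committed libgomp.  */
  int32_t
  acc_on_device_ (int32_t *devtype)
  {
    return acc_on_device ((acc_device_t) *devtype);
  }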

Does that patch look good?  (With the Fortran things still to be
addressed.)  (And, obviously this is not yet based on the Tobias/Jim
Fortran module/header rewrite.)

commit 8efbd08ed058d7ed3c43e10fbff0eac35b4defc9
Author: Thomas Schwinge tho...@codesourcery.com
Date:   Fri Jul 4 11:45:05 2014 +

OpenACC acc_on_device.

gcc/
* builtins.def (DEF_GOACC_BUILTIN_COMPILER): New macro.
* oacc-builtins.def (BUILT_IN_GOACC_UPDATE): New builtin.
* builtins.c (expand_builtin_acc_on_device): New function.
(expand_builtin): Use it to handle BUILT_IN_ACC_ON_DEVICE.
(is_inexpensive_builtin): Handle BUILT_IN_ACC_ON_DEVICE.
gcc/fortran/
* f95-lang.c (DEF_GOACC_BUILTIN_COMPILER): New macro.
* types.def (BT_FN_INT_INT): New type.
gcc/testsuite/
* c-c++-common/goacc/acc_on_device-1.c: New file.
* c-c++-common/goacc/acc_on_device-2.c: Likewise.
* c-c++-common/goacc/acc_on_device-2-off.c: Likewise.
* gfortran.dg/goacc/acc_on_device-1.f95: Likewise.
* gfortran.dg/goacc/acc_on_device-2.f95: Likewise.
* gfortran.dg/goacc/acc_on_device-2-off.f95: Likewise.
libgomp/
* libgomp.map (OACC_2.0): Add acc_on_device, acc_on_device_.
* fortran.c: Include openacc.h.
(acc_on_device_): New function.
* oacc-parallel.c: Include openacc.h.
(acc_on_device): New function.
* openacc.f90 (acc_device_kind, acc_device_none)
(acc_device_default, acc_device_host, acc_device_not_host): New
parameters.
(acc_on_device): New function declaration.
* openacc_lib.h (acc_device_kind, acc_device_none)
(acc_device_default, acc_device_host, acc_device_not_host): New
parameters.
(acc_on_device): New function declaration.
* openacc.h (acc_device_t): New enum.
(acc_on_device): New function declaration.
* testsuite/libgomp.oacc-c/acc_on_device-1.c: New file.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
---
 gcc/ChangeLog.gomp |  8 
 

Re: Patch to fix PR61360

2014-09-18 Thread Vladimir Makarov
On 09/18/2014 01:36 PM, Richard Sandiford wrote:
 Jakub Jelinek ja...@redhat.com writes:
 On Thu, Sep 18, 2014 at 12:04:30PM -0400, Vladimir Makarov wrote:
 The following patch fixes the PR.  The details can be found on

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360

 The patch was bootstrapped and tested on x86/x86-64.

 Committed as rev. 215358.
 What effect does this have on compile time?
 Regardless of compile time, I strongly object to this kind of hack.

 (a) it will in practice never go away.

 (b) (more importantly) it makes no conceptual sense.  It means that
 passes before lra use the old, cached enabled attribute while
 lra and after will use fresh values.

 The only reason the call has been put here is because lra was the
 only pass that checks for and asserted on inconsistent values.
 Passes before lra will still see the same inconsistent values but
 they happen not to assert.

 I.e. we've put the call here to shut up a valid assert rather than
 because it's the right place to do it.

 (c) the enabled attribute was never supposed to be used in this way.

 I really think the patch should be reverted.


Richard, I waited 4 months for somebody to fix this in the md file (and
people tried to do this without success).  Instead I was asked numerous
times by people interested in fixing these crashes to fix it in LRA.
After a recent request, I gave up.

So I could revert it and transfer the blame to you, but I don't think this
hack is bad enough to warrant that (maybe I am wrong).





[jit] Use the pyramid theme for generated HTML docs

2014-09-18 Thread David Malcolm
Committed to branch dmalcolm/jit:

The default Sphinx theme is perhaps a little dated; switch
to a non-default one.  The pyramid one is clean and attractive IMHO.

I've updated the prebuilt docs currently at:
  https://dmalcolm.fedorapeople.org/gcc/libgccjit-api-docs/
to use the new theme.

gcc/jit/ChangeLog.jit:
* docs/conf.py (Options for HTML output): Update html_theme from
default to pyramid.
---
 gcc/jit/ChangeLog.jit | 5 +
 gcc/jit/docs/conf.py  | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 11c9298..06734db 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,5 +1,10 @@
 2014-09-18  David Malcolm  dmalc...@redhat.com
 
+   * docs/conf.py (Options for HTML output): Update html_theme from
+   default to pyramid.
+
+2014-09-18  David Malcolm  dmalc...@redhat.com
+
* docs/intro/install.rst: Markup fixes.
* docs/intro/tutorial01.rst: Likewise.
* docs/intro/tutorial02.rst: Likewise.
diff --git a/gcc/jit/docs/conf.py b/gcc/jit/docs/conf.py
index 6199010..22e763a 100644
--- a/gcc/jit/docs/conf.py
+++ b/gcc/jit/docs/conf.py
@@ -91,7 +91,7 @@ pygments_style = 'sphinx'
 
 # The theme to use for HTML and HTML Help pages.  See the documentation for
 # a list of builtin themes.
-html_theme = 'default'
+html_theme = 'pyramid'
 
 # Theme options are theme-specific and customize the look and feel of a theme
 # further.  For a list of options available for each theme, see the
-- 
1.7.11.7



Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check

2014-09-18 Thread Andrew Pinski
On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence alan.lawre...@arm.com wrote:
 We've been seeing errors using aarch64-none-linux-gnu gcc to build the
 403.gcc benchmark from spec2k6, which we've traced back to this patch. The
 error looks like:

 /home/alalaw01/bootstrap_richie/gcc/xgcc
 -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53
 -DSPEC_CPU_LP64 alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o
 attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o
 c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o
 cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o
 cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o
 alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o
 cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o combine.o
 conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o
 diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o
 emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o
 function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o
 hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o insn-extract.o
 insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o
 jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o
 params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o
 reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o
 resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o
 sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o
 ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o
 tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o xcoffout.o
 ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o
 xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o lbasename.o
 getpwd.o ucbqsort.o -lm -o gcc
 emit-rtl.o: In function `gen_rtx_REG':
 emit-rtl.c:(.text+0x12f8): relocation truncated to fit:
 R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
 section in regclass.o
 emit-rtl.o: In function `gen_rtx':
 emit-rtl.c:(.text+0x1824): relocation truncated to fit:
 R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON
 section in regclass.o
 collect2: error: ld returned 1 exit status
 specmake: *** [gcc] Error 1
 Error with make 'specmake -j7 build': check file
 '/home/alalaw01/spectest/benchspec/CPU2006/403.gcc/build/build_base_test./make.err'
   Command returned exit code 2
   Error with make!
 *** Error building 403.gcc

 Inspecting the compiled emit-rtl.o shows:

 $ readelf --relocs good/emit-rtl.o | grep fixed_regs
 12a8 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
 12ac 005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
 1800 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
 1804 005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0

 (that's compiled with a gcc just before this patch); by contrast, using a
 gcc with that patch:

 $ readelf --relocs bad/emit-rtl.o | grep fixed_regs
 12a8 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
 12ac 005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
 12f8 005d0113 R_AARCH64_ADR_PRE  fixed_regs +
 
 12fc 005d0116 R_AARCH64_LDST8_A  fixed_regs +
 
 1824 005d0113 R_AARCH64_ADR_PRE  fixed_regs +
 
 1828 005d0116 R_AARCH64_LDST8_A  fixed_regs +
 
 186c 005d0113 R_AARCH64_ADR_PRE  fixed_regs + 0
 1870 005d0115 R_AARCH64_ADD_ABS  fixed_regs + 0
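
For reference, the access behind those relocations is just an ordinary read
of a tentatively-defined (COMMON) array; a minimal stand-alone sketch, not
the actual emit-rtl.c/regclass.c code:

  /* fixed_regs is a tentative definition, so it lands in the COMMON
     section (as in regclass.o above).  Under the default small code
     model each use becomes an adrp-based pair; whether the second
     relocation is ADD_ABS_LO12_NC (address materialized) or
     LDST8_ABS_LO12_NC (direct byte load) is the difference visible
     in the two readelf dumps.  */
  char fixed_regs[128];

  int
  reg_is_fixed (int regno)
  {
    return fixed_regs[regno];
  }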

 I attach a candidate 'fix', which allows building of 403.gcc on
 aarch64-none-linux-gnu, full regression etc ongoing. (I admit there may be
 better options in terms of canonicalizing if you want to!)


I don't think this is the correct fix or even close to the real issue.
I think we have some tiny memory model issues coming in when the medium
memory model is being used.  I ran into this issue while compiling php
with a GCC 4.7-based compiler.  Try the attached patch.

Thanks,
Andrew Pinski


 --Alan



 Richard Biener wrote:

 The following makes tree_swap_operands_p put all constants 2nd place,
 also looks through sign-changes when considering further canonicalizations
 and removes the odd -Os guard for those.  That was put in with
 https://gcc.gnu.org/ml/gcc-patches/2003-10/msg01208.html just
 motivated by CSiBE numbers - but rather than disabling canonicalization
 this should have disabled the actual harmful transforms.

 Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

 Richard.

 2014-08-15  Richard Biener  rguent...@suse.de

 * fold-const.c 
