Re: __fp16 is ambiguous error in C++

2021-06-25 Thread Jim Wilson
On Thu, Jun 24, 2021 at 7:26 PM ALO via Gcc  wrote:

> foo.c: In function '__fp16 foo(__fp16, __fp16)':
> foo.c:6:23: error: call of overloaded 'exp(__fp16&)' is ambiguous
> 6 | return a + std::exp(b);
> | ^
>

No, there isn't a solution for this.  You might want to try an ARM
clang/gcc port to see what they do, but it probably isn't much better than
the RISC-V port; a quick check shows the same gcc result.  And note that
only the non-upstream V extension branch for RISC-V has the __fp16 support,
because the vector extension depends on it.  It is hard to argue for
changes when the official RISC-V GCC port has no __fp16 support.

Kito started a related thread in March, and there was tentative agreement
to add _Float16 support to the GCC C++ front end.
https://gcc.gnu.org/pipermail/gcc/2021-March/234971.html
That may or may not help you.

I think it will be difficult to do anything useful here until the C and C++
standards figure out how they want half-float support to work.  If we do
something before then, it will probably end up incompatible with the
official solution and we will end up stuck with a mess.

Jim


Re: Default debug format for AVR

2021-04-05 Thread Jim Wilson
On Sat, Apr 3, 2021 at 6:24 PM Simon Marchi via Gcc  wrote:

> The default debug format (when using only -g) for the AVR target is
> stabs.  Is there a reason for it not being DWARF, and would it be
> possible to maybe consider possibly thinking about making it default to
> DWARF?  I am asking because the support for stabs in GDB is pretty much
> untested and bit-rotting, so I think it would be more useful for
> everyone to use DWARF.
>

I tried to deprecate the stabs support a little over 4 years ago.
https://gcc.gnu.org/pipermail/gcc-patches/2017-December/489296.html
There was a suggestion to change the error to a warning, but my startup
company job kept me so busy I never had a chance to follow up on this.

I would like to see the stabs support deprecated and then later removed from
gcc.  No new features have been added in a long time, and it is only being
maintained in the sense that when it fails it is fixed to ignore source
code constructs that it doesn't support.  The longer it survives in this
state, the less useful it becomes.

Jim


Re: Having trouble getting my school to sign the copyright disclaimer

2021-03-31 Thread Jim Wilson
On Wed, Mar 31, 2021 at 8:27 AM PKU via Gcc  wrote:

> I’m trying to get my school to sign the copyright disclaimer.
> Unfortunately the officials are reluctant to do that. Can anyone suggest
> what to do next?
>

Maybe the PLCT Lab at the Chinese Academy of Sciences can help.  They are
doing GCC and LLVM work, and have GCC etc assignments.  Even if they can't
help get an assignment for your current work, if you want to continue doing
GCC work, maybe your next patch can be written for them.  They hold regular
meetings for people doing GCC/LLVM work in China to meet and discuss their
work.  The PLCT work is primarily RISC-V focused though.

https://github.com/lazyparser/weloveinterns
https://github.com/lazyparser/weloveinterns/blob/master/open-internships.md#bj37-gccbinutilsglibclinker-%E5%BC%80%E5%8F%91%E5%AE%9E%E4%B9%A0%E7%94%9F-10%E5%90%8D

You might be able to find more appropriate links.  I don't actually read
Mandarin.  This is just a link I happen to know about which has good info
about the PLCT Lab.

Jim


Re: HELP: MIPS PC Relative Addressing

2021-02-24 Thread Jim Wilson
On Wed, Feb 24, 2021 at 9:30 AM Maciej W. Rozycki  wrote:

> On Wed, 24 Feb 2021, Jiaxun Yang wrote:
>
> > For RISC-V, %pcrel_lo shall point to the label of corresponding
> %pcrel_hi,
> > like
> >
> > .LA0:
> > auipc a0, %pcrel_hi(sym)
> > addi  a0, a0, %pcrel_lo(.LA0)
>
>  I commented on it once, in the course of the FDPIC design project, and I
> find it broken by design.  Sadly it has made it into the RISC-V psABI and
> it is hard to revert at this time, too many places have started relying on
> it.
>

It was already a production ABI before you asked for the change.  And
changing a production ABI is extremely difficult.  You were not the first
to complain about this, and you probably won't be the last.

Jim


Re: HELP: MIPS PC Relative Addressing

2021-02-24 Thread Jim Wilson
On Wed, Feb 24, 2021 at 6:18 AM Jiaxun Yang  wrote:

> I found it's very difficult for GCC to generate this kind of pcrel_lo
> expression,
> RTX label_ref can't be lower into such LOW_SUM expression.
>

Yes, it is difficult.  You need to generate a label, and put the label
number in an unspec in the auipc pattern, and then create a label_ref to
put in the addi.  The fact that we have an unspec and a label_ref means a
number of optimizations get disabled, like basic block duplication and loop
unrolling, because they can't make a copy of an instruction that uses a
label as data, as they have no way to know how to duplicate the label
itself.  Or at least RISC-V needs to create one label.  You probably need
to create two labels.

There is a far easier way to do this, which is to just emit an assembler
macro, and let the assembler generate the labels and relocs.  This is what
the RISC-V GCC port does by default.  This prevents some optimizations like
scheduling the two instructions, but enables some other optimizations like
loop unrolling.  So it is a tossup.  Sometimes we get better code with the
assembler macro, and sometimes we get better code by emitting the auipc and
addi separately.

The RISC-V gcc port can emit the auipc/addi with
-mexplicit-relocs -mcmodel=medany, but this is known to sometimes
fail.  The problem is that if you have an 8-byte variable with 8-byte
alignment, and try to load it with 2 4-byte loads, gcc knows that offset+4
must be safe from overflow because the data is 8-byte aligned.  However,
when you use a pc-relative offset, which is the data address minus the code
address, the
offset is only as aligned as the code is.  RISC-V has 2-byte instruction
alignment with the C extension.  So if you have offset+4 and offset is only
2-byte aligned, it is possible that offset+4 may overflow the add immediate
field.  The same thing can happen with 16-byte data that is 16-byte
aligned, accessed with two 8-byte loads.  There is no easy software
solution.  We just emit a linker error in that case as we can't do anything
else.  I think this would work better if auipc cleared some low bits of the
result, in which case the pc-relative offset would have enough alignment to
prevent overflow when adding small offsets, but it is far too late to
change how the RISC-V auipc works.

If it looks infeasible for GCC side, another option would be adding
> RISC-V style
> %pcrel_{hi,lo} modifier at assembler side. We can add another pair of
> modifier
> like %pcrel_paired_{hi,lo} to implement the behavior. Would it be a good
> idea?
>

I wouldn't recommend following the RISC-V approach for the relocation.

Jim


RISC-V -menable-experimental-extensions option

2020-12-07 Thread Jim Wilson
I'm not aware of any other target that has a similar feature, so I thought
a bit of discussion first might be useful.

For most ISAs, there is one organization that owns it, and does development
internally, in private.  For RISC-V, the ISA is owned by RISC-V
International which has no developers.  The development all happens
externally, in public, spread across at least a dozen different
organizations.  So we have the problem of coordinating this work,
especially for draft versions of extensions.  So we would like to add
support for draft extensions to mainline, controlled by a
-menable-experimental-extensions option.  For features enabled by this
option, there would be no guarantee that the next compiler release is
compatible with the previous one, since the draft extension may change in
incompatible ways.

LLVM already has support for this option.
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138364.html
https://reviews.llvm.org/D73891

We are still discussing the details of how this will work.  We may want to
limit this to "stable" draft extensions, and put the unstable drafts on a
vendor branch.

We have been doing work on branches in the github.com riscv tree, but there
are issues with tracking who has copyright assignments, issues with
identifying who exactly a github user actually is, and issues with getting
the right set of people write access to the trees.  These won't be problems
if we are using the FSF trees instead.

We want this draft extension support on mainline for the same reasons that
the LLVM developers do, to ensure that everyone is working in the same
branch in the upstream tree.  And it is easiest to do that if that branch
is mainline.

This is just a binutils and gcc proposal at the moment, but we might need
something on the gdb side later, like a
  set riscv experimental-extensions 1
or whatever command to enable support for draft extensions.

Jim


Re: Wrong insn scheduled by Sched1 pass

2020-11-04 Thread Jim Wilson
On Mon, Nov 2, 2020 at 11:45 PM Jojo R  wrote:

> From origin insn seqs, I think the insn 'r500=unspec[r100] 300’ is in
> Good place because of the bypass of my pipeline description, it is not
> needed to schedule.
> ...
> Is there any way to control my case ?
> Or my description of pipeline is not good ?
>

I would suggest looking at verbose scheduler debugging dumps to see exactly
what decisions the scheduler is making.  See the -fsched-verbose=X option,
and give it a value of at least 9 as I think that is the highest supported
value.  This will put a lot of info in the scheduler rtl dumps that will
help you understand what the scheduler is doing.  You can then use that
info to try to figure out how to tweak your port to get the result you
want.  The problem may not be in the pipeline description file.  You might
need to define some macros.  See the list of TARGET_SCHED_* macros you can
use to control how the scheduler works.

Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-10-01 Thread Jim Wilson
On Wed, Sep 30, 2020 at 11:35 PM Richard Biener
 wrote:
> On Wed, Sep 30, 2020 at 10:01 PM Jim Wilson  wrote:
> > We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that
> > we are using for testing the vector support.
>
> That doesn't seem to exist (but maybe it's just not on trunk yet).

The vector extension is still in draft form, and they are still making
major compatibility breaks.  There was yet another one about 3-4 weeks
ago.  I don't want to upstream anything until we have an officially
accepted V extension, at which point they will stop allowing
compatibility breaks.  If we upstream now, we would need some protocol
for how to handle unsupported experimental patches in mainline, and I
don't think that we have one.

So for now, the vector support is on a branch in the RISC-V
International github repo.
https://github.com/riscv/riscv-gnu-toolchain/tree/rvv-intrinsic
The gcc testcases specifically are here
https://github.com/riscv/riscv-gcc/tree/riscv-gcc-10.1-rvv-dev/gcc/testsuite/gcc.target/riscv/rvv
A lot of the testcases use macros so we can test every variation of an
instruction, and there is a large number of variations for most
instructions, so most of these testcases aren't very readable.  They
are just to verify that we can generate the instructions we expect.
Only the algorithm ones are readable, like saxpy, memcpy, strcpy.

Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-30 Thread Jim Wilson
On Tue, Sep 29, 2020 at 11:40 PM Richard Biener
 wrote:
> But this also doesn't work on GIMPLE.  On GIMPLE riscv_vlen would
> be a barrier for code motion if you make it __attribute__((returns_twice))
> since then abnormal edges distort the CFG in a way preventing such motion.

At the gimple level, all vector operations have an implicit vsetvl, so
it doesn't matter much how they are sorted.  As long as they don't get
sorted across an explicit vsetvl that they depend on.  But the normal
way to use explicit vsetvl is to control a loop, and you can't move
dependent operations out of the loop, so it tends to work.  Setting
vsetvl in the middle of a basic block is less useful and less common,
and very unlikely to work unless you really know what you are doing.
Basically, RISC-V wasn't designed to work this way, and so you
probably shouldn't be writing your code this way.  There might be edge
cases where we aren't handling this right, as we aren't writing code
this way, and hence we aren't testing this support.  This is still a
work in progress.

Good RVV code should look more like this:

#include <stddef.h>
#include <riscv_vector.h>

void saxpy(size_t n, const float a, const float *x, float *y) {
  size_t l;

  vfloat32m8_t vx, vy;

  for (; (l = vsetvl_e32m8(n)) > 0; n -= l) {
    vx = vle32_v_f32m8(x);
    x += l;
    vy = vle32_v_f32m8(y);
    // vfmacc
    vy = a * vx + vy;
    vse32_v_f32m8(y, vy);
    y += l;
  }
}

We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that
we are using for testing the vector support.

Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-30 Thread Jim Wilson
On Tue, Sep 29, 2020 at 7:22 PM 夏 晋  wrote:
> vint16m1_t foo3(vint16m1_t a, vint16m1_t b){
>   vint16m1_t add = a+b;
>   vint16m1_t mul = a*b;
>   vsetvl_e8m1(32);
>   return add + mul;
> }

Taking another look at your example, you have type confusion.  Using
vsetvl to specify an element width of 8 does not magically convert
types into 8-bit vector types.  They are still 16-bit vector types and
will still result in 16-bit vector operations.  So your explicit
vsetvl_e8m1 is completely useless.

In the RISC-V V scheme, every vector operation emits an implicit
vsetvl instruction, and then we optimize away the redundant ones.  So
the add and mul at the start are emitting two vsetvl instructions.
Then you have an explicit vsetvl.  Then another add, which will emit
another implicit vsetvl.  The compiler reordered the arithmetic in
such a way that two of the implicit vsetvl instructions can be
optimized away.  That probably happened by accident.  But we don't
have support for optimizing away the useless explicit vsetvl, so it
remains.

Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-29 Thread Jim Wilson
On Tue, Sep 29, 2020 at 3:47 AM 夏 晋 via Gcc  wrote:
> I tried to set the "vlen" after the add & multi, as shown in the following 
> code:

> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
> vf32 add = x3 + x4;
> vf32 mul = x3 * x4;
> __builtin_riscv_vlen(vlen);  //<
> storevf(&output[0], add);
> storevf(&output[4], mul);
> }

Not clear what __builtin_riscv_vlen is doing, or what exactly your
target is, but the gcc port I did for the RISC-V draft V extension
creates new fake vector type and vector length registers, like the
existing fake fp and arg pointer registers, and the vsetvl{i}
instruction sets the fake vector type and vector length registers, and
all vector instructions read the fake vector type and vector length
registers.  That creates the dependence between the instructions that
prevents reordering.  It is a little more complicated than that, as
you can have more than one vsetvl{i} instruction setting different
vector type and/or vector length values, so we have to match on the
expected values to make sure that vector instructions are tied to the
right vsetvl{i} instruction.  This is a work in progress, but overall
it is working pretty well.  This requires changes to the gcc port, as
you have to add the new fake registers in gcc/config/riscv/riscv.h.
This isn't something you can do with macros and extended asms.

See for instance

https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/Krhw8--wmi4/m/-3IPvT7JCgAJ

Jim


Re: New pseudos in splitters

2020-09-23 Thread Jim Wilson
On Wed, Sep 23, 2020 at 7:51 AM Ilya Leoshkevich via Gcc
 wrote:
> Is this restriction still valid today?  Is there a reason we can't
> introduce new pseudos in a splitter before LRA?

See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91683
for an example of what can go wrong when a splitter creates a new
pseudo.  I think there was another one I fixed around the same time
that failed for a different reason, but don't have time to look for
it.

Jim


Re: How to forbid register allocator to overlap between DEST and SOURCE

2020-07-02 Thread Jim Wilson
On Wed, Jul 1, 2020 at 8:40 PM  wrote:
> GCC seems to overlap registers between DEST and SOURCE in different
> machine modes.  Is there any target hook to control this feature?
> I use '&' to forbid the register allocator from overlapping between
> DEST and SOURCE, but there are some redundant instructions in the
> result code :(

& is the correct solution in general.

Presumably this is about your draft v0.7.1 vector port.  This port
uses an unspec in every pattern.  This limits the compiler's ability
to optimize code.  You might get better results if you eliminated as
many of the unspecs as you can.

You might want to check TARGET_MODES_TIEABLE_P though this is mostly
about casts and moves not register allocation.

Jim


Re: sign_and_send_pubkey: signing failed: agent refused operation

2020-06-02 Thread Jim Wilson
On Mon, Jun 1, 2020 at 3:33 PM Martin Sebor via Gcc  wrote:
> So it sounds like you wouldn't expect the "agent refused operation"
> error either, and it's not just a poor error message that I should
> learn to live with.  That makes me think I should try to figure out
> what's wrong.  I think the ~/.ssh/ contents are pretty standard:

My experience with Ubuntu 18.04 is that 2K bit keys aren't accepted by
something (gnome UI?) anymore.  I had to upgrade to 4K bit keys.
Though oddly ssh-keygen still generates 2K bit keys by default even
though they won't be accepted by the gnome UI (or whatever).  The work
around is to run ssh-add manually to register your 2K bit key, because
ssh-add will still accept 2K bit keys, and then ssh will work, and can
be used to install a 4K bit public key on the other side, and then
things will work normally again.  A web search suggested that there
was some security problem with 2K bit keys and apparently they are
trying to force people to upgrade, but the inconsistent approach here
between different packages makes this confusing as to what is actually
going on.

Jim


Re: `insn does not satisfy its constraints` when compiling a simple program.

2020-04-20 Thread Jim Wilson
On Sat, Apr 18, 2020 at 8:45 AM Joe via Gcc  wrote:
> test.c: In function ‘main’:
> test.c:5:1: error: insn does not satisfy its constraints:

The constrain_operands function is failing to match the insn to its
constraints.  Try putting a breakpoint there, and stepping through the
code to see what is going wrong.  The function is likely called many
times, so you might need to figure out which call is the failing one
first and only step through that one.

Jim


Re: Modifying RTL cost model to know about long-latency loads

2020-04-16 Thread Jim Wilson
On Thu, Apr 16, 2020 at 7:28 PM Sasha Krassovsky  wrote:
> @Jim I saw you were from SiFive - I noticed that modifying the costs for 
> integer multiplies in the riscv_tune_info structs didn’t affect the generated 
> code. Could this be why?

rtx_costs is used for instruction selection.  For instance, choosing
whether to use a shift and add sequence as opposed to a multiply
depends on rtx_cost.  rtx_cost is not used for instruction scheduling.
This uses the latency info from the pipeline model, e.g. generic.md.
It looks like I didn't read your first message closely enough and
should have mentioned this earlier.

Changing multiply rtx_cost does affect code generation.  Just try a
testcase multiplying by a number of small prime factors, and you will
see that which ones use shift/add and which ones use multiply depends
on the multiply cost in the riscv_tune_info structs.  This also
factors into the optimization that turns divide by constant into a
multiply.  When this happens depends on the relative values of the
multiply cost and the divide cost.

Jim


Re: Modifying RTL cost model to know about long-latency loads

2020-04-13 Thread Jim Wilson
On Sat, Apr 11, 2020 at 4:28 PM Sasha Krassovsky via Gcc
 wrote:
> I’m currently modifying the RISC-V backend for a manycore processor where 
> each core is connected over a network. Each core has a local scratchpad 
> memory, but can also read and write other cores’ scratchpads. I’d like to add 
> an attribute to give a hint to the optimizer about which loads will be remote 
> and therefore longer latency than others.

GCC has support for the proposed named address space extension to the
ISO C standard.  You may be able to use this instead of defining your
own attributes.  I don't know if this helps with the rtx cost
calculation though.  This is mostly about support for more than one
address space.  See "Named Address Spaces" in the gcc internals docs,
and the *_ADDR_SPACE_* stuff in the sources.

The problem may be one similar to what Alan Modra mentioned.  I would
suggest stepping through the cost calculation code in a debugger to
see what is happening.

Jim


Re: Not usable email content encoding

2020-03-18 Thread Jim Wilson
I'm one of the old timers that likes our current work flow, but even I
think that we are risking our future by staying with antiquated tools.
One of the first things I need to teach new people is how to use email
"properly".  It is a barrier to entry for new contributors, since our
requirements aren't how the rest of the world uses email anymore.
LLVM has phabricator.  Some git based projects are using gerrit.
Github and gitlab are useful services.  We need to think about setting
up easier ways for people to submit patches, rather than trying to fix
all of the MUAs and MTAs in the world.

Jim


Re: Update on SVE/sizeless types for C and C++

2019-11-13 Thread Jim Wilson
On Tue, Nov 12, 2019 at 2:12 PM Richard Sandiford
 wrote:
> Are both RVV intrinsic proposals like SVE in that all sizeless types
> can be/are built into the compiler?  If so, do you think the target hook
> added in:
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00942.html
> would be enough for RVV too?  Or do the RVV proposals require support
> for user-defined sizeless types?

We only have built-in types.  I think we have 54 of them, 32 integer,
12 float, and 10 mask.  I hadn't thought about user-defined sizeless
types, and hope that I don't have to support that.

> If the hook is enough, I guess there are three ways we can go:
> (1) Add hooks for both targets, with similar functionality.  This means
> a certain amount of cut-&-paste but also allows for more specific
> error messages.

I think this would be OK.  I took a quick look at your patch.  I'm a
little surprised that you can't support alignof on a vector type, I
would think that depends on the base type for the vector, but maybe
this is a difference between SVE and RVV, or maybe I just haven't
gotten far enough to find the problem yet.  Otherwise it looks like
this would also work for the RVV support.

Jim


Re: Update on SVE/sizeless types for C and C++

2019-11-12 Thread Jim Wilson
On Tue, Nov 12, 2019 at 8:06 AM Richard Sandiford
 wrote:
> If the use of sizeless types does expand beyond SVE built-in types
> in future, the places that call the hook are the places that would
> need to deal directly with sizeless types.

We are using the same sizeless type infrastructure for the RISC-V
vector extension work.  The RVV extension is still in draft form and
still evolving.  The software is only in prototype form at the moment.
We don't have an ABI yet.  We have at least two competing proposals
for the intrinsics based programming model.  We don't have
auto-vectorization support yet.  Etc.  But SiFive has been working on
gcc patches for one of the intrinsics proposals, and EPI (European
Processor Initiative) has been working on llvm patches for another
intrinsics proposal, and both of these are using sizeless types.  RVV
has a similar design to ARM SVE where the size of types depends on the
hardware you are running on, and those sizes can change at run-time,
where they can be different from one loop iteration to the next.

Jim


Re: gcc vs clang for non-power-2 atomic structures

2019-08-23 Thread Jim Wilson
I was pointed at
https://bugs.llvm.org/show_bug.cgi?id=26462
for the LLVM discussion of this problem.

Another issue here is that we should have ABI testing for atomic.
For instance, gcc/testsuite/gcc.dg/compat has no atomic testcases.
Likewise g++.dg/compat.

Jim


gcc vs clang for non-power-2 atomic structures

2019-08-22 Thread Jim Wilson
We got a change request for the RISC-V psABI to define the atomic
structure size and alignment.  And looking at this, it turned out that
gcc and clang are implementing this differently.  Consider this
testcase

rohan:2274$ cat tmp.c
#include 
struct s { int a; int b; int c;};
int
main(void)
{
  printf("size=%ld align=%ld\n", sizeof (struct s), _Alignof(struct s));
  printf("size=%ld align=%ld\n", sizeof (_Atomic (struct s)),
_Alignof(_Atomic (struct s)));
  return 0;
}
rohan:2275$ gcc tmp.c
rohan:2276$ ./a.out
size=12 align=4
size=12 align=4
rohan:2277$ clang tmp.c
rohan:2278$ ./a.out
size=12 align=4
size=16 align=16
rohan:2279$

This is with an x86 compiler.  I get the same result with a RISC-V
compiler.  This is an ABI incompatibility between gcc and clang.  gcc
has code in build_qualified_type in tree.c that sets alignment for
power-of-2 structs to the same size integer alignment, but we don't
change alignment for non-power-of-2 structs.  Clang is padding the
size of non-power-of-2 structs to the next power-of-2 and giving them
that alignment.

Unfortunately, I don't know who to contact on the clang side, but we
need to have a discussion here, and we probably need to fix one of the
compilers to match the other one, as we should not have ABI
incompatibilities like this between gcc and clang.

The original RISC-V bug report is at
https://github.com/riscv/riscv-elf-psabi-doc/pull/112
There is a pointer to a gist with a larger testcase with RISC-V results.

Jim


Re: gcc/config/arch/arch.opt: Option mask gen problem

2019-07-22 Thread Jim Wilson
On Mon, Jul 22, 2019 at 4:05 AM Maxim Blinov  wrote:
> Is it possible, in the arch.opt file, to have GCC generate a bitmask
> relative to a user-defined variable without an associated name? To
> illustrate my problem, consider the following option file snippet:
> ...
> But, I don't want the user to be able to pass "-mbmi-zbb" or
> "-mno-bmi-zbb" on the command line:

If you don't want an option, why are you making changes to the
riscv.opt file?  This is specifically for supporting command line
options.

Adding a variable here does mean that it will automatically be saved
and restored, and I can see the advantage of doing that, even if it is
only indirectly tied to options.  You could add a variable here, and
then manually define the bitmasks yourself in riscv-opts.h or riscv.h.
Or you could just add the variable to the machine_function struct in
riscv.c, which will also automatically save and restore the variable.

Jim


Re: [PATCH] Deprecate ia64*-*-*

2019-06-13 Thread Jim Wilson
On Thu, Jun 13, 2019 at 10:39 AM Joel Sherrill  wrote:
> Ok with me if no one steps up and the downstream projects like Debian gets
> notice. This is just a reflection of this architecture's status in the
> world.

I sent email to the debian-ia64 list half an hour ago.  Just got a
response.  They mentioned that there is also a gentoo group that I
didn't know about, and want to know why exactly we want to deprecate
it.  I can discuss it with them.

Jim


Re: [PATCH] Deprecate ia64*-*-*

2019-06-13 Thread Jim Wilson
On Thu, 2019-06-13 at 09:09 -0600, Jeff Law wrote:
> On 6/13/19 5:13 AM, Richard Biener wrote:
> > 
> > ia64 has no maintainer anymore so the following deprecates it
> > with the goal of eliminating the port for GCC 11 if no maintainer
> > steps up.

OK with me since I'm not the maintainer anymore.

> Works for me.  James Clarke has been fixing small stuff recently, not
> sure if he wants to step into a larger role though.

There are 3 of them.  See
https://wiki.debian.org/Ports/ia64
It might be useful to send an email to the debian-ia64 list to notify
them.

> ia64 has been failing the qsort checking in the scheduler since the
> day
> that checking was introduced -- it shows up building the kernel IIRC.
> Nobody's shown any interest in addressing those issues :-)

I tried looking at this once.  It looked more difficult than I was
willing to do for IA-64.  There are so many checks in the qsort compare
functions that I don't think you can get a stable sort there.  I think
this is really a sel-sched problem not an IA-64 problem, but the IA-64
port is tied to sel-sched and may not work well without it.  Most other
ports aren't using it by default.

Jim




Re: Dejagnu output size limit and flaky test ( c-c++-common/builtins.c -Wc++-compat )

2019-05-08 Thread Jim Wilson

On 5/8/19 3:34 AM, Matthew Malcomson wrote:

The cause seems to be a restriction in dejagnu where it stops reading
after a given read if its output buffer is greater than 512000 bytes.


This dejagnu restriction was removed in 2016.  Try using a newer dejagnu 
release.


2016-03-27  Ben Elliston  

* lib/remote.exp (standard_wait): Append any trailing characters
to $output that may be still in $expect_out(buffer) when eof is
matched. Remove arbitrary limitation in the ".+" matching case,
similar to the change to local_exec on 2016-02-17.

Jim


Re: Please help!!!

2019-05-06 Thread Jim Wilson
On Mon, May 6, 2019 at 6:02 AM Алексей Хилаев via gcc  wrote:
> GCC for RISC-V won't emit my insns; binutils and spike (the RISC-V
> simulator) work correctly, but GCC doesn't.  I want to add min/max for
> integers; GCC compiles correctly and the sim executes correctly.
> (define_insn "*min_<mode>"
> [(set (match_operand:GPR 0 "register_operand" "=r")
> (smin:GPR (match_operand:X 1 "register_operand" " r")
> (match_operand:X 2 "register_operand" " r")))]
> ""
> "min\t%0,%1,%2"
> [(set_attr "type" "move")
> (set_attr "mode" "<MODE>")])

You must have patterns named sminXi3 where X can be s and/or d.
Likewise for smaxXi3.  Once the named patterns exist, then gcc will
automatically call the named patterns to generate RTL when
appropriate.  Then later passes like combine can create new RTL from
the min/max pattern RTL.  See for instance how the existing FP min/max
patterns work.  The pattern name is important.  You might also
consider adding uminXi3 and umaxXi3 patterns.  You can find a list of
supported named patterns in the gcc docs.

Also note that the RTL that you generate must look sensible.  You have
a smin:GPR operation that is accepting Xmode operands which is not OK.
The modes must match.  You can use sign_extend/zero_extend to
sign/zero extend a smaller mode to a larger mode, and subreg to reduce
a larger mode to a smaller one.   These will have to be separate
patterns.  But once you have the basic smin/smax patterns, combine can
create the sign_extend/whatever versions for you.  See for instance
how the addsi3 and addsi3_extend* patterns work.

Jim


Re: RISC-V sibcall optimization with save-restore

2019-03-20 Thread Jim Wilson

On 3/20/19 5:25 AM, Paulo Matos wrote:

I am working on trying to get RISC-V 32 emitting sibcalls even in the
present of `-msave-restore`, for a client concerned with generated code
size.


This won't work unless you define a new set of restore functions.  The 
current ones restore the return address from the stack and return, which 
is wrong if you want to do a sibcall.  This is why we tail call (jump 
to) the restore functions, because the actual function return is in the 
restore functions.  You will need a new set of restore functions that 
restore regs without restoring the ra.  You then probably also need 
other cascading changes to make this work.


The new set of restore functions will then increase code size a bit 
offsetting the gain you get from using them.  You would have to have 
enough sibling calls that can use -msave-restore to make this 
worthwhile.  It isn't clear if this would be a win or not.



I thought I was on the right path until I noticed that the CFG is messed
up because of assumptions related to emission of sibcall instead of a
libcall until the epilogue is expanded. During the pro_and_epilogue pass
I get an emergency dump and a segfault:
gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c:11:1: error: in
basic block 2:
gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c:11:1: error: flow
control insn inside a basic block
(jump_insn 24 23 6 2 (parallel [
 (return)
 (use (reg:SI 1 ra))
 (const_int 0 [0])
 ]) "gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c":11:1 -1
  (nil))


If you look at the epilogue code, you will see that it emits a regular 
instruction which hides the call to the restore routine, and then it 
emits a special fake return insn that doesn't do anything.  You can just 
stop emitting the special fake return insn in this case.  This of course 
assumes that you have a new set of restore functions that actually 
return the caller, instead of the caller's parent.


One of the issues with -msave-restore is that the limited offset ranges 
of calls and branches means that if you don't have a tiny program then 
each save/restore call/jump is probably an auipc/lui plus the call/tail, 
which limits the code size reduction you get from using it.  If you can 
control where the -msave-restore routines are placed in memory, then 
putting them near address 0, or near the global pointer address, will 
allow linker relaxation to optimize these calls/jumps to a single 
instruction.  This will probably help more than trying to get it to work 
with sibling calls.


If you can modify the hardware, you might try adding load/store multiple 
instructions and using that instead of the -msave-restore option.  I 
don't know if anyone has tried this yet, but it would be an interesting 
experiment that might result in smaller code size.


Jim


Re: riscv64 dep. computation

2019-02-15 Thread Jim Wilson
On Thu, Feb 14, 2019 at 11:33 PM Paulo Matos  wrote:
> Are global variables not supposed to alias each other?
> If I indeed do that, gcc still won't group loads and stores:
> https://cx.rv8.io/g/rFjGLa

I meant something like
struct foo_t x, y;
and now they clearly don't alias.  As global pointers they may still alias.

Jim


Re: riscv64 dep. computation

2019-02-14 Thread Jim Wilson

On 2/14/19 3:13 AM, Paulo Matos wrote:

If I compile this with -O2, sched1 groups all loads and all stores
together. That's perfect. However, if I change TYPE to unsigned char and
recompile, the stores and loads are interleaved.

Further investigation shows that for unsigned char there are extra
dependencies that block the scheduler from grouping stores and loads.


The ISO C standard says that anything can be casted to char *, and char 
* can be casted to anything.  Hence, a char * pointer aliases everything.


If you look at the alias set info in the MEMs, you can see that the char 
* references are in alias set 0, which means that they alias everything. 
 The short * references are in alias set 2 which means they only alias 
other stuff in alias set 2.  The difference here is that short * does 
not alias the structure pointers, but char * does.  I haven't tried 
debugging your example, but this is presumably where the difference 
comes from.


Because x and y are pointer parameters, the compiler must assume that 
they might alias.  And because char * aliases everything, the char 
references alias them too.  If you change x and y to global variables, 
then they no longer alias each other, and the compiler will schedule all 
of the loads first, even for char.


Jim


Re: Replacing DejaGNU

2019-01-14 Thread Jim Wilson

On 1/14/19 5:44 AM, MCC CS wrote:

I've been running the testsuite on my macOS, on which
it is especially unbearable. I want to (at least try to)
rewrite a DejaGNU replacement accepting the same
syntax and having no dependency, should therefore
be faster. I was wondering if there have been any
attempts on this?


CodeSourcery wrote one called qmtest, but there apparently hasn't been 
any work done on it in a while.  Joseph Myers indirectly referred to it. 
 You can find a copy here

https://github.com/MentorEmbedded/qmtest

It used to be possible to run the gcc testsuite using qmtest, but I 
don't know the current status.  I do see that there is still a 
qmtest-g++ makefile rule for running the G++ testsuite via qmtest 
though.  You could try that and see if it still works.


There is so much stuff that depends on dejagnu that replacing it will be 
difficult.


Jim


Re: how to build and test uClinux toolchains

2018-10-16 Thread Jim Wilson

On 10/16/18 7:19 AM, Christophe Lyon wrote:

While reviewing one of my patches about FDPIC support for ARM, Richard
raised the concern of testing the patch on other uClinux targets [1].

I looked at uclinux.org and at the GCC maintainers file, but it's
still not obvious to me which uClinux targets are currently supported?


You should try asking the uclinux developers.

I tried looking at uclinux, and as far as I can tell, the best supported 
targets are arm and m68k/coldfire.  crosstools-ng only supports one 
uclinux target for instance, which is m68k.  qemu has m68k support, so 
you could try that.  The other uclinux ports seem to be one time efforts 
with no long term maintenance.  I see a lot of dead links on the 
uclinux.org site, and a lot of stuff that hasn't been updated since 2004.


I see that buildroot has obvious blackfin (bfin), m68k, and xtensa 
uclinux support.  But blackfin.uclinux.org says the uclinux port was 
deprecated in 2012.  m68k as mentioned above should be usable.  It 
appears that xtensa uclinux is still alive and usable.

http://wiki.linux-xtensa.org/index.php/UClinux
There may be other uclinux targets that are usable but don't have 
obvious patches to enable them.


Jim


Re: Cannot compile using cc1.

2018-10-08 Thread Jim Wilson

On 10/06/2018 06:07 AM, Tejas Joshi wrote:

I have gcc source code, stage1-build and test directories as siblings
and I've been trying to compile test.c in test/ using:

../stage1-build/gcc/cc1 test.c


That isn't expected to work.  You need to use the compiler driver, which 
is called xgcc in the build dir, and pass an option to let it know where 
the cc1 binary is.  So this should instead be


../stage1-build/gcc/xgcc -B../stage1-build/gcc/ test.c

The trailing slash on the -B option path is important.  If that doesn't 
work, then you may have configured your gcc tree wrong.  Some operating 
systems require specific configure options to be used to get a working 
compiler.  You can see the configure options used by the default 
compiler by using "/usr/bin/gcc -v".  Debian/Ubuntu require 
--enable-multiarch for instance, and the compiler build may not succeed 
if that configure option is missing.


If you want to run cc1 directly, you may need to pass in extra default 
options that the compiler driver normally passes to it.  You can see 
these options by passing the -v option to the gcc driver while compiling 
a file.  E.g. running "../stage1-build/gcc/xgcc -B../stage1-build/gcc/ 
-v test.c" and looking at the cc1 line will show you the options you 
need to pass to cc1 to make it work.


Jim


Re: section attribute of compound literals

2018-09-14 Thread Jim Wilson
On Fri, Sep 14, 2018 at 7:44 AM Jason A. Donenfeld  wrote:
> Assuming this is an array of a huge amount of
> chacha20poly1305_testvec, I'm not sure if there's a syntax for me to
> define the symbol inline with the declarations. Any ideas?

Don't do it inline.

const u8 key_value[] __stuffdata = ...
...
.key = key_value

Jim


Re: section attribute of compound literals

2018-09-13 Thread Jim Wilson

On 09/10/2018 10:46 PM, Jason A. Donenfeld wrote:

Hello,

I'd like to have a compound literal exist inside a certain linker
section. However, it doesn't appear to work as I'd like:

#define __stuffdata __attribute__((__section__("stuff")))
const u8 works[] __stuffdata = { 0x1, 0x2, 0x3, 0x4 };
const u8 *breaks = (const u8[] __stuffdata){ 0x1, 0x2, 0x3, 0x4 };


Attribute section applies to symbols not to types, so you can't use it 
in a cast.  In order to access data in another section, we need an 
address for it, and symbols have an address, but types do not.  The 
compound literal could have an address, but we don't have a way to 
attach attributes to compound literals.


Since your testcase already has a symbol in the right section, you could 
just use that to initialize breaks.

const u8 *breaks = works;
This means defining a bunch of extra symbols, but is a potential 
solution to your problem.


Jim


Re: Error from dwarf2cfi.c in gcc vers 7.2.0

2018-08-13 Thread Jim Wilson

On 08/12/2018 02:38 PM, Dave Pitts wrote:
I've been hacking with version 7.2.0 of gcc trying to adapt some old md 
files that I've got to this newer gcc. I've been getting errors from the 
dwarf2out_frame_debug_expr() function in dwarf2cfi.c line 1790 calling 
gcc_unreachable(). The expression being processed is a SET. The src 
operand is a MEM reference and the dest operand is a REG reference. If I 
run the compiler with the option -fno-asynchronous-unwind-tables I do 
NOT get the error and the generated code looks reasonable.


(set (REG) (MEM)) is not one of the patterns handled by 
dwarf2out_frame_debug_expr.  If this is an epilogue instruction to 
restore a register, then it should have a REG_CFA_RESTORE regnote which 
takes you to dwarf2out_frame_debug_cfa_restore instead.  If you are 
missing REG_CFA_RESTORE regnotes, then you are probably missing other 
CFA related regnotes too.  If this is not an epilogue register restore 
instruction, then you need to figure out why it was marked as frame 
related, and figure out what should have been done instead.


Jim


Re: gcov questions

2018-08-09 Thread Jim Wilson

On 08/09/2018 02:38 AM, daro...@o2.pl wrote:

Hello,   I wanted to ask what model for
branch coverage does gcov use?


There is a comment at the start of gcc/profile.c that gives some details 
on how it works.  It is computing execution counts for edges in the 
control flow graph.  As for which edges get instrumented, basically, you 
construct a control flow graph, create a minimal spanning tree to cover 
the graph, and then you only need to instrument the edges not on the 
spanning tree, plus the function entry point.  You can compute the rest 
of the edge counts from that.  Then there are some tricks to improve 
efficiency by putting frequently executed edges on the minimal spanning 
tree, so that infrequently executed edges get instrumented.


Gcov was originally written in 1990, based on an idea that came from 
Knuth's Art of Computer Programming.  Ball & Larus wrote a nice paper in 
1994 that does a good job of covering the methods used, though they may 
not have been aware of gcov at the time as it hadn't been accepted into 
GCC yet.  This is "Optimally Profiling and Tracing Programs" TOPLAS July 
1994.  I don't know if there are free copies of that available.  There 
may be better references available now, as these techniques are pretty 
widely known nowadays.


Jim


Re: decrement_and_branch_until_zero pattern

2018-06-08 Thread Jim Wilson
On Fri, Jun 8, 2018 at 1:12 PM, Paul Koning  wrote:
> Thanks.  I saw those sections and interpreted them as support for signal 
> processor style fast hardware loops.  If they can be adapted for dbra type 
> looping, great.  I'll give that a try.

The rs6000 port uses it for bdnz (branch decrement not zero) for
instance, which is similar to the m68k dbra.

> Meanwhile, yes, it looks like there is a documentation bug.  I can clean that 
> up.  It's more than a few lines, but does that qualify for an "obvious" 
> change?

I think the obvious rule should only apply to trivial patches, and
this will require some non-trivial changes to fix the looping pattern
section.  Just deleting the decrement_and_branch_until_zero named
pattern section looks trivial.  It looks like the REG_NONNEG section
should  mention the doloop_end pattern instead of
decrement_and_branch_until_zero, since I think the same rule applies
that they only get generated if the doloop_end pattern exists.

Jim


Re: decrement_and_branch_until_zero pattern

2018-06-08 Thread Jim Wilson

On 06/08/2018 06:21 AM, Paul Koning wrote:

Interesting.  The ChangeLog doesn't give any background.  I suppose I should 
plan to approximate the effect of this pattern with a define-peephole2 ?


The old RTL loop optimizer was replaced with a new RTL loop optimizer. 
When the old one was written, m68k was a major target, and the dbra 
optimization was written for it.  When the new one was written, m68k was 
not a major target, and this support was written differently.  We now 
have doloop_begin and doloop_end patterns that do almost the same thing, 
and can be created by the loop-doloop.c code.


There is a section in the internals docs that talks about this.
https://gcc.gnu.org/onlinedocs/gccint/Looping-Patterns.html

The fact that we still have decrement_and_branch_until_zero references 
in docs and target md files looks like a bug.  The target md files 
should use doloop patterns instead, and the doc references should be 
dropped.


Jim


Re: RISC-V ELF multilibs

2018-05-31 Thread Jim Wilson
On Thu, May 31, 2018 at 7:23 AM, Matthew Fortune
 wrote:
> I do actually have a solution for this but it is not submitted upstream.
> MIPS has basically the same set of problems that RISC-V does in this area
> and in an ideal world there would be no 'fallback' multilib such that if
> you use compiler options that map to a library variant that does not
> exist then the linker just fails to find any libraries at all rather than
> using the default multilib.
>
> I can share the raw patch for this and try to give you some idea about how
> it works. I am struggling to find time to do much open source support at
> the moment so may not be able to do all the due diligence to get it
> committed. Would you be willing to take a look and do some of the work to
> get it in tree?

I have a long list of things on my to do list.  RISC-V is a new
target, and there is lots of stuff that needs to be bug fixed,
finished, or added.  I can't make any guarantees.

But if you file a bug report and then attach a patch to it, someone
might volunteer to help finish it.  Or if it is too big to be
reasonably attached to a bug report (like the nano mips work) you
could put it on a branch, and mention the branch name as unfinished
work in a bug report.

Jim


Re: RISC-V problem with weak function references and -mcmodel=medany

2018-05-29 Thread Jim Wilson
On Tue, May 29, 2018 at 11:43 AM, Sebastian Huber
 wrote:
> would you mind trying this with -Ttext=0x9000?

This gives me for the weak call

9014: 7097  auipc ra,0x7
9018: fec080e7  jalr -20(ra) # 0 <__global_pointer$+0x6fffe7d4>

> Please have a look at:
> https://sourceware.org/bugzilla/show_bug.cgi?id=23244
> https://sourceware.org/ml/binutils/2018-05/msg00296.html

OK.  I'm still catching up on mailing lists after the US holiday weekend.

Jim


Re: RISC-V problem with weak function references and -mcmodel=medany

2018-05-29 Thread Jim Wilson

On 05/29/2018 04:19 AM, Sebastian Huber wrote:

Changing the code to something like this

void f(void) __attribute__((__weak__));

void _start(void)
{
    void (*g)(void) = f;

    if (g != 0) {
        (*g)();
    }
}


This testcase works for me also, using -mcmodel=medany -O tmp.c 
-Ttext=0x8000 -nostdlib -nostartfiles.


I need enough info to reproduce your problem in order to look at it.

One thing you can try is adding -Wl,--noinhibit-exec, which will produce 
an executable even though there was a linker error, and then you can 
disassemble the binary to see what you have for the weak call.  That 
might give a clue as to what is wrong.



Why doesn't the RISC-V generate a trampoline code to call far functions?


RISC-V is a new target.  The answer to questions like this is that we 
haven't needed it yet, and hence haven't implemented it yet.  But I 
don't see any need for trampolines to support a call to 0.  We can reach 
anywhere in the low 32-bit address space with auipc/jalr.  We can also 
use zero-relative addressing via the x0 register if necessary.  We 
already have some linker relaxation support for that, but it doesn't 
seem to be triggering for this testcase.


Jim


Re: RISC-V problem with weak function references and -mcmodel=medany

2018-05-29 Thread Jim Wilson

On 05/28/2018 06:32 AM, Sebastian Huber wrote:
I guess, that the resolution of the weak reference to the undefined 
symbol __deregister_frame_info somehow sets __deregister_frame_info to 
the absolute address 0 which is illegal in the following "call 
__deregister_frame_info"? Is this construct with weak references and a 
-mcmodel=medany supported on RISC-V at all?


Yes.  It works for me.  Given a simple testcase

extern void *__deregister_frame_info (const void *)
 __attribute__ ((weak));
void * foo;
int
main (void)
{
  if (__deregister_frame_info)
__deregister_frame_info (foo);
  return 0;
}

and compiling with -mcmodel=medany -O -Ttext=0x8000, I get

8158:   8097          auipc   ra,0x8
815c:   ea8080e7      jalr    -344(ra) # 0 <_start-0x8000>


for the weak call.  It isn't clear what you are doing differently.

Jim


Re: RISC-V ELF multilibs

2018-05-29 Thread Jim Wilson

On 05/26/2018 06:04 AM, Sebastian Huber wrote:

Why is the default multilib and a variant identical?


This is supposed to be a single multilib, with two names.  We use 
MULTILIB_REUSE to map the two names to a single multilib.


rohan:1030$ ./xgcc -B./ -march=rv64imafdc -mabi=lp64d --print-libgcc
./rv64imafdc/lp64d/libgcc.a
rohan:1031$ ./xgcc -B./ -march=rv64gc -mabi=lp64d --print-libgcc
./rv64imafdc/lp64d/libgcc.a
rohan:1032$ ./xgcc -B./ --print-libgcc
./libgcc.a
rohan:1033$

So this is working right when the -march option is given, but not when 
no -march is given.  I'd suggest a bug report so I can track this, if 
you haven't already filed one.



Most variants include the C extension. Would it be possible to add -march=rv32g 
and -march=rv64g variants?


The expectation is that most implementations will include the C 
extension.  It reduces code size, improves performance, and I think I 
read somewhere that it takes only 400 gates to implement.


It isn't practical to try to support every possible combination of 
architecture and ABI here, as there are too many possible combinations. 
But if there is a major RISC-V target that is rv32g or rv64g then we 
should consider it.  You can of course define your own set of multilibs.


Jim



Re: GCC 8.1 Released

2018-05-02 Thread Jim Wilson

On 05/02/2018 10:21 AM, Damian Rouson wrote:

Could someone please point me to instructions for how to submit a change to the 
gfortran changes list?  I’d like to add the following bullet:


See also
https://gcc.gnu.org/contribute.html#webchanges

Jim



Re: GCC changes for Fedora + riscv64

2018-04-09 Thread Jim Wilson

On 04/08/2018 08:22 AM, Jeff Law wrote:

On 03/31/2018 12:27 PM, Richard W.M. Jones wrote:

I'd like to talk about what changes we (may) need to GCC in
Fedora to get it working on 64-bit RISC-V, and also (more
importantly) to ask your advice on things we don't fully
understand yet.  However, I don't know even what venue you'd
prefer to discuss this in.


A discussion here is fine with me.  I know of a few issues.

I have a work-in-progress --with-multilib-list patch in PR 84797 but it 
isn't quite right yet, and needs to work more like the patch in PR 
85142, which isn't OK to check in.


There is a problem with atomics.  We only have builtins for the ones 
that can be implemented with a single instruction.  Adding -latomic 
unconditionally might fix it, but won't work for gcc builds and the gcc 
testsuite unless we also add paths pointing into the libatomic build 
dir.  I'm also concerned that this might cause build problems, if we end 
up trying to link with libatomic before we have built it.  The simplest 
solution might be to just add expanders for all of the missing atomics, 
even if they require multiple instructions, just like how all of the 
mainstream linux targets currently work.


There is a problem with the linker not searching the right set of dirs 
by default.  That is more a binutils problem than a gcc problem, but the 
linker might need some help from gcc to fix it, as the linker doesn't 
normally take -march and -mabi options.


There is a problem with libffi, which has RISC-V support upstream, but 
not in the FSF GCC copy.  This is needed for go language support.  There 
was also a dispute about the Go architecture naming, as to whether it should be 
riscv64 or riscv, with one person doing a port choosing the former and 
another person doing another port choosing the latter.


Those are all of the Linux specific ones I can remember at the moment. 
I might have missed some.


Jim


Re: Copyright assignment form

2018-01-16 Thread Jim Wilson
On Tue, Jan 16, 2018 at 12:01 PM, Siddhesh Poyarekar
 wrote:
> You need a separate assignment for every GNU project you intend to
> contribute to, so separate assignments for GCC, glibc, binutils, etc.

The form is the same for all GNU projects.

You can file an assignment that covers a single patch, or a single
project, or multiple patches, or multiple projects, or even all
patches for all projects.  Though of course the lawyers will have a
say in this, as they may not be comfortable with a broad assignment,
and may want to restrict it to specific projects, or even specific
patches for specific projects.  The more restricted the assignment,
the more paperwork you have to do, but the easier it is to get it past
lawyers uncomfortable with FSF requirements.  For instance, you can
get an assignment for a single patch, but then you have to go through
the assignment process every time you contribute a patch.

Jim


Re: Copyright assignment form

2018-01-16 Thread Jim Wilson

On 01/15/2018 03:11 PM, Shahid Khan wrote:

Our team at Qualcomm Datacenter Technologies, Inc. is interested in 
contributing patches to the upstream GCC compiler project. To get the process 
started, we'd like to request a copyright assignment form as per contribution 
guidelines outlined at https://gcc.gnu.org/contribute.html.

Please let me know if there are additional steps we need to take to become an 
effective contributor to the GCC community.


You should contact ass...@gnu.org directly.  The standard forms contain 
language about patents that Qualcomm lawyers are unlikely to be 
comfortable with, and may require negotiating a non-standard agreement. 
As best as I can tell, the FSF has never received a copyright assignment 
or disclaimer from Qualcomm.  If this is the first time Qualcomm lawyers 
are talking to the FSF, this will take a while.  I would not be 
surprised if this takes a year or two.  You will also need a VP level 
signature for the forms once you get approval from Qualcomm lawyers.


You may want to consider getting a disclaimer from your employer, and 
then filing personal assignments.  It is probably easier to get a 
disclaimer from Qualcomm than an assignment, but this requires more 
paperwork, since each individual contributing then needs their own 
personal assignment.  The disclaimers also have language about patents 
that the Qualcomm lawyers may not like, so while this should be easier, 
it is still likely a difficult process.


Siddhesh can help you with this as the rules for gcc are the same as for 
glibc.


Jim


Re: Fwd: gcc 7.2.0 error: no include path in which to search for stdc-predef.h

2017-12-05 Thread Jim Wilson

On 12/04/2017 01:11 PM, Marek wrote:

looking at config.log i see theses errors:


Configure does a number of feature tests to see what features are 
available for use.  It is expected that some of these feature tests will 
fail.  Some features are optional, and if that feature test fails there 
is no problem; we just use an alternative feature.  Some features are 
required, and if that feature test fails then configure exits with an 
error.  If you get one of these, it will be the very last feature test 
in the config.log file, and will have some kind of error message that 
indicates that configure can not continue after this failure.


But this doesn't seem relevant to your problem, as Kai Ruottu already 
pointed out what is wrong...



On Fri, Dec 1, 2017 at 11:23 AM, Kai Ruottu  wrote:

Kai Ruottu kirjoitti 1.12.2017 klo 12:02:



Answering to my own question... Yes, it should include this :
https://git.musl-libc.org/cgit/musl/tree/include


Maybe there is another target name one should use like
'x86_64-lfs-linux-musl' in your case?


The docs for musl are telling just this, one should use the
'-linux-musl' triplet!


GCC assumes glibc for a Linux target.  If you want to use musl, you 
must include musl in the target triplet that you configure for.  See the 
gcc/config.gcc file, and look at the places where it checks the target 
triplet for musl to enable musl support.


Jim



Re: gcc 7.2.0 error: no include path in which to search for stdc-predef.h

2017-11-27 Thread Jim Wilson

On 11/26/2017 11:09 PM, Marek wrote:

Hi,

while compiling 7.2.0 im getting the following:

cc1: error: no include path in which to search for stdc-predef.h
cc1: note: self-tests are not enabled in this build


This doesn't appear to be a build error.  Configure runs the compiler to 
check for features, and if a check fails, then the feature is disabled. 
This is normal, and nothing to worry about.  Though the message is 
unusual.  If the compiler is the one you just built, there might be 
something wrong with it.  Or there might be a minor configure script bug.



configure: error: in
`/run/media/void/minnow/build/gcc-7.2.0/x86_64-lfs-linux-gnu/libgcc':
configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details.
make[1]: *** [Makefile:12068: configure-target-libgcc] Error 1
make: *** [Makefile:880: all] Error 2


This is the real build error.  You need to look at the config.log file 
in the directory where configure failed to see what the problem is. 
This is usually a build environment problem of some sort.



If gcc is able to recognize between sources in one dir and objects in
another dir


Yes.  The usual way to configure gcc is something like
  mkdir build
  cd build
  ../gcc/configure

Jim



Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-13 Thread Jim Wilson

On 11/11/2017 05:33 PM, Fengguang Wu wrote:

CC gcc list. According to Alexei:

  This is a known issue with gcc 7 on mips that is "optimizing"
  normal 64-bit multiply into 128-bit variant.
  Nothing to fix on the kernel side.


I filed a bug report.  This is now
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82981

I found a helpful thread at
https://www.linux-mips.org/archives/linux-mips/2017-08/msg00041.html
that had enough info for me to reproduce and file the bug report.

Jim



Re: [PATCH] RISC-V: Add Jim Wilson as a maintainer

2017-11-07 Thread Jim Wilson
On Mon, Nov 6, 2017 at 6:39 PM, Palmer Dabbelt  wrote:
>
> +riscv port         Jim Wilson  
>
>
It is jimw not jim for the email address.  Please fix.

Jim


Re: -ffunction-sections and -fdata-sections documentation

2017-10-13 Thread Jim Wilson

On 10/13/2017 12:06 AM, Sebastian Huber wrote:

The end-of-life of Solaris 2.6 was 2006. Is it worth to mention this here?


The reference to Solaris 2.6 is no longer useful.  Just mention ELF here.

This "AIX may have these optimizations in the future." is there since at 
least 1996. What is the current AIX status?


David answered this.

Is the "Only use these options when there are significant benefits from 
doing so. When you specify these options, the assembler and linker 
create larger object and executable files and are also slower. You 
cannot use gprof on all systems if you specify this option, and you 
may have problems with debugging if you specify both this option and 
-g." still correct on the systems of today?


You can get larger objects, because as Jeff mentioned, some 
compile-time/assembly-time optimizations get disabled.  That should 
probably be clarified.


The assembler/linker will be slower because they will have more work to 
do, more relocations, more sections, larger object files.


Some old systems could not support both -ffunction-sections and -pg 
together, this used to give a warning, which was removed in 2012.  I 
believe this is obsolete.  The likely explanation for this doc is

  https://gcc.gnu.org/ml/gcc-help/2008-11/msg00139.html
which mentions that it was already long ago fixed at that time.

Using -g should not be a problem on an ELF/DWARF system, which is what 
most systems use nowadays.  There could be issues with other object 
files/debug info formats, but this is unclear.  I suspect this comment 
is obsolete and can be removed.


The doc should probably refer to the linker --gc-sections option, as 
this is what makes -ffunction-sections useful for most people, by 
reducing the code size by eliminating unused functions.



Do these options affect the code generation?


Jeff answered this.

Jim




Re: Byte swapping support

2017-09-13 Thread Jim Wilson

On 09/12/2017 02:32 AM, Jürg Billeter wrote:

To support applications that assume big-endian memory layout on little-
endian systems, I'm considering adding support for reversing the
storage order to GCC. In contrast to the existing scalar storage order
support for structs, the goal is to reverse the storage order for all
memory operations to achieve maximum compatibility with the behavior on
big-endian systems, as far as observable by the application.


Intel has support for this in icc.  It took about 5 years for a small 
team to make it work on a very large application.  That includes both 
the compiler development and application development time.  There are a 
lot of complicated issues that need to solved to make this work on real 
code, both in the compiler and in the application code.  There is a Dr 
Dobbs article about some of it, search for "Writing a Bi-Endian 
Compiler" if you are interested.


Even though they got it working, it was painful to use.  Icc goes to a 
lot of trouble to optimize away unnecessary byte-swapping to improve 
performance, but that meant any variable could be big or little endian 
despite how it was declared, and could be different endianness at 
different places in the code, and could even be both endianness (stored 
in two locations) at the same time if the code needed both endianness. 
Sometimes we'd find a bug, and it would take a week to figure out if it 
was a compiler bug or an application bug.



To facilitate byte swapping at endian boundaries (kernel or libraries),
I'm also considering developing a new GCC builtin that can byte-swap
whole structs in memory. There are limitations to this, e.g., unions
could not be supported in general. However, I still expect this to be
very useful.


There is a lot more stuff that will cause problems.  Byte-swapping FP 
doesn't make sense.  You can only byte swap a variable if you know its 
type, but you don't know the type of a va_list ap argument, so you can't 
call a big-endian vprintf from little-endian code and vice versa.  If 
you have a template expanded in both big and little endian code, you 
will run into problems unless name mangling changes to include endian 
info, which means you lose ABI compatibility with the current name 
mangling scheme.


There will also be trouble with variables in shared libraries that get 
initialized by the dynamic linker.  You will either have to add a new 
set of other-endian relocations, or else you will have to add code to 
byte-swap data after relocations are performed, probably via an init 
routine, which will have to run before the other init routines.  There 
is also the same issue with static linking, but that one is a little 
easier to handle, as you can use a post-linking pass to edit the binary 
and byte swap stuff that needs to be byte swapped after relocations are 
performed.


To handle endian boundaries, you will need to force all declarations to 
have an endianness, and you will need to convert when calling a 
big-endian function from a little-endian function, and vice versa, and 
you will need to give an error if you see something you can't convert, 
like a va_list argument.  Besides the issue of the C library not 
changing endianness, you will likely also have third party libraries 
that you can't change the endianness of, and that need to be linked into 
your application.
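A per-field conversion of the sort such a boundary would require can be sketched in plain C. The struct, field names, and helper below are hypothetical, and GCC's `__builtin_bswap16`/`__builtin_bswap32` builtins are assumed available:

```c
#include <stdint.h>

/* Hypothetical message struct; names are made up for illustration. */
struct msg {
    uint32_t id;
    uint16_t len;
    uint8_t  flags;   /* single bytes need no swapping */
};

/* Sketch of the per-field conversion a compiler (or the proposed
   whole-struct builtin) would have to emit at an endian boundary. */
static void msg_swap_endian(struct msg *m)
{
    m->id  = __builtin_bswap32(m->id);
    m->len = __builtin_bswap16(m->len);
    /* m->flags is one byte and is unchanged.  A union member or a
       va_list could not be handled this way, which is the point above. */
}
```

Note that this only works because the type of every field is known; it is exactly the cases where the type is not known (unions, va_list) that make a general solution hard.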


Before you start, you should give some thought to how debugging will 
work.  DWARF does have an endianity attribute; you will need to set it 
correctly, or debugging will be hopeless.  Even if you set it correctly, 
if you have optimizations to remove unnecessary byte swapping, debugging 
optimized code will still be hard and people using the compiler will 
have to be trained on how to deal with endianness issues.


And there are lots of other problems; I don't have time to document them 
all, or even remember them all.  Personally, I think you are better off 
trying to fix the application to make it more portable.  Fixing the 
compiler is not a magic solution; it is no easier than 
fixing the application.


Jim


Re: layout of __attribute__((packed)) vs. #pragma pack

2017-08-04 Thread Jim Wilson

On 07/28/2017 04:51 AM, Geza Herman wrote:

There's an option in GCC "-mms-bitfields". The doc about it begins with:

"If packed is used on a structure, or if bit-fields are used, it may be 
that the Microsoft ABI lays out the structure differently than the way 
GCC normally does. Particularly when moving packed data between 
functions compiled with GCC and the native Microsoft compiler (either 
via function call or as data in a file), it may be necessary to access 
either format."


I'm particularly interested in packed structs; bit-fields are not a 
concern now.  Does this doc mean that a packed struct layout may differ 
between GCC and MSVC? The doc doesn't give an example of this, it just 
talks about bit-fields. If the packed layout can differ, in which way 
does it? Previously I thought that both compilers put members into the 
struct without any padding, so the layout must match.


Different ABIs handle bit-fields differently, and as a result, different 
ABIs handle packed structures with bit-fields differently.


There are testcases for this in the gcc testsuite.  In the gcc sources, 
look at gcc/testsuite/gcc.dg/bf-ms-layout.c.


Otherwise, for a packed structure without bit-fields, and without 
sub-structures, it is probably handled the same across most ABIs.



Plus, the __attribute__((packed)) documentation has changed:

https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Common-Variable-Attributes.html, 
here the text is:


The packed attribute specifies that a variable or structure field should 
have the smallest possible alignment—one byte for a variable, and one 
bit for a field, unless you specify a larger value with the aligned 
attribute.



https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html, here the 
text is:


"This attribute, attached to struct or union type definition, specifies 
that each member (other than zero-width bit-fields) of the structure or 
union is placed to minimize the memory required"



What does "minimize" mean here? Does it give the same guarantees as the 
previous definition? Could it mean that there's still padding in the 
struct?


Structure layout is complicated, and it isn't possible to explain all 
details in a single sentence.  There may be cases where a packed 
structure still contains padding.  It depends on the ABI, and how 
bit-fields are handled, etc.  You can see some examples with the 
-mms-bitfields examples given above, where some packed structures are 
larger with gcc than msvc, and some are larger with msvc than gcc.


If in doubt, the best solution is to write some testcases to check, or 
add some consistency checking code, e.g. you can add some code to verify 
that type sizes are what you expect at compile-time.
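Such a compile-time check might look like the following sketch. The struct is hypothetical; C11 `_Static_assert` and GCC's `packed` attribute are assumed:

```c
#include <stdint.h>

/* Hypothetical packed struct; the point is that a mismatch between the
   layout you expect and the layout the ABI actually produces fails at
   compile time rather than silently corrupting data. */
struct packed_hdr {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));

/* With packed, no padding is inserted between tag and value,
   so the total size is 1 + 4 = 5 bytes. */
_Static_assert(sizeof(struct packed_hdr) == 5,
               "unexpected packed struct layout");
```

If two compilers disagree on the layout, the assertion fires in whichever build got the unexpected size, which is much easier to debug than a runtime mismatch.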


Jim


Re: libatomic IFUNC question (arm & libat_have_strexbhd)

2017-06-07 Thread Jim Wilson

On 06/06/2017 09:01 AM, Steve Ellcey wrote:

So the question remains, where is libat_have_strexbhd set?  As near as
I can tell it isn't set, which would make the libatomic IFUNC pointless
on arm.


libat_have_strexbhd isn't set anywhere.  It looks like this was a 
prototype that was never fully fleshed out.  See for instance the 
libatomic/config/x86/init.c file.  Finishing this means someone has to 
figure out how to use the arm cpuid equivalent to set the two variables 
appropriately.


Jim



Re: Getting spurious FAILS in testsuite?

2017-06-05 Thread Jim Wilson

On 06/01/2017 05:59 AM, Georg-Johann Lay wrote:

Hi, when I am running the gcc testsuite in $builddir/gcc then
$ make check-gcc RUNTESTFLAGS='ubsan.exp'
comes up with spurious fails.


This was discussed before, and the suspicion was that it was a linux 
kernel bug.  There were multiple kernel fixes pointed at, it wasn't 
clear which one was required to fix it.


I have Ubuntu 16.04 LTS on my laptop, and I see the problem.  I can't 
run the ubsan testsuites with -j factor greater than one and get 
reproducible results.  There may also be other ways to trigger the problem.


See for instance the thread
https://gcc.gnu.org/ml/gcc/2016-07/msg00117.html
The first message in the thread from Andrew Pinski mentions that the log 
output is corrupted from apparent buffer overflow.


Jim


Re: FW: Build failed in Jenkins: BuildThunderX_native_gcc_upstream #1267

2017-03-17 Thread Jim Wilson

On 03/17/2017 04:12 PM, Jim Wilson wrote:

I have access to a fast box that isn't otherwise in use at the moment so
I'm taking a look.  r246225 builds OK.  r246226 does not.  So it is
Bernd's combine patch.  A little experimenting shows that the compare
difference is triggered by the use of -gtoggle in stage2, which is not
used in stage3.  Otherwise stage2 and stage3 generate identical code.
The bug is apparently due to a problem with handling debug insns in the
combine patch.


Changing a new prev_nonnote_insn call to a prev_nonnote_nondebug_insn 
call appears to solve the problem.  I will have to do a bootstrap and 
make check from scratch to verify.  I also noticed that there is a 
redundant i1 check in the patch which should be fixed also.


Jim



Re: FW: Build failed in Jenkins: BuildThunderX_native_gcc_upstream #1267

2017-03-17 Thread Jim Wilson

On 03/17/2017 03:28 PM, Jeff Law wrote:

On 03/17/2017 03:31 PM, Andrew Pinski wrote:

On Fri, Mar 17, 2017 at 11:47 AM, Bernd Schmidt 
wrote:

On 03/17/2017 07:38 PM, Pinski, Andrew wrote:


One of the following revision caused a bootstrap comparison failure on
aarch64-linux-gnu:
r246225
r246226
r246227



Can you help narrow that down?


I can though I don't want to duplicate work since Jeff was going to
provision an aarch64 system.  My automated testing is approximately
every hour or so; these commits were within an hour window even.
I did not look into the revisions when I wrote the email, but I suspect
r246227 did NOT cause it since aarch64 does not use reload anymore.

The box I got isn't terribly fast, but regardless I'll be walking
through each commit to see if I can trigger the failure.

246224 tested OK (as it should).

246225 is in progress.


I have access to a fast box that isn't otherwise in use at the moment so 
I'm taking a look.  r246225 builds OK.  r246226 does not.  So it is 
Bernd's combine patch.  A little experimenting shows that the compare 
difference is triggered by the use of -gtoggle in stage2, which is not 
used in stage3.  Otherwise stage2 and stage3 generate identical code. 
The bug is apparently due to a problem with handling debug insns in the 
combine patch.


Jim



Re: GNU Toolchain Fund established at the Free Software Foundation

2017-03-10 Thread Jim Wilson

On 03/10/2017 03:08 AM, David Edelsohn wrote:

On Thu, Mar 9, 2017 at 8:48 PM, Ian Lance Taylor  wrote:

On Thu, Mar 9, 2017 at 11:49 AM, David Edelsohn  wrote:

As discussed at the last Cauldron, the first interest of the community
seems to be the shared infrastructure of Sourceware: hosting, system
administration, backups, and updating the websites.


There was also a suggestion of funding travel for speakers at the GNU 
Cauldron, for people who might not be able to afford the travel otherwise.


Jim



Re: Do we really need a CPP manual?

2016-12-16 Thread Jim Wilson

On 12/16/2016 10:06 AM, Jeff Law wrote:

That's likely the manual RMS kept asking folks (semi-privately) to
review.  My response was consistently that such review should happen
publicly, which RMS opposed for reasons I don't recall.


I reviewed it, on the grounds that a happy rms is good for the gcc 
project, and because I haven't been doing much else useful.  It was a 
lot of work, about 10 hours a week for 2 months.  The document I 
reviewed has significant differences from the one on the web site, but 
has a lot of structural similarities.  I think there is a major rewrite 
still in progress.  I pointed out all of the obvious stuff, features 
dropped long ago, references to out-of-date standards, missing ISO C 
2011 features, etc.


Jim



Re: LSDA unwind information is off by one (in __gcc_personality_v0)

2016-10-20 Thread Jim Wilson

On 10/20/2016 11:51 AM, Florian Weimer wrote:

exception handling region.  Subtracting 1 is an extremely hackish way to
achieve that and likely is not portable at all.


Gdb has been doing this for over 25 years for every architecture.  When 
you use the backtrace command, it gets a return address, subtracts one, 
and then does a file name/line number lookup.  This is because the file 
name and source line number of the call instruction may not be the same 
as the instruction after the call.  This does of course assume that you 
have a return address, and are doing some kind of range based lookup on 
addresses, so you don't need an exact instruction address to get a hit. 
Exception regions work the same way.
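The subtract-one convention can be illustrated with a toy range lookup; the table and function below are made up for illustration:

```c
#include <stdint.h>

/* Hypothetical lookup table: half-open [start, end) address ranges,
   each mapping to a region id (think: a source line or an EH region). */
struct range { uintptr_t start, end; int id; };

static int find_region(const struct range *t, int n, uintptr_t ret_addr)
{
    /* Subtract one so a return address that is the first byte *after*
       a call instruction still maps to the region containing the call,
       not the region that happens to start right behind it. */
    uintptr_t pc = ret_addr - 1;
    for (int i = 0; i < n; i++)
        if (pc >= t[i].start && pc < t[i].end)
            return t[i].id;
    return -1;
}
```

With a call whose last byte ends region 1, the return address is the first address of region 2; without the subtraction the lookup would attribute the call to the wrong region.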


I think that there is some sort of configure related problem here, as 
HAVE_GETIPINFO is set when I build on an Ubuntu x86_64-linux system. 
Looking at the configure test, which is in config/unwind_ipinfo.m4... if 
you don't use --with-system-libunwind, then HAVE_GETIPINFO defaults to 
on.  If you do use --with-system-libunwind, then HAVE_GETIPINFO defaults 
to off, which will break handling for signal frames.  I'm not sure if 
anyone is using --with-system-libunwind, so I'm not sure if this needs a 
gcc bug report.


But I also see that while HAVE_GETIPINFO appears to be set by configure, 
it is apparently not being used when building unwind-c.o.  I see that 
HAVE_GETIPINFO is set in the libgcc/auto-target.h file, but this file is 
not included by unwind-c.c.  I only see includes of this in 
libgcc/config/i386/cpuinfo.c and libgcc/config/sol2/gmon.c.  I don't 
know offhand how auto-target.h is supposed to work, but it appears that 
it needs to be included in the unwind files built as part of libgcc. 
This is maybe a bug accidentally caused when libgcc was moved out of the 
gcc dir and into its own top level dir.  I think this warrants a gcc bug 
report.


Jim



Re: Replacement for the .stabs directive

2016-08-24 Thread Jim Wilson

On 08/19/2016 12:55 PM, Umesh Kalappa via llvm-dev wrote:

We have legacy code that uses the .stabs directive quite often
in the source code, like

.stabs "symbol_name", 100, 0, 0, 0 + .label_one f;

.label_one
 stmt
and the above code is wrapped with inline asm in the C source file.


Presumably the ".label_one f" is actually "1f" and the ".label_one" is 
"1:".  That would make more sense, as this is a use of the GNU as local 
label feature.


Unfortunately, there is no easy way to do this in DWARF, as DWARF debug info 
is split across multiple sections and encoded.  Maybe this could work if 
you handled it like a comdat symbol, but that would be inconvenient, and 
might not even work.  This seems like an option not worth pursuing.


The fact that this worked for stabs is more accident than design.  The 
code never should have been written this way in the first place.


You can make the association between a symbol name and an address by 
using an equivalence.  E.g. you could do

asm ("symbol_name = 1f");

but this puts the symbol_name in the symbol table, which works only if 
symbol_name is unique or maybe unique within its scope if function 
local.  If the name was unique, you probably wouldn't have used the ugly 
stabs trick in the first place, so this might not work.  If the symbol 
names aren't unique, maybe you can change the code to make them unique? 
Using an equivalence gives the same effective result as using

symbol_name: stmt

Jim

PS Cross posting like this is discouraged.  I would suggest just asking 
assembler questions on the binutils list.




Re: Supporting subreg style patterns

2016-08-17 Thread Jim Wilson

On 08/16/2016 03:10 AM, shmuel gutl wrote:

My hardware directly supports instructions of the form
subreg:SI(reg:VEC v1,3) = SI:a1


Subregs of hard registers should be avoided.  They are primarily useful 
for pseudo regs.  Subregs that aren't lowpart subregs should be avoided 
also.  Except when you have a subreg of a pseudo that maps to multiple 
hard regs, and can eventually become a lowpart subreg after the pseudo 
gets allocated to a hard reg and gets simplified.


It isn't clear where the subregs are coming from, but what you are doing 
sounds like a bit-field extract/insert, and these are not operations 
that the register allocator will add to the code.  Depending on what 
exactly you are trying to do, I have two general suggestions.


1) Define the vector registers as 32-bit registers, and define vector 
operations as using aligned groups of these 32-bit registers.  This 
exposes the 32-bit registers to the register allocator so that it can 
use them directly.


2) Use zero_extract and/or vec_select instead of subreg, which requires 
that you have patterns that emit the zero_extract/vec_select operations, 
patterns that recognize them, and possibly builtin functions that the 
user can call to get these zero_extract/vec_select operations emitted 
into the rtl.  There is a named pattern vec_extract that the vectorizer 
can use to generate these rtl operations.  For examples of this, in the 
aarch64 port, see for instance the aarch64_movdi_* patterns in the 
aarch64.md file, and the aarch64_get_lane* patterns in the 
aarch64-simd.md file.


Jim



Re: Change the arrch64 abi ...(Custom /Specific change)

2016-04-05 Thread Jim Wilson
On Tue, Apr 5, 2016 at 2:45 AM, Umesh Kalappa  wrote:
> I need to make the changes only to the function args (varargs); hence
> making the changes in TARGET_PROMOTE_FUNCTION_MODE will do?

If TARGET_PROMOTE_FUNCTION_MODE disagrees with PROMOTE_MODE, it is
possible that the middle end may generate incorrect RTL.  This was
seen with the arm target when it was using different sign extension
for args and locals.  It may or may not be a problem for SImode
extension versus DImode extension.  If you run into optimizer
problems, you may need to change PROMOTE_MODE also to solve them.

> one more question: I have defined TARGET_PROMOTE_FUNCTION_MODE
> (arm.c) and am cross compiling for aarch64, but still gcc calls
> default_promote_function_mode, i.e.

Add it to aarch64.c instead of arm.c.  arm.c is for 32-bit arm code.
aarch64.c is for 64-bit arm code.

Jim


Re: Change the arrch64 abi ...(Custom /Specific change)

2016-04-04 Thread Jim Wilson

On 04/04/2016 08:55 AM, Umesh Kalappa wrote:

We are in the process of changing the gcc compiler for the aarch64 ABI, 
w.r.t. varargs function argument handling.

In the default (LP64) ABI, 1-, 2-, and 4-byte args are promoted to word 
size, i.e. 4 bytes; we need to change this behaviour to 8 bytes (double word).

we are looking both hooks like  PROMOTE_MODE and
TARGET_PROMOTE_FUNCTION_MODE to make the changes.


I think this would work.  You just need to promote all modes less than 8 
bytes to DImode, instead of the current code that promotes modes smaller 
than 4 bytes to SImode.  You would do this for the default LP64 type 
system, but not for the ILP32 type system.
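A hypothetical sketch of such a PROMOTE_MODE definition follows; this is not the actual aarch64 macro, and TARGET_ILP32 is used here only to suggest the LP64/ILP32 split:

```c
/* Hypothetical sketch: promote every integer mode narrower than 8 bytes
   to DImode under LP64, instead of the usual promotion of sub-word
   modes to SImode.  Not the real aarch64 definition. */
#define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)            \
  if (GET_MODE_CLASS (MODE) == MODE_INT                \
      && GET_MODE_SIZE (MODE) < 8                      \
      && !TARGET_ILP32)                                \
    (MODE) = DImode;
```

TARGET_PROMOTE_FUNCTION_MODE would need a matching change so that arguments and locals agree, for the reasons discussed in the earlier message.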


This would affect all function arguments and locals, which may cause 
code size and/or performance issues.  You would have to check for that.


Also, this may prevent linking with any 3rd party code compiled by 
unmodified gcc, or code compiled with other compilers (e.g. LLVM), 
because changing TARGET_PROMOTE_FUNCTION_MODE can cause ABI changes. You 
may need to check that also.
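For comparison, the existing word-size behaviour is observable in standard C: under the default argument promotions, char and short variadic arguments already arrive widened to int, which is why va_arg must name int. The function below is made up for illustration:

```c
#include <stdarg.h>

/* Demonstrates the standard default argument promotions that the
   proposal above would widen further (to 8 bytes) under LP64.
   Function name is hypothetical. */
int sum_promoted_args(int count, ...)
{
    va_list ap;
    int total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);  /* chars/shorts were promoted to int */
    va_end(ap);
    return total;
}
```

Passing a char 7 and a short 100 yields 107: the callee never sees the narrow types, which is exactly the property an ABI change here must preserve at the wider size.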


Jim



Re: extendqihi2 and GCC RTL type system

2016-02-22 Thread Jim Wilson
On Mon, Feb 22, 2016 at 7:55 AM, David Edelsohn  wrote:
> If I remove extendqihi2 (extend:HI pattern) from the PowerPC port,
> will that cause any problems for the GCC RTL type system or inhibit
> optimizations?   I see that Alpha and SPARC define extendqihi2, but
> IA-64 and AArch64 do not, so there is precedent for both approaches.

aarch64 does have an extendqihi2 pattern.  It uses so many iterator
macros that you can't use grep to look for stuff.  The extendqihi2
pattern is called qihi2.

If you have a target with registers larger than HImode, no HImode
register operations, qi/hi loads set the entire register, and you
define PROMOTE_MODE to convert all QImode and HImode operations to the
same larger mode with the same signedness, then I don't think that
there is any advantage to having an extendqihi2 pattern.  You should
get the same code with or without it, as a qimode to himode conversion
is a no-op.  The only difference should be that with an extendqihi2
pattern you will see some HImode operations in the RTL; without
extendqihi2 you will see equivalent operations in the promoted mode.

If you are concerned about this, then just try compiling some large
code base using two compilers, one with extendqihi2 and one without,
and check to see if there are any code generation differences.

Jim


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-31 Thread Jim Wilson
On Fri, Jan 29, 2016 at 12:09 AM, Richard Biener  wrote:
> I wonder if rather than introducing a target hook ports could use
> a define_expand expanding to a libcall for this case?

Of the two divmod libcall APIs, one requires a stack temporary, which
would be awkward to allocate in a define_expand.  Though we could have
expand_twoval_binop implement the libgcc udivmoddi4 API which requires
a stack temp, and then add an ARM divmod expander that implements the
ARM API which has a double-wide result.  That sounds like it could
work.

Jim


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-31 Thread Jim Wilson
On Sun, Jan 31, 2016 at 8:43 PM, Jim Wilson  wrote:
>> Are we certain that the libcall is a win for any target?
>> I would have expected a default of
>> q = x / y
>> r = x - (q * y)
>> to be most efficient on modern machines.  Even more so on targets like ARM
>> that have multiply-and-subtract instructions.

If there is a div insn, then yes, gcc will emit a div and a multiply.
However, a div insn is a relatively recent addition to the 32-bit ARM
architecture.  Without the div insn, we get a div libcall and a mod
libcall.  That means two libcalls, both of which are likely
implemented by calling the divmod libcall and returning the desired
part of the result.  One call to a divmod libcall is clearly more
efficient than two calls to a divmod libcall.  So that makes the
transformation useful.

Prathamesh's patch has a number of conditions required to trigger the
optimization, such as a divmod insn, or a lack of a div insn and the
presence of a divmod libcall.
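The win can be sketched with a combined-result helper; this struct-return API is illustrative only and is not the ARM __aeabi_uidivmod ABI:

```c
#include <stdint.h>

/* Hypothetical combined divmod: both results come from one computation.
   On a target without a div insn, a/b and a%b written separately become
   two libcalls; the optimization discussed here folds them into one. */
struct udivmod32 { uint32_t quot; uint32_t rem; };

static struct udivmod32 udivmod32(uint32_t x, uint32_t y)
{
    struct udivmod32 r;
    r.quot = x / y;
    r.rem  = x - r.quot * y;   /* equivalent to x % y */
    return r;
}
```

When the quotient is already in hand, the remainder costs only a multiply and a subtract, which is why one divmod call (or one div plus a multiply-subtract) beats separate div and mod calls.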

Jim


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-31 Thread Jim Wilson
On Sun, Jan 31, 2016 at 2:15 PM, Richard Henderson  wrote:
> On 01/29/2016 12:37 AM, Richard Biener wrote:
>>>
>>> To workaround this, I defined a new hook expand_divmod_libfunc, which
>>> targets must override for expanding call to target-specific divmod.
>>> The "default" hook default_expand_divmod_libfunc() expands call to
>>> libgcc2.c:__udivmoddi4() since that's the only "generic" divmod
>>> available.
>>> Is this a reasonable approach ?
>>
>>
>> Hum.  How do they get to expand/generate it today?  That said, I'm
>> no expert in this area.
>>
>> A simpler solution may be to not do the transform if there is no
>> instruction for divmod.
>
>
> Are we certain that the libcall is a win for any target?
>
> I would have expected a default of
>
> q = x / y
> r = x - (q * y)
>
> to be most efficient on modern machines.  Even more so on targets like ARM
> that have multiply-and-subtract instructions.
>
> I suppose there's the case of the really tiny embedded chips that don't have
> multiply patterns either.  So that this default expansion still results in
> multiple libcalls.
>
> I do like the transformation, because for machines that don't have a divmod
> instruction, being able to strength-reduce a mod operation to a multiply
> operation is a nice win.
>
>
> r~


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-28 Thread Jim Wilson
On Thu, Jan 28, 2016 at 5:37 AM, Richard Biener  wrote:
>> To workaround this, I defined a new hook expand_divmod_libfunc, which
>> targets must override for expanding call to target-specific divmod.
>> The "default" hook default_expand_divmod_libfunc() expands call to
>> libgcc2.c:__udivmoddi4() since that's the only "generic" divmod
>> available.
>> Is this a reasonable approach ?
>
> Hum.  How do they get to expand/generate it today?  That said, I'm
> no expert in this area.

Currently, the only place where a divmod libfunc can be called is in
expand_divmod in expmed.c, which can return either the div or mod
result, but not both.  If this is called for the mod result, and there
is no div insn, and no mod insn, and no mod libfunc, then it will call
the divmod libfunc to generate the mod result.  This is exactly the
case where the ARM port needs it, as this code was written for the
arm.

There are 3 targets that define a divmod libfunc: arm, c6x, and spu.
The arm port is OK, because expand_divmod does the right thing for
arm, using the arm divmod calling convention.  The c6x port is OK
because it defines mod insns and libfuncs, and hence the divmod
libfunc will never be called and is redundant.  The spu port is also
OK, because it defines mod libcalls, and hence the divmod libfunc will
never be called, and is likewise redundant.  Both the c6x and spu
ports have their own divmod library functions in
libgcc/config/$target.  The divmod library functions are called by the
div and mod library functions, so they are necessary, they are just
never directly called.  Both the c6x and spu port uses the current
libgcc __udivmoddi4 calling convention with a pointer to the mod
result, which is different and incompatible to the ARM convention of
returning a double size result that contains div and mod.

WIth Prathamesh's patch to add support to the tree optimizers to
create divmod operations, the c6x and spu ports break.  The divmod
libfuncs are no longer redundant, and will be called, except with the
wrong ABI, so we need to extend the divmod support to handle multiple
ABIs.  This is why Prathamesh added the target hook for the divmod
libcall, so the target can specify the ABI used by its divmod
libcalls.  Prathamesh has correct support for ARM (current code), and
apparently correct code for c6x and spu (libgcc udivmodsi4).

> A simpler solution may be to not do the transform if there is no
> instruction for divmod.

This prevents the optimization from happening on ARM, which has divmod
libfuncs but no divmod insn.  We want the optimization to happen
there, as if we need both div and mod results, then calling the divmod
libfunc is faster than calling both the div and mod libfuncs.

Jim


Re: vectorization ICE for aarch64/armhf on SPEC2006 h264ref

2016-01-12 Thread Jim Wilson
On Tue, Jan 12, 2016 at 2:22 PM, Jim Wilson  wrote:
> I see a number of places in tree-vect-generic.c that add a
> VIEW_CONVERT_EXPR if useless_type_conversion_p is false.  That should
> work, except when I try this, I see that the VIEW_CONVERT_EXPR gets
> converted to a NOP_EXPR by gimplify_build1, and gets stripped again.

To elaborate on this a bit more, I see a number of places that do this
  if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs)))
    new_rhs = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, TREE_TYPE (lhs),
                               new_rhs);

In match.pd, there is a rule to convert VIEW_CONVERT_EXPR to NOP_EXPR:

(simplify
  (view_convert @0)
  (if ((INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
       && (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
       && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0)))
   (convert @0)))

But according to useless_type_conversion_p, there are two more
conditions that need to be met: the signedness must be the same, and
if one is boolean and one is not, then the precision must be one.  In
my case, we have a 32-bit int type and a 32-bit boolean type.  So
useless_type_conversion_p is demanding a type conversion, but match.pd
is converting the VIEW_CONVERT_EXPR to a NOP_EXPR, and gimplify_build1
is stripping it.  So there appears to be an inconsistency here.

Jim


vectorization ICE for aarch64/armhf on SPEC2006 h264ref

2016-01-12 Thread Jim Wilson
I'm looking at an ICE on SPEC 2006 464.h264ref slice.c that occurs
with -O3 for both aarch64 and armhf.

palantir:2080$ ./xgcc -B./ -O3 -S slice.i
slice.c: In function ‘poc_ref_pic_reorder’:
slice.c:838:6: error: incorrect type of vector CONSTRUCTOR elements

{_48, _55, _189, _59}

vect_no_reorder_16.92_252 = {_48, _55, _189, _59};
slice.c:838:6: internal compiler error: verify_gimple failed
...

This fails because it is expecting int type elements in the
constructor, and we have instead elements with boolean type.
useless_type_conversion_p says that it isn't OK to substitute bool for
int.

I used bisection to trace the problem to the patch for bugzilla 68215
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68215

The problem occurs in expand_vector_condition.  a_is_comparison is
false.  Before the patch, aa is an ssa_name with type boolean returned
by tree_vec_extract.   This is passed to passed to gimplify_build3
which returns a cond_expr with type int.  After the patch, aa is a
ne_expr with type boolean.  gimplify_build3 calls fold_build3_loc
which optimizes the cond_expr/ne_expr, and returns a nop_expr of type
int of the boolean ne_expr.  gimplify_build3 then calls STRIP_NOPS
which removes the nop_expr, and the end result here is a ne_expr with
a boolean type, which is the wrong type for the constructor.

I don't have a lot of experience with the gimple work, so I'm not sure
where this is going wrong.

I see a number of places in tree-vect-generic.c that add a
VIEW_CONVERT_EXPR if useless_type_conversion_p is false.  That should
work, except when I try this, I see that the VIEW_CONVERT_EXPR gets
converted to a NOP_EXPR by gimplify_build1, and gets stripped again.

Maybe the gimplify_build* routines should be using
STRIP_USELESS_TYPE_CONVERSION instead of STRIP_NOPS?  That seems to
work, but I don't know if that will have cascading effects.

Or maybe verify_gimple should allow bools and ints to mix in a
constructor?  That doesn't seem like the right solution to me.

Jim


Re: reload question about unmet constraints

2015-09-15 Thread Jim Wilson
On Tue, Sep 15, 2015 at 8:53 AM, Ulrich Weigand  wrote:
> Jim Wilson wrote:
> In that case, you might be able to fix the bug by splitting the
> offending insns into two patterns, one only handling near mems
> and one handling one far mems, where the near/far-ness of the mem
> is verified by the *predicate* and not the constraints.

That is how it works currently.  He was trying to optimize a case that
involved mixed near and far mems and hence couldn't use a predicate in
that case.

Jim


Re: reload question about unmet constraints

2015-09-15 Thread Jim Wilson
On Tue, Sep 15, 2015 at 7:42 AM, Ulrich Weigand  wrote:
> But the only difference between define_memory_constraint and a plain
> define_constraint is just that define_memory_constraint guarantees
> that any memory operand can be made valid by reloading the address
> into a base register ...
>
> If the set of operands accepted by a constraint does *not* have that
> property, it must not be defined via define_memory_constraint, and
> you should simply use define_constraint instead.

An invalid near mem can be converted to a valid near mem by reloading
its address into a base reg.  An invalid far mem can be converted to a
valid far mem by reloading its address into a base reg.  But one can't
convert a near mem to a far mem by reloading the address, nor can one
convert a far mem to a near mem by reloading its address.  So we need
another dimension to the validity testing here, besides the question
of whether the address can be reloaded, there is the question of
whether it is in the right address space.  Though I don't think the
rl78 is actually using address spaces, and it isn't clear if that
would help.

Jim


Re: reload question about unmet constraints

2015-09-14 Thread Jim Wilson
On Mon, Sep 14, 2015 at 11:05 PM, DJ Delorie  wrote:
> As a test, I added this API.  It seems to work.  I suppose there could
> be a better API where we determine if a constraint matches various
> memory spaces, then compare with the memory space of the operand, but
> I can't prove that's sufficiently flexible for all targets that
> support memory spaces.  Heck, I'm not even sure what to call the
> macro, and 
> "TARGET_IS_THIS_MEMORY_ADDRESS_RELOADABLE_TO_MATCH_THIS_CONTRAINT_P()"
> is a little long ;-)
>
> What do we think of this direction?

We already have define_constraint and define_memory_constraint.  We
could perhaps add a define_special_memory_constraint that returns
CT_SPECIAL_MEMORY which mostly operates like CT_MEMORY, except that it
doesn't assume any MEM can be reloaded to match.

We already have constraint_satisfied_p, which is generated from
define*_constraint.  We could have a constraint_reloadable_to_match_p
function parallel to that, which is for operands that don't match, but
can be reloaded to match.  Perhaps we don't even need a distinction
between define_memory_constraint and define_special_memory_constraint.
We could have constraint_reloadable_to_match_p default to the current
code for memory constraints, that assumes any mem is reloadable to
match, if a special reloadable condition isn't specified.

Perhaps define_memory_constraint can be extended with an optional
field at the end, that is used to generate the
constraint_reloadable_to_match_p function.

Otherwise, I think you are headed in the right direction.  I would
worry a bit about whether we are making reload even more complicated
for folks.  But given that we already have the concept of address
spaces, there should be some way to expose this info to reload.

Jim


Re: reload question about unmet constraints

2015-09-01 Thread Jim Wilson
On Tue, Sep 1, 2015 at 6:20 PM, DJ Delorie  wrote:
>
>> It did match the first alternative (alternative 0), but it matched the
>> constraints Y/Y/m.
>
> It shouldn't match Y as those are for near addresses (unless it's only
> matching MEM==MEM), and the ones in the insn are far, but ...

Reload chooses the alternative that is the best match.  When using the
constraints Y/Y/m, 2 of the three operands match the constraints, so
this ends up being the best match.  It then tries to reload the far
mem to match Y, which fails, as all it knows how to do is reload a mem
address to make it match, which can't turn a far mem into a near mem.

You would need some way to indicate that while Y does accept a mem,
this particular mem can't be reloaded to match.  We don't have a way
to do that.

The Y constraint gets classified as constraint type CT_MEMORY.  In
find_reloads, in reload.c, there is a case CT_MEMORY, and it does

  if (CONST_POOL_OK_P (operand_mode[i], operand)
      || MEM_P (operand))
    badop = 0;
  constmemok = 1;
  offmemok = 1;

Since the operand is a MEM, badop is set to zero.  That makes this
look like a good alternative.  You want badop to be left alone.  Also,
the fact that offmemok was set means that reload thinks that any mem
can be fixed by reloading the address to make it offsetable.  You
don't want offmemok set.  Without offmemok set, it should get reloaded
into a register, as reload will use the v constraint instead.

Jim


Re: reload question about unmet constraints

2015-09-01 Thread Jim Wilson
On 09/01/2015 12:44 AM, DJ Delorie wrote:
> I expected gcc to see that the operation doesn't meet the constraints,
> and move operands into registers to make it work (alternative 1,
> "v/v/v").

It did match the first alternative (alternative 0), but it matched the
constraints Y/Y/m.  Operands 1 and 2 are OK, so don't need reloads.  It
did create optional reloads, which it always does for mem, but these
reloads are irrelevant.  The interesting one is for operand 0.  Since Y
accepts mem, and operand 0 is a mem but doesn't match, reload assumes
that we can fix it by reloading the address to make it an offsettable
address.  But a far mem is still not acceptable even with a reloaded
address, and you get an ICE.

Reload doesn't have any concept of two different kinds of memory
operands which can't be converted via reloads.  If the constraint
accepts mem, and we have a mem operand, then it will always assume that
the problem is with the address and reload it.

I don't think that there is an easy solution to this, but my reload
skills are a bit rusty too.

Jim



Re: fake/abnormal/eh edge question

2015-08-27 Thread Jim Wilson
On 08/25/2015 02:54 PM, Steve Ellcey wrote:
> Actually, it looks like is peephole2 that is eliminating the
> instructions (and .cfi psuedo-ops).
> I am not entirely sure I need the code or if I just need the .cfi
> psuedo-ops and that I need the code to generate the .cfi stuff.

Don't create any new edges.  That doesn't make sense for unwind info.

You don't need unwind info for unreachable code.

You do need enough info to be able to unwind from any instruction in
general, or just from call sites for C++ EH.  This means you need to be
able to calculate the value of sp from the caller, which you can only
get from the r12/drap value.  So you need to keep the instructions and
the cfi directives generated from them.

Looking at the i386 port, in the i386.md file, I see a few peepholes
check the value of RTX_FRAME_RELATED_P and fail if it is set.  It
appears that you need to do something similar in the mips.md file to
prevent these instructions from being deleted by the peephole2 pass.
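
The shape of such a guard, sketched on a trivial move peephole (the
pattern is a placeholder, not one of the real mips.md peepholes):

```lisp
;; Sketch: skip frame-related insns so prologue instructions and the
;; .cfi directives generated from them survive the peephole2 pass.
(define_peephole2
  [(set (match_operand:SI 0 "register_operand")
        (match_operand:SI 1 "register_operand"))]
  "!RTX_FRAME_RELATED_P (peep2_next_insn (0))"
  [(set (match_dup 0) (match_dup 1))])
```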

Jim



Re: Controlling instruction alternative selection

2015-08-03 Thread Jim Wilson
On 07/30/2015 09:54 PM, Paul Shortis wrote:
> Resulting in ...
> error: unable to find a register to spill in class ‘GP_REGS’
> 
> enabling lra and inspecting the rtl dump indicates that both
> alternatives (R and r) seem to be equally appealing to the allocater so
> it chooses 'R' and fails.

The problem isn't in lra, it is in reload.  You want lra to use the
three address instruction, but you then want reload to use the two
address alternative.

> Using constraint disparaging (?R) eradicates the errors, but of course
> that causes the 'R' three address alternative to never be used.

You want to disparage the three address alternative in reload, but not
in lra.  There is a special code for that, you can use ^ instead of ? to
make that happen.  That may or may not help though.

There is also a hook TARGET_CLASS_LIKELY_SPILLED_P which might help.
You should try defining this to return true for the 'R' class if it
doesn't already.
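
If the port doesn't define the hook yet, it is only a few lines in the
backend's machine-specific .c file (a sketch; the myport_ names are
invented, and GP_REGS stands in for the port's small class):

```c
/* Sketch: mark GP_REGS as likely spilled so the register allocator
   treats pseudos in that small class more conservatively.  */
static bool
myport_class_likely_spilled_p (reg_class_t rclass)
{
  return rclass == GP_REGS;
}

#undef TARGET_CLASS_LIKELY_SPILLED_P
#define TARGET_CLASS_LIKELY_SPILLED_P myport_class_likely_spilled_p
```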

Jim



Re: RETURN_ADDRESS_POINTER_REGNUM Macro

2015-07-24 Thread Jim Wilson
On 07/23/2015 11:09 PM, Ajit Kumar Agarwal wrote:
> From the description of the definition of the macro 
> RETURN_ADDRESS_POINTER_REGNUM , 

> Does this impact the performance or correctness of the compiler?  On what 
> cases it is applicable to define for the given architecture?

This is used to help implement the __builtin_return_address builtin
function.  There is some default code for this, so it may work OK
without defining RETURN_ADDRESS_POINTER_REGNUM.  If the default code
doesn't work, then you may need to define RETURN_ADDRESS_POINTER_REGNUM.

Usually, it is trivial to make __builtin_return_address work for leaf
functions, non-trivial to make it work for non-leaf functions, and
difficult to impossible to make it work for level != 0.  You will have
better luck using the unwind info than __builtin_return_address.
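
For level 0, the case that reliably works, usage is as simple as this
(the function name is only illustrative):

```c
#include <stddef.h>

/* Level 0 asks for this function's own return address, the only case
   that is generally cheap and reliable.  noinline keeps a real call
   frame, and therefore a real return address, in place.  */
__attribute__ ((noinline))
void *
my_return_address (void)
{
  return __builtin_return_address (0);
}
```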

This is an optional builtin function, so there is no performance or
correctness issue here.

Jim



Re: How to express this complicated constraint in md file

2015-07-16 Thread Jim Wilson
On 07/16/2015 01:32 PM, Dmitry Grinberg wrote:
> WUMUL x, y which will multiply 32-bit register x by 32-bit
> register y, and produce a 64-bit result, storing the high bits into
> register x and low bits into register y

You can rewrite the RTL to make this easier.  You can use a parallel to
do for instance

[(set (reg:SI x) (truncate:SI (lshiftrt:DI (mult:DI (sign_extend:DI
(reg:SI x)) (sign_extend:DI (reg:SI y))) (const_int 32)))
 (set (reg:SI y) (truncate:SI (mult:DI (sign_extend:DI (reg:SI x))
(sign_extend:DI (reg:SI y)))))]

Now you have only 32-bit regs, and you can use matching constraints to
make it work.  The truncate lshiftrt is the traditional way to write a
mulX_highpart pattern.  Some parts of the optimizer may recognize this
construct and know how to handle it.  For the second set, you might
consider just using (mult:SI ...) if that gives the correct result, or
you can use a subreg or whatever.  The optimizer is unlikely to generate
this pattern on its own, but you can have an expander and/or splitter
that generates it.  Use zero_extend instead of sign_extend if this is an
unsigned widening multiply.  You probably want to generate two SImode
temporaries in the expander, and copy the input regs into the
temporaries, as expanders aren't supposed to clobber input regs.  If you
want a 64-bit result out of this, then you would need extra instructions
to combine x and y into a 64-bit output.

Another way to do this is to arbitrarily force the result into a register
pair, then you can use a subreg to match the high part or the low part
of that register pair for the inputs.
[(set (reg:DI x) (mult:DI (sign_extend:DI (subreg:SI (reg:DI x) 0))
(sign_extend:DI (subreg:SI (reg:DI x) 1))))]
The subreg numbers may be reversed if this is little word endian instead
of big word endian.  You might need extra setup instructions to create
the register pair first.  Create a DI temp for the output, move the
inputs into the high/low word of the DI temp, and then you can do the
multiply on the DI temp.
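
In plain C, the RTL above computes nothing more exotic than the two
halves of a widening multiply; a reference version of the semantics
(signed variant, names invented):

```c
#include <stdint.h>

/* Reference semantics for a WUMUL-style instruction: the high and low
   32-bit halves of the 64-bit product of two 32-bit operands.  The
   high half is what a mulX_highpart pattern computes.  */
void
wumul (int32_t x, int32_t y, uint32_t *hi, uint32_t *lo)
{
  int64_t p = (int64_t) x * (int64_t) y;
  *hi = (uint32_t) ((uint64_t) p >> 32);
  *lo = (uint32_t) p;
}
```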

Jim



Re: configure.{in -> ac} rename (commit 35eafcc71b) broke in-tree binutils building of gcc

2015-07-14 Thread Jim Wilson
On Tue, Jul 14, 2015 at 10:08 AM, H.J. Lu  wrote:
> Combined tree is useful when the latest binutils is needed by GCC.

If you build and install binutils using the same --prefix as used for
gcc, then gcc will automatically find that binutils and use it.  You
don't need combined trees to make this work.

If you do still like combined trees, then I'd suggest putting binutils
and gcc into the same dir, instead of placing binutils into gcc, and
then add a simple Makefile that will configure, build and install
binutils and then likewise for gcc.

Jim


Re: configure.{in -> ac} rename (commit 35eafcc71b) broke in-tree binutils building of gcc

2015-07-14 Thread Jim Wilson
On 07/14/2015 02:13 AM, Jan Beulich wrote:
> I was quite surprised for my gcc 4.9.3 build (using binutils 2.25 instead
> of 2.24 as I had in use with 4.9.2) to fail in rather obscure ways.

in-tree/combined-tree builds aren't recommended anymore, and hence
aren't well maintained anymore.  That is an anachronism from the old
Cygnus days.  I still find it useful to drop newlib into gcc so it can
be built like the other gcc libs, but otherwise I wouldn't recommend
combining anything.

Jim





Re: Question about find modifiable mems

2015-06-03 Thread Jim Wilson
On 06/02/2015 11:39 PM, shmeel gutl wrote:
> find_modifiable_mems was introduced to gcc 4.8 in September 2012. Is
> there any documentation as to how it is supposed to help the haifa
> scheduler?

The patch was submitted here
  https://gcc.gnu.org/ml/gcc-patches/2012-08/msg00155.html
and this message contains a brief explanation of what it is supposed to
do.  The explanation looks like a useful optimization, but perhaps it is
triggering in cases when it shouldn't.

Jim



Re: [RFC] Kernel livepatching support in GCC

2015-05-28 Thread Jim Wilson
On 05/28/2015 01:39 AM, Maxim Kuvyrkov wrote:
> Hi,
> 
> Akashi-san and I have been discussing required GCC changes to make kernel's 
> livepatching work for AArch64 and other architectures.  At the moment 
> livepatching is supported for x86[_64] using the following options: "-pg 
> -mfentry -mrecord-mcount -mnop-mcount" which is geek-speak for "please add 
> several NOPs at the very beginning of each function, and make a section with 
> addresses of all those NOP pads".

FYI, there is also the darwin/rs6000 -mfix-and-continue support, which
adds 5 nops to the prologue.  This was a part of a gdb feature, to allow
one to load a fixed function into a binary inside the debugger, and then
continue executing with the fixed code.  It sounds like your kernel
feature is doing something very similar.  If you are making this a
generic feature, then maybe the darwin/rs6000 -mfix-and-continue support
can be merged with it somehow.

Jim



Re: Is there a way to adjust alignment of DImode and DFmode?

2015-05-21 Thread Jim Wilson
On 05/20/2015 10:00 AM, H.J. Lu wrote:
> By default, alignment of DImode and DFmode is set to 8 bytes.
> Intel MCU psABI specifies alignment of DImode and DFmode
> to be 4 bytes. I'd like to make get_mode_alignment to return
> 32 bits for DImode and DFmode.   Is there a way to adjust alignment
> of DImode and DFmode via ADJUST_ALIGNMENT?

I see that i386-modes.def already uses ADJUST_ALIGNMENT to change the
alignment of XFmode to 4 for ilp32 code.  ADJUST_ALIGNMENT should work
the same for DImode and DFmode.  Did you run into a problem when you
tried it?

Jim



Re: ldm/stm bus error

2015-05-18 Thread Jim Wilson
On 05/18/2015 02:05 AM, Umesh Kalappa wrote:
> Getting a bus/hard error for the below case makes sense, since ldm/stm
> expects the address to be word aligned.

> --with-pkgversion='Cisco GCC c4.7.0-p1' --with-cisco-patch-level=1

The FSF doesn't support gcc-4.7.0 anymore.  Generally, we only support
the last two versions, which is 4.9 and 5.1.  We also don't support
vendor compilers.  So you will have to reproduce in an FSF GCC 4.9 or
later tree if you want to get anyone here interested.  It appears that
this is already fixed in gcc-4.9 though.

The Cisco compiler is a MontaVista compiler with patches.  You could try
reporting this to MontaVista.

Or you can try to track down the problem yourself.  You can use
bisection to find the patch that fixed it.  Checkout a copy of the FSF
gcc tree halfway between gcc-4.7 and gcc-4.9, and check to see if it has
the bug.  That narrows the search space by half.  Then repeat on the
remaining half until you have reduced it down to a few days.  Then you
can check ChangeLog entries for a likely patch to the ARM backend.  Or
you can continue the bisection until you have it down to one patch.  You
can then backport the patch to your gcc sources and/or point MontaVista
at it.

This bisection process can be scripted.  You can find an example in the
gcc contrib/reghunt directory.  I don't have experience using this
script though, as I like to do it by hand.

If you have a sufficient understanding of gcc, you may be able to find
the patch simply by looking at what gcc-4.9 is doing differently than
gcc-4.7, and mapping that back to a ChangeLog entry and a patch.

Jim



Re: Question about macro _GLIBCXX_RES_LIMITS in libstdc++ testsuite

2015-05-17 Thread Jim Wilson
On 05/17/2015 01:16 AM, Bin.Cheng wrote:
> On Sat, May 16, 2015 at 5:35 PM, Hans-Peter Nilsson  wrote:
>> On Thu, 23 Apr 2015, Bin.Cheng wrote:
>>> Hi,
>>> In libstdc++ testsuite, I noticed that macro _GLIBCXX_RES_LIMITS is
>>> checked/set by GLIBCXX_CHECK_SETRLIMIT, which is further guarded by
>>> GLIBCXX_IS_NATIVE as below:

The setrlimit checks were made dependent on GLIBCXX_IS_NATIVE on Aug 9,
2001.
https://gcc.gnu.org/ml/gcc-patches/2001-08/msg00536.html
This is 3 days after the feature was added.  This was 14 years ago, so
people might not remember exactly why the change was made.  There was
probably no specific reason for this, other than a concern that it might
not work cross, and/or wasn't considered worth the effort to make it
work cross at the time.

It does look like this can work cross, at least for a cross to a linux
target.  For a cross to a bare metal target, it should be OK to run the
tests, they will just fail and disable the macros.  Someone just needs
write the patches to make it work and test it.  You could try submitting
a bug report if you haven't already done so.

Jim



Re: [OR1K port] where do I change the function frame structure

2015-05-06 Thread Jim Wilson

On 05/05/2015 05:19 PM, Peter T. Breuer wrote:

Please ..  where (in what file, dir) of the gcc (4.9.1) source should I
rummage in order to change the sequence of instructions eventually
emitted to do a function call?


Are you trying to change the caller or the callee?

For the callee, or1k_compute_frame_size calculates the frame size, which 
depends on the frame layout.  or1k_expand_prologue emits the RTL for the 
prologue.  or1k_expand_epilogue emits the RTL for the epilogue.  There 
are also a few other closely related helper functions.  These are all in 
gcc/config/or1k/or1k.c.


For the caller, I see that the or1k port already sets 
ACCUMULATE_OUTGOING_ARGS, so there should be no stack pointer inc/dec 
around a call.  Only in the prologue/epilogue.


Jim



Re: Build oddity (Mode = sf\|df messages in output)

2015-05-02 Thread Jim Wilson

On 04/30/2015 03:59 PM, Steve Ellcey  wrote:


I am curious, has anyone started seeing these messages in their GCC build
output:
Mode = sf\|df
Suffix = si\|2\|3


That comes from libgcc/config/t-hardfp.  This is used to generate a list 
of function names from operator, mode, and suffix, e.g. fixsfsi and adddf3.


Jim



Re: Question about perl while bootstrapping gcc

2010-04-16 Thread Jim Wilson

On 04/16/2010 11:10 AM, Dominique Dhumieres wrote:

I use to build gcc with a command line such as
make -j2 >& somelogfile &
I recently found that if I logout, the build fails with
perl: no user 501


Try "nohup make ...".  See the man page or info manual for nohup.

Jim


Re: Error while building GCC 4.5 (MinGW)

2010-04-12 Thread Jim Wilson
On Mon, 2010-04-12 at 08:34 -0700, Name lastlong wrote:
> Please check the following relevant information present in the config.log 
> as follows:

Now that you can see what is wrong, you should try to manually reproduce
the error.  Check the libraries to see if they are OK, and if the right
versions of the libraries are being linked in.  Look to see where the
undefined references are coming from.  Etc.

> Please let me know if there are any particular steps to be followed to build 
> mingw toolchain with mpc libraries.

I have no idea.  I haven't built a mingw toolchain anytime recently.

Jim




Re: GCC documentation: info format

2010-04-09 Thread Jim Wilson

On 04/09/2010 05:08 AM, christophe.ja...@ouvaton.org wrote:

I am currently trying to include GCC documentation into gNewSense
distribution, in info format.


The binutils response to the same question reminds me that the same 
answer works here.  There are pre-built info files in our official 
releases.  You can grab them from there.  I forgot about this because 
I'm working from the development sources most of the time, and we don't 
have pre-built copies there.


Jim


Re: Error while building GCC 4.5 (MinGW)

2010-04-09 Thread Jim Wilson

On 04/08/2010 07:21 AM, Name lastlong wrote:

=error
checking for the correct version of the gmp/mpfr/mpc libraries... no
configure: error: Building GCC requires GMP 4.2+, MPFR 2.3.1+ and MPC 0.8.0+.
 error


Check the config.log file for details.  A successful build should show 
something like this
configure:5634: checking for the correct version of the gmp/mpfr/mpc 
libraries

configure:5665: gcc -o conftest -g -O2 conftest.c  -lmpc -lmpfr -lgmp >&5
configure:5665: $? = 0
configure:5666: result: yes

Your file should have an error here.

Jim


Re: GCC documentation: info format

2010-04-09 Thread Jim Wilson

On 04/09/2010 05:08 AM, christophe.ja...@ouvaton.org wrote:

Where may I find gcc-vers.texi?


It is created by the install.texi2html shell script, which also creates 
the HTML output files that go on the web site.  You can probably modify 
this script to generate info files instead, but as Diego mentioned, the 
recommended way to produce info files is to configure and build a full 
source tree.


Don't forget about the libstdc++-v3 docs which are not in the 
docs-sources.tar.gz file.  Also, don't forget about the other docs which 
are not in the gcc/doc directory.  "make info" in a build tree takes 
care of this stuff for you.


Jim


Re: lower subreg optimization

2010-04-09 Thread Jim Wilson

On 04/07/2010 10:48 PM, roy rosen wrote:

I saw in arm/neon.md that they have a similar problem:
...
Their solution is also not complete.
What is the proper way to handle such a case and how do I let gcc know
that this is a simple move instruction so that gcc would be able to
optimize it out?


The only simple solution at the moment is the one that the ARM port is 
using.  You avoid emitting the move when you got the lucky reg-alloc 
result, and you emit the move when you aren't lucky.


As the neon.md comment suggests, and as Ian Taylor mentioned in his 
response, a possible solution is to modify the lower-subreg.c pass 
somehow so that it no longer splits subregs of vector modes, possibly 
controlled by a hook.


We might be able to modify the register allocator to look for this 
pattern, to increase the chances of getting the good reg-alloc result, 
but the lower-subreg.c change is probably better.


Another solution might be to add a pass (or modify an existing one like 
regmove.c) to try to put things back together again, but this is 
probably also not as good as the lower-subreg.c change.


Jim


Re: Help with an Wierd Error

2010-04-06 Thread Jim Wilson

On 04/02/2010 11:02 AM, balaji.i...@gtri.gatech.edu wrote:

/opt/or32/lib/gcc/or32-elf/4.2.2/../../../../or32-elf/lib/crt0.o: In function 
`loop':
(.text+0x64): undefined reference to `___bss_start'


It looks like a case of one-too-many underscores prepended to symbol 
names.  The default for ELF is to not prepend an underscore.  The 
default for COFF is to prepend an underscore.  Check USER_LABEL_PREFIX 
which should be empty for ELF (default definition in defaults.h file). 
Check ASM_OUTPUT_LABELREF which should use %U and not an explicit 
underscore (see defaults.h file).  Check for usage of the 
-fleading-underscores option (should not be used).  Check for header 
file inclusion to see if something is out of place, such as a use of 
svr3.h when svr4.h should be used.


Also check to make sure that gcc, gas, and gld all agree on whether a 
symbol is prepended with an underscore or not.  If gcc does it but gas 
doesn't, then C code calling assembly language code may fail because of 
the mismatch.


Jim


Re: lower subreg optimization

2010-04-06 Thread Jim Wilson

On 04/06/2010 02:24 AM, roy rosen wrote:

(insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
 (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
 (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3} (nil))


Only subregs are decomposed.  So use vec_select instead of subreg.  I 
see you already have a vec_concat to combine the two v2hi into one v4hi, 
so there is no need for the subreg in the dest.  You should try 
eliminating that first and see if that helps.  If that isn't enough, 
then replace the subregs in the source with vec_select operations.
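
Concretely, with vec_select the low-half add might look like this
(pseudo register 115 is invented for the result):

```lisp
;; Sketch: the low V2HI halves taken with vec_select instead of
;; subreg, so lower-subreg has no vector-mode subreg to decompose.
(set (reg:V2HI 115)
     (plus:V2HI
       (vec_select:V2HI (reg:V4HI 112)
                        (parallel [(const_int 0) (const_int 1)]))
       (vec_select:V2HI (reg:V4HI 113)
                        (parallel [(const_int 0) (const_int 1)]))))
```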


Jim


Re: Fwd: constant hoisting out of loops

2010-03-20 Thread Jim Wilson
On Sun, 2010-03-21 at 03:40 +0800, fanqifei wrote:
> foor_expand_move is changed and it works now.
> However, I still don't understand why there was no such error if below
> condition was used and foor_expand_move was not changed.
> Both below condition and "(register_operand(operands[0], SImode) ||
> register_operand(operands[1],SImode)) ..." does not accept mem&&mem.

The define_expand is used for generating RTL.  The RTL expander calls
the define_expand, which checks for MEM&CONST, and then falls through
generating the mem copy insn.

The define_insn is used for matching RTL.  After it has been generated,
we look at the movsi define_insn, and see that MEM&MEM doesn't match, so
you get an error for unrecognized RTL.

The define_expand must always match the define_insn(s).  They are used
in different phases, and they aren't checked against each other when gcc
is built.  If there is a mismatch, then you get a run-time error for
unrecognized rtl.
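
A sketch of keeping the two sides in agreement (generic names; the
exact predicates are port-specific): the expander rewrites any
mem-to-mem move so the define_insn's constraints can always be met:

```lisp
(define_expand "movsi"
  [(set (match_operand:SI 0 "general_operand" "")
        (match_operand:SI 1 "general_operand" ""))]
  ""
{
  /* Expand-time fixup: never emit a mem-to-mem move, so the pattern
     generated here is always matched by the movsi define_insn.  */
  if (MEM_P (operands[0]) && !register_operand (operands[1], SImode))
    operands[1] = force_reg (SImode, operands[1]);
})
```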

Jim




  1   2   3   4   >