[Bug other/113317] New test case libgomp.c++/ind-base-2.C fails with ICE

2024-01-18 Thread jules at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113317

--- Comment #5 from jules at gcc dot gnu.org ---
Hi,

Unfortunately I no longer have access to any PowerPC machines, which limits my
ability to help with this somewhat. I guess it's *something* like a
CONVERT_EXPR/NOP_EXPR in an unexpected place, and is probably quite easy to fix
with a suitable reproducer.

Julian

[Bug other/113317] New test case libgomp.c++/ind-base-2.C fails with ICE

2024-01-11 Thread jules at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113317

jules at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jules at gcc dot gnu.org

--- Comment #2 from jules at gcc dot gnu.org ---
I haven't been able to reproduce this with our local testing infrastructure on
the powerpc64le machine available to me (i.e., the test works for me there).
I'd be a bit surprised about a machine dependence at this early stage in
parsing, but here we go!

Let me know if there's any other information that might help track this down.

[Bug target/113331] New: AMDGCN: Compilation failure due to duplicate .LEHB/.LEHE symbols

2024-01-11 Thread jules at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113331

Bug ID: 113331
   Summary: AMDGCN: Compilation failure due to duplicate
.LEHB/.LEHE symbols
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jules at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57037
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57037=edit
Test case

The attached file fails to compile with AMD GCN offloading on current mainline.
It also fails on GCC 13.2.0 as packaged in Debian, so this doesn't appear to be
a regression.

$ g++ -fopenmp -foffload=amdgcn-amdhsa -save-temps  dup-syms.cc -o dup-syms 
./dup-syms.xamdgcn-amdhsa.mkoffload.2.s:256:1: error: symbol '.LEHB0' is
already defined
.LEHB0:
^
./dup-syms.xamdgcn-amdhsa.mkoffload.2.s:258:1: error: symbol '.LEHE0' is
already defined
.LEHE0:
^
./dup-syms.xamdgcn-amdhsa.mkoffload.2.s:288:1: error: symbol '.LEHB1' is
already defined
.LEHB1:
^
./dup-syms.xamdgcn-amdhsa.mkoffload.2.s:290:1: error: symbol '.LEHE1' is
already defined
.LEHE1:
^

The test case doesn't trigger with NVPTX offloading, but I don't think that
definitely implies that this is something GCN-specific (vs. generically
offload-related).

[Bug middle-end/113279] New: OpenMP 5 - Audit GOMP_MAP_FIRSTPRIVATE_REFERENCE handling for references to pointers to pointers

2024-01-08 Thread jules at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113279

Bug ID: 113279
   Summary: OpenMP 5 - Audit GOMP_MAP_FIRSTPRIVATE_REFERENCE
handling for references to pointers to pointers
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jules at gcc dot gnu.org
  Target Milestone: ---

The patch (approved and soon to be committed) to support "lvalue" parsing for
C++ has a corner-case issue whereby references to pointers to pointers do not
quite work interchangeably with pointers to pointers (i.e. when implicitly
mapped). See the "ref2ptrptr_offset_decl_member_slice" functions in the libgomp
test cases baseptrs-4.C and baseptrs-6.C.

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641924.html

It is possible that a slight semantic tweak to GOMP_MAP_FIRSTPRIVATE_REFERENCE
handling in gimplify.cc or omp-low.cc might be able to repair these cases, if
indeed that is required by the spec.

[Bug middle-end/113006] New: OpenMP 5 - lvalue parsing support for map/to/from clause

2023-12-13 Thread jules at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113006

Bug ID: 113006
   Summary: OpenMP 5 - lvalue parsing support for map/to/from
clause
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jules at gcc dot gnu.org
  Target Milestone: ---

The following test:

  libgomp/testsuite/libgomp.c++/baseptrs-4.C

partly depends on the following in-review patches:

  https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629365.html
  https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629364.html

Tobias points out there is currently an unresolved OpenMP issue, 2618 ("Clarify
behavior of mapping lvalues on target construct"), which talks about whether
code like the following:

   map(*p = 10)
   map(x = 20)
   map(x ? y[0] : p[1])
   map(f(y))

is valid or not. He writes that the sentiment was to require that a 'map'
clause list item must have a base pointer or a base variable.

As far as I know, none of these "problematic" cases are permitted by the above
patches, but it might be worth following the OpenMP issue to make sure that
nothing diverges.

[Bug middle-end/113004] New: OpenMP 5 - structs are not mapped element-wise by default

2023-12-13 Thread jules at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113004

Bug ID: 113004
   Summary: OpenMP 5 - structs are not mapped element-wise by
default
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jules at gcc dot gnu.org
  Target Milestone: ---

According to e.g. OpenMP 5.2, "5.8.2 Mapper Identifiers and mapper Modifiers":

"A structure type T has a predefined default mapper that is defined as if by a
declare mapper directive that specifies v in a map clause with the alloc
map-type and each structure element of v in a map clause with the tofrom
map-type."

At present, a variable of struct type will be mapped as as a single block (the
pre-OpenMP 5.0 legacy behaviour).

This works fine most of the time, but for example fails in the case where one
of the struct's members is a reference, demonstrated in the test case:

  libgomp/testsuite/libgomp.c++/target-49.C

Once "declare mapper" support is committed, cases where a struct with an
implicit default mapper should invoke a struct with a *non*-default mapper,
will also probably be mishandled.

[Bug libgomp/108624] [OpenMP] Map(ptr, ptr[2:N-2]) will cause a segfault, 'map(ptr, ptr[0:N])'

2023-02-01 Thread jules at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108624

jules at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jules at gcc dot gnu.org

--- Comment #1 from jules at gcc dot gnu.org ---
This works with the in-review series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609031.html

i.e. for the mapping:

#pragma omp target map(ptr1[1:N-1]) map(ptr2[:N])

we get (gimplify):

_1 = ptr1 + 4;
map(tofrom:*_1 [len: 252]) map(firstprivate:ptr1 [pointer assign, bias: 4])

and for the mapping:

#pragma omp target map(ptr1, ptr1[1:N-1]) map(ptr2[:N])

we get (gimplify):

_4 = ptr1.2_3 + 4
map(tofrom:ptr1 [len: 8]) map(tofrom:*_4 [len: 252]) map(attach:ptr1 [bias: 4])

the test runs fine either way.

[Bug target/102856] New: [nvptx] Misaligned accesses with cheap vectorization enabled

2021-10-20 Thread jules at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102856

Bug ID: 102856
   Summary: [nvptx] Misaligned accesses with cheap vectorization
enabled
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jules at gcc dot gnu.org
  Target Milestone: ---

Since revision 2b8453c401b699ed93c085d0413ab4b5030bcdb8 I am seeing several
OpenMP tests fail with misaligned access errors:

PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-11.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-12.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-16.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-3.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-5.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-6.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-9.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-11.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-12.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-16.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-3.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-5.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-6.c
execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-9.c
execution test

These look like, e.g.:

$ ./for-11.exe 

libgomp: cuCtxSynchronize error: misaligned address

libgomp: cuMemFree_v2 error: misaligned address

libgomp: device finalization failed

I suspect the reason is that an operation that is now being vectorized (e.g.
"st.v2.u64 [%frame], %r28;") requires higher alignment than the original scalar
accesses it replaces.

I haven't spotted an obvious culprit for the problem in the nvptx backend. This
is OpenMP, so it could be the soft stack handling -- or it could be something
else.

[Bug libgomp/95590] OpenACC 'attach' behavior if already attached to different data

2020-07-24 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95590

--- Comment #1 from jules at gcc dot gnu.org ---
In discussion:

https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548679.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550623.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550654.html

[Bug middle-end/95270] OpenACC 'enter data attach' looks up target memory object displaced by pointer size

2020-06-08 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95270

--- Comment #2 from jules at gcc dot gnu.org ---
TBH I've suspected problems with misuse of the bias for attach/detach before,
but I've not come up with a test case. I'll have a look.

Thanks!

[Bug middle-end/92929] OpenACC/OpenMP 'target' 'exit data'/'update' optimizations

2020-06-08 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92929

--- Comment #7 from jules at gcc dot gnu.org ---
Test case & further discussion in:

https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547424.html

[Bug fortran/93025] [OpenACC] ICE with copy(y(1)%cc(:)%i) type of strided access – permitted since commit of OpenACC 2.6 deep copy patch series

2020-01-28 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93025

jules at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from jules at gcc dot gnu.org ---
I believe this is fixed on trunk now.

[Bug libgomp/93030] [OpenACC] libgomp.oacc-c-c++-common/deep-copy-10.c FAILS on AMDGCN – invalid 'async' usage?

2020-01-02 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93030

--- Comment #1 from jules at gcc dot gnu.org ---
Thanks for the report -- there's a fix for this on the og9 branch, but I'm not
sure if I've posted that upstream (the GCN worker-partitioning patches were
separated out from the deep-copy patches, so fixes around the intersection of
those two features are easy to overlook).

I'll dig that up and post it.

[Bug middle-end/92929] OpenACC/OpenMP 'target' 'exit data'/'update' optimizations

2020-01-02 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92929

--- Comment #6 from jules at gcc dot gnu.org ---
Apologies for breakage. This part of the patch was originally from the og9
patch supporting Fortran polymorphic class pointers posted at
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00752.html. The rationale was as
follows:

[...] OpenACC "enter data" and "exit data" now have GOMP_MAP_POINTER and
GOMP_MAP_PSET mappings removed during gimplification. In some
circumstances, passing an array to a function/subroutine and then doing
an "enter data" on it could leave dangling references to the function's
stack, although the actual array data is defined outside the function.
In any case, the pointer/pointer-set mappings don't seem to be
necessary for OpenACC "enter data".

I observed the described problem with a large test program, and unfortunately
did not manage to come up with a minimised test case at the time. Someone more
familiar with Fortran might be able to do so more easily than me!

In terms of backwards compatibility, we can't do anything about the "parasitic
binding" to a function's expired stack frame as generated by an older version
of the compiler in this case, I don't think. That's most likely going to
manifest as an intermittent crash.

[Bug libgomp/92881] [OpenACC] In async context, need to use 'gomp_remove_var_async' instead of 'gomp_remove_var'

2019-12-13 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92881

--- Comment #1 from jules at gcc dot gnu.org ---
Author: jules
Date: Fri Dec 13 23:14:15 2019
New Revision: 279388

URL: https://gcc.gnu.org/viewcvs?rev=279388=gcc=rev
Log:
Fix potential race condition in OpenACC "exit data" operations

PR libgomp/92881

libgomp/
* libgomp.h (gomp_remove_var_async): Add prototype.
* oacc-mem.c (delete_copyout): Call gomp_remove_var_async instead of
gomp_remove_var.
* target.c (gomp_unref_tgt): Change return type to bool, indicating
whether target_mem_desc was unmapped.
(gomp_unref_tgt_void): New.
(gomp_remove_var): Reimplement in terms of...
(gomp_remove_var_internal): ...this new helper function.
(gomp_remove_var_async): New, implemented using above helper function.
(gomp_unmap_vars_internal): Use gomp_unref_tgt_void instead of
gomp_unref_tgt.

Reviewed-by: Thomas Schwinge 

Modified:
trunk/libgomp/ChangeLog
trunk/libgomp/libgomp.h
trunk/libgomp/oacc-mem.c
trunk/libgomp/target.c

[Bug libgomp/92843] [OpenACC] Wrong/missing dynamic reference counting for structured 'REFCOUNT_INFINITY'

2019-12-12 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92843

--- Comment #12 from jules at gcc dot gnu.org ---
For an adaptation of the refcount checking code as alluded to above -- see
PR92848.

[Bug libgomp/92848] [OpenACC] Memory leak for simple 'acc_create', 'acc_delete' sequence

2019-12-12 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92848

--- Comment #6 from jules at gcc dot gnu.org ---
Created attachment 47479
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47479=edit
Regressions with rc checking enabled

These are the OpenACC tests that regress with refcount checking enabled at
present.

[Bug libgomp/92848] [OpenACC] Memory leak for simple 'acc_create', 'acc_delete' sequence

2019-12-12 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92848

jules at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #47451|0   |1
is obsolete||

--- Comment #7 from jules at gcc dot gnu.org ---
Created attachment 47480
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47480=edit
Rebased patch for PR92848

Here's a rebased version of Thomas's patch. Any merge errors are my own!

[Bug libgomp/92848] [OpenACC] Memory leak for simple 'acc_create', 'acc_delete' sequence

2019-12-12 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92848

--- Comment #5 from jules at gcc dot gnu.org ---
Created attachment 47478
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47478=edit
Refcount checking for current trunk

Hi,

This is a merge of the refcount checking patch to current trunk, with the name
of the "virtual_refcount" field changed back to the current "dynamic_refcount".
With RC_CHECKING macro defined, this flags consistency errors in a large number
of OpenACC tests: how many of those are user-visible problems, I am not sure.
In at least some cases, we certainly have "two wrongs making a right" in that
the consistency errors do not result in a user-visible problem.

An assumption here is that "dynamic_refcount" is supposed to mean the same
thing as "virtual_refcount" does in the overhaul patch: those references that
represent dynamic data lifetimes without having links in the interlinked splay
tree/target_mem_desc structure. Or otherwise, the number which the static
refcount should be decreased by on encountering a "finalize" operation for the
data item in question.

Anyway: I hope this is useful in evaluating the rest of the refcount overhaul
patch.

[Bug libgomp/92848] [OpenACC] Memory leak for simple 'acc_create', 'acc_delete' sequence

2019-12-10 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92848

--- Comment #4 from jules at gcc dot gnu.org ---
(In reply to Thomas Schwinge from comment #3)
> (In reply to jules from comment #2)
> > Again, please don't change this code under the feet of the refcount overhaul
> > patch!
> 
> But why?  This here is one independent chance.  The big "OpenACC reference
> count overhaul" will then either lose some of its changes, or replace this
> here by something else ('GOMP_MAP_VARS_OPENACC_ENTER_DATA'), but it's an
> incremental step (forward).

It's hard for me to evaluate if the changes are correct, given that I don't
trust the current baseline, mostly.

> > Using the (currently OpenMP-specific) GOMP_MAP_VARS_ENTER_DATA is
> > going to end up mighty confusing from OpenACC-specific code.
> 
> Why do you think so?  It has exactly! the semantics that we need here, for
> this PR.

Is the desired behaviour change just the reference count initialized by
target.c:gomp_map_vars_internal? I suppose that's OK. The danger is if the
semantics needed for OpenMP ever shift away from what we need for OpenACC here.
But I guess this code might not be around for long anyway?

[Bug libgomp/92843] [OpenACC] Disallow 'acc_delete' etc. for everything without a dynamic reference count

2019-12-10 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92843

--- Comment #9 from jules at gcc dot gnu.org ---
FWIW, I wrote this at the time wrt. the refcounting changes (some parts refer
to a previous iteration of the manual deep copy patches)...

Writing a couple of new attach/detach tests, I realised that reference
counting for OpenACC was pretty seriously broken. In an attempt to fix
that, I've:

 - written self-checking code to verify reference counts. This walks
   over the whole memory-mapping splay tree and re-counts references,
   making sure they match up with the running tallies kept by the
   mapping/unmapping routines.

 - removed the dubious "finalize" arguments from the unmap routines you
   pointed out. Call "detach" in more appropriate places.

 - rewritten gomp_acc_{insert,remove}_pointer, and change the way
   "enter data" mapping works in OpenACC to be more like the
   implementation for OpenMP (gomp_acc_remove_pointer is now modelled
   on target.c:gomp_exit_data).

 - ...which means we can get rid of the "acc_dev->openacc.data_environ"
   list. I think I added that to start with (ICBW!) but now I don't
   think we need it. That means we avoid a lot of scanning through
   linear linked lists to find the "right" target_mem_desc to unmap --
   which it turns out was buggy, and also probably conceptually wrong.
   (In GOACC_enter_exit_data, that meant each mapping clause got its own
   "target_mem_desc" when entering, which isn't how things normally work
   for gomp_map_vars. That meant that when "exit data"ing, again
   individual clauses could remove the corresponding target_mem_desc.
   What *didn't* work is if the mappings for a single target_mem_desc
   returned by GOACC_enter_exit_data refer to *other* target_mem_descs
   in their argument lists, rather than pointing back to the same one. A
   lot of the code that was checking "t->refcount" for "2" or "minrefs"
   in oacc-mem.c was simply broken.)

 - lookup_dev scans over the splay tree instead of looking through the
   data_environ list. It should have done that to start with -- it
   means it'll be able to find data mapped inside "pragma acc data"
   blocks, which it wouldn't have been able to previously.

It's a lot of changes, but (I hope!) with the RC_CHECKING feature we'll
be able to be a little more confident that we've got the memory map
refcounting right, at least.

[Bug libgomp/92843] [OpenACC] Disallow 'acc_delete' etc. for everything without a dynamic reference count

2019-12-10 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92843

--- Comment #8 from jules at gcc dot gnu.org ---
(In reply to Thomas Schwinge from comment #7)
> We first need to establish a stable baseline, with test cases, and then (or,
> as part of that) merge the several independent pieces of the big "OpenACC
> reference count overhaul".

OK -- I think I misunderstood your strategy here, which is to avoid committing
that patch in one go. IMO (:-)) it's a clear improvement overall, particularly
with the consistency checking. Test coverage could probably be better -- but,
I'm sure many of the new tests introduced by the manual deep-copy patch would
not operate correctly without the refcounting fixes underneath (and the two
patches were all merged together to start with, of course).

> > The existing code was rewritten for a
> > reason -- that being, I hit various problems with it (albeit long since
> > forgotten) that interfered with the manual deep-copy implementation.
> 
> So you'll have to dig out your notes, and/or we'll have to figure out any
> rationale again, now.  Patches that change such fundamental things in
> libgomp we cannot just commit on the basis that once they made sense to
> somebody.  They have to come with rationale, and test cases.

That's holding the new code to a rather higher standard than some of the
existing code! But, that's OK, I suppose. I will see if there are any notes I
made at the time that might be helpful.

> > We have
> > refcount self-checking code for the overhauled code.
> 
> ... which surely can be adapted.

If you do that adaptation now, you might get a better idea of the state of the
current refcounting implementation! Not sure if that'll be helpful.

> And, per my understanding, this is only checking libgomp internal
> consistency, but not the semantics exposed to users via OpenACC
> directives/API calls etc., which in part you're changing of your patch
> (without test cases).  This, again, I've spent the best part of the past
> weeks understanding, writing test cases for, filing GCC PRs, resolving them
> one by one, independently, incrementally, comprehensibly.

No, not just internal consistency. At least with the deep copy bits, there was
user-visible breakage with the existing code (again, IIRC).

> > We can address corner-case bug fixes as follow-ons once the main overhaul
> > patch is committed.
> 
> Further bug fixes, yes, but we have to make some reasonable effort to not
> introduce new bugs with the big "OpenACC reference count overhaul" changes.

That's what the testsuite is for, I guess -- the burden of proof for getting
patches approved is lack of regressions in existing tests, generally. Yes there
could be more, but this is a real fundamental thing that most of the existing
OpenACC tests will touch in some way, so it's not *that* bad.

[Bug libgomp/92854] [OpenACC] Always-true conditional in 'libgomp/oacc-mem.c:acc_unmap_data'?

2019-12-09 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92854

--- Comment #6 from jules at gcc dot gnu.org ---
Created attachment 47453
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47453=edit
Patch for acc_map_data-device_already-3.c problem

This patch fixes the acc_map_data-device_already-3.c problem, which I guess has
probably been broken forever. The single tgt_mem_desc used for all the function
and global variable mappings (created in gomp_load_image_to_device) does not
have its tgt_start and tgt_end fields set properly, so oacc-mem.c:lookup_dev
cannot find any global variable.

We can fix this by calculating the total address range covered by (the union
of) all offloaded functions and global variables, and set tgt_start and tgt_end
appropriately. We then go through each splay tree key's tgt_offset, and adjust
it to be an offset within that range.

I've yet to run full tests with this, but it works for the test case mentioned.

[Bug libgomp/92848] [OpenACC] Memory leak for simple 'acc_create', 'acc_delete' sequence

2019-12-09 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92848

--- Comment #2 from jules at gcc dot gnu.org ---
Again, please don't change this code under the feet of the refcount overhaul
patch! Using the (currently OpenMP-specific) GOMP_MAP_VARS_ENTER_DATA is going
to end up mighty confusing from OpenACC-specific code.

[Bug libgomp/92843] [OpenACC] Disallow 'acc_delete' etc. for everything without a dynamic reference count

2019-12-09 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92843

--- Comment #6 from jules at gcc dot gnu.org ---
Please don't start making changes to the reference-counting code that is being
replaced by my overhaul patch. The existing code was rewritten for a reason --
that being, I hit various problems with it (albeit long since forgotten) that
interfered with the manual deep-copy implementation. We have refcount
self-checking code for the overhauled code.

We can address corner-case bug fixes as follow-ons once the main overhaul patch
is committed.

[Bug libgomp/92854] [OpenACC] Always-true conditional in 'libgomp/oacc-mem.c:acc_unmap_data'?

2019-12-09 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92854

--- Comment #5 from jules at gcc dot gnu.org ---
Huh, yes, I missed that line in the spec (about existing mappings). I'll have a
look at the test case you mentioned.

[Bug libgomp/92840] [OpenACC] Disallow 'acc_unmap_data' for everything other than 'acc_map_data'

2019-12-09 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92840

--- Comment #2 from jules at gcc dot gnu.org ---
I think that looks OK, thanks.

[Bug libgomp/92854] [OpenACC] Always-true conditional in 'libgomp/oacc-mem.c:acc_unmap_data'?

2019-12-09 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92854

--- Comment #2 from jules at gcc dot gnu.org ---
For strictly-paired acc_map_data/acc_unmap_data calls that don't interfere with
other mappings -- no, probably not. But (like I guess you noticed too) the
existing code feels wrong (or at least ugly) nonetheless.

It's probably possible to come up with a legal (but odd) test case in which the
condition is false -- maybe something like this (untested!):

int arr1[100], arr2[100], arr3[100];

void foo (void)
{
  #pragma acc data copy(arr1) copy(arr2) copy(arr3)
  {
void *darr2 = acc_deviceptr (arr2);
acc_map_data (arr2, darr2, sizeof (arr2));
[...]
acc_unmap_data (arr2);
  }
}

Now I think the acc_map_data call will reuse the tgt for the structured data
mapping, which binds together several array copies into a single device block.
Forcing the first key's refcount to infinity (as is done in the current
implementation of acc_map_data) is also wrong in that case.

[Bug libgomp/92843] [OpenACC] Disallow 'acc_delete' etc. for everything without a dynamic reference count

2019-12-06 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92843

--- Comment #2 from jules at gcc dot gnu.org ---
I don't think your example is valid, but I'm not sure it will be fail in quite
the right way with the current version of my refcount overhaul patch. Actually
I think the acc_map_data implementation using REFCOUNT_INFINITY is probably
wrong too.

I will try changing the implementation as follows:

- calls to acc_delete with the structured reference count being non-zero (or
infinity in the case of '#pragma acc declare'd data) should raise an error.

- acc_map_data should use GOMP_MAP_VARS_OPENACC_ENTER_DATA instead of forcing
the refcount to infinity (i.e., making the behaviour the same as "enter data
(create)").

Does that match your understanding of what the behaviour should be?

[Bug c/91985] Unsupported DFP not diagnosed with constants or built-in functions

2019-11-26 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91985

jules at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jules at gcc dot gnu.org

--- Comment #4 from jules at gcc dot gnu.org ---
SVN r278684 appears to cause a problem with offloading compilation in lto1 with
an nvptx offloading-enabled compiler. The following backtrace is from
libgomp/testsuite/libgomp.fortran/nestedfn5.f90, but many other tests fail
also.

lto1: internal compiler error: in operator[], at vec.h:867
0x894d9a vec::operator[](unsigned int)
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/vec.h:867
0x893924 vec::operator[](unsigned int)
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/vec.h:1433
0x1366f4a streamer_tree_cache_get_tree
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/tree-streamer.h:98
0x136b605 streamer_get_pickled_tree(lto_input_block*, data_in*)
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/tree-streamer-in.c:1110
0xde36fc lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int)
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto-streamer-in.c:1505
0xde3958 lto_input_tree(lto_input_block*, data_in*)
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto-streamer-in.c:1552
0x136a982 lto_input_ts_list_tree_pointers
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/tree-streamer-in.c:863
0x136b4aa streamer_read_tree_body(lto_input_block*, data_in*, tree_node*)
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/tree-streamer-in.c:1075
0xde3303 lto_read_tree_1
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto-streamer-in.c:1375
0xde3480 lto_read_tree
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto-streamer-in.c:1416
0xde389e lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int)
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto-streamer-in.c:1528
0xde34fa lto_input_scc(lto_input_block*, data_in*, unsigned int*, unsigned
int*)
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto-streamer-in.c:1440
0x88dd1c lto_read_decls
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto/lto-common.c:1830
0x88ed54 lto_file_finalize
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto/lto-common.c:2227
0x88edae lto_create_files_from_ids
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto/lto-common.c:2237
0x88ef93 lto_file_read
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto/lto-common.c:2292
0x891ee9 read_cgraph_and_symbols(unsigned int, char const**)
   
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto/lto-common.c:2744
0x8698f3 lto_main()
/scratch/jbrown/nvptx-mainline/src/gcc-mainline/gcc/lto/lto.c:630

It looks like something is unprepared to handle a NULL_TREE or error_mark_node
-- I've not finished investigating.

[Bug libgomp/92511] [OpenACC] Support subset subarray mappings

2019-11-20 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92511

--- Comment #2 from jules at gcc dot gnu.org ---
Author: jules
Date: Wed Nov 20 17:51:09 2019
New Revision: 278514

URL: https://gcc.gnu.org/viewcvs?rev=278514=gcc=rev
Log:
OpenACC "present" subarrays: runtime API return value and unmapping fixes

PR libgomp/92511

libgomp/
* oacc-mem.c (present_create_copy): Fix device pointer return value in
case of "present" subarray.  Use tgt->tgt_start instead of tgt->to_free
in non-present/create case.
(delete_copyout): Change error condition to fail only on copies outside
of mapped block.  Adjust error message accordingly.
* testsuite/libgomp.oacc-c-c++-common/copyin-devptr-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/copyin-devptr-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/lib-20.c: Adjust expected error
message.
* testsuite/libgomp.oacc-c-c++-common/lib-23.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-22.c: Allow test to pass now.
* testsuite/libgomp.oacc-c-c++-common/lib-30.c: Likewise.

Reviewed-by: Thomas Schwinge 

Added:
trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/copyin-devptr-1.c
trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/copyin-devptr-2.c
Modified:
trunk/libgomp/ChangeLog
trunk/libgomp/oacc-mem.c
trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c
trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c
trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c
trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c

[Bug libgomp/92503] [OpenACC] Behavior of 'acc_free' if the memory space is still used in a mapping

2019-11-13 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92503

--- Comment #1 from jules at gcc dot gnu.org ---
FWIW, I don't think we should do this implicit unmapping, particularly since it
implies an expensive device-to-host-address lookup.

[Bug lto/71959] [OpenACC] lto1: ICE in inline_read_section, at ipa-fnsummary.c:3314

2019-01-10 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71959

--- Comment #11 from jules at gcc dot gnu.org ---
Author: jules
Date: Thu Jan 10 12:32:03 2019
New Revision: 267806

URL: https://gcc.gnu.org/viewcvs?rev=267806=gcc=rev
Log:
Add testcase from PR71959

libgomp/

PR lto/71959
* testsuite/libgomp.oacc-c++/pr71959-aux.cc: New.
* testsuite/libgomp.oacc-c++/pr71959.C: New.

Added:
trunk/libgomp/testsuite/libgomp.oacc-c++/pr71959-aux.cc
trunk/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
Modified:
trunk/libgomp/ChangeLog

[Bug lto/71959] [OpenACC] lto1: ICE in inline_read_section, at ipa-fnsummary.c:3314

2018-12-20 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71959

jules at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jules at gcc dot gnu.org

--- Comment #10 from jules at gcc dot gnu.org ---
The og7/og8 test case libgomp.oacc-c++/pr71959.C appears to be fixed by:

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01167.html

[Bug middle-end/86336] [9 regression] ICE in omp-low.c:7879

2018-09-12 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86336

--- Comment #3 from jules at gcc dot gnu.org ---
Author: jules
Date: Wed Sep 12 15:21:19 2018
New Revision: 264244

URL: https://gcc.gnu.org/viewcvs?rev=264244=gcc=rev
Log:
[OpenACC] C++ reference mapping

2018-09-09  Cesar Philippidis  
Julian Brown  

PR middle-end/86336

gcc/cp/
* semantics.c (finish_omp_clauses): Treat C++ references the same in
OpenACC as OpenMP.

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Set
target_firstprivatize_array_bases in OpenACC parallel and kernels
region contexts.  Remove GOMP_MAP_FIRSTPRIVATE_REFERENCE clauses from
OpenACC data regions.

libgomp/
* testsuite/libgomp.oacc-c++/non-scalar-data.C: Remove XFAIL.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/semantics.c
trunk/gcc/gimplify.c
trunk/libgomp/ChangeLog
trunk/libgomp/testsuite/libgomp.oacc-c++/non-scalar-data.C

[Bug middle-end/86757] [og8,nvptx] gangprivate related regressions

2018-08-13 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86757

jules at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jules at gcc dot gnu.org

--- Comment #1 from jules at gcc dot gnu.org ---
I believe this is fixed in the patch posted upstream:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00749.html

[Bug libgomp/65742] [5/6 Regression] Several libgomp.oacc-* failures after r221922.

2015-05-28 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65742

--- Comment #5 from jules at gcc dot gnu.org ---
Author: jules
Date: Thu May 28 09:29:19 2015
New Revision: 223801

URL: https://gcc.gnu.org/viewcvs?rev=223801root=gccview=rev
Log:
PR libgomp/65742

gcc/
* builtins.c (expand_builtin_acc_on_device): Don't use open-coded
sequence for !ACCEL_COMPILER.

libgomp/
* oacc-init.c (plugin/plugin-host.h): Include.
(acc_on_device): Check whether we're in an offloaded region for
host_nonshm
plugin. Don't use __builtin_acc_on_device.
* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
nonshm_exec flag in thread-local data.
(GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
data for host_nonshm plugin.
(GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
for host_nonshm plugin.
* plugin/plugin-host.h: New.


Added:
trunk/libgomp/plugin/plugin-host.h
Modified:
trunk/gcc/ChangeLog
trunk/gcc/builtins.c
trunk/libgomp/ChangeLog
trunk/libgomp/oacc-init.c
trunk/libgomp/plugin/plugin-host.c


[Bug libgomp/65742] [5/6 Regression] Several libgomp.oacc-* failures after r221922.

2015-05-28 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65742

--- Comment #6 from jules at gcc dot gnu.org ---
Author: jules
Date: Thu May 28 09:38:40 2015
New Revision: 223802

URL: https://gcc.gnu.org/viewcvs?rev=223802root=gccview=rev
Log:
PR libgomp/65742

gcc/
* builtins.c (expand_builtin_acc_on_device): Don't use open-coded
sequence for !ACCEL_COMPILER.

libgomp/
* oacc-init.c (plugin/plugin-host.h): Include.
(acc_on_device): Check whether we're in an offloaded region for
host_nonshm
plugin. Don't use __builtin_acc_on_device.
* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
nonshm_exec flag in thread-local data.
(GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
data for host_nonshm plugin.
(GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
for host_nonshm plugin.
* plugin/plugin-host.h: New.


Added:
branches/gomp-4_0-branch/libgomp/plugin/plugin-host.h
Modified:
branches/gomp-4_0-branch/gcc/ChangeLog.gomp
branches/gomp-4_0-branch/gcc/builtins.c
branches/gomp-4_0-branch/libgomp/ChangeLog.gomp
branches/gomp-4_0-branch/libgomp/oacc-init.c
branches/gomp-4_0-branch/libgomp/plugin/plugin-host.c


[Bug libgomp/65904] Memory corruption with acc_shutdown, nvptx offloading, libgomp.oacc-c-c++-common/asyncwait-1.c

2015-05-01 Thread jules at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65904

jules at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-05-01
 Ever confirmed|0   |1

--- Comment #1 from jules at gcc dot gnu.org ---
Patch posted: https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00021.html


[Bug target/49423] [4.7/4.8/4.9/4.10 Regression] [arm] internal compiler error: in push_minipool_fix

2014-05-01 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49423

--- Comment #26 from jules at gcc dot gnu.org ---
The testcase I previously had for this bug no longer reproduces since LRA was
enabled by default on ARM. So, it's possible the bug is now dormant, or indeed
fixed. The difference in the postreload dump around the insn which previously
failed (136) is, using -mno-lra/-mlra on current trunk:

-(insn 136 7 12 2 (set (reg:SI 11 fp [orig:231 D.6590 ] [231])
-(zero_extend:SI (mem/u/c:QI (symbol_ref/u:SI (*.LC0) [flags 0x2]) [0
 S1 A8]))) ../../aes/aeskey.c:509 182 {*arm_zero_extendqisi2_v6}
+(insn 952 7 136 2 (set (reg:SI 12 ip [1488])
+(const_int 0 [0])) ../../aes/aeskey.c:509 666 {*arm_movsi_vfp}
+ (nil))
+(insn 136 952 12 2 (set (reg:SI 11 fp [orig:231 D.6590 ] [231])
+(zero_extend:SI (reg:QI 12 ip [1488]))) ../../aes/aeskey.c:509 182
{*arm_zero_extendqisi2_v6}

Or, does anyone know of a testcase which still causes this ICE with -mlra
enabled? If not, we might be able to consider this fixed. (Insn 136 -- of the
form which causes breakage -- was generated by reload.)

Thanks,

Julian


[Bug middle-end/59134] New: Infinite loop between store_fixed_bit_field and store_split_bit_field with STRICT_ALIGNMENT

2013-11-14 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59134

Bug ID: 59134
   Summary: Infinite loop between store_fixed_bit_field and
store_split_bit_field with STRICT_ALIGNMENT
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jules at gcc dot gnu.org

Compiling code using packed structures with a single non-zero-sized element and
a zero-sized array can lead to segfaults due to stack exhaustion on
STRICT_ALIGNMENT targets (at least PowerPC E500 targets are affected).

See the following for further details:

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01458.html

Earlier bugs that are possibly related include:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55438

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748


[Bug target/57717] error: unrecognizable insn compiling ./strtod_l.c from glibc on powerpc-gnuspe

2013-08-12 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57717

jules at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jules at gcc dot gnu.org

--- Comment #7 from jules at gcc dot gnu.org ---
Here's another candidate patch:

http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00668.html


[Bug target/49423] [4.7/4.8/4.9 Regression] [arm] internal compiler error: in push_minipool_fix

2013-06-20 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49423

jules at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jules at gcc dot gnu.org

--- Comment #20 from jules at gcc dot gnu.org ---
I've posted a new potential fix (and a new testcase which breaks on current
mainline) here:

http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01191.html


[Bug rtl-optimization/57159] New: Latent bug in RTL GCSE/PRE

2013-05-03 Thread jules at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57159



 Bug #: 57159

   Summary: Latent bug in RTL GCSE/PRE

Classification: Unclassified

   Product: gcc

   Version: unknown

Status: UNCONFIRMED

  Severity: minor

  Priority: P3

 Component: rtl-optimization

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: ju...@gcc.gnu.org





Created attachment 30018

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30018

testcase



We encountered a wrong-code bug in an out-of-tree port (4.6-based, though the

affected code doesn't appear to have changed that much) in target-independent

code, though I have not been able to reproduce the problem on a

currently-supported target. The port in question is unusual in that it allows

read-modify-write arithmetic directly on memory locations (for some operations,

at least).



That means that during RTL expansion, the compiler may attach REG_EQUAL notes

to insns that contain memory operands:



(insn 16 15 17 (set (reg:SI 70)

(mem/s:SI (reg/v/f:SI 68 [ iter ]) [3 iter_4(D)-xhv_riter+0 S4 A32]))

hvmin.c:32 -1

 (nil))



(insn 17 16 0 (set (reg:SI 56 [ D.4066 ])

(plus:SI (reg:SI 70)

(const_int 1 [0x1]))) hvmin.c:32 -1

 (expr_list:REG_EQUAL (plus:SI (mem/s:SI (reg/v/f:SI 68 [ iter ]) [3

iter_4(D)-xhv_riter+0 S4 A32])

(const_int 1 [0x1]))

(nil)))



That in itself is not a problem, but certain parts of gcse.c's PRE code examine

only the instruction itself (e.g. when building tables of interesting memory

locations -- compute_ld_motion_mems), whereas other parts pay attention to the

REG_EQUAL note (hash_scan_set). That means that memory references which are

too complicated for PRE to handle properly can leak through to later parts of

the algorithm and cause misoptimisations -- in the case of the attached file,

the increment in the loop is optimised away, and the loop spins forever.



The fix then is to prevent complicated memory references which may be present

in REG_EQUAL notes from being considered for load motion, by stopping them from

being added to the relevant tables to start with. That's what the attached

patch does, and it works for us -- though without a way of reproducing the bug

on mainline, it's not quite obvious that it should be applied there.



Interestingly, on x86 (which also allows read-modify-write operations), it

looks like the bug is latent because a normal addition clobbers the flags

register:



(insn 28 91 94 4 (parallel [

(set (reg:SI 60 [ D.2122 ])

(plus:SI (reg:SI 74 [ iter_4(D)-xhv_riter ])

(const_int 1 [0x1])))

(clobber (reg:CC 17 flags))

]) hvmin.c:32 252 {*addsi_1}

 (expr_list:REG_DEAD (reg:SI 74 [ iter_4(D)-xhv_riter ])

(expr_list:REG_UNUSED (reg:CC 17 flags)

(expr_list:REG_EQUAL (plus:SI (mem/s:SI (reg/v/f:DI 72 [ iter ]) [3

iter_4(D)-xhv_riter+0 S4 A32])

(const_int 1 [0x1]))

(nil)



This is apparently sufficient to prevent the misoptimisation (probably because

of want_to_gcse_p's call to can_assign_to_reg_without_clobbers_p). I tried to

reproduce on m68k also, but addsi patterns are expanded differently there so

the failing condition doesn't trigger.


[Bug rtl-optimization/57159] Latent bug in RTL GCSE/PRE

2013-05-03 Thread jules at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57159



--- Comment #1 from jules at gcc dot gnu.org 2013-05-03 11:56:53 UTC ---

Created attachment 30019

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30019

patch


[Bug rtl-optimization/57159] Latent bug in RTL GCSE/PRE

2013-05-03 Thread jules at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57159



--- Comment #4 from jules at gcc dot gnu.org 2013-05-03 19:56:33 UTC ---

Created attachment 30029

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30029

Before/after dumps



Here are some before/after dumps taken from the out-of-tree port. Notice how in

the before pre dump, the results of insns 27 and 28 are considered to be

equivalent to the results of insns 16 and 17, despite the store (insn 18)

making that equivalence invalid -- this is the misoptimisation alluded to

earlier.



This happens because oprs_available_p mistakenly thinks that insn 16/17's

results are available at the end of their BB, because oprs_unchanged_p -

load_killed_in_block_p - mems_conflict_for_gcse_p - find_rtx_in_ldst finds a

load/store that it thinks it can deal with ((mem/s:SI (reg/v/f:SI 68 [ iter

]))), but actually that mem is used (in the REG_EQUAL note, as extracted by

hash_scan_set) in a way that it cannot handle.


[Bug rtl-optimization/57159] Latent bug in RTL GCSE/PRE

2013-05-03 Thread jules at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57159



--- Comment #5 from jules at gcc dot gnu.org 2013-05-03 20:05:19 UTC ---

Actually the last paragraph might not make sense -- insn 16/17's *operands* are

not available at the end of the BB, but only if the REG_EQUAL note contents are

substituted for the insn pattern. But that's kind of what happens in

hash_scan_set, so I think the overall idea's right.


[Bug testsuite/54622] gcc.dg/vect test failures for arm big-endian

2012-09-26 Thread jules at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54622



jules at gcc dot gnu.org changed:



   What|Removed |Added



 CC||jules at gcc dot gnu.org



--- Comment #5 from jules at gcc dot gnu.org 2012-09-26 16:25:08 UTC ---

Unfortunately (due to the reasons Richard outlined) lots of things don't work

wrt. vectorization for Neon in big-endian mode, so are deliberately disabled in

the backend. IMO, the right thing to do (at least until that is fixed) is to

make the effective-target checks for vectorization features depend on

little-endian ARM mode.


[Bug rtl-optimization/47918] [4.6/4.7 Regression] noreturn discovery broke non local gotos on m68k and i386

2011-11-01 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47918

--- Comment #12 from jules at gcc dot gnu.org 2011-11-01 18:38:45 UTC ---
Author: jules
Date: Tue Nov  1 18:38:42 2011
New Revision: 180740

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=180740
Log:
PR rtl-optimization/47918

* reload1.c (set_initial_label_offsets): Use initial offsets
for labels on the nonlocal_goto_handler_labels chain.


Modified:
branches/gcc-4_6-branch/gcc/ChangeLog
branches/gcc-4_6-branch/gcc/reload1.c


[Bug rtl-optimization/47918] [4.6/4.7 Regression] noreturn discovery broke non local gotos on m68k and i386

2011-10-28 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47918

--- Comment #11 from jules at gcc dot gnu.org 2011-10-28 10:48:36 UTC ---
Author: jules
Date: Fri Oct 28 10:48:32 2011
New Revision: 180611

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=180611
Log:
PR rtl-optimization/47918

* reload1.c (set_initial_label_offsets): Use initial offsets
for labels on the nonlocal_goto_handler_labels chain.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/reload1.c


[Bug middle-end/48580] missed optimization: integer overflow checks

2011-10-05 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580

jules at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jules at gcc dot gnu.org

--- Comment #12 from jules at gcc dot gnu.org 2011-10-05 12:41:55 UTC ---
I don't much like the idea of using builtins for operations as fundamental as
integer arithmetic. How about this as a straw-man suggestion: adding new
qualifiers for fat integers-with-flags, somewhat in the spirit of the
embedded-C fractional/saturating types? So you might have:

int x, y;

void foo (void)
{
  _Flagged int res;

  res = (_Flagged int) x + y;

  if (_Carry (res))
printf (sum carried\n);

  if (_Overflow (res))
printf (sum overflowed\n);
}

this avoids problems with global state, and allows for the programming style
which (I vaguely get the impression that) people seem to want -- performing a
normal integer operation, then querying for carry/overflow flags afterwards.

These types wouldn't be allowed to cross ABI boundaries (although of course you
could use the magic builtins _Carry/_Overflow to extract values to pass to
functions), and _Overflow would only be allowed for signed types. You could
also have _Borrow for subtraction maybe (whose meaning would be the inverse
of _Carry).

Signalling variants could look like, e.g.:

void bar (void)
{
  int res;

  res = (_Signalling _Flagged int) x + y;
}

Things would get awkward if you tried to use these new constructs in
more-complicated expressions, I suppose. Anyway, just an idea.


[Bug middle-end/48580] missed optimization: integer overflow checks

2011-10-05 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580

--- Comment #13 from jules at gcc dot gnu.org 2011-10-05 13:05:47 UTC ---
Coming to think of it, if _Sat were allowed on plain integers too, a _Flagged
_Sat int could also be queried for saturation using a similar mechanism, like:

int foo (_Sat int x, _Sat int y)
{
  return _Saturated ((_Sat _Flagged) x + y);
}

I'm probably getting ahead of myself :-).


[Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness

2011-09-20 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

jules at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 CC||jules at gcc dot gnu.org
 Resolution|FIXED   |

--- Comment #9 from jules at gcc dot gnu.org 2011-09-20 19:03:43 UTC ---
This appears to have regressed on mainline. I now get the following assembly
output for the test case added by Maxim:

longfunc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
stmfd   sp!, {r4, r5}
umull   r4, r5, r0, r2
mul r3, r0, r3
mla r1, r2, r1, r3
mov r0, r4
add r1, r1, r5
ldmfd   sp!, {r4, r5}
bx  lr


[Bug tree-optimization/45932] execute/pr37573.c fails after Neon misaligned patch.

2010-10-08 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45932

--- Comment #2 from jules at gcc dot gnu.org 2010-10-08 16:17:58 UTC ---
Could you be more specific about testsuite options, etc. needed to observe this
apparent regression? I haven't been able to reproduce it using default options,
nor by specifying [-mthumb] -march=armv7-a -mfpu=neon -mfloat-abi=softfp to the
test run. I do see a failure like so:

.../gcc/testsuite/gcc.c-torture/execute/pr37573.c:64:1: sorry, unimplemented:
gimple bytecode streams do not support machine specific builtin functions on
this target

for -flto and -fwhopr, but those happen irrespective of whether the
misalignment patch is applied or not.

Thanks!