Re: [PATCH 1/3, libgomp] Resolve libgomp plugin deadlock on exit, libgomp proper parts
Patch has been updated to accommodate the gomp_fini_device() removal changes. And ping. On 2015/12/14 11:47 PM, Chung-Lin Tang wrote: > [sorry, forgot to C gcc-patches in last send] > > Hi Jakub, > these patches are a revision of > https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01701.html > since that patch set have bitrotten by now. > > To recap the original situation, due to the way that device locks are held > when entering plugin code, a GOMP_PLUGIN_fatal() call will deadlock when the > GOMP_unregister_var() exit destructor tries to obtain the same device lock. > > This patch set revises many functions on libgomp plugin interface to return > false on error, > and back to libgomp to release the lock and call gomp_fatal() there. > > This first patch is the changes for the machine independent libgomp proper. > The entire patch > set was tested without regressions. Is this okay for trunk? > > Thanks, > Chung-Lin > > 2015-12-14 Chung-Lin Tang <clt...@codesourcery.com> > > * target.c (gomp_device_copy): New function. > (gomp_copy_host2dev): Likewise. > (gomp_copy_dev2host): Likewise. > (gomp_free_device_memory): Likewise. > (gomp_map_vars_existing): Adjust to call gomp_copy_host2dev(). > (gomp_map_pointer): Likewise. > (gomp_map_vars): Adjust to call gomp_copy_host2dev(), handle > NULL value from alloc_func plugin hook. > (gomp_unmap_tgt): Adjust to call gomp_free_device_memory(). > (gomp_copy_from_async): Adjust to call gomp_copy_dev2host(). > (gomp_unmap_vars): Likewise. > (gomp_update): Adjust to call gomp_copy_dev2host() and > gomp_copy_host2dev() functions. > (gomp_init_device): Handle false value from init_device_func > plugin hook. > (gomp_fini_device): Handle false value from fini_device_func > plugin hook. > (gomp_exit_data): Adjust to call gomp_copy_dev2host(). > (omp_target_free): Adjust to call gomp_free_device_memory(). > (omp_target_memcpy): Handle return values from host2dev_func, > dev2host_func, and dev2dev_func plugin hooks. > (omp_target_memcpy_rect_worker): Likewise. > * libgomp.h (struct gomp_device_descr): Adjust return type of > init_device_func, fini_device_func, free_func, dev2host_func, > host2dev_func, and dev2dev_func plugin hooks from 'void *' to > bool. > * oacc-host.c (host_init_device): Change return type to bool. > (host_fini_device): Likewise. > (host_free): Likewise. > (host_dev2host): Likewise. > (host_host2dev): Likewise. > * oacc-mem.c (acc_free): Handle plugin hook fatal error case. > (acc_memcpy_to_device): Likewise. > (acc_memcpy_from_device): Likewise. > (delete_copyout): Add libfnname parameter, handle free_func > hook fatal error case. > (acc_delete): Adjust delete_copyout call. > (acc_copyout): Likewise. > > > Index: libgomp.h === --- libgomp.h (revision 232047) +++ libgomp.h (working copy) @@ -927,16 +927,17 @@ struct gomp_device_descr unsigned int (*get_caps_func) (void); int (*get_type_func) (void); int (*get_num_devices_func) (void); - void (*init_device_func) (int); - void (*fini_device_func) (int); + bool (*init_device_func) (int); + bool (*fini_device_func) (int); unsigned (*version_func) (void); int (*load_image_func) (int, unsigned, const void *, struct addr_pair **); void (*unload_image_func) (int, unsigned, const void *); void *(*alloc_func) (int, size_t); - void (*free_func) (int, void *); - void *(*dev2host_func) (int, void *, const void *, size_t); - void *(*host2dev_func) (int, void *, const void *, size_t); - void *(*dev2dev_func) (int, void *, const void *, size_t); + bool (*free_func) (int, void *); + bool (*dev2host_func) (int, void *, const void *, size_t); + bool (*host2dev_func) (int, void *, const void *, size_t); + /*xxx*/ + bool (*dev2dev_func) (int, void *, const void *, size_t); void (*run_func) (int, void *, void *); void (*async_run_func) (int, void *, void *, void *); Index: oacc-host.c === --- oacc-host.c (revision 232047) +++ oacc-host.c (working copy) @@ -60,14 +60,16 @@ host_get_num_devices (void) return 1; } -static void +static bool host_init_device (int n __attribute__ ((unused))) { + return true; } -static void +static bool host_fini_device (int n __attribute__ ((unused))) { + return true; } static unsigned @@ -98,28 +100,29 @@ host_alloc (int n __attribute__ ((unused)), size_t return gomp_malloc (s); } -static void
[PATCH] OpenACC use_device clause ICE fix
Hi, we've been encountering an ICE for OpenACC host_data sections, which has a use_device() clause similar to OpenMP use_device_ptr. The ICE happens in make_decl_rtl() for scan-created variables, which IIUC, should not be entered at all for automatic variables. I believe the problem is, unlike other variable creation cases where the code is split out into an offloaded child function, a host_data section is actually host side code, so the child function local variable processing doesn't apply here; the use_device() referenced variable has to be added to the current host function. So here is the quite small fix. This fixed the ICE for OpenACC on trunk and gomp4. However when I tested it for OpenMP using the case that Julian provided here[1], the same ICE appeared to be already fixed. I'm not sure if some other interim change covered it up for OpenMP. This patch was tested on trunk without regressions. Okay for trunk? [1] https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00104.html Thanks, Chung-Lin * omp-low.c (scan_sharing_clauses): Call add_local_decl() for use_device/use_device_ptr variables. Index: omp-low.c === --- omp-low.c (revision 232047) +++ omp-low.c (working copy) @@ -1972,7 +1972,10 @@ scan_sharing_clauses (tree clauses, omp_context *c gcc_assert (DECL_P (decl2)); install_var_local (decl2, ctx); } - install_var_local (decl, ctx); + decl = install_var_local (decl, ctx); + /* use_device/use_device_ptr items are actually host side variables, +not on the offloaded target; add to current function here. */ + add_local_decl (cfun, decl); break; case OMP_CLAUSE_IS_DEVICE_PTR:
Re: [PATCH, libgomp] Rewire OpenACC async
Ping. On 2015/11/24 6:27 PM, Chung-Lin Tang wrote: > Hi, this patch reworks some of the way that asynchronous copyouts are > implemented for OpenACC in libgomp. > > Before this patch, we had a somewhat confusing way of implementing this > by having two refcounts for each mapping: refcount and async_refcount, > which I never got working again after the last wave of async regressions > showed up. > > So this patch implements what I believe to be a simplification: async_refcount > is removed, and instead of trying to queue the async copyouts during unmapping > we actually do that during the plugin event handling. This requires a addition > of the async stream integer as an argument to the register_async_cleanup > plugin hook, but overall I think this should be more elegant than before. > > This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression. > It also fixed data-[23].c regressions before, but some other recent check-in > happened to already fixed those. > > Tested without regressions, is this okay for trunk? > > Thanks, > Chung-Lin > > 2015-11-24 Chung-Lin Tang <clt...@codesourcery.com> > > * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter. > * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async' > parameter, use to set async stream around call to gomp_unmap_vars, > call gomp_unmap_vars() with 'do_copyfrom' set to true. > * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field. > (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP > events and call GOMP_PLUGIN_async_unmap_vars() for each of them. > (event_add): Add int parameter, initialize 'val' field when > adding new ptx_event struct. > (nvptx_evec): Adjust event_add() call arguments. > (nvptx_host2dev): Likewise. > (nvptx_dev2host): Likewise. > (nvptx_wait_async): Likewise. > (nvptx_wait_all_async): Likewise. > (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter, > pass to event_add() call. > * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async' > parameter. > * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to > call openacc.register_async_cleanup_func() hook. > * oacc-parallel.c (GOACC_parallel_keyed): Likewise. > * target.c (gomp_copy_from_async): Delete function. > (gomp_map_vars): Remove async_refcount. > (gomp_unmap_vars): Likewise. > (gomp_load_image_to_device): Likewise. > (omp_target_associate_ptr): Likewise. > * libgomp.h (struct splay_tree_key_s): Remove async_refcount. > (acc_dispatch_t.register_async_cleanup_func): Add int parameter. > (gomp_copy_from_async): Remove. >
[PATCH 1/3, libgomp] Resolve libgomp plugin deadlock on exit, libgomp proper parts
[sorry, forgot to C gcc-patches in last send] Hi Jakub, these patches are a revision of https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01701.html since that patch set have bitrotten by now. To recap the original situation, due to the way that device locks are held when entering plugin code, a GOMP_PLUGIN_fatal() call will deadlock when the GOMP_unregister_var() exit destructor tries to obtain the same device lock. This patch set revises many functions on libgomp plugin interface to return false on error, and back to libgomp to release the lock and call gomp_fatal() there. This first patch is the changes for the machine independent libgomp proper. The entire patch set was tested without regressions. Is this okay for trunk? Thanks, Chung-Lin 2015-12-14 Chung-Lin Tang <clt...@codesourcery.com> * target.c (gomp_device_copy): New function. (gomp_copy_host2dev): Likewise. (gomp_copy_dev2host): Likewise. (gomp_free_device_memory): Likewise. (gomp_map_vars_existing): Adjust to call gomp_copy_host2dev(). (gomp_map_pointer): Likewise. (gomp_map_vars): Adjust to call gomp_copy_host2dev(), handle NULL value from alloc_func plugin hook. (gomp_unmap_tgt): Adjust to call gomp_free_device_memory(). (gomp_copy_from_async): Adjust to call gomp_copy_dev2host(). (gomp_unmap_vars): Likewise. (gomp_update): Adjust to call gomp_copy_dev2host() and gomp_copy_host2dev() functions. (gomp_init_device): Handle false value from init_device_func plugin hook. (gomp_fini_device): Handle false value from fini_device_func plugin hook. (gomp_exit_data): Adjust to call gomp_copy_dev2host(). (omp_target_free): Adjust to call gomp_free_device_memory(). (omp_target_memcpy): Handle return values from host2dev_func, dev2host_func, and dev2dev_func plugin hooks. (omp_target_memcpy_rect_worker): Likewise. * libgomp.h (struct gomp_device_descr): Adjust return type of init_device_func, fini_device_func, free_func, dev2host_func, host2dev_func, and dev2dev_func plugin hooks from 'void *' to bool. * oacc-host.c (host_init_device): Change return type to bool. (host_fini_device): Likewise. (host_free): Likewise. (host_dev2host): Likewise. (host_host2dev): Likewise. * oacc-mem.c (acc_free): Handle plugin hook fatal error case. (acc_memcpy_to_device): Likewise. (acc_memcpy_from_device): Likewise. (delete_copyout): Add libfnname parameter, handle free_func hook fatal error case. (acc_delete): Adjust delete_copyout call. (acc_copyout): Likewise. Index: libgomp/libgomp.h === --- libgomp/libgomp.h (revision 231613) +++ libgomp/libgomp.h (working copy) @@ -914,16 +914,17 @@ struct gomp_device_descr unsigned int (*get_caps_func) (void); int (*get_type_func) (void); int (*get_num_devices_func) (void); - void (*init_device_func) (int); - void (*fini_device_func) (int); + bool (*init_device_func) (int); + bool (*fini_device_func) (int); unsigned (*version_func) (void); int (*load_image_func) (int, unsigned, const void *, struct addr_pair **); void (*unload_image_func) (int, unsigned, const void *); void *(*alloc_func) (int, size_t); - void (*free_func) (int, void *); - void *(*dev2host_func) (int, void *, const void *, size_t); - void *(*host2dev_func) (int, void *, const void *, size_t); - void *(*dev2dev_func) (int, void *, const void *, size_t); + bool (*free_func) (int, void *); + bool (*dev2host_func) (int, void *, const void *, size_t); + bool (*host2dev_func) (int, void *, const void *, size_t); + /*xxx*/ + bool (*dev2dev_func) (int, void *, const void *, size_t); void (*run_func) (int, void *, void *); void (*async_run_func) (int, void *, void *, void *); Index: libgomp/oacc-host.c === --- libgomp/oacc-host.c (revision 231613) +++ libgomp/oacc-host.c (working copy) @@ -60,14 +60,16 @@ host_get_num_devices (void) return 1; } -static void +static bool host_init_device (int n __attribute__ ((unused))) { + return true; } -static void +static bool host_fini_device (int n __attribute__ ((unused))) { + return true; } static unsigned @@ -98,28 +100,29 @@ host_alloc (int n __attribute__ ((unused)), size_t return gomp_malloc (s); } -static void +static bool host_free (int n __attribute__ ((unused)), void *p) { free (p); + return true; } -static void * +static bool host_dev2host (int n __attribute__ ((unused)), void *h __attribute__ ((unused)), const void *d __attribute__ ((unused)), size_t s __attribute__ ((unused))) { - return NULL; + return true; } -static void * +static bool host_host2dev (int n __attri
[PATCH 2/3, libgomp] Resolve libgomp plugin deadlock on exit, nvptx parts
These are the nvptx parts. Thanks, Chung-Lin * plugin/plugin-nvptx.c (CUDA_CALL_ERET): New convenience macro. (CUDA_CALL): Likewise. (CUDA_CALL_ASSERT): Likewise. (map_init): Change return type to bool, use CUDA_CALL* macros. (map_fini): Likewise. (init_streams_for_device): Change return type to bool, adjust call to map_init. (fini_streams_for_device): Change return type to bool, adjust call to map_fini. (select_stream_for_async): Release stream_lock before calls to GOMP_PLUGIN_fatal, adjust call to map_init. (nvptx_init): Use CUDA_CALL* macros. (nvptx_attach_host_thread_to_device): Change return type to bool, use CUDA_CALL* macros. (nvptx_open_device): Use CUDA_CALL* macros. (nvptx_close_device): Change return type to bool, use CUDA_CALL* macros. (nvptx_get_num_devices): Use CUDA_CALL* macros. (link_ptx): Change return type to bool, use CUDA_CALL* macros. (nvptx_exec): Use CUDA_CALL* macros. (nvptx_alloc): Use CUDA_CALL* macros. (nvptx_free): Change return type to bool, use CUDA_CALL* macros. (nvptx_host2dev): Likewise. (nvptx_dev2host): Likewise. (nvptx_wait): Use CUDA_CALL* macros. (nvptx_wait_async): Likewise. (nvptx_wait_all): Likewise. (nvptx_wait_all_async): Likewise. (nvptx_set_cuda_stream): Adjust order of stream_lock acquire, use CUDA_CALL* macros, adjust call to map_fini. (GOMP_OFFLOAD_init_device): Change return type to bool, adjust code accordingly. (GOMP_OFFLOAD_fini_device): Likewise. (GOMP_OFFLOAD_load_image): Adjust calls to nvptx_attach_host_thread_to_device/link_ptx to handle errors, use CUDA_CALL* macros. (GOMP_OFFLOAD_alloc): Adjust calls to code to handle error return. (GOMP_OFFLOAD_free): Change return type to bool, adjust calls to handle error return. (GOMP_OFFLOAD_dev2host): Likewise. (GOMP_OFFLOAD_host2dev): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Use CUDA_CALL* macros. (GOMP_OFFLOAD_openacc_create_thread_data): Likewise. Index: libgomp/plugin/plugin-nvptx.c === --- libgomp/plugin/plugin-nvptx.c (revision 231613) +++ libgomp/plugin/plugin-nvptx.c (working copy) @@ -63,6 +63,34 @@ cuda_error (CUresult r) return desc; } +/* Convenience macros for the frequently used CUDA library call and + error handling sequence. This does not capture all the cases we + use in this file, but is common enough. */ + +#define CUDA_CALL_ERET(ERET, FN, ...) \ + do { \ +unsigned __r = FN (__VA_ARGS__); \ +if (__r != CUDA_SUCCESS) \ + { \ + GOMP_PLUGIN_error (#FN " error: %s", \ + cuda_error (__r)); \ + return ERET;\ + } \ + } while (0) + +#define CUDA_CALL(FN, ...) \ + CUDA_CALL_ERET (false, (FN), __VA_ARGS__) + +#define CUDA_CALL_ASSERT(FN, ...) \ + do { \ +unsigned __r = FN (__VA_ARGS__); \ +if (__r != CUDA_SUCCESS) \ + { \ + GOMP_PLUGIN_fatal (#FN " error: %s", \ + cuda_error (__r)); \ + } \ + } while (0) + static unsigned int instantiated_devices = 0; static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER; @@ -98,25 +126,18 @@ struct map charmappings[0]; }; -static void +static bool map_init (struct ptx_stream *s) { - CUresult r; - int size = getpagesize (); assert (s); assert (!s->d); assert (!s->h); - r = cuMemAllocHost (>h, size); - if (r != CUDA_SUCCESS) -GOMP_PLUGIN_fatal ("cuMemAllocHost error: %s", cuda_error (r)); + CUDA_CALL (cuMemAllocHost, >h, size); + CUDA_CALL (cuMemHostGetDevicePointer, >d, s->h, 0); - r = cuMemHostGetDevicePointer (>d, s->h, 0); - if (r != CUDA_SUCCESS) -GOMP_PLUGIN_fatal ("cuMemHostGetDevicePointer error: %s", cuda_error (r)); - assert (s->h); s->h_begin = s->h; @@ -125,16 +146,14 @@ map_init (struct ptx_stream *s) assert (s->h_next); assert (s->h_end); + return true; } -static void +static bool map_fini (struct ptx_stream *s) { - CUresult r; - - r = cuMemFreeHost (s->h); - if (r != CUDA_SUCCESS) -GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuda_error (r)); + CUDA_CALL (cuMemFreeHost, s->h); + return true; } static void @@ -325,7 +344,7 @@ nvptx_thread (void) return (struct nvptx_thread *) GOMP_PLUGIN_acc_thread (); } -static void +static bool init_streams_for_device (struct ptx_device *ptx_dev, int concurrency) { int i; @@ -337,9 +356,10 @@ init_streams_for_device (struct ptx_device *ptx_de null_stream->multithreaded = true; null_stream->d = (CUdeviceptr) NULL; null_stream->h = NULL; - map_init (null_stream); - ptx_dev->null_stream = null_stream; + if (!map_init (null_stream)) +return false; + ptx_dev->null_stream =
[PATCH 3/3, libgomp] Resolve libgomp plugin deadlock on exit, intelmic parts
Hi Ilya, thanks for the prior review (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01893.html), This version is mostly a like the prior one, with some minor code updates. Thanks, Chung-Lin 2015-12-14 Chung-Lin Tang <clt...@codesourcery.com> * plugin/libgomp-plugin-intelmic.cpp (offload): Change return type to bool, adjust return code. (GOMP_OFFLOAD_init_device): Likewise. (GOMP_OFFLOAD_fini_device): Likewise. (get_target_table): Likewise. (offload_image): Likwise. (GOMP_OFFLOAD_load_image): Adjust call to offload_image(), change exit() to return error. (GOMP_OFFLOAD_alloc): Change return type to bool, change to use out parameter to return allocated pointer. (GOMP_OFFLOAD_free): Change return type to bool, adjust return code. (GOMP_OFFLOAD_host2dev): Likewise. (GOMP_OFFLOAD_dev2host): Likewise. (GOMP_OFFLOAD_dev2dev): Likewise. Index: liboffloadmic/plugin/libgomp-plugin-intelmic.cpp === --- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp (revision 231613) +++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp (working copy) @@ -205,7 +205,7 @@ GOMP_OFFLOAD_get_num_devices (void) return num_devices; } -static void +static bool offload (const char *file, uint64_t line, int device, const char *name, int num_vars, VarDesc *vars, const void **async_data) { @@ -213,20 +213,21 @@ offload (const char *file, uint64_t line, int devi if (ofld) { if (async_data == NULL) - __offload_offload1 (ofld, name, 0, num_vars, vars, NULL, 0, NULL, NULL); + return __offload_offload1 (ofld, name, 0, num_vars, vars, NULL, 0, + NULL, NULL); else { OffloadFlags flags; flags.flags = 0; flags.bits.omp_async = 1; - __offload_offload3 (ofld, name, 0, num_vars, vars, NULL, 0, NULL, - async_data, 0, NULL, flags, NULL); + return __offload_offload3 (ofld, name, 0, num_vars, vars, NULL, 0, + NULL, async_data, 0, NULL, flags, NULL); } } else { - fprintf (stderr, "%s:%d: Offload target acquire failed\n", file, line); - exit (1); + GOMP_PLUGIN_error ("%s:%d: Offload target acquire failed\n", file, line); + return false; } } @@ -256,24 +257,25 @@ register_main_image () /* liboffloadmic loads and runs offload_target_main on all available devices during a first call to offload (). */ -extern "C" void +extern "C" bool GOMP_OFFLOAD_init_device (int device) { TRACE ("(device = %d)", device); pthread_once (_image_is_registered, register_main_image); - offload (__FILE__, __LINE__, device, "__offload_target_init_proc", 0, NULL, - NULL); + return offload (__FILE__, __LINE__, device, "__offload_target_init_proc", 0, + NULL, NULL); } -extern "C" void +extern "C" bool GOMP_OFFLOAD_fini_device (int device) { TRACE ("(device = %d)", device); /* Unreachable for GOMP_OFFLOAD_CAP_OPENMP_400. */ abort (); + return true; } -static void +static bool get_target_table (int device, int _funcs, int _vars, void **) { VarDesc vd1[2] = { vd_tgt2host, vd_tgt2host }; @@ -282,8 +284,9 @@ get_target_table (int device, int _funcs, int vd1[1].ptr = _vars; vd1[1].size = sizeof (num_vars); - offload (__FILE__, __LINE__, device, "__offload_target_table_p1", 2, vd1, - NULL); + if (!offload (__FILE__, __LINE__, device, "__offload_target_table_p1", 2, + vd1, NULL)) +return false; int table_size = num_funcs + 2 * num_vars; if (table_size > 0) @@ -295,15 +298,16 @@ get_target_table (int device, int _funcs, int vd2.ptr = table; vd2.size = table_size * sizeof (void *); - offload (__FILE__, __LINE__, device, "__offload_target_table_p2", 1, , - NULL); + return offload (__FILE__, __LINE__, device, "__offload_target_table_p2", + 1, , NULL); } + return true; } /* Offload TARGET_IMAGE to all available devices and fill address_table with corresponding target addresses. */ -static void +static bool offload_image (const void *target_image) { void *image_start = ((void **) target_image)[0]; @@ -317,8 +321,8 @@ offload_image (const void *target_image) + image_size); if (!image) { - fprintf (stderr, "%s: Can't allocate memory\n", __FILE__); - exit (1); + GOMP_PLUGIN_error ("%s: Can't allocate memory\n", __FILE__); + return false; } image->size = image_size; @@ -333,13 +337,14 @@ offload_image (const void *target_image) /* Receive tables for target_image from all devices. */ DevAddrVect dev_table; + bool ret = true; for (int dev = 0; dev < num_devices; dev++) { int num_funcs = 0; int num_vars = 0; void
Re: [PATCH, libgomp] Rewire OpenACC async
On 2015/12/1 08:01 PM, Julian Brown wrote: > On Tue, 24 Nov 2015 18:27:24 +0800 > Chung-Lin Tang <clt...@codesourcery.com> wrote: > >> Hi, this patch reworks some of the way that asynchronous copyouts are >> implemented for OpenACC in libgomp. >> >> Before this patch, we had a somewhat confusing way of implementing >> this by having two refcounts for each mapping: refcount and >> async_refcount, which I never got working again after the last wave >> of async regressions showed up. >> >> So this patch implements what I believe to be a simplification: >> async_refcount is removed, and instead of trying to queue the async >> copyouts during unmapping we actually do that during the plugin event >> handling. This requires a addition of the async stream integer as an >> argument to the register_async_cleanup plugin hook, but overall I >> think this should be more elegant than before. > > This looks OK to me I think (I've only looked fairly briefly). I vaguely > remember trying something along these lines in an earlier iteration of > the async support -- maybe hitting problems with locking (I see you > have code to mitigate problems with that, and locking generally has > probably evolved a bit since I last looked at the code in detail > anyway). > > Can event_gc ever be called when the *device* lock is held? It only matters when the memmap_lockable argument is true, and for those cases, no the device lock is never held. > I'm slightly concerned that pushing async unmapping into event_gc means > that program-level semantics are deferred to the backend, which is > arguably the wrong place. But then I don't understand what went wrong > with the dual-refcount implementation, so maybe it's unavoidable for > some reason. I got the dual-refcounting to work again (after the regressions first showed up) in some cases briefly, but regressed in other testcases, which I don't recall the full details now. Indeed the copyout is now triggered inside the plugin, but it is still wrapped inside GOMP_PLUGIN_async_unmap_vars(), so it's probably not too ugly. Per our earlier internal discussion, I'm committing this to the gomp4 branch first. Trunk will need to wait for Jakub's approval. Thanks, Chung-Lin
Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
On 2015/12/3 06:32 PM, Chung-Lin Tang wrote: > On 2015/12/3 6:11 PM, Jakub Jelinek wrote: >> On Thu, Dec 03, 2015 at 06:05:36PM +0800, Chung-Lin Tang wrote: >>>> Oh wait, it looks like the C++ front end is not actually using the >>>> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its >>>> own implementations in gcc/cp/semantics.c, without "c_" prefixes? In >>>> addition to finish_expr_stmt calls, I see it's also using >>>> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec. >>>> So I guess we'll want to model this the same way for OpenACC support >>>> functions, and then (later) we should clean this up, to move the >>>> C-specific code from gcc/c-family/c-omp.c into the C front end? (Jakub?) >>> >>> I see most OpenACC/OpenMP constructs are represented by special statement >>> codes, >>> so they should be a different case. I so far only see the OpenACC wait >>> directive >>> being represented as a CALL_EXPR (maybe there are others, haven't >>> exhaustively searched). >> >> No, Thomas is right, just look at >> finish_omp_{barrier,flush,taskwait,taskyield,cancel,cancellation_point}, >> all those are represented as CALL_EXPRs. >> >> Jakub >> > > Okay, I guess my impression was only for some OpenACC constructs. > > Overall, OpenACC wait seems one of the few cases of using c_finish_* in > cp/parser.c. > Whether other cases should move towards/away from that kind of style is a > larger question, > I was only trying to fix a libgomp.oacc-c++/template-reduction.C regression > (testcase currently still in gomp4 branch) > > Chung-Lin > Per our internal discussion, I will commit this patch first to the gomp4 branch, while awaiting trunk approval. Thanks, Chung-Lin
Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
On 2015/12/3 6:11 PM, Jakub Jelinek wrote: > On Thu, Dec 03, 2015 at 06:05:36PM +0800, Chung-Lin Tang wrote: >>> Oh wait, it looks like the C++ front end is not actually using the >>> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its >>> own implementations in gcc/cp/semantics.c, without "c_" prefixes? In >>> addition to finish_expr_stmt calls, I see it's also using >>> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec. >>> So I guess we'll want to model this the same way for OpenACC support >>> functions, and then (later) we should clean this up, to move the >>> C-specific code from gcc/c-family/c-omp.c into the C front end? (Jakub?) >> >> I see most OpenACC/OpenMP constructs are represented by special statement >> codes, >> so they should be a different case. I so far only see the OpenACC wait >> directive >> being represented as a CALL_EXPR (maybe there are others, haven't >> exhaustively searched). > > No, Thomas is right, just look at > finish_omp_{barrier,flush,taskwait,taskyield,cancel,cancellation_point}, > all those are represented as CALL_EXPRs. > > Jakub > Okay, I guess my impression was only for some OpenACC constructs. Overall, OpenACC wait seems one of the few cases of using c_finish_* in cp/parser.c. Whether other cases should move towards/away from that kind of style is a larger question, I was only trying to fix a libgomp.oacc-c++/template-reduction.C regression (testcase currently still in gomp4 branch) Chung-Lin
Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
On 2015/12/3 4:59 PM, Thomas Schwinge wrote: > Hi! > > On Thu, 03 Dec 2015 09:51:31 +0100, I wrote: >> On Mon, 23 Nov 2015 21:15:00 +0800, Chung-Lin Tang <clt...@codesourcery.com> >> wrote: >>> The OpenACC wait directive is represented as a call to the runtime >>> function "GOACC_wait" instead of a tree code. I am seeing when >>> '#pragma acc wait' is using inside a template function, the CALL_EXPR >>> to GOACC_wait is being silently ignored/removed during tsubst_expr(). >> >> Uh. >> >>> I think the correct way to organize this is that the call should be inside >>> an EXPR_STMT, so here's a patch to do that; basically remove the >>> add_stmt() call from the shared c_finish_oacc_wait() code, and add >>> add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts. >>> >>> Tested with no regressions on trunk, okay to commit? >> >>> --- c-family/c-omp.c(revision 230703) >>> +++ c-family/c-omp.c(working copy) >>> @@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr >>> } >>> >>>stmt = build_call_expr_loc_vec (loc, stmt, args); >>> - add_stmt (stmt); >>> >>>vec_free (args); >> | >> |return stmt; >> | } >> >> I see in gcc/c/c-omp.c that several other c_finish_omp_* functions that >> build builtin calls instead of tree nodes, do similar things like >> c_finish_oacc_wait; I'd like to understand why it's -- presumably -- not >> a problem for these: c_finish_omp_barrier, c_finish_omp_taskwait, >> c_finish_omp_taskyield, c_finish_omp_flush? (Jakub?) > > Oh wait, it looks like the C++ front end is not actually using the > functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its > own implementations in gcc/cp/semantics.c, without "c_" prefixes? In > addition to finish_expr_stmt calls, I see it's also using > finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec. > So I guess we'll want to model this the same way for OpenACC support > functions, and then (later) we should clean this up, to move the > C-specific code from gcc/c-family/c-omp.c into the C front end? (Jakub?) I see most OpenACC/OpenMP constructs are represented by special statement codes, so they should be a different case. I so far only see the OpenACC wait directive being represented as a CALL_EXPR (maybe there are others, haven't exhaustively searched). Chung-Lin
Re: [gomp4] Adjust Fortran OACC async lib test
Ping. Hi Thomas, this is only for gomp4 ATM, okay to commit? Thanks, Chung-Lin On 2015/11/23 7:09 PM, Chung-Lin Tang wrote: > Hi Thomas, > this fix adds more acc_wait's to libgomp.oacc-fortran/lib-1[13].f90. > > For lib-12.f90, it's sort of a fix before we can resolve the issue > of intended semantics for "wait+async". > > As for lib-13.f90, I believe these added acc_wait calls seem > reasonable, since we can't immediately assume the async-launched parallels > already completed there. > > Does this seem reasonable? > > Thanks, > Chung-Lin > > * testsuite/libgomp.oacc-fortran/lib-12.f90 (main): Add acc_wait() > after async parallel construct. > * testsuite/libgomp.oacc-fortran/lib-13.f90 (main): Add acc_wait() > calls after parallel construct launches. >
Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
Ping. On 2015/11/23 9:15 PM, Chung-Lin Tang wrote: > The OpenACC wait directive is represented as a call to the runtime > function "GOACC_wait" instead of a tree code. I am seeing when > '#pragma acc wait' is using inside a template function, the CALL_EXPR > to GOACC_wait is being silently ignored/removed during tsubst_expr(). > > I think the correct way to organize this is that the call should be inside > an EXPR_STMT, so here's a patch to do that; basically remove the > add_stmt() call from the shared c_finish_oacc_wait() code, and add > add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts. > > Tested with no regressions on trunk, okay to commit? > > Thanks, > Chung-Lin > > * c-family/c-omp.c (c_finish_oacc_wait): Remove add_stmt() call. > * c/c-parser.c (c_parser_oacc_wait): Add add_stmt() call. > * cp/parser.c (cp_parser_oacc_wait): Add finish_expr_stmt() call. >
[PATCH, libgomp] Rewire OpenACC async
Hi, this patch reworks some of the way that asynchronous copyouts are implemented for OpenACC in libgomp. Before this patch, we had a somewhat confusing way of implementing this by having two refcounts for each mapping: refcount and async_refcount, which I never got working again after the last wave of async regressions showed up. So this patch implements what I believe to be a simplification: async_refcount is removed, and instead of trying to queue the async copyouts during unmapping we actually do that during the plugin event handling. This requires a addition of the async stream integer as an argument to the register_async_cleanup plugin hook, but overall I think this should be more elegant than before. This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression. It also fixed data-[23].c regressions before, but some other recent check-in happened to already fixed those. Tested without regressions, is this okay for trunk? Thanks, Chung-Lin 2015-11-24 Chung-Lin Tang <clt...@codesourcery.com> * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter. * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async' parameter, use to set async stream around call to gomp_unmap_vars, call gomp_unmap_vars() with 'do_copyfrom' set to true. * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field. (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP events and call GOMP_PLUGIN_async_unmap_vars() for each of them. (event_add): Add int parameter, initialize 'val' field when adding new ptx_event struct. (nvptx_evec): Adjust event_add() call arguments. (nvptx_host2dev): Likewise. (nvptx_dev2host): Likewise. (nvptx_wait_async): Likewise. (nvptx_wait_all_async): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter, pass to event_add() call. * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async' parameter. * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to call openacc.register_async_cleanup_func() hook. * oacc-parallel.c (GOACC_parallel_keyed): Likewise. * target.c (gomp_copy_from_async): Delete function. (gomp_map_vars): Remove async_refcount. (gomp_unmap_vars): Likewise. (gomp_load_image_to_device): Likewise. (omp_target_associate_ptr): Likewise. * libgomp.h (struct splay_tree_key_s): Remove async_refcount. (acc_dispatch_t.register_async_cleanup_func): Add int parameter. (gomp_copy_from_async): Remove. Index: plugin/plugin-nvptx.c === --- plugin/plugin-nvptx.c (revision 230796) +++ plugin/plugin-nvptx.c (working copy) @@ -310,6 +310,7 @@ struct ptx_event int type; void *addr; int ord; + int val; struct ptx_event *next; }; @@ -786,6 +787,7 @@ static void event_gc (bool memmap_lockable) { struct ptx_event *ptx_event = ptx_events; + struct ptx_event *async_cleanups = NULL; struct nvptx_thread *nvthd = nvptx_thread (); pthread_mutex_lock (_event_lock); @@ -803,6 +805,7 @@ event_gc (bool memmap_lockable) r = cuEventQuery (*e->evt); if (r == CUDA_SUCCESS) { + bool append_async = false; CUevent *te; te = e->evt; @@ -827,7 +830,7 @@ event_gc (bool memmap_lockable) if (!memmap_lockable) continue; - GOMP_PLUGIN_async_unmap_vars (e->addr); + append_async = true; } break; } @@ -835,6 +838,7 @@ event_gc (bool memmap_lockable) cuEventDestroy (*te); free ((void *)te); + /* Unlink 'e' from ptx_events list. */ if (ptx_events == e) ptx_events = ptx_events->next; else @@ -845,15 +849,31 @@ event_gc (bool memmap_lockable) e_->next = e_->next->next; } - free (e); + if (append_async) + { + e->next = async_cleanups; + async_cleanups = e; + } + else + free (e); } } pthread_mutex_unlock (_event_lock); + + /* We have to do these here, after ptx_event_lock is released. */ + while (async_cleanups) +{ + struct ptx_event *e = async_cleanups; + async_cleanups = async_cleanups->next; + + GOMP_PLUGIN_async_unmap_vars (e->addr, e->val); + free (e); +} } static void -event_add (enum ptx_event_type type, CUevent *e, void *h) +event_add (enum ptx_event_type type, CUevent *e, void *h, int val) { struct ptx_event *ptx_event; struct nvptx_thread *nvthd = nvptx_thread (); @@ -866,6 +886,7 @@ static void ptx_event->evt = e; ptx_event->addr = h; ptx_event->ord = nvthd->ptx_dev->
[gomp4] Adjust Fortran OACC async lib test
Hi Thomas, this fix adds more acc_wait's to libgomp.oacc-fortran/lib-1[13].f90. For lib-12.f90, it's sort of a fix before we can resolve the issue of intended semantics for "wait+async". As for lib-13.f90, I believe these added acc_wait calls seem reasonable, since we can't immediately assume the async-launched parallels already completed there. Does this seem reasonable? Thanks, Chung-Lin * testsuite/libgomp.oacc-fortran/lib-12.f90 (main): Add acc_wait() after async parallel construct. * testsuite/libgomp.oacc-fortran/lib-13.f90 (main): Add acc_wait() calls after parallel construct launches. Index: libgomp.oacc-fortran/lib-12.f90 === --- libgomp.oacc-fortran/lib-12.f90 (revision 230719) +++ libgomp.oacc-fortran/lib-12.f90 (working copy) @@ -15,6 +15,8 @@ program main end do !$acc end parallel + call acc_wait (0) + call acc_wait_async (0, 1) if (acc_async_test (0) .neqv. .TRUE.) call abort Index: libgomp.oacc-fortran/lib-13.f90 === --- libgomp.oacc-fortran/lib-13.f90 (revision 230719) +++ libgomp.oacc-fortran/lib-13.f90 (working copy) @@ -21,6 +21,9 @@ program main end do !$acc end data + call acc_wait (1) + call acc_wait (2) + if (acc_async_test (1) .neqv. .TRUE.) call abort if (acc_async_test (2) .neqv. .TRUE.) call abort
[PATCH, C++] Wrap OpenACC wait in EXPR_STMT
The OpenACC wait directive is represented as a call to the runtime function "GOACC_wait" instead of a tree code. I am seeing when '#pragma acc wait' is using inside a template function, the CALL_EXPR to GOACC_wait is being silently ignored/removed during tsubst_expr(). I think the correct way to organize this is that the call should be inside an EXPR_STMT, so here's a patch to do that; basically remove the add_stmt() call from the shared c_finish_oacc_wait() code, and add add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts. Tested with no regressions on trunk, okay to commit? Thanks, Chung-Lin * c-family/c-omp.c (c_finish_oacc_wait): Remove add_stmt() call. * c/c-parser.c (c_parser_oacc_wait): Add add_stmt() call. * cp/parser.c (cp_parser_oacc_wait): Add finish_expr_stmt() call. Index: c-family/c-omp.c === --- c-family/c-omp.c(revision 230703) +++ c-family/c-omp.c(working copy) @@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr } stmt = build_call_expr_loc_vec (loc, stmt, args); - add_stmt (stmt); vec_free (args); Index: c/c-parser.c === --- c/c-parser.c(revision 230703) +++ c/c-parser.c(working copy) @@ -13886,6 +13886,7 @@ c_parser_oacc_wait (location_t loc, c_parser *pars strcpy (p_name, " wait"); clauses = c_parser_oacc_all_clauses (parser, OACC_WAIT_CLAUSE_MASK, p_name); stmt = c_finish_oacc_wait (loc, list, clauses); + add_stmt (stmt); return stmt; } Index: cp/parser.c === --- cp/parser.c (revision 230703) +++ cp/parser.c (working copy) @@ -34930,6 +34930,7 @@ cp_parser_oacc_wait (cp_parser *parser, cp_token * "#pragma acc wait", pragma_tok); stmt = c_finish_oacc_wait (loc, list, clauses); + stmt = finish_expr_stmt (stmt); return stmt; }
Re: [PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit
On 2015/9/25 上午 04:27, Ilya Verbin wrote: > On Thu, Aug 27, 2015 at 21:44:50 +0800, Chung-Lin Tang wrote: >> We've discovered that, for several of the libgomp plugin interface routines, >> if the target specific routine calls exit() (usually upon a fatal condition), >> deadlock ensues. We found this using nvptx, but it's possible on intelmic as >> well. >> >> This is due to many of the plugin routines are called with the device lock >> held, >> and when exit() is called inside the plugin code, the GOMP_unregister_var() >> destructor >> tries to iterate through and acquire all device locks to cleanup. Since we >> already hold >> one of the device locks, this just gets stuck. Also because gomp_mutex_t is >> a >> simple futex based lock implementation (instead of pthreads), we don't have a >> trylock mechanism to use either. >> >> So this patch tries to alleviate this problem by changing the plugin >> interface; >> the plugin routines that are called while holding the device lock are >> adjusted >> to assume to never fatal exit, but return a value back to libgomp proper to >> indicate execution results. The core libgomp code then may unlock and call >> gomp_fatal(). >> >> We believe this is the right route to solve the problem, since there's only >> two accel target plugins so far. Besides the nvptx plugin, I have made some >> effort >> to update the intelmic plugin as well, though it's not as thoroughly audited. >> Intel folks might want to further make sure your plugin code is free of this >> problem as well. >> >> This patch contains the libgomp proper changes. The nvptx and intelmic >> patches follow. >> I have tested the libgomp testsuite without regressions for both accel >> targets, is this >> okay for trunk? > > (I have no objections) > > However, in case of intelmic, these exit()s are just the tip of the iceberg, > because underlying liboffloadmic contains other exit()s at fatal errors. > And I don't know what to do with such deadlocks. > > -- Ilya Yes, I think I saw more things to adjust wrt this issue within liboffloadmic, though I hope this plugin interface change can set things ready. And ping again, for the libgomp proper changes. Thanks, Chung-Lin
Re: [PATCH, libgomp] PR 67141, uninitialized acc_device_lock mutex
On 2015/9/18 04:02 PM, Jakub Jelinek wrote: > On Fri, Sep 18, 2015 at 03:41:30PM +0800, Chung-Lin Tang wrote: >> this patch fixes the uninitialized acc_device_lock mutex situation >> reported in PR 67141. The patch attached on the bugzilla page >> tries to solve it by constructor priorities, which we think will >> probably be less manageable in general. >> >> This patch changes goacc_host_init() to be called from >> goacc_runtime_initialize() instead, thereby ensuring the init order. >> libgomp testsuite was re-run without regressions, okay for trunk? >> >> Thanks, >> Chung-Lin >> >> 2015-09-18 Chung-Lin Tang <clt...@codesourcery.com> >> >> PR libgomp/67141 >> > > No vertical space in between PR line and subsequent entries. > >> * oacc-int.h (goacc_host_init): Add declaration. >> * oacc-host.c (goacc_host_init): Remove static and >> constructor attribute > > Full stop at the end of entry. > >> * oacc-init.c (goacc_runtime_initialize): Call goacc_host_init() >> at end. > > The patch is ok. Though, perhaps as a follow-up, I think I'd prefer getting > rid of pthread_key_create (_cleanup_key, goacc_destroy_thread);, > it is wasteful if we do the same thing in initialize_team. As the > goacc_tls_data pointer is __thread anyway, I think just putting it into > struct gomp_thread, arranging for init_team to be called from the env.c > ctor and from the team TLS destructor call also some oacc freeing if > the goacc_tls_data pointer is non-NULL (perhaps with __builtin_expect > unlikely). > > Jakub Committed, thanks for the review. I believe this patch is also needed for 5.x, okay for that branch as well? Thanks, Chung-Lin
[PATCH, nios2] Fix to nios2_legitimize_address
Nios II Linux had a bad TLS relocation generated, exposed by the test case for PR 65771. A fix for this in nios2_legitimize_address() was tested and applied. Chung-Lin 2015-09-22 Chung-Lin Tang <clt...@codesourcery.com> * config/nios2/nios2.c (nios2_legitimize_address): When handling 'reg + reloc' cases, allow first operand to be non-REG, and use force_reg() to enforce address pattern. Index: nios2.c === --- nios2.c (revision 227931) +++ nios2.c (working copy) @@ -2265,15 +2265,15 @@ Which will be output as '%tls_le(var+48)(r23)' in assembly. */ if (GET_CODE (x) == PLUS - && GET_CODE (XEXP (x, 0)) == REG && GET_CODE (XEXP (x, 1)) == CONST) { - rtx unspec, offset, reg = XEXP (x, 0); + rtx unspec, offset; split_const (XEXP (x, 1), , ); if (GET_CODE (unspec) == UNSPEC && !nios2_large_offset_p (XINT (unspec, 1)) && offset != const0_rtx) { + rtx reg = force_reg (Pmode, XEXP (x, 0)); unspec = copy_rtx (unspec); XVECEXP (unspec, 0, 0) = plus_constant (Pmode, XVECEXP (unspec, 0, 0), INTVAL (offset));
Re: [PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit
Ping x2. On 2015/9/9 04:08 PM, Chung-Lin Tang wrote: > Ping. > > On 2015/8/27 09:44 PM, Chung-Lin Tang wrote: >> We've discovered that, for several of the libgomp plugin interface routines, >> if the target specific routine calls exit() (usually upon a fatal condition), >> deadlock ensues. We found this using nvptx, but it's possible on intelmic as >> well. >> >> This is due to many of the plugin routines are called with the device lock >> held, >> and when exit() is called inside the plugin code, the GOMP_unregister_var() >> destructor >> tries to iterate through and acquire all device locks to cleanup. Since we >> already hold >> one of the device locks, this just gets stuck. Also because gomp_mutex_t is >> a >> simple futex based lock implementation (instead of pthreads), we don't have a >> trylock mechanism to use either. >> >> So this patch tries to alleviate this problem by changing the plugin >> interface; >> the plugin routines that are called while holding the device lock are >> adjusted >> to assume to never fatal exit, but return a value back to libgomp proper to >> indicate execution results. The core libgomp code then may unlock and call >> gomp_fatal(). >> >> We believe this is the right route to solve the problem, since there's only >> two accel target plugins so far. Besides the nvptx plugin, I have made some >> effort >> to update the intelmic plugin as well, though it's not as thoroughly audited. >> Intel folks might want to further make sure your plugin code is free of this >> problem as well. >> >> This patch contains the libgomp proper changes. The nvptx and intelmic >> patches follow. >> I have tested the libgomp testsuite without regressions for both accel >> targets, is this >> okay for trunk? >> >> Thanks, >> Chung-Lin >> >> 2015-08-27 Chung-Lin Tang <clt...@codesourcery.com> >> >> * oacc-host.c (host_init_device): Change return type to bool. >> (host_fini_device): Likewise. >> (host_dev2host): Likewise. >> (host_host2dev): Likewise. >> (host_free): Likewise. >> (host_alloc): Change return type to bool, change to use out >> parameter to return allocated pointer. >> * oacc-mem.c (acc_malloc): Adjust plugin hook declaration change, >> handle fatal error. >> (acc_free): Likewise. >> (acc_memcpy_to_device): Likewise. >> (acc_memcpy_from_device): Likewise. >> * oacc-init.c (acc_init_1): Handle gomp_init_device return code, >> handle fatal error. >> (acc_set_device_type): Likewise. >> (acc_set_device_num): Likewise. >> * target.c (gomp_map_vars): Adjust alloc_func plugin hook call, >> add device unlock, handle fatal error. >> (gomp_unmap_tgt): Change return type to bool, adjust free_func >> plugin call. >> (gomp_copy_from_async): Handle dev2host_func return code, handle >> fatal error. >> (gomp_unmap_vars): Likewise. >> (gomp_init_device): Change return type to bool, adjust call to >> init_device_func plugin hook. >> (GOMP_target): Adjust call to gomp_init_device, handle fatal error. >> (GOMP_target_data): Likewise. >> (GOMP_target_update): Likewise. >> * libgomp.h (gomp_device_descr.init_device_func): Change return >> type to bool. >> (gomp_device_descr.fini_device_func): Likewise. >> (gomp_device_descr.free_func): Likewise. >> (gomp_device_descr.dev2host_func): Likewise. >> (gomp_device_descr.host2dev_func) Likewise. >> (gomp_device_descr.alloc_func): Change return >> type to bool, use out parameter to return pointer. >> (gomp_init_device): Change return >> type to bool. >> >
[PATCH, libgomp] PR 67141, uninitialized acc_device_lock mutex
Hi, this patch fixes the uninitialized acc_device_lock mutex situation reported in PR 67141. The patch attached on the bugzilla page tries to solve it by constructor priorities, which we think will probably be less manageable in general. This patch changes goacc_host_init() to be called from goacc_runtime_initialize() instead, thereby ensuring the init order. libgomp testsuite was re-run without regressions, okay for trunk? Thanks, Chung-Lin 2015-09-18 Chung-Lin Tang <clt...@codesourcery.com> PR libgomp/67141 * oacc-int.h (goacc_host_init): Add declaration. * oacc-host.c (goacc_host_init): Remove static and constructor attribute * oacc-init.c (goacc_runtime_initialize): Call goacc_host_init() at end. Index: oacc-host.c === --- oacc-host.c (revision 227895) +++ oacc-host.c (working copy) @@ -256,7 +256,7 @@ static struct gomp_device_descr host_dispatch = }; /* Initialize and register this device type. */ -static __attribute__ ((constructor)) void +void goacc_host_init (void) { gomp_mutex_init (_dispatch.lock); Index: oacc-int.h === --- oacc-int.h (revision 227895) +++ oacc-int.h (working copy) @@ -97,6 +97,7 @@ void goacc_runtime_initialize (void); void goacc_save_and_set_bind (acc_device_t); void goacc_restore_bind (void); void goacc_lazy_initialize (void); +void goacc_host_init (void); #ifdef HAVE_ATTRIBUTE_VISIBILITY # pragma GCC visibility pop Index: oacc-init.c === --- oacc-init.c (revision 227895) +++ oacc-init.c (working copy) @@ -644,6 +644,9 @@ goacc_runtime_initialize (void) goacc_threads = NULL; gomp_mutex_init (_thread_lock); + + /* Initialize and register the 'host' device type. */ + goacc_host_init (); } /* Compiler helper functions */
Re: [gomp4] force global locks for nvptx targets
On 2015/9/9 04:02 AM, Cesar Philippidis wrote: > This patch forces GOACC_LOCK to use locks in global memory regardless if > the lock us for a worker or a gang. We were using a shared memory for > worker locks, but we ran into an issue with that would sporadically > involve deadlocks in worker reductions. We're still investigating that > issue, but for the time being, global locks appear to work albeit with a > lock contention penalty. > > I've applied this patch to gomp-4_0-branch. > > Cesar > Fixed typo, committed as obvious. Chung-Lin 2015-09-09 Chung-Lin Tang <clt...@codesourcery.com> * config/nvptx/nvptx.c (nvptx_xform_lock): Correct typo of variable 'force_global_locks'. Index: config/nvptx/nvptx.c === --- config/nvptx/nvptx.c(revision 227582) +++ config/nvptx/nvptx.c(working copy) @@ -3744,7 +3744,7 @@ nvptx_xform_lock (gimple stmt, const int *ARG_UNUS return mode > GOMP_DIM_WORKER; case IFN_GOACC_LOCK_INIT: - return force_global_lock || mode != GOMP_DIM_WORKER; + return force_global_locks || mode != GOMP_DIM_WORKER; default: gcc_unreachable(); }
Re: [PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit
Ping. On 2015/8/27 09:44 PM, Chung-Lin Tang wrote: > We've discovered that, for several of the libgomp plugin interface routines, > if the target specific routine calls exit() (usually upon a fatal condition), > deadlock ensues. We found this using nvptx, but it's possible on intelmic as > well. > > This is due to many of the plugin routines are called with the device lock > held, > and when exit() is called inside the plugin code, the GOMP_unregister_var() > destructor > tries to iterate through and acquire all device locks to cleanup. Since we > already hold > one of the device locks, this just gets stuck. Also because gomp_mutex_t is a > simple futex based lock implementation (instead of pthreads), we don't have a > trylock mechanism to use either. > > So this patch tries to alleviate this problem by changing the plugin > interface; > the plugin routines that are called while holding the device lock are adjusted > to assume to never fatal exit, but return a value back to libgomp proper to > indicate execution results. The core libgomp code then may unlock and call > gomp_fatal(). > > We believe this is the right route to solve the problem, since there's only > two accel target plugins so far. Besides the nvptx plugin, I have made some > effort > to update the intelmic plugin as well, though it's not as thoroughly audited. > Intel folks might want to further make sure your plugin code is free of this > problem as well. > > This patch contains the libgomp proper changes. The nvptx and intelmic > patches follow. > I have tested the libgomp testsuite without regressions for both accel > targets, is this > okay for trunk? > > Thanks, > Chung-Lin > > 2015-08-27 Chung-Lin Tang <clt...@codesourcery.com> > > * oacc-host.c (host_init_device): Change return type to bool. > (host_fini_device): Likewise. > (host_dev2host): Likewise. > (host_host2dev): Likewise. > (host_free): Likewise. > (host_alloc): Change return type to bool, change to use out > parameter to return allocated pointer. > * oacc-mem.c (acc_malloc): Adjust plugin hook declaration change, > handle fatal error. > (acc_free): Likewise. > (acc_memcpy_to_device): Likewise. > (acc_memcpy_from_device): Likewise. > * oacc-init.c (acc_init_1): Handle gomp_init_device return code, > handle fatal error. > (acc_set_device_type): Likewise. > (acc_set_device_num): Likewise. > * target.c (gomp_map_vars): Adjust alloc_func plugin hook call, > add device unlock, handle fatal error. > (gomp_unmap_tgt): Change return type to bool, adjust free_func > plugin call. > (gomp_copy_from_async): Handle dev2host_func return code, handle > fatal error. > (gomp_unmap_vars): Likewise. > (gomp_init_device): Change return type to bool, adjust call to > init_device_func plugin hook. > (GOMP_target): Adjust call to gomp_init_device, handle fatal error. > (GOMP_target_data): Likewise. > (GOMP_target_update): Likewise. > * libgomp.h (gomp_device_descr.init_device_func): Change return > type to bool. > (gomp_device_descr.fini_device_func): Likewise. > (gomp_device_descr.free_func): Likewise. > (gomp_device_descr.dev2host_func): Likewise. > (gomp_device_descr.host2dev_func) Likewise. > (gomp_device_descr.alloc_func): Change return > type to bool, use out parameter to return pointer. > (gomp_init_device): Change return > type to bool. >
[PATCH] Propagate -fdiagnostics-* options in lto-wrapper
Hi, Currently most non-target specific options are skipped when crossing the LTO/offload processing border, however since there are still quite a number of warning calls in many target backends, it makes sense to save and propagate the associated options, to preserve consistency in warning behavior. For example, currently: $ x86_64-pc-linux-gnu-gcc -fopenacc test.c -fno-diagnostics-show-caret y.c: In function 'main._omp_fn.0': y.c:6:11: warning: using num_workers (32), ignoring 500 #pragma acc parallel num_workers(500) ^ (note: this warning message is triggered by nvptx code currently only on gomp-4_0-branch, but illustrates the point) The caret stills shows, because -fno-diagnostics-show-caret does not reach the accel compiler. -flto should also have a similar issue. The attached patch allows a series of -fdiagnostics-* options to be propagated by lto-wrapper. I've tested this patch without regressions, is this okay for trunk? Thanks, Chung-Lin 2015-09-06 Chung-Lin Tang <clt...@codesourcery.com> * lto-wrapper.c (merge_and_complain): Add OPT_fdiagnostics_show_caret, OPT_fdiagnostics_show_option, OPT_fdiagnostics_show_location_, and OPT_fshow_column to handled saved option cases. (append_compiler_options): Do not skip the above added options. Index: lto-wrapper.c === --- lto-wrapper.c (revision 227508) +++ lto-wrapper.c (working copy) @@ -232,6 +232,10 @@ merge_and_complain (struct cl_decoded_option **dec break; /* Fallthru. */ + case OPT_fdiagnostics_show_caret: + case OPT_fdiagnostics_show_option: + case OPT_fdiagnostics_show_location_: + case OPT_fshow_column: case OPT_fPIC: case OPT_fpic: case OPT_fPIE: @@ -479,6 +483,10 @@ append_compiler_options (obstack *argv_obstack, st on any CL_TARGET flag and a few selected others. */ switch (option->opt_index) { + case OPT_fdiagnostics_show_caret: + case OPT_fdiagnostics_show_option: + case OPT_fdiagnostics_show_location_: + case OPT_fshow_column: case OPT_fPIC: case OPT_fpic: case OPT_fPIE:
[PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit
We've discovered that, for several of the libgomp plugin interface routines, if the target specific routine calls exit() (usually upon a fatal condition), deadlock ensues. We found this using nvptx, but it's possible on intelmic as well. This is due to many of the plugin routines are called with the device lock held, and when exit() is called inside the plugin code, the GOMP_unregister_var() destructor tries to iterate through and acquire all device locks to cleanup. Since we already hold one of the device locks, this just gets stuck. Also because gomp_mutex_t is a simple futex based lock implementation (instead of pthreads), we don't have a trylock mechanism to use either. So this patch tries to alleviate this problem by changing the plugin interface; the plugin routines that are called while holding the device lock are adjusted to assume to never fatal exit, but return a value back to libgomp proper to indicate execution results. The core libgomp code then may unlock and call gomp_fatal(). We believe this is the right route to solve the problem, since there's only two accel target plugins so far. Besides the nvptx plugin, I have made some effort to update the intelmic plugin as well, though it's not as thoroughly audited. Intel folks might want to further make sure your plugin code is free of this problem as well. This patch contains the libgomp proper changes. The nvptx and intelmic patches follow. I have tested the libgomp testsuite without regressions for both accel targets, is this okay for trunk? Thanks, Chung-Lin 2015-08-27 Chung-Lin Tang clt...@codesourcery.com * oacc-host.c (host_init_device): Change return type to bool. (host_fini_device): Likewise. (host_dev2host): Likewise. (host_host2dev): Likewise. (host_free): Likewise. (host_alloc): Change return type to bool, change to use out parameter to return allocated pointer. * oacc-mem.c (acc_malloc): Adjust plugin hook declaration change, handle fatal error. (acc_free): Likewise. (acc_memcpy_to_device): Likewise. (acc_memcpy_from_device): Likewise. * oacc-init.c (acc_init_1): Handle gomp_init_device return code, handle fatal error. (acc_set_device_type): Likewise. (acc_set_device_num): Likewise. * target.c (gomp_map_vars): Adjust alloc_func plugin hook call, add device unlock, handle fatal error. (gomp_unmap_tgt): Change return type to bool, adjust free_func plugin call. (gomp_copy_from_async): Handle dev2host_func return code, handle fatal error. (gomp_unmap_vars): Likewise. (gomp_init_device): Change return type to bool, adjust call to init_device_func plugin hook. (GOMP_target): Adjust call to gomp_init_device, handle fatal error. (GOMP_target_data): Likewise. (GOMP_target_update): Likewise. * libgomp.h (gomp_device_descr.init_device_func): Change return type to bool. (gomp_device_descr.fini_device_func): Likewise. (gomp_device_descr.free_func): Likewise. (gomp_device_descr.dev2host_func): Likewise. (gomp_device_descr.host2dev_func) Likewise. (gomp_device_descr.alloc_func): Change return type to bool, use out parameter to return pointer. (gomp_init_device): Change return type to bool. Index: libgomp/libgomp.h === --- libgomp/libgomp.h (revision 227257) +++ libgomp/libgomp.h (working copy) @@ -746,15 +746,15 @@ struct gomp_device_descr unsigned int (*get_caps_func) (void); int (*get_type_func) (void); int (*get_num_devices_func) (void); - void (*init_device_func) (int); - void (*fini_device_func) (int); + bool (*init_device_func) (int); + bool (*fini_device_func) (int); unsigned (*version_func) (void); int (*load_image_func) (int, unsigned, const void *, struct addr_pair **); void (*unload_image_func) (int, unsigned, const void *); - void *(*alloc_func) (int, size_t); - void (*free_func) (int, void *); - void *(*dev2host_func) (int, void *, const void *, size_t); - void *(*host2dev_func) (int, void *, const void *, size_t); + bool (*alloc_func) (int, size_t, void**); + bool (*free_func) (int, void *); + bool (*dev2host_func) (int, void *, const void *, size_t); + bool (*host2dev_func) (int, void *, const void *, size_t); void (*run_func) (int, void *, void *); /* Splay tree containing information about mapped memory regions. */ @@ -780,7 +780,7 @@ extern struct target_mem_desc *gomp_map_vars (stru size_t *, void *, bool, bool); extern void gomp_copy_from_async (struct target_mem_desc *); extern void gomp_unmap_vars (struct target_mem_desc *, bool); -extern void gomp_init_device (struct gomp_device_descr *); +extern bool gomp_init_device (struct gomp_device_descr *); extern void gomp_free_memmap (struct splay_tree_s
[PATCH 2/3, libgomp] nvptx plugin parts
These are the nvptx plugin specific parts. Chung-Lin * plugin/plugin-nvptx.c (CUDA_CALL_ERET): New convenience macro. (CUDA_CALL): Likewise. (CUDA_CALL_ASSERT): Likewise. (map_init): Change return type to bool, use CUDA_CALL* macros. (map_fini): Likewise. (init_streams_for_device): Change return type to bool, adjust call to map_init. (fini_streams_for_device): Change return type to bool, adjust call to map_fini. (select_stream_for_async): Release stream_lock before calls to GOMP_PLUGIN_fatal, adjust call to map_init. (nvptx_init): Use CUDA_CALL* macros. (nvptx_attach_host_thread_to_device): Change return type to bool, use CUDA_CALL* macros. (nvptx_open_device): Use CUDA_CALL* macros. (nvptx_close_device): Change return type to bool, use CUDA_CALL* macros. (nvptx_get_num_devices): Use CUDA_CALL* macros. (link_ptx): Change return type to bool, use CUDA_CALL* macros. (nvptx_exec): Use CUDA_CALL* macros. (nvptx_alloc): Change return type to bool, use CUDA_CALL* macros, change to use out parameter to return allocated pointer. (nvptx_free): Change return type to bool, use CUDA_CALL* macros. (nvptx_host2dev): Likewise. (nvptx_dev2host): Likewise. (nvptx_wait): Use CUDA_CALL* macros. (nvptx_wait_async): Likewise. (nvptx_wait_all): Likewise. (nvptx_wait_all_async): Likewise. (nvptx_set_cuda_stream): Adjust order of stream_lock acquire, use CUDA_CALL* macros, adjust call to map_fini. (GOMP_OFFLOAD_init_device): Change return type to bool, adjust code accordingly. (GOMP_OFFLOAD_fini_device): Likewise. (GOMP_OFFLOAD_load_image): Adjust calls to nvptx_attach_host_thread_to_device/link_ptx to handle errors, use CUDA_CALL* macros. (GOMP_OFFLOAD_alloc): Change return type to bool, adjust calls to code to handle error return. (GOMP_OFFLOAD_free): Likewise. (GOMP_OFFLOAD_dev2host): Likewise. (GOMP_OFFLOAD_host2dev): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Use CUDA_CALL* macros. (GOMP_OFFLOAD_openacc_create_thread_data): Likewise. Index: libgomp/plugin/plugin-nvptx.c === --- libgomp/plugin/plugin-nvptx.c (revision 227257) +++ libgomp/plugin/plugin-nvptx.c (working copy) @@ -127,6 +127,34 @@ cuda_error (CUresult r) return errmsg; } +/* Convenience macros for the frequently used CUDA library call and + error handling sequence. This does not capture all the cases we + use in this file, but is common enough. */ + +#define CUDA_CALL_ERET(ERET, FN, ...) \ + do { \ +unsigned __r = FN (__VA_ARGS__); \ +if (__r != CUDA_SUCCESS) \ + { \ + GOMP_PLUGIN_error (#FN error: %s, \ + cuda_error (__r)); \ + return ERET;\ + } \ + } while (0) + +#define CUDA_CALL(FN, ...) \ + CUDA_CALL_ERET (false, (FN), __VA_ARGS__) + +#define CUDA_CALL_ASSERT(FN, ...) \ + do { \ +unsigned __r = FN (__VA_ARGS__); \ +if (__r != CUDA_SUCCESS) \ + { \ + GOMP_PLUGIN_fatal (#FN error: %s, \ + cuda_error (__r)); \ + } \ + } while (0) + static unsigned int instantiated_devices = 0; static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER; @@ -162,25 +190,18 @@ struct map charmappings[0]; }; -static void +static bool map_init (struct ptx_stream *s) { - CUresult r; - int size = getpagesize (); assert (s); assert (!s-d); assert (!s-h); - r = cuMemAllocHost (s-h, size); - if (r != CUDA_SUCCESS) -GOMP_PLUGIN_fatal (cuMemAllocHost error: %s, cuda_error (r)); + CUDA_CALL (cuMemAllocHost, s-h, size); + CUDA_CALL (cuMemHostGetDevicePointer, s-d, s-h, 0); - r = cuMemHostGetDevicePointer (s-d, s-h, 0); - if (r != CUDA_SUCCESS) -GOMP_PLUGIN_fatal (cuMemHostGetDevicePointer error: %s, cuda_error (r)); - assert (s-h); s-h_begin = s-h; @@ -189,16 +210,14 @@ map_init (struct ptx_stream *s) assert (s-h_next); assert (s-h_end); + return true; } -static void +static bool map_fini (struct ptx_stream *s) { - CUresult r; - - r = cuMemFreeHost (s-h); - if (r != CUDA_SUCCESS) -GOMP_PLUGIN_fatal (cuMemFreeHost error: %s, cuda_error (r)); + CUDA_CALL (cuMemFreeHost, s-h); + return true; } static void @@ -359,7 +378,7 @@ nvptx_thread (void) return (struct nvptx_thread *) GOMP_PLUGIN_acc_thread (); } -static void +static bool init_streams_for_device (struct ptx_device *ptx_dev, int concurrency) { int i; @@ -371,9 +390,10 @@ init_streams_for_device (struct ptx_device *ptx_de null_stream-multithreaded = true; null_stream-d = (CUdeviceptr) NULL; null_stream-h = NULL; - map_init (null_stream); - ptx_dev-null_stream = null_stream; + if (!map_init (null_stream))
[PATCH 3/3, libgomp] intelmic specific parts
These are the intelmic plugin specific parts (actually beneath liboffloadmic instead of libgomp). The changes are basically to expose the return value of offload() back to libgomp. I only checked parts of the plugin, it appears that there may still be code in the liboffloadmic runtime that can call exit() while holding the lock, so Intel folks might want to audit more thoroughly later. Chung-Lin * plugin/libgomp-plugin-intelmic.cpp (offload): Change return type to bool, adjust return code. (GOMP_OFFLOAD_init_device): Likewise. (GOMP_OFFLOAD_fini_device): Likewise. (get_target_table): Likewise. (offload_image): Likwise. (GOMP_OFFLOAD_load_image): Adjust call to offload_image(), change exit() to return error. (GOMP_OFFLOAD_alloc): Change return type to bool, change to use out parameter to return allocated pointer. (GOMP_OFFLOAD_free): Change return type to bool, adjust return code. (GOMP_OFFLOAD_host2dev): Likewise. (GOMP_OFFLOAD_dev2host): Likewise. Index: liboffloadmic/plugin/libgomp-plugin-intelmic.cpp === --- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp (revision 227257) +++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp (working copy) @@ -184,17 +184,18 @@ GOMP_OFFLOAD_get_num_devices (void) return num_devices; } -static void +static bool offload (const char *file, uint64_t line, int device, const char *name, int num_vars, VarDesc *vars, VarDesc2 *vars2) { OFFLOAD ofld = __offload_target_acquire1 (device, file, line); if (ofld) -__offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL, NULL); +return __offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, + NULL, NULL); else { - fprintf (stderr, %s:%d: Offload target acquire failed\n, file, line); - exit (1); + GOMP_PLUGIN_error (%s:%d: Offload target acquire failed\n, file, line); + return false; } } @@ -206,24 +207,25 @@ register_main_image () /* liboffloadmic loads and runs offload_target_main on all available devices during a first call to offload (). */ -extern C void +extern C bool GOMP_OFFLOAD_init_device (int device) { TRACE (); pthread_once (main_image_is_registered, register_main_image); - offload (__FILE__, __LINE__, device, __offload_target_init_proc, 0, - NULL, NULL); + return offload (__FILE__, __LINE__, device, __offload_target_init_proc, 0, + NULL, NULL); } -extern C void +extern C bool GOMP_OFFLOAD_fini_device (int device) { TRACE (); /* Unreachable for GOMP_OFFLOAD_CAP_OPENMP_400. */ abort (); + return true; } -static void +static bool get_target_table (int device, int num_funcs, int num_vars, void **table) { VarDesc vd1[2] = { vd_tgt2host, vd_tgt2host }; @@ -233,8 +235,9 @@ get_target_table (int device, int num_funcs, int vd1[1].size = sizeof (num_vars); VarDesc2 vd1g[2] = { { num_funcs, 0 }, { num_vars, 0 } }; - offload (__FILE__, __LINE__, device, __offload_target_table_p1, 2, - vd1, vd1g); + if (!offload (__FILE__, __LINE__, device, __offload_target_table_p1, 2, + vd1, vd1g)) +return false; int table_size = num_funcs + 2 * num_vars; if (table_size 0) @@ -247,15 +250,16 @@ get_target_table (int device, int num_funcs, int vd2.size = table_size * sizeof (void *); VarDesc2 vd2g = { table, 0 }; - offload (__FILE__, __LINE__, device, __offload_target_table_p2, 1, - vd2, vd2g); + return offload (__FILE__, __LINE__, device, __offload_target_table_p2, + 1, vd2, vd2g); } + return true; } /* Offload TARGET_IMAGE to all available devices and fill address_table with corresponding target addresses. */ -static void +static bool offload_image (const void *target_image) { struct TargetImage { @@ -277,8 +281,8 @@ offload_image (const void *target_image) + image_size); if (!image) { - fprintf (stderr, %s: Can't allocate memory\n, __FILE__); - exit (1); + GOMP_PLUGIN_error (%s: Can't allocate memory\n, __FILE__); + return false; } image-size = image_size; @@ -291,13 +295,14 @@ offload_image (const void *target_image) /* Receive tables for target_image from all devices. */ DevAddrVect dev_table; + bool ret = true; for (int dev = 0; dev num_devices; dev++) { int num_funcs = 0; int num_vars = 0; void **table = NULL; - get_target_table (dev, num_funcs, num_vars, table); + ret = get_target_table (dev, num_funcs, num_vars, table); AddrVect curr_dev_table; @@ -326,6 +331,7 @@ offload_image (const void *target_image) address_table-insert (std::make_pair (target_image, dev_table)); free (image); + return ret; } /* Return the libgomp version number we're compatible with. There is @@ -351,15 +357,19 @@ GOMP_OFFLOAD_load_image (int device, const
[PATCH, nios2] Remove unused header from libgcc linux-atomic.c
The asm/unistd.h header was used back when Nios II Linux used a syscall cmpxchg, long since removed and actually never got into the FSF trunk. Patch removes the #include, and the following error code #defines which are all no longer used. Committed. Chung-Lin 2015-07-22 Chung-Lin Tang clt...@codesourcery.com * config/nios2/linux-atomic.c (asm/unistd.h): Remove #include. (EFAULT,EBUSY,ENOSYS): Delete unused #defines. Index: config/nios2/linux-atomic.c === --- config/nios2/linux-atomic.c (revision 226061) +++ config/nios2/linux-atomic.c (working copy) @@ -20,11 +20,6 @@ a copy of the GCC Runtime Library Exception along see the files COPYING3 and COPYING.RUNTIME respectively. If not, see http://www.gnu.org/licenses/. */ -#include asm/unistd.h -#define EFAULT 14 -#define EBUSY 16 -#define ENOSYS 38 - /* We implement byte, short and int versions of each atomic operation using the kernel helper defined below. There is no support for 64-bit operations yet. */
Re: [PATCH, gomp4] Propagate independent clause for OpenACC kernels pass
On 15/7/14 3:00 PM, Jakub Jelinek wrote: On Tue, Jul 14, 2015 at 01:46:04PM +0800, Chung-Lin Tang wrote: this patch provides a 'bool independent' field in struct loop, which will be switched on by an independent clause in a #pragma acc loop directive. I assume you'll be wiring it to the kernels parloops pass in a followup patch. Note: there are already a few other similar fields in struct loop, namely 'safelen' and 'can_be_parallel', used by OMP simd safelen and GRAPHITE respectively. The intention and/or setting of these fields are all a bit different, so I've decided to add a new bool for OpenACC. How is it different though? Can you cite exact definition of the independent clause vs. safelen (set to INT_MAX)? The OpenMP definition is: A SIMD loop has logical iterations numbered 0,1,...,N-1 where N is the number of loop iterations, and the logical numbering denotes the sequence in which the iterations would be executed if the associated loop(s) were executed with no SIMD instructions. If the safelen clause is used then no two iterations executed concurrently with SIMD instructions can have a greater distance in the logical iteration space than its value. ... Lexical forward dependencies in the iterations of the original loop must be preserved within each SIMD chunk. The wording of OpenACC independent is more simple: ... the independent clause tells the implementation that the iterations of this loop are data-independent with respect to each other. -- OpenACC spec 2.7.9 I would say this implies even more relaxed conditions than OpenMP simd safelen, essentially saying that the compiler doesn't even need dependence analysis; just assume independence of iterations. So e.g. safelen = 32 means for PTX you can safely implement it by running up to 32 consecutive iterations by all threads in the warp (assuming code that for some reason must be run by a single thread (e.g. calls to functions that are marked so that they expect to be run by the first thread in a warp initially) is run sequentially by increasing iterator), but it doesn't mean the iterations have no dependencies in between them whatsoever (see the above note about lexical forward dependencies), so you can't parallelize it by assigning different iterations to different threads outside of warp (or pthread_create created threads). So if OpenACC independent means there are no dependencies in between iterations, the OpenMP counterpart here is #pragma omp for simd schedule (auto) or #pragma omp distribute parallel for simd schedule (auto). schedule(auto) appears to correspond to the OpenACC 'auto' clause, or what is implied in a kernels compute construct, but I'm not sure it implies no dependencies between iterations? Putting aside the semantic issues, as of currently safelen0 turns on a certain amount of vectorization code that we are not currently using (and not likely at all for nvptx). Right now, we're just trying to pass the new flag to a kernels tree-parloops based pass. Maybe this can all be reconciled later in a more precise way, e.g. have flags that correspond specifically to phases of internal compiler passes (and selected by needs of the accel target), instead of ones that are sort of associated with high-level language features. Chung-Lin
[PATCH, gomp4] Propagate independent clause for OpenACC kernels pass
Hi Tom, this patch provides a 'bool independent' field in struct loop, which will be switched on by an independent clause in a #pragma acc loop directive. I assume you'll be wiring it to the kernels parloops pass in a followup patch. Note: there are already a few other similar fields in struct loop, namely 'safelen' and 'can_be_parallel', used by OMP simd safelen and GRAPHITE respectively. The intention and/or setting of these fields are all a bit different, so I've decided to add a new bool for OpenACC. Tested and committed to gomp-4_0-branch. Chung-Lin 2015-07-14 Chung-Lin Tang clt...@codesourcery.com * cfgloop.h (struct loop): Add 'bool marked_independent' field. * gimplify.c (gimplify_scan_omp_clauses): Keep OMP_CLAUSE_INDEPENDENT. * omp-low.c (struct omp_region): Add 'int kind' and 'bool independent' fields. (expand_omp_for): Set 'marked_independent' field for loop corresponding to region. (find_omp_for_region_data): New function. (find_omp_target_region_data): Set kind field. (build_omp_regions_1): Call find_omp_for_region_data() for GIMPLE_OMP_FOR statements. Index: cfgloop.h === --- cfgloop.h (revision 225758) +++ cfgloop.h (working copy) @@ -194,6 +194,10 @@ struct GTY ((chain_next (%h.next))) loop { /* True if the loop is part of an oacc kernels region. */ bool in_oacc_kernels_region; + /* True if loop is tagged as having independent iterations by user, + e.g. the OpenACC independent clause. */ + bool marked_independent; + /* For SIMD loops, this is a unique identifier of the loop, referenced by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE builtins. */ Index: gimplify.c === --- gimplify.c (revision 225758) +++ gimplify.c (working copy) @@ -6602,7 +6602,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se break; case OMP_CLAUSE_DEVICE_RESIDENT: - case OMP_CLAUSE_INDEPENDENT: remove = true; break; @@ -6612,6 +6611,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se case OMP_CLAUSE_COLLAPSE: case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: + case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_MERGEABLE: case OMP_CLAUSE_PROC_BIND: case OMP_CLAUSE_SAFELEN: Index: omp-low.c === --- omp-low.c (revision 225758) +++ omp-low.c (working copy) @@ -136,8 +136,16 @@ struct omp_region /* True if this is nested inside an OpenACC kernels construct. */ bool inside_kernels_p; + /* Records a generic kind field. */ + int kind; + /* For an OpenACC loop, the level of parallelism requested. */ int gwv_this; + + /* For an OpenACC loop directive, true if has the 'independent' clause. */ + bool independent; + + tree broadcast_array; }; /* Context structure. Used to store information about each parallel @@ -8273,8 +8281,15 @@ expand_omp_for (struct omp_region *region, gimple loops_state_set (LOOPS_NEED_FIXUP); if (region-inside_kernels_p) -expand_omp_for_generic (region, fd, BUILT_IN_NONE, BUILT_IN_NONE, - inner_stmt); +{ + expand_omp_for_generic (region, fd, BUILT_IN_NONE, BUILT_IN_NONE, + inner_stmt); + if (region-independent region-cont-loop_father) + { + struct loop *loop = region-cont-loop_father; + loop-marked_independent = true; + } +} else if (gimple_omp_for_kind (fd.for_stmt) GF_OMP_FOR_SIMD) expand_omp_simd (region, fd); else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_CILKFOR) @@ -9943,6 +9958,34 @@ find_omp_for_region_gwv (gimple stmt) return tmp; } +static void +find_omp_for_region_data (struct omp_region *region, gomp_for *stmt) +{ + region-gwv_this = find_omp_for_region_gwv (stmt); + region-kind = gimple_omp_for_kind (stmt); + + if (region-kind == GF_OMP_FOR_KIND_OACC_LOOP) +{ + struct omp_region *target_region = region-outer; + while (target_region + target_region-type != GIMPLE_OMP_TARGET) + target_region = target_region-outer; + if (!target_region) + return; + + tree clauses = gimple_omp_for_clauses (stmt); + + if (target_region-kind == GF_OMP_TARGET_KIND_OACC_PARALLEL + !find_omp_clause (clauses, OMP_CLAUSE_SEQ)) + /* In OpenACC parallel constructs, 'independent' is implied on all + loop directives without a 'seq' clause. */ + region-independent = true; + else if (target_region-kind == GF_OMP_TARGET_KIND_OACC_KERNELS + find_omp_clause (clauses, OMP_CLAUSE_INDEPENDENT)) + region-independent = true; +} +} + /* Fill in additional data for a region REGION associated with an OMP_TARGET STMT. */ @@ -9960,6 +10003,7 @@ find_omp_target_region_data (struct omp_region *re region-gwv_this |= OACC_LOOP_MASK (OACC_worker); if (find_omp_clause (clauses, OMP_CLAUSE_VECTOR_LENGTH
[gomp4] implicit firstprivate and other testcase fixes
This patch notices the index variable of an acc loop (internally an OMP_FOR) inside an OpenACC construct, and completes the implicit firstprivate behavior as described in the spec. The firstprivate clauses and FIXME in libgomp.oacc-c-c++-common/parallel-loop-2.h has also been removed together in the patch. Also a typo-bug in testcase libgomp.oacc-c-c++-common/reduction-4.c is also corrected, where reduction variable names are apparently wrong. Tested without regressions, and applied to gomp-4_0-branch. Chung-Lin 2015-07-01 Chung-Lin Tang clt...@codesourcery.com gcc/ * gimplify.c (gimplify_omp_for): For acc loops inside OpenACC constructs, notice the use of the index variable in the surrounding gimplify_omp_ctx. libgomp/ * testsuite/libgomp.oacc-c-c++-common/reduction-4.c (main): Correct the names of reduction variables in '' and '||' tests. * testsuite/libgomp.oacc-c-c++-common/parallel-loop-2.h: Remove uses of the firstprivate clause, remove FIXME comment. Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 225248) +++ gcc/gimplify.c (working copy) @@ -7348,7 +7348,11 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) else if (omp_is_private (gimplify_omp_ctxp, decl, 0)) omp_notice_variable (gimplify_omp_ctxp, decl, true); else - omp_add_variable (gimplify_omp_ctxp, decl, GOVD_PRIVATE | GOVD_SEEN); + { + if (ork == ORK_OACC gimplify_omp_ctxp-outer_context) + omp_notice_variable (gimplify_omp_ctxp-outer_context, decl, true); + omp_add_variable (gimplify_omp_ctxp, decl, GOVD_PRIVATE | GOVD_SEEN); + } /* If DECL is not a gimple register, create a temporary variable to act as an iteration counter. This is valid, since DECL cannot be Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c === --- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c (revision 225248) +++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c (working copy) @@ -59,14 +59,14 @@ main(void) lvresult = false; /* '' reductions. */ -#pragma acc parallel num_gangs (ng) copy (result) +#pragma acc parallel num_gangs (ng) copy (lresult) #pragma acc loop reduction (:lresult) gang for (i = 0; i n; i++) lresult = lresult (creal(result) creal(array[i])); /* Verify the reduction. */ for (i = 0; i n; i++) -lvresult = lresult (creal(result) creal(array[i])); +lvresult = lvresult (creal(result) creal(array[i])); if (lresult != lvresult) abort (); @@ -78,14 +78,14 @@ main(void) lvresult = false; /* '||' reductions. */ -#pragma acc parallel num_gangs (ng) copy (result) +#pragma acc parallel num_gangs (ng) copy (lresult) #pragma acc loop reduction (||:lresult) gang for (i = 0; i n; i++) lresult = lresult || (creal(result) creal(array[i])); /* Verify the reduction. */ for (i = 0; i n; i++) -lvresult = lresult || (creal(result) creal(array[i])); +lvresult = lvresult || (creal(result) creal(array[i])); if (lresult != lvresult) abort (); Index: libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-loop-2.h === --- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-loop-2.h (revision 225248) +++ libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-loop-2.h (working copy) @@ -1,5 +1,3 @@ -/* FIXME: Remove the firstprivate clauses from the paralle regions. */ - #ifndef VARS #define VARS int a[1500]; @@ -19,7 +17,7 @@ __attribute__((noinline, noclone)) void N(f0) (void) { int i; -#pragma acc parallel loop L F firstprivate (i) +#pragma acc parallel loop L F for (i = 0; i 1500; i++) a[i] += 2; } @@ -36,7 +34,7 @@ __attribute__((noinline, noclone)) void N(f2) (void) { unsigned long long i; -#pragma acc parallel loop L F firstprivate (i) +#pragma acc parallel loop L F for (i = __LONG_LONG_MAX__ + 4500ULL - 27; i __LONG_LONG_MAX__ - 27ULL; i -= 3) a[(i + 26LL - __LONG_LONG_MAX__) / 3] -= 4; @@ -54,7 +52,7 @@ __attribute__((noinline, noclone)) void N(f4) (void) { unsigned int i; -#pragma acc parallel loop L F firstprivate (i) +#pragma acc parallel loop L F for (i = 30; i 20; i += 2) a[i] += 10; } @@ -64,7 +62,7 @@ N(f5) (int n11, int n12, int n21, int n22, int n31 int s1, int s2, int s3) { SC int v1, v2, v3; -#pragma acc parallel loop L F firstprivate (v1, v2, v3) +#pragma acc parallel loop L F for (v1 = n11; v1 n12; v1 += s1) #pragma acc loop S for (v2 = n21; v2 n22; v2 += s2) @@ -78,7 +76,7 @@ N(f6) (int n11, int n12, int n21, int n22, long lo { SC int v1, v2; SC long long v3; -#pragma acc parallel loop L F firstprivate (v1, v2, v3) +#pragma acc parallel loop L F for (v1
Re: [patch] fix regrename pass to ensure renamings produce valid insns
On 2015/6/30 05:06 PM, Eric Botcazou wrote: I notice the way gcc_assert() is defined in system.h now, the test won't disappear even when runtime checks are disabled, though you might still adjust it to avoid any programmer confusion. It will disappear at run time, see the definition: /* Include EXPR, so that unused variable warnings do not occur. */ #define gcc_assert(EXPR) ((void)(0 (EXPR))) so you really need to use a separate variable. I was referring to this one: #if ENABLE_ASSERT_CHECKING ... #elif (GCC_VERSION = 4005) #define gcc_assert(EXPR)\ ((void)(__builtin_expect (!(EXPR), 0) ? __builtin_unreachable (), 0 : 0)) #else ... But yeah, I guess older GCCs could be used to build a toolchain, so a separate variable should be used. Chung-Lin
Re: [patch] fix regrename pass to ensure renamings produce valid insns
On 2015/6/30 12:22 PM, Sandra Loosemore wrote: On 06/29/2015 09:07 PM, Kito Cheng wrote: Hi all: This patch seem will broken when disable assert checking for c6x Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 225104) +++ gcc/config/c6x/c6x.c (working copy) @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx best_reg = find_rename_reg (this_head, super_class, unavailable, old_reg, true); - regrename_do_replace (this_head, best_reg); + gcc_assert (regrename_do_replace (this_head, best_reg)); count_unit_reqs (new_reqs, head, PREV_INSN (tail)); merge_unit_reqs (new_reqs); @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx unit_req_imbalance (reqs), unit_req_imbalance (new_reqs)); } if (unit_req_imbalance (new_reqs) unit_req_imbalance (reqs)) -regrename_do_replace (this_head, old_reg); +gcc_assert (regrename_do_replace (this_head, old_reg)); else memcpy (reqs, new_reqs, sizeof (unit_req_table)); I'm sorry; do you have a suggestion for a fix? I thought this was the change I was asked to make, and as I noted previously, I'm not set up to test (or even build) for this target. -Sandra the obviously confused :-( You probably have to separate out the regrename_do_replace() bool result into a variable, placing the whole call into the gcc_assert() might make it disappear when assertions are turned off. Chung-Lin
Re: [patch] fix regrename pass to ensure renamings produce valid insns
On 2015/6/30 下午 01:13, Chung-Lin Tang wrote: On 2015/6/30 12:22 PM, Sandra Loosemore wrote: On 06/29/2015 09:07 PM, Kito Cheng wrote: Hi all: This patch seem will broken when disable assert checking for c6x Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 225104) +++ gcc/config/c6x/c6x.c (working copy) @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx best_reg = find_rename_reg (this_head, super_class, unavailable, old_reg, true); - regrename_do_replace (this_head, best_reg); + gcc_assert (regrename_do_replace (this_head, best_reg)); count_unit_reqs (new_reqs, head, PREV_INSN (tail)); merge_unit_reqs (new_reqs); @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx unit_req_imbalance (reqs), unit_req_imbalance (new_reqs)); } if (unit_req_imbalance (new_reqs) unit_req_imbalance (reqs)) -regrename_do_replace (this_head, old_reg); +gcc_assert (regrename_do_replace (this_head, old_reg)); else memcpy (reqs, new_reqs, sizeof (unit_req_table)); I'm sorry; do you have a suggestion for a fix? I thought this was the change I was asked to make, and as I noted previously, I'm not set up to test (or even build) for this target. -Sandra the obviously confused :-( You probably have to separate out the regrename_do_replace() bool result into a variable, placing the whole call into the gcc_assert() might make it disappear when assertions are turned off. I notice the way gcc_assert() is defined in system.h now, the test won't disappear even when runtime checks are disabled, though you might still adjust it to avoid any programmer confusion. Chung-Lin
Re: [gomp4] Generate sequential loop for OpenACC loop directive inside kernels
On 2015/6/16 05:05 PM, Tom de Vries wrote: On 16/06/15 10:59, Chung-Lin Tang wrote: This patch adjusts omp-low.c:expand_omp_for_generic() to expand to a sequential loop form (without the OMP runtime calls), used for loop directives inside OpenACC kernels constructs. Tom mentions that this allows the kernels parallelization to work when '#pragma acc loop' makes the front-ends create OMP_FOR, which the loop analysis phases don't understand. Tested and committed to gomp-4_0-branch. Hi Chung-Lin, can you commit a test-case to exercise the code? Thanks, - Tom Just committed the attached testcase patch to gomp-4_0-branch. Chung-Lin 2015-06-23 Chung-Lin Tang clt...@codesourcery.com gcc/testsuite/ * c-c++-common/goacc/kernels-loop.c (ACC_LOOP): Add #ifndef/#define. (main): Tag loops inside kernels construct with '#pragma ACC_LOOP'. * c-c++-common/goacc/kernels-loop-2.c: Likewise. * c-c++-common/goacc/kernels-loop-3.c: Likewise. * c-c++-common/goacc/kernels-loop-n.c: Likewise. * c-c++-common/goacc/kernels-loop-acc-loop.c: New test. * c-c++-common/goacc/kernels-loop-2-acc-loop.c: New test. * c-c++-common/goacc/kernels-loop-3-acc-loop.c: New test. * c-c++-common/goacc/kernels-loop-n-acc-loop.c: New test. Index: gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c === --- gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c (revision 0) +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c (revision 0) @@ -0,0 +1,20 @@ +/* { dg-additional-options -O2 } */ +/* { dg-additional-options -ftree-parallelize-loops=32 } */ +/* { dg-additional-options -fdump-tree-parloops_oacc_kernels-all } */ +/* { dg-additional-options -fdump-tree-optimized } */ + +/* Check that loops with '#pragma acc loop' tagged gets properly parallelized. */ +#define ACC_LOOP acc loop +#include kernels-loop-3.c + +/* Check that only one loop is analyzed, and that it can be parallelized. */ +/* { dg-final { scan-tree-dump-times SUCCESS: may be parallelized 1 parloops_oacc_kernels } } */ +/* { dg-final { scan-tree-dump-not FAILED: parloops_oacc_kernels } } */ + +/* Check that the loop has been split off into a function. */ +/* { dg-final { scan-tree-dump-times (?n);; Function .*main._omp_fn.0 1 optimized } } */ + +/* { dg-final { scan-tree-dump-times (?n)pragma omp target oacc_parallel.*num_gangs\\(32\\) 1 parloops_oacc_kernels } } */ + +/* { dg-final { cleanup-tree-dump parloops_oacc_kernels } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Index: gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c === --- gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c (revision 224836) +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c (working copy) @@ -8,6 +8,10 @@ #define N (1024 * 512) #define COUNTERTYPE unsigned int +#ifndef ACC_LOOP +#define ACC_LOOP +#endif + int main (void) { @@ -21,18 +25,21 @@ main (void) #pragma acc kernels copyout (a[0:N]) { +#pragma ACC_LOOP for (COUNTERTYPE i = 0; i N; i++) a[i] = i * 2; } #pragma acc kernels copyout (b[0:N]) { +#pragma ACC_LOOP for (COUNTERTYPE i = 0; i N; i++) b[i] = i * 4; } #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) { +#pragma ACC_LOOP for (COUNTERTYPE ii = 0; ii N; ii++) c[ii] = a[ii] + b[ii]; } Index: gcc/testsuite/c-c++-common/goacc/kernels-loop.c === --- gcc/testsuite/c-c++-common/goacc/kernels-loop.c (revision 224836) +++ gcc/testsuite/c-c++-common/goacc/kernels-loop.c (working copy) @@ -8,6 +8,10 @@ #define N (1024 * 512) #define COUNTERTYPE unsigned int +#ifndef ACC_LOOP +#define ACC_LOOP +#endif + int main (void) { @@ -27,6 +31,7 @@ main (void) #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) { +#pragma ACC_LOOP for (COUNTERTYPE ii = 0; ii N; ii++) c[ii] = a[ii] + b[ii]; } Index: gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c === --- gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c (revision 0) +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-additional-options -O2 } */ +/* { dg-additional-options -ftree-parallelize-loops=32 } */ +/* { dg-additional-options -fdump-tree-parloops_oacc_kernels-all } */ +/* { dg-additional-options -fdump-tree-optimized } */ + +/* Check that loops with '#pragma acc loop' tagged gets properly parallelized. */ +#define ACC_LOOP acc loop +#include kernels-loop-2.c + +/* Check that only three loops are analyzed, and that all can be + parallelized. */ +/* { dg-final { scan-tree-dump-times SUCCESS: may be parallelized 3 parloops_oacc_kernels } } */ +/* { dg-final
[gomp4] Generate sequential loop for OpenACC loop directive inside kernels
This patch adjusts omp-low.c:expand_omp_for_generic() to expand to a sequential loop form (without the OMP runtime calls), used for loop directives inside OpenACC kernels constructs. Tom mentions that this allows the kernels parallelization to work when '#pragma acc loop' makes the front-ends create OMP_FOR, which the loop analysis phases don't understand. Tested and committed to gomp-4_0-branch. Chung-Lin 2015-06-16 Chung-Lin Tang clt...@codesourcery.com * omp-low.c (struct omp_region): Add inside_kernels_p field. (expand_omp_for_generic): Adjust to generate a 'sequential' loop when GOMP builtin arguments are BUILT_IN_NONE. (expand_omp_for): Use expand_omp_for_generic() to generate a non-parallelized loop for OMP_FORs inside OpenACC kernels regions. (expand_omp): Mark inside_kernels_p field true for regions nested inside OpenACC kernels constructs. Index: omp-low.c === --- omp-low.c (revision 224475) +++ omp-low.c (working copy) @@ -161,6 +161,9 @@ struct omp_region /* True if this is a combined parallel+workshare region. */ bool is_combined_parallel; + /* True if this is nested inside an OpenACC kernels construct. */ + bool inside_kernels_p; + /* For an OpenACC loop, the level of parallelism requested. */ int gwv_this; @@ -6734,6 +6737,7 @@ expand_omp_for_generic (struct omp_region *region, gassign *assign_stmt; bool in_combined_parallel = is_combined_parallel (region); bool broken_loop = region-cont == NULL; + bool seq_loop = (!start_fn || !next_fn); edge e, ne; tree *counts = NULL; int i; @@ -6821,8 +6825,21 @@ expand_omp_for_generic (struct omp_region *region, zero_iter_bb)); } } - if (in_combined_parallel) + if (seq_loop) { + tree n1 = fold_convert (fd-iter_type, fd-loop.n1); + tree n2 = fold_convert (fd-iter_type, fd-loop.n2); + + assign_stmt = gimple_build_assign (istart0, n1); + gsi_insert_before (gsi, assign_stmt, GSI_SAME_STMT); + + assign_stmt = gimple_build_assign (iend0, n2); + gsi_insert_before (gsi, assign_stmt, GSI_SAME_STMT); + + t = fold_build2 (NE_EXPR, boolean_type_node, istart0, iend0); +} + else if (in_combined_parallel) +{ /* In a combined parallel loop, emit a call to GOMP_loop_foo_next. */ t = build_call_expr (builtin_decl_explicit (next_fn), 2, @@ -7007,32 +7024,38 @@ expand_omp_for_generic (struct omp_region *region, collapse_bb = extract_omp_for_update_vars (fd, cont_bb, l1_bb); /* Emit code to get the next parallel iteration in L2_BB. */ - gsi = gsi_start_bb (l2_bb); + if (!seq_loop) + { + gsi = gsi_start_bb (l2_bb); - t = build_call_expr (builtin_decl_explicit (next_fn), 2, - build_fold_addr_expr (istart0), - build_fold_addr_expr (iend0)); - t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, -false, GSI_CONTINUE_LINKING); - if (TREE_TYPE (t) != boolean_type_node) - t = fold_build2 (NE_EXPR, boolean_type_node, - t, build_int_cst (TREE_TYPE (t), 0)); - gcond *cond_stmt = gimple_build_cond_empty (t); - gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING); + t = build_call_expr (builtin_decl_explicit (next_fn), 2, + build_fold_addr_expr (istart0), + build_fold_addr_expr (iend0)); + t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, + false, GSI_CONTINUE_LINKING); + if (TREE_TYPE (t) != boolean_type_node) + t = fold_build2 (NE_EXPR, boolean_type_node, + t, build_int_cst (TREE_TYPE (t), 0)); + gcond *cond_stmt = gimple_build_cond_empty (t); + gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING); + } } /* Add the loop cleanup function. */ gsi = gsi_last_bb (exit_bb); - if (gimple_omp_return_nowait_p (gsi_stmt (gsi))) -t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END_NOWAIT); - else if (gimple_omp_return_lhs (gsi_stmt (gsi))) -t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END_CANCEL); - else -t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END); - gcall *call_stmt = gimple_build_call (t, 0); - if (gimple_omp_return_lhs (gsi_stmt (gsi))) -gimple_call_set_lhs (call_stmt, gimple_omp_return_lhs (gsi_stmt (gsi))); - gsi_insert_after (gsi, call_stmt, GSI_SAME_STMT); + if (!seq_loop) +{ + if (gimple_omp_return_nowait_p (gsi_stmt (gsi))) + t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END_NOWAIT); + else if (gimple_omp_return_lhs (gsi_stmt (gsi))) + t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END_CANCEL); + else + t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END); + gcall *call_stmt = gimple_build_call (t, 0); + if (gimple_omp_return_lhs (gsi_stmt (gsi))) + gimple_call_set_lhs (call_stmt, gimple_omp_return_lhs (gsi_stmt (gsi))); + gsi_insert_after (gsi, call_stmt, GSI_SAME_STMT); +} gsi_remove (gsi, true); /* Connect the new
Re: [patch, libgomp] Re-factor GOMP_MAP_POINTER handling
Ping x2. On 15/5/11 7:19 PM, Chung-Lin Tang wrote: Ping. On 2015/4/21 08:21 PM, Chung-Lin Tang wrote: Hi, while investigating some issues in the variable mapping code, I observed that the GOMP_MAP_POINTER handling is essentially duplicated under the PSET case. This patch abstracts and unifies the handling code, basically just a cleanup patch. Ran libgomp tests to ensure no regressions, ok for trunk? Thanks, Chung-Lin 2015-04-21 Chung-Lin Tang clt...@codesourcery.com libgomp/ * target.c (gomp_map_pointer): New function abstracting out GOMP_MAP_POINTER handling. (gomp_map_vars): Remove GOMP_MAP_POINTER handling code and use gomp_map_pointer().
Re: [patch, libgomp] Re-factor GOMP_MAP_POINTER handling
Ping. On 2015/4/21 08:21 PM, Chung-Lin Tang wrote: Hi, while investigating some issues in the variable mapping code, I observed that the GOMP_MAP_POINTER handling is essentially duplicated under the PSET case. This patch abstracts and unifies the handling code, basically just a cleanup patch. Ran libgomp tests to ensure no regressions, ok for trunk? Thanks, Chung-Lin 2015-04-21 Chung-Lin Tang clt...@codesourcery.com libgomp/ * target.c (gomp_map_pointer): New function abstracting out GOMP_MAP_POINTER handling. (gomp_map_vars): Remove GOMP_MAP_POINTER handling code and use gomp_map_pointer().
[patch, libgomp] Re-factor GOMP_MAP_POINTER handling
Hi, while investigating some issues in the variable mapping code, I observed that the GOMP_MAP_POINTER handling is essentially duplicated under the PSET case. This patch abstracts and unifies the handling code, basically just a cleanup patch. Ran libgomp tests to ensure no regressions, ok for trunk? Thanks, Chung-Lin 2015-04-21 Chung-Lin Tang clt...@codesourcery.com libgomp/ * target.c (gomp_map_pointer): New function abstracting out GOMP_MAP_POINTER handling. (gomp_map_vars): Remove GOMP_MAP_POINTER handling code and use gomp_map_pointer(). Index: target.c === --- target.c (revision 448412) +++ target.c (working copy) @@ -163,6 +163,60 @@ get_kind (bool is_openacc, void *kinds, int idx) : ((unsigned char *) kinds)[idx]; } +static void +gomp_map_pointer (struct target_mem_desc *tgt, uintptr_t host_ptr, + uintptr_t target_offset, uintptr_t bias) +{ + struct gomp_device_descr *devicep = tgt-device_descr; + struct splay_tree_s *mem_map = devicep-mem_map; + struct splay_tree_key_s cur_node; + + cur_node.host_start = host_ptr; + if (cur_node.host_start == (uintptr_t) NULL) +{ + cur_node.tgt_offset = (uintptr_t) NULL; + /* FIXME: see comment about coalescing host/dev transfers below. */ + devicep-host2dev_func (devicep-target_id, + (void *) (tgt-tgt_start + target_offset), + (void *) cur_node.tgt_offset, + sizeof (void *)); + return; +} + /* Add bias to the pointer value. */ + cur_node.host_start += bias; + cur_node.host_end = cur_node.host_start + 1; + splay_tree_key n = splay_tree_lookup (mem_map, cur_node); + if (n == NULL) +{ + /* Could be possibly zero size array section. */ + cur_node.host_end--; + n = splay_tree_lookup (mem_map, cur_node); + if (n == NULL) + { + cur_node.host_start--; + n = splay_tree_lookup (mem_map, cur_node); + cur_node.host_start++; + } +} + if (n == NULL) +{ + gomp_mutex_unlock (devicep-lock); + gomp_fatal (Pointer target of array section wasn't mapped); +} + cur_node.host_start -= n-host_start; + cur_node.tgt_offset += n-tgt-tgt_start + n-tgt_offset + cur_node.host_start; + /* At this point tgt_offset is target address of the + array section. Now subtract bias to get what we want + to initialize the pointer with. */ + cur_node.tgt_offset -= bias; + /* FIXME: see comment about coalescing host/dev transfers below. */ + devicep-host2dev_func (devicep-target_id, + (void *) (tgt-tgt_start + target_offset), + (void *) cur_node.tgt_offset, + sizeof (void *)); +} + attribute_hidden struct target_mem_desc * gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds, @@ -336,54 +390,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, k-host_end - k-host_start); break; case GOMP_MAP_POINTER: - cur_node.host_start - = (uintptr_t) *(void **) k-host_start; - if (cur_node.host_start == (uintptr_t) NULL) - { - cur_node.tgt_offset = (uintptr_t) NULL; - /* FIXME: see above FIXME comment. */ - devicep-host2dev_func (devicep-target_id, - (void *) (tgt-tgt_start - + k-tgt_offset), - (void *) cur_node.tgt_offset, - sizeof (void *)); - break; - } - /* Add bias to the pointer value. */ - cur_node.host_start += sizes[i]; - cur_node.host_end = cur_node.host_start + 1; - n = splay_tree_lookup (mem_map, cur_node); - if (n == NULL) - { - /* Could be possibly zero size array section. */ - cur_node.host_end--; - n = splay_tree_lookup (mem_map, cur_node); - if (n == NULL) - { - cur_node.host_start--; - n = splay_tree_lookup (mem_map, cur_node); - cur_node.host_start++; - } - } - if (n == NULL) - { - gomp_mutex_unlock (devicep-lock); - gomp_fatal (Pointer target of array section -wasn't mapped); - } - cur_node.host_start -= n-host_start; - cur_node.tgt_offset = n-tgt-tgt_start + n-tgt_offset - + cur_node.host_start; - /* At this point tgt_offset is target address of the - array section. Now subtract bias to get what we want - to initialize the pointer with. */ - cur_node.tgt_offset -= sizes[i]; - /* FIXME: see above FIXME comment. */ - devicep-host2dev_func (devicep-target_id, - (void *) (tgt-tgt_start - + k-tgt_offset), - (void *) cur_node.tgt_offset, - sizeof (void *)); + gomp_map_pointer (tgt, (uintptr_t) *(void **) k-host_start, + k-tgt_offset, sizes[i]); break; case GOMP_MAP_TO_PSET: /* FIXME: see above FIXME comment. */ @@ -405,58 +413,12 @@ gomp_map_vars (struct gomp_device_descr *devicep, { tgt-list[j] = k; k-refcount
[patch, nios2, committed] Fix nios2-linux crti/crtn settings
We appear to have erroneously set 'extra_parts' in nios2-linux libgcc, to include the crti.o/crtn.o files intended for nios2 EABI. This still largely worked, which is why we haven't noticed it till now, expect some features like gprof profiling wasn't properly set up. This patch removes the extra_parts setting for nios2-linux libgcc; now crti.o/crtn.o links to the correct ones provided by glibc. Chung-Lin 2015-03-25 Chung-Lin Tang clt...@codesourcery.com libgcc/ * config.host (nios2-*-linux*): Remove 'extra_parts' setting. Index: config.host === --- config.host (revision 221651) +++ config.host (working copy) @@ -943,7 +943,6 @@ nds32*-elf*) ;; nios2-*-linux*) tmake_file=$tmake_file nios2/t-nios2 nios2/t-linux t-libgcc-pic t-slibgcc-libgcc - extra_parts=$extra_parts crti.o crtn.o md_unwind_header=nios2/linux-unwind.h ;; nios2-*-*)
[PATCH, nios2] Updates to Nios II Linux
The Nios II ports of glibc and Linux kernel are now both upstream. New system conventions now use a non-executable stack. Attached patch committed to support new conventions, applied to both trunk and 4.9 branch. Chung-Lin 2015-01-20 Chung-Lin Tang clt...@codesourcery.com gcc/ * config/nios2/nios2.c (nios2_asm_file_end): Implement TARGET_ASM_FILE_END hook for adding .note.GNU-stack section when needed. (TARGET_ASM_FILE_END): Define. libgcc/ * config/nios2/linux-unwind.h (nios2_fallback_frame_state): Update rt_sigframe format and address for current Nios II Linux conventions. Index: libgcc/config/nios2/linux-unwind.h === --- libgcc/config/nios2/linux-unwind.h (revision 219897) +++ libgcc/config/nios2/linux-unwind.h (working copy) @@ -67,10 +67,9 @@ nios2_fallback_frame_state (struct _Unwind_Context if (pc[0] == (0x0084 | (__NR_rt_sigreturn 6))) { struct rt_sigframe { - char retcode[12]; siginfo_t info; struct nios2_ucontext uc; - } *rt_ = context-ra; + } *rt_ = context-cfa; struct nios2_mcontext *regs = rt_-uc.uc_mcontext; int i; Index: gcc/config/nios2/nios2.c === --- gcc/config/nios2/nios2.c (revision 219897) +++ gcc/config/nios2/nios2.c (working copy) @@ -2223,6 +2223,18 @@ nios2_output_dwarf_dtprel (FILE *file, int size, r fprintf (file, )); } +/* Implemet TARGET_ASM_FILE_END. */ + +static void +nios2_asm_file_end (void) +{ + /* The Nios II Linux stack is mapped non-executable by default, so add a + .note.GNU-stack section for switching to executable stacks only when + trampolines are generated. */ + if (TARGET_LINUX_ABI trampolines_created) +file_end_indicate_exec_stack (); +} + /* Implement TARGET_ASM_FUNCTION_PROLOGUE. */ static void nios2_asm_function_prologue (FILE *file, HOST_WIDE_INT size ATTRIBUTE_UNUSED) @@ -3401,6 +3413,9 @@ nios2_merge_decl_attributes (tree olddecl, tree ne #undef TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA #define TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA nios2_output_addr_const_extra +#undef TARGET_ASM_FILE_END +#define TARGET_ASM_FILE_END nios2_asm_file_end + #undef TARGET_OPTION_OVERRIDE #define TARGET_OPTION_OVERRIDE nios2_option_override
Re: [PATCH] gcc/config/nios2/nios2.c: Let custom_builtin_name[*] always be zero terminated string
On 14/7/26 11:28 AM, Chen Gang wrote: The related strncpy() for custom_builtin_name[*] may set 5 none-zero characters, which may cause custom_builtin_name[*] is none-zero terminated. So add additional '\0' byte for custom_builtin_name[*]. Where did you see this? Supposedly the snprintf of the custom function type string should at most be of 'xnxx' format; 4 characters at most, not 5. Do you have a test case where the behavior you described appears? Thanks, Chung-Lin ChangeLog: * config/nios2/nios2.c (custom_builtin_name): Let it always be zero terminated string. Signed-off-by: Chen Gang gang.chen.5...@gmail.com --- gcc/config/nios2/nios2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/nios2/nios2.c b/gcc/config/nios2/nios2.c index a4e60c6..e4e005b 100644 --- a/gcc/config/nios2/nios2.c +++ b/gcc/config/nios2/nios2.c @@ -2510,7 +2510,7 @@ nios2_expand_fpu_builtin (tree exp, unsigned int code, rtx target) total of (3 + 1) * (1 + 3 + 9) == 52 custom builtin functions. */ #define NUM_CUSTOM_BUILTINS ((3 + 1) * (1 + 3 + 9)) -static char custom_builtin_name[NUM_CUSTOM_BUILTINS][5]; +static char custom_builtin_name[NUM_CUSTOM_BUILTINS][6]; static void nios2_init_custom_builtins (int start_code)
Re: [PATCH] gcc/config/nios2/nios2.c: Let custom_builtin_name[*] always be zero terminated string
On 2014/7/26 03:33 PM, Chen Gang wrote: On 07/26/2014 02:32 PM, Chung-Lin Tang wrote: On 14/7/26 11:28 AM, Chen Gang wrote: The related strncpy() for custom_builtin_name[*] may set 5 none-zero characters, which may cause custom_builtin_name[*] is none-zero terminated. So add additional '\0' byte for custom_builtin_name[*]. Where did you see this? Supposedly the snprintf of the custom function type string should at most be of 'xnxx' format; 4 characters at most, I guess, your 'xnxx' means %cn%c%c, not %sn%s%s (which is current implementation), also at present, 32 - n is 17, not 4 for snprintf() length limitation. The use of %sn%s%s is intentional. That allows me to print directly using snprintf. '32 - n' is to represent the rest of the 32-byte buffer, harmless really, as it is at most 4 chars. Also, if I were to restrict it in the snprintf argument, it would be 5 (including the null char) not 4. If we are always sure it must be no more than 4 characters (at present, it is, but in the future, I don't know). We can use strcpy() instead of strncpy() for it -- that will let other readers no doubt. If we need let custom_builtin_name[*] not only zero terminated, but also zero pad, we need pass '4' to strncpy() instead of '5', that also will clear all doubts. I don't understand what point you're trying to make here, really. As Andreas has noted in the other mail, strncpy does zero-termination automatically. Chung-Lin not 5. Do you have a test case where the behavior you described appears? I find it by reading source code, for me, it is simple code, test is welcomed, but not mandatory. But it really needs necessary discussion (for modification, and comments).
Re: [PATCH] RTEMS: Add Nios 2 support
On 2014/7/18 上午 05:19, Joel Sherrill wrote: Unless someone objects, I am going to commit this to the 4.9 branch and head. --joel Sorry about the delay, I'll review it today. Thanks, Chung-Lin On 7/7/2014 1:42 AM, Sebastian Huber wrote: Ping. On 2014-06-26 13:43, Sebastian Huber wrote: This patch should be applied to GCC 4.9 and mainline. I do not have write access, so in case this gets approved, please commit it for me. gcc/ChangeLog 2014-06-26 Sebastian Huber sebastian.hu...@embedded-brains.de * config.gcc (nios2-*-*): Add RTEMS support. * config/nios2/rtems.h: New file. * config/nios2/t-rtems: Likewise.
Re: [PATCH] RTEMS: Add Nios 2 support
For the default multilib settings, it looks like you just intended to use -mcustom-fpu-cfg=60-2. I suggest you modify t-rtems to do that instead of enumerating the individual FPU insn options. Other than that, the patch looks okay. Chung-Lin On 2014/6/26 07:43 PM, Sebastian Huber wrote: diff --git a/gcc/config/nios2/t-rtems b/gcc/config/nios2/t-rtems new file mode 100644 index 000..f95fa3c --- /dev/null +++ b/gcc/config/nios2/t-rtems @@ -0,0 +1,133 @@ +# Custom RTEMS multilibs + +MULTILIB_OPTIONS = mhw-mul mhw-mulx mhw-div mcustom-fadds=253 mcustom-fdivs=255 mcustom-fmuls=252 mcustom-fsubs=254 + +# Enumeration of multilibs + +# MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fsubs=254 +# MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div +MULTILIB_EXCEPTIONS += mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mcustom-fadds=253/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254
Re: [PATCH] RTEMS: Add Nios 2 support
On 14/7/18 2:30 PM, Chung-Lin Tang wrote: For the default multilib settings, it looks like you just intended to use -mcustom-fpu-cfg=60-2. I suggest you modify t-rtems to do that instead of enumerating the individual FPU insn options. Other than that, the patch looks okay. Chung-Lin BTW, I assume you have done the appropriate testing for a nios2-rtems toolchain? Thanks, Chung-Lin On 2014/6/26 07:43 PM, Sebastian Huber wrote: diff --git a/gcc/config/nios2/t-rtems b/gcc/config/nios2/t-rtems new file mode 100644 index 000..f95fa3c --- /dev/null +++ b/gcc/config/nios2/t-rtems @@ -0,0 +1,133 @@ +# Custom RTEMS multilibs + +MULTILIB_OPTIONS = mhw-mul mhw-mulx mhw-div mcustom-fadds=253 mcustom-fdivs=255 mcustom-fmuls=252 mcustom-fsubs=254 + +# Enumeration of multilibs + +# MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fsubs=254 +# MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div +MULTILIB_EXCEPTIONS += mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254 +MULTILIB_EXCEPTIONS += mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252 +MULTILIB_EXCEPTIONS += mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254
Re: [PATCH, ARM] Cortex-A9 MPCore volatile load workaround
Ping x2. On 14/6/20 2:24 PM, Chung-Lin Tang wrote: Ping. On 2014/6/9 10:03 PM, Chung-Lin Tang wrote: Hi Richard, As we talked about earlier, here's a patch to add a compiler option to work around Cortex-A9 MPCore errata 761319: http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf What the option does basically, is to scan for volatile loads during reorg, and add a dmb barrier after it. It also strives to make dmb conditionally executed under TARGET_THUMB2, which means a new Thumb-2 specific *memory_barrier_t2 pattern in sync.md, with adjusted conds/predicable attributes and %? in output strings. Patch originally written by Julian, with additions by Meador, and finally a few trivial adjustments by me. Again, we've been carrying this fix for a release or two. Okay for trunk? Thanks, Chung-Lin 2014-06-09 Julian Brown jul...@codesourcery.com Meador Inge mead...@codesourcery.com Chung-Lin Tang clt...@codesourcery.com * config/arm/arm.c (arm_option_override): Emit warning if -mfix-cortex-a9-volatile-hazards is used on an incompatible CPU. (any_volatile_loads_p): New. (arm_cortex_a9_errata_reorg): New. (arm_reorg): Call arm_cortex_a9_errata_reorg. * config/arm/arm.opt (mfix-cortex-a9-volatile-hazards): Add option. * config/arm/sync.md (*memory_barrier): Don't use on Thumb-2. (*memory_barrier_t2): New, allow conditional execution on Thumb-2. * doc/invoke.texi (-mfix-cortex-a9-volatile-hazards): Add documentation. testsuite/ * lib/target-supports.exp (check_effective_target_arm_dmb): New. * gcc.target/arm/a9-volatile-ordering-erratum-1.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-2.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-3.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-4.c: New test.
Re: [PATCH, PR61554] ICE during CCP
On 2014/6/23 04:45 PM, Richard Biener wrote: On Mon, Jun 23, 2014 at 7:32 AM, Chung-Lin Tang clt...@codesourcery.com wrote: Hi Richard, In this change: https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01278.html where substitute_and_fold() was changed to use a dom walker, the calls to purge dead EH edges during the walk can alter the dom-tree, and have chaotic results; the testcase in PR 61554 has some blocks traversed twice during the walk, causing the segfault during CCP. The patch records the to-be-purged-for-dead-EH blocks in a similar manner like stmts_to_remove, and processes it after the walk. (another possible method would be using a bitmap to record the BBs + calling gimple_purge_all_dead_eh_edges...) Oops. Bootstrapped and tested on x86_64-linux, is this okay for trunk? Can you please use a bitmap and use gimple_purge_all_dead_eh_edges like tree-ssa-pre.c does? Also please add the reduced testcase from the PR to the g++.dg/torture Ok with that changes. Thanks, Richard. Thanks for the review. Attached is what I committed. Testcase made by Markus also added. Thanks, Chung-Lin 2014-06-24 Chung-Lin Tang clt...@codesourcery.com PR tree-optimization/61554 * tree-ssa-propagate.c: Include bitmap.h. (substitute_and_fold_dom_walker): Add 'bitmap need_eh_cleanup' member, properly update constructor/destructor. (substitute_and_fold_dom_walker::before_dom_children): Remove call to gimple_purge_dead_eh_edges, add bb-index to need_eh_cleaup instead. (substitute_and_fold): Call gimple_purge_all_dead_eh_edges on need_eh_cleanup. Index: tree-ssa-propagate.c === --- tree-ssa-propagate.c (revision 211927) +++ tree-ssa-propagate.c (working copy) @@ -29,6 +29,7 @@ #include function.h #include gimple-pretty-print.h #include dumpfile.h +#include bitmap.h #include sbitmap.h #include tree-ssa-alias.h #include internal-fn.h @@ -1031,8 +1032,13 @@ class substitute_and_fold_dom_walker : public dom_ fold_fn (fold_fn_), do_dce (do_dce_), something_changed (false) { stmts_to_remove.create (0); + need_eh_cleanup = BITMAP_ALLOC (NULL); } -~substitute_and_fold_dom_walker () { stmts_to_remove.release (); } +~substitute_and_fold_dom_walker () +{ + stmts_to_remove.release (); + BITMAP_FREE (need_eh_cleanup); +} virtual void before_dom_children (basic_block); virtual void after_dom_children (basic_block) {} @@ -1042,6 +1048,7 @@ class substitute_and_fold_dom_walker : public dom_ bool do_dce; bool something_changed; vecgimple stmts_to_remove; +bitmap need_eh_cleanup; }; void @@ -1144,7 +1151,7 @@ substitute_and_fold_dom_walker::before_dom_childre /* If we cleaned up EH information from the statement, remove EH edges. */ if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)) - gimple_purge_dead_eh_edges (bb); + bitmap_set_bit (need_eh_cleanup, bb-index); if (is_gimple_assign (stmt) (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)) @@ -1235,6 +1242,9 @@ substitute_and_fold (ssa_prop_get_value_fn get_val } } + if (!bitmap_empty_p (walker.need_eh_cleanup)) +gimple_purge_all_dead_eh_edges (walker.need_eh_cleanup); + statistics_counter_event (cfun, Constants propagated, prop_stats.num_const_prop); statistics_counter_event (cfun, Copies propagated,
[PATCH, PR61554] ICE during CCP
Hi Richard, In this change: https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01278.html where substitute_and_fold() was changed to use a dom walker, the calls to purge dead EH edges during the walk can alter the dom-tree, and have chaotic results; the testcase in PR 61554 has some blocks traversed twice during the walk, causing the segfault during CCP. The patch records the to-be-purged-for-dead-EH blocks in a similar manner like stmts_to_remove, and processes it after the walk. (another possible method would be using a bitmap to record the BBs + calling gimple_purge_all_dead_eh_edges...) Bootstrapped and tested on x86_64-linux, is this okay for trunk? Thanks, Chung-Lin 2014-06-23 Chung-Lin Tang clt...@codesourcery.com PR tree-optimization/61554 * tree-ssa-propagate.c (substitute_and_fold_dom_walker): Add 'vecbasic_block bbs_to_purge_dead_eh_edges' member, properly update constructor/destructor. (substitute_and_fold_dom_walker::before_dom_children): Remove call to gimple_purge_dead_eh_edges, add bb to bbs_to_purge_dead_eh_edges instead. (substitute_and_fold): Call gimple_purge_dead_eh_edges for bbs recorded in bbs_to_purge_dead_eh_edges. Index: tree-ssa-propagate.c === --- tree-ssa-propagate.c (revision 211874) +++ tree-ssa-propagate.c (working copy) @@ -1031,8 +1031,13 @@ class substitute_and_fold_dom_walker : public dom_ fold_fn (fold_fn_), do_dce (do_dce_), something_changed (false) { stmts_to_remove.create (0); + bbs_to_purge_dead_eh_edges.create (0); } -~substitute_and_fold_dom_walker () { stmts_to_remove.release (); } +~substitute_and_fold_dom_walker () +{ + stmts_to_remove.release (); + bbs_to_purge_dead_eh_edges.release (); +} virtual void before_dom_children (basic_block); virtual void after_dom_children (basic_block) {} @@ -1042,6 +1047,7 @@ class substitute_and_fold_dom_walker : public dom_ bool do_dce; bool something_changed; vecgimple stmts_to_remove; +vecbasic_block bbs_to_purge_dead_eh_edges; }; void @@ -1144,7 +1150,7 @@ substitute_and_fold_dom_walker::before_dom_childre /* If we cleaned up EH information from the statement, remove EH edges. */ if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)) - gimple_purge_dead_eh_edges (bb); + bbs_to_purge_dead_eh_edges.safe_push (bb); if (is_gimple_assign (stmt) (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)) @@ -1235,6 +1241,14 @@ substitute_and_fold (ssa_prop_get_value_fn get_val } } + while (!walker.bbs_to_purge_dead_eh_edges.is_empty ()) +{ + basic_block bb = walker.bbs_to_purge_dead_eh_edges.pop (); + gimple_purge_dead_eh_edges (bb); + if (dump_file dump_flags TDF_DETAILS) + fprintf (dump_file, Purge dead EH edges from bb %d\n, bb-index); +} + statistics_counter_event (cfun, Constants propagated, prop_stats.num_const_prop); statistics_counter_event (cfun, Copies propagated,
Re: [PATCH, ARM] MI-thunk fix for TARGET_THUMB1_ONLY
On 2014/6/18 上午 06:26, Ramana Radhakrishnan wrote: On Sun, Jun 8, 2014 at 12:27 PM, Chung-Lin Tang clt...@codesourcery.com wrote: Hi Richard, Ramana, Attached is a small fix for resolving a g++.old-deja/g++.jason/thunk2.C regression we found under a TARGET_THUMB1_ONLY multilib (-mthumb -march=armv6-m to be exact). Basically under those conditions, the thunk is in Thumb mode, so the subtraction should be 4 rather than 8. Yep, this is OK with a minor change to the comment to make it more explicit. + /* Output .word .LTHUNKn-[37]-.LTHUNKPCn. */ s/37/3,7/ Ok with that change and if no regressions. OK for release branches unless the RM's object in 24 hours. Re-tested on a recent trunk, verified g++.jason/thunk2.C resolved with patch and no regressions. Committed on trunk and backported to 4.8, 4.9 branches. Thanks, Chung-Lin
Re: [PATCH, ARM] Cortex-A9 MPCore volatile load workaround
Ping. On 2014/6/9 10:03 PM, Chung-Lin Tang wrote: Hi Richard, As we talked about earlier, here's a patch to add a compiler option to work around Cortex-A9 MPCore errata 761319: http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf What the option does basically, is to scan for volatile loads during reorg, and add a dmb barrier after it. It also strives to make dmb conditionally executed under TARGET_THUMB2, which means a new Thumb-2 specific *memory_barrier_t2 pattern in sync.md, with adjusted conds/predicable attributes and %? in output strings. Patch originally written by Julian, with additions by Meador, and finally a few trivial adjustments by me. Again, we've been carrying this fix for a release or two. Okay for trunk? Thanks, Chung-Lin 2014-06-09 Julian Brown jul...@codesourcery.com Meador Inge mead...@codesourcery.com Chung-Lin Tang clt...@codesourcery.com * config/arm/arm.c (arm_option_override): Emit warning if -mfix-cortex-a9-volatile-hazards is used on an incompatible CPU. (any_volatile_loads_p): New. (arm_cortex_a9_errata_reorg): New. (arm_reorg): Call arm_cortex_a9_errata_reorg. * config/arm/arm.opt (mfix-cortex-a9-volatile-hazards): Add option. * config/arm/sync.md (*memory_barrier): Don't use on Thumb-2. (*memory_barrier_t2): New, allow conditional execution on Thumb-2. * doc/invoke.texi (-mfix-cortex-a9-volatile-hazards): Add documentation. testsuite/ * lib/target-supports.exp (check_effective_target_arm_dmb): New. * gcc.target/arm/a9-volatile-ordering-erratum-1.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-2.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-3.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-4.c: New test.
Re: [ARM] Fix build failure due to movsi_compare0 (PR 61430)
On 14/6/16 5:55 PM, James Greenhalgh wrote: On Fri, Jun 13, 2014 at 05:46:45PM +0100, Vladimir Makarov wrote: On 2014-06-11, 1:17 PM, Chung-Lin Tang wrote: Looking at this too, as an LRA exercise. I don't really think the pattern is wrong, rather LRA should just avoid creating the copy in this case; it's a result of operand constraining, after all. Attached is the small LRA patch, pending testing. Vladimir should weight in on this. The patch is safe and ok. Thanks for working on it, Chung-Lin. As this patch fixes a build failure on ARM I'd like to have it applied today. If I don't hear anything which would stop me, I'll commit this on Chung-Lin's behalf in a few hours. Cheers, James I just committed it, after a testsuite run on x86_64 (to be sure). And thanks to James for doing the ARM tests. Thanks, Chung-Lin * ira-lives.c (process_bb_lives): Skip creating copy during insn sca when src/dest has constrained to same regno.
Re: [ARM] Fix build failure due to movsi_compare0 (PR 61430)
On 2014/6/11 下午 06:32, James Greenhalgh wrote: Hi, A recent change somewhere exposed a latent bug between LRA and the definition of the movsi_compare0 pattern. This pattern ties the source and destination register of a set together a (match_dup) and register constraints: [(set (reg:CC CC_REGNUM) (compare:CC (match_operand:SI 1 s_register_operand 0,r) (const_int 0))) (set (match_operand:SI 0 s_register_operand =r,r) (match_dup 1))] This confuses LRA which expects the source and destination register of a set to be different. reduced.c: In function '_IO_vfscanf_internal': reduced.c:104:1: internal compiler error: in lra_create_copy, at lra.c:1512 } ^ 0x8c3f9a lra_create_copy(int, int, int) /work/gcc-dev/src/gcc/gcc/lra.c:1512 0x8e4ab0 process_bb_lives /work/gcc-dev/src/gcc/gcc/lra-lives.c:568 0x8e4ab0 lra_create_live_ranges(bool) /work/gcc-dev/src/gcc/gcc/lra-lives.c:1019 0x8c5a39 lra(_IO_FILE*) /work/gcc-dev/src/gcc/gcc/lra.c:2356 0x873a96 do_reload /work/gcc-dev/src/gcc/gcc/ira.c:5415 0x873a96 execute /work/gcc-dev/src/gcc/gcc/ira.c:5576 Please submit a full bug report, We can fix the pattern by moving away from match_dup and using register tying with constraints consistently. I'm not entirely convinced that this is legitimate (my vague recollection is that register tying should only be used to tie inputs to outputs). This has passed testing on a bunch of ARM targets, and fixes the build issues I've been seeing. Looking at this too, as an LRA exercise. I don't really think the pattern is wrong, rather LRA should just avoid creating the copy in this case; it's a result of operand constraining, after all. Attached is the small LRA patch, pending testing. Vladimir should weight in on this. Thanks, Chung-Lin * ira-lives.c (process_bb_lives): Skip creating copy during insn sca when src/dest has constrained to same regno. Index: lra-lives.c === --- lra-lives.c (revision 211398) +++ lra-lives.c (working copy) @@ -558,7 +558,11 @@ process_bb_lives (basic_block bb, int curr_point) /* It might be 'inheritance pseudo - reload pseudo'. */ || (src_regno = lra_constraint_new_regno_start ((int) REGNO (SET_DEST (set)) - = lra_constraint_new_regno_start + = lra_constraint_new_regno_start) + /* Remember to skip special cases where src/dest regnos are +the same, e.g. insn SET pattern has matching constraints +like =r,0. */ + src_regno != (int) REGNO (SET_DEST (set) { int hard_regno = -1, regno = -1;
[PATCH, ARM] Cortex-A9 MPCore volatile load workaround
Hi Richard, As we talked about earlier, here's a patch to add a compiler option to work around Cortex-A9 MPCore errata 761319: http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf What the option does basically, is to scan for volatile loads during reorg, and add a dmb barrier after it. It also strives to make dmb conditionally executed under TARGET_THUMB2, which means a new Thumb-2 specific *memory_barrier_t2 pattern in sync.md, with adjusted conds/predicable attributes and %? in output strings. Patch originally written by Julian, with additions by Meador, and finally a few trivial adjustments by me. Again, we've been carrying this fix for a release or two. Okay for trunk? Thanks, Chung-Lin 2014-06-09 Julian Brown jul...@codesourcery.com Meador Inge mead...@codesourcery.com Chung-Lin Tang clt...@codesourcery.com * config/arm/arm.c (arm_option_override): Emit warning if -mfix-cortex-a9-volatile-hazards is used on an incompatible CPU. (any_volatile_loads_p): New. (arm_cortex_a9_errata_reorg): New. (arm_reorg): Call arm_cortex_a9_errata_reorg. * config/arm/arm.opt (mfix-cortex-a9-volatile-hazards): Add option. * config/arm/sync.md (*memory_barrier): Don't use on Thumb-2. (*memory_barrier_t2): New, allow conditional execution on Thumb-2. * doc/invoke.texi (-mfix-cortex-a9-volatile-hazards): Add documentation. testsuite/ * lib/target-supports.exp (check_effective_target_arm_dmb): New. * gcc.target/arm/a9-volatile-ordering-erratum-1.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-2.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-3.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-4.c: New test. Index: doc/invoke.texi === --- doc/invoke.texi (revision 211364) +++ doc/invoke.texi (working copy) @@ -535,6 +535,7 @@ Objective-C and Objective-C++ Dialects}. -mtp=@var{name} -mtls-dialect=@var{dialect} @gol -mword-relocations @gol -mfix-cortex-m3-ldrd @gol +-mfix-cortex-a9-volatile-hazards @gol -munaligned-access @gol -mneon-for-64bits @gol -mslow-flash-data @gol @@ -12677,6 +12678,16 @@ with overlapping destination and base registers ar generating these instructions. This option is enabled by default when @option{-mcpu=cortex-m3} is specified. +@item -mfix-cortex-a9-volatile-hazards +@opindex mfix-cortex-a9-volatile-hazards +Cortex-A9 MPCore processors have an erratum that in rare cases cause +successive memory loads to appear out of program order if another processor +is simultaneously writing to the same location. This causes problems if +volatile variables are used for communication between processors. +This option enables the ARM recommended workaround, to insert a @code{dmb} +instruction after each volatile load. Because of the potentially high +overhead, this workaround is not enabled by default. + @item -munaligned-access @itemx -mno-unaligned-access @opindex munaligned-access Index: config/arm/arm.opt === --- config/arm/arm.opt (revision 211364) +++ config/arm/arm.opt (working copy) @@ -264,6 +264,11 @@ Target Report Var(fix_cm3_ldrd) Init(2) Avoid overlapping destination and address registers on LDRD instructions that may trigger Cortex-M3 errata. +mfix-cortex-a9-volatile-hazards +Target Report Var(fix_a9_volatile_hazards) Init(0) +Avoid errata causing read-after-read hazards for concurrent volatile +accesses on Cortex-A9 MPCore processors. + munaligned-access Target Report Var(unaligned_access) Init(2) Enable unaligned word and halfword accesses to packed data. Index: config/arm/sync.md === --- config/arm/sync.md (revision 211364) +++ config/arm/sync.md (working copy) @@ -46,7 +46,7 @@ (define_insn *memory_barrier [(set (match_operand:BLK 0 ) (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))] - TARGET_HAVE_MEMORY_BARRIER + TARGET_HAVE_MEMORY_BARRIER !TARGET_THUMB2 { if (TARGET_HAVE_DMB) { @@ -65,6 +65,29 @@ (set_attr conds unconditional) (set_attr predicable no)]) +;; Thumb-2 version allows conditional execution +(define_insn *memory_barrier_t2 + [(set (match_operand:BLK 0 ) + (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))] + TARGET_HAVE_MEMORY_BARRIER TARGET_THUMB2 + { +if (TARGET_HAVE_DMB) + { + /* Note we issue a system level barrier. We should consider issuing + a inner shareabilty zone barrier here instead, ie. DMB ISH. */ + /* ??? Differentiate based on SEQ_CST vs less strict? */ + return dmb%?\tsy; + } + +if (TARGET_HAVE_DMB_MCR) + return mcr%?\tp15, 0, r0, c7, c10, 5; + +gcc_unreachable (); + } + [(set_attr length 4) + (set_attr conds nocond) + (set_attr predicable yes
[PATCH, ARM] MI-thunk fix for TARGET_THUMB1_ONLY
Hi Richard, Ramana, Attached is a small fix for resolving a g++.old-deja/g++.jason/thunk2.C regression we found under a TARGET_THUMB1_ONLY multilib (-mthumb -march=armv6-m to be exact). Basically under those conditions, the thunk is in Thumb mode, so the subtraction should be 4 rather than 8. Original patch was by Julian, with trivial adaptations for trunk by me. We've been carrying this fix for a while by now. Okay for trunk? (and stable branches?) Thanks, Chung-Lin 2014-06-08 Julian Brown jul...@codesourcery.com Chung-Lin Tang clt...@codesourcery.com * config/arm/arm.c (arm_output_mi_thunk): Fix offset for TARGET_THUMB1_ONLY. Add comments. Index: config/arm/arm.c === --- config/arm/arm.c (revision 211353) +++ config/arm/arm.c (working copy) @@ -28428,9 +28428,13 @@ arm_output_mi_thunk (FILE *file, tree thunk ATTRIB fputs (:\n, file); if (flag_pic) { - /* Output .word .LTHUNKn-7-.LTHUNKPCn. */ + /* Output .word .LTHUNKn-[37]-.LTHUNKPCn. */ rtx tem = XEXP (DECL_RTL (function), 0); - tem = plus_constant (GET_MODE (tem), tem, -7); + /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC + pipeline offset is four rather than eight. Adjust the offset + accordingly. */ + tem = plus_constant (GET_MODE (tem), tem, + TARGET_THUMB1_ONLY ? -3 : -7); tem = gen_rtx_MINUS (GET_MODE (tem), tem, gen_rtx_SYMBOL_REF (Pmode,
[PATCH, nios2] Misc. fixes
This contains a few small changes/fixes, committed to trunk. (1) a typo in nios2_function_profiler. (2) unneeded parameter in nios2_large_got_address(). (3) remove two no longer needed unspec enums. (4) Provide basic implementation of the delegitimize address hook, to silent a unrecognized UNSPECs warning when building with -g. (5) Adjust the nios2-linux LINK_SPEC to define the dynamic linker name to /lib/ld-linux-nios2.so.1, which is the current upstreamed arrangement. Chung-Lin 2014-04-01 Chung-Lin Tang clt...@codesourcery.com * config/nios2/nios2.md (unspec): Remove UNSPEC_TLS, UNSPEC_TLS_LDM. * config/nios2/nios2.c (nios2_function_profiler): Fix addi operand typo. (nios2_large_got_address): Remove unneeded 'sym' parameter. (nios2_got_address): Update nios2_large_got_address call site. (nios2_delegitimize_address): New function. (TARGET_DELEGITIMIZE_ADDRESS): Define to nios2_delegitimize_address. * config/nios2/linux.h (GLIBC_DYNAMIC_LINKER): Define. (LINK_SPEC): Specify dynamic linker using GNU_USER_DYNAMIC_LINKER. Index: config/nios2/linux.h === --- config/nios2/linux.h (revision 208987) +++ config/nios2/linux.h (working copy) @@ -26,11 +26,16 @@ } \ while (0) +#define GLIBC_DYNAMIC_LINKER /lib/ld-linux-nios2.so.1 + #undef LINK_SPEC #define LINK_SPEC LINK_SPEC_ENDIAN \ - %{shared:-shared} \ -%{static:-Bstatic} \ -%{rdynamic:-export-dynamic} + %{shared:-shared} \ + %{!shared: \ +%{!static: \ + %{rdynamic:-export-dynamic} \ + -dynamic-linker GNU_USER_DYNAMIC_LINKER } \ +%{static:-static}} /* This toolchain implements the ABI for Linux Systems documented in the Nios II Processor Reference Handbook. */ Index: config/nios2/nios2.md === --- config/nios2/nios2.md (revision 208987) +++ config/nios2/nios2.md (working copy) @@ -74,8 +74,6 @@ UNSPEC_PIC_SYM UNSPEC_PIC_CALL_SYM UNSPEC_PIC_GOTOFF_SYM - UNSPEC_TLS - UNSPEC_TLS_LDM UNSPEC_LOAD_TLS_IE UNSPEC_ADD_TLS_LE UNSPEC_ADD_TLS_GD Index: config/nios2/nios2.c === --- config/nios2/nios2.c (revision 208987) +++ config/nios2/nios2.c (working copy) @@ -695,7 +695,7 @@ nios2_function_profiler (FILE *file, int labelno A fprintf (file, \taddi\tr3, r3, %%lo(_gp_got - 1b)\n); fprintf (file, \tadd\tr2, r2, r3\n); fprintf (file, \tmovhi\tr3, %%call_hiadj(_mcount)\n); - fprintf (file, \taddi\tr3, %%call_lo(_mcount)\n); + fprintf (file, \taddi\tr3, r3, %%call_lo(_mcount)\n); fprintf (file, \tadd\tr3, r2, r3\n); fprintf (file, \tldw\tr2, 0(r3)\n); fprintf (file, \tcallr\tr2\n); @@ -1183,7 +1183,7 @@ nios2_unspec_offset (rtx loc, int unspec) /* Generate GOT pointer based address with large offset. */ static rtx -nios2_large_got_address (rtx sym, rtx offset) +nios2_large_got_address (rtx offset) { rtx addr = gen_reg_rtx (Pmode); emit_insn (gen_add3_insn (addr, pic_offset_table_rtx, @@ -1199,7 +1199,7 @@ nios2_got_address (rtx loc, int unspec) crtl-uses_pic_offset_table = 1; if (nios2_large_offset_p (unspec)) -return nios2_large_got_address (loc, offset); +return nios2_large_got_address (offset); return gen_rtx_PLUS (Pmode, pic_offset_table_rtx, offset); } @@ -1805,6 +1805,30 @@ nios2_legitimize_address (rtx x, rtx oldx ATTRIBUT return x; } +static rtx +nios2_delegitimize_address (rtx x) +{ + x = delegitimize_mem_from_attrs (x); + + if (GET_CODE (x) == CONST GET_CODE (XEXP (x, 0)) == UNSPEC) +{ + switch (XINT (XEXP (x, 0), 1)) + { + case UNSPEC_PIC_SYM: + case UNSPEC_PIC_CALL_SYM: + case UNSPEC_PIC_GOTOFF_SYM: + case UNSPEC_ADD_TLS_GD: + case UNSPEC_ADD_TLS_LDM: + case UNSPEC_LOAD_TLS_IE: + case UNSPEC_ADD_TLS_LE: + x = XVECEXP (XEXP (x, 0), 0, 0); + gcc_assert (GET_CODE (x) == SYMBOL_REF); + break; + } +} + return x; +} + /* Main expander function for RTL moves. */ int nios2_emit_move_sequence (rtx *operands, enum machine_mode mode) @@ -3259,6 +3283,9 @@ nios2_merge_decl_attributes (tree olddecl, tree ne #undef TARGET_LEGITIMIZE_ADDRESS #define TARGET_LEGITIMIZE_ADDRESS nios2_legitimize_address +#undef TARGET_DELEGITIMIZE_ADDRESS +#define TARGET_DELEGITIMIZE_ADDRESS nios2_delegitimize_address + #undef TARGET_LEGITIMATE_ADDRESS_P #define TARGET_LEGITIMATE_ADDRESS_P nios2_legitimate_address_p
[PATCH, nios2] Fix frame pointer calculation
The current Nios II prologue/epilogue code has a bug where the frame pointer points to the start of the register save area, rather than the frame slot where FP is saved (as specified the Nios II ABI). This was only discovered relatively recently, as dwarf-based unwinding is used most of the time, plus nios2 GDB's prologue analyzer is capable of determining where FP is stored. Still this needs to fixed to be conformant to the ABI. Tested (both the compiler and gdb) and applied to trunk. Chung-Lin 2014-03-11 Chung-Lin Tang clt...@codesourcery.com * config/nios2/nios2.c (machine_function): Add fp_save_offset field. (nios2_compute_frame_layout): Add calculation of cfun-machine-fp_save_offset. (nios2_expand_prologue): Correct setting of frame pointer register in prologue. (nios2_expand_epilogue): Update recovery of stack pointer from frame pointer accordingly. (nios2_initial_elimination_offset): Update calculation of offset for eliminating to HARD_FRAME_POINTER_REGNUM. Index: config/nios2/nios2.c === --- config/nios2/nios2.c(revision 208471) +++ config/nios2/nios2.c(working copy) @@ -81,8 +81,10 @@ struct GTY (()) machine_function int args_size; /* Number of bytes needed to store registers in frame. */ int save_reg_size; - /* Offset from new stack pointer to store registers. */ + /* Offset from new stack pointer to store registers. */ int save_regs_offset; + /* Offset from save_regs_offset to store frame pointer register. */ + int fp_save_offset; /* != 0 if frame layout already calculated. */ int initialized; }; @@ -390,6 +392,17 @@ nios2_compute_frame_layout (void) } } + cfun-machine-fp_save_offset = 0; + if (save_mask (1 HARD_FRAME_POINTER_REGNUM)) +{ + int fp_save_offset = 0; + for (regno = 0; regno HARD_FRAME_POINTER_REGNUM; regno++) + if (save_mask (1 regno)) + fp_save_offset += 4; + + cfun-machine-fp_save_offset = fp_save_offset; +} + save_reg_size = NIOS2_STACK_ALIGN (save_reg_size); total_size += save_reg_size; total_size += NIOS2_STACK_ALIGN (crtl-args.pretend_args_size); @@ -450,8 +463,8 @@ nios2_expand_prologue (void) { unsigned int regno; int total_frame_size, save_offset; - int sp_offset; /* offset from base_reg to final stack value. */ - int fp_offset; /* offset from base_reg to final fp value. */ + int sp_offset; /* offset from base_reg to final stack value. */ + int save_regs_base; /* offset from base_reg to register save area. */ rtx insn; total_frame_size = nios2_compute_frame_layout (); @@ -468,8 +481,7 @@ nios2_expand_prologue (void) gen_int_mode (cfun-machine-save_regs_offset - total_frame_size, Pmode))); RTX_FRAME_RELATED_P (insn) = 1; - - fp_offset = 0; + save_regs_base = 0; sp_offset = -cfun-machine-save_regs_offset; } else if (total_frame_size) @@ -478,16 +490,16 @@ nios2_expand_prologue (void) gen_int_mode (-total_frame_size, Pmode))); RTX_FRAME_RELATED_P (insn) = 1; - fp_offset = cfun-machine-save_regs_offset; + save_regs_base = cfun-machine-save_regs_offset; sp_offset = 0; } else -fp_offset = sp_offset = 0; +save_regs_base = sp_offset = 0; if (crtl-limit_stack) nios2_emit_stack_limit_check (); - save_offset = fp_offset + cfun-machine-save_reg_size; + save_offset = save_regs_base + cfun-machine-save_reg_size; for (regno = LAST_GP_REG; regno 0; regno--) if (cfun-machine-save_mask (1 regno)) @@ -498,9 +510,10 @@ nios2_expand_prologue (void) if (frame_pointer_needed) { + int fp_save_offset = save_regs_base + cfun-machine-fp_save_offset; insn = emit_insn (gen_add3_insn (hard_frame_pointer_rtx, stack_pointer_rtx, - gen_int_mode (fp_offset, Pmode))); + gen_int_mode (fp_save_offset, Pmode))); RTX_FRAME_RELATED_P (insn) = 1; } @@ -555,7 +568,9 @@ nios2_expand_epilogue (bool sibcall_p) if (frame_pointer_needed) { /* Recover the stack pointer. */ - insn = emit_move_insn (stack_pointer_rtx, hard_frame_pointer_rtx); + insn = emit_insn (gen_add3_insn + (stack_pointer_rtx, hard_frame_pointer_rtx, +gen_int_mode (-cfun-machine-fp_save_offset, Pmode))); cfa_adj = plus_constant (Pmode, stack_pointer_rtx, (total_frame_size - cfun-machine-save_regs_offset)); @@ -772,7 +787,8 @@ nios2_initial_elimination_offset (int from, int to /* If we are asked for the frame pointer offset
Re: [PATCH/middle-end 2/6] __builtin_thread_pointer and AARCH64 ILP32
On 2014/2/25 上午 10:06, Andrew Pinski wrote: On Wed, Dec 4, 2013 at 9:42 AM, Yufeng Zhang yufeng.zh...@arm.com wrote: On 12/03/13 21:24, Andrew Pinski wrote: Hi, With ILP32 AARCH64, Pmode (DImode) != ptrmode (SImode) so the variable decl has a mode of SImode while the register is DImode. So the target that gets passed down to expand_builtin_thread_pointer is NULL as expand does not know how to get a subreg for a pointer type. This fixes the problem by handling a NULL target like we are able to handle for a non register/correct mode target inside expand_builtin_thread_pointer. OK? Build and tested for aarch64-elf with no regressions. Thanks, Andrew Pinski * builtins.c (expand_builtin_thread_pointer): Create a new target when the target is NULL. --- gcc/ChangeLog |5 + gcc/builtins.c |2 +- 2 files changed, 6 insertions(+), 1 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 4f1c818..66797fa 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -5699,7 +5699,7 @@ expand_builtin_thread_pointer (tree exp, rtx target) if (icode != CODE_FOR_nothing) { struct expand_operand op; - if (!REG_P (target) || GET_MODE (target) != Pmode) + if (target == NULL_RTX || !REG_P (target) || GET_MODE (target) != Pmode) target = gen_reg_rtx (Pmode); create_output_operand (op, target, Pmode); expand_insn (icode, 1,op); Shouldn't thread pointer have ptr_mode instead? I'm aware that on AArch64 the thread pointer system register tpidr_el0 is 64-bit wide regardless of ILP32 or not, but in the abstracted view of AArch64 ILP32 world, the thread pointer shall be a 32-bit pointer; the OS should have taken care of the hardware register tpidr_el0 by having its higher 32 bits cleared. I think expand_builtin_thread_pointer and expand_builtin_set_thread_pointer should use ptr_mode instead. Correct me if I missed anything. Add Chung-Lin Tang to the CC list; Chung-Lin wrote these builtins in r192364 Pmode seems more correct as the hardware mode is Pmode, having it ptr_mode would expose the C size of the pointer rather than the hardware size. Thanks, Andrew Pinski Don't have any approval powers, but the fix looks rather obvious. Thanks, Chung-Lin
[PATCH, nios2] Large -fPIC support
This patch adds large GOT support for the Nios II backend. Tested by running glibc tests with -fPIC forced on. A few smaller libgcc fixes are also included as well. Patch committed. Chung-Lin 2014-02-20 Chung-Lin Tang clt...@codesourcery.com Sandra Loosemore san...@codesourcery.com gcc/ * config/nios2/nios2.md (unspec): Add UNSPEC_PIC_GOTOFF_SYM enum. * config/nios2/nios2.c (nios2_function_profiler): Add -fPIC (flag_pic == 2) support. (nios2_handle_custom_fpu_cfg): Fix warning parameter. (nios2_large_offset_p): New function. (nios2_unspec_reloc_p): Move up position, update to use nios2_large_offset_p. (nios2_unspec_address): Remove function. (nios2_unspec_offset): New function. (nios2_large_got_address): New function. (nios2_got_address): Add large offset support. (nios2_legitimize_tls_address): Update usage of removed and new functions. (nios2_symbol_binds_local_p): New function. (nios2_load_pic_address): Add -fPIC (flag_pic == 2) support. (nios2_legitimize_address): Update to use nios2_large_offset_p. (nios2_emit_move_sequence): Avoid legitimizing (const (unspec ...)). (nios2_print_operand): Merge H/L processing, add hiadj/lo processing for (const (unspec ...)). (nios2_unspec_reloc_name): Add UNSPEC_PIC_GOTOFF_SYM case. gcc/testsuite/ * gcc.target/nios2/biggot-1.c: New. * gcc.target/nios2/biggot-2.c: New. libgcc/ * config/nios2/t-nios2 (CRTSTUFF_T_CFLAGS): Add -mno-gpopt. * config/nios2/crti.S: Remove .file directive. * config/nios2/crtn.S: Likewise. Index: gcc/config/nios2/nios2.c === --- gcc/config/nios2/nios2.c (revision 207964) +++ gcc/config/nios2/nios2.c (working copy) @@ -664,7 +664,7 @@ void nios2_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) { fprintf (file, \tmov\tr8, ra\n); - if (flag_pic) + if (flag_pic == 1) { fprintf (file, \tnextpc\tr2\n); fprintf (file, \t1: movhi\tr3, %%hiadj(_gp_got - 1b)\n); @@ -673,6 +673,18 @@ nios2_function_profiler (FILE *file, int labelno A fprintf (file, \tldw\tr2, %%call(_mcount)(r2)\n); fprintf (file, \tcallr\tr2\n); } + else if (flag_pic == 2) +{ + fprintf (file, \tnextpc\tr2\n); + fprintf (file, \t1: movhi\tr3, %%hiadj(_gp_got - 1b)\n); + fprintf (file, \taddi\tr3, r3, %%lo(_gp_got - 1b)\n); + fprintf (file, \tadd\tr2, r2, r3\n); + fprintf (file, \tmovhi\tr3, %%call_hiadj(_mcount)\n); + fprintf (file, \taddi\tr3, %%call_lo(_mcount)\n); + fprintf (file, \tadd\tr3, r2, r3\n); + fprintf (file, \tldw\tr2, 0(r3)\n); + fprintf (file, \tcallr\tr2\n); +} else fprintf (file, \tcall\t_mcount\n); fprintf (file, \tmov\tra, r8\n); @@ -920,7 +932,7 @@ nios2_handle_custom_fpu_cfg (const char *cfgname, } else warning (0, ignoring unrecognized switch %-mcustom-fpu-cfg% - value %%s%, cfg); + value %%s%, cfgname); /* Guard against errors in the standard configurations. */ nios2_custom_check_insns (); @@ -1116,20 +1128,64 @@ nios2_call_tls_get_addr (rtx ti) return ret; } +/* Return true for large offsets requiring hiadj/lo relocation pairs. */ +static bool +nios2_large_offset_p (int unspec) +{ + gcc_assert (nios2_unspec_reloc_name (unspec) != NULL); + + if (flag_pic == 2 + /* FIXME: TLS GOT offset relocations will eventually also get this + treatment, after binutils support for those are also completed. */ + (unspec == UNSPEC_PIC_SYM || unspec == UNSPEC_PIC_CALL_SYM)) +return true; + + /* 'gotoff' offsets are always hiadj/lo. */ + if (unspec == UNSPEC_PIC_GOTOFF_SYM) +return true; + + return false; +} + +/* Return true for conforming unspec relocations. Also used in + constraints.md and predicates.md. */ +bool +nios2_unspec_reloc_p (rtx op) +{ + return (GET_CODE (op) == CONST + GET_CODE (XEXP (op, 0)) == UNSPEC + ! nios2_large_offset_p (XINT (XEXP (op, 0), 1))); +} + +/* Helper to generate unspec constant. */ static rtx -nios2_unspec_address (rtx loc, rtx base_reg, int unspec) +nios2_unspec_offset (rtx loc, int unspec) { - rtx unspec_offset = -gen_rtx_CONST (Pmode, gen_rtx_UNSPEC (Pmode, gen_rtvec (1, loc), - unspec)); - return gen_rtx_PLUS (Pmode, base_reg, unspec_offset); + return gen_rtx_CONST (Pmode, gen_rtx_UNSPEC (Pmode, gen_rtvec (1, loc), + unspec)); } +/* Generate GOT pointer based address with large offset. */ static rtx +nios2_large_got_address (rtx sym, rtx offset) +{ + rtx addr = gen_reg_rtx (Pmode); + emit_insn (gen_add3_insn (addr, pic_offset_table_rtx, + force_reg (Pmode, offset))); + return addr; +} + +/* Generate a GOT pointer based address. */ +static rtx nios2_got_address (rtx loc, int unspec) { + rtx offset
[PATCH, nios2, committed] PR59784, fextsd asm output fix
Hi Savin, I've committed your patch for PR59784; the fix seems small enough to accept directly. Thanks a lot for catching this. Thanks, Chung-Lin 2014-01-30 Savin Zlobec savin.zlo...@gmail.com PR target/59784 * config/nios2/nios2.c (nios2_fpu_insn_asm): Fix asm output of SFmode to DFmode case. Index: config/nios2/nios2.c === --- config/nios2/nios2.c(revision 207296) +++ config/nios2/nios2.c(working copy) @@ -2066,7 +2066,8 @@ nios2_fpu_insn_asm (enum n2fpu_code code) } else { - op1 = %0; op2 = %1; + op1 = (dst_mode == DFmode ? %D0 : %0); + op2 = %1; op3 = (num_operands == 2 ? zero : %2); } }
Re: Reload codegen improvement
On 14/1/8 12:22 AM, Bernd Schmidt wrote: This fixes a problem identified by Chung-Lin. Once reload is done, all equivalencing insns for pseudos that didn't get a hard reg but could be eliminated using their REG_EQUIV are deleted. However, we still can produce reloads and reload insns for them in certain cases, leading to unnecessary spilling. This patch corrects this by making sure we use identical tests when deciding whether to ignore an insn while reloading, and whether to delete it afterwards. Bootstrapped and tested on x86_64-linux (with lra_p disabled). Chung-Lin says he's tested it as well, I think on arm (probably with something 4.8 based). Will commit in a few days if no objections. Bernd Hi Bernd, this does not seem to be committed yet. Thanks, Chung-Lin
Re: [buildrobot] [PATCH] Fix redefinition of BITS_PER_UNIT
On 2014/1/1 02:45 PM, Mike Stump wrote: On Dec 31, 2013, at 12:26 PM, Jan-Benedict Glaw jbg...@lug-owl.de wrote: On Tue, 2013-12-31 15:24:52 +0800, Chung-Lin Tang clt...@codesourcery.com wrote: The nios2 port was just committed. Thanks to all that gave time and effort to review this. Just a heads-up: I see a lot of warnings about BITS_PER_UNIT being redefined, see eg. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=74923 as an example. 2013-12-31 Jan-Benedict Glaw jbg...@lug-owl.de * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it. diff --git a/gcc/config/nios2/nios2.h b/gcc/config/nios2/nios2.h index 8e6941b..f333be3 100644 --- a/gcc/config/nios2/nios2.h +++ b/gcc/config/nios2/nios2.h @@ -73,7 +73,6 @@ #define BITS_BIG_ENDIAN 0 #define BYTES_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0) #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0) -#define BITS_PER_UNIT 8 #define BITS_PER_WORD 32 #define UNITS_PER_WORD 4 #define POINTER_SIZE 32 Ok? Ok. Thanks for catching that.
nios2 port committed (Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts)
On 2013/12/28 02:29 PM, Chung-Lin Tang wrote: On 13/12/23 12:54 AM, Chung-Lin Tang wrote: Other than these two, I think this can go in. Bernd Attached is the updated patch for the compiler. Since Bernd is a Global Reviewer, am I clear for committing the port now? (including the testsuite and libgcc parts) I will be taking Bernd's prior mail as an approval. For avoidance of doubt, unless there are more comments raised, I will be committing the port to trunk next week. The nios2 port was just committed. Thanks to all that gave time and effort to review this. Thanks, Chung-Lin
Re: Question about simple_return pattern for the GCC ARM backend.
On 2013/12/28 09:31 AM, Yangfei (Felix) wrote: Hi, I think that simple_return standard pattern is useful for the ARM. I mean it should be good for target code performance. But seems this pattern is not there for the GCC ARM backend. Can anyone explain the reason why we don’t need this? Cheers, Fei It does use it. Search for the return_strreturn expand pattern, and the returns code iterator in config/arm/iterators.md. Chung-Lin
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
On 13/12/23 12:54 AM, Chung-Lin Tang wrote: Other than these two, I think this can go in. Bernd Attached is the updated patch for the compiler. Since Bernd is a Global Reviewer, am I clear for committing the port now? (including the testsuite and libgcc parts) I will be taking Bernd's prior mail as an approval. For avoidance of doubt, unless there are more comments raised, I will be committing the port to trunk next week. Thanks, Chung-Lin
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping x3. On 13/12/10 12:57 PM, Chung-Lin Tang wrote: Ping x2. On 2013/12/5 12:19 PM, Chung-Lin Tang wrote: Ping. On 2013/11/26 02:45 PM, Chung-Lin Tang wrote: Hi Bernd, I've updated the patch again, please see if it looks fit for approval now. Including ChangeLog again for completeness. Thanks, Chung-Lin 2013-11-26 Chung-Lin Tang clt...@codesourcery.com Sandra Loosemore san...@codesourcery.com Based on patches from Altera Corporation * config.gcc (nios2-*-*): Add nios2 config targets. * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. ($cpu_type): Add nios2 as new cpu type. * configure: Regenerate. * config/nios2/nios2.c: New file. * config/nios2/nios2.h: New file. * config/nios2/nios2-opts.h: New file. * config/nios2/nios2-protos.h: New file. * config/nios2/elf.h: New file. * config/nios2/elf.opt: New file. * config/nios2/linux.h: New file. * config/nios2/nios2.opt: New file. * config/nios2/nios2.md: New file. * config/nios2/predicates.md: New file. * config/nios2/constraints.md: New file. * config/nios2/t-nios2: New file. * common/config/nios2/nios2-common.c: New file. * doc/invoke.texi (Nios II options): Document Nios II specific options. * doc/md.texi (Nios II family): Document Nios II specific constraints. * doc/extend.texi (Function Specific Option Pragmas): Document Nios II supported target pragma functionality.
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping x2. On 2013/12/5 12:19 PM, Chung-Lin Tang wrote: Ping. On 2013/11/26 02:45 PM, Chung-Lin Tang wrote: Hi Bernd, I've updated the patch again, please see if it looks fit for approval now. Including ChangeLog again for completeness. Thanks, Chung-Lin 2013-11-26 Chung-Lin Tang clt...@codesourcery.com Sandra Loosemore san...@codesourcery.com Based on patches from Altera Corporation * config.gcc (nios2-*-*): Add nios2 config targets. * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. ($cpu_type): Add nios2 as new cpu type. * configure: Regenerate. * config/nios2/nios2.c: New file. * config/nios2/nios2.h: New file. * config/nios2/nios2-opts.h: New file. * config/nios2/nios2-protos.h: New file. * config/nios2/elf.h: New file. * config/nios2/elf.opt: New file. * config/nios2/linux.h: New file. * config/nios2/nios2.opt: New file. * config/nios2/nios2.md: New file. * config/nios2/predicates.md: New file. * config/nios2/constraints.md: New file. * config/nios2/t-nios2: New file. * common/config/nios2/nios2-common.c: New file. * doc/invoke.texi (Nios II options): Document Nios II specific options. * doc/md.texi (Nios II family): Document Nios II specific constraints. * doc/extend.texi (Function Specific Option Pragmas): Document Nios II supported target pragma functionality.
Re: [PATCH] Hexadecimal numbers in option arguments
On 2013/7/14 09:27 PM, Joseph S. Myers wrote: On Sun, 14 Jul 2013, Chung-Lin Tang wrote: Original patch posted as part of Nios II patches: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01087.html This patch is to allow hexadecimal numbers to be used in option arguments, e.g. -falign-loops=0x10 can now be used as equivalent to -falign-loops=16. Joseph, the patch has been modified to use IXDIGIT to check the argument string first, as you suggested in the last submission. Is this okay for trunk? This version looks like it will allow plain 0x or 0X as an argument, treating it as 0, rather than treating it as an error (i.e., you need to check there is at least one hex digit after the 0x or 0X before passing the string to strtol). Hi Joseph, Forgot to follow up on this patch. Here it is with a small update to check if 'p' got updated to a difference position. Does this now look okay? Thanks, Chung-Lin Index: opts-common.c === --- opts-common.c (revision 205847) +++ opts-common.c (working copy) @@ -147,7 +147,7 @@ find_opt (const char *input, unsigned int lang_mas return match_wrong_lang; } -/* If ARG is a non-negative integer made up solely of digits, return its +/* If ARG is a non-negative decimal or hexadecimal integer, return its value, otherwise return -1. */ int @@ -161,6 +161,17 @@ integral_argument (const char *arg) if (*p == '\0') return atoi (arg); + /* It wasn't a decimal number - try hexadecimal. */ + if (arg[0] == '0' (arg[1] == 'x' || arg[1] == 'X')) +{ + p = arg + 2; + while (*p ISXDIGIT (*p)) + p++; + + if (p != arg + 2 *p == '\0') + return strtol (arg, NULL, 16); +} + return -1; }
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping. On 2013/11/26 02:45 PM, Chung-Lin Tang wrote: Hi Bernd, I've updated the patch again, please see if it looks fit for approval now. Including ChangeLog again for completeness. Thanks, Chung-Lin 2013-11-26 Chung-Lin Tang clt...@codesourcery.com Sandra Loosemore san...@codesourcery.com Based on patches from Altera Corporation * config.gcc (nios2-*-*): Add nios2 config targets. * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. ($cpu_type): Add nios2 as new cpu type. * configure: Regenerate. * config/nios2/nios2.c: New file. * config/nios2/nios2.h: New file. * config/nios2/nios2-opts.h: New file. * config/nios2/nios2-protos.h: New file. * config/nios2/elf.h: New file. * config/nios2/elf.opt: New file. * config/nios2/linux.h: New file. * config/nios2/nios2.opt: New file. * config/nios2/nios2.md: New file. * config/nios2/predicates.md: New file. * config/nios2/constraints.md: New file. * config/nios2/t-nios2: New file. * common/config/nios2/nios2-common.c: New file. * doc/invoke.texi (Nios II options): Document Nios II specific options. * doc/md.texi (Nios II family): Document Nios II specific constraints. * doc/extend.texi (Function Specific Option Pragmas): Document Nios II supported target pragma functionality.
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
On 13/11/22 10:31 PM, Bernd Schmidt wrote: If you don't object, I'll keep the clobbers there for now. If they serve no purpose (and I think they don't), they should go. I'll check again, but I remember df_regs_ever_live_p doesn't include the RA reg if the call pattern doesn't have the clobber. I see c6x doesn't have that clobber either, but checks crtl-is_leaf instead in the frame layout. Looking across the backends, adding a clobber appears to be the more usual style. + if (!strncasecmp (cfg, 60-1, 4)) strcmp, several times. At least judging by the docs allowing 60-1fish is unintentional? I changed them to use strncmp instead. This routine has to work on a possibly longer target attribute string, hence the 'n' variants. I don't understand this. Using strncmp matches 60-1 and any other string beginning with that prefix, but doesn't distinguish between them. If that really is the desired behaviour, it needs a comment, but it still looks to me as if this is just lacking proper error checking. This has to work on target attribute strings, which it may be in the middle of the string. But your right that the single option form may allow the extended suffix, I'll fix this next. +#define SC_REGNO (12) +#define STATIC_CHAIN_REGNUM SC_REGNO There are a lot of these pairs, and it looks like unnecessary double indirection. Lose the former in all cases. No parens around constants. I've reorganized a lot of this in nios2.h. I'd prefer the new #if 0 blocks there to go away. Argh, forgot to delete that. Thanks for catching. I'll address the rest in the next revision. Thanks, Chung-Lin
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
On 2013/11/21 03:25 PM, Richard Henderson wrote: On 11/21/2013 02:41 PM, Chung-Lin Tang wrote: I'm not saying we tolerate wrong RTL form, but rather that, it should be an issue of the RTL passes, not the backend. The backend should just be as much as possible, a machine description. Matching non-canonical rtl does nothing but slow down the compiler and cause confusion for the poor person that has to maintain the code later. As I mentioned in the last mail, I will modify the constraints as suggested for the *norsi3 and EQ/NE cmp patterns in the final patch. Chung-Lin
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
On 13/11/20 1:34 AM, Richard Sandiford wrote: Chung-Lin Tang clt...@codesourcery.com writes: +;; Integer logical Operations + +(define_code_iterator LOGICAL [and ior xor]) +(define_code_attr logical_asm [(and and) (ior or) (xor xor)]) + +(define_insn codesi3 + [(set (match_operand:SI 0 register_operand =r,r,r) +(LOGICAL:SI (match_operand:SI 1 register_operand %r,r,r) +(match_operand:SI 2 logical_operand rM,J,K)))] + + @ +logical_asm\\t%0, %1, %z2 +logical_asm%i2\\t%0, %1, %2 +logical_asmh%i2\\t%0, %1, %U2 + [(set_attr type alu)]) + +(define_insn *norsi3 + [(set (match_operand:SI 0 register_operand =r) +(and:SI (not:SI (match_operand:SI 1 register_operand %r)) +(not:SI (match_operand:SI 2 reg_or_0_operand rM] + + nor\\t%0, %1, %z2 + [(set_attr type alu)]) M constraints (for const0_rtx) and reg_or_0 seem unnecessary, no such RTL should make it to this point. Such RTL does appear under -O0. Removing the 'M' will also require a bit of re-working the operand printing mechanics; not a lot of work, but I'd rather keep it as is. The behavior of using the zero register for a 0-value is also more expected in nios2, I think. Will any RTL-level passes, say for example, propagate a zero-constant into the pattern, without also simplifying it to a NOT pattern? (that's a question, I don't really know) Personally, I haven't really seen many cases optimizable to NOR, but I think keeping the 'M' there is pretty harmless. Hmm, but if we get (not (const_int 0)) then that sounds like a bug, since (and (not X) (not (const_int 0))) should reduce to (not X). IMO target-independent code shouldn't try to create the nor-with-0 form and backends shouldn't match it. Why would removing 'M' affect the printing mechanism? Naively I'd have expected: The *norsi3 here is of course straightforward. I was referring to other cases, like the AND/IOR/XOR pattern above, where I wanted to combine them into a single alternative. That needs a bit more work to reorganize the nios2_print_operand() cases. (define_insn *norsi3 [(set (match_operand:SI 0 register_operand =r) (and:SI (not:SI (match_operand:SI 1 register_operand r)) (not:SI (match_operand:SI 2 register_operand r] nor\\t%0, %1, %2 [(set_attr type alu)]) to just work. That will definitely work, though I don't think the zero case does any harm. +;; Integer comparisons + +(define_code_iterator EQNE [eq ne]) +(define_insn nios2_cmpcode + [(set (match_operand:SI 0 register_operand =r) +(EQNE:SI (match_operand:SI 1 reg_or_0_operand %rM) + (match_operand:SI 2 arith_operand rI)))] + + cmpcode%i2\\t%0, %z1, %z2 + [(set_attr type alu)]) Once again, using reg_or_0 and M seems pointless. The compares don't support all operations, with only the second operand capable of an immediate. Using 'rM' should theoretically allow more commutative swapping. But rtl-wise, we should never generate an EQ or NE with two constants. And if one operand is constant then it's supposed to be the second. The % should give commutativity on its own, without the M. For EQ/NE I guess that's the case, for the other comparisons I'm not so sure; I'm not familiar enough with the details of the expander machinery to claim anything. If this doesn't carry to other comparisons, I intend to keep it in line with the other cmp patterns. I experimented a bit with the generated code, nothing is affected. + emit_insn + (gen_rtx_SET (Pmode, tmp, +gen_int_mode (cfun-machine-save_regs_offset, + Pmode))); Shouldn't have a mode on the SET, but really should just call emit_move_insn. Similar cases elsewhere, search for gen_rtx_SET (Pmode. I've removed the modes on SET, though I prefer the more bare generation of the insns in some contexts; emit_move_insn() seems to have a lot under the hood. There shouldn't be anything to be afraid of though. Target-independent code would use emit_move_insn for this though, so it needs to just work. It will work, and I did use it in some places, though I did not exhaustively search-and-replace. For (subtle) performance reasons, emit_move_insn() really does too much as a backend utility. Usually backend code is already very precise on what we want to generate. + HOST_WIDE_INT var_size; /* # of var. bytes allocated. */ Not the first time they occur in this file, but I suppose I should mention that these in-line comments are probably just outside our coding guidelines. Deleted the comments outside the struct defs. FWIW, I think it was more that comments should be above the field rather than tagged on the right. (One of the big problems with right-column comments is that people tend to make them too short.) I will change to use the over
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
On 13/11/21 7:21 AM, Richard Sandiford wrote: Chung-Lin Tang clt...@codesourcery.com writes: On 13/11/20 1:34 AM, Richard Sandiford wrote: Chung-Lin Tang clt...@codesourcery.com writes: +;; Integer logical Operations + +(define_code_iterator LOGICAL [and ior xor]) +(define_code_attr logical_asm [(and and) (ior or) (xor xor)]) + +(define_insn codesi3 + [(set (match_operand:SI 0 register_operand =r,r,r) +(LOGICAL:SI (match_operand:SI 1 register_operand %r,r,r) +(match_operand:SI 2 logical_operand rM,J,K)))] + + @ +logical_asm\\t%0, %1, %z2 +logical_asm%i2\\t%0, %1, %2 +logical_asmh%i2\\t%0, %1, %U2 + [(set_attr type alu)]) + +(define_insn *norsi3 + [(set (match_operand:SI 0 register_operand =r) +(and:SI (not:SI (match_operand:SI 1 register_operand %r)) +(not:SI (match_operand:SI 2 reg_or_0_operand rM] + + nor\\t%0, %1, %z2 + [(set_attr type alu)]) M constraints (for const0_rtx) and reg_or_0 seem unnecessary, no such RTL should make it to this point. Such RTL does appear under -O0. Removing the 'M' will also require a bit of re-working the operand printing mechanics; not a lot of work, but I'd rather keep it as is. The behavior of using the zero register for a 0-value is also more expected in nios2, I think. Will any RTL-level passes, say for example, propagate a zero-constant into the pattern, without also simplifying it to a NOT pattern? (that's a question, I don't really know) No, they shouldn't, because only the simplified form is expected to match. E.g. forwprop goes to some lengths to do this properly. Personally, I haven't really seen many cases optimizable to NOR, but I think keeping the 'M' there is pretty harmless. That's not the way to look at it though. There should be no unsimplified rtl in the instruction stream, just like there should be no non-canonical rtl. We don't have rules for canonical rtl because the other forms are harmful (in the sense of generating wrong code or whatever). We have them so that there aren't too many possible representations of the same thing, and so that code dealing with existing rtl patterns can validly expect one shape over another. If we see something that's generating the wrong rtl form, we should fix it rather than pander to it. IMO having (and (not (const_int 0)) (not X)) as an alternative representation of (not X) falls directly into that category. I'm not saying we tolerate wrong RTL form, but rather that, it should be an issue of the RTL passes, not the backend. The backend should just be as much as possible, a machine description. Saying you don't need to describe a particular thing in the backend because GCC doesn't need that sounds like a spill-over of internal details. In theory, a backend writer shouldn't need to care for such things. +;; Integer comparisons + +(define_code_iterator EQNE [eq ne]) +(define_insn nios2_cmpcode + [(set (match_operand:SI 0 register_operand =r) +(EQNE:SI (match_operand:SI 1 reg_or_0_operand %rM) + (match_operand:SI 2 arith_operand rI)))] + + cmpcode%i2\\t%0, %z1, %z2 + [(set_attr type alu)]) Once again, using reg_or_0 and M seems pointless. The compares don't support all operations, with only the second operand capable of an immediate. Using 'rM' should theoretically allow more commutative swapping. But rtl-wise, we should never generate an EQ or NE with two constants. And if one operand is constant then it's supposed to be the second. The % should give commutativity on its own, without the M. For EQ/NE I guess that's the case, for the other comparisons I'm not so sure; I'm not familiar enough with the details of the expander machinery to claim anything. Sure, but the point is that EQ and NE _are_ commutative, and rtl.texi says that: For commutative binary operations, constants should be placed in the second operand. So... If this doesn't carry to other comparisons, I intend to keep it in line with the other cmp patterns. I experimented a bit with the generated code, nothing is affected. ...no, it doesn't carry over to other comparisons, but it's not supposed to. Just like you can't add % to the other comparisons. The patterns are including constraints whose only purpose is to match non-canonical rtl. That shouldn't happen. My argument is the same as above. Having that said, I'll edit the *norsi3 and EQNE patterns to remove the 'M' like you suggested in the final patch. This is too trivial an issue to get postponed by. + emit_insn +(gen_rtx_SET (Pmode, tmp, + gen_int_mode (cfun-machine-save_regs_offset, +Pmode))); Shouldn't have a mode on the SET, but really should just call emit_move_insn. Similar cases elsewhere, search for gen_rtx_SET (Pmode. I've removed the modes
Re: [PATCH][3/3] Re-submission of Altera Nios II port, libgcc parts
On 2013/7/14 03:55 PM, Chung-Lin Tang wrote: nios2 libgcc parts. Since the original post, the only main change has been the fdpbit vs soft-fp issue raised by Joseph, which has been resolved. Other parts are mostly the same. The Nios II libgcc parts have been further updated to include a sfp-machine.h file, and the Linux atomic cmpxchg updated to now use a fixed address kernel helper cmpxchg routine, similar to ARM. Thanks, Chung-Lin 2013-11-16 Sandra Loosemore san...@codesourcery.com Chung-Lin Tang clt...@codesourcery.com Based on patches from Altera Corporation * config.host (nios2-*-*,nios2-*-linux*): Add nios2 host cases. * config/nios2/lib2-nios2.h: New file. * config/nios2/lib2-divmod-hi.c: New file. * config/nios2/linux-unwind.h: New file. * config/nios2/lib2-divmod.c: New file. * config/nios2/linux-atomic.c: New file. * config/nios2/t-nios2: New file. * config/nios2/crti.asm: New file. * config/nios2/t-linux: New file. * config/nios2/lib2-divtable.c: New file. * config/nios2/lib2-mul.c: New file. * config/nios2/tramp.c: New file. * config/nios2/crtn.asm: New file. * config/nios2/sfp-machine.h: New file. Index: libgcc/config.host === --- libgcc/config.host (revision 204897) +++ libgcc/config.host (working copy) @@ -146,6 +146,9 @@ mips*-*-*) nds32*-*) cpu_type=nds32 ;; +nios2*-*-*) + cpu_type=nios2 + ;; powerpc*-*-*) cpu_type=rs6000 ;; @@ -876,6 +879,15 @@ nds32*-elf*) ;; esac ;; +nios2-*-linux*) + tmake_file=$tmake_file nios2/t-nios2 nios2/t-linux t-libgcc-pic t-slibgcc-libgcc + extra_parts=$extra_parts crti.o crtn.o + md_unwind_header=nios2/linux-unwind.h + ;; +nios2-*-*) + tmake_file=$tmake_file nios2/t-nios2 t-softfp-sfdf t-softfp-excl t-softfp + extra_parts=$extra_parts crti.o crtn.o + ;; pdp11-*-*) tmake_file=pdp11/t-pdp11 t-fdpbit ;; Index: libgcc/config/nios2/t-linux === --- libgcc/config/nios2/t-linux (revision 0) +++ libgcc/config/nios2/t-linux (revision 0) @@ -0,0 +1,7 @@ +# Soft-float functions go in glibc only, to facilitate the possible +# future addition of exception and rounding mode support integrated +# with fenv.h. + +LIB2FUNCS_EXCLUDE = _floatdidf _floatdisf _fixunsdfsi _fixunssfsi \ + _fixunsdfdi _fixdfdi _fixunssfdi _fixsfdi _floatundidf _floatundisf +LIB2ADD += $(srcdir)/config/nios2/linux-atomic.c Index: libgcc/config/nios2/sfp-machine.h === --- libgcc/config/nios2/sfp-machine.h (revision 0) +++ libgcc/config/nios2/sfp-machine.h (revision 0) @@ -0,0 +1,78 @@ +/* Soft-FP definitions for Altera Nios II. + Copyright (C) 2013 Free Software Foundation, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +http://www.gnu.org/licenses/. */ + +#define _FP_W_TYPE_SIZE 32 +#define _FP_W_TYPE unsigned long +#define _FP_WS_TYPE signed long +#define _FP_I_TYPE long + +#define _FP_MUL_MEAT_S(R,X,Y)\ + _FP_MUL_MEAT_1_wide(_FP_WFRACBITS_S,R,X,Y,umul_ppmm) +#define _FP_MUL_MEAT_D(R,X,Y)\ + _FP_MUL_MEAT_2_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm) +#define _FP_MUL_MEAT_Q(R,X,Y)\ + _FP_MUL_MEAT_4_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm) + +#define _FP_DIV_MEAT_S(R,X,Y) _FP_DIV_MEAT_1_loop(S,R,X,Y) +#define _FP_DIV_MEAT_D(R,X,Y) _FP_DIV_MEAT_2_udiv(D,R,X,Y) +#define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_4_udiv(Q,R,X,Y) + +#define _FP_NANFRAC_S ((_FP_QNANBIT_S 1) - 1) +#define _FP_NANFRAC_D ((_FP_QNANBIT_D 1) - 1), -1 +#define _FP_NANFRAC_Q ((_FP_QNANBIT_Q 1) - 1), -1, -1, -1 +#define _FP_NANSIGN_S 0 +#define _FP_NANSIGN_D 0 +#define _FP_NANSIGN_Q 0 + +#define _FP_KEEPNANFRACP 1 +#define _FP_QNANNEGATEDP 0 + +/* Someone please check this. */ +#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP) \ + do {\ +if ((_FP_FRAC_HIGH_RAW_##fs(X) _FP_QNANBIT_##fs) \ + !(_FP_FRAC_HIGH_RAW_##fs(Y) _FP_QNANBIT_##fs)) \ + {\ + R##_s = Y##_s; \ + _FP_FRAC_COPY_##wc(R,Y);\ + }\ +else
Re: [PATCH][2/3] Re-submission of Altera Nios II port, testsuite parts
On 2013/10/17 10:20 PM, Bernd Schmidt wrote: On 07/14/2013 09:54 AM, Chung-Lin Tang wrote: These are nios2 patches for the gcc testsuite. Some new testcases were added since the last posting. Index: gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c === --- gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c (revision 200946) +++ gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c (working copy) @@ -124,16 +124,17 @@ __memmove_chk (void *dst, const void *src, __SIZE_ void * memset (void *dst, int c, __SIZE_TYPE__ n) { + while (n-- != 0) +n[(char *) dst] = c; + /* Single-byte memsets should be done inline when optimisation - is enabled. */ + is enabled. Do this after the copy in case we're being called to + initialize bss. */ #ifdef __OPTIMIZE__ if (memset_disallowed inside_main n 2) abort (); #endif - while (n-- != 0) -n[(char *) dst] = c; - return dst; } I'm not sure I understand this change. Is nios2 the only target calling memset to initialize bss, and memset_disallowed is nonzero at the start of execution? This appears to be for the nios2-elf bare metal testing. Looking at the upstream libgloss sources, nios2 is indeed not the only target that calls memset for zeroing bss. Note that however, in a somewhat reverse of situation: https://sourceware.org/ml/newlib/2013/msg00264.html It appears that due to the presumed usage model for Nios II, Sandra did not contribute the libgloss port. So the original code that needed this testsuite change is probably not there. OTOH, if this change is not deemed harmful, than it might further robustify the testsuite. Index: gcc/testsuite/gcc.target/nios2/nios2-int-types.c === --- gcc/testsuite/gcc.target/nios2/nios2-int-types.c (revision 0) +++ gcc/testsuite/gcc.target/nios2/nios2-int-types.c (revision 0) @@ -0,0 +1,34 @@ +/* Test that various types are all derived from int. */ +/* { dg-do compile { target nios2-*-* } } */ I think you can lose the { target nios2-*-* } for everything inside gcc.target/nios2. Done. The new attached patch also has the Dxx constraint test removed, as that feature is now removed from the compiler. The memset() change mentioned above is still in the patch, but will remove before committing if not approved. Thanks, Chung-Lin 2013-11-16 Sandra Loosemore san...@codesourcery.com Chung-Lin Tang clt...@codesourcery.com Based on patches from Altera Corporation * gcc.dg/stack-usage-1.c (SIZE): Define case for __nios2__. * gcc.dg/20040813-1.c: Skip for nios2-*-*. * gcc.dg/20020312-2.c: Add __nios2__ case. * g++.dg/other/PR23205.C: Skip for nios2-*-*. * g++.dg/other/pr23205-2.C: Skip for nios2-*-*. * g++.dg/cpp0x/constexpr-rom.C: Skip for nios2-*-*. * g++.dg/cpp0x/alias-decl-debug-0.C: Skip for nios2-*-*. * g++.old-deja/g++.jason/thunk3.C: Skip for nios2-*-*. * lib/target-supports.exp (check_profiling_available): Check for nios2-*-elf. * gcc.c-torture/execute/pr47237.x:: Skip for nios2-*-*. * gcc.c-torture/execute/20101011-1.c: Skip for nios2-*-*. * gcc.c-torture/execute/builtins/lib/chk.c (memset): Place char-based memset loop before inline check, to prevent problems when called to initialize .bss. Update comments. * gcc.target/nios2/nios2.exp: New DejaGNU file. * gcc.target/nios2/nios2-custom-1.c: New test. * gcc.target/nios2/nios2-trap-insn.c: New test. * gcc.target/nios2/nios2-builtin-custom.c: New test. * gcc.target/nios2/nios2-builtin-io.c: New test. * gcc.target/nios2/nios2-stack-check-1.c: New test. * gcc.target/nios2/nios2-stack-check-2.c: New test. * gcc.target/nios2/nios2-rdctl.c: New test. * gcc.target/nios2/nios2-wrctl.c: New test. * gcc.target/nios2/nios2-wrctl-zero.c: New test. * gcc.target/nios2/nios2-wrctl-not-zero.c: New test. * gcc.target/nios2/nios2-rdwrctl-1.c: New test. * gcc.target/nios2/nios2-ashlsi3-one_shift.c: New test. * gcc.target/nios2/nios2-mul-options-1.c: New test. * gcc.target/nios2/nios2-mul-options-2.c: New test. * gcc.target/nios2/nios2-mul-options-3.c: New test. * gcc.target/nios2/nios2-mul-options-4.c: New test. * gcc.target/nios2/nios2-nor.c: New test. * gcc.target/nios2/nios2-stxio.c: New test. * gcc.target/nios2/custom-fp-1.c: New test. * gcc.target/nios2/custom-fp-2.c: New test. * gcc.target/nios2/custom-fp-3.c: New test. * gcc.target/nios2/custom-fp-4.c: New test. * gcc.target/nios2/custom-fp-5.c: New test. * gcc.target/nios2/custom-fp-6.c: New test. * gcc.target/nios2/custom-fp-7.c: New test. * gcc.target
Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).
On 2013/10/6 05:57 PM, Richard Sandiford wrote: But case 16 is different. This case is only produced at prologue/epilogue phase, using a temporary register $r15 to hold a large constant for adjusting stack pointer. Since prologue/epilogue is after split1/split2 phase, we can only output sethi + ori directly. (The addi instruction with $r15 is a 32-bit instruction.) But this code is in the output template of the define_insn. That code is only executed during final, after all passes have been run. If the template returns #, final will split the instruction itself, which is possible even at that late stage. # doesn't have any effect on the passes themselves. (FWIW, there's also a split3 pass that runs after prologue/epilogue generation but before sched2.) However, ISTR there is/was a rule that prologue instructions shouldn't be split, since they'd lose their RTX_FRAME_RELATED_P bit or something. Maybe you hit an ICE because of that? Another way to handle this would be to have the movsi expander split large constant moves. When can_create_pseudo_p (), the intermediate results can be stored in new registers, otherwise they should reuse operands[0]. Two advantages to doing it that way are that high parts can be shared before RA, and that calls to emit_move_insn from the prologue code will split the move automatically. I think many ports do it that way (including MIPS FWIW). FWIW, most ports usually just handle such large adjustment cases in the prologue/epilogue code manually; either multiple SP-adjustments, or use of a temp register (better control of RTX_FRAME_RELATED_P anyways). You might be able to get it to work, but trying to rely on the splitter does not seem like best practice... Chung-Lin
Re: [PING] Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping. On 13/8/20 10:57 AM, Chung-Lin Tang wrote: Ping. BTW, the SC has approved the Nios II port already: http://gcc.gnu.org/ml/gcc/2013-07/msg00434.html The port is still awaiting technical review. Thanks, Chung-Lin On 13/7/14 下午3:54, Chung-Lin Tang wrote: Hi, the last ping of the Nios II patches was: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html After assessing the state, we feel it would be better to post a re-submission of the newest patches. The changes accumulated since the original post include: 1) Several bug fixes related to built-in function expanding. 2) A few holes in hard-float FPU code generation was plugged. 3) Support for parsing white-spaces in target attributes. 4) Revision of consistency check behavior of codes in custom instruction built-ins. 5) Some new testcases. The issues raised by Joseph in the first round of reviewing have been addressed. Testing has been re-done on both 32-bit and 64-bit hosts. PR55035 appears to not have been resolved yet, which affects nios2 among several other targets, thus configured with --enable-werror-always still does not build. As before, Sandra and me will serve as nios2 port maintainers. Attached is the patch for the compiler-proper. Thanks, Chung-Lin 2013-07-14 Chung-Lin Tang clt...@codesourcery.com Sandra Loosemore san...@codesourcery.com Based on patches from Altera Corporation * config.gcc (nios2-*-*): Add nios2 config targets. * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. ($cpu_type): Add nios2 as new cpu type. * configure: Regenerate. * config/nios2/nios2.c: New file. * config/nios2/nios2.h: New file. * config/nios2/nios2-opts.h: New file. * config/nios2/nios2-protos.h: New file. * config/nios2/elf.h: New file. * config/nios2/elf.opt: New file. * config/nios2/linux.h: New file. * config/nios2/nios2.opt: New file. * config/nios2/nios2.md: New file. * config/nios2/predicates.md: New file. * config/nios2/constraints.md: New file. * config/nios2/t-nios2: New file. * common/config/nios2/nios2-common.c: New file. * doc/invoke.texi (Nios II options): Document Nios II specific options. * doc/md.texi (Nios II family): Document Nios II specific constraints. * doc/extend.texi (Function Specific Option Pragmas): Document Nios II supported target pragma functionality.
[PING] Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
Ping. BTW, the SC has approved the Nios II port already: http://gcc.gnu.org/ml/gcc/2013-07/msg00434.html The port is still awaiting technical review. Thanks, Chung-Lin On 13/7/14 下午3:54, Chung-Lin Tang wrote: Hi, the last ping of the Nios II patches was: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html After assessing the state, we feel it would be better to post a re-submission of the newest patches. The changes accumulated since the original post include: 1) Several bug fixes related to built-in function expanding. 2) A few holes in hard-float FPU code generation was plugged. 3) Support for parsing white-spaces in target attributes. 4) Revision of consistency check behavior of codes in custom instruction built-ins. 5) Some new testcases. The issues raised by Joseph in the first round of reviewing have been addressed. Testing has been re-done on both 32-bit and 64-bit hosts. PR55035 appears to not have been resolved yet, which affects nios2 among several other targets, thus configured with --enable-werror-always still does not build. As before, Sandra and me will serve as nios2 port maintainers. Attached is the patch for the compiler-proper. Thanks, Chung-Lin 2013-07-14 Chung-Lin Tang clt...@codesourcery.com Sandra Loosemore san...@codesourcery.com Based on patches from Altera Corporation * config.gcc (nios2-*-*): Add nios2 config targets. * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case. ($cpu_type): Add nios2 as new cpu type. * configure: Regenerate. * config/nios2/nios2.c: New file. * config/nios2/nios2.h: New file. * config/nios2/nios2-opts.h: New file. * config/nios2/nios2-protos.h: New file. * config/nios2/elf.h: New file. * config/nios2/elf.opt: New file. * config/nios2/linux.h: New file. * config/nios2/nios2.opt: New file. * config/nios2/nios2.md: New file. * config/nios2/predicates.md: New file. * config/nios2/constraints.md: New file. * config/nios2/t-nios2: New file. * common/config/nios2/nios2-common.c: New file. * doc/invoke.texi (Nios II options): Document Nios II specific options. * doc/md.texi (Nios II family): Document Nios II specific constraints. * doc/extend.texi (Function Specific Option Pragmas): Document Nios II supported target pragma functionality.
Re: [PATCH] PR32219, weak hidden reference segfault [PING^2]
On 13/8/5 10:06 PM, Mike Stump wrote: On Aug 4, 2013, at 8:14 AM, Chung-Lin Tang clt...@codesourcery.com wrote: On 13/7/15 1:43 AM, Diego Novillo wrote: Could you please repost the patch with its description? This thread is sufficiently old and noisy that I'm not even sure what the patch does nor why. Taking the same example in my first post: Under -fPIC, the code in rtlanal.c:nonzero_address_p() does not properly recognize the PIC-reg + constant form of load as a weak symbol; it returns 'true' immediately after seeing the pic-reg indexing, and does not test the wrapped symbol for DECL_WEAK. So, I can't help but think that others would say that looking into an unspec is by nature, the wrong way to do it, unless that code is in the port. I think the followup from Bernhard points to a better solution, though the wording in the comment was objectionable. Merely say that the symbol, if weak and not defined, is then not local. When I last tested that patch which moves the DECL_WEAK check, the testcases for C++ TLS wrappers fail. I don't remember the fine details, but effectively it filters out the TLS wrappers from being treated locally, causing them to be called through @PLT, and regressing on some tests specifically checking for that... The UNSPEC interpretation here is fairly restricted, FWIW. Earlier talk on this thread also mentioned that maybe specific RTL constructs for reasoning about PIC addresses should be introduced, rather than common idiomatic pattern, though that may be a long shot for now. Chung-Lin
Re: [PATCH] PR32219, weak hidden reference segfault [PING^2]
On 13/8/5 下午10:24, Mike Stump wrote: On Aug 5, 2013, at 7:15 AM, Chung-Lin Tang clt...@codesourcery.com wrote: On 13/8/5 10:06 PM, Mike Stump wrote: On Aug 4, 2013, at 8:14 AM, Chung-Lin Tang clt...@codesourcery.com wrote: On 13/7/15 1:43 AM, Diego Novillo wrote: Could you please repost the patch with its description? This thread is sufficiently old and noisy that I'm not even sure what the patch does nor why. Taking the same example in my first post: Under -fPIC, the code in rtlanal.c:nonzero_address_p() does not properly recognize the PIC-reg + constant form of load as a weak symbol; it returns 'true' immediately after seeing the pic-reg indexing, and does not test the wrapped symbol for DECL_WEAK. So, I can't help but think that others would say that looking into an unspec is by nature, the wrong way to do it, unless that code is in the port. I think the followup from Bernhard points to a better solution, though the wording in the comment was objectionable. Merely say that the symbol, if weak and not defined, is then not local. When I last tested that patch which moves the DECL_WEAK check, the testcases for C++ TLS wrappers fail. I don't remember the fine details, but effectively it filters out the TLS wrappers from being treated locally, causing them to be called through @PLT, and regressing on some tests specifically checking for that… Hum… I wonder if there is a TLS predicate one can mix in to the check that is appropriate. The UNSPEC interpretation here is fairly restricted, FWIW. Earlier talk on this thread also mentioned that maybe specific RTL constructs for reasoning about PIC addresses should be introduced, rather than common idiomatic pattern, though that may be a long shot for now. specifying the unspecified, make is specified, and calling it unspec, would seem to be wrong. The right approach, long term, is to have address forms all specified and documented and a port merely can use the forms they are interested in. pic is one of those things that should be bumped up, unspec is kinda silly. Yes, that's what I meant. e.g. instead of using (const (unspec...)), new RTL operators for specifying the common forms of relocations (including those used PIC) should be defined; this will involve changing lots of backends, so should be aimed in the long term. Chung-Lin
Re: [PATCH] PR32219, weak hidden reference segfault [PING^2]
On 13/7/15 1:43 AM, Diego Novillo wrote: Could you please repost the patch with its description? This thread is sufficiently old and noisy that I'm not even sure what the patch does nor why. Taking the same example in my first post: extern void weakfun() __attribute__((weak,visibility(hidden))); void foo () { if (weakfun) weakfun(); } Under -O1 -m32 -fPIC, the address load and test will look like: (insn 5 2 6 2 (set (reg/f:SI 60) (plus:SI (reg:SI 3 bx) (const:SI (unspec:SI [ (symbol_ref/i:SI (f) [flags 0x43] function_decl f) ] UNSPEC_GOTOFF {*leasi} (expr_list:REG_EQUAL (symbol_ref/i:SI (f) function_decl f) (nil))) (insn 6 5 7 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg/f:SI 60) (const_int 0 [0]))) p.c:8 3 {*cmpsi_ccno_1} (nil)) (jump_insn 7 6 8 2 ... Under -fPIC, the code in rtlanal.c:nonzero_address_p() does not properly recognize the PIC-reg + constant form of load as a weak symbol; it returns 'true' immediately after seeing the pic-reg indexing, and does not test the wrapped symbol for DECL_WEAK. My patch slightly modifies the test to look into the wrapped symbol when seeing a PIC-reg + unspec-constant form, which I believe has become the idiomatic form in GCC for such PIC addresses. Richard Sandiford's concerns were that, UNSPEC really is unspecified, and this might be overassuming its semantics, plus some targets may not be really following the idiomatic use. My final take at the time was, for targets that do follow the common PIC-reg+const-unspec form, this patch solves the problem, while for other targets, nothing is changed. I have re-tested the patch on i686-linux, with clean results. Please see if this patch can be accepted into trunk (patch attached again for convenience). Thanks, Chung-Lin 2013-08-04 Chung-Lin Tang clt...@codesourcery.com PR target/32219 * rtlanal.c (nonzero_address_p): Robustify checking by look recursively into PIC constant offsets and (CONST (UNSPEC ...)) expressions. Index: rtlanal.c === --- rtlanal.c (revision 201473) +++ rtlanal.c (working copy) @@ -393,7 +393,15 @@ nonzero_address_p (const_rtx x) /* Handle PIC references. */ if (XEXP (x, 0) == pic_offset_table_rtx CONSTANT_P (XEXP (x, 1))) - return true; + { + rtx offset = XEXP (x, 1); + if (GET_CODE (offset) == CONST + GET_CODE (XEXP (offset, 0)) == UNSPEC + GET_CODE (XVECEXP (XEXP (offset, 0), 0, 0)) == SYMBOL_REF) + return nonzero_address_p (XVECEXP (XEXP (offset, 0), 0, 0)); + + return true; + } return false; case PRE_MODIFY: Index: testsuite/gcc.dg/torture/pr32219.c === --- testsuite/gcc.dg/torture/pr32219.c (revision 0) +++ testsuite/gcc.dg/torture/pr32219.c (revision 0) @@ -0,0 +1,12 @@ +/* PR target/32219 */ +/* { dg-do run } */ +/* { dg-require-visibility } */ +/* { dg-options -fPIC { target fpic } } */ + +extern void f() __attribute__((weak,visibility(hidden))); +int main() +{ + if (f) +f(); + return 0; +}
Re: [PATCH] PR32219, weak hidden reference segfault [PING^2]
On 13/8/1 5:16 PM, Bernhard Reutner-Fischer wrote: On 14 July 2013 19:43, Diego Novillo dnovi...@google.com wrote: On Sun, Jul 14, 2013 at 2:08 AM, Chung-Lin Tang clt...@codesourcery.com wrote: Ping. Could you please repost the patch with its description? This thread is sufficiently old and noisy that I'm not even sure what the patch does nor why. Chung-Lin Tang, can you regtest and repost the patch please? TIA, I'll re-explain the patch later as Diego has requested, maybe this weekend. Thanks, Chung-Lin Thanks. Diego.
Altera Nios II port submission
To the GCC Steering Committee, Mentor Graphics has submitted, and recently re-submitted an updated version, of a GCC backend port for the Altera Nios II architecture, currently on gcc-patches awaiting technical review [1]. We're proposing, upon port approval and commit to trunk, Sandra Loosemore and myself (Chung-Lin Tang), both of Mentor Graphics, as target maintainers. Thank you, Chung-Lin [1] http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00526.html http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00527.html http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00528.html
[PATCH][2/3] Re-submission of Altera Nios II port, testsuite parts
These are nios2 patches for the gcc testsuite. Some new testcases were added since the last posting. Chung-Lin 2013-07-14 Sandra Loosemore san...@codesourcery.com Chung-Lin Tang clt...@codesourcery.com Based on patches from Altera Corporation * gcc.dg/stack-usage-1.c (SIZE): Define case for __nios2__. * gcc.dg/20040813-1.c: Skip for nios2-*-*. * gcc.dg/20020312-2.c: Add __nios2__ case. * g++.dg/other/PR23205.C: Skip for nios2-*-*. * g++.dg/other/pr23205-2.C: Skip for nios2-*-*. * g++.dg/cpp0x/constexpr-rom.C: Skip for nios2-*-*. * g++.dg/cpp0x/alias-decl-debug-0.C: Skip for nios2-*-*. * g++.old-deja/g++.jason/thunk3.C: Skip for nios2-*-*. * lib/target-supports.exp (check_profiling_available): Check for nios2-*-elf. * gcc.c-torture/execute/pr47237.x:: Skip for nios2-*-*. * gcc.c-torture/execute/20101011-1.c: Skip for nios2-*-*. * gcc.c-torture/execute/builtins/lib/chk.c (memset): Place char-based memset loop before inline check, to prevent problems when called to initialize .bss. Update comments. * gcc.target/nios2/nios2.exp: New DejaGNU file. * gcc.target/nios2/nios2-custom-1.c: New test. * gcc.target/nios2/nios2-trap-insn.c: New test. * gcc.target/nios2/nios2-builtin-custom.c: New test. * gcc.target/nios2/nios2-builtin-io.c: New test. * gcc.target/nios2/nios2-stack-check-1.c: New test. * gcc.target/nios2/nios2-stack-check-2.c: New test. * gcc.target/nios2/nios2-rdctl.c: New test. * gcc.target/nios2/nios2-wrctl.c: New test. * gcc.target/nios2/nios2-wrctl-zero.c: New test. * gcc.target/nios2/nios2-wrctl-not-zero.c: New test. * gcc.target/nios2/nios2-rdwrctl-1.c: New test. * gcc.target/nios2/nios2-reg-constraints.c: New test. * gcc.target/nios2/nios2-ashlsi3-one_shift.c: New test. * gcc.target/nios2/nios2-mul-options-1.c: New test. * gcc.target/nios2/nios2-mul-options-2.c: New test. * gcc.target/nios2/nios2-mul-options-3.c: New test. * gcc.target/nios2/nios2-mul-options-4.c: New test. * gcc.target/nios2/nios2-nor.c: New test. * gcc.target/nios2/nios2-stxio.c: New test. * gcc.target/nios2/custom-fp-1.c: New test. * gcc.target/nios2/custom-fp-2.c: New test. * gcc.target/nios2/custom-fp-3.c: New test. * gcc.target/nios2/custom-fp-4.c: New test. * gcc.target/nios2/custom-fp-5.c: New test. * gcc.target/nios2/custom-fp-6.c: New test. * gcc.target/nios2/custom-fp-7.c: New test. * gcc.target/nios2/custom-fp-8.c: New test. * gcc.target/nios2/custom-fp-cmp-1.c: New test. * gcc.target/nios2/custom-fp-conversion.c: New test. * gcc.target/nios2/custom-fp-double.c: New test. * gcc.target/nios2/custom-fp-float.c: New test. * gcc.target/nios2/nios2-int-types.c: New test. * gcc.target/nios2/nios2-cache-1.c: New test. * gcc.target/nios2/nios2-cache-2.c: New test. Index: gcc/testsuite/g++.old-deja/g++.jason/thunk3.C === --- gcc/testsuite/g++.old-deja/g++.jason/thunk3.C (revision 200946) +++ gcc/testsuite/g++.old-deja/g++.jason/thunk3.C (working copy) @@ -1,5 +1,5 @@ // { dg-do run } -// { dg-skip-if fails with generic thunk support { rs6000-*-* powerpc-*-eabi v850-*-* sh-*-* sh64-*-* h8*-*-* xtensa*-*-* m32r*-*-* lm32-*-* } { * } { } } +// { dg-skip-if fails with generic thunk support { rs6000-*-* powerpc-*-eabi v850-*-* sh-*-* sh64-*-* h8*-*-* xtensa*-*-* m32r*-*-* lm32-*-* nios2-*-* } { * } { } } // Test that variadic function calls using thunks work right. // Note that this will break on any target that uses the generic thunk // support, because it doesn't support variadic functions. Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 200946) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -529,6 +529,7 @@ proc check_profiling_available { test_what } { || [istarget mmix-*-*] || [istarget mn10300-*-elf*] || [istarget moxie-*-elf*] +|| [istarget nios2-*-elf] || [istarget picochip-*-*] || [istarget powerpc-*-eabi*] || [istarget powerpc-*-elf] Index: gcc/testsuite/gcc.c-torture/execute/20101011-1.c === --- gcc/testsuite/gcc.c-torture/execute/20101011-1.c(revision 200946) +++ gcc/testsuite/gcc.c-torture/execute/20101011-1.c(working copy) @@ -61,6 +61,10 @@ __aeabi_idiv0 (int return_value) } # define DO_TEST 1 # endif +#elif defined (__nios2__) + /* Nios II requires both hardware support and user configuration
[PATCH][3/3] Re-submission of Altera Nios II port, libgcc parts
nios2 libgcc parts. Since the original post, the only main change has been the fdpbit vs soft-fp issue raised by Joseph, which has been resolved. Other parts are mostly the same. Thanks, Chung-Lin 2013-07-14 Sandra Loosemore san...@codesourcery.com Chung-Lin Tang clt...@codesourcery.com Based on patches from Altera Corporation * config.host (nios2-*-*,nios2-*-linux*): Add nios2 host cases. * config/nios2/lib2-nios2.h: New file. * config/nios2/lib2-divmod-hi.c: New file. * config/nios2/linux-unwind.h: New file. * config/nios2/lib2-divmod.c: New file. * config/nios2/linux-atomic.c: New file. * config/nios2/t-nios2: New file. * config/nios2/crti.asm: New file. * config/nios2/t-linux: New file. * config/nios2/lib2-divtable.c: New file. * config/nios2/lib2-mul.c: New file. * config/nios2/tramp.c: New file. * config/nios2/crtn.asm: New file. Index: libgcc/config/nios2/lib2-nios2.h === --- libgcc/config/nios2/lib2-nios2.h(revision 0) +++ libgcc/config/nios2/lib2-nios2.h(revision 0) @@ -0,0 +1,49 @@ +/* Integer arithmetic support for Altera Nios II. + + Copyright (C) 2012-2013 Free Software Foundation, Inc. + Contributed by Altera and Mentor Graphics, Inc. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + http://www.gnu.org/licenses/. */ + +#ifndef LIB2_NIOS2_H +#define LIB2_NIOS2_H + +/* Types. */ + +typedef char QItype __attribute__ ((mode (QI))); +typedef unsigned char UQItype __attribute__ ((mode (QI))); +typedef short HItype __attribute__ ((mode (HI))); +typedef unsigned short UHItype __attribute__ ((mode (HI))); +typedef int SItype __attribute__ ((mode (SI))); +typedef unsigned int USItype __attribute__ ((mode (SI))); +typedef int word_type __attribute__ ((mode (__word__))); + +/* Exported functions. */ +extern SItype __divsi3 (SItype, SItype); +extern SItype __modsi3 (SItype, SItype); +extern SItype __udivsi3 (SItype, SItype); +extern SItype __umodsi3 (SItype, SItype); +extern HItype __divhi3 (HItype, HItype); +extern HItype __modhi3 (HItype, HItype); +extern UHItype __udivhi3 (UHItype, UHItype); +extern UHItype __umodhi3 (UHItype, UHItype); +extern SItype __mulsi3 (SItype, SItype); + +#endif /* LIB2_NIOS2_H */ Index: libgcc/config/nios2/crtn.S === --- libgcc/config/nios2/crtn.S (revision 0) +++ libgcc/config/nios2/crtn.S (revision 0) @@ -0,0 +1,60 @@ +/* Copyright (C) 2012-2013 Free Software Foundation, Inc. + Contributed by Jonah Graham (jgra...@altera.com). + Contributed by Mentor Graphics, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +http://www.gnu.org/licenses/. */ + + +/* This file just makes sure that the .fini and .init sections do in +fact return. Users may put any desired instructions in those sections. +This file is the last thing linked into any executable. +*/ + .file crtn.asm + + + + .section.init + ldw ra, 44(sp) + ldw r23, 40(sp) + ldw r22, 36(sp) + ldw r21, 32(sp) + ldw r20, 28(sp) + ldw r19, 24(sp) + ldw r18, 20(sp) + ldw r17, 16(sp
[PATCH] Hexadecimal numbers in option arguments
Original patch posted as part of Nios II patches: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01087.html This patch is to allow hexadecimal numbers to be used in option arguments, e.g. -falign-loops=0x10 can now be used as equivalent to -falign-loops=16. Joseph, the patch has been modified to use IXDIGIT to check the argument string first, as you suggested in the last submission. Is this okay for trunk? Thanks, Chung-Lin 2013-07-14 Chung-Lin Tang clt...@codesourcery.com * opts-common.c (integral_argument): Add support for hexadecimal command option integer arguments. Update comments. Index: opts-common.c === --- opts-common.c (revision 200946) +++ opts-common.c (working copy) @@ -147,7 +147,7 @@ find_opt (const char *input, unsigned int lang_mas return match_wrong_lang; } -/* If ARG is a non-negative integer made up solely of digits, return its +/* If ARG is a non-negative decimal or hexadecimal integer, return its value, otherwise return -1. */ int @@ -161,6 +161,17 @@ integral_argument (const char *arg) if (*p == '\0') return atoi (arg); + /* It wasn't a decimal number - try hexadecimal. */ + if (arg[0] == '0' (arg[1] == 'x' || arg[1] == 'X')) +{ + p = arg + 2; + while (*p ISXDIGIT (*p)) + p++; + + if (*p == '\0') + return strtol (arg, NULL, 16); +} + return -1; }
Re: [PATCH] PR32219, weak hidden reference segfault [PING^2]
Ping. On 2013/6/20 03:01 PM, Chung-Lin Tang wrote: Ping again? On 13/6/11 5:20 PM, Bernhard Reutner-Fischer wrote: ping, CCing middle-end maintainers for review. On 31 May 2013 10:13, Chung-Lin Tang clt...@codesourcery.com wrote: On 13/5/15 8:12 PM, Richard Sandiford wrote: Chung-Lin Tang clt...@codesourcery.com writes: On 13/5/10 6:37 PM, Richard Sandiford wrote: Chung-Lin Tang clt...@codesourcery.com writes: +case UNSPEC: + /* Reach for a contained symbol. */ + return nonzero_address_p (XVECEXP (x, 0, 0)); I don't think this is safe. UNSPECs really are unspecified :-), so we can't assume that (unspec X) is nonzero simply because X is. Attached is a modified patch (not yet tested but just for demonstration) with a more specific test, hopefully regarded as more safe. The point is in recognizing (const (unspec [symbol] XYZ)) offsets in PIC references, which seems quite idiomatic across all targets by now. I agree this is safer. However, there used to be some ports that use (plus pic_offset_table_rtx (symbol_ref X)) -- i.e. without an unspec -- to represent a PIC reference to X. That always seemed semantically wrong, since you're not actually adding the address of X and the PIC register, but the patch wouldn't handle that case correctly. Well I can't help those targets then, but at least nothing will be changed for them by this patch. It will just continue to return 'true'. An alternative might be to remove the pic_offset_table_rtx case altogether and rely on targetm.delegitimize_address instead. FWIW, I'd prefer that if it works, but it's not me you need to convince. :-) Like we discussed below, I think the direction should be towards making things more machine-independent, rather then pushing more into the backend. I would suggest that this probably means there should be a new, more specific construct in RTL to represent relocation values of this kind, instead of (const (unspec)) serving an unofficial role; possibly some real support for reasoning about PIC references could also be considered. Yeah, maybe we could try to introduce some target-independent knowledge of certain reloc types, a bit like the generic BFD_RELOC_*s in bfd. FWIW, I've ran tests on the newer patch on i686-linux, with no regressions. Testcase has been moved to gcc.dg/torture by recommendation of Bernhard. If any of the RTL maintainers can give an eye of merciful approval, this old PR could be resolved :) Thanks, Chung-Lin
[PING^3] Nios II port
Ping x 3. On 13/6/18 下午4:38, Chung-Lin Tang wrote: On 13/6/18 上午3:05, Sandra Loosemore wrote: Ping? I think these are the most recent versions of the unreviewed patches in the series: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00287.html http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00760.html http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01085.html
Re: Question about conds attribute for *thumb2_alusi3_short
On 13/6/24 下午11:43, Tom de Vries wrote: Richard, I've noticed that f.i. *thumb2_alusi3_short has no explicit setting of the conds attribute, which means the value of the conds attribute for this insn is nocond: ... (define_insn *thumb2_alusi3_short [(set (match_operand:SI 0 s_register_operand =l) (match_operator:SI 3 thumb_16bit_operator [(match_operand:SI 1 s_register_operand 0) (match_operand:SI 2 s_register_operand l)])) (clobber (reg:CC CC_REGNUM))] TARGET_THUMB2 reload_completed GET_CODE(operands[3]) != PLUS GET_CODE(operands[3]) != MINUS %I3%!\\t%0, %1, %2 [(set_attr predicable yes) (set_attr length 2)] ) ... AFAIU, this insn is either: - conditional, and does not modify cc, or - unconditional, and sets cc. So the clobber of CC in the RTL conservatively describes both cases. It seems to me the logical conds setting for the conditional case is nocond, set (or perhaps clob) for the unconditional case. So, is this a more accurate value of conds for this insn: ... (set (attr conds) (if_then_else (match_test GET_CODE (PATTERN (insn)) == COND_EXEC) (const_string nocond) (const_string set)))] ... ? Is there a generic need to have this attribute accurate for all insns? Following this thread that Tom pointed to me earlier in internal discussion: http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00723.html If the short-CC-clobbered form is selected very late now, I think this pattern simply is (or should) not be used for the conditional (within IT-block) case. It should simply be set to clob. Predicable might be set to no as well... Chung-Lin
Re: [PATCH] PR32219, weak hidden reference segfault [PING]
Ping again? On 13/6/11 5:20 PM, Bernhard Reutner-Fischer wrote: ping, CCing middle-end maintainers for review. On 31 May 2013 10:13, Chung-Lin Tang clt...@codesourcery.com wrote: On 13/5/15 8:12 PM, Richard Sandiford wrote: Chung-Lin Tang clt...@codesourcery.com writes: On 13/5/10 6:37 PM, Richard Sandiford wrote: Chung-Lin Tang clt...@codesourcery.com writes: +case UNSPEC: + /* Reach for a contained symbol. */ + return nonzero_address_p (XVECEXP (x, 0, 0)); I don't think this is safe. UNSPECs really are unspecified :-), so we can't assume that (unspec X) is nonzero simply because X is. Attached is a modified patch (not yet tested but just for demonstration) with a more specific test, hopefully regarded as more safe. The point is in recognizing (const (unspec [symbol] XYZ)) offsets in PIC references, which seems quite idiomatic across all targets by now. I agree this is safer. However, there used to be some ports that use (plus pic_offset_table_rtx (symbol_ref X)) -- i.e. without an unspec -- to represent a PIC reference to X. That always seemed semantically wrong, since you're not actually adding the address of X and the PIC register, but the patch wouldn't handle that case correctly. Well I can't help those targets then, but at least nothing will be changed for them by this patch. It will just continue to return 'true'. An alternative might be to remove the pic_offset_table_rtx case altogether and rely on targetm.delegitimize_address instead. FWIW, I'd prefer that if it works, but it's not me you need to convince. :-) Like we discussed below, I think the direction should be towards making things more machine-independent, rather then pushing more into the backend. I would suggest that this probably means there should be a new, more specific construct in RTL to represent relocation values of this kind, instead of (const (unspec)) serving an unofficial role; possibly some real support for reasoning about PIC references could also be considered. Yeah, maybe we could try to introduce some target-independent knowledge of certain reloc types, a bit like the generic BFD_RELOC_*s in bfd. FWIW, I've ran tests on the newer patch on i686-linux, with no regressions. Testcase has been moved to gcc.dg/torture by recommendation of Bernhard. If any of the RTL maintainers can give an eye of merciful approval, this old PR could be resolved :) Thanks, Chung-Lin
Re: [ping**2] Nios II port
On 13/6/18 上午3:05, Sandra Loosemore wrote: Ping? I think these are the most recent versions of the unreviewed patches in the series: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00287.html http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00760.html http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01085.html There are also these two parts that have been reviewed already: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01329.html [approved but not applied yet?] That Dwarf fix has already been applied. http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01087.html [needs minor cleanup and separate consideration] Yes, I forgot about this one. Need some cleanup to be more robust. Will resubmit separately. Chung-Lin
[PING] Re: [PATCH 0/5] Submission of Altera Nios II port
Pinging the nios2 port. On 13/5/15 1:04 AM, Chung-Lin Tang wrote: On 2013/4/26 04:35 AM, Joseph S. Myers wrote: I should ask the following general standard new-port questions: * Does the port build cleanly when configured with --enable-werror-always and built using a native compiler from current GCC trunk - for both 32-bit host, and 64-bit host? It should. This is the standard way of ensuring the same level of warning-cleanness in a cross build as native bootstrap automatically enforces (and the build compiler needs to be from current trunk because warning-cleanness is only expected when the build compiler is the same version as the compiler being built). The new targets should be added to contrib/config-list.mk, which helps test all targets with --enable-werror-always. This is mentioned in Back End in sourcebuild.texi - check there for any other pieces that should be included in the port submission. Currently, nios2 seems to be another affected target by PR 55035, when building with a recent trunk with --enable-werror-always: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55035 I would like to ask this --enable-werror-always requirement be relaxed for the nios2 port submission, as it is not alone in the above PR, and we are early in the release cycle; there should be plenty of time to fix any problems afterwards. Though not included in the submitted patches, I will add the target entries in contrib/config-list.mk when committing. * What are test results like for the port? Again, both 32-bit and 64-bit hosts, to detect any dependencies on whether the host is 32-bit or 64-bit. Tests of i686 and x86_64 Linux hosts show same test results. FAILs that still exist include g++.dg/debug/dwarf2/non-virtual-thunk.C, due to the lack of target specific MI-thunk hooks right now, and a few tree-ssa optimization FAILs, that might be worked around by augmenting the testcase. The port in whole should be fairly stable. Thanks, Chung-Lin
Re: [PATCH] PR32219, weak hidden reference segfault
On 13/5/15 8:12 PM, Richard Sandiford wrote: Chung-Lin Tang clt...@codesourcery.com writes: On 13/5/10 6:37 PM, Richard Sandiford wrote: Chung-Lin Tang clt...@codesourcery.com writes: +case UNSPEC: + /* Reach for a contained symbol. */ + return nonzero_address_p (XVECEXP (x, 0, 0)); I don't think this is safe. UNSPECs really are unspecified :-), so we can't assume that (unspec X) is nonzero simply because X is. Attached is a modified patch (not yet tested but just for demonstration) with a more specific test, hopefully regarded as more safe. The point is in recognizing (const (unspec [symbol] XYZ)) offsets in PIC references, which seems quite idiomatic across all targets by now. I agree this is safer. However, there used to be some ports that use (plus pic_offset_table_rtx (symbol_ref X)) -- i.e. without an unspec -- to represent a PIC reference to X. That always seemed semantically wrong, since you're not actually adding the address of X and the PIC register, but the patch wouldn't handle that case correctly. Well I can't help those targets then, but at least nothing will be changed for them by this patch. It will just continue to return 'true'. An alternative might be to remove the pic_offset_table_rtx case altogether and rely on targetm.delegitimize_address instead. FWIW, I'd prefer that if it works, but it's not me you need to convince. :-) Like we discussed below, I think the direction should be towards making things more machine-independent, rather then pushing more into the backend. I would suggest that this probably means there should be a new, more specific construct in RTL to represent relocation values of this kind, instead of (const (unspec)) serving an unofficial role; possibly some real support for reasoning about PIC references could also be considered. Yeah, maybe we could try to introduce some target-independent knowledge of certain reloc types, a bit like the generic BFD_RELOC_*s in bfd. FWIW, I've ran tests on the newer patch on i686-linux, with no regressions. Testcase has been moved to gcc.dg/torture by recommendation of Bernhard. If any of the RTL maintainers can give an eye of merciful approval, this old PR could be resolved :) Thanks, Chung-Lin Index: rtlanal.c === --- rtlanal.c (revision 199474) +++ rtlanal.c (working copy) @@ -393,7 +393,15 @@ nonzero_address_p (const_rtx x) /* Handle PIC references. */ if (XEXP (x, 0) == pic_offset_table_rtx CONSTANT_P (XEXP (x, 1))) - return true; + { + rtx offset = XEXP (x, 1); + if (GET_CODE (offset) == CONST + GET_CODE (XEXP (offset, 0)) == UNSPEC + GET_CODE (XVECEXP (XEXP (offset, 0), 0, 0)) == SYMBOL_REF) + return nonzero_address_p (XVECEXP (XEXP (offset, 0), 0, 0)); + + return true; + } return false; case PRE_MODIFY: Index: testsuite/gcc.dg/torture/pr32219.c === --- testsuite/gcc.dg/torture/pr32219.c (revision 0) +++ testsuite/gcc.dg/torture/pr32219.c (revision 0) @@ -0,0 +1,12 @@ +/* PR target/32219 */ +/* { dg-do run } */ +/* { dg-require-visibility } */ +/* { dg-options -fPIC { target fpic } } */ + +extern void f() __attribute__((weak,visibility(hidden))); +int main() +{ + if (f) +f(); + return 0; +}
Re: [PATCH] PR32219, weak hidden reference segfault
On 13/5/10 6:37 PM, Richard Sandiford wrote: Chung-Lin Tang clt...@codesourcery.com writes: +case UNSPEC: + /* Reach for a contained symbol. */ + return nonzero_address_p (XVECEXP (x, 0, 0)); I don't think this is safe. UNSPECs really are unspecified :-), so we can't assume that (unspec X) is nonzero simply because X is. Attached is a modified patch (not yet tested but just for demonstration) with a more specific test, hopefully regarded as more safe. The point is in recognizing (const (unspec [symbol] XYZ)) offsets in PIC references, which seems quite idiomatic across all targets by now. I would suggest that this probably means there should be a new, more specific construct in RTL to represent relocation values of this kind, instead of (const (unspec)) serving an unofficial role; possibly some real support for reasoning about PIC references could also be considered. Chung-Lin Index: rtlanal.c === --- rtlanal.c (revision 198923) +++ rtlanal.c (working copy) @@ -393,7 +393,15 @@ nonzero_address_p (const_rtx x) /* Handle PIC references. */ if (XEXP (x, 0) == pic_offset_table_rtx CONSTANT_P (XEXP (x, 1))) - return true; + { + rtx offset = XEXP (x, 1); + if (GET_CODE (offset) == CONST + GET_CODE (XEXP (offset, 0)) == UNSPEC + GET_CODE (XVECEXP (XEXP (offset, 0), 0, 0)) == SYMBOL_REF) + return nonzero_address_p (XVECEXP (XEXP (offset, 0), 0, 0)); + + return true; + } return false; case PRE_MODIFY:
Re: [ping][patch, ARM] Fix PR42017, LR not used in leaf functions
On 13/5/15 9:43 PM, Kugan wrote: On 14/05/13 19:18, Ramana Radhakrishnan wrote: On 05/13/13 04:15, Kugan wrote: Hi, Ping this patch by Chung-Lin. http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01179.html This patch allows lr registers to be used in leaf functions for ARM. There were some concerns about performance regression in thumb2 mode for CoreMark. However, looking at the code further shows that this performance regression is due to alignment issue with core_state_transition function and as a result taking longer time to execute. In fact, there isn’t any change in the code generated for core_state_transition with and without the patch. Adding Alignment to this function improves the performance than without the patch. Is this patch ok for trunk? Thanks, Kugan Ok. Thanks Ramana. I don't have write access for committing it. Can someone please commit this. Thanks, Kugam I already committed after Ramana's approval. Chung-Lin
Re: [PATCH 2/5] Altera Nios II: libgcc
On 13/4/26 4:00 AM, Joseph S. Myers wrote: On Thu, 18 Apr 2013, Chung-Lin Tang wrote: +nios2-*-linux*) +tmake_file=$tmake_file nios2/t-nios2 nios2/t-linux t-libgcc-pic t-slibgcc-libgcc +extra_parts=$extra_parts crti.o crtn.o +md_unwind_header=nios2/linux-unwind.h +;; +nios2-*-*) +tmake_file=$tmake_file nios2/t-nios2 t-fdpbit +extra_parts=$extra_parts crti.o crtn.o +;; As I understand it, the port uses soft-fp in glibc so doesn't need it in libgcc for nios2-*-linux*. But why use the old fp-bit in libgcc for bare metal (use of t-fdpbit here), rather than soft-fp? I think this was oversight. I have switched the nios2-elf config to use use softfp. Although not currently utilized, nios2 does have the capability of a hard-float multilib, so I have used t-softfp-excl for now. The remaining more trivial formatting, file name issues, etc. should have all been resolved. Please see attached patch. Thanks, Chung-Lin Index: libgcc/config.host === --- libgcc/config.host (revision 198891) +++ libgcc/config.host (working copy) @@ -137,6 +137,9 @@ mips*-*-*) cpu_type=mips tmake_file=mips/t-mips ;; +nios2*-*-*) + cpu_type=nios2 + ;; powerpc*-*-*) cpu_type=rs6000 ;; @@ -814,6 +817,15 @@ moxie-*-rtems*) # Don't use default. extra_parts= ;; +nios2-*-linux*) + tmake_file=$tmake_file nios2/t-nios2 nios2/t-linux t-libgcc-pic t-slibgcc-libgcc + extra_parts=$extra_parts crti.o crtn.o + md_unwind_header=nios2/linux-unwind.h + ;; +nios2-*-*) + tmake_file=$tmake_file nios2/t-nios2 t-softfp-sfdf t-softfp-excl t-softfp + extra_parts=$extra_parts crti.o crtn.o + ;; pdp11-*-*) tmake_file=pdp11/t-pdp11 t-fdpbit ;; Index: libgcc/config/nios2/crtn.S === --- libgcc/config/nios2/crtn.S (revision 0) +++ libgcc/config/nios2/crtn.S (revision 0) @@ -0,0 +1,60 @@ +/* Copyright (C) 2012-2013 Free Software Foundation, Inc. + Contributed by Jonah Graham (jgra...@altera.com). + Contributed by Mentor Graphics, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +http://www.gnu.org/licenses/. */ + + +/* This file just makes sure that the .fini and .init sections do in +fact return. Users may put any desired instructions in those sections. +This file is the last thing linked into any executable. +*/ + .file crtn.asm + + + + .section .init + ldw ra, 44(sp) + ldw r23, 40(sp) + ldw r22, 36(sp) + ldw r21, 32(sp) + ldw r20, 28(sp) + ldw r19, 24(sp) + ldw r18, 20(sp) + ldw r17, 16(sp) + ldw r16, 12(sp) + ldw fp, 8(sp) + addi sp, sp, 48 + ret + + .section .fini + ldw ra, 44(sp) + ldw r23, 40(sp) + ldw r22, 36(sp) + ldw r21, 32(sp) + ldw r20, 28(sp) + ldw r19, 24(sp) + ldw r18, 20(sp) + ldw r17, 16(sp) + ldw r16, 12(sp) + ldw fp, 8(sp) + addi sp, sp, 48 + ret + Index: libgcc/config/nios2/linux-unwind.h === --- libgcc/config/nios2/linux-unwind.h (revision 0) +++ libgcc/config/nios2/linux-unwind.h (revision 0) @@ -0,0 +1,179 @@ +/* DWARF2 EH unwinding support for Nios II Linux. + Copyright (C) 2008-2013 Free Software Foundation, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +http://www.gnu.org/licenses/. */ + +#ifndef inhibit_libc + +/* Do code reading to identify a signal frame
Re: [PATCH 0/5] Submission of Altera Nios II port
On 2013/4/26 04:35 AM, Joseph S. Myers wrote: I should ask the following general standard new-port questions: * Does the port build cleanly when configured with --enable-werror-always and built using a native compiler from current GCC trunk - for both 32-bit host, and 64-bit host? It should. This is the standard way of ensuring the same level of warning-cleanness in a cross build as native bootstrap automatically enforces (and the build compiler needs to be from current trunk because warning-cleanness is only expected when the build compiler is the same version as the compiler being built). The new targets should be added to contrib/config-list.mk, which helps test all targets with --enable-werror-always. This is mentioned in Back End in sourcebuild.texi - check there for any other pieces that should be included in the port submission. Currently, nios2 seems to be another affected target by PR 55035, when building with a recent trunk with --enable-werror-always: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55035 I would like to ask this --enable-werror-always requirement be relaxed for the nios2 port submission, as it is not alone in the above PR, and we are early in the release cycle; there should be plenty of time to fix any problems afterwards. Though not included in the submitted patches, I will add the target entries in contrib/config-list.mk when committing. * What are test results like for the port? Again, both 32-bit and 64-bit hosts, to detect any dependencies on whether the host is 32-bit or 64-bit. Tests of i686 and x86_64 Linux hosts show same test results. FAILs that still exist include g++.dg/debug/dwarf2/non-virtual-thunk.C, due to the lack of target specific MI-thunk hooks right now, and a few tree-ssa optimization FAILs, that might be worked around by augmenting the testcase. The port in whole should be fairly stable. Thanks, Chung-Lin
Re: [ping][patch, ARM] Fix PR42017, LR not used in leaf functions
On 13/5/13 11:15 AM, Kugan wrote: Hi, Ping this patch by Chung-Lin. http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01179.html This patch allows lr registers to be used in leaf functions for ARM. There were some concerns about performance regression in thumb2 mode for CoreMark. However, looking at the code further shows that this performance regression is due to alignment issue with core_state_transition function and as a result taking longer time to execute. In fact, there isn’t any change in the code generated for core_state_transition with and without the patch. Adding Alignment to this function improves the performance than without the patch. Just curious, were changes to enforce the alignment added already? (I'm quite out of ARM-specific context lately). Chung-Lin
[PATCH] PR32219, weak hidden reference segfault
Hi, with reference to the old dicussion on PR 32219: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32219 It seems that a patch was submitted to put the DECL_WEAK check before the visibility check, but that patch was never approved or applied, due to concerns in the wording of surrounding comments: http://gcc.gnu.org/ml/gcc-patches/2010-03/msg00666.html That patch does solve the segfault in the PR, and happens to also solve the other weak-hidden + section-anchor issue Nathan's other patch solves (simply because it works the same way by rejecting DECL_WEAK inside binds_local_p). However, my own testing of the PR patch on recent trunk indicates a regression: it filters out the TLS wrapper functions for C++11 thread_local variables, causing them to be called by @PLT, and failing a few tests that check for this. Looking into the generated code for: extern void weakfun() __attribute__((weak,visibility(hidden))); void foo () { if (weakfun) weakfun(); } Under -O1 -m32 -fPIC, the address load and test will look like: (insn 5 2 6 2 (set (reg/f:SI 60) (plus:SI (reg:SI 3 bx) (const:SI (unspec:SI [ (symbol_ref/i:SI (f) [flags 0x43] function_decl f) ] UNSPEC_GOTOFF {*leasi} (expr_list:REG_EQUAL (symbol_ref/i:SI (f) function_decl f) (nil))) (insn 6 5 7 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg/f:SI 60) (const_int 0 [0]))) p.c:8 3 {*cmpsi_ccno_1} (nil)) (jump_insn 7 6 8 2 ... However, the logic currently used in rtlanal.c:nonzero_address_p() only test for PIC-reg + constant_p, instead of more sophisticated checking to expose the wrapped weak symbol_ref, thus confusing CSE to eliminate the needed test. The DECL_WEAK test upward movement in binds_local_p() works because when bound non-local, x86 expand turns the address load into (mem (plus (const (unspec ... UNSPEC_GOT form, with the MEM helping to avoid the above case. Attached is a patch to make the nonzero_address_p() work for this case, bootstrapped and tested on i686-linux without regressions. I have the impression from the PR32219 discussion, that the solution should be to make all weak-hidden-undefined symbols as potentially non-local. However, I don't think that is needed, no? As long as the linker added zero value is in the same module, weak hidden semantics should remain just the same... Thanks, Chung-Lin 2013-05-09 Chung-Lin Tang clt...@codesourcery.com PR target/32219 * rtlanal.c (nonzero_address_p): Robustify checking by look recursively into PIC constant offsets and (CONST (UNSPEC ...)) expressions. Index: rtlanal.c === --- rtlanal.c (revision 198735) +++ rtlanal.c (working copy) @@ -387,13 +387,22 @@ nonzero_address_p (const_rtx x) return false; case CONST: - return nonzero_address_p (XEXP (x, 0)); + { + rtx base, offset; + /* Peel away any constant offsets from the base symbol. */ + split_const (CONST_CAST_RTX (x), base, offset); + return nonzero_address_p (base); + } +case UNSPEC: + /* Reach for a contained symbol. */ + return nonzero_address_p (XVECEXP (x, 0, 0)); + case PLUS: /* Handle PIC references. */ if (XEXP (x, 0) == pic_offset_table_rtx CONSTANT_P (XEXP (x, 1))) - return true; + return nonzero_address_p (XEXP (x, 1)); return false; case PRE_MODIFY: Index: testsuite/gcc.dg/visibility-21.c === --- testsuite/gcc.dg/visibility-21.c(revision 0) +++ testsuite/gcc.dg/visibility-21.c(revision 0) @@ -0,0 +1,12 @@ +/* PR target/32219 */ +/* { dg-do run } */ +/* { dg-require-visibility } */ +/* { dg-options -O1 -fPIC { target fpic } } */ + +extern void f() __attribute__((weak,visibility(hidden))); +int main() +{ + if (f) +f(); + return 0; +}
Re: [PATCH 4/5] Altera Nios II: dwarf generation fix
On 2013/4/23 01:35 AM, Cary Coutant wrote: A : host_integerp (value, TYPE_UNSIGNED (TREE_TYPE (value))) B : host_integerp (value, 0) AB AB AB AB type_size,hwi | 00 01 10 11 --+--- 32,32 | X X int int 64,32 | X X int int 32,64 | X X - int 64,64 | X X int int In the third column (AB == 10), we're emitting a single int today even though we know it's not technically correct: GDB will display the unsigned value as a negative number. That's marginally better than emitting nothing at all when the value is larger than an hwi, but I was arguing that as long as we're adding the ability to emit the constant as a double, why not also use a double for an unsigned that doesn't fit in a signed hwi? Yes, it'll waste some space, but the value will be correctly displayed as a result. I'm not sure of other cases, but here it is only to mark the values of an enumerator type, so as long as the values are consistent, the correct behavior is to print out the right enum string (I know that's a bit not ideal, but just to point that out) Upon further reflection, however... This comment is wrong: /* DWARF2 does not provide a way of indicating whether or not enumeration constants are signed or unsigned. GDB always assumes the values are signed, so we output all values as if they were signed. That means that enumeration constants with very large unsigned values will appear to have negative values in the debugger. */ DWARF does in fact provide a way of indicating whether a constant is signed or unsigned: DW_FORM_sdata and DW_FORM_udata. These forms were in DWARF-2, and the following comment was added to the DWARF-3 spec: If one of the DW_FORM_datan forms is used to represent a signed or unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_datan. We should really be emitting unsigned constants using add_AT_unsigned: if (TYPE_UNSIGNED (TREE_TYPE (value))) { if (host_integerp (value, 1)) add_AT_unsigned (enum_die, DW_AT_const_value, TREE_INT_CST_LOW (value)); else add_AT_unsigned_double (enum_die, DW_AT_const_value, TREE_INT_CST_HIGH (value), TREE_INT_CST_LOW (value)); } else { if (host_integerp (value, 0)) add_AT_int (enum_die, DW_AT_const_value, TREE_INT_CST_LOW (value)); else add_AT_double (enum_die, DW_AT_const_value, TREE_INT_CST_HIGH (value), TREE_INT_CST_LOW (value)); } add_AT_unsigned_double would be new, and would need a new dw_val_class_unsigned_const_double enum. That seems like the completely correct solution; correct unsigned/signed x int/double tags for all situations. ...1xxx... can then be correctly read as an unsigned int, rather than an excessively wide double or (incorrectly) signed int. You said OK for Julian's patch in the last mail, so I'll take that as approved (for an interim solution). If you don't mind, I'll add a TODO to the comments (attached patch). Thanks, Chung-Lin Index: dwarf2out.c === --- dwarf2out.c (revision 198177) +++ dwarf2out.c (working copy) @@ -17027,15 +17027,27 @@ gen_enumeration_type_die (tree type, dw_die_ref co if (TREE_CODE (value) == CONST_DECL) value = DECL_INITIAL (value); - if (host_integerp (value, TYPE_UNSIGNED (TREE_TYPE (value + if (host_integerp (value, TYPE_UNSIGNED (TREE_TYPE (value))) + (simple_type_size_in_bits (TREE_TYPE (value)) + = HOST_BITS_PER_WIDE_INT || host_integerp (value, 0))) /* DWARF2 does not provide a way of indicating whether or not enumeration constants are signed or unsigned. GDB always assumes the values are signed, so we output all values as if they were signed. That means that enumeration constants with very large unsigned values - will appear to have negative values in the debugger. */ - add_AT_int (enum_die, DW_AT_const_value, - tree_low_cst (value, tree_int_cst_sgn (value) 0)); + will appear to have negative values in the debugger. + + TODO: the above comment is wrong, DWARF2 does provide + DW_FORM_sdata/DW_FORM_udata to represent signed/unsigned data. + This should be re-worked to use correct signed/unsigned + int/double tags for all cases, instead of always treating as + signed. */ + add_AT_int (enum_die, DW_AT_const_value, TREE_INT_CST_LOW (value)); + else + /*
Re: [PATCH 4/5] Altera Nios II: dwarf generation fix
On 2013/4/19 12:56 AM, Cary Coutant wrote: On Thu, Apr 18, 2013 at 6:49 AM, Chung-Lin Tang clt...@codesourcery.com wrote: This patch was a fix by Julian which corrected a HOST_BITS_PER_WIDE_INT host dependency in dwarf generation. Nios II does not have need_64bit_hwint switched on during configuring, and ran into GDB test FAILs originating from this problem. 2013-04-18 Julian Brown jul...@codesourcery.com * dwarf2out.c (gen_enumeration_type_die): Fix HOST_BITS_PER_WIDE_INT dependency behavior in enumeration type DIE generation. + if (host_integerp (value, TYPE_UNSIGNED (TREE_TYPE (value))) + (simple_type_size_in_bits (TREE_TYPE (value)) + = HOST_BITS_PER_WIDE_INT || host_integerp (value, 0))) /* DWARF2 does not provide a way of indicating whether or not enumeration constants are signed or unsigned. GDB always assumes the values are signed, so we output all values as if they were signed. That means that enumeration constants with very large unsigned values will appear to have negative values in the debugger. */ +add_AT_int (enum_die, DW_AT_const_value, TREE_INT_CST_LOW (value)); + else +/* Enumeration constants may be wider than HOST_WIDE_INT. Handle + that here. */ +add_AT_double (enum_die, DW_AT_const_value, + TREE_INT_CST_HIGH (value), TREE_INT_CST_LOW (value)); I'm not sure I understand the logic here. I'd think either the value fits in a signed HOST_WIDE_INT, and we use add_AT_int, or it doesn't, and we use add_AT_double: if (host_integerp (value, 0)) add_AT_int (enum_die, DW_AT_const_value, TREE_INT_CST_LOW (value)); else add_AT_double (enum_die, DW_AT_const_value, TREE_INT_CST_HIGH (value), TREE_INT_CST_LOW (value)); Why wouldn't that work? I'd think this would even eliminate the need for the comment about signed vs. unsigned. -cary I think Julian might be able to fill clearer, but IIUC, if you use host_integer(value,0) as the test, while functionally also correct, for values like: TREE_INT_CST_HIGH (value) == ... TREE_INT_CST_LOW (value) == 1xxx... you will end up placing it as a double, even if TREE_TYPE (value) is something within 32-bits, which you can actually place as an 'int'. In other words, the more complex condition saves a bit of dwarf size. Julian, can you comment further? Thanks, Chung-Lin