Re: [PATCH 1/3, libgomp] Resolve libgomp plugin deadlock on exit, libgomp proper parts

2016-01-05 Thread Chung-Lin Tang
Patch has been updated to accommodate the gomp_fini_device() removal changes.
And ping.

On 2015/12/14 11:47 PM, Chung-Lin Tang wrote:
> [sorry, forgot to C gcc-patches in last send]
> 
> Hi Jakub,
> these patches are a revision of 
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01701.html
> since that patch set have bitrotten by now.
> 
> To recap the original situation, due to the way that device locks are held
> when entering plugin code, a GOMP_PLUGIN_fatal() call will deadlock when the
> GOMP_unregister_var() exit destructor tries to obtain the same device lock.
> 
> This patch set revises many functions on libgomp plugin interface to return 
> false on error,
> and back to libgomp to release the lock and call gomp_fatal() there.
> 
> This first patch is the changes for the machine independent libgomp proper. 
> The entire patch
> set was tested without regressions. Is this okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> 2015-12-14  Chung-Lin Tang  <clt...@codesourcery.com>
> 
> * target.c (gomp_device_copy): New function.
> (gomp_copy_host2dev): Likewise.
> (gomp_copy_dev2host): Likewise.
> (gomp_free_device_memory): Likewise.
> (gomp_map_vars_existing): Adjust to call gomp_copy_host2dev().
> (gomp_map_pointer): Likewise.
> (gomp_map_vars): Adjust to call gomp_copy_host2dev(), handle
> NULL value from alloc_func plugin hook.
> (gomp_unmap_tgt): Adjust to call gomp_free_device_memory().
> (gomp_copy_from_async): Adjust to call gomp_copy_dev2host().
> (gomp_unmap_vars): Likewise.
> (gomp_update): Adjust to call gomp_copy_dev2host() and
> gomp_copy_host2dev() functions.
> (gomp_init_device): Handle false value from init_device_func
> plugin hook.
> (gomp_fini_device): Handle false value from fini_device_func
> plugin hook.
> (gomp_exit_data): Adjust to call gomp_copy_dev2host().
> (omp_target_free): Adjust to call gomp_free_device_memory().
> (omp_target_memcpy): Handle return values from host2dev_func,
> dev2host_func, and dev2dev_func plugin hooks.
> (omp_target_memcpy_rect_worker): Likewise.
> * libgomp.h (struct gomp_device_descr): Adjust return type of
> init_device_func, fini_device_func, free_func, dev2host_func,
> host2dev_func, and dev2dev_func plugin hooks from 'void *' to
> bool.
> * oacc-host.c (host_init_device): Change return type to bool.
> (host_fini_device): Likewise.
> (host_free): Likewise.
> (host_dev2host): Likewise.
> (host_host2dev): Likewise.
> * oacc-mem.c (acc_free): Handle plugin hook fatal error case.
> (acc_memcpy_to_device): Likewise.
> (acc_memcpy_from_device): Likewise.
> (delete_copyout): Add libfnname parameter, handle free_func
> hook fatal error case.
> (acc_delete): Adjust delete_copyout call.
> (acc_copyout): Likewise.
> 
> 
> 

Index: libgomp.h
===
--- libgomp.h   (revision 232047)
+++ libgomp.h   (working copy)
@@ -927,16 +927,17 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*init_device_func) (int);
-  void (*fini_device_func) (int);
+  bool (*init_device_func) (int);
+  bool (*fini_device_func) (int);
   unsigned (*version_func) (void);
   int (*load_image_func) (int, unsigned, const void *, struct addr_pair **);
   void (*unload_image_func) (int, unsigned, const void *);
   void *(*alloc_func) (int, size_t);
-  void (*free_func) (int, void *);
-  void *(*dev2host_func) (int, void *, const void *, size_t);
-  void *(*host2dev_func) (int, void *, const void *, size_t);
-  void *(*dev2dev_func) (int, void *, const void *, size_t);
+  bool (*free_func) (int, void *);
+  bool (*dev2host_func) (int, void *, const void *, size_t);
+  bool (*host2dev_func) (int, void *, const void *, size_t);
+  /*xxx*/
+  bool (*dev2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
   void (*async_run_func) (int, void *, void *, void *);
 
Index: oacc-host.c
===
--- oacc-host.c (revision 232047)
+++ oacc-host.c (working copy)
@@ -60,14 +60,16 @@ host_get_num_devices (void)
   return 1;
 }
 
-static void
+static bool
 host_init_device (int n __attribute__ ((unused)))
 {
+  return true;
 }
 
-static void
+static bool
 host_fini_device (int n __attribute__ ((unused)))
 {
+  return true;
 }
 
 static unsigned
@@ -98,28 +100,29 @@ host_alloc (int n __attribute__ ((unused)), size_t
   return gomp_malloc (s);
 }
 
-static void

[PATCH] OpenACC use_device clause ICE fix

2016-01-05 Thread Chung-Lin Tang
Hi,
we've been encountering an ICE for OpenACC host_data sections, which
has a use_device() clause similar to OpenMP use_device_ptr.

The ICE happens in make_decl_rtl() for scan-created variables, which IIUC,
should not be entered at all for automatic variables.

I believe the problem is, unlike other variable creation cases where the
code is split out into an offloaded child function, a host_data section
is actually host side code, so the child function local variable processing
doesn't apply here; the use_device() referenced variable has to be added
to the current host function.

So here is the quite small fix. This fixed the ICE for OpenACC on trunk
and gomp4. However when I tested it for OpenMP using the case that Julian
provided here[1], the same ICE appeared to be already fixed. I'm not sure
if some other interim change covered it up for OpenMP.

This patch was tested on trunk without regressions. Okay for trunk?

[1] https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00104.html

Thanks,
Chung-Lin

* omp-low.c (scan_sharing_clauses): Call add_local_decl() for
use_device/use_device_ptr variables.

Index: omp-low.c
===
--- omp-low.c   (revision 232047)
+++ omp-low.c   (working copy)
@@ -1972,7 +1972,10 @@ scan_sharing_clauses (tree clauses, omp_context *c
  gcc_assert (DECL_P (decl2));
  install_var_local (decl2, ctx);
}
- install_var_local (decl, ctx);
+ decl = install_var_local (decl, ctx);
+ /* use_device/use_device_ptr items are actually host side variables,
+not on the offloaded target; add to current function here.  */
+ add_local_decl (cfun, decl);
  break;
 
case OMP_CLAUSE_IS_DEVICE_PTR:


Re: [PATCH, libgomp] Rewire OpenACC async

2015-12-22 Thread Chung-Lin Tang
Ping.

On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
> Hi, this patch reworks some of the way that asynchronous copyouts are
> implemented for OpenACC in libgomp.
> 
> Before this patch, we had a somewhat confusing way of implementing this
> by having two refcounts for each mapping: refcount and async_refcount,
> which I never got working again after the last wave of async regressions
> showed up.
> 
> So this patch implements what I believe to be a simplification: async_refcount
> is removed, and instead of trying to queue the async copyouts during unmapping
> we actually do that during the plugin event handling. This requires a addition
> of the async stream integer as an argument to the register_async_cleanup
> plugin hook, but overall I think this should be more elegant than before.
> 
> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
> It also fixed data-[23].c regressions before, but some other recent check-in
> happened to already fixed those.
> 
> Tested without regressions, is this okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> 2015-11-24  Chung-Lin Tang  <clt...@codesourcery.com>
> 
> * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
> * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
> parameter, use to set async stream around call to gomp_unmap_vars,
> call gomp_unmap_vars() with 'do_copyfrom' set to true.
> * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
> (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
> events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
> (event_add): Add int parameter, initialize 'val' field when
> adding new ptx_event struct.
> (nvptx_evec): Adjust event_add() call arguments.
> (nvptx_host2dev): Likewise.
> (nvptx_dev2host): Likewise.
> (nvptx_wait_async): Likewise.
> (nvptx_wait_all_async): Likewise.
> (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
> pass to event_add() call.
> * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
> parameter.
> * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
> call openacc.register_async_cleanup_func() hook.
> * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
> * target.c (gomp_copy_from_async): Delete function.
> (gomp_map_vars): Remove async_refcount.
> (gomp_unmap_vars): Likewise.
> (gomp_load_image_to_device): Likewise.
> (omp_target_associate_ptr): Likewise.
> * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
> (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
> (gomp_copy_from_async): Remove.
> 



[PATCH 1/3, libgomp] Resolve libgomp plugin deadlock on exit, libgomp proper parts

2015-12-14 Thread Chung-Lin Tang
[sorry, forgot to C gcc-patches in last send]

Hi Jakub,
these patches are a revision of 
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01701.html
since that patch set have bitrotten by now.

To recap the original situation, due to the way that device locks are held
when entering plugin code, a GOMP_PLUGIN_fatal() call will deadlock when the
GOMP_unregister_var() exit destructor tries to obtain the same device lock.

This patch set revises many functions on libgomp plugin interface to return 
false on error,
and back to libgomp to release the lock and call gomp_fatal() there.

This first patch is the changes for the machine independent libgomp proper. The 
entire patch
set was tested without regressions. Is this okay for trunk?

Thanks,
Chung-Lin

2015-12-14  Chung-Lin Tang  <clt...@codesourcery.com>

* target.c (gomp_device_copy): New function.
(gomp_copy_host2dev): Likewise.
(gomp_copy_dev2host): Likewise.
(gomp_free_device_memory): Likewise.
(gomp_map_vars_existing): Adjust to call gomp_copy_host2dev().
(gomp_map_pointer): Likewise.
(gomp_map_vars): Adjust to call gomp_copy_host2dev(), handle
NULL value from alloc_func plugin hook.
(gomp_unmap_tgt): Adjust to call gomp_free_device_memory().
(gomp_copy_from_async): Adjust to call gomp_copy_dev2host().
(gomp_unmap_vars): Likewise.
(gomp_update): Adjust to call gomp_copy_dev2host() and
gomp_copy_host2dev() functions.
(gomp_init_device): Handle false value from init_device_func
plugin hook.
(gomp_fini_device): Handle false value from fini_device_func
plugin hook.
(gomp_exit_data): Adjust to call gomp_copy_dev2host().
(omp_target_free): Adjust to call gomp_free_device_memory().
(omp_target_memcpy): Handle return values from host2dev_func,
dev2host_func, and dev2dev_func plugin hooks.
(omp_target_memcpy_rect_worker): Likewise.
* libgomp.h (struct gomp_device_descr): Adjust return type of
init_device_func, fini_device_func, free_func, dev2host_func,
host2dev_func, and dev2dev_func plugin hooks from 'void *' to
bool.
* oacc-host.c (host_init_device): Change return type to bool.
(host_fini_device): Likewise.
(host_free): Likewise.
(host_dev2host): Likewise.
(host_host2dev): Likewise.
* oacc-mem.c (acc_free): Handle plugin hook fatal error case.
(acc_memcpy_to_device): Likewise.
(acc_memcpy_from_device): Likewise.
(delete_copyout): Add libfnname parameter, handle free_func
hook fatal error case.
(acc_delete): Adjust delete_copyout call.
(acc_copyout): Likewise.



Index: libgomp/libgomp.h
===
--- libgomp/libgomp.h	(revision 231613)
+++ libgomp/libgomp.h	(working copy)
@@ -914,16 +914,17 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*init_device_func) (int);
-  void (*fini_device_func) (int);
+  bool (*init_device_func) (int);
+  bool (*fini_device_func) (int);
   unsigned (*version_func) (void);
   int (*load_image_func) (int, unsigned, const void *, struct addr_pair **);
   void (*unload_image_func) (int, unsigned, const void *);
   void *(*alloc_func) (int, size_t);
-  void (*free_func) (int, void *);
-  void *(*dev2host_func) (int, void *, const void *, size_t);
-  void *(*host2dev_func) (int, void *, const void *, size_t);
-  void *(*dev2dev_func) (int, void *, const void *, size_t);
+  bool (*free_func) (int, void *);
+  bool (*dev2host_func) (int, void *, const void *, size_t);
+  bool (*host2dev_func) (int, void *, const void *, size_t);
+  /*xxx*/
+  bool (*dev2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
   void (*async_run_func) (int, void *, void *, void *);
 
Index: libgomp/oacc-host.c
===
--- libgomp/oacc-host.c	(revision 231613)
+++ libgomp/oacc-host.c	(working copy)
@@ -60,14 +60,16 @@ host_get_num_devices (void)
   return 1;
 }
 
-static void
+static bool
 host_init_device (int n __attribute__ ((unused)))
 {
+  return true;
 }
 
-static void
+static bool
 host_fini_device (int n __attribute__ ((unused)))
 {
+  return true;
 }
 
 static unsigned
@@ -98,28 +100,29 @@ host_alloc (int n __attribute__ ((unused)), size_t
   return gomp_malloc (s);
 }
 
-static void
+static bool
 host_free (int n __attribute__ ((unused)), void *p)
 {
   free (p);
+  return true;
 }
 
-static void *
+static bool
 host_dev2host (int n __attribute__ ((unused)),
 	   void *h __attribute__ ((unused)),
 	   const void *d __attribute__ ((unused)),
 	   size_t s __attribute__ ((unused)))
 {
-  return NULL;
+  return true;
 }
 
-static void *
+static bool
 host_host2dev (int n __attri

[PATCH 2/3, libgomp] Resolve libgomp plugin deadlock on exit, nvptx parts

2015-12-14 Thread Chung-Lin Tang
These are the nvptx parts.

Thanks,
Chung-Lin

* plugin/plugin-nvptx.c (CUDA_CALL_ERET): New convenience macro.
(CUDA_CALL): Likewise.
(CUDA_CALL_ASSERT): Likewise.
(map_init): Change return type to bool, use CUDA_CALL* macros.
(map_fini): Likewise.
(init_streams_for_device): Change return type to bool, adjust
call to map_init.
(fini_streams_for_device): Change return type to bool, adjust
call to map_fini.
(select_stream_for_async): Release stream_lock before calls to
GOMP_PLUGIN_fatal, adjust call to map_init.
(nvptx_init): Use CUDA_CALL* macros.
(nvptx_attach_host_thread_to_device): Change return type to bool,
use CUDA_CALL* macros.
(nvptx_open_device): Use CUDA_CALL* macros.
(nvptx_close_device): Change return type to bool, use CUDA_CALL*
macros.
(nvptx_get_num_devices): Use CUDA_CALL* macros.
(link_ptx): Change return type to bool, use CUDA_CALL* macros.
(nvptx_exec): Use CUDA_CALL* macros.
(nvptx_alloc): Use CUDA_CALL* macros.
(nvptx_free): Change return type to bool, use CUDA_CALL* macros.
(nvptx_host2dev): Likewise.
(nvptx_dev2host): Likewise.
(nvptx_wait): Use CUDA_CALL* macros.
(nvptx_wait_async): Likewise.
(nvptx_wait_all): Likewise.
(nvptx_wait_all_async): Likewise.
(nvptx_set_cuda_stream): Adjust order of stream_lock acquire,
use CUDA_CALL* macros, adjust call to map_fini.
(GOMP_OFFLOAD_init_device): Change return type to bool,
adjust code accordingly.
(GOMP_OFFLOAD_fini_device): Likewise.
(GOMP_OFFLOAD_load_image): Adjust calls to
nvptx_attach_host_thread_to_device/link_ptx to handle errors,
use CUDA_CALL* macros.
(GOMP_OFFLOAD_alloc): Adjust calls to code to handle error return.
(GOMP_OFFLOAD_free): Change return type to bool, adjust calls to
handle error return.
(GOMP_OFFLOAD_dev2host): Likewise.
(GOMP_OFFLOAD_host2dev): Likewise.
(GOMP_OFFLOAD_openacc_register_async_cleanup): Use CUDA_CALL* macros.
(GOMP_OFFLOAD_openacc_create_thread_data): Likewise.
Index: libgomp/plugin/plugin-nvptx.c
===
--- libgomp/plugin/plugin-nvptx.c	(revision 231613)
+++ libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -63,6 +63,34 @@ cuda_error (CUresult r)
   return desc;
 }
 
+/* Convenience macros for the frequently used CUDA library call and
+   error handling sequence.  This does not capture all the cases we
+   use in this file, but is common enough.  */
+
+#define CUDA_CALL_ERET(ERET, FN, ...)		\
+  do {		\
+unsigned __r = FN (__VA_ARGS__);		\
+if (__r != CUDA_SUCCESS)			\
+  {		\
+	GOMP_PLUGIN_error (#FN " error: %s",	\
+			   cuda_error (__r));	\
+	return ERET;\
+  }		\
+  } while (0)
+
+#define CUDA_CALL(FN, ...)			\
+  CUDA_CALL_ERET (false, (FN), __VA_ARGS__)
+
+#define CUDA_CALL_ASSERT(FN, ...)		\
+  do {		\
+unsigned __r = FN (__VA_ARGS__);		\
+if (__r != CUDA_SUCCESS)			\
+  {		\
+	GOMP_PLUGIN_fatal (#FN " error: %s",	\
+			   cuda_error (__r));	\
+  }		\
+  } while (0)
+
 static unsigned int instantiated_devices = 0;
 static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
@@ -98,25 +126,18 @@ struct map
   charmappings[0];
 };
 
-static void
+static bool
 map_init (struct ptx_stream *s)
 {
-  CUresult r;
-
   int size = getpagesize ();
 
   assert (s);
   assert (!s->d);
   assert (!s->h);
 
-  r = cuMemAllocHost (>h, size);
-  if (r != CUDA_SUCCESS)
-GOMP_PLUGIN_fatal ("cuMemAllocHost error: %s", cuda_error (r));
+  CUDA_CALL (cuMemAllocHost, >h, size);
+  CUDA_CALL (cuMemHostGetDevicePointer, >d, s->h, 0);
 
-  r = cuMemHostGetDevicePointer (>d, s->h, 0);
-  if (r != CUDA_SUCCESS)
-GOMP_PLUGIN_fatal ("cuMemHostGetDevicePointer error: %s", cuda_error (r));
-
   assert (s->h);
 
   s->h_begin = s->h;
@@ -125,16 +146,14 @@ map_init (struct ptx_stream *s)
 
   assert (s->h_next);
   assert (s->h_end);
+  return true;
 }
 
-static void
+static bool
 map_fini (struct ptx_stream *s)
 {
-  CUresult r;
-
-  r = cuMemFreeHost (s->h);
-  if (r != CUDA_SUCCESS)
-GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuda_error (r));
+  CUDA_CALL (cuMemFreeHost, s->h);
+  return true;
 }
 
 static void
@@ -325,7 +344,7 @@ nvptx_thread (void)
   return (struct nvptx_thread *) GOMP_PLUGIN_acc_thread ();
 }
 
-static void
+static bool
 init_streams_for_device (struct ptx_device *ptx_dev, int concurrency)
 {
   int i;
@@ -337,9 +356,10 @@ init_streams_for_device (struct ptx_device *ptx_de
   null_stream->multithreaded = true;
   null_stream->d = (CUdeviceptr) NULL;
   null_stream->h = NULL;
-  map_init (null_stream);
-  ptx_dev->null_stream = null_stream;
+  if (!map_init (null_stream))
+return false;
 
+  ptx_dev->null_stream = 

[PATCH 3/3, libgomp] Resolve libgomp plugin deadlock on exit, intelmic parts

2015-12-14 Thread Chung-Lin Tang
Hi Ilya,
thanks for the prior review 
(https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01893.html),
This version is mostly a like the prior one, with some minor code updates.

Thanks,
Chung-Lin

2015-12-14  Chung-Lin Tang  <clt...@codesourcery.com>

* plugin/libgomp-plugin-intelmic.cpp (offload): Change return type
to bool, adjust return code.
(GOMP_OFFLOAD_init_device): Likewise.
(GOMP_OFFLOAD_fini_device): Likewise.
(get_target_table): Likewise.
(offload_image): Likwise.
(GOMP_OFFLOAD_load_image): Adjust call to offload_image(), change
exit() to return error.
(GOMP_OFFLOAD_alloc): Change return type to bool, change to use out
parameter to return allocated pointer.
(GOMP_OFFLOAD_free): Change return type to bool, adjust return code.
(GOMP_OFFLOAD_host2dev): Likewise.
(GOMP_OFFLOAD_dev2host): Likewise.
(GOMP_OFFLOAD_dev2dev): Likewise.
Index: liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
===
--- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp	(revision 231613)
+++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp	(working copy)
@@ -205,7 +205,7 @@ GOMP_OFFLOAD_get_num_devices (void)
   return num_devices;
 }
 
-static void
+static bool
 offload (const char *file, uint64_t line, int device, const char *name,
 	 int num_vars, VarDesc *vars, const void **async_data)
 {
@@ -213,20 +213,21 @@ offload (const char *file, uint64_t line, int devi
   if (ofld)
 {
   if (async_data == NULL)
-	__offload_offload1 (ofld, name, 0, num_vars, vars, NULL, 0, NULL, NULL);
+	return __offload_offload1 (ofld, name, 0, num_vars, vars, NULL, 0,
+   NULL, NULL);
   else
 	{
 	  OffloadFlags flags;
 	  flags.flags = 0;
 	  flags.bits.omp_async = 1;
-	  __offload_offload3 (ofld, name, 0, num_vars, vars, NULL, 0, NULL,
-			  async_data, 0, NULL, flags, NULL);
+	  return __offload_offload3 (ofld, name, 0, num_vars, vars, NULL, 0,
+ NULL, async_data, 0, NULL, flags, NULL);
 	}
 }
   else
 {
-  fprintf (stderr, "%s:%d: Offload target acquire failed\n", file, line);
-  exit (1);
+  GOMP_PLUGIN_error ("%s:%d: Offload target acquire failed\n", file, line);
+  return false;
 }
 }
 
@@ -256,24 +257,25 @@ register_main_image ()
 
 /* liboffloadmic loads and runs offload_target_main on all available devices
during a first call to offload ().  */
-extern "C" void
+extern "C" bool
 GOMP_OFFLOAD_init_device (int device)
 {
   TRACE ("(device = %d)", device);
   pthread_once (_image_is_registered, register_main_image);
-  offload (__FILE__, __LINE__, device, "__offload_target_init_proc", 0, NULL,
-	   NULL);
+  return offload (__FILE__, __LINE__, device, "__offload_target_init_proc", 0,
+		  NULL, NULL);
 }
 
-extern "C" void
+extern "C" bool
 GOMP_OFFLOAD_fini_device (int device)
 {
   TRACE ("(device = %d)", device);
   /* Unreachable for GOMP_OFFLOAD_CAP_OPENMP_400.  */
   abort ();
+  return true;
 }
 
-static void
+static bool
 get_target_table (int device, int _funcs, int _vars, void **)
 {
   VarDesc vd1[2] = { vd_tgt2host, vd_tgt2host };
@@ -282,8 +284,9 @@ get_target_table (int device, int _funcs, int
   vd1[1].ptr = _vars;
   vd1[1].size = sizeof (num_vars);
 
-  offload (__FILE__, __LINE__, device, "__offload_target_table_p1", 2, vd1,
-	   NULL);
+  if (!offload (__FILE__, __LINE__, device, "__offload_target_table_p1", 2,
+		vd1, NULL))
+return false;
 
   int table_size = num_funcs + 2 * num_vars;
   if (table_size > 0)
@@ -295,15 +298,16 @@ get_target_table (int device, int _funcs, int
   vd2.ptr = table;
   vd2.size = table_size * sizeof (void *);
 
-  offload (__FILE__, __LINE__, device, "__offload_target_table_p2", 1, ,
-	   NULL);
+  return offload (__FILE__, __LINE__, device, "__offload_target_table_p2",
+		  1, , NULL);
 }
+  return true;
 }
 
 /* Offload TARGET_IMAGE to all available devices and fill address_table with
corresponding target addresses.  */
 
-static void
+static bool
 offload_image (const void *target_image)
 {
   void *image_start = ((void **) target_image)[0];
@@ -317,8 +321,8 @@ offload_image (const void *target_image)
 		   + image_size);
   if (!image)
 {
-  fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
-  exit (1);
+  GOMP_PLUGIN_error ("%s: Can't allocate memory\n", __FILE__);
+  return false;
 }
 
   image->size = image_size;
@@ -333,13 +337,14 @@ offload_image (const void *target_image)
 
   /* Receive tables for target_image from all devices.  */
   DevAddrVect dev_table;
+  bool ret = true;
   for (int dev = 0; dev < num_devices; dev++)
 {
   int num_funcs = 0;
   int num_vars = 0;
   void 

Re: [PATCH, libgomp] Rewire OpenACC async

2015-12-05 Thread Chung-Lin Tang
On 2015/12/1 08:01 PM, Julian Brown wrote:
> On Tue, 24 Nov 2015 18:27:24 +0800
> Chung-Lin Tang <clt...@codesourcery.com> wrote:
> 
>> Hi, this patch reworks some of the way that asynchronous copyouts are
>> implemented for OpenACC in libgomp.
>>
>> Before this patch, we had a somewhat confusing way of implementing
>> this by having two refcounts for each mapping: refcount and
>> async_refcount, which I never got working again after the last wave
>> of async regressions showed up.
>>
>> So this patch implements what I believe to be a simplification:
>> async_refcount is removed, and instead of trying to queue the async
>> copyouts during unmapping we actually do that during the plugin event
>> handling. This requires a addition of the async stream integer as an
>> argument to the register_async_cleanup plugin hook, but overall I
>> think this should be more elegant than before.
> 
> This looks OK to me I think (I've only looked fairly briefly). I vaguely
> remember trying something along these lines in an earlier iteration of
> the async support -- maybe hitting problems with locking (I see you
> have code to mitigate problems with that, and locking generally has
> probably evolved a bit since I last looked at the code in detail
> anyway).
> 
> Can event_gc ever be called when the *device* lock is held?

It only matters when the memmap_lockable argument is true, and for those
cases, no the device lock is never held.

> I'm slightly concerned that pushing async unmapping into event_gc means
> that program-level semantics are deferred to the backend, which is
> arguably the wrong place. But then I don't understand what went wrong
> with the dual-refcount implementation, so maybe it's unavoidable for
> some reason.

I got the dual-refcounting to work again (after the regressions first showed up)
in some cases briefly, but regressed in other testcases, which I don't recall
the full details now.
Indeed the copyout is now triggered inside the plugin, but it is still wrapped
inside GOMP_PLUGIN_async_unmap_vars(), so it's probably not too ugly.

Per our earlier internal discussion, I'm committing this to the gomp4 branch 
first.
Trunk will need to wait for Jakub's approval.

Thanks,
Chung-Lin



Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-05 Thread Chung-Lin Tang
On 2015/12/3 06:32 PM, Chung-Lin Tang wrote:
> On 2015/12/3 6:11 PM, Jakub Jelinek wrote:
>> On Thu, Dec 03, 2015 at 06:05:36PM +0800, Chung-Lin Tang wrote:
>>>> Oh wait, it looks like the C++ front end is not actually using the
>>>> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
>>>> own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
>>>> addition to finish_expr_stmt calls, I see it's also using
>>>> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
>>>> So I guess we'll want to model this the same way for OpenACC support
>>>> functions, and then (later) we should clean this up, to move the
>>>> C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)
>>>
>>> I see most OpenACC/OpenMP constructs are represented by special statement 
>>> codes,
>>> so they should be a different case. I so far only see the OpenACC wait 
>>> directive
>>> being represented as a CALL_EXPR (maybe there are others, haven't 
>>> exhaustively searched).
>>
>> No, Thomas is right, just look at
>> finish_omp_{barrier,flush,taskwait,taskyield,cancel,cancellation_point},
>> all those are represented as CALL_EXPRs.
>>
>>  Jakub
>>
> 
> Okay, I guess my impression was only for some OpenACC constructs.
> 
> Overall, OpenACC wait seems one of the few cases of using c_finish_* in 
> cp/parser.c.
> Whether other cases should move towards/away from that kind of style is a 
> larger question,
> I was only trying to fix a libgomp.oacc-c++/template-reduction.C regression 
> (testcase currently still in gomp4 branch)
> 
> Chung-Lin
> 

Per our internal discussion, I will commit this patch first to the gomp4 branch,
while awaiting trunk approval.

Thanks,
Chung-Lin


Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-03 Thread Chung-Lin Tang
On 2015/12/3 6:11 PM, Jakub Jelinek wrote:
> On Thu, Dec 03, 2015 at 06:05:36PM +0800, Chung-Lin Tang wrote:
>>> Oh wait, it looks like the C++ front end is not actually using the
>>> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
>>> own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
>>> addition to finish_expr_stmt calls, I see it's also using
>>> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
>>> So I guess we'll want to model this the same way for OpenACC support
>>> functions, and then (later) we should clean this up, to move the
>>> C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)
>>
>> I see most OpenACC/OpenMP constructs are represented by special statement 
>> codes,
>> so they should be a different case. I so far only see the OpenACC wait 
>> directive
>> being represented as a CALL_EXPR (maybe there are others, haven't 
>> exhaustively searched).
> 
> No, Thomas is right, just look at
> finish_omp_{barrier,flush,taskwait,taskyield,cancel,cancellation_point},
> all those are represented as CALL_EXPRs.
> 
>   Jakub
> 

Okay, I guess my impression was only for some OpenACC constructs.

Overall, OpenACC wait seems one of the few cases of using c_finish_* in 
cp/parser.c.
Whether other cases should move towards/away from that kind of style is a 
larger question,
I was only trying to fix a libgomp.oacc-c++/template-reduction.C regression 
(testcase currently still in gomp4 branch)

Chung-Lin



Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-03 Thread Chung-Lin Tang
On 2015/12/3 4:59 PM, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 03 Dec 2015 09:51:31 +0100, I wrote:
>> On Mon, 23 Nov 2015 21:15:00 +0800, Chung-Lin Tang <clt...@codesourcery.com> 
>> wrote:
>>> The OpenACC wait directive is represented as a call to the runtime
>>> function "GOACC_wait" instead of a tree code.  I am seeing when
>>> '#pragma acc wait' is using inside a template function, the CALL_EXPR
>>> to GOACC_wait is being silently ignored/removed during tsubst_expr().
>>
>> Uh.
>>
>>> I think the correct way to organize this is that the call should be inside
>>> an EXPR_STMT, so here's a patch to do that; basically remove the
>>> add_stmt() call from the shared c_finish_oacc_wait() code, and add
>>> add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
>>>
>>> Tested with no regressions on trunk, okay to commit?
>>
>>> --- c-family/c-omp.c(revision 230703)
>>> +++ c-family/c-omp.c(working copy)
>>> @@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr
>>>  }
>>>  
>>>stmt = build_call_expr_loc_vec (loc, stmt, args);
>>> -  add_stmt (stmt);
>>>  
>>>vec_free (args);
>> |  
>> |return stmt;
>> |  }
>>
>> I see in gcc/c/c-omp.c that several other c_finish_omp_* functions that
>> build builtin calls instead of tree nodes, do similar things like
>> c_finish_oacc_wait; I'd like to understand why it's -- presumably -- not
>> a problem for these: c_finish_omp_barrier, c_finish_omp_taskwait,
>> c_finish_omp_taskyield, c_finish_omp_flush?  (Jakub?)
> 
> Oh wait, it looks like the C++ front end is not actually using the
> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
> own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
> addition to finish_expr_stmt calls, I see it's also using
> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
> So I guess we'll want to model this the same way for OpenACC support
> functions, and then (later) we should clean this up, to move the
> C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)

I see most OpenACC/OpenMP constructs are represented by special statement codes,
so they should be a different case. I so far only see the OpenACC wait directive
being represented as a CALL_EXPR (maybe there are others, haven't exhaustively 
searched).

Chung-Lin




Re: [gomp4] Adjust Fortran OACC async lib test

2015-12-02 Thread Chung-Lin Tang
Ping.

Hi Thomas, this is only for gomp4 ATM, okay to commit?

Thanks,
Chung-Lin

On 2015/11/23 7:09 PM, Chung-Lin Tang wrote:
> Hi Thomas,
> this fix adds more acc_wait's to libgomp.oacc-fortran/lib-1[13].f90.
> 
> For lib-12.f90, it's sort of a fix before we can resolve the issue
> of intended semantics for "wait+async".
> 
> As for lib-13.f90, I believe these added acc_wait calls seem
> reasonable, since we can't immediately assume the async-launched parallels
> already completed there.
> 
> Does this seem reasonable?
> 
> Thanks,
> Chung-Lin
> 
>   * testsuite/libgomp.oacc-fortran/lib-12.f90 (main): Add acc_wait()
>   after async parallel construct.
>   * testsuite/libgomp.oacc-fortran/lib-13.f90 (main): Add acc_wait()
>   calls after parallel construct launches.
> 



Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-02 Thread Chung-Lin Tang
Ping.

On 2015/11/23 9:15 PM, Chung-Lin Tang wrote:
> The OpenACC wait directive is represented as a call to the runtime
> function "GOACC_wait" instead of a tree code.  I am seeing when
> '#pragma acc wait' is using inside a template function, the CALL_EXPR
> to GOACC_wait is being silently ignored/removed during tsubst_expr().
> 
> I think the correct way to organize this is that the call should be inside
> an EXPR_STMT, so here's a patch to do that; basically remove the
> add_stmt() call from the shared c_finish_oacc_wait() code, and add
> add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
> 
> Tested with no regressions on trunk, okay to commit?
> 
> Thanks,
> Chung-Lin
> 
>   * c-family/c-omp.c (c_finish_oacc_wait): Remove add_stmt() call.
>   * c/c-parser.c (c_parser_oacc_wait): Add add_stmt() call.
>   * cp/parser.c (cp_parser_oacc_wait): Add finish_expr_stmt() call.
> 



[PATCH, libgomp] Rewire OpenACC async

2015-11-24 Thread Chung-Lin Tang
Hi, this patch reworks some of the way that asynchronous copyouts are
implemented for OpenACC in libgomp.

Before this patch, we had a somewhat confusing way of implementing this
by having two refcounts for each mapping: refcount and async_refcount,
which I never got working again after the last wave of async regressions
showed up.

So this patch implements what I believe to be a simplification: async_refcount
is removed, and instead of trying to queue the async copyouts during unmapping
we actually do that during the plugin event handling. This requires a addition
of the async stream integer as an argument to the register_async_cleanup
plugin hook, but overall I think this should be more elegant than before.

This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
It also fixed data-[23].c regressions before, but some other recent check-in
happened to already fixed those.

Tested without regressions, is this okay for trunk?

Thanks,
Chung-Lin

2015-11-24  Chung-Lin Tang  <clt...@codesourcery.com>

* oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
* oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
parameter, use to set async stream around call to gomp_unmap_vars,
call gomp_unmap_vars() with 'do_copyfrom' set to true.
* plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
(event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
(event_add): Add int parameter, initialize 'val' field when
adding new ptx_event struct.
(nvptx_evec): Adjust event_add() call arguments.
(nvptx_host2dev): Likewise.
(nvptx_dev2host): Likewise.
(nvptx_wait_async): Likewise.
(nvptx_wait_all_async): Likewise.
(GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
pass to event_add() call.
* oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
parameter.
* oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
call openacc.register_async_cleanup_func() hook.
* oacc-parallel.c (GOACC_parallel_keyed): Likewise.
* target.c (gomp_copy_from_async): Delete function.
(gomp_map_vars): Remove async_refcount.
(gomp_unmap_vars): Likewise.
(gomp_load_image_to_device): Likewise.
(omp_target_associate_ptr): Likewise.
* libgomp.h (struct splay_tree_key_s): Remove async_refcount.
(acc_dispatch_t.register_async_cleanup_func): Add int parameter.
(gomp_copy_from_async): Remove.
Index: plugin/plugin-nvptx.c
===
--- plugin/plugin-nvptx.c   (revision 230796)
+++ plugin/plugin-nvptx.c   (working copy)
@@ -310,6 +310,7 @@ struct ptx_event
   int type;
   void *addr;
   int ord;
+  int val;
 
   struct ptx_event *next;
 };
@@ -786,6 +787,7 @@ static void
 event_gc (bool memmap_lockable)
 {
   struct ptx_event *ptx_event = ptx_events;
+  struct ptx_event *async_cleanups = NULL;
   struct nvptx_thread *nvthd = nvptx_thread ();
 
   pthread_mutex_lock (_event_lock);
@@ -803,6 +805,7 @@ event_gc (bool memmap_lockable)
   r = cuEventQuery (*e->evt);
   if (r == CUDA_SUCCESS)
{
+ bool append_async = false;
  CUevent *te;
 
  te = e->evt;
@@ -827,7 +830,7 @@ event_gc (bool memmap_lockable)
if (!memmap_lockable)
  continue;
 
-   GOMP_PLUGIN_async_unmap_vars (e->addr);
+   append_async = true;
  }
  break;
}
@@ -835,6 +838,7 @@ event_gc (bool memmap_lockable)
  cuEventDestroy (*te);
  free ((void *)te);
 
+ /* Unlink 'e' from ptx_events list.  */
  if (ptx_events == e)
ptx_events = ptx_events->next;
  else
@@ -845,15 +849,31 @@ event_gc (bool memmap_lockable)
  e_->next = e_->next->next;
}
 
- free (e);
+ if (append_async)
+   {
+ e->next = async_cleanups;
+ async_cleanups = e;
+   }
+ else
+   free (e);
}
 }
 
   pthread_mutex_unlock (_event_lock);
+
+  /* We have to do these here, after ptx_event_lock is released.  */
+  while (async_cleanups)
+{
+  struct ptx_event *e = async_cleanups;
+  async_cleanups = async_cleanups->next;
+
+  GOMP_PLUGIN_async_unmap_vars (e->addr, e->val);
+  free (e);
+}
 }
 
 static void
-event_add (enum ptx_event_type type, CUevent *e, void *h)
+event_add (enum ptx_event_type type, CUevent *e, void *h, int val)
 {
   struct ptx_event *ptx_event;
   struct nvptx_thread *nvthd = nvptx_thread ();
@@ -866,6 +886,7 @@ static void
   ptx_event->evt = e;
   ptx_event->addr = h;
   ptx_event->ord = nvthd->ptx_dev->

[gomp4] Adjust Fortran OACC async lib test

2015-11-23 Thread Chung-Lin Tang
Hi Thomas,
this fix adds more acc_wait's to libgomp.oacc-fortran/lib-1[13].f90.

For lib-12.f90, it's sort of a fix before we can resolve the issue
of intended semantics for "wait+async".

As for lib-13.f90, I believe these added acc_wait calls seem
reasonable, since we can't immediately assume the async-launched parallels
already completed there.

Does this seem reasonable?

Thanks,
Chung-Lin

* testsuite/libgomp.oacc-fortran/lib-12.f90 (main): Add acc_wait()
after async parallel construct.
* testsuite/libgomp.oacc-fortran/lib-13.f90 (main): Add acc_wait()
calls after parallel construct launches.
Index: libgomp.oacc-fortran/lib-12.f90
===
--- libgomp.oacc-fortran/lib-12.f90 (revision 230719)
+++ libgomp.oacc-fortran/lib-12.f90 (working copy)
@@ -15,6 +15,8 @@ program main
 end do
   !$acc end parallel
 
+  call acc_wait (0)
+
   call acc_wait_async (0, 1)
 
   if (acc_async_test (0) .neqv. .TRUE.) call abort
Index: libgomp.oacc-fortran/lib-13.f90
===
--- libgomp.oacc-fortran/lib-13.f90 (revision 230719)
+++ libgomp.oacc-fortran/lib-13.f90 (working copy)
@@ -21,6 +21,9 @@ program main
 end do
   !$acc end data
 
+  call acc_wait (1)
+  call acc_wait (2)
+
   if (acc_async_test (1) .neqv. .TRUE.) call abort
   if (acc_async_test (2) .neqv. .TRUE.) call abort
 


[PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-11-23 Thread Chung-Lin Tang
The OpenACC wait directive is represented as a call to the runtime
function "GOACC_wait" instead of a tree code.  I am seeing when
'#pragma acc wait' is using inside a template function, the CALL_EXPR
to GOACC_wait is being silently ignored/removed during tsubst_expr().

I think the correct way to organize this is that the call should be inside
an EXPR_STMT, so here's a patch to do that; basically remove the
add_stmt() call from the shared c_finish_oacc_wait() code, and add
add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.

Tested with no regressions on trunk, okay to commit?

Thanks,
Chung-Lin

* c-family/c-omp.c (c_finish_oacc_wait): Remove add_stmt() call.
* c/c-parser.c (c_parser_oacc_wait): Add add_stmt() call.
* cp/parser.c (cp_parser_oacc_wait): Add finish_expr_stmt() call.
Index: c-family/c-omp.c
===
--- c-family/c-omp.c(revision 230703)
+++ c-family/c-omp.c(working copy)
@@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr
 }
 
   stmt = build_call_expr_loc_vec (loc, stmt, args);
-  add_stmt (stmt);
 
   vec_free (args);
 
Index: c/c-parser.c
===
--- c/c-parser.c(revision 230703)
+++ c/c-parser.c(working copy)
@@ -13886,6 +13886,7 @@ c_parser_oacc_wait (location_t loc, c_parser *pars
   strcpy (p_name, " wait");
   clauses = c_parser_oacc_all_clauses (parser, OACC_WAIT_CLAUSE_MASK, p_name);
   stmt = c_finish_oacc_wait (loc, list, clauses);
+  add_stmt (stmt);
 
   return stmt;
 }
Index: cp/parser.c
===
--- cp/parser.c (revision 230703)
+++ cp/parser.c (working copy)
@@ -34930,6 +34930,7 @@ cp_parser_oacc_wait (cp_parser *parser, cp_token *
"#pragma acc wait", pragma_tok);
 
   stmt = c_finish_oacc_wait (loc, list, clauses);
+  stmt = finish_expr_stmt (stmt);
 
   return stmt;
 }


Re: [PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit

2015-09-29 Thread Chung-Lin Tang
On 2015/9/25 上午 04:27, Ilya Verbin wrote:
> On Thu, Aug 27, 2015 at 21:44:50 +0800, Chung-Lin Tang wrote:
>> We've discovered that, for several of the libgomp plugin interface routines,
>> if the target specific routine calls exit() (usually upon a fatal condition),
>> deadlock ensues. We found this using nvptx, but it's possible on intelmic as 
>> well.
>>
>> This is due to many of the plugin routines are called with the device lock 
>> held,
>> and when exit() is called inside the plugin code, the GOMP_unregister_var() 
>> destructor
>> tries to iterate through and acquire all device locks to cleanup. Since we 
>> already hold
>> one of the device locks, this just gets stuck.  Also because gomp_mutex_t is 
>> a
>> simple futex based lock implementation (instead of pthreads), we don't have a
>> trylock mechanism to use either.
>>
>> So this patch tries to alleviate this problem by changing the plugin 
>> interface;
>> the plugin routines that are called while holding the device lock are 
>> adjusted
>> to assume to never fatal exit, but return a value back to libgomp proper to
>> indicate execution results. The core libgomp code then may unlock and call 
>> gomp_fatal().
>>
>> We believe this is the right route to solve the problem, since there's only
>> two accel target plugins so far. Besides the nvptx plugin, I have made some 
>> effort
>> to update the intelmic plugin as well, though it's not as thoroughly audited.
>> Intel folks might want to further make sure your plugin code is free of this 
>> problem as well.
>>
>> This patch contains the libgomp proper changes. The nvptx and intelmic 
>> patches follow.
>> I have tested the libgomp testsuite without regressions for both accel 
>> targets, is this
>> okay for trunk?
> 
> (I have no objections)
> 
> However, in case of intelmic, these exit()s are just the tip of the iceberg,
> because underlying liboffloadmic contains other exit()s at fatal errors.
> And I don't know what to do with such deadlocks.
> 
>   -- Ilya

Yes, I think I saw more things to adjust wrt this issue within liboffloadmic, 
though I
hope this plugin interface change can set things ready.

And ping again, for the libgomp proper changes.

Thanks,
Chung-Lin





Re: [PATCH, libgomp] PR 67141, uninitialized acc_device_lock mutex

2015-09-22 Thread Chung-Lin Tang
On 2015/9/18 04:02 PM, Jakub Jelinek wrote:
> On Fri, Sep 18, 2015 at 03:41:30PM +0800, Chung-Lin Tang wrote:
>> this patch fixes the uninitialized acc_device_lock mutex situation
>> reported in PR 67141. The patch attached on the bugzilla page
>> tries to solve it by constructor priorities, which we think will
>> probably be less manageable in general.
>>
>> This patch changes goacc_host_init() to be called from
>> goacc_runtime_initialize() instead, thereby ensuring the init order.
>> libgomp testsuite was re-run without regressions, okay for trunk?
>>
>> Thanks,
>> Chung-Lin
>>
>> 2015-09-18  Chung-Lin Tang  <clt...@codesourcery.com>
>>
>>  PR libgomp/67141
>>
> 
> No vertical space in between PR line and subsequent entries.
> 
>>  * oacc-int.h (goacc_host_init): Add declaration.
>>  * oacc-host.c (goacc_host_init): Remove static and
>>  constructor attribute
> 
> Full stop at the end of entry.
> 
>>  * oacc-init.c (goacc_runtime_initialize): Call goacc_host_init()
>>  at end.
> 
> The patch is ok.  Though, perhaps as a follow-up, I think I'd prefer getting
> rid of pthread_key_create (_cleanup_key, goacc_destroy_thread);,
> it is wasteful if we do the same thing in initialize_team.  As the
> goacc_tls_data pointer is __thread anyway, I think just putting it into
> struct gomp_thread, arranging for init_team to be called from the env.c
> ctor and from the team TLS destructor call also some oacc freeing if
> the goacc_tls_data pointer is non-NULL (perhaps with __builtin_expect
> unlikely).
> 
>   Jakub

Committed, thanks for the review.
I believe this patch is also needed for 5.x, okay for that branch as well?

Thanks,
Chung-Lin




[PATCH, nios2] Fix to nios2_legitimize_address

2015-09-22 Thread Chung-Lin Tang
Nios II Linux had a bad TLS relocation generated, exposed by the
test case for PR 65771. A fix for this in nios2_legitimize_address() was
tested and applied.

Chung-Lin

2015-09-22  Chung-Lin Tang  <clt...@codesourcery.com>

* config/nios2/nios2.c (nios2_legitimize_address): When handling
'reg + reloc' cases, allow first operand to be non-REG, and use
force_reg() to enforce address pattern.
Index: nios2.c
===
--- nios2.c	(revision 227931)
+++ nios2.c	(working copy)
@@ -2265,15 +2265,15 @@
 
  Which will be output as '%tls_le(var+48)(r23)' in assembly.  */
   if (GET_CODE (x) == PLUS
-  && GET_CODE (XEXP (x, 0)) == REG
   && GET_CODE (XEXP (x, 1)) == CONST)
 {
-  rtx unspec, offset, reg = XEXP (x, 0);
+  rtx unspec, offset;
   split_const (XEXP (x, 1), , );
   if (GET_CODE (unspec) == UNSPEC
 	  && !nios2_large_offset_p (XINT (unspec, 1))
 	  && offset != const0_rtx)
 	{
+	  rtx reg = force_reg (Pmode, XEXP (x, 0));
 	  unspec = copy_rtx (unspec);
 	  XVECEXP (unspec, 0, 0)
 	= plus_constant (Pmode, XVECEXP (unspec, 0, 0), INTVAL (offset));


Re: [PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit

2015-09-22 Thread Chung-Lin Tang
Ping x2.

On 2015/9/9 04:08 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2015/8/27 09:44 PM, Chung-Lin Tang wrote:
>> We've discovered that, for several of the libgomp plugin interface routines,
>> if the target specific routine calls exit() (usually upon a fatal condition),
>> deadlock ensues. We found this using nvptx, but it's possible on intelmic as 
>> well.
>>
>> This is due to many of the plugin routines are called with the device lock 
>> held,
>> and when exit() is called inside the plugin code, the GOMP_unregister_var() 
>> destructor
>> tries to iterate through and acquire all device locks to cleanup. Since we 
>> already hold
>> one of the device locks, this just gets stuck.  Also because gomp_mutex_t is 
>> a
>> simple futex based lock implementation (instead of pthreads), we don't have a
>> trylock mechanism to use either.
>>
>> So this patch tries to alleviate this problem by changing the plugin 
>> interface;
>> the plugin routines that are called while holding the device lock are 
>> adjusted
>> to assume to never fatal exit, but return a value back to libgomp proper to
>> indicate execution results. The core libgomp code then may unlock and call 
>> gomp_fatal().
>>
>> We believe this is the right route to solve the problem, since there's only
>> two accel target plugins so far. Besides the nvptx plugin, I have made some 
>> effort
>> to update the intelmic plugin as well, though it's not as thoroughly audited.
>> Intel folks might want to further make sure your plugin code is free of this 
>> problem as well.
>>
>> This patch contains the libgomp proper changes. The nvptx and intelmic 
>> patches follow.
>> I have tested the libgomp testsuite without regressions for both accel 
>> targets, is this
>> okay for trunk?
>>
>> Thanks,
>> Chung-Lin
>>
>> 2015-08-27 Chung-Lin Tang <clt...@codesourcery.com>
>>
>> * oacc-host.c (host_init_device): Change return type to bool.
>> (host_fini_device): Likewise.
>> (host_dev2host): Likewise.
>> (host_host2dev): Likewise.
>> (host_free): Likewise.
>> (host_alloc): Change return type to bool, change to use out
>> parameter to return allocated pointer.
>> * oacc-mem.c (acc_malloc): Adjust plugin hook declaration change,
>> handle fatal error.
>> (acc_free): Likewise.
>> (acc_memcpy_to_device): Likewise.
>> (acc_memcpy_from_device): Likewise.
>> * oacc-init.c (acc_init_1): Handle gomp_init_device return code,
>> handle fatal error.
>> (acc_set_device_type): Likewise.
>> (acc_set_device_num): Likewise.
>> * target.c (gomp_map_vars): Adjust alloc_func plugin hook call,
>> add device unlock, handle fatal error.
>> (gomp_unmap_tgt): Change return type to bool, adjust free_func
>> plugin call.
>> (gomp_copy_from_async): Handle dev2host_func return code, handle
>> fatal error.
>> (gomp_unmap_vars): Likewise.
>> (gomp_init_device): Change return type to bool, adjust call to
>> init_device_func plugin hook.
>> (GOMP_target): Adjust call to gomp_init_device, handle fatal error.
>> (GOMP_target_data): Likewise.
>> (GOMP_target_update): Likewise.
>> * libgomp.h (gomp_device_descr.init_device_func): Change return
>> type to bool.
>> (gomp_device_descr.fini_device_func): Likewise.
>> (gomp_device_descr.free_func): Likewise.
>> (gomp_device_descr.dev2host_func): Likewise.
>> (gomp_device_descr.host2dev_func) Likewise.
>> (gomp_device_descr.alloc_func): Change return
>> type to bool, use out parameter to return pointer.
>> (gomp_init_device): Change return
>> type to bool.
>>
> 



[PATCH, libgomp] PR 67141, uninitialized acc_device_lock mutex

2015-09-18 Thread Chung-Lin Tang
Hi,
this patch fixes the uninitialized acc_device_lock mutex situation
reported in PR 67141. The patch attached on the bugzilla page
tries to solve it by constructor priorities, which we think will
probably be less manageable in general.

This patch changes goacc_host_init() to be called from
goacc_runtime_initialize() instead, thereby ensuring the init order.
libgomp testsuite was re-run without regressions, okay for trunk?

Thanks,
Chung-Lin

2015-09-18  Chung-Lin Tang  <clt...@codesourcery.com>

PR libgomp/67141

* oacc-int.h (goacc_host_init): Add declaration.
* oacc-host.c (goacc_host_init): Remove static and
constructor attribute
* oacc-init.c (goacc_runtime_initialize): Call goacc_host_init()
at end.
Index: oacc-host.c
===
--- oacc-host.c	(revision 227895)
+++ oacc-host.c	(working copy)
@@ -256,7 +256,7 @@ static struct gomp_device_descr host_dispatch =
   };
 
 /* Initialize and register this device type.  */
-static __attribute__ ((constructor)) void
+void
 goacc_host_init (void)
 {
   gomp_mutex_init (_dispatch.lock);
Index: oacc-int.h
===
--- oacc-int.h	(revision 227895)
+++ oacc-int.h	(working copy)
@@ -97,6 +97,7 @@ void goacc_runtime_initialize (void);
 void goacc_save_and_set_bind (acc_device_t);
 void goacc_restore_bind (void);
 void goacc_lazy_initialize (void);
+void goacc_host_init (void);
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility pop
Index: oacc-init.c
===
--- oacc-init.c	(revision 227895)
+++ oacc-init.c	(working copy)
@@ -644,6 +644,9 @@ goacc_runtime_initialize (void)
 
   goacc_threads = NULL;
   gomp_mutex_init (_thread_lock);
+
+  /* Initialize and register the 'host' device type.  */
+  goacc_host_init ();
 }
 
 /* Compiler helper functions */


Re: [gomp4] force global locks for nvptx targets

2015-09-09 Thread Chung-Lin Tang
On 2015/9/9 04:02 AM, Cesar Philippidis wrote:
> This patch forces GOACC_LOCK to use locks in global memory regardless if
> the lock us for a worker or a gang. We were using a shared memory for
> worker locks, but we ran into an issue with that would sporadically
> involve deadlocks in worker reductions. We're still investigating that
> issue, but for the time being, global locks appear to work albeit with a
> lock contention penalty.
> 
> I've applied this patch to gomp-4_0-branch.
> 
> Cesar
> 

Fixed typo, committed as obvious.

Chung-Lin

2015-09-09  Chung-Lin Tang  <clt...@codesourcery.com>

* config/nvptx/nvptx.c (nvptx_xform_lock): Correct typo
of variable 'force_global_locks'.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c(revision 227582)
+++ config/nvptx/nvptx.c(working copy)
@@ -3744,7 +3744,7 @@ nvptx_xform_lock (gimple stmt, const int *ARG_UNUS
   return mode > GOMP_DIM_WORKER;
 
 case IFN_GOACC_LOCK_INIT:
-  return force_global_lock || mode != GOMP_DIM_WORKER;
+  return force_global_locks || mode != GOMP_DIM_WORKER;
 
 default: gcc_unreachable();
 }


Re: [PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit

2015-09-09 Thread Chung-Lin Tang
Ping.

On 2015/8/27 09:44 PM, Chung-Lin Tang wrote:
> We've discovered that, for several of the libgomp plugin interface routines,
> if the target specific routine calls exit() (usually upon a fatal condition),
> deadlock ensues. We found this using nvptx, but it's possible on intelmic as 
> well.
> 
> This is due to many of the plugin routines are called with the device lock 
> held,
> and when exit() is called inside the plugin code, the GOMP_unregister_var() 
> destructor
> tries to iterate through and acquire all device locks to cleanup. Since we 
> already hold
> one of the device locks, this just gets stuck.  Also because gomp_mutex_t is a
> simple futex based lock implementation (instead of pthreads), we don't have a
> trylock mechanism to use either.
> 
> So this patch tries to alleviate this problem by changing the plugin 
> interface;
> the plugin routines that are called while holding the device lock are adjusted
> to assume to never fatal exit, but return a value back to libgomp proper to
> indicate execution results. The core libgomp code then may unlock and call 
> gomp_fatal().
> 
> We believe this is the right route to solve the problem, since there's only
> two accel target plugins so far. Besides the nvptx plugin, I have made some 
> effort
> to update the intelmic plugin as well, though it's not as thoroughly audited.
> Intel folks might want to further make sure your plugin code is free of this 
> problem as well.
> 
> This patch contains the libgomp proper changes. The nvptx and intelmic 
> patches follow.
> I have tested the libgomp testsuite without regressions for both accel 
> targets, is this
> okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> 2015-08-27 Chung-Lin Tang <clt...@codesourcery.com>
> 
> * oacc-host.c (host_init_device): Change return type to bool.
> (host_fini_device): Likewise.
> (host_dev2host): Likewise.
> (host_host2dev): Likewise.
> (host_free): Likewise.
> (host_alloc): Change return type to bool, change to use out
> parameter to return allocated pointer.
> * oacc-mem.c (acc_malloc): Adjust plugin hook declaration change,
> handle fatal error.
> (acc_free): Likewise.
> (acc_memcpy_to_device): Likewise.
> (acc_memcpy_from_device): Likewise.
> * oacc-init.c (acc_init_1): Handle gomp_init_device return code,
> handle fatal error.
> (acc_set_device_type): Likewise.
> (acc_set_device_num): Likewise.
> * target.c (gomp_map_vars): Adjust alloc_func plugin hook call,
> add device unlock, handle fatal error.
> (gomp_unmap_tgt): Change return type to bool, adjust free_func
> plugin call.
> (gomp_copy_from_async): Handle dev2host_func return code, handle
> fatal error.
> (gomp_unmap_vars): Likewise.
> (gomp_init_device): Change return type to bool, adjust call to
> init_device_func plugin hook.
> (GOMP_target): Adjust call to gomp_init_device, handle fatal error.
> (GOMP_target_data): Likewise.
> (GOMP_target_update): Likewise.
> * libgomp.h (gomp_device_descr.init_device_func): Change return
> type to bool.
> (gomp_device_descr.fini_device_func): Likewise.
> (gomp_device_descr.free_func): Likewise.
> (gomp_device_descr.dev2host_func): Likewise.
> (gomp_device_descr.host2dev_func) Likewise.
> (gomp_device_descr.alloc_func): Change return
> type to bool, use out parameter to return pointer.
> (gomp_init_device): Change return
> type to bool.
> 



[PATCH] Propagate -fdiagnostics-* options in lto-wrapper

2015-09-06 Thread Chung-Lin Tang
Hi,
Currently most non-target specific options are skipped when crossing the
LTO/offload processing border, however since there are still quite a
number of warning calls in many target backends, it makes sense to
save and propagate the associated options, to preserve consistency in warning
behavior.

For example, currently:
$ x86_64-pc-linux-gnu-gcc -fopenacc test.c -fno-diagnostics-show-caret
y.c: In function 'main._omp_fn.0':
y.c:6:11: warning: using num_workers (32), ignoring 500
   #pragma acc parallel num_workers(500)
   ^
(note: this warning message is triggered by nvptx code currently only
on gomp-4_0-branch, but illustrates the point)

The caret stills shows, because -fno-diagnostics-show-caret does not
reach the accel compiler. -flto should also have a similar issue.

The attached patch allows a series of -fdiagnostics-* options to be
propagated by lto-wrapper.  I've tested this patch without regressions,
is this okay for trunk?

Thanks,
Chung-Lin

2015-09-06  Chung-Lin Tang  <clt...@codesourcery.com>

* lto-wrapper.c (merge_and_complain): Add OPT_fdiagnostics_show_caret,
OPT_fdiagnostics_show_option, OPT_fdiagnostics_show_location_, and
OPT_fshow_column to handled saved option cases.
(append_compiler_options): Do not skip the above added options.
Index: lto-wrapper.c
===
--- lto-wrapper.c	(revision 227508)
+++ lto-wrapper.c	(working copy)
@@ -232,6 +232,10 @@ merge_and_complain (struct cl_decoded_option **dec
 	break;
 
 	  /* Fallthru.  */
+	case OPT_fdiagnostics_show_caret:
+	case OPT_fdiagnostics_show_option:
+	case OPT_fdiagnostics_show_location_:
+	case OPT_fshow_column:
 	case OPT_fPIC:
 	case OPT_fpic:
 	case OPT_fPIE:
@@ -479,6 +483,10 @@ append_compiler_options (obstack *argv_obstack, st
 	 on any CL_TARGET flag and a few selected others.  */
   switch (option->opt_index)
 	{
+	case OPT_fdiagnostics_show_caret:
+	case OPT_fdiagnostics_show_option:
+	case OPT_fdiagnostics_show_location_:
+	case OPT_fshow_column:
 	case OPT_fPIC:
 	case OPT_fpic:
 	case OPT_fPIE:


[PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit

2015-08-27 Thread Chung-Lin Tang
We've discovered that, for several of the libgomp plugin interface routines,
if the target specific routine calls exit() (usually upon a fatal condition),
deadlock ensues. We found this using nvptx, but it's possible on intelmic as 
well.

This is due to many of the plugin routines are called with the device lock held,
and when exit() is called inside the plugin code, the GOMP_unregister_var() 
destructor
tries to iterate through and acquire all device locks to cleanup. Since we 
already hold
one of the device locks, this just gets stuck.  Also because gomp_mutex_t is a
simple futex based lock implementation (instead of pthreads), we don't have a
trylock mechanism to use either.

So this patch tries to alleviate this problem by changing the plugin interface;
the plugin routines that are called while holding the device lock are adjusted
to assume to never fatal exit, but return a value back to libgomp proper to
indicate execution results. The core libgomp code then may unlock and call 
gomp_fatal().

We believe this is the right route to solve the problem, since there's only
two accel target plugins so far. Besides the nvptx plugin, I have made some 
effort
to update the intelmic plugin as well, though it's not as thoroughly audited.
Intel folks might want to further make sure your plugin code is free of this 
problem as well.

This patch contains the libgomp proper changes. The nvptx and intelmic patches 
follow.
I have tested the libgomp testsuite without regressions for both accel targets, 
is this
okay for trunk?

Thanks,
Chung-Lin

2015-08-27 Chung-Lin Tang clt...@codesourcery.com

* oacc-host.c (host_init_device): Change return type to bool.
(host_fini_device): Likewise.
(host_dev2host): Likewise.
(host_host2dev): Likewise.
(host_free): Likewise.
(host_alloc): Change return type to bool, change to use out
parameter to return allocated pointer.
* oacc-mem.c (acc_malloc): Adjust plugin hook declaration change,
handle fatal error.
(acc_free): Likewise.
(acc_memcpy_to_device): Likewise.
(acc_memcpy_from_device): Likewise.
* oacc-init.c (acc_init_1): Handle gomp_init_device return code,
handle fatal error.
(acc_set_device_type): Likewise.
(acc_set_device_num): Likewise.
* target.c (gomp_map_vars): Adjust alloc_func plugin hook call,
add device unlock, handle fatal error.
(gomp_unmap_tgt): Change return type to bool, adjust free_func
plugin call.
(gomp_copy_from_async): Handle dev2host_func return code, handle
fatal error.
(gomp_unmap_vars): Likewise.
(gomp_init_device): Change return type to bool, adjust call to
init_device_func plugin hook.
(GOMP_target): Adjust call to gomp_init_device, handle fatal error.
(GOMP_target_data): Likewise.
(GOMP_target_update): Likewise.
* libgomp.h (gomp_device_descr.init_device_func): Change return
type to bool.
(gomp_device_descr.fini_device_func): Likewise.
(gomp_device_descr.free_func): Likewise.
(gomp_device_descr.dev2host_func): Likewise.
(gomp_device_descr.host2dev_func) Likewise.
(gomp_device_descr.alloc_func): Change return
type to bool, use out parameter to return pointer.
(gomp_init_device): Change return
type to bool.
Index: libgomp/libgomp.h
===
--- libgomp/libgomp.h	(revision 227257)
+++ libgomp/libgomp.h	(working copy)
@@ -746,15 +746,15 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*init_device_func) (int);
-  void (*fini_device_func) (int);
+  bool (*init_device_func) (int);
+  bool (*fini_device_func) (int);
   unsigned (*version_func) (void);
   int (*load_image_func) (int, unsigned, const void *, struct addr_pair **);
   void (*unload_image_func) (int, unsigned, const void *);
-  void *(*alloc_func) (int, size_t);
-  void (*free_func) (int, void *);
-  void *(*dev2host_func) (int, void *, const void *, size_t);
-  void *(*host2dev_func) (int, void *, const void *, size_t);
+  bool (*alloc_func) (int, size_t, void**);
+  bool (*free_func) (int, void *);
+  bool (*dev2host_func) (int, void *, const void *, size_t);
+  bool (*host2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
 
   /* Splay tree containing information about mapped memory regions.  */
@@ -780,7 +780,7 @@ extern struct target_mem_desc *gomp_map_vars (stru
 	  size_t *, void *, bool, bool);
 extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
-extern void gomp_init_device (struct gomp_device_descr *);
+extern bool gomp_init_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s

[PATCH 2/3, libgomp] nvptx plugin parts

2015-08-27 Thread Chung-Lin Tang
These are the nvptx plugin specific parts.

Chung-Lin

* plugin/plugin-nvptx.c (CUDA_CALL_ERET): New convenience macro.
(CUDA_CALL): Likewise.
(CUDA_CALL_ASSERT): Likewise.
(map_init): Change return type to bool, use CUDA_CALL* macros.
(map_fini): Likewise.
(init_streams_for_device): Change return type to bool, adjust
call to map_init.
(fini_streams_for_device): Change return type to bool, adjust
call to map_fini.
(select_stream_for_async): Release stream_lock before calls to
GOMP_PLUGIN_fatal, adjust call to map_init.
(nvptx_init): Use CUDA_CALL* macros.
(nvptx_attach_host_thread_to_device): Change return type to bool,
use CUDA_CALL* macros.
(nvptx_open_device): Use CUDA_CALL* macros.
(nvptx_close_device): Change return type to bool, use CUDA_CALL* macros.
(nvptx_get_num_devices): Use CUDA_CALL* macros.
(link_ptx): Change return type to bool, use CUDA_CALL* macros.
(nvptx_exec): Use CUDA_CALL* macros.
(nvptx_alloc): Change return type to bool, use CUDA_CALL* macros,
change to use out parameter to return allocated pointer.
(nvptx_free): Change return type to bool, use CUDA_CALL* macros.
(nvptx_host2dev): Likewise.
(nvptx_dev2host): Likewise.
(nvptx_wait): Use CUDA_CALL* macros.
(nvptx_wait_async): Likewise.
(nvptx_wait_all): Likewise.
(nvptx_wait_all_async): Likewise.
(nvptx_set_cuda_stream): Adjust order of stream_lock acquire,
use CUDA_CALL* macros, adjust call to map_fini.
(GOMP_OFFLOAD_init_device): Change return type to bool,
adjust code accordingly.
(GOMP_OFFLOAD_fini_device): Likewise.
(GOMP_OFFLOAD_load_image): Adjust calls to
nvptx_attach_host_thread_to_device/link_ptx to handle errors,
use CUDA_CALL* macros.
(GOMP_OFFLOAD_alloc): Change return type to bool, adjust calls
to code to handle error return.
(GOMP_OFFLOAD_free): Likewise.
(GOMP_OFFLOAD_dev2host): Likewise.
(GOMP_OFFLOAD_host2dev): Likewise.
(GOMP_OFFLOAD_openacc_register_async_cleanup): Use CUDA_CALL* macros.
(GOMP_OFFLOAD_openacc_create_thread_data): Likewise.
Index: libgomp/plugin/plugin-nvptx.c
===
--- libgomp/plugin/plugin-nvptx.c	(revision 227257)
+++ libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -127,6 +127,34 @@ cuda_error (CUresult r)
   return errmsg;
 }
 
+/* Convenience macros for the frequently used CUDA library call and
+   error handling sequence.  This does not capture all the cases we
+   use in this file, but is common enough.  */
+
+#define CUDA_CALL_ERET(ERET, FN, ...)		\
+  do {		\
+unsigned __r = FN (__VA_ARGS__);		\
+if (__r != CUDA_SUCCESS)			\
+  {		\
+	GOMP_PLUGIN_error (#FN  error: %s,	\
+			   cuda_error (__r));	\
+	return ERET;\
+  }		\
+  } while (0)
+
+#define CUDA_CALL(FN, ...)			\
+  CUDA_CALL_ERET (false, (FN), __VA_ARGS__)
+
+#define CUDA_CALL_ASSERT(FN, ...)		\
+  do {		\
+unsigned __r = FN (__VA_ARGS__);		\
+if (__r != CUDA_SUCCESS)			\
+  {		\
+	GOMP_PLUGIN_fatal (#FN  error: %s,	\
+			   cuda_error (__r));	\
+  }		\
+  } while (0)
+
 static unsigned int instantiated_devices = 0;
 static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
@@ -162,25 +190,18 @@ struct map
   charmappings[0];
 };
 
-static void
+static bool
 map_init (struct ptx_stream *s)
 {
-  CUresult r;
-
   int size = getpagesize ();
 
   assert (s);
   assert (!s-d);
   assert (!s-h);
 
-  r = cuMemAllocHost (s-h, size);
-  if (r != CUDA_SUCCESS)
-GOMP_PLUGIN_fatal (cuMemAllocHost error: %s, cuda_error (r));
+  CUDA_CALL (cuMemAllocHost, s-h, size);
+  CUDA_CALL (cuMemHostGetDevicePointer, s-d, s-h, 0);
 
-  r = cuMemHostGetDevicePointer (s-d, s-h, 0);
-  if (r != CUDA_SUCCESS)
-GOMP_PLUGIN_fatal (cuMemHostGetDevicePointer error: %s, cuda_error (r));
-
   assert (s-h);
 
   s-h_begin = s-h;
@@ -189,16 +210,14 @@ map_init (struct ptx_stream *s)
 
   assert (s-h_next);
   assert (s-h_end);
+  return true;
 }
 
-static void
+static bool
 map_fini (struct ptx_stream *s)
 {
-  CUresult r;
-
-  r = cuMemFreeHost (s-h);
-  if (r != CUDA_SUCCESS)
-GOMP_PLUGIN_fatal (cuMemFreeHost error: %s, cuda_error (r));
+  CUDA_CALL (cuMemFreeHost, s-h);
+  return true;
 }
 
 static void
@@ -359,7 +378,7 @@ nvptx_thread (void)
   return (struct nvptx_thread *) GOMP_PLUGIN_acc_thread ();
 }
 
-static void
+static bool
 init_streams_for_device (struct ptx_device *ptx_dev, int concurrency)
 {
   int i;
@@ -371,9 +390,10 @@ init_streams_for_device (struct ptx_device *ptx_de
   null_stream-multithreaded = true;
   null_stream-d = (CUdeviceptr) NULL;
   null_stream-h = NULL;
-  map_init (null_stream);
-  ptx_dev-null_stream = null_stream;
+  if (!map_init (null_stream))

[PATCH 3/3, libgomp] intelmic specific parts

2015-08-27 Thread Chung-Lin Tang
These are the intelmic plugin specific parts (actually beneath liboffloadmic
instead of libgomp).

The changes are basically to expose the return value of offload() back to
libgomp. I only checked parts of the plugin, it appears that there may
still be code in the liboffloadmic runtime that can call exit() while
holding the lock, so Intel folks might want to audit more thoroughly later.

Chung-Lin

* plugin/libgomp-plugin-intelmic.cpp (offload): Change return type
to bool, adjust return code.
(GOMP_OFFLOAD_init_device): Likewise.
(GOMP_OFFLOAD_fini_device): Likewise.
(get_target_table): Likewise.
(offload_image): Likwise.
(GOMP_OFFLOAD_load_image): Adjust call to offload_image(), change
exit() to return error.
(GOMP_OFFLOAD_alloc): Change return type to bool, change to use out
parameter to return allocated pointer.
(GOMP_OFFLOAD_free): Change return type to bool, adjust return code.
(GOMP_OFFLOAD_host2dev): Likewise.
(GOMP_OFFLOAD_dev2host): Likewise.
Index: liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
===
--- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp	(revision 227257)
+++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp	(working copy)
@@ -184,17 +184,18 @@ GOMP_OFFLOAD_get_num_devices (void)
   return num_devices;
 }
 
-static void
+static bool
 offload (const char *file, uint64_t line, int device, const char *name,
 	 int num_vars, VarDesc *vars, VarDesc2 *vars2)
 {
   OFFLOAD ofld = __offload_target_acquire1 (device, file, line);
   if (ofld)
-__offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL, NULL);
+return __offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0,
+			   NULL, NULL);
   else
 {
-  fprintf (stderr, %s:%d: Offload target acquire failed\n, file, line);
-  exit (1);
+  GOMP_PLUGIN_error (%s:%d: Offload target acquire failed\n, file, line);
+  return false;
 }
 }
 
@@ -206,24 +207,25 @@ register_main_image ()
 
 /* liboffloadmic loads and runs offload_target_main on all available devices
during a first call to offload ().  */
-extern C void
+extern C bool
 GOMP_OFFLOAD_init_device (int device)
 {
   TRACE ();
   pthread_once (main_image_is_registered, register_main_image);
-  offload (__FILE__, __LINE__, device, __offload_target_init_proc, 0,
-	   NULL, NULL);
+  return offload (__FILE__, __LINE__, device, __offload_target_init_proc, 0,
+		  NULL, NULL);
 }
 
-extern C void
+extern C bool
 GOMP_OFFLOAD_fini_device (int device)
 {
   TRACE ();
   /* Unreachable for GOMP_OFFLOAD_CAP_OPENMP_400.  */
   abort ();
+  return true;
 }
 
-static void
+static bool
 get_target_table (int device, int num_funcs, int num_vars, void **table)
 {
   VarDesc vd1[2] = { vd_tgt2host, vd_tgt2host };
@@ -233,8 +235,9 @@ get_target_table (int device, int num_funcs, int
   vd1[1].size = sizeof (num_vars);
   VarDesc2 vd1g[2] = { { num_funcs, 0 }, { num_vars, 0 } };
 
-  offload (__FILE__, __LINE__, device, __offload_target_table_p1, 2,
-	   vd1, vd1g);
+  if (!offload (__FILE__, __LINE__, device, __offload_target_table_p1, 2,
+		vd1, vd1g))
+return false;
 
   int table_size = num_funcs + 2 * num_vars;
   if (table_size  0)
@@ -247,15 +250,16 @@ get_target_table (int device, int num_funcs, int
   vd2.size = table_size * sizeof (void *);
   VarDesc2 vd2g = { table, 0 };
 
-  offload (__FILE__, __LINE__, device, __offload_target_table_p2, 1,
-	   vd2, vd2g);
+  return offload (__FILE__, __LINE__, device, __offload_target_table_p2,
+		  1, vd2, vd2g);
 }
+  return true;
 }
 
 /* Offload TARGET_IMAGE to all available devices and fill address_table with
corresponding target addresses.  */
 
-static void
+static bool
 offload_image (const void *target_image)
 {
   struct TargetImage {
@@ -277,8 +281,8 @@ offload_image (const void *target_image)
 			  + image_size);
   if (!image)
 {
-  fprintf (stderr, %s: Can't allocate memory\n, __FILE__);
-  exit (1);
+  GOMP_PLUGIN_error (%s: Can't allocate memory\n, __FILE__);
+  return false;
 }
 
   image-size = image_size;
@@ -291,13 +295,14 @@ offload_image (const void *target_image)
 
   /* Receive tables for target_image from all devices.  */
   DevAddrVect dev_table;
+  bool ret = true;
   for (int dev = 0; dev  num_devices; dev++)
 {
   int num_funcs = 0;
   int num_vars = 0;
   void **table = NULL;
 
-  get_target_table (dev, num_funcs, num_vars, table);
+  ret = get_target_table (dev, num_funcs, num_vars, table);
 
   AddrVect curr_dev_table;
 
@@ -326,6 +331,7 @@ offload_image (const void *target_image)
   address_table-insert (std::make_pair (target_image, dev_table));
 
   free (image);
+  return ret;
 }
 
 /* Return the libgomp version number we're compatible with.  There is
@@ -351,15 +357,19 @@ GOMP_OFFLOAD_load_image (int device, const 

[PATCH, nios2] Remove unused header from libgcc linux-atomic.c

2015-07-22 Thread Chung-Lin Tang
The asm/unistd.h header was used back when Nios II Linux used a syscall 
cmpxchg,
long since removed and actually never got into the FSF trunk.

Patch removes the #include, and the following error code #defines which are
all no longer used. Committed.

Chung-Lin

2015-07-22  Chung-Lin Tang  clt...@codesourcery.com

* config/nios2/linux-atomic.c (asm/unistd.h): Remove #include.
(EFAULT,EBUSY,ENOSYS): Delete unused #defines.

Index: config/nios2/linux-atomic.c
===
--- config/nios2/linux-atomic.c (revision 226061)
+++ config/nios2/linux-atomic.c (working copy)
@@ -20,11 +20,6 @@ a copy of the GCC Runtime Library Exception along
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 http://www.gnu.org/licenses/.  */
 
-#include asm/unistd.h
-#define EFAULT  14
-#define EBUSY   16
-#define ENOSYS  38
-
 /* We implement byte, short and int versions of each atomic operation
using the kernel helper defined below.  There is no support for
64-bit operations yet.  */


Re: [PATCH, gomp4] Propagate independent clause for OpenACC kernels pass

2015-07-14 Thread Chung-Lin Tang
On 15/7/14 3:00 PM, Jakub Jelinek wrote:
 On Tue, Jul 14, 2015 at 01:46:04PM +0800, Chung-Lin Tang wrote:
 this patch provides a 'bool independent' field in struct loop, which
 will be switched on by an independent clause in a #pragma acc loop 
 directive.
 I assume you'll be wiring it to the kernels parloops pass in a followup 
 patch.

 Note: there are already a few other similar fields in struct loop, namely
 'safelen' and 'can_be_parallel', used by OMP simd safelen and GRAPHITE 
 respectively.
 The intention and/or setting of these fields are all a bit different, so I've
 decided to add a new bool for OpenACC.
 
 How is it different though?  Can you cite exact definition of the
 independent clause vs. safelen (set to INT_MAX)?
 The OpenMP definition is:
 A SIMD loop has logical iterations numbered 0,1,...,N-1 where N is the
 number of loop iterations, and the logical numbering denotes the sequence in 
 which the iterations would
 be executed if the associated loop(s) were executed with no SIMD 
 instructions. If the safelen
 clause is used then no two iterations executed concurrently with SIMD 
 instructions can have a
 greater distance in the logical iteration space than its value.
 ...
 Lexical forward dependencies in the iterations of the
 original loop must be preserved within each SIMD chunk.

The wording of OpenACC independent is more simple:
... the independent clause tells the implementation that the iterations of 
this loop
are data-independent with respect to each other. -- OpenACC spec 2.7.9

I would say this implies even more relaxed conditions than OpenMP simd safelen,
essentially saying that the compiler doesn't even need dependence analysis; just
assume independence of iterations.

 So e.g. safelen = 32 means for PTX you can safely implement it by
 running up to 32 consecutive iterations by all threads in the warp
 (assuming code that for some reason must be run by a single thread
 (e.g. calls to functions that are marked so that they expect to be run
 by the first thread in a warp initially) is run sequentially by increasing
 iterator), but it doesn't mean the iterations have no dependencies in between
 them whatsoever (see the above note about lexical forward dependencies),
 so you can't parallelize it by assigning different iterations to different
 threads outside of warp (or pthread_create created threads).

 So if OpenACC independent means there are no dependencies in between
 iterations, the OpenMP counterpart here is #pragma omp for simd schedule 
 (auto)
 or #pragma omp distribute parallel for simd schedule (auto).

schedule(auto) appears to correspond to the OpenACC 'auto' clause, or
what is implied in a kernels compute construct, but I'm not sure it implies
no dependencies between iterations?

Putting aside the semantic issues, as of currently safelen0 turns on a certain 
amount of
vectorization code that we are not currently using (and not likely at all for 
nvptx).
Right now, we're just trying to pass the new flag to a kernels tree-parloops 
based pass.

Maybe this can all be reconciled later in a more precise way, e.g. have flags 
that correspond
specifically to phases of internal compiler passes (and selected by needs of 
the accel target),
instead of ones that are sort of associated with high-level language features.

Chung-Lin



[PATCH, gomp4] Propagate independent clause for OpenACC kernels pass

2015-07-13 Thread Chung-Lin Tang
Hi Tom,
this patch provides a 'bool independent' field in struct loop, which
will be switched on by an independent clause in a #pragma acc loop directive.
I assume you'll be wiring it to the kernels parloops pass in a followup patch.

Note: there are already a few other similar fields in struct loop, namely
'safelen' and 'can_be_parallel', used by OMP simd safelen and GRAPHITE 
respectively.
The intention and/or setting of these fields are all a bit different, so I've
decided to add a new bool for OpenACC.

Tested and committed to gomp-4_0-branch.

Chung-Lin

2015-07-14  Chung-Lin Tang  clt...@codesourcery.com

* cfgloop.h (struct loop): Add 'bool marked_independent' field.
* gimplify.c (gimplify_scan_omp_clauses): Keep OMP_CLAUSE_INDEPENDENT.
* omp-low.c (struct omp_region): Add 'int kind' and
'bool independent' fields.
(expand_omp_for): Set 'marked_independent' field for loop
corresponding to region.
(find_omp_for_region_data): New function.
(find_omp_target_region_data): Set kind field.
(build_omp_regions_1): Call find_omp_for_region_data() for
GIMPLE_OMP_FOR statements.
Index: cfgloop.h
===
--- cfgloop.h	(revision 225758)
+++ cfgloop.h	(working copy)
@@ -194,6 +194,10 @@ struct GTY ((chain_next (%h.next))) loop {
   /* True if the loop is part of an oacc kernels region.  */
   bool in_oacc_kernels_region;
 
+  /* True if loop is tagged as having independent iterations by user,
+ e.g. the OpenACC independent clause.  */
+  bool marked_independent;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
  by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
  builtins.  */
Index: gimplify.c
===
--- gimplify.c	(revision 225758)
+++ gimplify.c	(working copy)
@@ -6602,7 +6602,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se
 	  break;
 
 	case OMP_CLAUSE_DEVICE_RESIDENT:
-	case OMP_CLAUSE_INDEPENDENT:
 	  remove = true;
 	  break;
 
@@ -6612,6 +6611,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se
 	case OMP_CLAUSE_COLLAPSE:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_MERGEABLE:
 	case OMP_CLAUSE_PROC_BIND:
 	case OMP_CLAUSE_SAFELEN:
Index: omp-low.c
===
--- omp-low.c	(revision 225758)
+++ omp-low.c	(working copy)
@@ -136,8 +136,16 @@ struct omp_region
   /* True if this is nested inside an OpenACC kernels construct.  */
   bool inside_kernels_p;
 
+  /* Records a generic kind field.  */
+  int kind;
+
   /* For an OpenACC loop, the level of parallelism requested.  */
   int gwv_this;
+
+  /* For an OpenACC loop directive, true if has the 'independent' clause.  */
+  bool independent;
+
+  tree broadcast_array;
 };
 
 /* Context structure.  Used to store information about each parallel
@@ -8273,8 +8281,15 @@ expand_omp_for (struct omp_region *region, gimple
 loops_state_set (LOOPS_NEED_FIXUP);
 
   if (region-inside_kernels_p)
-expand_omp_for_generic (region, fd, BUILT_IN_NONE, BUILT_IN_NONE,
-			inner_stmt);
+{
+  expand_omp_for_generic (region, fd, BUILT_IN_NONE, BUILT_IN_NONE,
+			  inner_stmt);
+  if (region-independent  region-cont-loop_father)
+	{
+	  struct loop *loop = region-cont-loop_father; 
+	  loop-marked_independent = true;
+	}
+}
   else if (gimple_omp_for_kind (fd.for_stmt)  GF_OMP_FOR_SIMD)
 expand_omp_simd (region, fd);
   else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_CILKFOR)
@@ -9943,6 +9958,34 @@ find_omp_for_region_gwv (gimple stmt)
   return tmp;
 }
 
+static void
+find_omp_for_region_data (struct omp_region *region, gomp_for *stmt)
+{
+  region-gwv_this = find_omp_for_region_gwv (stmt);
+  region-kind = gimple_omp_for_kind (stmt);
+
+  if (region-kind == GF_OMP_FOR_KIND_OACC_LOOP)
+{
+  struct omp_region *target_region = region-outer;
+  while (target_region
+	  target_region-type != GIMPLE_OMP_TARGET)
+	target_region = target_region-outer;
+  if (!target_region)
+	return;
+
+  tree clauses = gimple_omp_for_clauses (stmt);
+
+  if (target_region-kind == GF_OMP_TARGET_KIND_OACC_PARALLEL
+	   !find_omp_clause (clauses, OMP_CLAUSE_SEQ))
+	/* In OpenACC parallel constructs, 'independent' is implied on all
+	   loop directives without a 'seq' clause.  */
+	region-independent = true;
+  else if (target_region-kind == GF_OMP_TARGET_KIND_OACC_KERNELS
+	find_omp_clause (clauses, OMP_CLAUSE_INDEPENDENT))
+	region-independent = true;
+}
+}
+
 /* Fill in additional data for a region REGION associated with an
OMP_TARGET STMT.  */
 
@@ -9960,6 +10003,7 @@ find_omp_target_region_data (struct omp_region *re
 region-gwv_this |= OACC_LOOP_MASK (OACC_worker);
   if (find_omp_clause (clauses, OMP_CLAUSE_VECTOR_LENGTH

[gomp4] implicit firstprivate and other testcase fixes

2015-07-01 Thread Chung-Lin Tang
This patch notices the index variable of an acc loop (internally an OMP_FOR)
inside an OpenACC construct, and completes the implicit firstprivate
behavior as described in the spec. The firstprivate clauses and FIXME in
libgomp.oacc-c-c++-common/parallel-loop-2.h has also been removed together
in the patch.

Also a typo-bug in testcase libgomp.oacc-c-c++-common/reduction-4.c is also 
corrected,
where reduction variable names are apparently wrong.

Tested without regressions, and applied to gomp-4_0-branch.

Chung-Lin

2015-07-01  Chung-Lin Tang  clt...@codesourcery.com

gcc/
* gimplify.c (gimplify_omp_for): For acc loops inside OpenACC 
constructs,
notice the use of the index variable in the surrounding 
gimplify_omp_ctx.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/reduction-4.c (main): Correct
the names of reduction variables in '' and '||' tests.
* testsuite/libgomp.oacc-c-c++-common/parallel-loop-2.h:
Remove uses of the firstprivate clause, remove FIXME comment.
Index: gcc/gimplify.c
===
--- gcc/gimplify.c  (revision 225248)
+++ gcc/gimplify.c  (working copy)
@@ -7348,7 +7348,11 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   else if (omp_is_private (gimplify_omp_ctxp, decl, 0))
omp_notice_variable (gimplify_omp_ctxp, decl, true);
   else
-   omp_add_variable (gimplify_omp_ctxp, decl, GOVD_PRIVATE | GOVD_SEEN);
+   {
+ if (ork == ORK_OACC  gimplify_omp_ctxp-outer_context)
+   omp_notice_variable (gimplify_omp_ctxp-outer_context, decl, true);
+ omp_add_variable (gimplify_omp_ctxp, decl, GOVD_PRIVATE | GOVD_SEEN);
+   }
 
   /* If DECL is not a gimple register, create a temporary variable to act
 as an iteration counter.  This is valid, since DECL cannot be
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c   (revision 
225248)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c   (working copy)
@@ -59,14 +59,14 @@ main(void)
   lvresult = false;
 
   /* '' reductions.  */
-#pragma acc parallel num_gangs (ng) copy (result)
+#pragma acc parallel num_gangs (ng) copy (lresult)
 #pragma acc loop reduction (:lresult) gang
   for (i = 0; i  n; i++)
 lresult = lresult  (creal(result)  creal(array[i]));
 
   /* Verify the reduction.  */
   for (i = 0; i  n; i++)
-lvresult = lresult  (creal(result)  creal(array[i]));
+lvresult = lvresult  (creal(result)  creal(array[i]));
 
   if (lresult != lvresult)
 abort ();
@@ -78,14 +78,14 @@ main(void)
   lvresult = false;
 
   /* '||' reductions.  */
-#pragma acc parallel num_gangs (ng) copy (result)
+#pragma acc parallel num_gangs (ng) copy (lresult)
 #pragma acc loop reduction (||:lresult) gang
   for (i = 0; i  n; i++)
 lresult = lresult || (creal(result)  creal(array[i]));
 
   /* Verify the reduction.  */
   for (i = 0; i  n; i++)
-lvresult = lresult || (creal(result)  creal(array[i]));
+lvresult = lvresult || (creal(result)  creal(array[i]));
 
   if (lresult != lvresult)
 abort ();
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-loop-2.h
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-loop-2.h   
(revision 225248)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-loop-2.h   
(working copy)
@@ -1,5 +1,3 @@
-/* FIXME: Remove the firstprivate clauses from the paralle regions.  */
-
 #ifndef VARS
 #define VARS
 int a[1500];
@@ -19,7 +17,7 @@ __attribute__((noinline, noclone)) void
 N(f0) (void)
 {
   int i;
-#pragma acc parallel loop L F firstprivate (i)
+#pragma acc parallel loop L F
   for (i = 0; i  1500; i++)
 a[i] += 2;
 }
@@ -36,7 +34,7 @@ __attribute__((noinline, noclone)) void
 N(f2) (void)
 {
   unsigned long long i;
-#pragma acc parallel loop L F firstprivate (i)
+#pragma acc parallel loop L F
   for (i = __LONG_LONG_MAX__ + 4500ULL - 27;
i  __LONG_LONG_MAX__ - 27ULL; i -= 3)
 a[(i + 26LL - __LONG_LONG_MAX__) / 3] -= 4;
@@ -54,7 +52,7 @@ __attribute__((noinline, noclone)) void
 N(f4) (void)
 {
   unsigned int i;
-#pragma acc parallel loop L F firstprivate (i)
+#pragma acc parallel loop L F
   for (i = 30; i  20; i += 2)
 a[i] += 10;
 }
@@ -64,7 +62,7 @@ N(f5) (int n11, int n12, int n21, int n22, int n31
int s1, int s2, int s3)
 {
   SC int v1, v2, v3;
-#pragma acc parallel loop L F firstprivate (v1, v2, v3)
+#pragma acc parallel loop L F
   for (v1 = n11; v1  n12; v1 += s1)
 #pragma acc loop S
 for (v2 = n21; v2  n22; v2 += s2)
@@ -78,7 +76,7 @@ N(f6) (int n11, int n12, int n21, int n22, long lo
 {
   SC int v1, v2;
   SC long long v3;
-#pragma acc parallel loop L F firstprivate (v1, v2, v3)
+#pragma acc parallel loop L F
   for (v1

Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-30 Thread Chung-Lin Tang
On 2015/6/30 05:06 PM, Eric Botcazou wrote:
 I notice the way gcc_assert() is defined in system.h now, the test won't
 disappear even when runtime checks are disabled, though you might still
 adjust it to avoid any programmer confusion.
 
 It will disappear at run time, see the definition:
 
 /* Include EXPR, so that unused variable warnings do not occur.  */
 #define gcc_assert(EXPR) ((void)(0  (EXPR)))
 
 so you really need to use a separate variable.
 

I was referring to this one:
#if ENABLE_ASSERT_CHECKING
... 
#elif (GCC_VERSION = 4005)
#define gcc_assert(EXPR)\
  ((void)(__builtin_expect (!(EXPR), 0) ? __builtin_unreachable (), 0 : 0))
#else
...

But yeah, I guess older GCCs could be used to build a toolchain,
so a separate variable should be used.

Chung-Lin



Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-29 Thread Chung-Lin Tang
On 2015/6/30 12:22 PM, Sandra Loosemore wrote:
 On 06/29/2015 09:07 PM, Kito Cheng wrote:
 Hi all:

 This patch seem will broken when disable assert checking for c6x

 Index: gcc/config/c6x/c6x.c
 ===
 --- gcc/config/c6x/c6x.c (revision 225104)
 +++ gcc/config/c6x/c6x.c (working copy)
 @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx
 best_reg =
   find_rename_reg (this_head, super_class, unavailable, old_reg, true);

 -  regrename_do_replace (this_head, best_reg);
 +  gcc_assert (regrename_do_replace (this_head, best_reg));

 count_unit_reqs (new_reqs, head, PREV_INSN (tail));
 merge_unit_reqs (new_reqs);
 @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx
  unit_req_imbalance (reqs), unit_req_imbalance (new_reqs));
   }
 if (unit_req_imbalance (new_reqs)  unit_req_imbalance (reqs))
 -regrename_do_replace (this_head, old_reg);
 +gcc_assert (regrename_do_replace (this_head, old_reg));
 else
   memcpy (reqs, new_reqs, sizeof (unit_req_table));

 
 I'm sorry; do you have a suggestion for a fix?  I thought this was the change 
 I was asked to make, and as I noted previously, I'm not set up to test (or 
 even build) for this target.
 
 -Sandra the obviously confused :-(

You probably have to separate out the regrename_do_replace() bool result into a 
variable,
placing the whole call into the gcc_assert() might make it disappear when 
assertions are turned off.

Chung-Lin



Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-29 Thread Chung-Lin Tang
On 2015/6/30 下午 01:13, Chung-Lin Tang wrote:
 On 2015/6/30 12:22 PM, Sandra Loosemore wrote:
 On 06/29/2015 09:07 PM, Kito Cheng wrote:
 Hi all:

 This patch seem will broken when disable assert checking for c6x

 Index: gcc/config/c6x/c6x.c
 ===
 --- gcc/config/c6x/c6x.c (revision 225104)
 +++ gcc/config/c6x/c6x.c (working copy)
 @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx
 best_reg =
   find_rename_reg (this_head, super_class, unavailable, old_reg, true);

 -  regrename_do_replace (this_head, best_reg);
 +  gcc_assert (regrename_do_replace (this_head, best_reg));

 count_unit_reqs (new_reqs, head, PREV_INSN (tail));
 merge_unit_reqs (new_reqs);
 @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx
  unit_req_imbalance (reqs), unit_req_imbalance (new_reqs));
   }
 if (unit_req_imbalance (new_reqs)  unit_req_imbalance (reqs))
 -regrename_do_replace (this_head, old_reg);
 +gcc_assert (regrename_do_replace (this_head, old_reg));
 else
   memcpy (reqs, new_reqs, sizeof (unit_req_table));


 I'm sorry; do you have a suggestion for a fix?  I thought this was the 
 change I was asked to make, and as I noted previously, I'm not set up to 
 test (or even build) for this target.

 -Sandra the obviously confused :-(
 
 You probably have to separate out the regrename_do_replace() bool result into 
 a variable,
 placing the whole call into the gcc_assert() might make it disappear when 
 assertions are turned off.

I notice the way gcc_assert() is defined in system.h now, the test won't 
disappear even when runtime checks
are disabled, though you might still adjust it to avoid any programmer 
confusion.

Chung-Lin



Re: [gomp4] Generate sequential loop for OpenACC loop directive inside kernels

2015-06-23 Thread Chung-Lin Tang
On 2015/6/16 05:05 PM, Tom de Vries wrote:
 On 16/06/15 10:59, Chung-Lin Tang wrote:
 This patch adjusts omp-low.c:expand_omp_for_generic() to expand to a 
 sequential
 loop form (without the OMP runtime calls), used for loop directives inside
 OpenACC kernels constructs. Tom mentions that this allows the kernels 
 parallelization
 to work when '#pragma acc loop' makes the front-ends create OMP_FOR, which 
 the
 loop analysis phases don't understand.

 Tested and committed to gomp-4_0-branch.

 
 Hi Chung-Lin,
 
 can you commit a test-case to exercise the code?
 
 Thanks,
 - Tom

Just committed the attached testcase patch to gomp-4_0-branch.

Chung-Lin

2015-06-23  Chung-Lin Tang  clt...@codesourcery.com

gcc/testsuite/
* c-c++-common/goacc/kernels-loop.c (ACC_LOOP): Add #ifndef/#define.
(main): Tag loops inside kernels construct with '#pragma ACC_LOOP'.
* c-c++-common/goacc/kernels-loop-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-3.c: Likewise.
* c-c++-common/goacc/kernels-loop-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-2-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-3-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-n-acc-loop.c: New test.

Index: gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
===
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-additional-options -O2 } */
+/* { dg-additional-options -ftree-parallelize-loops=32 } */
+/* { dg-additional-options -fdump-tree-parloops_oacc_kernels-all } */
+/* { dg-additional-options -fdump-tree-optimized } */
+
+/* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
+#define ACC_LOOP acc loop
+#include kernels-loop-3.c
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times SUCCESS: may be parallelized 1 parloops_oacc_kernels } } */
+/* { dg-final { scan-tree-dump-not FAILED: parloops_oacc_kernels } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times (?n);; Function .*main._omp_fn.0 1 optimized } } */
+
+/* { dg-final { scan-tree-dump-times (?n)pragma omp target oacc_parallel.*num_gangs\\(32\\) 1 parloops_oacc_kernels } } */
+
+/* { dg-final { cleanup-tree-dump parloops_oacc_kernels } } */
+/* { dg-final { cleanup-tree-dump optimized } } */
Index: gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
===
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c	(revision 224836)
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c	(working copy)
@@ -8,6 +8,10 @@
 #define N (1024 * 512)
 #define COUNTERTYPE unsigned int
 
+#ifndef ACC_LOOP
+#define ACC_LOOP
+#endif
+
 int
 main (void)
 {
@@ -21,18 +25,21 @@ main (void)
 
 #pragma acc kernels copyout (a[0:N])
   {
+#pragma ACC_LOOP
 for (COUNTERTYPE i = 0; i  N; i++)
   a[i] = i * 2;
   }
 
 #pragma acc kernels copyout (b[0:N])
   {
+#pragma ACC_LOOP
 for (COUNTERTYPE i = 0; i  N; i++)
   b[i] = i * 4;
   }
 
 #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
   {
+#pragma ACC_LOOP
 for (COUNTERTYPE ii = 0; ii  N; ii++)
   c[ii] = a[ii] + b[ii];
   }
Index: gcc/testsuite/c-c++-common/goacc/kernels-loop.c
===
--- gcc/testsuite/c-c++-common/goacc/kernels-loop.c	(revision 224836)
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop.c	(working copy)
@@ -8,6 +8,10 @@
 #define N (1024 * 512)
 #define COUNTERTYPE unsigned int
 
+#ifndef ACC_LOOP
+#define ACC_LOOP
+#endif
+
 int
 main (void)
 {
@@ -27,6 +31,7 @@ main (void)
 
 #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
   {
+#pragma ACC_LOOP
 for (COUNTERTYPE ii = 0; ii  N; ii++)
   c[ii] = a[ii] + b[ii];
   }
Index: gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
===
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-additional-options -O2 } */
+/* { dg-additional-options -ftree-parallelize-loops=32 } */
+/* { dg-additional-options -fdump-tree-parloops_oacc_kernels-all } */
+/* { dg-additional-options -fdump-tree-optimized } */
+
+/* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
+#define ACC_LOOP acc loop
+#include kernels-loop-2.c
+
+/* Check that only three loops are analyzed, and that all can be
+   parallelized.  */
+/* { dg-final { scan-tree-dump-times SUCCESS: may be parallelized 3 parloops_oacc_kernels } } */
+/* { dg-final

[gomp4] Generate sequential loop for OpenACC loop directive inside kernels

2015-06-16 Thread Chung-Lin Tang
This patch adjusts omp-low.c:expand_omp_for_generic() to expand to a 
sequential
loop form (without the OMP runtime calls), used for loop directives inside
OpenACC kernels constructs. Tom mentions that this allows the kernels 
parallelization
to work when '#pragma acc loop' makes the front-ends create OMP_FOR, which the
loop analysis phases don't understand.

Tested and committed to gomp-4_0-branch.

Chung-Lin

2015-06-16  Chung-Lin Tang  clt...@codesourcery.com

* omp-low.c (struct omp_region): Add inside_kernels_p field.
(expand_omp_for_generic): Adjust to generate a 'sequential' loop
when GOMP builtin arguments are BUILT_IN_NONE.
(expand_omp_for): Use expand_omp_for_generic() to generate a
non-parallelized loop for OMP_FORs inside OpenACC kernels regions.
(expand_omp): Mark inside_kernels_p field true for regions
nested inside OpenACC kernels constructs.
Index: omp-low.c
===
--- omp-low.c	(revision 224475)
+++ omp-low.c	(working copy)
@@ -161,6 +161,9 @@ struct omp_region
   /* True if this is a combined parallel+workshare region.  */
   bool is_combined_parallel;
 
+  /* True if this is nested inside an OpenACC kernels construct.  */
+  bool inside_kernels_p;
+
   /* For an OpenACC loop, the level of parallelism requested.  */
   int gwv_this;
 
@@ -6734,6 +6737,7 @@ expand_omp_for_generic (struct omp_region *region,
   gassign *assign_stmt;
   bool in_combined_parallel = is_combined_parallel (region);
   bool broken_loop = region-cont == NULL;
+  bool seq_loop = (!start_fn || !next_fn);
   edge e, ne;
   tree *counts = NULL;
   int i;
@@ -6821,8 +6825,21 @@ expand_omp_for_generic (struct omp_region *region,
 			zero_iter_bb));
 	}
 }
-  if (in_combined_parallel)
+  if (seq_loop)
 {
+  tree n1 = fold_convert (fd-iter_type, fd-loop.n1);
+  tree n2 = fold_convert (fd-iter_type, fd-loop.n2);
+
+  assign_stmt = gimple_build_assign (istart0, n1);
+  gsi_insert_before (gsi, assign_stmt, GSI_SAME_STMT);
+
+  assign_stmt = gimple_build_assign (iend0, n2);
+  gsi_insert_before (gsi, assign_stmt, GSI_SAME_STMT);
+
+  t = fold_build2 (NE_EXPR, boolean_type_node, istart0, iend0);
+}
+  else if (in_combined_parallel)
+{
   /* In a combined parallel loop, emit a call to
 	 GOMP_loop_foo_next.  */
   t = build_call_expr (builtin_decl_explicit (next_fn), 2,
@@ -7007,32 +7024,38 @@ expand_omp_for_generic (struct omp_region *region,
 	collapse_bb = extract_omp_for_update_vars (fd, cont_bb, l1_bb);
 
   /* Emit code to get the next parallel iteration in L2_BB.  */
-  gsi = gsi_start_bb (l2_bb);
+  if (!seq_loop)
+	{
+	  gsi = gsi_start_bb (l2_bb);
 
-  t = build_call_expr (builtin_decl_explicit (next_fn), 2,
-			   build_fold_addr_expr (istart0),
-			   build_fold_addr_expr (iend0));
-  t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE,
-false, GSI_CONTINUE_LINKING);
-  if (TREE_TYPE (t) != boolean_type_node)
-	t = fold_build2 (NE_EXPR, boolean_type_node,
-			 t, build_int_cst (TREE_TYPE (t), 0));
-  gcond *cond_stmt = gimple_build_cond_empty (t);
-  gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING);
+	  t = build_call_expr (builtin_decl_explicit (next_fn), 2,
+			   build_fold_addr_expr (istart0),
+			   build_fold_addr_expr (iend0));
+	  t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE,
+	false, GSI_CONTINUE_LINKING);
+	  if (TREE_TYPE (t) != boolean_type_node)
+	t = fold_build2 (NE_EXPR, boolean_type_node,
+			 t, build_int_cst (TREE_TYPE (t), 0));
+	  gcond *cond_stmt = gimple_build_cond_empty (t);
+	  gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING);
+	}
 }
 
   /* Add the loop cleanup function.  */
   gsi = gsi_last_bb (exit_bb);
-  if (gimple_omp_return_nowait_p (gsi_stmt (gsi)))
-t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END_NOWAIT);
-  else if (gimple_omp_return_lhs (gsi_stmt (gsi)))
-t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END_CANCEL);
-  else
-t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END);
-  gcall *call_stmt = gimple_build_call (t, 0);
-  if (gimple_omp_return_lhs (gsi_stmt (gsi)))
-gimple_call_set_lhs (call_stmt, gimple_omp_return_lhs (gsi_stmt (gsi)));
-  gsi_insert_after (gsi, call_stmt, GSI_SAME_STMT);
+  if (!seq_loop)
+{
+  if (gimple_omp_return_nowait_p (gsi_stmt (gsi)))
+	t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END_NOWAIT);
+  else if (gimple_omp_return_lhs (gsi_stmt (gsi)))
+	t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END_CANCEL);
+  else
+	t = builtin_decl_explicit (BUILT_IN_GOMP_LOOP_END);
+  gcall *call_stmt = gimple_build_call (t, 0);
+  if (gimple_omp_return_lhs (gsi_stmt (gsi)))
+	gimple_call_set_lhs (call_stmt, gimple_omp_return_lhs (gsi_stmt (gsi)));
+  gsi_insert_after (gsi, call_stmt, GSI_SAME_STMT);
+}
   gsi_remove (gsi, true);
 
   /* Connect the new

Re: [patch, libgomp] Re-factor GOMP_MAP_POINTER handling

2015-05-21 Thread Chung-Lin Tang
Ping x2.

On 15/5/11 7:19 PM, Chung-Lin Tang wrote:
 Ping.
 
 On 2015/4/21 08:21 PM, Chung-Lin Tang wrote:
 Hi,
 while investigating some issues in the variable mapping code, I observed
 that the GOMP_MAP_POINTER handling is essentially duplicated under the PSET 
 case.
 This patch abstracts and unifies the handling code, basically just a cleanup
 patch. Ran libgomp tests to ensure no regressions, ok for trunk?

 Thanks,
 Chung-Lin

 2015-04-21  Chung-Lin Tang  clt...@codesourcery.com

 libgomp/
 * target.c (gomp_map_pointer): New function abstracting out
 GOMP_MAP_POINTER handling.
 (gomp_map_vars): Remove GOMP_MAP_POINTER handling code and use
 gomp_map_pointer().

 



Re: [patch, libgomp] Re-factor GOMP_MAP_POINTER handling

2015-05-11 Thread Chung-Lin Tang
Ping.

On 2015/4/21 08:21 PM, Chung-Lin Tang wrote:
 Hi,
 while investigating some issues in the variable mapping code, I observed
 that the GOMP_MAP_POINTER handling is essentially duplicated under the PSET 
 case.
 This patch abstracts and unifies the handling code, basically just a cleanup
 patch. Ran libgomp tests to ensure no regressions, ok for trunk?
 
 Thanks,
 Chung-Lin
 
 2015-04-21  Chung-Lin Tang  clt...@codesourcery.com
 
 libgomp/
 * target.c (gomp_map_pointer): New function abstracting out
 GOMP_MAP_POINTER handling.
 (gomp_map_vars): Remove GOMP_MAP_POINTER handling code and use
 gomp_map_pointer().
 



[patch, libgomp] Re-factor GOMP_MAP_POINTER handling

2015-04-21 Thread Chung-Lin Tang
Hi,
while investigating some issues in the variable mapping code, I observed
that the GOMP_MAP_POINTER handling is essentially duplicated under the PSET 
case.
This patch abstracts and unifies the handling code, basically just a cleanup
patch. Ran libgomp tests to ensure no regressions, ok for trunk?

Thanks,
Chung-Lin

2015-04-21  Chung-Lin Tang  clt...@codesourcery.com

libgomp/
* target.c (gomp_map_pointer): New function abstracting out
GOMP_MAP_POINTER handling.
(gomp_map_vars): Remove GOMP_MAP_POINTER handling code and use
gomp_map_pointer().
Index: target.c
===
--- target.c	(revision 448412)
+++ target.c	(working copy)
@@ -163,6 +163,60 @@ get_kind (bool is_openacc, void *kinds, int idx)
 		: ((unsigned char *) kinds)[idx];
 }
 
+static void
+gomp_map_pointer (struct target_mem_desc *tgt, uintptr_t host_ptr,
+		  uintptr_t target_offset, uintptr_t bias)
+{
+  struct gomp_device_descr *devicep = tgt-device_descr;
+  struct splay_tree_s *mem_map = devicep-mem_map;
+  struct splay_tree_key_s cur_node;
+
+  cur_node.host_start = host_ptr;
+  if (cur_node.host_start == (uintptr_t) NULL)
+{
+  cur_node.tgt_offset = (uintptr_t) NULL;
+  /* FIXME: see comment about coalescing host/dev transfers below.  */
+  devicep-host2dev_func (devicep-target_id,
+			  (void *) (tgt-tgt_start + target_offset),
+			  (void *) cur_node.tgt_offset,
+			  sizeof (void *));
+  return;
+}
+  /* Add bias to the pointer value.  */
+  cur_node.host_start += bias;
+  cur_node.host_end = cur_node.host_start + 1;
+  splay_tree_key n = splay_tree_lookup (mem_map, cur_node);
+  if (n == NULL)
+{
+  /* Could be possibly zero size array section.  */
+  cur_node.host_end--;
+  n = splay_tree_lookup (mem_map, cur_node);
+  if (n == NULL)
+	{
+	  cur_node.host_start--;
+	  n = splay_tree_lookup (mem_map, cur_node);
+	  cur_node.host_start++;
+	}
+}
+  if (n == NULL)
+{
+  gomp_mutex_unlock (devicep-lock);
+  gomp_fatal (Pointer target of array section wasn't mapped);
+}
+  cur_node.host_start -= n-host_start;
+  cur_node.tgt_offset
+= n-tgt-tgt_start + n-tgt_offset + cur_node.host_start;
+  /* At this point tgt_offset is target address of the
+ array section.  Now subtract bias to get what we want
+ to initialize the pointer with.  */
+  cur_node.tgt_offset -= bias;
+  /* FIXME: see comment about coalescing host/dev transfers below.  */
+  devicep-host2dev_func (devicep-target_id,
+			  (void *) (tgt-tgt_start + target_offset),
+			  (void *) cur_node.tgt_offset,
+			  sizeof (void *));
+}
+
 attribute_hidden struct target_mem_desc *
 gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	   void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
@@ -336,54 +390,8 @@ gomp_map_vars (struct gomp_device_descr *devicep,
 	k-host_end - k-host_start);
 		break;
 		  case GOMP_MAP_POINTER:
-		cur_node.host_start
-		  = (uintptr_t) *(void **) k-host_start;
-		if (cur_node.host_start == (uintptr_t) NULL)
-		  {
-			cur_node.tgt_offset = (uintptr_t) NULL;
-			/* FIXME: see above FIXME comment.  */
-			devicep-host2dev_func (devicep-target_id,
-		(void *) (tgt-tgt_start
-			  + k-tgt_offset),
-		(void *) cur_node.tgt_offset,
-		sizeof (void *));
-			break;
-		  }
-		/* Add bias to the pointer value.  */
-		cur_node.host_start += sizes[i];
-		cur_node.host_end = cur_node.host_start + 1;
-		n = splay_tree_lookup (mem_map, cur_node);
-		if (n == NULL)
-		  {
-			/* Could be possibly zero size array section.  */
-			cur_node.host_end--;
-			n = splay_tree_lookup (mem_map, cur_node);
-			if (n == NULL)
-			  {
-			cur_node.host_start--;
-			n = splay_tree_lookup (mem_map, cur_node);
-			cur_node.host_start++;
-			  }
-		  }
-		if (n == NULL)
-		  {
-			gomp_mutex_unlock (devicep-lock);
-			gomp_fatal (Pointer target of array section 
-wasn't mapped);
-		  }
-		cur_node.host_start -= n-host_start;
-		cur_node.tgt_offset = n-tgt-tgt_start + n-tgt_offset
-	  + cur_node.host_start;
-		/* At this point tgt_offset is target address of the
-		   array section.  Now subtract bias to get what we want
-		   to initialize the pointer with.  */
-		cur_node.tgt_offset -= sizes[i];
-		/* FIXME: see above FIXME comment.  */
-		devicep-host2dev_func (devicep-target_id,
-	(void *) (tgt-tgt_start
-		  + k-tgt_offset),
-	(void *) cur_node.tgt_offset,
-	sizeof (void *));
+		gomp_map_pointer (tgt, (uintptr_t) *(void **) k-host_start,
+  k-tgt_offset, sizes[i]);
 		break;
 		  case GOMP_MAP_TO_PSET:
 		/* FIXME: see above FIXME comment.  */
@@ -405,58 +413,12 @@ gomp_map_vars (struct gomp_device_descr *devicep,
 			{
 			  tgt-list[j] = k;
 			  k-refcount

[patch, nios2, committed] Fix nios2-linux crti/crtn settings

2015-03-25 Thread Chung-Lin Tang
We appear to have erroneously set 'extra_parts' in nios2-linux libgcc,
to include the crti.o/crtn.o files intended for nios2 EABI. This still
largely worked, which is why we haven't noticed it till now, expect some
features like gprof profiling wasn't properly set up.

This patch removes the extra_parts setting for nios2-linux libgcc; now
crti.o/crtn.o links to the correct ones provided by glibc.

Chung-Lin

2015-03-25  Chung-Lin Tang  clt...@codesourcery.com

libgcc/
* config.host (nios2-*-linux*): Remove 'extra_parts' setting.
Index: config.host
===
--- config.host	(revision 221651)
+++ config.host	(working copy)
@@ -943,7 +943,6 @@ nds32*-elf*)
 	;;
 nios2-*-linux*)
 	tmake_file=$tmake_file nios2/t-nios2 nios2/t-linux t-libgcc-pic t-slibgcc-libgcc
-	extra_parts=$extra_parts crti.o crtn.o
 	md_unwind_header=nios2/linux-unwind.h
 	;;
 nios2-*-*)


[PATCH, nios2] Updates to Nios II Linux

2015-01-20 Thread Chung-Lin Tang
The Nios II ports of glibc and Linux kernel are now both upstream. New
system conventions now use a non-executable stack. Attached patch
committed to support new conventions, applied to both trunk and 4.9 branch.

Chung-Lin

2015-01-20  Chung-Lin Tang  clt...@codesourcery.com

gcc/
* config/nios2/nios2.c (nios2_asm_file_end): Implement
TARGET_ASM_FILE_END hook for adding .note.GNU-stack section when
needed.
(TARGET_ASM_FILE_END): Define.

libgcc/
* config/nios2/linux-unwind.h (nios2_fallback_frame_state):
Update rt_sigframe format and address for current Nios II
Linux conventions.
Index: libgcc/config/nios2/linux-unwind.h
===
--- libgcc/config/nios2/linux-unwind.h	(revision 219897)
+++ libgcc/config/nios2/linux-unwind.h	(working copy)
@@ -67,10 +67,9 @@ nios2_fallback_frame_state (struct _Unwind_Context
   if (pc[0] == (0x0084 | (__NR_rt_sigreturn  6)))
 {
   struct rt_sigframe {
-	char retcode[12];
 	siginfo_t info;
 	struct nios2_ucontext uc;
-  } *rt_ = context-ra;
+  } *rt_ = context-cfa;
   struct nios2_mcontext *regs = rt_-uc.uc_mcontext;
   int i;
 
Index: gcc/config/nios2/nios2.c
===
--- gcc/config/nios2/nios2.c	(revision 219897)
+++ gcc/config/nios2/nios2.c	(working copy)
@@ -2223,6 +2223,18 @@ nios2_output_dwarf_dtprel (FILE *file, int size, r
   fprintf (file, ));
 }
 
+/* Implemet TARGET_ASM_FILE_END.  */
+
+static void
+nios2_asm_file_end (void)
+{
+  /* The Nios II Linux stack is mapped non-executable by default, so add a
+ .note.GNU-stack section for switching to executable stacks only when
+ trampolines are generated.  */
+  if (TARGET_LINUX_ABI  trampolines_created)
+file_end_indicate_exec_stack ();
+}
+
 /* Implement TARGET_ASM_FUNCTION_PROLOGUE.  */
 static void
 nios2_asm_function_prologue (FILE *file, HOST_WIDE_INT size ATTRIBUTE_UNUSED)
@@ -3401,6 +3413,9 @@ nios2_merge_decl_attributes (tree olddecl, tree ne
 #undef TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA
 #define TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA nios2_output_addr_const_extra
 
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END nios2_asm_file_end
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nios2_option_override
 


Re: [PATCH] gcc/config/nios2/nios2.c: Let custom_builtin_name[*] always be zero terminated string

2014-07-26 Thread Chung-Lin Tang
On 14/7/26 11:28 AM, Chen Gang wrote:
 The related strncpy() for custom_builtin_name[*] may set 5 none-zero
 characters, which may cause custom_builtin_name[*] is none-zero
 terminated.
 
 So add additional '\0' byte for custom_builtin_name[*].

Where did you see this? Supposedly the snprintf of the custom function
type string should at most be of 'xnxx' format; 4 characters at most,
not 5.  Do you have a test case where the behavior you described appears?

Thanks,
Chung-Lin

 ChangeLog:
 
   * config/nios2/nios2.c (custom_builtin_name): Let it always be
   zero terminated string.
 
 Signed-off-by: Chen Gang gang.chen.5...@gmail.com
 ---
  gcc/config/nios2/nios2.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/gcc/config/nios2/nios2.c b/gcc/config/nios2/nios2.c
 index a4e60c6..e4e005b 100644
 --- a/gcc/config/nios2/nios2.c
 +++ b/gcc/config/nios2/nios2.c
 @@ -2510,7 +2510,7 @@ nios2_expand_fpu_builtin (tree exp, unsigned int code, 
 rtx target)
 total of (3 + 1) * (1 + 3 + 9) == 52 custom builtin functions.
  */
  #define NUM_CUSTOM_BUILTINS ((3 + 1) * (1 + 3 + 9))
 -static char custom_builtin_name[NUM_CUSTOM_BUILTINS][5];
 +static char custom_builtin_name[NUM_CUSTOM_BUILTINS][6];
  
  static void
  nios2_init_custom_builtins (int start_code)
 



Re: [PATCH] gcc/config/nios2/nios2.c: Let custom_builtin_name[*] always be zero terminated string

2014-07-26 Thread Chung-Lin Tang
On 2014/7/26 03:33 PM, Chen Gang wrote:
 On 07/26/2014 02:32 PM, Chung-Lin Tang wrote:
 On 14/7/26 11:28 AM, Chen Gang wrote:
 The related strncpy() for custom_builtin_name[*] may set 5 none-zero
 characters, which may cause custom_builtin_name[*] is none-zero
 terminated.

 So add additional '\0' byte for custom_builtin_name[*].

 Where did you see this? Supposedly the snprintf of the custom function
 type string should at most be of 'xnxx' format; 4 characters at most,
 
 I guess, your 'xnxx' means %cn%c%c, not %sn%s%s (which is current
 implementation), also at present, 32 - n is 17, not 4 for snprintf()
 length limitation.

The use of %sn%s%s is intentional. That allows me to print  directly
using snprintf.

'32 - n' is to represent the rest of the 32-byte buffer, harmless
really, as it is at most 4 chars.

Also, if I were to restrict it in the snprintf argument, it would be 5
(including the null char) not 4.

 If we are always sure it must be no more than 4 characters (at present,
 it is, but in the future, I don't know). We can use strcpy() instead of
 strncpy() for it -- that will let other readers no doubt.
 
 If we need let custom_builtin_name[*] not only zero terminated, but also
 zero pad, we need pass '4' to strncpy() instead of '5', that also will
 clear all doubts.

I don't understand what point you're trying to make here, really. As
Andreas has noted in the other mail, strncpy does zero-termination
automatically.

Chung-Lin

 not 5.  Do you have a test case where the behavior you described appears?

 
 I find it by reading source code, for me, it is simple code, test is
 welcomed, but not mandatory.
 
 But it really needs necessary discussion (for modification, and comments).



Re: [PATCH] RTEMS: Add Nios 2 support

2014-07-18 Thread Chung-Lin Tang
On 2014/7/18 上午 05:19, Joel Sherrill wrote:
 Unless someone objects, I am going to commit this to the
 4.9 branch and head.
 
 --joel

Sorry about the delay, I'll review it today.

Thanks,
Chung-Lin

 On 7/7/2014 1:42 AM, Sebastian Huber wrote:
 Ping.

 On 2014-06-26 13:43, Sebastian Huber wrote:
 This patch should be applied to GCC 4.9 and mainline.  I do not have
 write access, so in case this gets approved, please commit it for me.

 gcc/ChangeLog
 2014-06-26  Sebastian Huber  sebastian.hu...@embedded-brains.de

 * config.gcc (nios2-*-*): Add RTEMS support.
 * config/nios2/rtems.h: New file.
 * config/nios2/t-rtems: Likewise.
 



Re: [PATCH] RTEMS: Add Nios 2 support

2014-07-18 Thread Chung-Lin Tang
For the default multilib settings, it looks like you just intended to
use -mcustom-fpu-cfg=60-2. I suggest you modify t-rtems to do that
instead of enumerating the individual FPU insn options.

Other than that, the patch looks okay.

Chung-Lin

On 2014/6/26 07:43 PM, Sebastian Huber wrote:
 diff --git a/gcc/config/nios2/t-rtems b/gcc/config/nios2/t-rtems
 new file mode 100644
 index 000..f95fa3c
 --- /dev/null
 +++ b/gcc/config/nios2/t-rtems
 @@ -0,0 +1,133 @@
 +# Custom RTEMS multilibs
 +
 +MULTILIB_OPTIONS = mhw-mul mhw-mulx mhw-div mcustom-fadds=253 
 mcustom-fdivs=255 mcustom-fmuls=252 mcustom-fsubs=254
 +
 +# Enumeration of multilibs
 +
 +# MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fsubs=254
 +# MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mcustom-fadds=253/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254

Re: [PATCH] RTEMS: Add Nios 2 support

2014-07-18 Thread Chung-Lin Tang
On 14/7/18 2:30 PM, Chung-Lin Tang wrote:
 For the default multilib settings, it looks like you just intended to
 use -mcustom-fpu-cfg=60-2. I suggest you modify t-rtems to do that
 instead of enumerating the individual FPU insn options.
 
 Other than that, the patch looks okay.
 
 Chung-Lin

BTW, I assume you have done the appropriate testing for a nios2-rtems
toolchain?

Thanks,
Chung-Lin

 On 2014/6/26 07:43 PM, Sebastian Huber wrote:
 diff --git a/gcc/config/nios2/t-rtems b/gcc/config/nios2/t-rtems
 new file mode 100644
 index 000..f95fa3c
 --- /dev/null
 +++ b/gcc/config/nios2/t-rtems
 @@ -0,0 +1,133 @@
 +# Custom RTEMS multilibs
 +
 +MULTILIB_OPTIONS = mhw-mul mhw-mulx mhw-div mcustom-fadds=253 
 mcustom-fdivs=255 mcustom-fmuls=252 mcustom-fsubs=254
 +
 +# Enumeration of multilibs
 +
 +# MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fadds=253
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mhw-div/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div/mcustom-fsubs=254
 +# MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mhw-div
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fadds=253
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-mulx
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fadds=253
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fdivs=255
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += mhw-mul/mhw-div
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252/mcustom-fsubs=254
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fmuls=252
 +MULTILIB_EXCEPTIONS += 
 mhw-mul/mcustom-fadds=253/mcustom-fdivs=255/mcustom-fsubs=254

Re: [PATCH, ARM] Cortex-A9 MPCore volatile load workaround

2014-06-26 Thread Chung-Lin Tang
Ping x2.

On 14/6/20 2:24 PM, Chung-Lin Tang wrote:
 Ping.
 
 On 2014/6/9 10:03 PM, Chung-Lin Tang wrote:
 Hi Richard,
 As we talked about earlier, here's a patch to add a compiler option to
 work around Cortex-A9 MPCore errata 761319:
 http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf

 What the option does basically, is to scan for volatile loads during
 reorg, and add a dmb barrier after it. It also strives to make dmb
 conditionally executed under TARGET_THUMB2, which means a new Thumb-2
 specific *memory_barrier_t2 pattern in sync.md, with adjusted
 conds/predicable attributes and %? in output strings.

 Patch originally written by Julian, with additions by Meador, and
 finally a few trivial adjustments by me. Again, we've been carrying this
 fix for a release or two. Okay for trunk?

 Thanks,
 Chung-Lin

 2014-06-09  Julian Brown  jul...@codesourcery.com
 Meador Inge  mead...@codesourcery.com
 Chung-Lin Tang  clt...@codesourcery.com

  * config/arm/arm.c (arm_option_override): Emit warning if
  -mfix-cortex-a9-volatile-hazards is used on an incompatible CPU.
  (any_volatile_loads_p): New.
  (arm_cortex_a9_errata_reorg): New.
  (arm_reorg): Call arm_cortex_a9_errata_reorg.
  * config/arm/arm.opt (mfix-cortex-a9-volatile-hazards): Add
  option.
  * config/arm/sync.md (*memory_barrier): Don't use on Thumb-2.
  (*memory_barrier_t2): New, allow conditional execution on
  Thumb-2.
  * doc/invoke.texi (-mfix-cortex-a9-volatile-hazards): Add
  documentation.

 testsuite/
 * lib/target-supports.exp (check_effective_target_arm_dmb): New.
 * gcc.target/arm/a9-volatile-ordering-erratum-1.c: New test.
 * gcc.target/arm/a9-volatile-ordering-erratum-2.c: New test.
 * gcc.target/arm/a9-volatile-ordering-erratum-3.c: New test.
 * gcc.target/arm/a9-volatile-ordering-erratum-4.c: New test.

 



Re: [PATCH, PR61554] ICE during CCP

2014-06-24 Thread Chung-Lin Tang
On 2014/6/23 04:45 PM, Richard Biener wrote:
 On Mon, Jun 23, 2014 at 7:32 AM, Chung-Lin Tang clt...@codesourcery.com 
 wrote:
 Hi Richard,

 In this change:
 https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01278.html

 where substitute_and_fold() was changed to use a dom walker, the calls
 to purge dead EH edges during the walk can alter the dom-tree, and have
 chaotic results; the testcase in PR 61554 has some blocks traversed
 twice during the walk, causing the segfault during CCP.

 The patch records the to-be-purged-for-dead-EH blocks in a similar
 manner like stmts_to_remove, and processes it after the walk. (another
 possible method would be using a bitmap to record the BBs + calling
 gimple_purge_all_dead_eh_edges...)
 
 Oops.
 
 Bootstrapped and tested on x86_64-linux, is this okay for trunk?
 
 Can you please use a bitmap and use gimple_purge_all_dead_eh_edges
 like tree-ssa-pre.c does?
 
 Also please add the reduced testcase from the PR to the g++.dg/torture
 
 Ok with that changes.
 
 Thanks,
 Richard.

Thanks for the review. Attached is what I committed. Testcase made by
Markus also added.

Thanks,
Chung-Lin

2014-06-24  Chung-Lin Tang  clt...@codesourcery.com

PR tree-optimization/61554
* tree-ssa-propagate.c: Include bitmap.h.
(substitute_and_fold_dom_walker): Add 'bitmap need_eh_cleanup'
member,
properly update constructor/destructor.
(substitute_and_fold_dom_walker::before_dom_children):
Remove call to gimple_purge_dead_eh_edges, add bb-index to
need_eh_cleaup instead.
(substitute_and_fold): Call gimple_purge_all_dead_eh_edges on
need_eh_cleanup.

Index: tree-ssa-propagate.c
===
--- tree-ssa-propagate.c	(revision 211927)
+++ tree-ssa-propagate.c	(working copy)
@@ -29,6 +29,7 @@
 #include function.h
 #include gimple-pretty-print.h
 #include dumpfile.h
+#include bitmap.h
 #include sbitmap.h
 #include tree-ssa-alias.h
 #include internal-fn.h
@@ -1031,8 +1032,13 @@ class substitute_and_fold_dom_walker : public dom_
   fold_fn (fold_fn_), do_dce (do_dce_), something_changed (false)
 {
   stmts_to_remove.create (0);
+  need_eh_cleanup = BITMAP_ALLOC (NULL);
 }
-~substitute_and_fold_dom_walker () { stmts_to_remove.release (); }
+~substitute_and_fold_dom_walker ()
+{
+  stmts_to_remove.release ();
+  BITMAP_FREE (need_eh_cleanup);
+}
 
 virtual void before_dom_children (basic_block);
 virtual void after_dom_children (basic_block) {}
@@ -1042,6 +1048,7 @@ class substitute_and_fold_dom_walker : public dom_
 bool do_dce;
 bool something_changed;
 vecgimple stmts_to_remove;
+bitmap need_eh_cleanup;
 };
 
 void
@@ -1144,7 +1151,7 @@ substitute_and_fold_dom_walker::before_dom_childre
 	  /* If we cleaned up EH information from the statement,
 	 remove EH edges.  */
 	  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
-	gimple_purge_dead_eh_edges (bb);
+	bitmap_set_bit (need_eh_cleanup, bb-index);
 
 	  if (is_gimple_assign (stmt)
 	   (get_gimple_rhs_class (gimple_assign_rhs_code (stmt))
@@ -1235,6 +1242,9 @@ substitute_and_fold (ssa_prop_get_value_fn get_val
 	}
 }
 
+  if (!bitmap_empty_p (walker.need_eh_cleanup))
+gimple_purge_all_dead_eh_edges (walker.need_eh_cleanup);
+
   statistics_counter_event (cfun, Constants propagated,
 			prop_stats.num_const_prop);
   statistics_counter_event (cfun, Copies propagated,


[PATCH, PR61554] ICE during CCP

2014-06-22 Thread Chung-Lin Tang
Hi Richard,

In this change:
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01278.html

where substitute_and_fold() was changed to use a dom walker, the calls
to purge dead EH edges during the walk can alter the dom-tree, and have
chaotic results; the testcase in PR 61554 has some blocks traversed
twice during the walk, causing the segfault during CCP.

The patch records the to-be-purged-for-dead-EH blocks in a similar
manner like stmts_to_remove, and processes it after the walk. (another
possible method would be using a bitmap to record the BBs + calling
gimple_purge_all_dead_eh_edges...)

Bootstrapped and tested on x86_64-linux, is this okay for trunk?

Thanks,
Chung-Lin

2014-06-23  Chung-Lin Tang  clt...@codesourcery.com

PR tree-optimization/61554
* tree-ssa-propagate.c (substitute_and_fold_dom_walker):
Add 'vecbasic_block bbs_to_purge_dead_eh_edges' member,
properly update constructor/destructor.
(substitute_and_fold_dom_walker::before_dom_children):
Remove call to gimple_purge_dead_eh_edges, add bb to
bbs_to_purge_dead_eh_edges instead.
(substitute_and_fold): Call gimple_purge_dead_eh_edges for
bbs recorded in bbs_to_purge_dead_eh_edges.
Index: tree-ssa-propagate.c
===
--- tree-ssa-propagate.c	(revision 211874)
+++ tree-ssa-propagate.c	(working copy)
@@ -1031,8 +1031,13 @@ class substitute_and_fold_dom_walker : public dom_
   fold_fn (fold_fn_), do_dce (do_dce_), something_changed (false)
 {
   stmts_to_remove.create (0);
+  bbs_to_purge_dead_eh_edges.create (0);
 }
-~substitute_and_fold_dom_walker () { stmts_to_remove.release (); }
+~substitute_and_fold_dom_walker ()
+{
+  stmts_to_remove.release ();
+  bbs_to_purge_dead_eh_edges.release ();
+}
 
 virtual void before_dom_children (basic_block);
 virtual void after_dom_children (basic_block) {}
@@ -1042,6 +1047,7 @@ class substitute_and_fold_dom_walker : public dom_
 bool do_dce;
 bool something_changed;
 vecgimple stmts_to_remove;
+vecbasic_block bbs_to_purge_dead_eh_edges;
 };
 
 void
@@ -1144,7 +1150,7 @@ substitute_and_fold_dom_walker::before_dom_childre
 	  /* If we cleaned up EH information from the statement,
 	 remove EH edges.  */
 	  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
-	gimple_purge_dead_eh_edges (bb);
+	bbs_to_purge_dead_eh_edges.safe_push (bb);
 
 	  if (is_gimple_assign (stmt)
 	   (get_gimple_rhs_class (gimple_assign_rhs_code (stmt))
@@ -1235,6 +1241,14 @@ substitute_and_fold (ssa_prop_get_value_fn get_val
 	}
 }
 
+  while (!walker.bbs_to_purge_dead_eh_edges.is_empty ())
+{
+  basic_block bb = walker.bbs_to_purge_dead_eh_edges.pop ();
+  gimple_purge_dead_eh_edges (bb);
+  if (dump_file  dump_flags  TDF_DETAILS)
+	fprintf (dump_file, Purge dead EH edges from bb %d\n, bb-index);
+}
+
   statistics_counter_event (cfun, Constants propagated,
 			prop_stats.num_const_prop);
   statistics_counter_event (cfun, Copies propagated,


Re: [PATCH, ARM] MI-thunk fix for TARGET_THUMB1_ONLY

2014-06-20 Thread Chung-Lin Tang
On 2014/6/18 上午 06:26, Ramana Radhakrishnan wrote:
 On Sun, Jun 8, 2014 at 12:27 PM, Chung-Lin Tang clt...@codesourcery.com 
 wrote:
  Hi Richard, Ramana,
 
  Attached is a small fix for resolving a g++.old-deja/g++.jason/thunk2.C
  regression we found under a TARGET_THUMB1_ONLY multilib (-mthumb
  -march=armv6-m to be exact). Basically under those conditions, the thunk
  is in Thumb mode, so the subtraction should be 4 rather than 8.
 Yep, this is OK with a minor change to the comment to make it more explicit.
 
 +  /* Output .word .LTHUNKn-[37]-.LTHUNKPCn.  */
 s/37/3,7/
 
 
 Ok with that change and if no regressions.
 
 OK for release branches unless the RM's object in 24 hours.

Re-tested on a recent trunk, verified g++.jason/thunk2.C resolved with
patch and no regressions. Committed on trunk and backported to 4.8, 4.9
branches.

Thanks,
Chung-Lin



Re: [PATCH, ARM] Cortex-A9 MPCore volatile load workaround

2014-06-20 Thread Chung-Lin Tang
Ping.

On 2014/6/9 10:03 PM, Chung-Lin Tang wrote:
 Hi Richard,
 As we talked about earlier, here's a patch to add a compiler option to
 work around Cortex-A9 MPCore errata 761319:
 http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf
 
 What the option does basically, is to scan for volatile loads during
 reorg, and add a dmb barrier after it. It also strives to make dmb
 conditionally executed under TARGET_THUMB2, which means a new Thumb-2
 specific *memory_barrier_t2 pattern in sync.md, with adjusted
 conds/predicable attributes and %? in output strings.
 
 Patch originally written by Julian, with additions by Meador, and
 finally a few trivial adjustments by me. Again, we've been carrying this
 fix for a release or two. Okay for trunk?
 
 Thanks,
 Chung-Lin
 
 2014-06-09  Julian Brown  jul...@codesourcery.com
 Meador Inge  mead...@codesourcery.com
 Chung-Lin Tang  clt...@codesourcery.com
 
   * config/arm/arm.c (arm_option_override): Emit warning if
   -mfix-cortex-a9-volatile-hazards is used on an incompatible CPU.
   (any_volatile_loads_p): New.
   (arm_cortex_a9_errata_reorg): New.
   (arm_reorg): Call arm_cortex_a9_errata_reorg.
   * config/arm/arm.opt (mfix-cortex-a9-volatile-hazards): Add
   option.
   * config/arm/sync.md (*memory_barrier): Don't use on Thumb-2.
   (*memory_barrier_t2): New, allow conditional execution on
   Thumb-2.
   * doc/invoke.texi (-mfix-cortex-a9-volatile-hazards): Add
   documentation.
 
 testsuite/
 * lib/target-supports.exp (check_effective_target_arm_dmb): New.
 * gcc.target/arm/a9-volatile-ordering-erratum-1.c: New test.
 * gcc.target/arm/a9-volatile-ordering-erratum-2.c: New test.
 * gcc.target/arm/a9-volatile-ordering-erratum-3.c: New test.
 * gcc.target/arm/a9-volatile-ordering-erratum-4.c: New test.
 



Re: [ARM] Fix build failure due to movsi_compare0 (PR 61430)

2014-06-16 Thread Chung-Lin Tang
On 14/6/16 5:55 PM, James Greenhalgh wrote:
 On Fri, Jun 13, 2014 at 05:46:45PM +0100, Vladimir Makarov wrote:
 On 2014-06-11, 1:17 PM, Chung-Lin Tang wrote:
 Looking at this too, as an LRA exercise. I don't really think the
 pattern is wrong, rather LRA should just avoid creating the copy in this
 case; it's a result of operand constraining, after all.

 Attached is the small LRA patch, pending testing. Vladimir should weight
 in on this.


 The patch is safe and ok.  Thanks for working on it, Chung-Lin.
 
 As this patch fixes a build failure on ARM I'd like to have it applied
 today. If I don't hear anything which would stop me, I'll commit this on
 Chung-Lin's behalf in a few hours.
 
 Cheers,
 James

I just committed it, after a testsuite run on x86_64 (to be sure). And
thanks to James for doing the ARM tests.

Thanks,
Chung-Lin


 * ira-lives.c (process_bb_lives): Skip creating copy during
 insn sca when src/dest has constrained to same regno.






Re: [ARM] Fix build failure due to movsi_compare0 (PR 61430)

2014-06-11 Thread Chung-Lin Tang
On 2014/6/11 下午 06:32, James Greenhalgh wrote:
 
 Hi,
 
 A recent change somewhere exposed a latent bug between LRA and the definition
 of the movsi_compare0 pattern.
 
 This pattern ties the source and destination register of a set together
 a (match_dup) and register constraints:
 
[(set (reg:CC CC_REGNUM)
   (compare:CC (match_operand:SI 1 s_register_operand 0,r)
   (const_int 0)))
 (set (match_operand:SI 0 s_register_operand =r,r)
   (match_dup 1))]
 
 This confuses LRA which expects the source and destination register of
 a set to be different.
 
 reduced.c: In function '_IO_vfscanf_internal':
 reduced.c:104:1: internal compiler error: in lra_create_copy, at lra.c:1512
  }
  ^
 0x8c3f9a lra_create_copy(int, int, int)
   /work/gcc-dev/src/gcc/gcc/lra.c:1512
 0x8e4ab0 process_bb_lives
   /work/gcc-dev/src/gcc/gcc/lra-lives.c:568
 0x8e4ab0 lra_create_live_ranges(bool)
   /work/gcc-dev/src/gcc/gcc/lra-lives.c:1019
 0x8c5a39 lra(_IO_FILE*)
   /work/gcc-dev/src/gcc/gcc/lra.c:2356
 0x873a96 do_reload
   /work/gcc-dev/src/gcc/gcc/ira.c:5415
 0x873a96 execute
   /work/gcc-dev/src/gcc/gcc/ira.c:5576
 Please submit a full bug report,
 
 We can fix the pattern by moving away from match_dup and using register
 tying with constraints consistently.
 
 I'm not entirely convinced that this is legitimate (my vague recollection is
 that register tying should only be used to tie inputs to outputs).
 
 This has passed testing on a bunch of ARM targets, and fixes the build
 issues I've been seeing.

Looking at this too, as an LRA exercise. I don't really think the
pattern is wrong, rather LRA should just avoid creating the copy in this
case; it's a result of operand constraining, after all.

Attached is the small LRA patch, pending testing. Vladimir should weight
in on this.

Thanks,
Chung-Lin

* ira-lives.c (process_bb_lives): Skip creating copy during
insn sca when src/dest has constrained to same regno.

Index: lra-lives.c
===
--- lra-lives.c (revision 211398)
+++ lra-lives.c (working copy)
@@ -558,7 +558,11 @@ process_bb_lives (basic_block bb, int curr_point)
  /* It might be 'inheritance pseudo - reload pseudo'.  */
  || (src_regno = lra_constraint_new_regno_start
   ((int) REGNO (SET_DEST (set))
- = lra_constraint_new_regno_start
+ = lra_constraint_new_regno_start)
+ /* Remember to skip special cases where src/dest regnos are
+the same, e.g. insn SET pattern has matching constraints
+like =r,0.  */
+  src_regno != (int) REGNO (SET_DEST (set)
{
  int hard_regno = -1, regno = -1;
 


[PATCH, ARM] Cortex-A9 MPCore volatile load workaround

2014-06-09 Thread Chung-Lin Tang
Hi Richard,
As we talked about earlier, here's a patch to add a compiler option to
work around Cortex-A9 MPCore errata 761319:
http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf

What the option does basically, is to scan for volatile loads during
reorg, and add a dmb barrier after it. It also strives to make dmb
conditionally executed under TARGET_THUMB2, which means a new Thumb-2
specific *memory_barrier_t2 pattern in sync.md, with adjusted
conds/predicable attributes and %? in output strings.

Patch originally written by Julian, with additions by Meador, and
finally a few trivial adjustments by me. Again, we've been carrying this
fix for a release or two. Okay for trunk?

Thanks,
Chung-Lin

2014-06-09  Julian Brown  jul...@codesourcery.com
Meador Inge  mead...@codesourcery.com
Chung-Lin Tang  clt...@codesourcery.com

* config/arm/arm.c (arm_option_override): Emit warning if
-mfix-cortex-a9-volatile-hazards is used on an incompatible CPU.
(any_volatile_loads_p): New.
(arm_cortex_a9_errata_reorg): New.
(arm_reorg): Call arm_cortex_a9_errata_reorg.
* config/arm/arm.opt (mfix-cortex-a9-volatile-hazards): Add
option.
* config/arm/sync.md (*memory_barrier): Don't use on Thumb-2.
(*memory_barrier_t2): New, allow conditional execution on
Thumb-2.
* doc/invoke.texi (-mfix-cortex-a9-volatile-hazards): Add
documentation.

testsuite/
* lib/target-supports.exp (check_effective_target_arm_dmb): New.
* gcc.target/arm/a9-volatile-ordering-erratum-1.c: New test.
* gcc.target/arm/a9-volatile-ordering-erratum-2.c: New test.
* gcc.target/arm/a9-volatile-ordering-erratum-3.c: New test.
* gcc.target/arm/a9-volatile-ordering-erratum-4.c: New test.
Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 211364)
+++ doc/invoke.texi	(working copy)
@@ -535,6 +535,7 @@ Objective-C and Objective-C++ Dialects}.
 -mtp=@var{name} -mtls-dialect=@var{dialect} @gol
 -mword-relocations @gol
 -mfix-cortex-m3-ldrd @gol
+-mfix-cortex-a9-volatile-hazards @gol
 -munaligned-access @gol
 -mneon-for-64bits @gol
 -mslow-flash-data @gol
@@ -12677,6 +12678,16 @@ with overlapping destination and base registers ar
 generating these instructions.  This option is enabled by default when
 @option{-mcpu=cortex-m3} is specified.
 
+@item -mfix-cortex-a9-volatile-hazards
+@opindex mfix-cortex-a9-volatile-hazards
+Cortex-A9 MPCore processors have an erratum that in rare cases cause
+successive memory loads to appear out of program order if another processor
+is simultaneously writing to the same location.  This causes problems if
+volatile variables are used for communication between processors.
+This option enables the ARM recommended workaround, to insert a @code{dmb}
+instruction after each volatile load.  Because of the potentially high
+overhead, this workaround is not enabled by default.
+
 @item -munaligned-access
 @itemx -mno-unaligned-access
 @opindex munaligned-access
Index: config/arm/arm.opt
===
--- config/arm/arm.opt	(revision 211364)
+++ config/arm/arm.opt	(working copy)
@@ -264,6 +264,11 @@ Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
 
+mfix-cortex-a9-volatile-hazards
+Target Report Var(fix_a9_volatile_hazards) Init(0)
+Avoid errata causing read-after-read hazards for concurrent volatile
+accesses on Cortex-A9 MPCore processors.
+
 munaligned-access
 Target Report Var(unaligned_access) Init(2)
 Enable unaligned word and halfword accesses to packed data.
Index: config/arm/sync.md
===
--- config/arm/sync.md	(revision 211364)
+++ config/arm/sync.md	(working copy)
@@ -46,7 +46,7 @@
 (define_insn *memory_barrier
   [(set (match_operand:BLK 0  )
 	(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
-  TARGET_HAVE_MEMORY_BARRIER
+  TARGET_HAVE_MEMORY_BARRIER  !TARGET_THUMB2
   {
 if (TARGET_HAVE_DMB)
   {
@@ -65,6 +65,29 @@
(set_attr conds unconditional)
(set_attr predicable no)])
 
+;; Thumb-2 version allows conditional execution
+(define_insn *memory_barrier_t2
+  [(set (match_operand:BLK 0  )
+	(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
+  TARGET_HAVE_MEMORY_BARRIER  TARGET_THUMB2
+  {
+if (TARGET_HAVE_DMB)
+  {
+	/* Note we issue a system level barrier. We should consider issuing
+	   a inner shareabilty zone barrier here instead, ie. DMB ISH.  */
+	/* ??? Differentiate based on SEQ_CST vs less strict?  */
+	return dmb%?\tsy;
+  }
+
+if (TARGET_HAVE_DMB_MCR)
+  return mcr%?\tp15, 0, r0, c7, c10, 5;
+
+gcc_unreachable ();
+  }
+  [(set_attr length 4)
+   (set_attr conds nocond)
+   (set_attr predicable yes

[PATCH, ARM] MI-thunk fix for TARGET_THUMB1_ONLY

2014-06-08 Thread Chung-Lin Tang
Hi Richard, Ramana,

Attached is a small fix for resolving a g++.old-deja/g++.jason/thunk2.C
regression we found under a TARGET_THUMB1_ONLY multilib (-mthumb
-march=armv6-m to be exact). Basically under those conditions, the thunk
is in Thumb mode, so the subtraction should be 4 rather than 8.

Original patch was by Julian, with trivial adaptations for trunk by me.
We've been carrying this fix for a while by now. Okay for trunk? (and
stable branches?)

Thanks,
Chung-Lin

2014-06-08  Julian Brown  jul...@codesourcery.com
Chung-Lin Tang  clt...@codesourcery.com

* config/arm/arm.c (arm_output_mi_thunk): Fix offset for
TARGET_THUMB1_ONLY. Add comments.
Index: config/arm/arm.c
===
--- config/arm/arm.c	(revision 211353)
+++ config/arm/arm.c	(working copy)
@@ -28428,9 +28428,13 @@ arm_output_mi_thunk (FILE *file, tree thunk ATTRIB
   fputs (:\n, file);
   if (flag_pic)
 	{
-	  /* Output .word .LTHUNKn-7-.LTHUNKPCn.  */
+	  /* Output .word .LTHUNKn-[37]-.LTHUNKPCn.  */
 	  rtx tem = XEXP (DECL_RTL (function), 0);
-	  tem = plus_constant (GET_MODE (tem), tem, -7);
+	  /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC
+	 pipeline offset is four rather than eight.  Adjust the offset
+	 accordingly.  */
+	  tem = plus_constant (GET_MODE (tem), tem,
+			   TARGET_THUMB1_ONLY ? -3 : -7);
 	  tem = gen_rtx_MINUS (GET_MODE (tem),
 			   tem,
 			   gen_rtx_SYMBOL_REF (Pmode,


[PATCH, nios2] Misc. fixes

2014-04-01 Thread Chung-Lin Tang
This contains a few small changes/fixes, committed to trunk.

(1) a typo in nios2_function_profiler.
(2) unneeded parameter in nios2_large_got_address().
(3) remove two no longer needed unspec enums.
(4) Provide basic implementation of the delegitimize address hook, to
silent a unrecognized UNSPECs warning when building with -g.
(5) Adjust the nios2-linux LINK_SPEC to define the dynamic linker name
to /lib/ld-linux-nios2.so.1, which is the current upstreamed arrangement.

Chung-Lin

2014-04-01  Chung-Lin Tang  clt...@codesourcery.com

* config/nios2/nios2.md (unspec): Remove UNSPEC_TLS, UNSPEC_TLS_LDM.
* config/nios2/nios2.c (nios2_function_profiler): Fix addi operand
typo.
(nios2_large_got_address): Remove unneeded 'sym' parameter.
(nios2_got_address): Update nios2_large_got_address call site.
(nios2_delegitimize_address): New function.
(TARGET_DELEGITIMIZE_ADDRESS): Define to nios2_delegitimize_address.
* config/nios2/linux.h (GLIBC_DYNAMIC_LINKER): Define.
(LINK_SPEC): Specify dynamic linker using GNU_USER_DYNAMIC_LINKER.
Index: config/nios2/linux.h
===
--- config/nios2/linux.h	(revision 208987)
+++ config/nios2/linux.h	(working copy)
@@ -26,11 +26,16 @@
 }   \
   while (0)
 
+#define GLIBC_DYNAMIC_LINKER /lib/ld-linux-nios2.so.1
+
 #undef LINK_SPEC
 #define LINK_SPEC LINK_SPEC_ENDIAN \
-   %{shared:-shared} \
-%{static:-Bstatic} \
-%{rdynamic:-export-dynamic}
+ %{shared:-shared} \
+  %{!shared: \
+%{!static: \
+  %{rdynamic:-export-dynamic} \
+  -dynamic-linker  GNU_USER_DYNAMIC_LINKER } \
+%{static:-static}}
 
 /* This toolchain implements the ABI for Linux Systems documented in the
Nios II Processor Reference Handbook.  */
Index: config/nios2/nios2.md
===
--- config/nios2/nios2.md	(revision 208987)
+++ config/nios2/nios2.md	(working copy)
@@ -74,8 +74,6 @@
   UNSPEC_PIC_SYM
   UNSPEC_PIC_CALL_SYM
   UNSPEC_PIC_GOTOFF_SYM
-  UNSPEC_TLS
-  UNSPEC_TLS_LDM
   UNSPEC_LOAD_TLS_IE
   UNSPEC_ADD_TLS_LE
   UNSPEC_ADD_TLS_GD
Index: config/nios2/nios2.c
===
--- config/nios2/nios2.c	(revision 208987)
+++ config/nios2/nios2.c	(working copy)
@@ -695,7 +695,7 @@ nios2_function_profiler (FILE *file, int labelno A
   fprintf (file, \taddi\tr3, r3, %%lo(_gp_got - 1b)\n);
   fprintf (file, \tadd\tr2, r2, r3\n);
   fprintf (file, \tmovhi\tr3, %%call_hiadj(_mcount)\n);
-  fprintf (file, \taddi\tr3, %%call_lo(_mcount)\n);
+  fprintf (file, \taddi\tr3, r3, %%call_lo(_mcount)\n);
   fprintf (file, \tadd\tr3, r2, r3\n);
   fprintf (file, \tldw\tr2, 0(r3)\n);
   fprintf (file, \tcallr\tr2\n);
@@ -1183,7 +1183,7 @@ nios2_unspec_offset (rtx loc, int unspec)
 
 /* Generate GOT pointer based address with large offset.  */
 static rtx
-nios2_large_got_address (rtx sym, rtx offset)
+nios2_large_got_address (rtx offset)
 {
   rtx addr = gen_reg_rtx (Pmode);
   emit_insn (gen_add3_insn (addr, pic_offset_table_rtx,
@@ -1199,7 +1199,7 @@ nios2_got_address (rtx loc, int unspec)
   crtl-uses_pic_offset_table = 1;
 
   if (nios2_large_offset_p (unspec))
-return nios2_large_got_address (loc, offset);
+return nios2_large_got_address (offset);
 
   return gen_rtx_PLUS (Pmode, pic_offset_table_rtx, offset);
 }
@@ -1805,6 +1805,30 @@ nios2_legitimize_address (rtx x, rtx oldx ATTRIBUT
   return x;
 }
 
+static rtx
+nios2_delegitimize_address (rtx x)
+{
+  x = delegitimize_mem_from_attrs (x);
+
+  if (GET_CODE (x) == CONST  GET_CODE (XEXP (x, 0)) == UNSPEC)
+{
+  switch (XINT (XEXP (x, 0), 1))
+	{
+	case UNSPEC_PIC_SYM:
+	case UNSPEC_PIC_CALL_SYM:
+	case UNSPEC_PIC_GOTOFF_SYM:
+	case UNSPEC_ADD_TLS_GD:
+	case UNSPEC_ADD_TLS_LDM:
+	case UNSPEC_LOAD_TLS_IE:
+	case UNSPEC_ADD_TLS_LE:
+	  x = XVECEXP (XEXP (x, 0), 0, 0);
+	  gcc_assert (GET_CODE (x) == SYMBOL_REF);
+	  break;
+	}
+}
+  return x;
+}
+
 /* Main expander function for RTL moves.  */
 int
 nios2_emit_move_sequence (rtx *operands, enum machine_mode mode)
@@ -3259,6 +3283,9 @@ nios2_merge_decl_attributes (tree olddecl, tree ne
 #undef TARGET_LEGITIMIZE_ADDRESS
 #define TARGET_LEGITIMIZE_ADDRESS nios2_legitimize_address
 
+#undef TARGET_DELEGITIMIZE_ADDRESS
+#define TARGET_DELEGITIMIZE_ADDRESS nios2_delegitimize_address
+
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P nios2_legitimate_address_p
 


[PATCH, nios2] Fix frame pointer calculation

2014-03-11 Thread Chung-Lin Tang
The current Nios II prologue/epilogue code has a bug where the frame
pointer points to the start of the register save area, rather than the
frame slot where FP is saved (as specified the Nios II ABI).

This was only discovered relatively recently, as dwarf-based unwinding
is used most of the time, plus nios2 GDB's prologue analyzer is capable
of determining where FP is stored. Still this needs to fixed to be
conformant to the ABI.

Tested (both the compiler and gdb) and applied to trunk.

Chung-Lin

2014-03-11  Chung-Lin Tang  clt...@codesourcery.com

* config/nios2/nios2.c (machine_function): Add fp_save_offset
field.
(nios2_compute_frame_layout):
Add calculation of cfun-machine-fp_save_offset.
(nios2_expand_prologue): Correct setting of frame pointer
register in prologue.
(nios2_expand_epilogue): Update recovery of stack pointer from
frame pointer accordingly.
(nios2_initial_elimination_offset): Update calculation of offset
for eliminating to HARD_FRAME_POINTER_REGNUM.
Index: config/nios2/nios2.c
===
--- config/nios2/nios2.c(revision 208471)
+++ config/nios2/nios2.c(working copy)
@@ -81,8 +81,10 @@ struct GTY (()) machine_function
   int args_size;
   /* Number of bytes needed to store registers in frame.  */
   int save_reg_size;
-   /* Offset from new stack pointer to store registers.  */
+  /* Offset from new stack pointer to store registers.  */
   int save_regs_offset;
+  /* Offset from save_regs_offset to store frame pointer register.  */
+  int fp_save_offset;
   /* != 0 if frame layout already calculated.  */
   int initialized;
 };
@@ -390,6 +392,17 @@ nios2_compute_frame_layout (void)
  }
 }
 
+  cfun-machine-fp_save_offset = 0;
+  if (save_mask  (1  HARD_FRAME_POINTER_REGNUM))
+{
+  int fp_save_offset = 0;
+  for (regno = 0; regno  HARD_FRAME_POINTER_REGNUM; regno++)
+   if (save_mask  (1  regno))
+ fp_save_offset += 4;
+
+  cfun-machine-fp_save_offset = fp_save_offset;
+}
+
   save_reg_size = NIOS2_STACK_ALIGN (save_reg_size);
   total_size += save_reg_size;
   total_size += NIOS2_STACK_ALIGN (crtl-args.pretend_args_size);
@@ -450,8 +463,8 @@ nios2_expand_prologue (void)
 {
   unsigned int regno;
   int total_frame_size, save_offset;
-  int sp_offset; /* offset from base_reg to final stack value.  */
-  int fp_offset; /* offset from base_reg to final fp value.  */
+  int sp_offset;  /* offset from base_reg to final stack value.  */
+  int save_regs_base; /* offset from base_reg to register save area.  */
   rtx insn;
 
   total_frame_size = nios2_compute_frame_layout ();
@@ -468,8 +481,7 @@ nios2_expand_prologue (void)
gen_int_mode (cfun-machine-save_regs_offset
  - total_frame_size, Pmode)));
   RTX_FRAME_RELATED_P (insn) = 1;
-
-  fp_offset = 0;
+  save_regs_base = 0;
   sp_offset = -cfun-machine-save_regs_offset;
 }
   else if (total_frame_size)
@@ -478,16 +490,16 @@ nios2_expand_prologue (void)
   gen_int_mode (-total_frame_size,
 Pmode)));
   RTX_FRAME_RELATED_P (insn) = 1;
-  fp_offset = cfun-machine-save_regs_offset;
+  save_regs_base = cfun-machine-save_regs_offset;
   sp_offset = 0;
 }
   else
-fp_offset = sp_offset = 0;
+save_regs_base = sp_offset = 0;
 
   if (crtl-limit_stack)
 nios2_emit_stack_limit_check ();
 
-  save_offset = fp_offset + cfun-machine-save_reg_size;
+  save_offset = save_regs_base + cfun-machine-save_reg_size;
 
   for (regno = LAST_GP_REG; regno  0; regno--)
 if (cfun-machine-save_mask  (1  regno))
@@ -498,9 +510,10 @@ nios2_expand_prologue (void)
 
   if (frame_pointer_needed)
 {
+  int fp_save_offset = save_regs_base + cfun-machine-fp_save_offset;
   insn = emit_insn (gen_add3_insn (hard_frame_pointer_rtx,
   stack_pointer_rtx,
-  gen_int_mode (fp_offset, Pmode)));
+  gen_int_mode (fp_save_offset, Pmode)));
   RTX_FRAME_RELATED_P (insn) = 1;
 }
 
@@ -555,7 +568,9 @@ nios2_expand_epilogue (bool sibcall_p)
   if (frame_pointer_needed)
 {
   /* Recover the stack pointer.  */
-  insn = emit_move_insn (stack_pointer_rtx, hard_frame_pointer_rtx);
+  insn = emit_insn (gen_add3_insn
+   (stack_pointer_rtx, hard_frame_pointer_rtx,
+gen_int_mode (-cfun-machine-fp_save_offset, Pmode)));
   cfa_adj = plus_constant (Pmode, stack_pointer_rtx,
   (total_frame_size
- cfun-machine-save_regs_offset));
@@ -772,7 +787,8 @@ nios2_initial_elimination_offset (int from, int to
 /* If we are asked for the frame pointer offset

Re: [PATCH/middle-end 2/6] __builtin_thread_pointer and AARCH64 ILP32

2014-02-24 Thread Chung-Lin Tang
On 2014/2/25 上午 10:06, Andrew Pinski wrote:
 On Wed, Dec 4, 2013 at 9:42 AM, Yufeng Zhang yufeng.zh...@arm.com wrote:
 On 12/03/13 21:24, Andrew Pinski wrote:

 Hi,
With ILP32 AARCH64, Pmode (DImode) != ptrmode (SImode) so the variable
 decl
 has a mode of SImode while the register is DImode.  So the target that
 gets
 passed down to expand_builtin_thread_pointer is NULL as expand does not
 know how to get a subreg for a pointer type.

 This fixes the problem by handling a NULL target like we are able to
 handle
 for a non register/correct mode target inside
 expand_builtin_thread_pointer.

 OK?  Build and tested for aarch64-elf with no regressions.

 Thanks,
 Andrew Pinski

 * builtins.c (expand_builtin_thread_pointer): Create a new target
 when the target is NULL.
 ---
   gcc/ChangeLog  |5 +
   gcc/builtins.c |2 +-
   2 files changed, 6 insertions(+), 1 deletions(-)

 diff --git a/gcc/builtins.c b/gcc/builtins.c
 index 4f1c818..66797fa 100644
 --- a/gcc/builtins.c
 +++ b/gcc/builtins.c
 @@ -5699,7 +5699,7 @@ expand_builtin_thread_pointer (tree exp, rtx target)
 if (icode != CODE_FOR_nothing)
   {
 struct expand_operand op;
 -  if (!REG_P (target) || GET_MODE (target) != Pmode)
 +  if (target == NULL_RTX || !REG_P (target) || GET_MODE (target) !=
 Pmode)
 target = gen_reg_rtx (Pmode);
 create_output_operand (op, target, Pmode);
 expand_insn (icode, 1,op);


 Shouldn't thread pointer have ptr_mode instead?  I'm aware that on AArch64
 the thread pointer system register tpidr_el0 is 64-bit wide regardless of
 ILP32 or not, but in the abstracted view of AArch64 ILP32 world, the thread
 pointer shall be a 32-bit pointer; the OS should have taken care of the
 hardware register tpidr_el0 by having its higher 32 bits cleared.  I think
 expand_builtin_thread_pointer and expand_builtin_set_thread_pointer should
 use ptr_mode instead.  Correct me if I missed anything.

 Add Chung-Lin Tang to the CC list; Chung-Lin wrote these builtins in r192364
 
 Pmode seems more correct as the hardware mode is Pmode, having it
 ptr_mode would expose the C size of the pointer rather than the
 hardware size.
 
 Thanks,
 Andrew Pinski

Don't have any approval powers, but the fix looks rather obvious.

Thanks,
Chung-Lin



[PATCH, nios2] Large -fPIC support

2014-02-20 Thread Chung-Lin Tang
This patch adds large GOT support for the Nios II backend. Tested by
running glibc tests with -fPIC forced on.  A few smaller libgcc fixes
are also included as well. Patch committed.

Chung-Lin

2014-02-20  Chung-Lin Tang  clt...@codesourcery.com
Sandra Loosemore  san...@codesourcery.com

gcc/
* config/nios2/nios2.md (unspec): Add UNSPEC_PIC_GOTOFF_SYM enum.
* config/nios2/nios2.c (nios2_function_profiler):
Add -fPIC (flag_pic == 2) support.
(nios2_handle_custom_fpu_cfg): Fix warning parameter.
(nios2_large_offset_p): New function.
(nios2_unspec_reloc_p): Move up position, update to use
nios2_large_offset_p.
(nios2_unspec_address): Remove function.
(nios2_unspec_offset): New function.
(nios2_large_got_address): New function.
(nios2_got_address): Add large offset support.
(nios2_legitimize_tls_address): Update usage of removed and new
functions.
(nios2_symbol_binds_local_p): New function.
(nios2_load_pic_address): Add -fPIC (flag_pic == 2) support.
(nios2_legitimize_address): Update to use nios2_large_offset_p.
(nios2_emit_move_sequence): Avoid legitimizing (const (unspec ...)).
(nios2_print_operand): Merge H/L processing, add hiadj/lo
processing for (const (unspec ...)).
(nios2_unspec_reloc_name): Add UNSPEC_PIC_GOTOFF_SYM case.

gcc/testsuite/
* gcc.target/nios2/biggot-1.c: New.
* gcc.target/nios2/biggot-2.c: New.

libgcc/
* config/nios2/t-nios2 (CRTSTUFF_T_CFLAGS): Add -mno-gpopt.
* config/nios2/crti.S: Remove .file directive.
* config/nios2/crtn.S: Likewise.
Index: gcc/config/nios2/nios2.c
===
--- gcc/config/nios2/nios2.c	(revision 207964)
+++ gcc/config/nios2/nios2.c	(working copy)
@@ -664,7 +664,7 @@ void
 nios2_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
 {
   fprintf (file, \tmov\tr8, ra\n);
-  if (flag_pic)
+  if (flag_pic == 1)
 {
   fprintf (file, \tnextpc\tr2\n);
   fprintf (file, \t1: movhi\tr3, %%hiadj(_gp_got - 1b)\n);
@@ -673,6 +673,18 @@ nios2_function_profiler (FILE *file, int labelno A
   fprintf (file, \tldw\tr2, %%call(_mcount)(r2)\n);
   fprintf (file, \tcallr\tr2\n);
 }
+  else if (flag_pic == 2)
+{
+  fprintf (file, \tnextpc\tr2\n);
+  fprintf (file, \t1: movhi\tr3, %%hiadj(_gp_got - 1b)\n);
+  fprintf (file, \taddi\tr3, r3, %%lo(_gp_got - 1b)\n);
+  fprintf (file, \tadd\tr2, r2, r3\n);
+  fprintf (file, \tmovhi\tr3, %%call_hiadj(_mcount)\n);
+  fprintf (file, \taddi\tr3, %%call_lo(_mcount)\n);
+  fprintf (file, \tadd\tr3, r2, r3\n);
+  fprintf (file, \tldw\tr2, 0(r3)\n);
+  fprintf (file, \tcallr\tr2\n);
+}
   else
 fprintf (file, \tcall\t_mcount\n);
   fprintf (file, \tmov\tra, r8\n);
@@ -920,7 +932,7 @@ nios2_handle_custom_fpu_cfg (const char *cfgname,
 }
   else
 warning (0, ignoring unrecognized switch %-mcustom-fpu-cfg% 
-	 value %%s%, cfg);
+	 value %%s%, cfgname);
 
   /* Guard against errors in the standard configurations.  */
   nios2_custom_check_insns ();
@@ -1116,20 +1128,64 @@ nios2_call_tls_get_addr (rtx ti)
   return ret;
 }
 
+/* Return true for large offsets requiring hiadj/lo relocation pairs.  */
+static bool
+nios2_large_offset_p (int unspec)
+{
+  gcc_assert (nios2_unspec_reloc_name (unspec) != NULL);
+
+  if (flag_pic == 2
+  /* FIXME: TLS GOT offset relocations will eventually also get this
+	 treatment, after binutils support for those are also completed.  */
+   (unspec == UNSPEC_PIC_SYM || unspec == UNSPEC_PIC_CALL_SYM))
+return true;
+
+  /* 'gotoff' offsets are always hiadj/lo.  */
+  if (unspec == UNSPEC_PIC_GOTOFF_SYM)
+return true;
+
+  return false;
+}
+
+/* Return true for conforming unspec relocations.  Also used in
+   constraints.md and predicates.md.  */
+bool
+nios2_unspec_reloc_p (rtx op)
+{
+  return (GET_CODE (op) == CONST
+	   GET_CODE (XEXP (op, 0)) == UNSPEC
+	   ! nios2_large_offset_p (XINT (XEXP (op, 0), 1)));
+}
+
+/* Helper to generate unspec constant.  */
 static rtx
-nios2_unspec_address (rtx loc, rtx base_reg, int unspec)
+nios2_unspec_offset (rtx loc, int unspec)
 {
-  rtx unspec_offset =
-gen_rtx_CONST (Pmode, gen_rtx_UNSPEC (Pmode, gen_rtvec (1, loc),
-	  unspec));
-  return gen_rtx_PLUS (Pmode, base_reg, unspec_offset);
+  return gen_rtx_CONST (Pmode, gen_rtx_UNSPEC (Pmode, gen_rtvec (1, loc),
+	   unspec));
 }
 
+/* Generate GOT pointer based address with large offset.  */
 static rtx
+nios2_large_got_address (rtx sym, rtx offset)
+{
+  rtx addr = gen_reg_rtx (Pmode);
+  emit_insn (gen_add3_insn (addr, pic_offset_table_rtx,
+			force_reg (Pmode, offset)));
+  return addr;
+}
+
+/* Generate a GOT pointer based address.  */
+static rtx
 nios2_got_address (rtx loc, int unspec)
 {
+  rtx offset

[PATCH, nios2, committed] PR59784, fextsd asm output fix

2014-01-29 Thread Chung-Lin Tang
Hi Savin,

I've committed your patch for PR59784; the fix seems small enough to
accept directly. Thanks a lot for catching this.

Thanks,
Chung-Lin

2014-01-30  Savin Zlobec  savin.zlo...@gmail.com

PR target/59784
* config/nios2/nios2.c (nios2_fpu_insn_asm): Fix asm output of
SFmode to DFmode case.
Index: config/nios2/nios2.c
===
--- config/nios2/nios2.c(revision 207296)
+++ config/nios2/nios2.c(working copy)
@@ -2066,7 +2066,8 @@ nios2_fpu_insn_asm (enum n2fpu_code code)
}
   else
{
- op1 = %0; op2 = %1;
+ op1 = (dst_mode == DFmode ? %D0 : %0);
+ op2 = %1;
  op3 = (num_operands == 2 ? zero : %2);
}
 }


Re: Reload codegen improvement

2014-01-23 Thread Chung-Lin Tang
On 14/1/8 12:22 AM, Bernd Schmidt wrote:
 This fixes a problem identified by Chung-Lin. Once reload is done, all
 equivalencing insns for pseudos that didn't get a hard reg but could be
 eliminated using their REG_EQUIV are deleted. However, we still can
 produce reloads and reload insns for them in certain cases, leading to
 unnecessary spilling. This patch corrects this by making sure we use
 identical tests when deciding whether to ignore an insn while reloading,
 and whether to delete it afterwards.
 
 Bootstrapped and tested on x86_64-linux (with lra_p disabled). Chung-Lin
 says he's tested it as well, I think on arm (probably with something 4.8
 based). Will commit in a few days if no objections.
 
 
 Bernd

Hi Bernd, this does not seem to be committed yet.

Thanks,
Chung-Lin



Re: [buildrobot] [PATCH] Fix redefinition of BITS_PER_UNIT

2014-01-01 Thread Chung-Lin Tang
On 2014/1/1 02:45 PM, Mike Stump wrote:
 On Dec 31, 2013, at 12:26 PM, Jan-Benedict Glaw jbg...@lug-owl.de wrote:
 On Tue, 2013-12-31 15:24:52 +0800, Chung-Lin Tang clt...@codesourcery.com 
 wrote:
 The nios2 port was just committed. Thanks to all that gave time and
 effort to review this.

 Just a heads-up: I see a lot of warnings about BITS_PER_UNIT being
 redefined, see eg.
 http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=74923
 as an example.


 2013-12-31  Jan-Benedict Glaw  jbg...@lug-owl.de

  * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it.

 diff --git a/gcc/config/nios2/nios2.h b/gcc/config/nios2/nios2.h
 index 8e6941b..f333be3 100644
 --- a/gcc/config/nios2/nios2.h
 +++ b/gcc/config/nios2/nios2.h
 @@ -73,7 +73,6 @@
 #define BITS_BIG_ENDIAN 0
 #define BYTES_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0)
 #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0)
 -#define BITS_PER_UNIT 8
 #define BITS_PER_WORD 32
 #define UNITS_PER_WORD 4
 #define POINTER_SIZE 32


 Ok?
 
 Ok.
 

Thanks for catching that.



nios2 port committed (Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts)

2013-12-30 Thread Chung-Lin Tang
On 2013/12/28 02:29 PM, Chung-Lin Tang wrote:
 On 13/12/23 12:54 AM, Chung-Lin Tang wrote:
 Other than these two, I think this can go in.
 Bernd
 Attached is the updated patch for the compiler.

 Since Bernd is a Global Reviewer, am I clear for committing the port
 now? (including the testsuite and libgcc parts)
 
 I will be taking Bernd's prior mail as an approval. For avoidance of
 doubt, unless there are more comments raised, I will be committing the
 port to trunk next week.

The nios2 port was just committed. Thanks to all that gave time and
effort to review this.

Thanks,
Chung-Lin



Re: Question about simple_return pattern for the GCC ARM backend.

2013-12-28 Thread Chung-Lin Tang
On 2013/12/28 09:31 AM, Yangfei (Felix) wrote:
 Hi, 
 
   I think that simple_return standard pattern is useful for the ARM. I mean 
 it should be good for target code performance.
  But seems this pattern is not there for the GCC ARM backend. Can anyone 
 explain the reason why we don’t need this?
 
 Cheers,
 Fei
 

It does use it. Search for the return_strreturn expand pattern, and
the returns code iterator in config/arm/iterators.md.

Chung-Lin



Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-12-27 Thread Chung-Lin Tang
On 13/12/23 12:54 AM, Chung-Lin Tang wrote:
 Other than these two, I think this can go in.
  Bernd
 Attached is the updated patch for the compiler.
 
 Since Bernd is a Global Reviewer, am I clear for committing the port
 now? (including the testsuite and libgcc parts)

I will be taking Bernd's prior mail as an approval. For avoidance of
doubt, unless there are more comments raised, I will be committing the
port to trunk next week.

Thanks,
Chung-Lin



Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-12-16 Thread Chung-Lin Tang
Ping x3.

On 13/12/10 12:57 PM, Chung-Lin Tang wrote:
 Ping x2.
 
 On 2013/12/5 12:19 PM, Chung-Lin Tang wrote:
 Ping.

 On 2013/11/26 02:45 PM, Chung-Lin Tang wrote:
 Hi Bernd,
 I've updated the patch again, please see if it looks fit for approval
 now. Including ChangeLog again for completeness.

 Thanks,
 Chung-Lin

 2013-11-26  Chung-Lin Tang  clt...@codesourcery.com
 Sandra Loosemore  san...@codesourcery.com
 Based on patches from Altera Corporation

 * config.gcc (nios2-*-*): Add nios2 config targets.
 * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case.
 ($cpu_type): Add nios2 as new cpu type.
 * configure: Regenerate.
 * config/nios2/nios2.c: New file.
 * config/nios2/nios2.h: New file.
 * config/nios2/nios2-opts.h: New file.
 * config/nios2/nios2-protos.h: New file.
 * config/nios2/elf.h: New file.
 * config/nios2/elf.opt: New file.
 * config/nios2/linux.h: New file.
 * config/nios2/nios2.opt: New file.
 * config/nios2/nios2.md: New file.
 * config/nios2/predicates.md: New file.
 * config/nios2/constraints.md: New file.
 * config/nios2/t-nios2: New file.
 * common/config/nios2/nios2-common.c: New file.
 * doc/invoke.texi (Nios II options): Document Nios II specific
 options.
 * doc/md.texi (Nios II family): Document Nios II specific
 constraints.
 * doc/extend.texi (Function Specific Option Pragmas): Document
 Nios II supported target pragma functionality.


 



Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-12-09 Thread Chung-Lin Tang
Ping x2.

On 2013/12/5 12:19 PM, Chung-Lin Tang wrote:
 Ping.
 
 On 2013/11/26 02:45 PM, Chung-Lin Tang wrote:
 Hi Bernd,
 I've updated the patch again, please see if it looks fit for approval
 now. Including ChangeLog again for completeness.

 Thanks,
 Chung-Lin

 2013-11-26  Chung-Lin Tang  clt...@codesourcery.com
 Sandra Loosemore  san...@codesourcery.com
 Based on patches from Altera Corporation

 * config.gcc (nios2-*-*): Add nios2 config targets.
 * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case.
 ($cpu_type): Add nios2 as new cpu type.
 * configure: Regenerate.
 * config/nios2/nios2.c: New file.
 * config/nios2/nios2.h: New file.
 * config/nios2/nios2-opts.h: New file.
 * config/nios2/nios2-protos.h: New file.
 * config/nios2/elf.h: New file.
 * config/nios2/elf.opt: New file.
 * config/nios2/linux.h: New file.
 * config/nios2/nios2.opt: New file.
 * config/nios2/nios2.md: New file.
 * config/nios2/predicates.md: New file.
 * config/nios2/constraints.md: New file.
 * config/nios2/t-nios2: New file.
 * common/config/nios2/nios2-common.c: New file.
 * doc/invoke.texi (Nios II options): Document Nios II specific
 options.
 * doc/md.texi (Nios II family): Document Nios II specific
 constraints.
 * doc/extend.texi (Function Specific Option Pragmas): Document
 Nios II supported target pragma functionality.

 



Re: [PATCH] Hexadecimal numbers in option arguments

2013-12-09 Thread Chung-Lin Tang
On 2013/7/14 09:27 PM, Joseph S. Myers wrote:
 On Sun, 14 Jul 2013, Chung-Lin Tang wrote:
 
 Original patch posted as part of Nios II patches:
 http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01087.html

 This patch is to allow hexadecimal numbers to be used in option
 arguments, e.g. -falign-loops=0x10 can now be used as equivalent to
 -falign-loops=16.

 Joseph, the patch has been modified to use IXDIGIT to check the argument
 string first, as you suggested in the last submission. Is this okay for
 trunk?
 
 This version looks like it will allow plain 0x or 0X as an argument, 
 treating it as 0, rather than treating it as an error (i.e., you need to 
 check there is at least one hex digit after the 0x or 0X before 
 passing the string to strtol).
 

Hi Joseph,

Forgot to follow up on this patch. Here it is with a small update to
check if 'p' got updated to a difference position. Does this now look okay?

Thanks,
Chung-Lin

Index: opts-common.c
===
--- opts-common.c	(revision 205847)
+++ opts-common.c	(working copy)
@@ -147,7 +147,7 @@ find_opt (const char *input, unsigned int lang_mas
   return match_wrong_lang;
 }
 
-/* If ARG is a non-negative integer made up solely of digits, return its
+/* If ARG is a non-negative decimal or hexadecimal integer, return its
value, otherwise return -1.  */
 
 int
@@ -161,6 +161,17 @@ integral_argument (const char *arg)
   if (*p == '\0')
 return atoi (arg);
 
+  /* It wasn't a decimal number - try hexadecimal.  */
+  if (arg[0] == '0'  (arg[1] == 'x' || arg[1] == 'X'))
+{
+  p = arg + 2;
+  while (*p  ISXDIGIT (*p))
+	p++;
+
+  if (p != arg + 2  *p == '\0')
+	return strtol (arg, NULL, 16);
+}
+
   return -1;
 }
 


Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-12-04 Thread Chung-Lin Tang
Ping.

On 2013/11/26 02:45 PM, Chung-Lin Tang wrote:
 Hi Bernd,
 I've updated the patch again, please see if it looks fit for approval
 now. Including ChangeLog again for completeness.
 
 Thanks,
 Chung-Lin
 
 2013-11-26  Chung-Lin Tang  clt...@codesourcery.com
 Sandra Loosemore  san...@codesourcery.com
 Based on patches from Altera Corporation
 
 * config.gcc (nios2-*-*): Add nios2 config targets.
 * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case.
 ($cpu_type): Add nios2 as new cpu type.
 * configure: Regenerate.
 * config/nios2/nios2.c: New file.
 * config/nios2/nios2.h: New file.
 * config/nios2/nios2-opts.h: New file.
 * config/nios2/nios2-protos.h: New file.
 * config/nios2/elf.h: New file.
 * config/nios2/elf.opt: New file.
 * config/nios2/linux.h: New file.
 * config/nios2/nios2.opt: New file.
 * config/nios2/nios2.md: New file.
 * config/nios2/predicates.md: New file.
 * config/nios2/constraints.md: New file.
 * config/nios2/t-nios2: New file.
 * common/config/nios2/nios2-common.c: New file.
 * doc/invoke.texi (Nios II options): Document Nios II specific
 options.
 * doc/md.texi (Nios II family): Document Nios II specific
 constraints.
 * doc/extend.texi (Function Specific Option Pragmas): Document
 Nios II supported target pragma functionality.
 



Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-11-22 Thread Chung-Lin Tang
On 13/11/22 10:31 PM, Bernd Schmidt wrote:
 If you don't object, I'll keep the clobbers there for now.
 
 If they serve no purpose (and I think they don't), they should go.

I'll check again, but I remember df_regs_ever_live_p doesn't include the
RA reg if the call pattern doesn't have the clobber.

I see c6x doesn't have that clobber either, but checks crtl-is_leaf
instead in the frame layout. Looking across the backends, adding a
clobber appears to be the more usual style.

 +  if (!strncasecmp (cfg, 60-1, 4))

 strcmp, several times. At least judging by the docs allowing 60-1fish
 is unintentional?
 I changed them to use strncmp instead. This routine has to work on a
 possibly longer target attribute string, hence the 'n' variants.
 
 I don't understand this. Using strncmp matches 60-1 and any other
 string beginning with that prefix, but doesn't distinguish between them.
 If that really is the desired behaviour, it needs a comment, but it
 still looks to me as if this is just lacking proper error checking.

This has to work on target attribute strings, which it may be in the
middle of the string. But your right that the single option form may
allow the extended suffix, I'll fix this next.

 +#define SC_REGNO (12)
 +#define STATIC_CHAIN_REGNUM SC_REGNO

 There are a lot of these pairs, and it looks like unnecessary double
 indirection. Lose the former in all cases. No parens around constants.
 I've reorganized a lot of this in nios2.h.
 
 I'd prefer the new #if 0 blocks there to go away.

Argh, forgot to delete that. Thanks for catching.

I'll address the rest in the next revision.

Thanks,
Chung-Lin





Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-11-21 Thread Chung-Lin Tang
On 2013/11/21 03:25 PM, Richard Henderson wrote:
 On 11/21/2013 02:41 PM, Chung-Lin Tang wrote:
 I'm not saying we tolerate wrong RTL form, but rather that, it should
 be an issue of the RTL passes, not the backend. The backend should just
 be as much as possible, a  machine description.
 
 Matching non-canonical rtl does nothing but slow down the compiler and cause
 confusion for the poor person that has to maintain the code later.

As I mentioned in the last mail, I will modify the constraints as
suggested for the *norsi3 and EQ/NE cmp patterns in the final patch.

Chung-Lin



Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-11-20 Thread Chung-Lin Tang
On 13/11/20 1:34 AM, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 +;;  Integer logical Operations
 +
 +(define_code_iterator LOGICAL [and ior xor])
 +(define_code_attr logical_asm [(and and) (ior or) (xor xor)])
 +
 +(define_insn codesi3
 +  [(set (match_operand:SI 0 register_operand =r,r,r)
 +(LOGICAL:SI (match_operand:SI 1 register_operand %r,r,r)
 +(match_operand:SI 2 logical_operand  rM,J,K)))]
 +  
 +  @
 +logical_asm\\t%0, %1, %z2
 +logical_asm%i2\\t%0, %1, %2
 +logical_asmh%i2\\t%0, %1, %U2
 +  [(set_attr type alu)])
 +
 +(define_insn *norsi3
 +  [(set (match_operand:SI 0 register_operand  =r)
 +(and:SI (not:SI (match_operand:SI 1 register_operand  %r))
 +(not:SI (match_operand:SI 2 reg_or_0_operand  rM]
 +  
 +  nor\\t%0, %1, %z2
 +  [(set_attr type alu)])

 M constraints (for const0_rtx) and reg_or_0 seem unnecessary, no such
 RTL should make it to this point.

 Such RTL does appear under -O0. Removing the 'M' will also
 require a bit of re-working the operand printing mechanics; not a lot of
 work, but I'd rather keep it as is. The behavior of using the zero
 register for a 0-value is also more expected in nios2, I think.

Will any RTL-level passes, say for example, propagate a zero-constant
into the pattern, without also simplifying it to a NOT pattern? (that's
a question, I don't really know)

Personally, I haven't really seen many cases optimizable to NOR, but I
think keeping the 'M' there is pretty harmless.

 Hmm, but if we get (not (const_int 0)) then that sounds like a bug,
 since (and (not X) (not (const_int 0))) should reduce to (not X).
 IMO target-independent code shouldn't try to create the nor-with-0
 form and backends shouldn't match it.
 
 Why would removing 'M' affect the printing mechanism?  Naively I'd
 have expected:

The *norsi3 here is of course straightforward. I was referring to other
cases, like the AND/IOR/XOR pattern above, where I wanted to combine
them into a single alternative. That needs a bit more work to reorganize
the nios2_print_operand() cases.

 (define_insn *norsi3
   [(set (match_operand:SI 0 register_operand  =r)
 (and:SI (not:SI (match_operand:SI 1 register_operand  r))
 (not:SI (match_operand:SI 2 register_operand  r]
   
   nor\\t%0, %1, %2
   [(set_attr type alu)])
 
 to just work.

That will definitely work, though I don't think the zero case does any harm.

 +;; Integer comparisons
 +
 +(define_code_iterator EQNE [eq ne])
 +(define_insn nios2_cmpcode
 +  [(set (match_operand:SI 0 register_operand   =r)
 +(EQNE:SI (match_operand:SI 1 reg_or_0_operand %rM)
 + (match_operand:SI 2 arith_operand rI)))]
 +  
 +  cmpcode%i2\\t%0, %z1, %z2
 +  [(set_attr type alu)])

 Once again, using reg_or_0 and M seems pointless.

 The compares don't support all operations, with only the second operand
 capable of an immediate. Using 'rM' should theoretically allow more
 commutative swapping.
 
 But rtl-wise, we should never generate an EQ or NE with two constants.
 And if one operand is constant then it's supposed to be the second.
 
 The % should give commutativity on its own, without the M.

For EQ/NE I guess that's the case, for the other comparisons I'm not so
sure; I'm not familiar enough with the details of the expander machinery
to claim anything.

If this doesn't carry to other comparisons, I intend to keep it in line
with the other cmp patterns. I experimented a bit with the generated
code, nothing is affected.

 +  emit_insn
 +  (gen_rtx_SET (Pmode, tmp,
 +gen_int_mode (cfun-machine-save_regs_offset,
 +  Pmode)));

 Shouldn't have a mode on the SET, but really should just call
 emit_move_insn. Similar cases elsewhere, search for gen_rtx_SET (Pmode.

 I've removed the modes on SET, though I prefer the more bare generation
 of the insns in some contexts; emit_move_insn() seems to have a lot
 under the hood.
 
 There shouldn't be anything to be afraid of though.  Target-independent
 code would use emit_move_insn for this though, so it needs to just work.

It will work, and I did use it in some places, though I did not
exhaustively search-and-replace.

For (subtle) performance reasons, emit_move_insn() really does too much
as a backend utility. Usually backend code is already very precise on
what we want to generate.

 +  HOST_WIDE_INT var_size;   /* # of var. bytes allocated.  */

 Not the first time they occur in this file, but I suppose I should
 mention that these in-line comments are probably just outside our coding
 guidelines.

 Deleted the comments outside the struct defs.
 
 FWIW, I think it was more that comments should be above the field rather
 than tagged on the right.  (One of the big problems with right-column comments
 is that people tend to make them too short.)

I will change to use the over

Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-11-20 Thread Chung-Lin Tang
On 13/11/21 7:21 AM, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 On 13/11/20 1:34 AM, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 +;;  Integer logical Operations
 +
 +(define_code_iterator LOGICAL [and ior xor])
 +(define_code_attr logical_asm [(and and) (ior or) (xor xor)])
 +
 +(define_insn codesi3
 +  [(set (match_operand:SI 0 register_operand =r,r,r)
 +(LOGICAL:SI (match_operand:SI 1 register_operand %r,r,r)
 +(match_operand:SI 2 logical_operand  rM,J,K)))]
 +  
 +  @
 +logical_asm\\t%0, %1, %z2
 +logical_asm%i2\\t%0, %1, %2
 +logical_asmh%i2\\t%0, %1, %U2
 +  [(set_attr type alu)])
 +
 +(define_insn *norsi3
 +  [(set (match_operand:SI 0 register_operand  =r)
 +(and:SI (not:SI (match_operand:SI 1 register_operand  %r))
 +(not:SI (match_operand:SI 2 reg_or_0_operand  
 rM]
 +  
 +  nor\\t%0, %1, %z2
 +  [(set_attr type alu)])

 M constraints (for const0_rtx) and reg_or_0 seem unnecessary, no such
 RTL should make it to this point.

 Such RTL does appear under -O0. Removing the 'M' will also
 require a bit of re-working the operand printing mechanics; not a lot of
 work, but I'd rather keep it as is. The behavior of using the zero
 register for a 0-value is also more expected in nios2, I think.

 Will any RTL-level passes, say for example, propagate a zero-constant
 into the pattern, without also simplifying it to a NOT pattern? (that's
 a question, I don't really know)
 
 No, they shouldn't, because only the simplified form is expected to match.
 E.g. forwprop goes to some lengths to do this properly.
 
 Personally, I haven't really seen many cases optimizable to NOR, but I
 think keeping the 'M' there is pretty harmless.
 
 That's not the way to look at it though.  There should be no
 unsimplified rtl in the instruction stream, just like there should
 be no non-canonical rtl.  We don't have rules for canonical rtl
 because the other forms are harmful (in the sense of generating wrong
 code or whatever).  We have them so that there aren't too many possible
 representations of the same thing, and so that code dealing with existing
 rtl patterns can validly expect one shape over another.  If we see
 something that's generating the wrong rtl form, we should fix it rather
 than pander to it.
 IMO having (and (not (const_int 0)) (not X)) as an alternative
 representation of (not X) falls directly into that category.

I'm not saying we tolerate wrong RTL form, but rather that, it should
be an issue of the RTL passes, not the backend. The backend should just
be as much as possible, a  machine description.

Saying you don't need to describe a particular thing in the backend
because GCC doesn't need that sounds like a spill-over of internal
details. In theory, a backend writer shouldn't need to care for such things.

 +;; Integer comparisons
 +
 +(define_code_iterator EQNE [eq ne])
 +(define_insn nios2_cmpcode
 +  [(set (match_operand:SI 0 register_operand   =r)
 +(EQNE:SI (match_operand:SI 1 reg_or_0_operand %rM)
 + (match_operand:SI 2 arith_operand rI)))]
 +  
 +  cmpcode%i2\\t%0, %z1, %z2
 +  [(set_attr type alu)])

 Once again, using reg_or_0 and M seems pointless.

 The compares don't support all operations, with only the second operand
 capable of an immediate. Using 'rM' should theoretically allow more
 commutative swapping.

 But rtl-wise, we should never generate an EQ or NE with two constants.
 And if one operand is constant then it's supposed to be the second.

 The % should give commutativity on its own, without the M.

 For EQ/NE I guess that's the case, for the other comparisons I'm not so
 sure; I'm not familiar enough with the details of the expander machinery
 to claim anything.
 
 Sure, but the point is that EQ and NE _are_ commutative, and rtl.texi
 says that:
 
   For commutative binary operations, constants should be placed in the
   second operand.
 
 So...
 
 If this doesn't carry to other comparisons, I intend to keep it in line
 with the other cmp patterns. I experimented a bit with the generated
 code, nothing is affected.
 
 ...no, it doesn't carry over to other comparisons, but it's not supposed to.
 Just like you can't add % to the other comparisons.
 
 The patterns are including constraints whose only purpose is to match
 non-canonical rtl.  That shouldn't happen.

My argument is the same as above.


Having that said, I'll edit the *norsi3 and EQNE patterns to remove the
'M' like you suggested in the final patch. This is too trivial an issue
to get postponed by.

 +  emit_insn
 +(gen_rtx_SET (Pmode, tmp,
 +  gen_int_mode (cfun-machine-save_regs_offset,
 +Pmode)));

 Shouldn't have a mode on the SET, but really should just call
 emit_move_insn. Similar cases elsewhere, search for gen_rtx_SET (Pmode.

 I've removed the modes

Re: [PATCH][3/3] Re-submission of Altera Nios II port, libgcc parts

2013-11-16 Thread Chung-Lin Tang
On 2013/7/14 03:55 PM, Chung-Lin Tang wrote:
 nios2 libgcc parts. Since the original post, the only main change has
 been the fdpbit vs soft-fp issue raised by Joseph, which has been
 resolved. Other parts are mostly the same.

The Nios II libgcc parts have been further updated to include a
sfp-machine.h file, and the Linux atomic cmpxchg updated to now use a
fixed address kernel helper cmpxchg routine, similar to ARM.

Thanks,
Chung-Lin

2013-11-16  Sandra Loosemore  san...@codesourcery.com
Chung-Lin Tang  clt...@codesourcery.com
Based on patches from Altera Corporation

* config.host (nios2-*-*,nios2-*-linux*): Add nios2 host cases.
* config/nios2/lib2-nios2.h: New file.
* config/nios2/lib2-divmod-hi.c: New file.
* config/nios2/linux-unwind.h: New file.
* config/nios2/lib2-divmod.c: New file.
* config/nios2/linux-atomic.c: New file.
* config/nios2/t-nios2: New file.
* config/nios2/crti.asm: New file.
* config/nios2/t-linux: New file.
* config/nios2/lib2-divtable.c: New file.
* config/nios2/lib2-mul.c: New file.
* config/nios2/tramp.c: New file.
* config/nios2/crtn.asm: New file.
* config/nios2/sfp-machine.h: New file.

Index: libgcc/config.host
===
--- libgcc/config.host	(revision 204897)
+++ libgcc/config.host	(working copy)
@@ -146,6 +146,9 @@ mips*-*-*)
 nds32*-*)
 	cpu_type=nds32
 	;;
+nios2*-*-*)
+	cpu_type=nios2
+	;;
 powerpc*-*-*)
 	cpu_type=rs6000
 	;;
@@ -876,6 +879,15 @@ nds32*-elf*)
 		;;
 	esac
 	;;
+nios2-*-linux*)
+	tmake_file=$tmake_file nios2/t-nios2 nios2/t-linux t-libgcc-pic t-slibgcc-libgcc
+	extra_parts=$extra_parts crti.o crtn.o
+	md_unwind_header=nios2/linux-unwind.h
+	;;
+nios2-*-*)
+	tmake_file=$tmake_file nios2/t-nios2 t-softfp-sfdf t-softfp-excl t-softfp
+	extra_parts=$extra_parts crti.o crtn.o
+	;;
 pdp11-*-*)
 	tmake_file=pdp11/t-pdp11 t-fdpbit
 	;;
Index: libgcc/config/nios2/t-linux
===
--- libgcc/config/nios2/t-linux	(revision 0)
+++ libgcc/config/nios2/t-linux	(revision 0)
@@ -0,0 +1,7 @@
+# Soft-float functions go in glibc only, to facilitate the possible
+# future addition of exception and rounding mode support integrated
+# with fenv.h.
+
+LIB2FUNCS_EXCLUDE = _floatdidf _floatdisf _fixunsdfsi _fixunssfsi \
+  _fixunsdfdi _fixdfdi _fixunssfdi _fixsfdi _floatundidf _floatundisf
+LIB2ADD += $(srcdir)/config/nios2/linux-atomic.c
Index: libgcc/config/nios2/sfp-machine.h
===
--- libgcc/config/nios2/sfp-machine.h	(revision 0)
+++ libgcc/config/nios2/sfp-machine.h	(revision 0)
@@ -0,0 +1,78 @@
+/* Soft-FP definitions for Altera Nios II.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+http://www.gnu.org/licenses/.  */
+
+#define _FP_W_TYPE_SIZE		32
+#define _FP_W_TYPE		unsigned long
+#define _FP_WS_TYPE		signed long
+#define _FP_I_TYPE		long
+
+#define _FP_MUL_MEAT_S(R,X,Y)\
+  _FP_MUL_MEAT_1_wide(_FP_WFRACBITS_S,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_D(R,X,Y)\
+  _FP_MUL_MEAT_2_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_Q(R,X,Y)\
+  _FP_MUL_MEAT_4_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+
+#define _FP_DIV_MEAT_S(R,X,Y)	_FP_DIV_MEAT_1_loop(S,R,X,Y)
+#define _FP_DIV_MEAT_D(R,X,Y)	_FP_DIV_MEAT_2_udiv(D,R,X,Y)
+#define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_4_udiv(Q,R,X,Y)
+
+#define _FP_NANFRAC_S		((_FP_QNANBIT_S  1) - 1)
+#define _FP_NANFRAC_D		((_FP_QNANBIT_D  1) - 1), -1
+#define _FP_NANFRAC_Q		((_FP_QNANBIT_Q  1) - 1), -1, -1, -1
+#define _FP_NANSIGN_S		0
+#define _FP_NANSIGN_D		0
+#define _FP_NANSIGN_Q		0
+
+#define _FP_KEEPNANFRACP 1
+#define _FP_QNANNEGATEDP 0
+
+/* Someone please check this.  */
+#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP)			\
+  do {\
+if ((_FP_FRAC_HIGH_RAW_##fs(X)  _FP_QNANBIT_##fs)		\
+	 !(_FP_FRAC_HIGH_RAW_##fs(Y)  _FP_QNANBIT_##fs))	\
+  {\
+	R##_s = Y##_s;		\
+	_FP_FRAC_COPY_##wc(R,Y);\
+  }\
+else

Re: [PATCH][2/3] Re-submission of Altera Nios II port, testsuite parts

2013-11-16 Thread Chung-Lin Tang
On 2013/10/17 10:20 PM, Bernd Schmidt wrote:
 On 07/14/2013 09:54 AM, Chung-Lin Tang wrote:
 These are nios2 patches for the gcc testsuite. Some new testcases were
 added since the last posting.
 
 Index: gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c
 ===
 --- gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c   (revision 
 200946)
 +++ gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c   (working copy)
 @@ -124,16 +124,17 @@ __memmove_chk (void *dst, const void *src, __SIZE_
  void *
  memset (void *dst, int c, __SIZE_TYPE__ n)
  {
 +  while (n-- != 0)
 +n[(char *) dst] = c;
 +
/* Single-byte memsets should be done inline when optimisation
 - is enabled.  */
 + is enabled.  Do this after the copy in case we're being called to
 + initialize bss.  */
  #ifdef __OPTIMIZE__
if (memset_disallowed  inside_main  n  2)
  abort ();
  #endif
  
 -  while (n-- != 0)
 -n[(char *) dst] = c;
 -
return dst;
  }
 
 I'm not sure I understand this change. Is nios2 the only target calling
 memset to initialize bss, and memset_disallowed is nonzero at the start
 of execution?

This appears to be for the nios2-elf bare metal testing. Looking at the
upstream libgloss sources, nios2 is indeed not the only target that
calls memset for zeroing bss.

Note that however, in a somewhat reverse of situation:
https://sourceware.org/ml/newlib/2013/msg00264.html

It appears that due to the presumed usage model for Nios II, Sandra did
not contribute the libgloss port. So the original code that needed this
testsuite change is probably not there.

OTOH, if this change is not deemed harmful, than it might further
robustify the testsuite.

 Index: gcc/testsuite/gcc.target/nios2/nios2-int-types.c
 ===
 --- gcc/testsuite/gcc.target/nios2/nios2-int-types.c (revision 0)
 +++ gcc/testsuite/gcc.target/nios2/nios2-int-types.c (revision 0)
 @@ -0,0 +1,34 @@
 +/* Test that various types are all derived from int.  */
 +/* { dg-do compile { target nios2-*-* } } */
 
 I think you can lose the { target nios2-*-* } for everything inside
 gcc.target/nios2.

Done.

The new attached patch also has the Dxx constraint test removed, as that
feature is now removed from the compiler. The memset() change mentioned
above is still in the patch, but will remove before committing if not
approved.

Thanks,
Chung-Lin

2013-11-16  Sandra Loosemore  san...@codesourcery.com
Chung-Lin Tang  clt...@codesourcery.com
Based on patches from Altera Corporation

* gcc.dg/stack-usage-1.c (SIZE): Define case for __nios2__.
* gcc.dg/20040813-1.c: Skip for nios2-*-*.
* gcc.dg/20020312-2.c: Add __nios2__ case.
* g++.dg/other/PR23205.C: Skip for nios2-*-*.
* g++.dg/other/pr23205-2.C: Skip for nios2-*-*.
* g++.dg/cpp0x/constexpr-rom.C: Skip for nios2-*-*.
* g++.dg/cpp0x/alias-decl-debug-0.C: Skip for nios2-*-*.
* g++.old-deja/g++.jason/thunk3.C: Skip for nios2-*-*.
* lib/target-supports.exp (check_profiling_available): Check for
nios2-*-elf.
* gcc.c-torture/execute/pr47237.x:: Skip for nios2-*-*.
* gcc.c-torture/execute/20101011-1.c: Skip for nios2-*-*.
* gcc.c-torture/execute/builtins/lib/chk.c (memset): Place
char-based memset loop before inline check, to prevent
problems when called to initialize .bss. Update comments.
* gcc.target/nios2/nios2.exp: New DejaGNU file.
* gcc.target/nios2/nios2-custom-1.c: New test.
* gcc.target/nios2/nios2-trap-insn.c: New test.
* gcc.target/nios2/nios2-builtin-custom.c: New test.
* gcc.target/nios2/nios2-builtin-io.c: New test.
* gcc.target/nios2/nios2-stack-check-1.c: New test.
* gcc.target/nios2/nios2-stack-check-2.c: New test.
* gcc.target/nios2/nios2-rdctl.c: New test.
* gcc.target/nios2/nios2-wrctl.c: New test.
* gcc.target/nios2/nios2-wrctl-zero.c: New test.
* gcc.target/nios2/nios2-wrctl-not-zero.c: New test.
* gcc.target/nios2/nios2-rdwrctl-1.c: New test.
* gcc.target/nios2/nios2-ashlsi3-one_shift.c: New test.
* gcc.target/nios2/nios2-mul-options-1.c: New test.
* gcc.target/nios2/nios2-mul-options-2.c: New test.
* gcc.target/nios2/nios2-mul-options-3.c: New test.
* gcc.target/nios2/nios2-mul-options-4.c: New test.
* gcc.target/nios2/nios2-nor.c: New test.
* gcc.target/nios2/nios2-stxio.c: New test.
* gcc.target/nios2/custom-fp-1.c: New test.
* gcc.target/nios2/custom-fp-2.c: New test.
* gcc.target/nios2/custom-fp-3.c: New test.
* gcc.target/nios2/custom-fp-4.c: New test.
* gcc.target/nios2/custom-fp-5.c: New test.
* gcc.target/nios2/custom-fp-6.c: New test.
* gcc.target/nios2/custom-fp-7.c: New test.
* gcc.target

Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Chung-Lin Tang
On 2013/10/6 05:57 PM, Richard Sandiford wrote:
  But case 16 is different.
  This case is only produced at prologue/epilogue phase, using a temporary
  register $r15 to hold a large constant for adjusting stack pointer. 
  Since prologue/epilogue is after split1/split2 phase, we can only
  output sethi + ori directly.
  (The addi instruction with $r15 is a 32-bit instruction.)
 But this code is in the output template of the define_insn.  That code
 is only executed during final, after all passes have been run.  If the
 template returns #, final will split the instruction itself, which is
 possible even at that late stage.  # doesn't have any effect on the
 passes themselves.
 
 (FWIW, there's also a split3 pass that runs after prologue/epilogue
 generation but before sched2.)
 
 However, ISTR there is/was a rule that prologue instructions shouldn't
 be split, since they'd lose their RTX_FRAME_RELATED_P bit or something.
 Maybe you hit an ICE because of that?
 
 Another way to handle this would be to have the movsi expander split
 large constant moves.  When can_create_pseudo_p (), the intermediate
 results can be stored in new registers, otherwise they should reuse
 operands[0].  Two advantages to doing it that way are that high parts
 can be shared before RA, and that calls to emit_move_insn from the
 prologue code will split the move automatically.  I think many ports
 do it that way (including MIPS FWIW).

FWIW, most ports usually just handle such large adjustment cases in
the prologue/epilogue code manually; either multiple SP-adjustments, or
use of a temp register (better control of RTX_FRAME_RELATED_P anyways).
You might be able to get it to work, but trying to rely on the splitter
does not seem like best practice...

Chung-Lin



Re: [PING] Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-09-02 Thread Chung-Lin Tang
Ping.

On 13/8/20 10:57 AM, Chung-Lin Tang wrote:
 Ping.
 
 BTW, the SC has approved the Nios II port already:
 http://gcc.gnu.org/ml/gcc/2013-07/msg00434.html
 
 The port is still awaiting technical review.
 
 Thanks,
 Chung-Lin
 
 On 13/7/14 下午3:54, Chung-Lin Tang wrote:
 Hi, the last ping of the Nios II patches was:
 http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html

 After assessing the state, we feel it would be better to post a
 re-submission of the newest patches.

 The changes accumulated since the original post include:

 1) Several bug fixes related to built-in function expanding.
 2) A few holes in hard-float FPU code generation was plugged.
 3) Support for parsing white-spaces in target attributes.
 4) Revision of consistency check behavior of codes in custom instruction
 built-ins.
 5) Some new testcases.

 The issues raised by Joseph in the first round of reviewing have been
 addressed. Testing has been re-done on both 32-bit and 64-bit hosts.

 PR55035 appears to not have been resolved yet, which affects nios2 among
 several other targets, thus configured with --enable-werror-always still
 does not build.

 As before, Sandra and me will serve as nios2 port maintainers.

 Attached is the patch for the compiler-proper.

 Thanks,
 Chung-Lin

 2013-07-14  Chung-Lin Tang  clt...@codesourcery.com
 Sandra Loosemore  san...@codesourcery.com
 Based on patches from Altera Corporation

 * config.gcc (nios2-*-*): Add nios2 config targets.
 * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case.
 ($cpu_type): Add nios2 as new cpu type.
 * configure: Regenerate.
 * config/nios2/nios2.c: New file.
 * config/nios2/nios2.h: New file.
 * config/nios2/nios2-opts.h: New file.
 * config/nios2/nios2-protos.h: New file.
 * config/nios2/elf.h: New file.
 * config/nios2/elf.opt: New file.
 * config/nios2/linux.h: New file.
 * config/nios2/nios2.opt: New file.
 * config/nios2/nios2.md: New file.
 * config/nios2/predicates.md: New file.
 * config/nios2/constraints.md: New file.
 * config/nios2/t-nios2: New file.
 * common/config/nios2/nios2-common.c: New file.
 * doc/invoke.texi (Nios II options): Document Nios II specific
 options.
 * doc/md.texi (Nios II family): Document Nios II specific
 constraints.
 * doc/extend.texi (Function Specific Option Pragmas): Document
 Nios II supported target pragma functionality.

 



[PING] Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-08-19 Thread Chung-Lin Tang
Ping.

BTW, the SC has approved the Nios II port already:
http://gcc.gnu.org/ml/gcc/2013-07/msg00434.html

The port is still awaiting technical review.

Thanks,
Chung-Lin

On 13/7/14 下午3:54, Chung-Lin Tang wrote:
 Hi, the last ping of the Nios II patches was:
 http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html
 
 After assessing the state, we feel it would be better to post a
 re-submission of the newest patches.
 
 The changes accumulated since the original post include:
 
 1) Several bug fixes related to built-in function expanding.
 2) A few holes in hard-float FPU code generation was plugged.
 3) Support for parsing white-spaces in target attributes.
 4) Revision of consistency check behavior of codes in custom instruction
 built-ins.
 5) Some new testcases.
 
 The issues raised by Joseph in the first round of reviewing have been
 addressed. Testing has been re-done on both 32-bit and 64-bit hosts.
 
 PR55035 appears to not have been resolved yet, which affects nios2 among
 several other targets, thus configured with --enable-werror-always still
 does not build.
 
 As before, Sandra and me will serve as nios2 port maintainers.
 
 Attached is the patch for the compiler-proper.
 
 Thanks,
 Chung-Lin
 
 2013-07-14  Chung-Lin Tang  clt...@codesourcery.com
 Sandra Loosemore  san...@codesourcery.com
 Based on patches from Altera Corporation
 
 * config.gcc (nios2-*-*): Add nios2 config targets.
 * configure.ac (TLS_SECTION_ASM_FLAG): Add nios2 case.
 ($cpu_type): Add nios2 as new cpu type.
 * configure: Regenerate.
 * config/nios2/nios2.c: New file.
 * config/nios2/nios2.h: New file.
 * config/nios2/nios2-opts.h: New file.
 * config/nios2/nios2-protos.h: New file.
 * config/nios2/elf.h: New file.
 * config/nios2/elf.opt: New file.
 * config/nios2/linux.h: New file.
 * config/nios2/nios2.opt: New file.
 * config/nios2/nios2.md: New file.
 * config/nios2/predicates.md: New file.
 * config/nios2/constraints.md: New file.
 * config/nios2/t-nios2: New file.
 * common/config/nios2/nios2-common.c: New file.
 * doc/invoke.texi (Nios II options): Document Nios II specific
 options.
 * doc/md.texi (Nios II family): Document Nios II specific
 constraints.
 * doc/extend.texi (Function Specific Option Pragmas): Document
 Nios II supported target pragma functionality.
 



Re: [PATCH] PR32219, weak hidden reference segfault [PING^2]

2013-08-05 Thread Chung-Lin Tang
On 13/8/5 10:06 PM, Mike Stump wrote:
 On Aug 4, 2013, at 8:14 AM, Chung-Lin Tang clt...@codesourcery.com wrote:
 On 13/7/15 1:43 AM, Diego Novillo wrote:
 Could you please repost the patch with its description?  This thread
 is sufficiently old and noisy that I'm not even sure what the patch
 does nor why.

 Taking the same example in my first post:

  Under -fPIC, the code in rtlanal.c:nonzero_address_p() does not properly
 recognize the PIC-reg + constant form of load as a weak symbol; it
 returns 'true' immediately after seeing the pic-reg indexing, and does
 not test the wrapped symbol for DECL_WEAK.
 
 So, I can't help but think that others would say that looking into an unspec 
 is by nature, the wrong way to do it, unless that code is in the port.
 
 I think the followup from Bernhard points to a better solution, though the 
 wording in the comment was objectionable.  Merely say that the symbol, if 
 weak and not defined, is then not local.

When I last tested that patch which moves the DECL_WEAK check, the
testcases for C++ TLS wrappers fail. I don't remember the fine details,
but effectively it filters out the TLS wrappers from being treated
locally, causing them to be called through @PLT, and regressing on some
tests specifically checking for that...

The UNSPEC interpretation here is fairly restricted, FWIW. Earlier talk
on this thread also mentioned that maybe specific RTL constructs for
reasoning about PIC addresses should be introduced, rather than common
idiomatic pattern, though that may be a long shot for now.

Chung-Lin



Re: [PATCH] PR32219, weak hidden reference segfault [PING^2]

2013-08-05 Thread Chung-Lin Tang
On 13/8/5 下午10:24, Mike Stump wrote:
 On Aug 5, 2013, at 7:15 AM, Chung-Lin Tang clt...@codesourcery.com wrote:
 On 13/8/5 10:06 PM, Mike Stump wrote:
 On Aug 4, 2013, at 8:14 AM, Chung-Lin Tang clt...@codesourcery.com wrote:
 On 13/7/15 1:43 AM, Diego Novillo wrote:
 Could you please repost the patch with its description?  This thread
 is sufficiently old and noisy that I'm not even sure what the patch
 does nor why.

 Taking the same example in my first post:

 Under -fPIC, the code in rtlanal.c:nonzero_address_p() does not properly
 recognize the PIC-reg + constant form of load as a weak symbol; it
 returns 'true' immediately after seeing the pic-reg indexing, and does
 not test the wrapped symbol for DECL_WEAK.

 So, I can't help but think that others would say that looking into an 
 unspec is by nature, the wrong way to do it, unless that code is in the 
 port.

 I think the followup from Bernhard points to a better solution, though the 
 wording in the comment was objectionable.  Merely say that the symbol, if 
 weak and not defined, is then not local.

 When I last tested that patch which moves the DECL_WEAK check, the
 testcases for C++ TLS wrappers fail. I don't remember the fine details,
 but effectively it filters out the TLS wrappers from being treated
 locally, causing them to be called through @PLT, and regressing on some
 tests specifically checking for that…
 
 Hum…  I wonder if there is a TLS predicate one can mix in to the check that 
 is appropriate.
 
 The UNSPEC interpretation here is fairly restricted, FWIW. Earlier talk
 on this thread also mentioned that maybe specific RTL constructs for
 reasoning about PIC addresses should be introduced, rather than common
 idiomatic pattern, though that may be a long shot for now.
 
 specifying the unspecified, make is specified, and calling it unspec, would 
 seem to be wrong.
 
 The right approach, long term, is to have address forms all specified and 
 documented and a port merely can use the forms they are interested in.  pic 
 is one of those things that should be bumped up, unspec is kinda silly.

Yes, that's what I meant. e.g. instead of using (const (unspec...)), new
RTL operators for specifying the common forms of relocations (including
those used PIC) should be defined; this will involve changing lots of
backends, so should be aimed in the long term.

Chung-Lin






Re: [PATCH] PR32219, weak hidden reference segfault [PING^2]

2013-08-04 Thread Chung-Lin Tang
On 13/7/15 1:43 AM, Diego Novillo wrote:
 Could you please repost the patch with its description?  This thread
 is sufficiently old and noisy that I'm not even sure what the patch
 does nor why.

Taking the same example in my first post:

extern void weakfun() __attribute__((weak,visibility(hidden)));
void foo ()
{
  if (weakfun) weakfun();
}

Under -O1 -m32 -fPIC, the address load and test will look like:

(insn 5 2 6 2 (set (reg/f:SI 60)
   (plus:SI (reg:SI 3 bx)
  (const:SI (unspec:SI [
(symbol_ref/i:SI (f) [flags 0x43]  function_decl f)
] UNSPEC_GOTOFF {*leasi}
 (expr_list:REG_EQUAL (symbol_ref/i:SI (f) function_decl f)
(nil)))

(insn 6 5 7 2 (set (reg:CCZ 17 flags)
(compare:CCZ (reg/f:SI 60)
(const_int 0 [0]))) p.c:8 3 {*cmpsi_ccno_1}
 (nil))

(jump_insn 7 6 8 2 ...


Under -fPIC, the code in rtlanal.c:nonzero_address_p() does not properly
recognize the PIC-reg + constant form of load as a weak symbol; it
returns 'true' immediately after seeing the pic-reg indexing, and does
not test the wrapped symbol for DECL_WEAK.

My patch slightly modifies the test to look into the wrapped symbol when
seeing a PIC-reg + unspec-constant form, which I believe has become
the idiomatic form in GCC for such PIC addresses. Richard Sandiford's
concerns were that, UNSPEC really is unspecified, and this might be
overassuming its semantics, plus some targets may not be really
following the idiomatic use.

My final take at the time was, for targets that do follow the common
PIC-reg+const-unspec form, this patch solves the problem, while for
other targets, nothing is changed.

I have re-tested the patch on i686-linux, with clean results. Please see
if this patch can be accepted into trunk (patch attached again for
convenience).

Thanks,
Chung-Lin

2013-08-04  Chung-Lin Tang  clt...@codesourcery.com

PR target/32219
* rtlanal.c (nonzero_address_p): Robustify checking by look
recursively into PIC constant offsets and (CONST (UNSPEC ...))
expressions.

Index: rtlanal.c
===
--- rtlanal.c   (revision 201473)
+++ rtlanal.c   (working copy)
@@ -393,7 +393,15 @@ nonzero_address_p (const_rtx x)
   /* Handle PIC references.  */
   if (XEXP (x, 0) == pic_offset_table_rtx
CONSTANT_P (XEXP (x, 1)))
-   return true;
+   {
+ rtx offset = XEXP (x, 1);
+ if (GET_CODE (offset) == CONST
+  GET_CODE (XEXP (offset, 0)) == UNSPEC
+  GET_CODE (XVECEXP (XEXP (offset, 0), 0, 0)) == SYMBOL_REF)
+   return nonzero_address_p (XVECEXP (XEXP (offset, 0), 0, 0));
+
+ return true;
+   }
   return false;
 
 case PRE_MODIFY:
Index: testsuite/gcc.dg/torture/pr32219.c
===
--- testsuite/gcc.dg/torture/pr32219.c  (revision 0)
+++ testsuite/gcc.dg/torture/pr32219.c  (revision 0)
@@ -0,0 +1,12 @@
+/* PR target/32219 */
+/* { dg-do run } */
+/* { dg-require-visibility  } */
+/* { dg-options -fPIC { target fpic } } */
+
+extern void f() __attribute__((weak,visibility(hidden)));
+int main()
+{
+  if (f)
+f();
+  return 0;
+}


Re: [PATCH] PR32219, weak hidden reference segfault [PING^2]

2013-08-01 Thread Chung-Lin Tang
On 13/8/1 5:16 PM, Bernhard Reutner-Fischer wrote:
 On 14 July 2013 19:43, Diego Novillo dnovi...@google.com wrote:
 On Sun, Jul 14, 2013 at 2:08 AM, Chung-Lin Tang clt...@codesourcery.com 
 wrote:
 Ping.

 Could you please repost the patch with its description?  This thread
 is sufficiently old and noisy that I'm not even sure what the patch
 does nor why.
 
 Chung-Lin Tang, can you regtest and repost the patch please?
 
 TIA,

I'll re-explain the patch later as Diego has requested, maybe this weekend.

Thanks,
Chung-Lin



 Thanks.  Diego.



Altera Nios II port submission

2013-07-22 Thread Chung-Lin Tang
To the GCC Steering Committee,

Mentor Graphics has submitted, and recently re-submitted an updated
version, of a GCC backend port for the Altera Nios II architecture,
currently on gcc-patches awaiting technical review [1].

We're proposing, upon port approval and commit to trunk, Sandra
Loosemore and myself (Chung-Lin Tang), both of Mentor Graphics, as
target maintainers.

Thank you,
Chung-Lin

[1] http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00526.html
http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00527.html
http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00528.html


[PATCH][2/3] Re-submission of Altera Nios II port, testsuite parts

2013-07-14 Thread Chung-Lin Tang
These are nios2 patches for the gcc testsuite. Some new testcases were
added since the last posting.

Chung-Lin

2013-07-14  Sandra Loosemore  san...@codesourcery.com
Chung-Lin Tang  clt...@codesourcery.com
Based on patches from Altera Corporation

* gcc.dg/stack-usage-1.c (SIZE): Define case for __nios2__.
* gcc.dg/20040813-1.c: Skip for nios2-*-*.
* gcc.dg/20020312-2.c: Add __nios2__ case.
* g++.dg/other/PR23205.C: Skip for nios2-*-*.
* g++.dg/other/pr23205-2.C: Skip for nios2-*-*.
* g++.dg/cpp0x/constexpr-rom.C: Skip for nios2-*-*.
* g++.dg/cpp0x/alias-decl-debug-0.C: Skip for nios2-*-*.
* g++.old-deja/g++.jason/thunk3.C: Skip for nios2-*-*.
* lib/target-supports.exp (check_profiling_available): Check for
nios2-*-elf.
* gcc.c-torture/execute/pr47237.x:: Skip for nios2-*-*.
* gcc.c-torture/execute/20101011-1.c: Skip for nios2-*-*.
* gcc.c-torture/execute/builtins/lib/chk.c (memset): Place
char-based memset loop before inline check, to prevent
problems when called to initialize .bss. Update comments.
* gcc.target/nios2/nios2.exp: New DejaGNU file.
* gcc.target/nios2/nios2-custom-1.c: New test.
* gcc.target/nios2/nios2-trap-insn.c: New test.
* gcc.target/nios2/nios2-builtin-custom.c: New test.
* gcc.target/nios2/nios2-builtin-io.c: New test.
* gcc.target/nios2/nios2-stack-check-1.c: New test.
* gcc.target/nios2/nios2-stack-check-2.c: New test.
* gcc.target/nios2/nios2-rdctl.c: New test.
* gcc.target/nios2/nios2-wrctl.c: New test.
* gcc.target/nios2/nios2-wrctl-zero.c: New test.
* gcc.target/nios2/nios2-wrctl-not-zero.c: New test.
* gcc.target/nios2/nios2-rdwrctl-1.c: New test.
* gcc.target/nios2/nios2-reg-constraints.c: New test.
* gcc.target/nios2/nios2-ashlsi3-one_shift.c: New test.
* gcc.target/nios2/nios2-mul-options-1.c: New test.
* gcc.target/nios2/nios2-mul-options-2.c: New test.
* gcc.target/nios2/nios2-mul-options-3.c: New test.
* gcc.target/nios2/nios2-mul-options-4.c: New test.
* gcc.target/nios2/nios2-nor.c: New test.
* gcc.target/nios2/nios2-stxio.c: New test.
* gcc.target/nios2/custom-fp-1.c: New test.
* gcc.target/nios2/custom-fp-2.c: New test.
* gcc.target/nios2/custom-fp-3.c: New test.
* gcc.target/nios2/custom-fp-4.c: New test.
* gcc.target/nios2/custom-fp-5.c: New test.
* gcc.target/nios2/custom-fp-6.c: New test.
* gcc.target/nios2/custom-fp-7.c: New test.
* gcc.target/nios2/custom-fp-8.c: New test.
* gcc.target/nios2/custom-fp-cmp-1.c: New test.
* gcc.target/nios2/custom-fp-conversion.c: New test.
* gcc.target/nios2/custom-fp-double.c: New test.
* gcc.target/nios2/custom-fp-float.c: New test.
* gcc.target/nios2/nios2-int-types.c: New test.
* gcc.target/nios2/nios2-cache-1.c: New test.
* gcc.target/nios2/nios2-cache-2.c: New test.
Index: gcc/testsuite/g++.old-deja/g++.jason/thunk3.C
===
--- gcc/testsuite/g++.old-deja/g++.jason/thunk3.C   (revision 200946)
+++ gcc/testsuite/g++.old-deja/g++.jason/thunk3.C   (working copy)
@@ -1,5 +1,5 @@
 // { dg-do run }
-// { dg-skip-if fails with generic thunk support { rs6000-*-* powerpc-*-eabi 
v850-*-* sh-*-* sh64-*-* h8*-*-* xtensa*-*-* m32r*-*-* lm32-*-* } { * } {  
} }
+// { dg-skip-if fails with generic thunk support { rs6000-*-* powerpc-*-eabi 
v850-*-* sh-*-* sh64-*-* h8*-*-* xtensa*-*-* m32r*-*-* lm32-*-* nios2-*-* } { 
* } {  } }
 // Test that variadic function calls using thunks work right.
 // Note that this will break on any target that uses the generic thunk
 //  support, because it doesn't support variadic functions.
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 200946)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -529,6 +529,7 @@ proc check_profiling_available { test_what } {
 || [istarget mmix-*-*]
 || [istarget mn10300-*-elf*]
 || [istarget moxie-*-elf*]
+|| [istarget nios2-*-elf]
 || [istarget picochip-*-*]
 || [istarget powerpc-*-eabi*]
 || [istarget powerpc-*-elf]
Index: gcc/testsuite/gcc.c-torture/execute/20101011-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/20101011-1.c(revision 200946)
+++ gcc/testsuite/gcc.c-torture/execute/20101011-1.c(working copy)
@@ -61,6 +61,10 @@ __aeabi_idiv0 (int return_value)
 }
 #  define DO_TEST 1
 # endif
+#elif defined (__nios2__)
+  /* Nios II requires both hardware support and user configuration

[PATCH][3/3] Re-submission of Altera Nios II port, libgcc parts

2013-07-14 Thread Chung-Lin Tang
nios2 libgcc parts. Since the original post, the only main change has
been the fdpbit vs soft-fp issue raised by Joseph, which has been
resolved. Other parts are mostly the same.

Thanks,
Chung-Lin

2013-07-14  Sandra Loosemore  san...@codesourcery.com
Chung-Lin Tang  clt...@codesourcery.com
Based on patches from Altera Corporation

* config.host (nios2-*-*,nios2-*-linux*): Add nios2 host cases.
* config/nios2/lib2-nios2.h: New file.
* config/nios2/lib2-divmod-hi.c: New file.
* config/nios2/linux-unwind.h: New file.
* config/nios2/lib2-divmod.c: New file.
* config/nios2/linux-atomic.c: New file.
* config/nios2/t-nios2: New file.
* config/nios2/crti.asm: New file.
* config/nios2/t-linux: New file.
* config/nios2/lib2-divtable.c: New file.
* config/nios2/lib2-mul.c: New file.
* config/nios2/tramp.c: New file.
* config/nios2/crtn.asm: New file.
Index: libgcc/config/nios2/lib2-nios2.h
===
--- libgcc/config/nios2/lib2-nios2.h(revision 0)
+++ libgcc/config/nios2/lib2-nios2.h(revision 0)
@@ -0,0 +1,49 @@
+/* Integer arithmetic support for Altera Nios II.
+   
+   Copyright (C) 2012-2013 Free Software Foundation, Inc.
+   Contributed by Altera and Mentor Graphics, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+   
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+   
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+   
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   http://www.gnu.org/licenses/. */
+
+#ifndef LIB2_NIOS2_H
+#define LIB2_NIOS2_H
+
+/* Types.  */
+
+typedef char QItype __attribute__ ((mode (QI)));
+typedef unsigned char UQItype __attribute__ ((mode (QI)));
+typedef short HItype __attribute__ ((mode (HI)));
+typedef unsigned short UHItype __attribute__ ((mode (HI)));
+typedef int SItype __attribute__ ((mode (SI)));
+typedef unsigned int USItype __attribute__ ((mode (SI)));
+typedef int word_type __attribute__ ((mode (__word__)));
+
+/* Exported functions.  */
+extern SItype __divsi3 (SItype, SItype);
+extern SItype __modsi3 (SItype, SItype);
+extern SItype __udivsi3 (SItype, SItype);
+extern SItype __umodsi3 (SItype, SItype);
+extern HItype __divhi3 (HItype, HItype);
+extern HItype __modhi3 (HItype, HItype);
+extern UHItype __udivhi3 (UHItype, UHItype);
+extern UHItype __umodhi3 (UHItype, UHItype);
+extern SItype __mulsi3 (SItype, SItype);
+
+#endif /* LIB2_NIOS2_H */
Index: libgcc/config/nios2/crtn.S
===
--- libgcc/config/nios2/crtn.S  (revision 0)
+++ libgcc/config/nios2/crtn.S  (revision 0)
@@ -0,0 +1,60 @@
+/* Copyright (C) 2012-2013 Free Software Foundation, Inc.
+   Contributed by Jonah Graham (jgra...@altera.com).
+   Contributed by Mentor Graphics, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+http://www.gnu.org/licenses/.  */
+
+
+/* This file just makes sure that the .fini and .init sections do in
+fact return.  Users may put any desired instructions in those sections.
+This file is the last thing linked into any executable.
+*/ 
+   .file   crtn.asm
+
+
+
+   .section.init
+   ldw ra, 44(sp)
+   ldw r23, 40(sp)
+   ldw r22, 36(sp)
+   ldw r21, 32(sp)
+   ldw r20, 28(sp)
+   ldw r19, 24(sp)
+   ldw r18, 20(sp)
+   ldw r17, 16(sp

[PATCH] Hexadecimal numbers in option arguments

2013-07-14 Thread Chung-Lin Tang
Original patch posted as part of Nios II patches:
http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01087.html

This patch is to allow hexadecimal numbers to be used in option
arguments, e.g. -falign-loops=0x10 can now be used as equivalent to
-falign-loops=16.

Joseph, the patch has been modified to use IXDIGIT to check the argument
string first, as you suggested in the last submission. Is this okay for
trunk?

Thanks,
Chung-Lin

2013-07-14  Chung-Lin Tang  clt...@codesourcery.com

* opts-common.c (integral_argument): Add support for hexadecimal
command option integer arguments. Update comments.
Index: opts-common.c
===
--- opts-common.c	(revision 200946)
+++ opts-common.c	(working copy)
@@ -147,7 +147,7 @@ find_opt (const char *input, unsigned int lang_mas
   return match_wrong_lang;
 }
 
-/* If ARG is a non-negative integer made up solely of digits, return its
+/* If ARG is a non-negative decimal or hexadecimal integer, return its
value, otherwise return -1.  */
 
 int
@@ -161,6 +161,17 @@ integral_argument (const char *arg)
   if (*p == '\0')
 return atoi (arg);
 
+  /* It wasn't a decimal number - try hexadecimal.  */
+  if (arg[0] == '0'  (arg[1] == 'x' || arg[1] == 'X'))
+{
+  p = arg + 2;
+  while (*p  ISXDIGIT (*p))
+	p++;
+
+  if (*p == '\0')
+	return strtol (arg, NULL, 16);
+}
+
   return -1;
 }
 


Re: [PATCH] PR32219, weak hidden reference segfault [PING^2]

2013-07-14 Thread Chung-Lin Tang
Ping.

On 2013/6/20 03:01 PM, Chung-Lin Tang wrote:
 Ping again?
 
 On 13/6/11 5:20 PM, Bernhard Reutner-Fischer wrote:
 ping, CCing middle-end maintainers for review.

 On 31 May 2013 10:13, Chung-Lin Tang clt...@codesourcery.com wrote:
 On 13/5/15 8:12 PM, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 On 13/5/10 6:37 PM, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 +case UNSPEC:
 +  /* Reach for a contained symbol.  */
 +  return nonzero_address_p (XVECEXP (x, 0, 0));

 I don't think this is safe.  UNSPECs really are unspecified :-),
 so we can't assume that (unspec X) is nonzero simply because X is.

 Attached is a modified patch (not yet tested but just for demonstration)
 with a more specific test, hopefully regarded as more safe.

 The point is in recognizing (const (unspec [symbol] XYZ)) offsets in PIC
 references, which seems quite idiomatic across all targets by now.

 I agree this is safer.  However, there used to be some ports that
 use (plus pic_offset_table_rtx (symbol_ref X)) -- i.e. without an unspec --
 to represent a PIC reference to X.  That always seemed semantically wrong,
 since you're not actually adding the address of X and the PIC register,
 but the patch wouldn't handle that case correctly.

 Well I can't help those targets then, but at least nothing will be
 changed for them by this patch. It will just continue to return 'true'.

 An alternative might be to remove the pic_offset_table_rtx case altogether
 and rely on targetm.delegitimize_address instead.  FWIW, I'd prefer that
 if it works, but it's not me you need to convince. :-)

 Like we discussed below, I think the direction should be towards making
 things more machine-independent, rather then pushing more into the backend.

 I would suggest that this probably means there should be a new, more
 specific construct in RTL to represent relocation values of this kind,
 instead of (const (unspec)) serving an unofficial role; possibly some
 real support for reasoning about PIC references could also be considered.

 Yeah, maybe we could try to introduce some target-independent knowledge
 of certain reloc types, a bit like the generic BFD_RELOC_*s in bfd.


 FWIW, I've ran tests on the newer patch on i686-linux, with no
 regressions. Testcase has been moved to gcc.dg/torture by recommendation
 of Bernhard. If any of the RTL maintainers can give an eye of merciful
 approval, this old PR could be resolved :)

 Thanks,
 Chung-Lin

 



[PING^3] Nios II port

2013-06-25 Thread Chung-Lin Tang
Ping x 3.

On 13/6/18 下午4:38, Chung-Lin Tang wrote:
 On 13/6/18 上午3:05, Sandra Loosemore wrote:
  Ping?  I think these are the most recent versions of the unreviewed
  patches in the series:
  
  http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00287.html
  http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00760.html
  http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01085.html



Re: Question about conds attribute for *thumb2_alusi3_short

2013-06-24 Thread Chung-Lin Tang
On 13/6/24 下午11:43, Tom de Vries wrote:
 Richard,
 
 I've noticed that f.i. *thumb2_alusi3_short has no explicit setting of the 
 conds
 attribute, which means the value of the conds attribute for this insn is 
 nocond:
 ...
 (define_insn *thumb2_alusi3_short
   [(set (match_operand:SI  0 s_register_operand =l)
 (match_operator:SI 3 thumb_16bit_operator
[(match_operand:SI 1 s_register_operand 0)
 (match_operand:SI 2 s_register_operand l)]))
(clobber (reg:CC CC_REGNUM))]
   TARGET_THUMB2  reload_completed
 GET_CODE(operands[3]) != PLUS
 GET_CODE(operands[3]) != MINUS
   %I3%!\\t%0, %1, %2
   [(set_attr predicable yes)
(set_attr length 2)]
 )
 ...
 
 AFAIU, this insn is either:
 - conditional, and does not modify cc, or
 - unconditional, and sets cc.
 So the clobber of CC in the RTL conservatively describes both cases.
 
 It seems to me the logical conds setting for the conditional case is nocond, 
 set
 (or perhaps clob) for the unconditional case. So, is this a more accurate 
 value
 of conds for this insn:
 ...
(set (attr conds)
 (if_then_else
   (match_test GET_CODE (PATTERN (insn)) == COND_EXEC)
   (const_string nocond)
   (const_string set)))]
 ...
 ?
 
 Is there a generic need to have this attribute accurate for all insns?

Following this thread that Tom pointed to me earlier in internal discussion:
http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00723.html

If the short-CC-clobbered form is selected very late now, I think this
pattern simply is (or should) not be used for the conditional (within
IT-block) case. It should simply be set to clob. Predicable might be
set to no as well...

Chung-Lin



Re: [PATCH] PR32219, weak hidden reference segfault [PING]

2013-06-20 Thread Chung-Lin Tang
Ping again?

On 13/6/11 5:20 PM, Bernhard Reutner-Fischer wrote:
 ping, CCing middle-end maintainers for review.
 
 On 31 May 2013 10:13, Chung-Lin Tang clt...@codesourcery.com wrote:
 On 13/5/15 8:12 PM, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 On 13/5/10 6:37 PM, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 +case UNSPEC:
 +  /* Reach for a contained symbol.  */
 +  return nonzero_address_p (XVECEXP (x, 0, 0));

 I don't think this is safe.  UNSPECs really are unspecified :-),
 so we can't assume that (unspec X) is nonzero simply because X is.

 Attached is a modified patch (not yet tested but just for demonstration)
 with a more specific test, hopefully regarded as more safe.

 The point is in recognizing (const (unspec [symbol] XYZ)) offsets in PIC
 references, which seems quite idiomatic across all targets by now.

 I agree this is safer.  However, there used to be some ports that
 use (plus pic_offset_table_rtx (symbol_ref X)) -- i.e. without an unspec --
 to represent a PIC reference to X.  That always seemed semantically wrong,
 since you're not actually adding the address of X and the PIC register,
 but the patch wouldn't handle that case correctly.

 Well I can't help those targets then, but at least nothing will be
 changed for them by this patch. It will just continue to return 'true'.

 An alternative might be to remove the pic_offset_table_rtx case altogether
 and rely on targetm.delegitimize_address instead.  FWIW, I'd prefer that
 if it works, but it's not me you need to convince. :-)

 Like we discussed below, I think the direction should be towards making
 things more machine-independent, rather then pushing more into the backend.

 I would suggest that this probably means there should be a new, more
 specific construct in RTL to represent relocation values of this kind,
 instead of (const (unspec)) serving an unofficial role; possibly some
 real support for reasoning about PIC references could also be considered.

 Yeah, maybe we could try to introduce some target-independent knowledge
 of certain reloc types, a bit like the generic BFD_RELOC_*s in bfd.


 FWIW, I've ran tests on the newer patch on i686-linux, with no
 regressions. Testcase has been moved to gcc.dg/torture by recommendation
 of Bernhard. If any of the RTL maintainers can give an eye of merciful
 approval, this old PR could be resolved :)

 Thanks,
 Chung-Lin




Re: [ping**2] Nios II port

2013-06-18 Thread Chung-Lin Tang
On 13/6/18 上午3:05, Sandra Loosemore wrote:
 Ping?  I think these are the most recent versions of the unreviewed
 patches in the series:
 
 http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00287.html
 http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00760.html
 http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01085.html
 
 There are also these two parts that have been reviewed already:
 http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01329.html  [approved but
 not applied yet?]

That Dwarf fix has already been applied.

 http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01087.html  [needs minor
 cleanup and separate consideration]

Yes, I forgot about this one. Need some cleanup to be more robust. Will
resubmit separately.

Chung-Lin



[PING] Re: [PATCH 0/5] Submission of Altera Nios II port

2013-06-05 Thread Chung-Lin Tang
Pinging the nios2 port.

On 13/5/15 1:04 AM, Chung-Lin Tang wrote:
 On 2013/4/26 04:35 AM, Joseph S. Myers wrote:
 I should ask the following general standard new-port questions:

 * Does the port build cleanly when configured with --enable-werror-always 
 and built using a native compiler from current GCC trunk - for both 32-bit 
 host, and 64-bit host?  It should.  This is the standard way of ensuring 
 the same level of warning-cleanness in a cross build as native bootstrap 
 automatically enforces (and the build compiler needs to be from current 
 trunk because warning-cleanness is only expected when the build compiler 
 is the same version as the compiler being built).

 The new targets should be added to contrib/config-list.mk, which helps 
 test all targets with --enable-werror-always.  This is mentioned in Back 
 End in sourcebuild.texi - check there for any other pieces that should be 
 included in the port submission.
 
 Currently, nios2 seems to be another affected target by PR 55035, when
 building with a recent trunk with --enable-werror-always:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55035
 
 I would like to ask this --enable-werror-always requirement be relaxed
 for the nios2 port submission, as it is not alone in the above PR, and
 we are early in the release cycle; there should be plenty of time to fix
 any problems afterwards.
 
 Though not included in the submitted patches, I will add the target
 entries in contrib/config-list.mk when committing.
 
 * What are test results like for the port?  Again, both 32-bit and 64-bit 
 hosts, to detect any dependencies on whether the host is 32-bit or 64-bit.
 
 Tests of i686 and x86_64 Linux hosts show same test results. FAILs that
 still exist include g++.dg/debug/dwarf2/non-virtual-thunk.C, due to the
 lack of target specific MI-thunk hooks right now, and a few tree-ssa
 optimization FAILs, that might be worked around by augmenting the
 testcase. The port in whole should be fairly stable.
 
 Thanks,
 Chung-Lin
 
 



Re: [PATCH] PR32219, weak hidden reference segfault

2013-05-31 Thread Chung-Lin Tang
On 13/5/15 8:12 PM, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 On 13/5/10 6:37 PM, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 +case UNSPEC:
 +  /* Reach for a contained symbol.  */
 +  return nonzero_address_p (XVECEXP (x, 0, 0));

 I don't think this is safe.  UNSPECs really are unspecified :-),
 so we can't assume that (unspec X) is nonzero simply because X is.

 Attached is a modified patch (not yet tested but just for demonstration)
 with a more specific test, hopefully regarded as more safe.

 The point is in recognizing (const (unspec [symbol] XYZ)) offsets in PIC
 references, which seems quite idiomatic across all targets by now.
 
 I agree this is safer.  However, there used to be some ports that
 use (plus pic_offset_table_rtx (symbol_ref X)) -- i.e. without an unspec --
 to represent a PIC reference to X.  That always seemed semantically wrong,
 since you're not actually adding the address of X and the PIC register,
 but the patch wouldn't handle that case correctly.

Well I can't help those targets then, but at least nothing will be
changed for them by this patch. It will just continue to return 'true'.

 An alternative might be to remove the pic_offset_table_rtx case altogether
 and rely on targetm.delegitimize_address instead.  FWIW, I'd prefer that
 if it works, but it's not me you need to convince. :-)

Like we discussed below, I think the direction should be towards making
things more machine-independent, rather then pushing more into the backend.

 I would suggest that this probably means there should be a new, more
 specific construct in RTL to represent relocation values of this kind,
 instead of (const (unspec)) serving an unofficial role; possibly some
 real support for reasoning about PIC references could also be considered.
 
 Yeah, maybe we could try to introduce some target-independent knowledge
 of certain reloc types, a bit like the generic BFD_RELOC_*s in bfd.


FWIW, I've ran tests on the newer patch on i686-linux, with no
regressions. Testcase has been moved to gcc.dg/torture by recommendation
of Bernhard. If any of the RTL maintainers can give an eye of merciful
approval, this old PR could be resolved :)

Thanks,
Chung-Lin

Index: rtlanal.c
===
--- rtlanal.c   (revision 199474)
+++ rtlanal.c   (working copy)
@@ -393,7 +393,15 @@ nonzero_address_p (const_rtx x)
   /* Handle PIC references.  */
   if (XEXP (x, 0) == pic_offset_table_rtx
CONSTANT_P (XEXP (x, 1)))
-   return true;
+   {
+ rtx offset = XEXP (x, 1);
+ if (GET_CODE (offset) == CONST
+  GET_CODE (XEXP (offset, 0)) == UNSPEC
+  GET_CODE (XVECEXP (XEXP (offset, 0), 0, 0)) == SYMBOL_REF)
+   return nonzero_address_p (XVECEXP (XEXP (offset, 0), 0, 0));
+
+ return true;
+   }
   return false;
 
 case PRE_MODIFY:
Index: testsuite/gcc.dg/torture/pr32219.c
===
--- testsuite/gcc.dg/torture/pr32219.c  (revision 0)
+++ testsuite/gcc.dg/torture/pr32219.c  (revision 0)
@@ -0,0 +1,12 @@
+/* PR target/32219 */
+/* { dg-do run } */
+/* { dg-require-visibility  } */
+/* { dg-options -fPIC { target fpic } } */
+
+extern void f() __attribute__((weak,visibility(hidden)));
+int main()
+{
+  if (f)
+f();
+  return 0;
+}


Re: [PATCH] PR32219, weak hidden reference segfault

2013-05-15 Thread Chung-Lin Tang
On 13/5/10 6:37 PM, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 +case UNSPEC:
 +  /* Reach for a contained symbol.  */
 +  return nonzero_address_p (XVECEXP (x, 0, 0));
 
 I don't think this is safe.  UNSPECs really are unspecified :-),
 so we can't assume that (unspec X) is nonzero simply because X is.

Attached is a modified patch (not yet tested but just for demonstration)
with a more specific test, hopefully regarded as more safe.

The point is in recognizing (const (unspec [symbol] XYZ)) offsets in PIC
references, which seems quite idiomatic across all targets by now.

I would suggest that this probably means there should be a new, more
specific construct in RTL to represent relocation values of this kind,
instead of (const (unspec)) serving an unofficial role; possibly some
real support for reasoning about PIC references could also be considered.

Chung-Lin

Index: rtlanal.c
===
--- rtlanal.c   (revision 198923)
+++ rtlanal.c   (working copy)
@@ -393,7 +393,15 @@ nonzero_address_p (const_rtx x)
   /* Handle PIC references.  */
   if (XEXP (x, 0) == pic_offset_table_rtx
CONSTANT_P (XEXP (x, 1)))
-   return true;
+   {
+ rtx offset = XEXP (x, 1);
+ if (GET_CODE (offset) == CONST
+  GET_CODE (XEXP (offset, 0)) == UNSPEC
+  GET_CODE (XVECEXP (XEXP (offset, 0), 0, 0)) == SYMBOL_REF)
+   return nonzero_address_p (XVECEXP (XEXP (offset, 0), 0, 0));
+
+ return true;
+   }
   return false;
 
 case PRE_MODIFY:


Re: [ping][patch, ARM] Fix PR42017, LR not used in leaf functions

2013-05-15 Thread Chung-Lin Tang
On 13/5/15 9:43 PM, Kugan wrote:
 On 14/05/13 19:18, Ramana Radhakrishnan wrote:
 On 05/13/13 04:15, Kugan wrote:
 Hi,

 Ping this patch by Chung-Lin.
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01179.html

 This patch allows lr registers to be used in leaf functions for ARM.

 There were some concerns about performance regression in thumb2 mode for
 CoreMark. However, looking at the code further shows that this
 performance regression is due to alignment issue with
 core_state_transition function and as a result taking longer time to
 execute. In fact, there isn’t any change in the code generated for
 core_state_transition with and without the patch. Adding Alignment to
 this function improves the performance than without the patch.

 Is this patch ok for trunk?

 Thanks,
 Kugan


 Ok.

 Thanks Ramana.
 
 I don't have write access for committing it. Can someone please commit
 this.
 
 Thanks,
 Kugam

I already committed after Ramana's approval.

Chung-Lin



Re: [PATCH 2/5] Altera Nios II: libgcc

2013-05-14 Thread Chung-Lin Tang
On 13/4/26 4:00 AM, Joseph S. Myers wrote:
 On Thu, 18 Apr 2013, Chung-Lin Tang wrote:
 
 +nios2-*-linux*)
 +tmake_file=$tmake_file nios2/t-nios2 nios2/t-linux t-libgcc-pic 
 t-slibgcc-libgcc
 +extra_parts=$extra_parts crti.o crtn.o
 +md_unwind_header=nios2/linux-unwind.h
 +;;
 +nios2-*-*)
 +tmake_file=$tmake_file nios2/t-nios2 t-fdpbit
 +extra_parts=$extra_parts crti.o crtn.o
 +;;
 
 As I understand it, the port uses soft-fp in glibc so doesn't need it in 
 libgcc for nios2-*-linux*.  But why use the old fp-bit in libgcc for bare 
 metal (use of t-fdpbit here), rather than soft-fp?

I think this was oversight. I have switched the nios2-elf config to use
use softfp.

Although not currently utilized, nios2 does have the capability of a
hard-float multilib, so I have used t-softfp-excl for now.

The remaining more trivial formatting, file name issues, etc. should
have all been resolved. Please see attached patch.

Thanks,
Chung-Lin

Index: libgcc/config.host
===
--- libgcc/config.host	(revision 198891)
+++ libgcc/config.host	(working copy)
@@ -137,6 +137,9 @@ mips*-*-*)
 	cpu_type=mips
 	tmake_file=mips/t-mips
 	;;
+nios2*-*-*)
+	cpu_type=nios2
+	;;
 powerpc*-*-*)
 	cpu_type=rs6000
 	;;
@@ -814,6 +817,15 @@ moxie-*-rtems*)
 	# Don't use default.
 	extra_parts=
 	;;
+nios2-*-linux*)
+	tmake_file=$tmake_file nios2/t-nios2 nios2/t-linux t-libgcc-pic t-slibgcc-libgcc
+	extra_parts=$extra_parts crti.o crtn.o
+	md_unwind_header=nios2/linux-unwind.h
+	;;
+nios2-*-*)
+	tmake_file=$tmake_file nios2/t-nios2 t-softfp-sfdf t-softfp-excl t-softfp
+	extra_parts=$extra_parts crti.o crtn.o
+	;;
 pdp11-*-*)
 	tmake_file=pdp11/t-pdp11 t-fdpbit
 	;;
Index: libgcc/config/nios2/crtn.S
===
--- libgcc/config/nios2/crtn.S	(revision 0)
+++ libgcc/config/nios2/crtn.S	(revision 0)
@@ -0,0 +1,60 @@
+/* Copyright (C) 2012-2013 Free Software Foundation, Inc.
+   Contributed by Jonah Graham (jgra...@altera.com).
+   Contributed by Mentor Graphics, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+http://www.gnu.org/licenses/.  */
+
+
+/* This file just makes sure that the .fini and .init sections do in
+fact return.  Users may put any desired instructions in those sections.
+This file is the last thing linked into any executable.
+*/	
+	.file	crtn.asm
+
+
+
+	.section	.init
+	ldw	ra, 44(sp)
+	ldw	r23, 40(sp)
+	ldw	r22, 36(sp)
+	ldw	r21, 32(sp)
+	ldw	r20, 28(sp)
+	ldw	r19, 24(sp)
+	ldw	r18, 20(sp)
+	ldw	r17, 16(sp)
+	ldw	r16, 12(sp)
+	ldw	fp, 8(sp)
+	addi	sp, sp, 48
+	ret
+	
+	.section	.fini
+	ldw	ra, 44(sp)
+	ldw	r23, 40(sp)
+	ldw	r22, 36(sp)
+	ldw	r21, 32(sp)
+	ldw	r20, 28(sp)
+	ldw	r19, 24(sp)
+	ldw	r18, 20(sp)
+	ldw	r17, 16(sp)
+	ldw	r16, 12(sp)
+	ldw	fp, 8(sp)
+	addi	sp, sp, 48
+	ret
+
Index: libgcc/config/nios2/linux-unwind.h
===
--- libgcc/config/nios2/linux-unwind.h	(revision 0)
+++ libgcc/config/nios2/linux-unwind.h	(revision 0)
@@ -0,0 +1,179 @@
+/* DWARF2 EH unwinding support for Nios II Linux.
+   Copyright (C) 2008-2013 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+http://www.gnu.org/licenses/.  */
+
+#ifndef inhibit_libc
+
+/* Do code reading to identify a signal frame

Re: [PATCH 0/5] Submission of Altera Nios II port

2013-05-14 Thread Chung-Lin Tang
On 2013/4/26 04:35 AM, Joseph S. Myers wrote:
 I should ask the following general standard new-port questions:
 
 * Does the port build cleanly when configured with --enable-werror-always 
 and built using a native compiler from current GCC trunk - for both 32-bit 
 host, and 64-bit host?  It should.  This is the standard way of ensuring 
 the same level of warning-cleanness in a cross build as native bootstrap 
 automatically enforces (and the build compiler needs to be from current 
 trunk because warning-cleanness is only expected when the build compiler 
 is the same version as the compiler being built).
 
 The new targets should be added to contrib/config-list.mk, which helps 
 test all targets with --enable-werror-always.  This is mentioned in Back 
 End in sourcebuild.texi - check there for any other pieces that should be 
 included in the port submission.

Currently, nios2 seems to be another affected target by PR 55035, when
building with a recent trunk with --enable-werror-always:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55035

I would like to ask this --enable-werror-always requirement be relaxed
for the nios2 port submission, as it is not alone in the above PR, and
we are early in the release cycle; there should be plenty of time to fix
any problems afterwards.

Though not included in the submitted patches, I will add the target
entries in contrib/config-list.mk when committing.

 * What are test results like for the port?  Again, both 32-bit and 64-bit 
 hosts, to detect any dependencies on whether the host is 32-bit or 64-bit.

Tests of i686 and x86_64 Linux hosts show same test results. FAILs that
still exist include g++.dg/debug/dwarf2/non-virtual-thunk.C, due to the
lack of target specific MI-thunk hooks right now, and a few tree-ssa
optimization FAILs, that might be worked around by augmenting the
testcase. The port in whole should be fairly stable.

Thanks,
Chung-Lin




Re: [ping][patch, ARM] Fix PR42017, LR not used in leaf functions

2013-05-13 Thread Chung-Lin Tang
On 13/5/13 11:15 AM, Kugan wrote:
 Hi,
 
 Ping this patch by Chung-Lin.
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01179.html
 
 This patch allows lr registers to be used in leaf functions for ARM.
 
 There were some concerns about performance regression in thumb2 mode for
 CoreMark. However, looking at the code further shows that this
 performance regression is due to alignment issue with
 core_state_transition function and as a result taking longer time to
 execute. In fact, there isn’t any change in the code generated for
 core_state_transition with and without the patch. Adding Alignment to
 this function improves the performance than without the patch.

Just curious, were changes to enforce the alignment added already? (I'm
quite out of ARM-specific context lately).

Chung-Lin



[PATCH] PR32219, weak hidden reference segfault

2013-05-09 Thread Chung-Lin Tang
Hi, with reference to the old dicussion on PR 32219:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32219

It seems that a patch was submitted to put the DECL_WEAK check before
the visibility check, but that patch was never approved or applied, due
to concerns in the wording of surrounding comments:
http://gcc.gnu.org/ml/gcc-patches/2010-03/msg00666.html

That patch does solve the segfault in the PR, and happens to also solve
the other weak-hidden + section-anchor issue Nathan's other patch solves
(simply because it works the same way by rejecting DECL_WEAK inside
binds_local_p).

However, my own testing of the PR patch on recent trunk indicates a
regression: it filters out the TLS wrapper functions for C++11
thread_local variables, causing them to be called by @PLT, and failing a
few tests that check for this.

Looking into the generated code for:

extern void weakfun() __attribute__((weak,visibility(hidden)));
void foo ()
{
  if (weakfun) weakfun();
}

Under -O1 -m32 -fPIC, the address load and test will look like:

(insn 5 2 6 2 (set (reg/f:SI 60)
   (plus:SI (reg:SI 3 bx)
  (const:SI (unspec:SI [
(symbol_ref/i:SI (f) [flags 0x43]  function_decl f)
] UNSPEC_GOTOFF {*leasi}
 (expr_list:REG_EQUAL (symbol_ref/i:SI (f) function_decl f)
(nil)))

(insn 6 5 7 2 (set (reg:CCZ 17 flags)
(compare:CCZ (reg/f:SI 60)
(const_int 0 [0]))) p.c:8 3 {*cmpsi_ccno_1}
 (nil))

(jump_insn 7 6 8 2 ...

However, the logic currently used in rtlanal.c:nonzero_address_p() only
test for PIC-reg + constant_p, instead of more sophisticated
checking to expose the wrapped weak symbol_ref, thus confusing CSE to
eliminate the needed test.

The DECL_WEAK test upward movement in binds_local_p() works because when
bound non-local, x86 expand turns the address load into (mem (plus
(const (unspec ... UNSPEC_GOT form, with the MEM helping to avoid
the above case.

Attached is a patch to make the nonzero_address_p() work for this case,
bootstrapped and tested on i686-linux without regressions.

I have the impression from the PR32219 discussion, that the solution
should be to make all weak-hidden-undefined symbols as potentially
non-local.  However, I don't think that is needed, no? As long as the
linker added zero value is in the same module, weak hidden semantics
should remain just the same...

Thanks,
Chung-Lin

2013-05-09  Chung-Lin Tang  clt...@codesourcery.com

PR target/32219
* rtlanal.c (nonzero_address_p): Robustify checking by look
recursively into PIC constant offsets and (CONST (UNSPEC ...))
expressions.
Index: rtlanal.c
===
--- rtlanal.c   (revision 198735)
+++ rtlanal.c   (working copy)
@@ -387,13 +387,22 @@ nonzero_address_p (const_rtx x)
   return false;
 
 case CONST:
-  return nonzero_address_p (XEXP (x, 0));
+  {
+   rtx base, offset;
+   /* Peel away any constant offsets from the base symbol.  */
+   split_const (CONST_CAST_RTX (x), base, offset);
+   return nonzero_address_p (base);
+  }
 
+case UNSPEC:
+  /* Reach for a contained symbol.  */
+  return nonzero_address_p (XVECEXP (x, 0, 0));
+
 case PLUS:
   /* Handle PIC references.  */
   if (XEXP (x, 0) == pic_offset_table_rtx
CONSTANT_P (XEXP (x, 1)))
-   return true;
+   return nonzero_address_p (XEXP (x, 1));
   return false;
 
 case PRE_MODIFY:
Index: testsuite/gcc.dg/visibility-21.c
===
--- testsuite/gcc.dg/visibility-21.c(revision 0)
+++ testsuite/gcc.dg/visibility-21.c(revision 0)
@@ -0,0 +1,12 @@
+/* PR target/32219 */
+/* { dg-do run } */
+/* { dg-require-visibility  } */
+/* { dg-options -O1 -fPIC { target fpic } } */
+
+extern void f() __attribute__((weak,visibility(hidden)));
+int main()
+{
+  if (f)
+f();
+  return 0;
+}


Re: [PATCH 4/5] Altera Nios II: dwarf generation fix

2013-04-23 Thread Chung-Lin Tang
On 2013/4/23 01:35 AM, Cary Coutant wrote:
 A : host_integerp (value, TYPE_UNSIGNED (TREE_TYPE (value)))
 B : host_integerp (value, 0)

 AB  AB  AB  AB
 type_size,hwi | 00  01  10  11
 --+---
 32,32 | X   X   int int
 64,32 | X   X   int int
 32,64 | X   X   -   int
 64,64 | X   X   int int
 
 In the third column (AB == 10), we're emitting a single int today even
 though we know it's not technically correct: GDB will display the
 unsigned value as a negative number. That's marginally better than
 emitting nothing at all when the value is larger than an hwi, but I
 was arguing that as long as we're adding the ability to emit the
 constant as a double, why not also use a double for an unsigned that
 doesn't fit in a signed hwi? Yes, it'll waste some space, but the
 value will be correctly displayed as a result.

I'm not sure of other cases, but here it is only to mark the values of
an enumerator type, so as long as the values are consistent, the correct
behavior is to print out the right enum string (I know that's a bit not
ideal, but just to point that out)

 Upon further reflection, however...
 
 This comment is wrong:
 
 /* DWARF2 does not provide a way of indicating whether or
not enumeration constants are signed or unsigned.  GDB
always assumes the values are signed, so we output all
values as if they were signed.  That means that
enumeration constants with very large unsigned values
will appear to have negative values in the debugger.  */
 
 DWARF does in fact provide a way of indicating whether a constant is
 signed or unsigned: DW_FORM_sdata and DW_FORM_udata. These forms were
 in DWARF-2, and the following comment was added to the DWARF-3 spec:
 
 If one of the DW_FORM_datan forms is used to represent a signed or
 unsigned integer, it can be hard for a consumer to discover the
 context necessary to determine which interpretation is intended.
 Producers are therefore strongly encouraged to use DW_FORM_sdata or
 DW_FORM_udata for signed and unsigned integers respectively, rather
 than DW_FORM_datan.
 
 We should really be emitting unsigned constants using add_AT_unsigned:
 
   if (TYPE_UNSIGNED (TREE_TYPE (value)))
 {
   if (host_integerp (value, 1))
 add_AT_unsigned (enum_die, DW_AT_const_value, TREE_INT_CST_LOW 
 (value));
   else
 add_AT_unsigned_double (enum_die, DW_AT_const_value,
 TREE_INT_CST_HIGH (value),
 TREE_INT_CST_LOW (value));
 }
   else
 {
   if (host_integerp (value, 0))
 add_AT_int (enum_die, DW_AT_const_value, TREE_INT_CST_LOW (value));
   else
 add_AT_double (enum_die, DW_AT_const_value,
TREE_INT_CST_HIGH (value), TREE_INT_CST_LOW (value));
 }
 
 add_AT_unsigned_double would be new, and would need a new
 dw_val_class_unsigned_const_double enum.

That seems like the completely correct solution; correct unsigned/signed
x int/double tags for all situations. ...1xxx... can then be
correctly read as an unsigned int, rather than an excessively wide
double or (incorrectly) signed int.

You said OK for Julian's patch in the last mail, so I'll take that as
approved (for an interim solution). If you don't mind, I'll add a TODO
to the comments (attached patch).

Thanks,
Chung-Lin

Index: dwarf2out.c
===
--- dwarf2out.c	(revision 198177)
+++ dwarf2out.c	(working copy)
@@ -17027,15 +17027,27 @@ gen_enumeration_type_die (tree type, dw_die_ref co
 	  if (TREE_CODE (value) == CONST_DECL)
 	value = DECL_INITIAL (value);
 
-	  if (host_integerp (value, TYPE_UNSIGNED (TREE_TYPE (value
+	  if (host_integerp (value, TYPE_UNSIGNED (TREE_TYPE (value)))
+	   (simple_type_size_in_bits (TREE_TYPE (value))
+		  = HOST_BITS_PER_WIDE_INT || host_integerp (value, 0)))
 	/* DWARF2 does not provide a way of indicating whether or
 	   not enumeration constants are signed or unsigned.  GDB
 	   always assumes the values are signed, so we output all
 	   values as if they were signed.  That means that
 	   enumeration constants with very large unsigned values
-	   will appear to have negative values in the debugger.  */
-	add_AT_int (enum_die, DW_AT_const_value,
-			tree_low_cst (value, tree_int_cst_sgn (value)  0));
+	   will appear to have negative values in the debugger.
+
+	   TODO: the above comment is wrong, DWARF2 does provide
+	   DW_FORM_sdata/DW_FORM_udata to represent signed/unsigned data.
+	   This should be re-worked to use correct signed/unsigned
+	   int/double tags for all cases, instead of always treating as
+	   signed.  */
+	add_AT_int (enum_die, DW_AT_const_value, TREE_INT_CST_LOW (value));
+	  else
+	/* 

Re: [PATCH 4/5] Altera Nios II: dwarf generation fix

2013-04-22 Thread Chung-Lin Tang
On 2013/4/19 12:56 AM, Cary Coutant wrote:
 On Thu, Apr 18, 2013 at 6:49 AM, Chung-Lin Tang clt...@codesourcery.com 
 wrote:
 This patch was a fix by Julian which corrected a HOST_BITS_PER_WIDE_INT
 host dependency in dwarf generation. Nios II does not have
 need_64bit_hwint switched on during configuring, and ran into GDB test
 FAILs originating from this problem.

 2013-04-18  Julian Brown  jul...@codesourcery.com

 * dwarf2out.c (gen_enumeration_type_die): Fix
 HOST_BITS_PER_WIDE_INT dependency behavior in enumeration type
 DIE generation.
 
 +  if (host_integerp (value, TYPE_UNSIGNED (TREE_TYPE (value)))
 +   (simple_type_size_in_bits (TREE_TYPE (value))
 +  = HOST_BITS_PER_WIDE_INT || host_integerp (value, 0)))
  /* DWARF2 does not provide a way of indicating whether or
 not enumeration constants are signed or unsigned.  GDB
 always assumes the values are signed, so we output all
 values as if they were signed.  That means that
 enumeration constants with very large unsigned values
 will appear to have negative values in the debugger.  */
 +add_AT_int (enum_die, DW_AT_const_value, TREE_INT_CST_LOW (value));
 +  else
 +/* Enumeration constants may be wider than HOST_WIDE_INT.  Handle
 +   that here.  */
 +add_AT_double (enum_die, DW_AT_const_value,
 +   TREE_INT_CST_HIGH (value), TREE_INT_CST_LOW (value));
 
 I'm not sure I understand the logic here. I'd think either the value
 fits in a signed HOST_WIDE_INT, and we use add_AT_int, or it doesn't,
 and we use add_AT_double:
 
   if (host_integerp (value, 0))
 add_AT_int (enum_die, DW_AT_const_value, TREE_INT_CST_LOW (value));
   else
 add_AT_double (enum_die, DW_AT_const_value,
TREE_INT_CST_HIGH (value), TREE_INT_CST_LOW (value));
 
 Why wouldn't that work? I'd think this would even eliminate the need
 for the comment about signed vs. unsigned.
 
 -cary

I think Julian might be able to fill clearer, but IIUC, if you use
host_integer(value,0) as the test, while functionally also correct, for
values like:

TREE_INT_CST_HIGH (value) == ...
TREE_INT_CST_LOW (value)  == 1xxx...

you will end up placing it as a double, even if TREE_TYPE (value) is
something within 32-bits, which you can actually place as an 'int'. In
other words, the more complex condition saves a bit of dwarf size.

Julian, can you comment further?

Thanks,
Chung-Lin



<    1   2   3   4   5   6   >