Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-10-18 Thread Larry Baker

George,

Thanks for the update.  FYI, here are all the version numbers reported
by the compiler releases I have installed:



[baker@hydra ~]$ module load compilers/intel/11.1.080
[baker@hydra ~]$ icc -v
Version 11.1
[baker@hydra ~]$ module unload compilers/intel/11.1.080



[baker@hydra ~]$ module load compilers/intel/2011.3.174
[baker@hydra ~]$ icc -v
Version 12.0.3
[baker@hydra ~]$ module unload compilers/intel/2011.3.174



[baker@hydra ~]$ module load compilers/intel/2011.4.191
[baker@hydra ~]$ icc -v
Version 12.0.4
[baker@hydra ~]$ module unload compilers/intel/2011.4.191



[baker@hydra ~]$ module load compilers/intel/2011.5.220
[baker@hydra ~]$ icc -v
Version 12.0.5
[baker@hydra ~]$ module unload compilers/intel/2011.5.220



[baker@hydra ~]$ module load compilers/intel/2011.6.233
[baker@hydra ~]$ icc -v
icc version 12.1.0 (gcc version 4.1.2 compatibility)
[baker@hydra ~]$ module unload compilers/intel/2011.6.233



Another problem I found with the Intel 12.1.0 compiler: I started to look
at adding a test for the Intel compiler version around the #pragma that
disables optimization for OpenMPI, and I found that the __ICC and
__INTEL_COMPILER predefined macros (the compiler version number) are not
properly defined:


$ icc -E -dD hello.c | grep __INTEL_COMPILER
#define __INTEL_COMPILER 
#define __INTEL_COMPILER_BUILD_DATE 20110811

$ icc -E -dD hello.c | grep __ICC
#define __ICC 

$ icc -v
icc version 12.1.0 (gcc version 4.1.2 compatibility)

I do not know if there is code in OpenMPI that looks at __ICC and  
__INTEL_COMPILER, but that could cause problems.  (Pass this on  
upstream to the libtool people?)


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 17 Oct 2011, at 8:18 PM, George Bosilca wrote:


Larry,

Sorry for not updating this thread. The issue was identified and fixed
by Rainer in r25290 (https://svn.open-mpi.org/trac/ompi/changeset/25290).
Please read the comments and the linked thread on the Intel forum for
more info.


I couldn't find a trace of this being fixed in the 1.4 series, so I
would wait to upgrade until this issue gets resolved.


  Thanks,
george.

On Oct 17, 2011, at 23:00 , Larry Baker wrote:


George,

I have not had time to look over the 1.4.3 make check failure for  
Intel 2011.6.233 compilers.  Have you?


I had planned to get 1.4.3 compiled on all six of our compilers  
using the latest compiler releases.  I was putting off upgrading to  
1.4.4 or 1.5.x until after that to minimize the number of things  
that could go wrong.  Do you recommend otherwise?


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 6:46 PM, George Bosilca wrote:

The may_alias attribute was part of forward-looking attribute checking,
at a time when few compilers supported it. This explains why it is not
widely used in the library itself. Moreover, as it does not affect the
compilation itself (as your test highlights, this is not the issue with
the icc 2011.6.233 compiler), there is no urgency to remove the
may_alias support.


I just got that particular version of the compiler installed on  
one of our machines. I'll give it a try over the weekend.


  george.

On Oct 7, 2011, at 20:21 , Larry Baker wrote:

The test for the __may_alias__ attribute uses the following short code
snippet:



int * p_value __attribute__ ((__may_alias__));
int
main ()
{

  ;
  return 0;
}


Indeed, for Intel 2011 compilers prior to 2011.6.233, this  
results in a warning:



[root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220
[root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored
  int * p_value __attribute__ ((__may_alias__));
^

[root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220



[root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233
[root@hydra openmpi-1.4.3]# icc -c may_alias_test.c



I modified ./configure to force


ompi_cv___attribute__may_alias=0



Then I compiled and tested the library.  Unfortunately, the  
results were exactly the same:



make  check-TESTS
make[3]: Entering directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'

/bin/sh: line 4: 26326 Segmentation fault  ${dir}$tst
FAIL: checksum
/bin/sh: line 4: 26359 Segmentation fault  ${dir}$tst
FAIL: position

2 of 2 tests failed
Please report to http://www.open-mpi.org/community/help/




I could not find any use of the may_alias attribute, other than  
in a #define in opal/include/opal_config_bottom.h.  Is  
OMPI_HAVE_ATTRIBUTE_MAY_ALIAS just cruft that can be removed?


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 11:08 AM, Larry Baker wrote:

I ran into a problem this past week trying to upgrade our  
OpenMPI 1.4.3 for the latest Intel 2011 

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25302

2011-10-18 Thread Ralph Castain

On Oct 18, 2011, at 7:35 AM, TERRY DONTJE wrote:

>> Strange - it ran fine for me on multiple tests. I'll check to see if 
>> something strange got into the mix and recommit.
>> 
> Not sure it is the same issue but it looks like all my MTT tests on the trunk 
> r25308 are timing out.

Okay - sorry about that. I'm looking into it now. I tested it with a multi-node 
setup, but it's always possible that something got in there after the tests 
(and sounds like it did).

> --td
> 
>> On Oct 17, 2011, at 8:51 PM, George Bosilca wrote:
>> 
>>> This commit put the mpirun process in an infinite loop for the simple case 
>>> mpirun -np 2 --mca orte_default_hostfile machinefile --bynode *my_app*
>>> 
>>>  george.
>>> 
>>> On Oct 17, 2011, at 15:49 , r...@osl.iu.edu wrote:
>>> 
 Author: rhc
 Date: 2011-10-17 15:49:04 EDT (Mon, 17 Oct 2011)
 New Revision: 25302
 URL: https://svn.open-mpi.org/trac/ompi/changeset/25302
 
 Log:
 Fix the mapping algo for computing vpids - it was borked for bynode 
 operations when using nperxxx directives
 
 Text files modified: 
  trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c |67 
 --- 
  1 files changed, 34 insertions(+), 33 deletions(-)
 
 Modified: trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c
 ==
 --- trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c (original)
 +++ trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c 2011-10-17 
 15:49:04 EDT (Mon, 17 Oct 2011)
 @@ -527,7 +527,7 @@
 int orte_rmaps_base_compute_vpids(orte_job_t *jdata)
 {
orte_job_map_t *map;
 -orte_vpid_t vpid;
 +orte_vpid_t vpid, cnt;
int i, j;
orte_node_t *node;
orte_proc_t *proc;
 @@ -539,6 +539,7 @@
ORTE_MAPPING_BYSOCKET & map->policy ||
ORTE_MAPPING_BYBOARD & map->policy) {
/* assign the ranks sequentially */
 +vpid = 0;
for (i=0; i < map->nodes->size; i++) {
if (NULL == (node = 
 (orte_node_t*)opal_pointer_array_get_item(map->nodes, i))) {
continue;
 @@ -553,12 +554,10 @@
}
if (ORTE_VPID_INVALID == proc->name.vpid) {
/* find the next available vpid */
 -for (vpid=0; vpid < jdata->num_procs; vpid++) {
 -if (NULL == 
 opal_pointer_array_get_item(jdata->procs, vpid)) {
 -break;
 -}
 +while (NULL != 
 opal_pointer_array_get_item(jdata->procs, vpid)) {
 +vpid++;
}
 -proc->name.vpid = vpid;
 +proc->name.vpid = vpid++;
ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_INVALID);

 ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name));
 
 @@ -580,39 +579,41 @@
 
if (ORTE_MAPPING_BYNODE & map->policy) {
/* assign the ranks round-robin across nodes */
 -for (i=0; i < map->nodes->size; i++) {
 -if (NULL == (node = 
 (orte_node_t*)opal_pointer_array_get_item(map->nodes, i))) {
 -continue;
 -}
 -for (j=0; j < node->procs->size; j++) {
 -if (NULL == (proc = 
 (orte_proc_t*)opal_pointer_array_get_item(node->procs, j))) {
 +cnt = 0;
 +vpid = 0;
 +do {
 +for (i=0; i < map->nodes->size; i++) {
 +if (NULL == (node = 
 (orte_node_t*)opal_pointer_array_get_item(map->nodes, i))) {
continue;
}
 -/* ignore procs from other jobs */
 -if (proc->name.jobid != jdata->jobid) {
 -continue;
 -}
 -if (ORTE_VPID_INVALID == proc->name.vpid) {
 -/* find the next available vpid */
 -vpid = i;
 -while (NULL != 
 opal_pointer_array_get_item(jdata->procs, vpid)) {
 -vpid += map->num_nodes;
 -if (jdata->num_procs <= vpid) {
 -vpid = vpid - jdata->num_procs;
 +for (j=0; j < node->procs->size; j++) {
 +if (NULL == (proc = 
 (orte_proc_t*)opal_pointer_array_get_item(node->procs, j))) {
 +continue;
 +}
 +/* ignore procs from other jobs */
 +if (proc->name.jobid != jdata->jobid) {
 +continue;
 +}
 + 

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25302

2011-10-18 Thread TERRY DONTJE

Strange - it ran fine for me on multiple tests. I'll check to see if something 
strange got into the mix and recommit.

Not sure it is the same issue but it looks like all my MTT tests on the 
trunk r25308 are timing out.

--td


On Oct 17, 2011, at 8:51 PM, George Bosilca wrote:


This commit put the mpirun process in an infinite loop for the simple case
mpirun -np 2 --mca orte_default_hostfile machinefile --bynode *my_app*

  george.

On Oct 17, 2011, at 15:49 , r...@osl.iu.edu wrote:


Author: rhc
Date: 2011-10-17 15:49:04 EDT (Mon, 17 Oct 2011)
New Revision: 25302
URL: https://svn.open-mpi.org/trac/ompi/changeset/25302

Log:
Fix the mapping algo for computing vpids - it was borked for bynode operations 
when using nperxxx directives

[...]

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25303

2011-10-18 Thread Ralph Castain

On Oct 17, 2011, at 8:55 PM, George Bosilca wrote:

> 
> On Oct 17, 2011, at 18:23 , Ralph Castain wrote:
> 
>> 
>> On Oct 17, 2011, at 4:14 PM, George Bosilca wrote:
>> 
>>> Ralph,
>>> 
>>> I have a concern about the code below (a snippet from commit r25303). You 
>>> moved the setting of proc_flags and proc_name into a function named set_arch. 
>>> As the name and the lengthy comment above it indicate, the scope of this 
>>> particular function is to set the architecture of the remote process, not 
>>> the locality flag or the process name.
>> 
>> I agree. However, there is no harm in moving those settings to that function.
> 
> Ralph,
> 
> It depends on your definition of harm. The large number of developers that 
> worked on the OPAL and OMPI layer tried to follow standard coding practices 
> as much as possible. Side effects like the one you just introduced are not 
> tolerated, and have been promptly addressed in the past [at least in the OPAL 
> and OMPI layers].

You have a strange definition of side effect, and a very different memory of 
anything I have encountered.

> 
> For the sake of future developers, I would really appreciate it if you 
> avoided transgressing these community-friendly rules.
> 
>> Indeed, there is a significant advantage in doing so as it allows the data 
>> to be exchanged during the modex, instead of mandating that ORTE provide it.
>> 
>> I agree that the function name is now somewhat inaccurate, but chose not to 
>> change it as (a) I couldn't think of a better alternative, and (b) it seemed 
>> a trivial issue. If it bothers you or others, however, please feel free to 
>> change the name of the function.
> 
> george.
> 
>> 
>> 
>>> 
>>> george.
>>> 
>>> 
>>> Index: ompi/proc/proc.c
>>> ===
>>> --- ompi/proc/proc.c(revision 25302)
>>> +++ ompi/proc/proc.c(working copy)
>>> @@ -119,11 +119,6 @@
>>>   if (OMPI_SUCCESS != (ret = ompi_modex_send_key_value("OMPI_ARCH", 
>>> &proc->proc_arch, OPAL_UINT32))) {
>>>   return ret;
>>>   }
>>> -} else {
>>> -/* get the locality information */
>>> -proc->proc_flags = 
>>> orte_ess.proc_get_locality(&proc->proc_name);
>>> -/* get the name of the node it is on */
>>> -proc->proc_hostname = 
>>> orte_ess.proc_get_hostname(&proc->proc_name);
>>>   }
>>>   }
>>> 
>>> @@ -149,8 +144,8 @@
>>>   OPAL_THREAD_LOCK(&ompi_proc_lock);
>>> 
>>>   for( item  = opal_list_get_first(&ompi_proc_list);
>>> -item != opal_list_get_end(&ompi_proc_list);
>>> -item  = opal_list_get_next(item)) {
>>> + item != opal_list_get_end(&ompi_proc_list);
>>> + item  = opal_list_get_next(item)) {
>>>   proc = (ompi_proc_t*)item;
>>> 
>>>   if (proc->proc_name.vpid != ORTE_PROC_MY_NAME->vpid) {
>>> @@ -177,6 +172,10 @@
>>>   OPAL_THREAD_UNLOCK(&ompi_proc_lock);
>>>   return ret;
>>>   }
>>> +/* get the locality information */
>>> +proc->proc_flags = 
>>> orte_ess.proc_get_locality(&proc->proc_name);
>>> +/* get the name of the node it is on */
>>> +proc->proc_hostname = 
>>> orte_ess.proc_get_hostname(&proc->proc_name);
>>>   }
>>>   }
>>>   OPAL_THREAD_UNLOCK(&ompi_proc_lock);
>>> 
>>> 
>>> 
>>> 
>>> On Oct 17, 2011, at 16:51 , r...@osl.iu.edu wrote:
>>> 
 Author: rhc
 Date: 2011-10-17 16:51:22 EDT (Mon, 17 Oct 2011)
 New Revision: 25303
 URL: https://svn.open-mpi.org/trac/ompi/changeset/25303
 
 Log:
 
 Complete implementation of pmi support. Ensure we support both mpirun and 
 direct launch within same configuration to avoid requiring separate 
 builds. Add support for generic pmi, not just under slurm. Add 
 publish/subscribe support, although slurm's pmi implementation will just 
 return an error as it hasn't been done yet.
 
 
 
 Added:
 trunk/ompi/mca/pubsub/pmi/   (props changed)
 trunk/ompi/mca/pubsub/pmi/Makefile.am
 trunk/ompi/mca/pubsub/pmi/configure.m4
 trunk/ompi/mca/pubsub/pmi/pubsub_pmi.c
 trunk/ompi/mca/pubsub/pmi/pubsub_pmi.h
 trunk/ompi/mca/pubsub/pmi/pubsub_pmi_component.c
 trunk/orte/mca/ess/pmi/   (props changed)
 trunk/orte/mca/ess/pmi/Makefile.am
 trunk/orte/mca/ess/pmi/configure.m4
 trunk/orte/mca/ess/pmi/ess_pmi.h
 trunk/orte/mca/ess/pmi/ess_pmi_component.c
 trunk/orte/mca/ess/pmi/ess_pmi_module.c
 Text files modified: 
 trunk/ompi/proc/proc.c |13 
 
 trunk/orte/config/orte_check_pmi.m4| 3 
 
 trunk/orte/mca/ess/slurmd/Makefile.am  |10 
 
 trunk/orte/mca/ess/slurmd/configure.m4 |16 -   
 
 

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25302

2011-10-18 Thread Ralph Castain
Strange - it ran fine for me on multiple tests. I'll check to see if something 
strange got into the mix and recommit.

On Oct 17, 2011, at 8:51 PM, George Bosilca wrote:

> This commit put the mpirun process in an infinite loop for the simple case 
> mpirun -np 2 --mca orte_default_hostfile machinefile --bynode *my_app*
> 
>  george.
> 
> On Oct 17, 2011, at 15:49 , r...@osl.iu.edu wrote:
> 
>> Author: rhc
>> Date: 2011-10-17 15:49:04 EDT (Mon, 17 Oct 2011)
>> New Revision: 25302
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/25302
>> 
>> Log:
>> Fix the mapping algo for computing vpids - it was borked for bynode 
>> operations when using nperxxx directives
>> 
>> [...]

Re: [OMPI devel] [OMPI bugs] [Open MPI] #2888: base.h inclusion breaks Solaris build

2011-10-18 Thread TERRY DONTJE
BTW, I am working on a patch for this.  Just want to validate there are 
no other loose ends; I remember there were a couple of oddities about this 
issue.


--td

Never mind; I just read your text more carefully - 2887 caused the problem.

Sent from my phone. No type good.

On Oct 18, 2011, at 6:19 AM, "Open MPI"  wrote:


#2888: base.h inclusion breaks Solaris build
+
Reporter:  tdd  |  Owner:  tdd
Type:  defect   | Status:  new
Priority:  blocker  |  Milestone:  Open MPI 1.5.5
Version:  trunk|   Keywords:
+
#2887 breaks the Solaris build because opal/sys/timer.h and
opal/mca/timer/base/base.h cause a redeclaration error for opal_timer_t.
This is a similar issue we saw with r25157 that r25170 fixed.

--
Ticket URL:
Open MPI

___
bugs mailing list
b...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/bugs

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI devel] [OMPI bugs] [Open MPI] #2888: base.h inclusion breaks Solaris build

2011-10-18 Thread TERRY DONTJE

Terry -

Did #2887 fix this already?


No it broke it.

--td

Sent from my phone. No type good.

On Oct 18, 2011, at 6:19 AM, "Open MPI"  wrote:


[...]


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI devel] [OMPI bugs] [Open MPI] #2888: base.h inclusion breaks Solaris build

2011-10-18 Thread Jeff Squyres (jsquyres)
Never mind; I just read your text more carefully - 2887 caused the problem. 

Sent from my phone. No type good. 

On Oct 18, 2011, at 6:19 AM, "Open MPI"  wrote:

> [...]


Re: [OMPI devel] [OMPI bugs] [Open MPI] #2888: base.h inclusion breaks Solaris build

2011-10-18 Thread Jeff Squyres (jsquyres)
Terry -

Did #2887 fix this already?

Sent from my phone. No type good. 

On Oct 18, 2011, at 6:19 AM, "Open MPI"  wrote:

> [...]


Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-10-18 Thread George Bosilca
Larry,

Sorry for not updating this thread. The issue was identified and fixed by 
Rainer in r25290 (https://svn.open-mpi.org/trac/ompi/changeset/25290). Please 
read the comments and the linked thread on the Intel forum for more info.

I couldn't find a trace of this being fixed in the 1.4 series, so I would 
wait to upgrade until this issue gets resolved.

  Thanks,
george.

On Oct 17, 2011, at 23:00 , Larry Baker wrote:

> George,
> 
> I have not had time to look over the 1.4.3 make check failure for Intel 
> 2011.6.233 compilers.  Have you?
> 
> I had planned to get 1.4.3 compiled on all six of our compilers using the 
> latest compiler releases.  I was putting off upgrading to 1.4.4 or 1.5.x 
> until after that to minimize the number of things that could go wrong.  Do 
> you recommend otherwise?
> 
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
> 
> On 7 Oct 2011, at 6:46 PM, George Bosilca wrote:
> 
>> The may_alias attribute was part of forward-looking attribute checking, at 
>> a time when few compilers supported it. This explains why it is not 
>> widely used in the library itself. Moreover, as it does not affect the 
>> compilation itself (as your test highlights, this is not the issue with the 
>> icc 2011.6.233 compiler), there is no urgency to remove the may_alias support.
>> 
>> I just got that particular version of the compiler installed on one of our 
>> machines. I'll give it a try over the weekend.
>> 
>>   george.
>> 
>> On Oct 7, 2011, at 20:21 , Larry Baker wrote:
>> 
>>> The test for the __may_alias__ attribute uses the following short code 
>>> snippet:
>>> 
 int * p_value __attribute__ ((__may_alias__));
 int
 main ()
 {
 
   ;
   return 0;
 }
>>> 
>>> Indeed, for Intel 2011 compilers prior to 2011.6.233, this results in a 
>>> warning:
>>> 
[root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220
 [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c 
 may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored
   int * p_value __attribute__ ((__may_alias__));
 ^
 
 [root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220
>>> 
 [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233
 [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c 
>>> 
>>> 
>>> I modified ./configure to force
>>> 
 ompi_cv___attribute__may_alias=0
>>> 
>>> 
>>> Then I compiled and tested the library.  Unfortunately, the results were 
>>> exactly the same:
>>> 
 make  check-TESTS
 make[3]: Entering directory 
 `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
 /bin/sh: line 4: 26326 Segmentation fault  ${dir}$tst
 FAIL: checksum
 /bin/sh: line 4: 26359 Segmentation fault  ${dir}$tst
 FAIL: position
 
 2 of 2 tests failed
 Please report to http://www.open-mpi.org/community/help/
 
>>> 
>>> 
>>> I could not find any use of the may_alias attribute, other than in a 
>>> #define in opal/include/opal_config_bottom.h.  Is 
>>> OMPI_HAVE_ATTRIBUTE_MAY_ALIAS just cruft that can be removed?
>>> 
>>> Larry Baker
>>> US Geological Survey
>>> 650-329-5608
>>> ba...@usgs.gov
>>> 
>>> On 7 Oct 2011, at 11:08 AM, Larry Baker wrote:
>>> 
 I ran into a problem this past week trying to upgrade our OpenMPI 1.4.3 
 for the latest Intel 2011 compiler, 2011.6.233.
 
 make check fails with Segmentation Fault errors:
 
> [root@hydra openmpi-1.4.3]# tail -20 
> ../openmpi-1.4.3-check-intel.6.233.log
> /bin/sh ../../libtool --tag=CC   --mode=link icc  -DNDEBUG -g -O3 
> -finline-functions -fno-strict-aliasing -restrict -pthread 
> -fvisibility=hidden -shared-intel -export-dynamic -shared-intel  -o 
> ddt_pack ddt_pack.o ../../ompi/libmpi.la -lnsl -lutil  
> libtool: link: icc -DNDEBUG -g -O3 -finline-functions 
> -fno-strict-aliasing -restrict -pthread -fvisibility=hidden -shared-intel 
> -shared-intel -o .libs/ddt_pack ddt_pack.o -Wl,--export-dynamic  
> ../../ompi/.libs/libmpi.so 
> /usr/local/src/openmpi-1.4.3/orte/.libs/libopen-rte.so 
> /usr/local/src/openmpi-1.4.3/opal/.libs/libopen-pal.so -ldl -lnsl -lutil 
> -pthread -Wl,-rpath -Wl,/usr/local/lib
> make[3]: Leaving directory 
> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
> make  check-TESTS
> make[3]: Entering directory 
> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
> /bin/sh: line 4:  6322 Segmentation fault  ${dir}$tst
> FAIL: checksum
> /bin/sh: line 4:  6355 Segmentation fault  ${dir}$tst
> FAIL: position
> 
> 2 of 2 tests failed
> Please report to http://www.open-mpi.org/community/help/
>