[OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread Y.MATSUMOTO
Dear All,

Our next feedback is about collective communications.

A collective communication may terminate abnormally when it uses a buffer larger than 2GiB.
The problem occurs under the following condition:
-- communicator_size * count (scount/rcount) >= 2GiB
It occurs even on a small PC cluster.

The following is one of the suspicious parts.
(There is much similar code in ompi/coll/tuned/*.c.)

--- in ompi/coll/tuned/coll_tuned_allgather.c (V1.4.X's trunk)---
tmprecv = (char*) rbuf + rank * rcount * rext;
---

If this condition is met, "rank * rcount" overflows.
So we fixed it tentatively as follows
(casting int to size_t):
--- in ompi/coll/tuned/coll_tuned_allgather.c ---
tmprecv = (char*) rbuf + (size_t)rank * rcount * rext;
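
The effect of the cast above can be sketched in a small standalone form. This is only an illustration, not Open MPI code: the helper names block_index_overflows_int and block_offset are hypothetical, and a 64-bit size_t is assumed for offsets beyond 4GiB.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if rank * rcount does not fit in an int, i.e. if the original
 * int expression would overflow. The product is formed in 64 bits so the
 * check itself cannot overflow. (Hypothetical helper.) */
static int block_index_overflows_int(int rank, int rcount)
{
    assert(rank >= 0 && rcount >= 0);
    return (int64_t)rank * (int64_t)rcount > (int64_t)INT_MAX;
}

/* Byte offset of the block that belongs to `rank`. Casting the operands
 * to size_t makes the whole product use size_t arithmetic instead of int.
 * (Hypothetical helper; assumes non-negative arguments.) */
static size_t block_offset(int rank, int rcount, ptrdiff_t rext)
{
    assert(rank >= 0 && rcount >= 0 && rext >= 0);
    return (size_t)rank * (size_t)rcount * (size_t)rext;
}
```

For example, with rank = 4 and rcount = 600000000 the int product would exceed INT_MAX, while block_offset(4, 600000000, 1) yields the intended offset of 2400000000 bytes.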


Fixing this problem requires changes not only in "ompi/coll/tuned" but also in other code.
We tried to fix it, but the following functions still have a problem (their arguments may overflow):
- "ompi_coll_tuned_sendrecv" may be called with "scount/rcount" set to over 2GiB.
- "ompi_datatype_copy_content_same_ddt" may be called with "count" set to over 2GiB.
- "basic_linear in Allgather": Bcast may be called with "count" set to over 2GiB.

Best Regards,
Yuki Matsumoto
MPI development team,
Fujitsu



[OMPI devel] [PATCH]Incorrect algorithm choice using coll_tuned_dynamic_rules_filename (over 2GiB message)

2012-03-01 Thread Y.MATSUMOTO
Dear All,

Our next feedback is about "coll_tuned_dynamic_rules_filename".

An incorrect algorithm is selected when all of the following conditions are met:
1: "--mca coll_tuned_use_dynamic_rules 1" is set.
2: "--mca coll_tuned_dynamic_rules_filename" is set.
3: A collective communication listed in the rule file from 2 is called with a message of 2GiB or more.
(e.g. MPI_Bcast: data type size * count >= 2GiB;
MPI_Allgather: data type size * count * communicator size >= 2GiB)

Please see the attached patch (the patch is for V1.4.x).
However, we found another problem when a message size over 2GiB is written in the rule file as "message size"
(a message size over 2GiB cannot be read correctly),
and we have not fixed it.
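
Both points above (the rule lookup and reading a large "message size" from text) can be sketched with size_t. This is a simplified, hypothetical model and not the Open MPI data structures: msg_rule_t, parse_msg_size and select_alg are names made up for illustration, and the lookup rule ("use the last rule whose size is not larger than the message") is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified rule: "from msg_size upward, use result_alg". */
typedef struct {
    size_t msg_size;   /* lower bound of the message size, in bytes */
    int    result_alg; /* algorithm to use */
} msg_rule_t;

/* Parse a decimal message size from rule-file text into a size_t.
 * Returns 0 on success, -1 if the text is not a plain number or does not fit. */
static int parse_msg_size(const char *text, size_t *out)
{
    char *end = NULL;
    unsigned long long v;

    if (text == NULL || text[0] < '0' || text[0] > '9') {
        return -1;                      /* reject signs, blanks, empty text */
    }
    errno = 0;
    v = strtoull(text, &end, 10);
    if (*end != '\0' || errno == ERANGE || v > SIZE_MAX) {
        return -1;
    }
    *out = (size_t)v;
    return 0;
}

/* Return the algorithm of the last rule whose msg_size <= msgsize.
 * Rules are assumed to be sorted by ascending msg_size; 0 means "no match". */
static int select_alg(const msg_rule_t *rules, int nrules, size_t msgsize)
{
    int i, alg = 0;

    assert(rules != NULL && nrules >= 0);
    for (i = 0; i < nrules; i++) {
        if (rules[i].msg_size <= msgsize) {
            alg = rules[i].result_alg;
        }
    }
    return alg;
}
```

Because the sizes are compared as size_t, a 2GiB message is compared as 2147483648 rather than as a truncated int value.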

Best Regards,
Yuki Matsumoto
MPI development team,
Fujitsu
Copyright (c) 2011-2012  FUJITSU LIMITED.  All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer listed
in this license in the documentation and/or other materials
provided with the distribution.

* Neither the name of the copyright holders nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

The copyright holders provide no reassurances that the source code
provided does not infringe any patent, copyright, or any other
intellectual property rights of third parties.  The copyright holders
disclaim any liability to any recipient for claims brought against
recipient by any third party for infringement of that parties
intellectual property rights.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Index: ompi/mca/coll/tuned/coll_tuned_dynamic_rules.c
===
--- ompi/mca/coll/tuned/coll_tuned_dynamic_rules.c  (revision 25978)
+++ ompi/mca/coll/tuned/coll_tuned_dynamic_rules.c  (working copy)
@@ -350,7 +350,7 @@
  *
  */
 
-int ompi_coll_tuned_get_target_method_params (ompi_coll_com_rule_t* base_com_rule, int mpi_msgsize, int *result_topo_faninout, 
+int ompi_coll_tuned_get_target_method_params (ompi_coll_com_rule_t* base_com_rule, size_t mpi_msgsize, int *result_topo_faninout, 
   int* result_segsize, int* max_requests)
 {
 ompi_coll_msg_rule_t*  msg_p = (ompi_coll_msg_rule_t*) NULL;
Index: ompi/mca/coll/tuned/coll_tuned_dynamic_rules.h
===
--- ompi/mca/coll/tuned/coll_tuned_dynamic_rules.h  (revision 25978)
+++ ompi/mca/coll/tuned/coll_tuned_dynamic_rules.h  (working copy)
@@ -37,7 +37,7 @@
int msg_rule_id; /* unique msg rule id */
 
/* RULE */
-   int msg_size;/* message size */
+   size_t msg_size; /* message size */
 
/* RESULT */
int result_alg;  /* result algorithm to use */
@@ -95,7 +95,7 @@
 
 ompi_coll_com_rule_t* ompi_coll_tuned_get_com_rule_ptr (ompi_coll_alg_rule_t* rules, int alg_id, int mpi_comsize);
 
-int ompi_coll_tuned_get_target_method_params (ompi_coll_com_rule_t* base_com_rule, int mpi_msgsize, 
+int ompi_coll_tuned_get_target_method_params (ompi_coll_com_rule_t* base_com_rule, size_t mpi_msgsize, 
   int* result_topo_faninout, int* result_segsize, 
   int* max_requests);
 


[OMPI devel] [PATCH]Segmentation Fault occurs when the function called from MPI_Comm_spawn_multiple fails

2012-02-09 Thread Y.MATSUMOTO
Dear All,

Our next feedback is about "MPI_Comm_spawn_multiple".

When a function called from MPI_Comm_spawn_multiple fails,
a segmentation fault occurs.
In that case, "newcomp" is still NULL,
but a member of "newcomp" is referenced in the following part
(ompi/mpi/c/comm_spawn_multiple.c):
/* set array of errorcodes */
if (MPI_ERRCODES_IGNORE != array_of_errcodes) {
    for ( i=0; i < newcomp->c_remote_group->grp_proc_count; i++ ) {
        array_of_errcodes[i]=rc;
    }
}
The attached patch fixes it (the patch is for V1.4.x).
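
The idea of the fix can be sketched without any MPI types. This is only an illustration under assumptions: fill_errcodes is a hypothetical helper, and a negative remote_count stands in for the "newcomp is NULL" case. When no new communicator exists, the number of error codes is taken as the sum of array_of_maxprocs.

```c
#include <assert.h>
#include <stddef.h>

/* Fill array_of_errcodes with rc and return the number of entries written.
 * remote_count < 0 stands for "no new communicator was created"
 * (newcomp == NULL in the original code); in that case the number of
 * entries is the sum of array_of_maxprocs[0..count-1]. (Hypothetical helper.) */
static int fill_errcodes(int *array_of_errcodes, int remote_count,
                         int count, const int *array_of_maxprocs, int rc)
{
    int i, n = 0;

    assert(array_of_errcodes != NULL);
    if (remote_count >= 0) {
        n = remote_count;               /* size of the remote group */
    } else {
        assert(array_of_maxprocs != NULL);
        for (i = 0; i < count; i++) {
            n += array_of_maxprocs[i];  /* total number of requested processes */
        }
    }
    for (i = 0; i < n; i++) {
        array_of_errcodes[i] = rc;
    }
    return n;
}
```

With count = 2 and array_of_maxprocs = {2, 3}, the failure path writes five entries and never touches the missing communicator.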

Best regards,
Yuki MATSUMOTO
MPI development team,
Fujitsu

Index: ompi/mpi/c/comm_spawn_multiple.c
===
--- ompi/mpi/c/comm_spawn_multiple.c(revision 25723)
+++ ompi/mpi/c/comm_spawn_multiple.c(working copy)
@@ -42,7 +42,7 @@
 int root, MPI_Comm comm, MPI_Comm *intercomm,
 int *array_of_errcodes) 
 {
-int i=0, rc=0, rank=0, flag;
+int i=0, rc=0, rank=0, size=0, flag;
 ompi_communicator_t *newcomp=NULL;
 bool send_first=false; /* they are contacting us first */
 char port_name[MPI_MAX_PORT_NAME];
@@ -175,8 +175,18 @@
 
 /* set array of errorcodes */
 if (MPI_ERRCODES_IGNORE != array_of_errcodes) {
-for ( i=0; i < newcomp->c_remote_group->grp_proc_count; i++ ) {
-array_of_errcodes[i]=rc;
+if (NULL != newcomp) {
+for ( i=0; i < newcomp->c_remote_group->grp_proc_count; i++ ) {
+array_of_errcodes[i]=rc;
+}
+} else {
+for ( i=0; i < count; i++) {
+size = size + array_of_maxprocs[i];
+}
+
+for ( i=0; i < size; i++) {
+array_of_errcodes[i]=rc;
+}
 }
 }
 


[OMPI devel] [PATCH]Some typos in error code, func_name and man

2012-01-25 Thread Y.MATSUMOTO

Dear All,

We found some typos in error codes, function names and man pages.
The three attached patches fix them (the patches are for V1.4.x).

Best regards,
Yuki MATSUMOTO
MPI development team,
Fujitsu
Index: ompi/errhandler/errcode-internal.c
===
--- ompi/errhandler/errcode-internal.c  (revision 25448)
+++ ompi/errhandler/errcode-internal.c  (working copy)
@@ -95,7 +95,7 @@
 ompi_err_temp_out_of_resource.code = OMPI_ERR_TEMP_OUT_OF_RESOURCE;
 ompi_err_temp_out_of_resource.mpi_code = MPI_ERR_INTERN;
 ompi_err_temp_out_of_resource.index = pos++;
-strncpy(ompi_err_temp_out_of_resource.errstring, "MPI_ERR_TEMP_OUT_OF_RESOURCE", OMPI_MAX_ERROR_STRING);
+strncpy(ompi_err_temp_out_of_resource.errstring, "OMPI_ERR_TEMP_OUT_OF_RESOURCE", OMPI_MAX_ERROR_STRING);
 opal_pointer_array_set_item(&ompi_errcodes_intern, ompi_err_temp_out_of_resource.index, 
 &ompi_err_temp_out_of_resource);
 
Index: ompi/mpi/man/man3/MPI_Comm_delete_attr.3in
===
--- ompi/mpi/man/man3/MPI_Comm_delete_attr.3in  (revision 25723)
+++ ompi/mpi/man/man3/MPI_Comm_delete_attr.3in  (working copy)
@@ -15,7 +15,7 @@
 .SH Fortran Syntax
 .nf
 INCLUDE 'mpif.h'
-MPI_Comm_delete_attr(\fICOMM, COMM_KEYVAL, IERROR\fP)
+MPI_COMM_DELETE_ATTR(\fICOMM, COMM_KEYVAL, IERROR\fP)
INTEGER \fICOMM, COMM_KEYVAL, IERROR \fP
 
 .fi
Index: ompi/mpi/man/man3/MPI_Init_thread.3in
===
--- ompi/mpi/man/man3/MPI_Init_thread.3in   (revision 25723)
+++ ompi/mpi/man/man3/MPI_Init_thread.3in   (working copy)
@@ -20,7 +20,7 @@
 .SH Fortran Syntax
 .nf
 INCLUDE 'mpif.h'
-MPI_INIT(\fIREQUIRED, PROVIDED, IERROR\fP)
+MPI_INIT_THREAD(\fIREQUIRED, PROVIDED, IERROR\fP)
INTEGER \fIREQUIRED, PROVIDED, IERROR\fP 
 
 .fi
Index: ompi/mpi/man/man3/MPI_Comm_split.3in
===
--- ompi/mpi/man/man3/MPI_Comm_split.3in(revision 25723)
+++ ompi/mpi/man/man3/MPI_Comm_split.3in(working copy)
@@ -54,7 +54,7 @@
 .ft R
 This function partitions the group associated with comm into disjoint subgroups, one for each value of color. Each subgroup contains all processes of the same color. Within each subgroup, the processes are ranked in the order defined by the value of the argument key, with ties broken according to their rank in the old group. A new communicator is created for each subgroup and returned in newcomm. A process may supply the color value MPI_UNDEFINED, in which case newcomm returns MPI_COMM_NULL. This is a collective call, but each process is permitted to provide different values for color and key. 
 .sp
-When you call MPI_Comm_split on an inter-communicator, the processes on the left with the same color as those on the right combine to create a new inter-communicator.  The key argument describes the relative rank of processes on each side of the inter-communicator.  The function returns MPI_COMM_NULL for  those colors that are specified on only one side of the inter-communicator, or for those that specify MPI_UNEDEFINED as the color.  
+When you call MPI_Comm_split on an inter-communicator, the processes on the left with the same color as those on the right combine to create a new inter-communicator.  The key argument describes the relative rank of processes on each side of the inter-communicator.  The function returns MPI_COMM_NULL for  those colors that are specified on only one side of the inter-communicator, or for those that specify MPI_UNDEFINED as the color.  
 .sp
 A call to MPI_Comm_create(\fIcomm\fP, \fIgroup\fP, \fInewcomm\fP) is equivalent to a call to MPI_Comm_split(\fIcomm\fP, \fIcolor\fP,\fI key\fP, \fInewcomm\fP), where all members of \fIgroup\fP provide \fIcolor\fP = 0 and \fIkey\fP = rank in group, and all processes that are not members of \fIgroup\fP provide \fIcolor\fP = MPI_UNDEFINED. The function MPI_Comm_split allows more general partitioning of a group into one or more subgroups with optional reordering. 
 .sp
Index: ompi/mpi/man/man3/MPI_Comm_free_keyval.3in
===
--- ompi/mpi/man/man3/MPI_Comm_free_keyval.3in  (revision 25723)
+++ ompi/mpi/man/man3/MPI_Comm_free_keyval.3in  (working copy)
@@ -39,7 +39,7 @@
 
 .SH DESCRIPTION
 .ft R
-MPI_Comm_free_keyval frees an extant attribute key. This function sets the 
value of \fIkeyval\fP to  MPI_KEYVAL_INVALID. Note that it is not erroneous to 
free an attribute key that is in use, because the actual free does not 
transpire until after all references (in other communicators on the process) to 
the key have been freed. These references need to be explictly freed by the 
program, either via calls to MPI_Comm_delete_attr that free one attribute 
instance, or by calls to MPI_Comm_free that free all attribute instances 
associated 

[OMPI devel] [PATCH] MPI_FILE_SEEK_SHARED is wrong in Fortran

2012-01-25 Thread Y.MATSUMOTO
Dear All,

Our next feedback is about "MPI_FILE_SEEK_SHARED" in Fortran.

When MPI_FILE_SEEK_SHARED is called in a Fortran program,
the shared file pointer is not updated.

The incorrect function call is in the following part:
--- ompi/mpi/f77/file_seek_shared_f.c ---
void mpi_file_seek_shared_f(MPI_Fint *fh, MPI_Offset *offset,
                            MPI_Fint *whence, MPI_Fint *ierr)
{
    MPI_File c_fh = MPI_File_f2c(*fh);

    *ierr = OMPI_INT_2_FINT(MPI_File_seek(c_fh, (MPI_Offset) *offset,
                                          OMPI_FINT_2_INT(*whence)));
}
--- ompi/mpi/f77/file_seek_shared_f.c ---
The attached patch fixes it (the patch is for V1.4.x).
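
Why the wrong call leaves the shared file pointer untouched can be shown with a toy model. This is not MPI code: toy_file_t and all toy_* functions are hypothetical, and only absolute seeks are modelled. A file handle has an individual position and a shared position, and a wrapper that forwards to the individual seek never changes the shared one.

```c
#include <assert.h>

/* Toy model of a file handle with two independent positions. */
typedef struct {
    long individual_pos;  /* per-process file pointer */
    long shared_pos;      /* pointer shared by all processes using the file */
} toy_file_t;

/* Absolute seek of the individual pointer (stand-in for MPI_File_seek). */
static int toy_file_seek(toy_file_t *fh, long offset)
{
    fh->individual_pos = offset;
    return 0;
}

/* Absolute seek of the shared pointer (stand-in for MPI_File_seek_shared). */
static int toy_file_seek_shared(toy_file_t *fh, long offset)
{
    fh->shared_pos = offset;
    return 0;
}

/* Wrapper as described in the report: forwards to the wrong routine. */
static int toy_seek_shared_wrapper_buggy(toy_file_t *fh, long offset)
{
    return toy_file_seek(fh, offset);
}

/* Wrapper after the fix: forwards to the shared-pointer routine. */
static int toy_seek_shared_wrapper_fixed(toy_file_t *fh, long offset)
{
    return toy_file_seek_shared(fh, offset);
}
```

Both wrappers return success, which is why the error is easy to miss: only the position that was actually moved differs.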

Best regards,
Yuki MATSUMOTO
MPI development team,
Fujitsu

Index: ompi/mpi/f77/file_seek_shared_f.c
===
--- ompi/mpi/f77/file_seek_shared_f.c   (revision 25723)
+++ ompi/mpi/f77/file_seek_shared_f.c   (working copy)
@@ -62,6 +62,6 @@
 {
 MPI_File c_fh = MPI_File_f2c(*fh);
 
-*ierr = OMPI_INT_2_FINT(MPI_File_seek(c_fh, (MPI_Offset) *offset,
+*ierr = OMPI_INT_2_FINT(MPI_File_seek_shared(c_fh, (MPI_Offset) *offset,
 OMPI_FINT_2_INT(*whence)));
 }


[OMPI devel] Violating standard in MPI_Close_port

2012-01-20 Thread Y.MATSUMOTO
Dear All,

Next is a question about "MPI_Close_port".
According to the MPI-2.2 standard,
the "port_name" argument of
MPI_Close_port() is marked as 'IN'.
But in Open MPI (both trunk and 1.4.x), the content of
"port_name" is modified in MPI_Close_port().
This seems to violate the MPI standard.

The following is the suspicious part.
---ompi/mca/dpm/orte/dpm_orte.c---
static int close_port(char *port_name)
{
    /* the port name is a pointer to an array - DO NOT FREE IT! */
    memset(port_name, 0, MPI_MAX_PORT_NAME);
    return OMPI_SUCCESS;
}
---ompi/mca/dpm/orte/dpm_orte.c---

This memset effectively makes "port_name" an "INOUT" argument.
Could you tell me why this memset is called?
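
The difference between the two behaviours can be sketched as follows. This is a toy illustration only, not a proposed patch: TOY_MAX_PORT_NAME and both toy_close_port_* functions are hypothetical names. A const-qualified parameter is one way to express "IN" in C, because the compiler then rejects writes through it.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_PORT_NAME 64   /* hypothetical stand-in for MPI_MAX_PORT_NAME */

/* Behaviour described in the mail: the caller's buffer is cleared. */
static int toy_close_port_clearing(char *port_name)
{
    memset(port_name, 0, TOY_MAX_PORT_NAME);
    return 0;
}

/* IN-only variant: the const qualifier documents and enforces that the
 * caller's buffer is not modified. */
static int toy_close_port_in_only(const char *port_name)
{
    (void)port_name;   /* a real implementation would release the port here */
    return 0;
}
```

A caller that still needs the name after closing the port (for example, to print it) only sees it intact with the second variant.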

Best regards,
Yuki MATSUMOTO
MPI development team,
Fujitsu



Re: [OMPI devel] Incorrect and undefined return code/function/data type at C++ header

2011-12-13 Thread Y.MATSUMOTO

Dear All,

I fixed the patch.
(MPI::Fint etc.)

So, please replace the patch.

Best regards.
---
Yuki MATSUMOTO
MPI development team,
Fujitsu


Index: ompi/mpi/cxx/comm.h
===
--- ompi/mpi/cxx/comm.h (revision 25570)
+++ ompi/mpi/cxx/comm.h (working copy)
@@ -382,7 +382,7 @@

   static Errhandler Create_errhandler(Comm::Errhandler_fn* function);

-  virtual void Set_errhandler(const Errhandler& errhandler) const;
+  virtual void Set_errhandler(const Errhandler& errhandler);

   virtual Errhandler Get_errhandler() const;

Index: ompi/mpi/cxx/topology_inln.h
===
--- ompi/mpi/cxx/topology_inln.h(revision 25570)
+++ ompi/mpi/cxx/topology_inln.h(working copy)
@@ -99,7 +99,7 @@
 }

 inline MPI::Cartcomm
-MPI::Cartcomm::Sub(const bool remain_dims[]) 
+MPI::Cartcomm::Sub(const bool remain_dims[]) const
 {
   int ndims;
   MPI_Cartdim_get(mpi_comm, &ndims);
Index: ompi/mpi/cxx/intercomm.h
===
--- ompi/mpi/cxx/intercomm.h(revision 25570)
+++ ompi/mpi/cxx/intercomm.h(working copy)
@@ -77,7 +77,7 @@

   virtual Group Get_remote_group() const;

-  virtual Intracomm Merge(bool high);
+  virtual Intracomm Merge(bool high) const;

   virtual Intercomm Create(const Group& group) const;

Index: ompi/mpi/cxx/mpicxx.cc
===
--- ompi/mpi/cxx/mpicxx.cc  (revision 25570)
+++ ompi/mpi/cxx/mpicxx.cc  (workin

Re: [OMPI devel] Incorrect and undefined return code/function/data type at C++ header

2011-12-08 Thread Y.MATSUMOTO

Dear Jeff and all,

Thank you for your comments.
I'm sorry for not replying sooner.

1: MPI::Fint
We checked the C++ header against the MPI-2.1 standard.
So the MPI::Fint definition is not needed.
(Please remove it!)

2: MPI::Grequest::Start
Sorry! I sent you an incorrect list.

Best regards.
---
Yuki MATSUMOTO
MPI development team,
Fujitsu

(2011/12/06 1:35), Jeff Squyres wrote:

Many thanks for the patch!

Two minor points:

1. I do not believe that MPI::Fint exists.  It's surprising, but I'm pretty sure we 
double checked this back in the MPI-2.2 timeframe and came to the conclusions that a) 
it does not exist, and b) it should not exist, because all C++<-->  Fortran 
interaction is supposed to go through the C translation routines.

2. Grequest::Start is a static function on the MPI namespace -- it is not marked 
"const" in MPI 2.1 or 2.2 (I don't see it in the patch, either).


[OMPI devel] Incorrect and undefined return code/function/data type at C++ header

2011-12-04 Thread Y.MATSUMOTO
Dear all,

This is our next piece of feedback.
It is about the C++ header files.

In ompi/mpi/cxx/*.h,
some definitions of return codes, types and functions are missing or incorrect.
The attached patch fixes them (this patch is for V1.4.X).

The following lists show what is missing and what is incorrect.

*Undefined return code
--
MPI::ERR_ACCESS
MPI::ERR_AMODE
MPI::ERR_ASSERT
MPI::ERR_BAD_FILE
MPI::ERR_CONVERSION
MPI::ERR_DISP
MPI::ERR_DUP_DATAREP
MPI::ERR_FILE_EXISTS
MPI::ERR_FILE_IN_USE
MPI::ERR_FILE
MPI::ERR_INFO
MPI::ERR_IO
MPI::ERR_LOCKTYPE
MPI::ERR_NOT_SAME
MPI::ERR_NO_SPACE
MPI::ERR_NO_SUCH_FILE
MPI::ERR_PORT
MPI::ERR_QUOTA
MPI::ERR_READ_ONLY
MPI::ERR_RMA_CONFLICT
MPI::ERR_RMA_SYNC
MPI::ERR_SIZE
MPI::ERR_UNSUPPORTED_DATAREP
MPI::ERR_UNSUPPORTED_OPERATION
--
*Undefined data type
--
MPI::LONG_LONG_INT
MPI::Fint
MPI::F_DOUBLE_COMPLEX
--

*Undefined function
--
MPI::Datatype::Create_darray
MPI::Datatype::Pack_external
MPI::Datatype::Pack_external_size
MPI::Datatype::Unpack_external
MPI::Add_error_class
MPI::Add_error_code
MPI::Add_error_string
MPI::Datatype::Create_f90_complex
MPI::Datatype::Create_f90_integer
MPI::Datatype::Create_f90_real
MPI::Datatype::Match_size
--

*Incorrect definitions
(the MPI-2.1 standard defines these as "const", but they are not "const" in the code)
--
MPI::Intercomm::Merge
MPI::Cartcomm::Sub
MPI::Grequest::Start
--

*Incorrect definitions
(the MPI-2.1 standard defines these as not "const", but they are "const" in the code)
--
MPI::Comm::Set_errhandler
MPI::File::Set_errhandler
MPI::Win::Set_errhandler
--

Best regards.
--
Yuki MATSUMOTO
MPI development team,
Fujitsu

Index: ompi/mpi/cxx/comm.h
===
--- ompi/mpi/cxx/comm.h (revision 25518)
+++ ompi/mpi/cxx/comm.h (working copy)
@@ -11,6 +11,7 @@
 // Copyright (c) 2004-2005 The Regents of the University of California.
 // All rights reserved.
 // Copyright (c) 2006-2008 Cisco Systems, Inc.  All rights reserved.
+// Copyright (c) 2011  FUJITSU LIMITED.  All rights reserved.
 // $COPYRIGHT$
 // 
 // Additional copyrights may follow
@@ -382,7 +383,7 @@
 
   static Errhandler Create_errhandler(Comm::Errhandler_fn* function);
 
-  virtual void Set_errhandler(const Errhandler& errhandler) const;
+  virtual void Set_errhandler(const Errhandler& errhandler);
 
   virtual Errhandler Get_errhandler() const;
 
Index: ompi/mpi/cxx/topology_inln.h
===
--- ompi/mpi/cxx/topology_inln.h(revision 25518)
+++ ompi/mpi/cxx/topology_inln.h(working copy)
@@ -11,6 +11,7 @@
 // Copyright (c) 2004-2005 The Regents of the University of California.
 // All rights reserved.
 // Copyright (c) 2007  Sun Microsystems, Inc.  All rights reserved.
+// Copyright (c) 2011  FUJITSU LIMITED.  All rights reserved.
 // $COPYRIGHT$
 // 
 // Additional copyrights may follow
@@ -99,7 +100,7 @@
 }
 
 inline MPI::Cartcomm
-MPI::Cartcomm::Sub(const bool remain_dims[]) 
+MPI::Cartcomm::Sub(const bool remain_dims[]) const
 {
   int ndims;
   MPI_Cartdim_get(mpi_comm, &ndims);
Index: ompi/mpi/cxx/intercomm.h
===
--- ompi/mpi/cxx/intercomm.h(revision 25518)
+++ ompi/mpi/cxx/intercomm.h(working copy)
@@ -11,6 +11,7 @@
 // Copyright (c) 2004-2005 The Regents of the University of California.
 // All rights reserved.
 // Copyright (c) 2006  Cisco Systems, Inc.  All rights reserved.
+// Copyright (c) 2011  FUJITSU LIMITED.  All rights reserved.
 // $COPYRIGHT$
 // 
 // Additional copyrights may follow
@@ -77,7 +78,7 @@
 
   virtual Group Get_remote_group() const;
 
-  virtual Intracomm Merge(bool high);
+  virtual Intracomm Merge(bool high) const;
 
   virtual Intercomm Create(const Group& group) const;
 
Index: ompi/mpi/cxx/mpicxx.cc
===
--- ompi/mpi/cxx/mpicxx.cc  (revision 25518)
+++ ompi/mpi/cxx/mpicxx.cc  (working copy)
@@ -12,6 +12,7 @@
 // All rights reserved.
 // Copyright (c) 2007-2009 Cisco Systems, Inc.  All rights reserved.
 // Copyright (c) 2007  Sun Microsystems, Inc.  All rights reserved.
+// Copyright (c) 2011  FUJITSU LIMITED.  All rights reserved.
 // $COPYRIGHT$
 // 
 // Additional copyrights may follow
@@ -102,11 +103,13 @@
 // optional datatype (C / C++)
 const Datatype UNSIGNED_LONG_LONG(MPI_UNSIGNED_LONG_LONG);
 const Datatype 

Re: [OMPI devel] "Open MPI"-based MPI library used by K computer

2011-11-14 Thread Y.MATSUMOTO

Dear Open MPI community,

I'm a member of the MPI library development team at Fujitsu.
Takahiro Kawashima, who sent mail before, is my colleague.
We are starting to send our feedback.

First, we fixed a data packing problem related to MPI_LB/MPI_UB.

The program crashes when all of the following conditions are met:
a: The type of the data being sent is a contiguous derived type.
b: Either or both of MPI_LB and MPI_UB are used in the data type.
c: The size of the data type is smaller than its extent (the data type has a gap).
d: The send count is greater than 1.
e: The total size of the data is larger than the "eager limit".

The attached C program reproduces this problem.

An access to an incorrect address occurs
because "done" receives an unintended value and
"max_allowed" becomes negative
at the following place in "ompi/datatype/datatype_pack.c" (in version 1.4.3).


(ompi/datatype/datatype_pack.c)
packed_buffer = (unsigned char *) iov[iov_count].iov_base;
done = pConv->bConverted - i * pData->size;  /* partial data from last pack */
if( done != 0 ) {  /* still some data to copy from the last time */
    done = pData->size - done;
    OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, pConv->pBaseBuf, pData, pConv->count );
    MEMCPY_CSUM( packed_buffer, user_memory, done, pConv );
    packed_buffer += done;
    max_allowed -= done;
    total_bytes_converted += done;
    user_memory += (extent - pData->size + done);
}

This code assumes that "done" is the size of the partial data left over from the last pack.
However, when the program crashes, "done" equals the total size of all data transmitted so far.
This makes "max_allowed" negative.

We modified the code as follows, and it passed our test suite.
But we are not sure this fix is correct. Can anyone review it?
A patch (against the Open MPI 1.4 branch) is attached to this mail.

-if( done != 0 ) {  /* still some data to copy from the last time */
+if( (done + max_allowed) >= pData->size ) {  /* still some data to copy from the last time */

Best regards,

Yuki MATSUMOTO
MPI development team,
Fujitsu

(2011/06/28 10:58), Takahiro Kawashima wrote:

Dear Open MPI community,

I'm a member of MPI library development team in Fujitsu. Shinji
Sumimoto, whose name appears in Jeff's blog, is one of our bosses.

As Rayson and Jeff noted, the K computer, the world's most powerful HPC system,
developed by RIKEN and Fujitsu, utilizes Open MPI as the base of its MPI
library. We, Fujitsu, are pleased to announce that, and we also give special
thanks to the Open MPI community.
We are sorry for the late announcement!

Our MPI library is based on Open MPI 1.4 series, and has a new point-
to-point component (BTL) and new topology-aware collective communication
algorithms (COLL). Also, it is adapted to our runtime environment (ESS,
PLM, GRPCOMM etc).

K computer connects 68,544 nodes by our custom interconnect.
Its runtime environment is our proprietary one. So we don't use orted.
We cannot tell start-up time yet because of disclosure restriction, sorry.

We are surprised by the extensibility of Open MPI, and have proved that
Open MPI is scalable to the 68,000-process level! We are pleased to
utilize such great open-source software.

We cannot disclose the details of our technology yet because of our contract
with RIKEN AICS; however, we plan to feed back our improvements
and bug fixes. We can contribute some bug fixes soon, but the
contribution of our improvements will come next year, with the Open MPI
agreement.

Best regards,

MPI development team,
Fujitsu



I got more information:

http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/

Short version: yes, Open MPI is used on K and was used to power the 8PF runs.

w00t!



On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote:


w00t!

OMPI powers 8 petaflops!
(at least I'm guessing that -- does anyone know if that's true?)


On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote:


Interesting... page 11:

http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf

Open MPI based:

* Open Standard, Open Source, Multi-Platform including PC Cluster.
* Adding extension to Open MPI for "Tofu" interconnect

Rayson

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Index: ompi/datatype/datatype_pack.c
===
--- ompi/datatype/datatype_pack.c   (revision 25474)
+++ ompi/datatype/datatype_pack.c   (working copy)
@@ -187,7 +187,7 @@
 
 packed_buffer = (unsigned char *) iov[iov_count].iov_base;
 done = pConv->bConverted - i * pData->size;  /* partial data from last pack */
-if( done != 0 ) {  /* still some data to copy from the last time */
+if( (done + max_allowed) >= pData->size ) {  /* still some data to