Hi guys,

This is the revised concept for our semi-automatic optimized parallel I/O
system. For the previous discussion, please take a look at the following
link:
http://www.open-mpi.org/community/lists/users/2012/06/19517.php

Thanks to Ralph and Jeff for giving me so much advice. The whole system has
been reconsidered; please take a look at the attached pictures. Since parallel
I/O is extremely complex, we have chosen to start with the most important and
influential part, the I/O algorithm. The other parts (listed by Jeff), such as
the MPI layer, the OS of the file system, the storage controller, the network
and so on, should be easier to take into consideration one by one later (I hope
I am not wrong :)).


Description of the picture

I/O System: The system we want to implement.

Other Systems: The systems outside the I/O system, which include the database,
the I/O monitor and file systems such as GPFS.


Step 1: The client sends the commands to the I/O nodes and starts the system
daemon on each node, which in turn starts the MPI process.

Step 2: After the system daemon has finished its preparation, the MPI process
starts running. All the necessary information, such as the URI of the database,
the location of the source/target file in the file system, the I/O parameters,
the number of processes used and so on, is passed to the MPI process either as
MPI hints or as parameters of the mpirun command.
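
To make the parameter passing more concrete, here is a minimal Python sketch of
how the launcher side could collect this information from the command line. The
option names (--db-uri, --file, --hint) and the URI and path in the usage lines
are only my assumptions for illustration, nothing is fixed yet. The hint keys in
the example (cb_buffer_size, striping_unit) are existing reserved MPI-IO hint
names.

```python
import argparse

def parse_io_args(argv):
    """Parse hypothetical I/O job options into (db_uri, file_path, hints)."""
    parser = argparse.ArgumentParser(description="Hypothetical I/O job options")
    parser.add_argument("--db-uri", required=True,
                        help="URI of the tuning database")
    parser.add_argument("--file", required=True,
                        help="source/target file in the file system")
    parser.add_argument("--hint", action="append", default=[],
                        metavar="KEY=VALUE",
                        help="I/O hint, may be given several times")
    args = parser.parse_args(argv)
    # Hints stay strings, just like MPI_Info key/value pairs.
    # A hint without '=' raises ValueError in this sketch.
    hints = dict(h.split("=", 1) for h in args.hint)
    return args.db_uri, args.file, hints

db_uri, path, hints = parse_io_args([
    "--db-uri", "sqlite:///tuning.db",
    "--file", "/gpfs/data/input.dat",
    "--hint", "cb_buffer_size=4194304",
    "--hint", "striping_unit=1048576",
])
```

The resulting dictionary could then be copied one-to-one into an MPI_Info
object before the file is opened.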

Step 3 & 4: After MPI_Init(), we can define a function named something like
MPI_IO_Select() to obtain the best I/O algorithm/pattern from the database. A
similar algorithm-selection function has already been implemented in OMPIO
under the fcoll module. I think it is possible to add the database access to
the source code of this module. In addition, querying the file system for its
storage properties before the I/O algorithm/pattern selection is also possible,
if the file system offers such an API. The selected I/O algorithm/pattern, with
suitable I/O parameters, is then applied in the next steps.
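
The selection step could look roughly like the following Python sketch. It is
only an illustration of the lookup logic, not the OMPIO code: the table layout,
the column names, the function name io_select() and the sample numbers are all
my assumptions, and SQLite merely stands in for whatever database we end up
using.

```python
import sqlite3

# Hypothetical schema: one row per measured I/O run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS io_runs (
    access_pattern TEXT,     -- e.g. 'contiguous' or 'strided'
    num_procs      INTEGER,  -- number of MPI processes used
    algorithm      TEXT,     -- label of the I/O algorithm used
    stripe_size    INTEGER,  -- example of a tunable I/O parameter (bytes)
    bandwidth_mbs  REAL      -- measured bandwidth in MB/s
)
"""

def io_select(conn, access_pattern, num_procs, default=("default", 0)):
    """Return (algorithm, stripe_size) of the fastest recorded similar run.

    Falls back to `default` when the database has no similar run yet.
    """
    row = conn.execute(
        "SELECT algorithm, stripe_size FROM io_runs "
        "WHERE access_pattern = ? AND num_procs = ? "
        "ORDER BY bandwidth_mbs DESC LIMIT 1",
        (access_pattern, num_procs),
    ).fetchone()
    return tuple(row) if row else default

# Made-up sample data, only to exercise the lookup.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO io_runs VALUES (?, ?, ?, ?, ?)", [
    ("strided", 64, "two_phase", 1048576, 850.0),
    ("strided", 64, "dynamic", 4194304, 1200.0),
    ("contiguous", 64, "individual", 1048576, 1500.0),
])
best = io_select(conn, "strided", 64)       # fastest similar run
fallback = io_select(conn, "strided", 128)  # no similar run recorded
```

Here "similar" simply means the same access pattern and process count; a real
implementation would probably need a softer notion of similarity.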

Step 5 & 6: The selected I/O operation runs on the file system.

Step 7: After the MPI process has ended, the system daemon continues with any
further work.

Step 8 & 9: While the file system is being accessed, the monitor keeps watching
the status of the file system and the performance of the I/O operation. The
results are collected and sent to the database for further analysis. This part
has no interaction with the MPI process or even the I/O system; therefore, it
does not have to run in real time.
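
Because the monitor does not have to be real time, it can buffer its
measurements and write them to the database in batches. A minimal Python sketch
of this idea follows; the class name IOMonitor, the table io_samples and its
columns are hypothetical, and SQLite again only stands in for the real
database.

```python
import sqlite3
import time

class IOMonitor:
    """Buffers I/O measurements and writes them to the database in batches."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = []  # samples not yet written to the database
        conn.execute(
            "CREATE TABLE IF NOT EXISTS io_samples ("
            "timestamp REAL, algorithm TEXT, bytes_moved INTEGER, seconds REAL)"
        )

    def record(self, algorithm, bytes_moved, seconds):
        # Cheap in-memory append, so observing does not slow down the I/O path.
        self.pending.append((time.time(), algorithm, bytes_moved, seconds))

    def flush(self):
        """Write all buffered samples in one transaction; return their count."""
        count = len(self.pending)
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO io_samples VALUES (?, ?, ?, ?)", self.pending
            )
        self.pending.clear()
        return count

conn = sqlite3.connect(":memory:")
monitor = IOMonitor(conn)
monitor.record("two_phase", 512 * 1024 * 1024, 0.60)
monitor.record("dynamic", 512 * 1024 * 1024, 0.45)
flushed = monitor.flush()
stored = conn.execute("SELECT COUNT(*) FROM io_samples").fetchone()[0]
```

flush() could be called after the job has finished or on a timer, since nothing
in the I/O system waits for it.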


The system decides on the I/O operation according to several conditions in
order to ensure that the I/O operation will not be worse than the last similar
I/O operation. It might have some self-learning ability with the help of the
database. The tunable part is NOT ONLY the I/O algorithm, but also the
I/O-related parameters.
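
One simple way to express the "not worse than the last similar operation" rule
is sketched below in Python. The names IOConfig and choose_config, the
buffer_size parameter and the bandwidth numbers are assumptions made up for
this example. A configuration is the algorithm together with its parameters,
so both can change between runs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IOConfig:
    algorithm: str    # label of the I/O algorithm
    buffer_size: int  # example of an I/O-related parameter (bytes)

def choose_config(last, history):
    """Pick the configuration for the next run.

    `history` maps IOConfig -> best recorded bandwidth in MB/s. The last
    configuration is kept unless another one has a strictly higher recorded
    bandwidth, so the choice is never worse than the last run according to
    the recorded data. If `last` has no record, its baseline is 0.0.
    """
    best, best_bw = last, history.get(last, 0.0)
    for cfg, bandwidth in history.items():
        if bandwidth > best_bw:
            best, best_bw = cfg, bandwidth
    return best

last = IOConfig("two_phase", 1 << 20)
history = {
    last: 800.0,
    IOConfig("two_phase", 4 << 20): 950.0,  # same algorithm, other parameter
    IOConfig("dynamic", 1 << 20): 700.0,    # other algorithm, slower
}
chosen = choose_config(last, history)
kept = choose_config(last, {last: 800.0, IOConfig("dynamic", 1 << 20): 700.0})
```

In the first call only the buffer size changes while the algorithm stays the
same; in the second call no recorded alternative is faster, so the last
configuration is kept.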

We think it will be very useful for applications that usually run similar or
long I/O operations.

Any suggestions are welcome.

Thanks.

Best Regards!
Xuan Wang
