Hi,


The log filenames suggest you are always running on a single node, is 
that correct?

Do you create the input file on the tmpfs once and for all, or before each run?

Can you please post your mpirun command lines?

If you did not bind the tasks, can you try again with

mpirun --bind-to core ...
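
For example, assuming 64 tasks and that your binary takes the input path 
as its argument (adjust to your actual command):

mpirun --bind-to core --report-bindings -np 64 ./parallel_ms /path/to/input

--report-bindings makes mpirun print how each rank was bound, so you can 
confirm the binding took effect.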



Cheers,



Gilles

----- Original Message -----

Hi,

We faced an issue when testing the scalability of a parallel merge sort 
using a reduction tree on an array of size 1024^3.
Currently, only the master opens the input file, parses it into an 
array using fscanf, and then distributes the array to the other 
processors (sketched below).
When using 32 processors, it took ~109 seconds to read from the file.
When using 64 processors, it took ~216 seconds to read from the file.
Regardless of the number of processors, only one processor (the master) 
reads the file.
The input file is stored in a tmpfs; it is made up of 1024^3 + 1 numbers, 
where the first number is the array size.
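
For reference, the master's read phase follows roughly this pattern (a 
simplified sketch, not verbatim from parallel_ms.c in the gist):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    long n = 0;
    int *a = NULL;
    if (rank == 0) {
        /* only the master touches the file */
        FILE *f = fopen(argv[1], "r");
        fscanf(f, "%ld", &n);            /* first number is the array size */
        a = malloc(n * sizeof *a);
        for (long i = 0; i < n; i++)     /* this loop is the slow phase */
            fscanf(f, "%d", &a[i]);
        fclose(f);
    }
    /* ... broadcast n, scatter chunks of a, local sort,
       merge up the reduction tree ... */
    free(a);
    MPI_Finalize();
    return 0;
}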

Additionally, I ran a plain C program that only reads the file; it took 
~104 seconds.
However, I also ran an MPI program that only reads the file; it took ~116 
and ~118 seconds on 32 and 64 processors respectively.
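
For completeness, mpi_just_read.c isolates the read and times it on rank 0 
roughly as follows (again a simplified sketch; read_input here is a 
hypothetical helper wrapping the same fscanf loop as above):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* hypothetical helper: same fopen/fscanf loop as in the sketch above */
static int *read_input(const char *path, long *n)
{
    FILE *f = fopen(path, "r");
    fscanf(f, "%ld", n);
    int *a = malloc(*n * sizeof *a);
    for (long i = 0; i < *n; i++)
        fscanf(f, "%d", &a[i]);
    fclose(f);
    return a;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double t0 = MPI_Wtime();         /* MPI wall-clock timer */
        long n;
        int *a = read_input(argv[1], &n);
        printf("read took %.2f s\n", MPI_Wtime() - t0);
        free(a);
    }
    /* all other ranks go straight to MPI_Finalize */
    MPI_Finalize();
    return 0;
}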

Code at  https://gist.github.com/alichry/84a9721bac741ffdf891e70b82274aaf

parallel_ms.c:  
https://gist.github.com/alichry/84a9721bac741ffdf891e70b82274aaf#file-parallel_ms-c

mpi_just_read.c:  
https://gist.github.com/alichry/84a9721bac741ffdf891e70b82274aaf#file-mpi_just_read-c

just_read.c:  
https://gist.github.com/alichry/84a9721bac741ffdf891e70b82274aaf#file-just_read-c


Clearly, increasing the number of processors for mpi_just_read.c did not 
severely affect the elapsed time.
For parallel_ms.c, is it possible that the other 63 processors, blocked 
in a receive from processor 0, are somehow affecting the file-read 
elapsed time?

Any assistance or clarification would be appreciated.
Ali.

