Hi,

We ran into an issue while testing the scalability of a parallel merge sort 
(using a reduction tree) on an array of size 1024^3.
Currently, only the master opens the input file, parses it into an array 
using fscanf, and then distributes the array to the other processors.
When using 32 processors, reading the file took ~109 seconds; when using 64 
processors, it took ~216 seconds.
Regardless of the number of processors, only one processor (the master) 
reads the file.
The input file is stored on a tmpfs and consists of 1024^3 + 1 numbers, 
where the first number is the array size.
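
In essence, the read-and-distribute phase looks like the simplified sketch 
below (the file name and the use of MPI_Scatter are illustrative; the actual 
code is parallel_ms.c in the gist linked further down):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified sketch of the read-and-distribute phase: only rank 0 touches
 * the file; the ~109 s / ~216 s figures above correspond to the timed
 * fscanf loop. Assumes n is divisible by the number of processors. */
int main(int argc, char **argv)
{
    int rank, nprocs;
    long n = 0;
    int *data = NULL, *chunk = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {
        FILE *fp = fopen("input.txt", "r");     /* file name is illustrative */
        double t0 = MPI_Wtime();
        fscanf(fp, "%ld", &n);                  /* first number = array size */
        data = malloc(n * sizeof(*data));
        for (long i = 0; i < n; i++)
            fscanf(fp, "%d", &data[i]);
        fclose(fp);
        printf("read took %.2f s\n", MPI_Wtime() - t0);
    }

    /* distribute n/nprocs elements to every rank */
    MPI_Bcast(&n, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    chunk = malloc((n / nprocs) * sizeof(*chunk));
    MPI_Scatter(data, (int)(n / nprocs), MPI_INT,
                chunk, (int)(n / nprocs), MPI_INT, 0, MPI_COMM_WORLD);

    /* local sort + reduction-tree merge would follow here */

    free(chunk);
    free(data);
    MPI_Finalize();
    return 0;
}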

Additionally, I ran a plain C program that only reads the file; it took 
~104 seconds. However, I also ran an MPI program that only reads the file; 
it took ~116 and ~118 seconds on 32 and 64 processors, respectively.
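
For reference, here is roughly what the MPI read-only test does (a 
simplified sketch, not the exact mpi_just_read.c; just_read.c is essentially 
the same loop without the MPI calls):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the read-only MPI test: rank 0 reads and times the file,
 * the other ranks do no communication at all. */
int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        long n;
        FILE *fp = fopen("input.txt", "r");     /* file name is illustrative */
        double t0 = MPI_Wtime();
        fscanf(fp, "%ld", &n);
        int *data = malloc(n * sizeof(*data));
        for (long i = 0; i < n; i++)
            fscanf(fp, "%d", &data[i]);
        fclose(fp);
        printf("read took %.2f s\n", MPI_Wtime() - t0);
        free(data);
    }

    MPI_Finalize();
    return 0;
}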

Code at https://gist.github.com/alichry/84a9721bac741ffdf891e70b82274aaf
parallel_ms.c: 
https://gist.github.com/alichry/84a9721bac741ffdf891e70b82274aaf#file-parallel_ms-c
mpi_just_read.c: 
https://gist.github.com/alichry/84a9721bac741ffdf891e70b82274aaf#file-mpi_just_read-c
just_read.c: 
https://gist.github.com/alichry/84a9721bac741ffdf891e70b82274aaf#file-just_read-c

Clearly, increasing the number of processors for mpi_just_read.c did not 
significantly affect the elapsed time.
For parallel_ms.c, is it possible that the 63 processors blocked in a 
receive from processor 0 are somehow affecting the time it takes to read 
the file?
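
If the concern is that the 63 waiting ranks keep their cores busy while 
blocked (only a guess on my part), a crude standalone test along the lines 
below could separate the two cases; the 300-second sleep, the tag and the 
command-line handling are purely illustrative:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothesis test (sketch): rank 0 reads and times the file while the
 * other ranks either (a) wait in a blocking MPI_Recv for the whole read,
 * as in parallel_ms.c, or (b) sleep so their cores are genuinely idle.
 * If (a) is consistently slower than (b), the waiting receivers are
 * affecting the read. */
int main(int argc, char **argv)
{
    int rank, nprocs, token = 0;
    int spin = (argc > 2);   /* any extra argument selects variant (a) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (argc < 2) {
        if (rank == 0)
            fprintf(stderr, "usage: %s <input file> [spin]\n", argv[0]);
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        long n;
        FILE *fp = fopen(argv[1], "r");
        double t0 = MPI_Wtime();
        fscanf(fp, "%ld", &n);
        int *data = malloc(n * sizeof(*data));
        for (long i = 0; i < n; i++)
            fscanf(fp, "%d", &data[i]);
        fclose(fp);
        printf("read took %.2f s\n", MPI_Wtime() - t0);
        free(data);
        /* release the waiting ranks */
        for (int dst = 1; dst < nprocs; dst++)
            MPI_Send(&token, 1, MPI_INT, dst, 0, MPI_COMM_WORLD);
    } else if (spin) {
        /* (a) blocked in a receive from rank 0 during the whole read */
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else {
        /* (b) idle while rank 0 reads; 300 s covers the observed read time */
        sleep(300);
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Running it e.g. as "mpirun -np 64 ./readtest input.txt spin" and then again 
without the last argument would show whether the difference comes from the 
blocked receivers.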

Any assistance or clarification would be appreciated.
Ali.

Attachments: mpijr-vader-32.log, mpijr-vader-64.log, pms-tcp-32.log, 
pms-tcp-64.log, pms-vader-32.log, pms-vader-64.log
