Thanks Hassan,
My question is: is this the way others report superstep times in research
papers such as Hama, Giraph, GraphLab, etc.? Are they not considering the
I/O time? (They report good scalability results for PageRank calculation.)
The other point is that even considering only the computational supersteps,
excluding the I/O overhead, does not show any improvement. Counting from the
first superstep (superstep 0 is not considered, as you say), the timing
details for amazon would be:
# Processor cores   2      4      8      16     24     32
Superstep 1 (ms)    5101   3642   3163   2867   3611   3435
As you can see, it is the same story, even though no I/O is included.
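As a quick sanity check (just arithmetic on the superstep-1 times quoted
above, nothing Giraph-specific), the speedup relative to the 2-core run never
reaches 2x:

# Speedup of superstep 1 relative to the 2-core time (times in ms, cores 2..32).
for t in 5101 3642 3163 2867 3611 3435; do
  awk -v base=5101 -v t="$t" 'BEGIN { printf "%4d ms -> %.2fx\n", t, base / t }'
done
# Best case is about 1.78x at 16 cores; beyond that the time goes up again.
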
Regards,
Karos
On Mon, Dec 14, 2015 at 9:29 PM, Hassan Eslami <[email protected]> wrote:
> Hi,
>
> Here is my take on it:
> It may be a good idea to isolate the time spent in compute supersteps. In
> order to do that, you can look at per-superstep timing metrics and
> aggregate the times for all supersteps except input (-1) and output (last)
> superstep. This eliminates the I/O time and focuses the measurement on the
> time spent accessing memory and doing computation on the cores. In general, if
> an application is memory-bound (for instance PageRank), increasing the
> number of threads does not necessarily decrease the time after a certain
> point, due to the fact that memory bandwidth can become a bottleneck. In
> other words, once the memory bandwidth has been saturated by a certain
> number of threads, increasing the number of threads will not decrease the
> execution time anymore.
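>
> As a rough sketch of that aggregation, assuming the per-superstep timers end
> up in the saved job output as counter lines of the form
> "Superstep N <Computation> (ms)=value" (adjust the pattern to whatever your
> build actually prints, and read job_output.txt as whatever file you tee the
> job output to): the input superstep (-1) is already excluded by the pattern,
> and the last value can be dropped by hand if the output superstep appears as
> a numbered timer.
>
> grep -E 'Superstep [0-9]+ .*\(ms\)=' job_output.txt \
>   | awk -F'=' '{ sum += $NF } END { printf "compute supersteps total: %d ms\n", sum }'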
>
> Best,
> Hassan
>
> On Mon, Dec 14, 2015 at 5:14 AM, Foad Lotfifar <[email protected]> wrote:
>
>> Hi,
>>
>> I have a scalability issue with Giraph and I cannot figure out where the
>> problem is.
>>
>> --- Cluster specs:
>> # nodes 1
>> # threads 32
>> Processor Intel Xeon 2.0GHz
>> OS ubuntu 32bit
>> RAM 64GB
>>
>> --- Giraph specs
>> Hadoop Apache Hadoop 1.2.1
>> Giraph 1.2.0 Snapshot
>>
>> Tested Graphs:
>> amazon0302 V=262,111, E=1,234,877
>> coAuthorsCiteseer V=227,320, E=1,628,268
>>
>>
>> I run the PageRank algorithm provided with Giraph,
>> "SimplePageRankComputation", with the following options:
>>
>> (time ($HADOOP_HOME/bin/hadoop jar \
>>   $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
>>   org.apache.giraph.GiraphRunner \
>>   -Dgiraph.graphPartitionerFactoryClass=org.apache.giraph.partition.HashRangePartitionerFactory \
>>   org.apache.giraph.examples.PageRankComputation \
>>   -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
>>   -vip /user/hduser/input/$file \
>>   -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>   -op /user/hduser/output/pagerank -w $2 \
>>   -mc org.apache.giraph.examples.PageRankComputation\$PageRankMasterCompute)) 2>&1 \
>>   | tee -a ./pagerank_results/$file.GHR_$2.$iter.output.txt
>>
>> The algorithm runs without any issue. The number of supersteps is set to
>> 31 by default in the algorithm.
>>
>> *Problem:*
>> I don't get any scalability beyond 8 (or 16) processor cores; that is, I
>> get speedup up to 8 (or 16) cores and then the run time starts to increase.
>>
>> I have run PageRank with only one superstep, as well as other algorithms
>> such as the ShortestPath algorithm, and I get the same results. I cannot
>> figure out where the problem is.
>>
>> 1- I have tried changing the giraph.numInputThreads and
>> giraph.numOutputThreads options (see the command sketch after this list):
>> the performance gets a little bit better, but there is no impact on
>> scalability.
>> 2- Is it related to the size of the graphs, since the graphs I am testing
>> are small?
>> 3- Is it a platform-related issue?
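>>
>> For reference, this is roughly how I pass those options (the same command
>> as above with the -D settings added; the thread counts are placeholder
>> values, and giraph.numComputeThreads is an extra option I assume may also
>> be worth trying if this build supports it):
>>
>> $HADOOP_HOME/bin/hadoop jar \
>>   $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
>>   org.apache.giraph.GiraphRunner \
>>   -Dgiraph.numInputThreads=4 \
>>   -Dgiraph.numOutputThreads=4 \
>>   -Dgiraph.numComputeThreads=8 \
>>   org.apache.giraph.examples.PageRankComputation \
>>   -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
>>   -vip /user/hduser/input/$file \
>>   -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>   -op /user/hduser/output/pagerank -w $2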
>>
>> Here are the timing details for the amazon graph:
>>
>> # Processor cores
>> 1 2 4 8 16 24 32
>> Input (ms)          3260      3447      3269      3921      4555      4766
>> Initialise (ms)     3467      36458     45474     39091     100281    79012
>> Setup (ms)          34        52        59        70        77        86
>> Shutdown (ms)       9954      10226     11021     9524      13393     15930
>> Total (ms)          135482    84483     61081     52190     58921     61898
>> HDFS READ (bytes)   21097485  26117723  36158199  57808783  80086015  102163071
>> FILE WRITE (bytes)  65889     109815    197667    373429    549165    724901
>> HDFS WRITE (bytes)  7330986   7331068   7331093   7330988   7330976   7331203
>>
>> Best Regards,
>> Karos
>>
>>
>>
>>
>
--
Regards,
Karos Lotfifar