Hey guys,
I have been playing with PageRank recently and got some weird results.
I wanted to use the input format reader and output format writer defined
inside SimplePageRankComputation, so I passed my input file and specified
those classes on the command line.
Some of the vertices ended up with a value > 1. So I had a look at the logs
and noticed that the reader is generating its own vertices and my input file
is never used.
The SimplePageRankVertexReader contains the following lines:
    LongWritable vertexId = new LongWritable(
        (inputSplit.getSplitIndex() * totalRecords) + recordsRead);
    DoubleWritable vertexValue = new DoubleWritable(vertexId.get() * 10d);
    long targetVertexId =
        (vertexId.get() + 1) %
        (inputSplit.getNumSplits() * totalRecords);
    float edgeValue = vertexId.get() * 100f;
And in the task logs, it prints:
2013-05-30 11:28:20,808 INFO
org.apache.giraph.examples.SimplePageRankVertex$SimplePageRankVertexReader:
next: Return vertexId=0, vertexValue=0.0, targetVertexId=1,
edgeValue=0.0
2013-05-30 11:28:20,809 INFO
org.apache.giraph.examples.SimplePageRankVertex$SimplePageRankVertexReader:
next: Return vertexId=1, vertexValue=10.0, targetVertexId=2,
edgeValue=100.0
2013-05-30 11:28:20,809 INFO
org.apache.giraph.examples.SimplePageRankVertex$SimplePageRankVertexReader:
next: Return vertexId=2, vertexValue=20.0, targetVertexId=3,
edgeValue=200.0
2013-05-30 11:28:20,809 INFO
org.apache.giraph.examples.SimplePageRankVertex$SimplePageRankVertexReader:
next: Return vertexId=3, vertexValue=30.0, targetVertexId=4,
edgeValue=300.0
2013-05-30 11:28:20,809 INFO
org.apache.giraph.examples.SimplePageRankVertex$SimplePageRankVertexReader:
next: Return vertexId=4, vertexValue=40.0, targetVertexId=5,
edgeValue=400.0
2013-05-30 11:28:20,809 INFO
org.apache.giraph.examples.SimplePageRankVertex$SimplePageRankVertexReader:
next: Return vertexId=5, vertexValue=50.0, targetVertexId=6,
edgeValue=500.0
2013-05-30 11:28:20,809 INFO
org.apache.giraph.examples.SimplePageRankVertex$SimplePageRankVertexReader:
next: Return vertexId=6, vertexValue=60.0, targetVertexId=7,
edgeValue=600.0
2013-05-30 11:28:20,810 INFO
org.apache.giraph.examples.SimplePageRankVertex$SimplePageRankVertexReader:
next: Return vertexId=7, vertexValue=70.0, targetVertexId=8,
edgeValue=700.0
2013-05-30 11:28:20,810 INFO
org.apache.giraph.examples.SimplePageRankVertex$SimplePageRankVertexReader:
next: Return vertexId=8, vertexValue=80.0, targetVertexId=9,
edgeValue=800.0
2013-05-30 11:28:20,810 INFO
org.apache.giraph.examples.SimplePageRankVertex$SimplePageRankVertexReader:
next: Return vertexId=9, vertexValue=90.0, targetVertexId=0,
edgeValue=900.0
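As a sanity check, the arithmetic from the reader lines quoted above can be
reproduced outside Giraph. The sketch below is only an illustration, not the
Giraph code itself: the class name SyntheticVertexSketch and its helper
methods are made up, the Hadoop Writable types are replaced with primitives,
and splitIndex=0, numSplits=1, totalRecords=10 are assumptions I inferred from
the log (ten vertices, with vertex 9 pointing back to 0).

```java
public class SyntheticVertexSketch {
    // Hypothetical helpers mirroring the formulas quoted from the reader.
    // No input file is consulted anywhere: every value depends only on counters.

    static long vertexId(int splitIndex, long totalRecords, long recordsRead) {
        return (splitIndex * totalRecords) + recordsRead;
    }

    static double vertexValue(long vertexId) {
        return vertexId * 10d;
    }

    static long targetVertexId(long vertexId, int numSplits, long totalRecords) {
        // Wraps around, so the last vertex points back to vertex 0 (a ring).
        return (vertexId + 1) % (numSplits * totalRecords);
    }

    static float edgeValue(long vertexId) {
        return vertexId * 100f;
    }

    public static void main(String[] args) {
        // Assumed values, inferred from the task log rather than read from Giraph.
        int splitIndex = 0;
        int numSplits = 1;
        long totalRecords = 10;

        for (long recordsRead = 0; recordsRead < totalRecords; recordsRead++) {
            long id = vertexId(splitIndex, totalRecords, recordsRead);
            System.out.println("vertexId=" + id
                + ", vertexValue=" + vertexValue(id)
                + ", targetVertexId=" + targetVertexId(id, numSplits, totalRecords)
                + ", edgeValue=" + edgeValue(id));
        }
    }
}
```

Under those assumptions this produces the same ten id/value/target/edge
combinations as the log, which suggests the logged vertices come entirely
from these formulas and not from my file.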
Could someone explain why this is happening?
Thanks!
--
Maria Stylianou
Intern at Telefonica, Barcelona, Spain
Master Student of European Master in Distributed Computing
<http://www.kth.se/en/studies/programmes/master/em/emdc>
marsty5.wordpress.com