Hi Eli,

I've found the past email I think you are referring to.  I found an email from 
Friday 27 July, 2012.  In his email, Amir A. posed a similar question about how 
to interpret the output from the ShortestPath Example.  Amir wrote:

"Moreover, looking at  the output file of shortestpath example provided
on the website, it seems that it is exactly the same like input files
combined together.

Both the output file and the input files (put together) look a like.
Is this correct and the expected result?"

Unfortunately, I don't see a response to his email in the archives.  Can you 
please quickly explain how the ShortestPath output should be interpreted?

Let's just look at a quick snippet of the output:
hduser ~/giraph-src/bin $ hadoop fs -cat shortestPathsOutput/part*
[5,1000,[[6,500]]]
[14,9100,[[0,1400]]]

I know that this is:  node 5 (with value 14) has an edge to node 6 with edge 
cost 500.
And:  node 14 (with value 9100) has an edge to node 0 with an edge cost of 1400.

But if the ShortestPathExample computes the shortest path from node 0 to every 
other node, I am expecting to see output like:

[0, 10500, [1, <edge-cost-to-1> ]
[0, 10500, [2, <edge-cost-to-2> ]
[0, 10500, [3, <edge-cost-to-3> ]
Etc...

Am I thinking about this the right way?

Bence
From: Eli Reisman [mailto:[email protected]]
Sent: Monday, October 08, 2012 6:48 PM
To: [email protected]
Subject: Re: High-level questions about the ShortestPathsBenchmark example that 
ships with Giraph

No you've got it right. There's another mail from a few months back on the 
mailing list that explains the output. I'm glad to see its working for you!

Eli

On Sun, Oct 7, 2012 at 7:50 PM, Magyar, Bence (US SSA) 
<[email protected]<mailto:[email protected]>> wrote:
Hi Eli,

Well, I've followed your advice and have gotten the SimpleShortestPathsVertex 
example running using the GiraphRunner.  I am using the following command to 
launch the job:

./giraph ../target/giraph.jar 
org.apache.giraph.examples.SimpleShortestPathsVertex -if 
org.apache.giraph.io.JsonLongDoubleFloatDoubleVertexInputFormat -ip 
/user/hduser/shortestPathsInputGraph -of 
org.apache.giraph.io.JsonLongDoubleFloatDoubleVertexOutputFormat -op 
/user/hduser/shortestPathsOutput -w 3

As my input graph, I am using Avery's sample input files from:

http://ece.northwestern.edu/~aching/shortestPathsInputGraph.tar.gz

After the job completes, my output matches the output referenced @ 
https://github.com/aching/Giraph/wiki/Shortest-Paths-Example

(See below)

hduser ~/giraph-src/bin $ hadoop fs -cat shortestPathsOutput/part*

[5,1000,[[6,500]]]
[14,9100,[[0,1400]]]
[8,2800,[[9,800]]]
[2,100,[[3,200]]]
[11,5500,[[12,1100]]]
[7,2100,[[8,700]]]
[1,0,[[2,100]]]
[10,4500,[[11,1000]]]
[13,7800,[[14,1300]]]
[4,600,[[5,400]]]
[6,1500,[[7,600]]]
[0,10500,[[1,0]]]
[9,3600,[[10,900]]]
[12,6600,[[13,1200]]]
[3,300,[[4,300]]]

However, I don't quite understand how to interpret this output.  The 
SimpleShortestPathsVertex should find the shortest path from (by default) 
vertex "1" to every other node.
How should the above output be interpreted?  The ShortestPath example on the 
wiki also stops short of explaining this point.  Am I using the wrong output 
format?

Thanks!
Bence
________________________________
From: Eli Reisman 
[mailto:[email protected]<mailto:[email protected]>]
Sent: Monday, September 24, 2012 6:05 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: High-level questions about the ShortestPathsBenchmark example that 
ships with Giraph

Sorry, the Giraph website is a bit out of date regarding the user-configurable 
application code. The benchmark applications are meant for just that, and are 
not written to process input or output data or results. The code you are 
looking for is in the examples/ dir. These are applications (*Vertex classes) 
in Giraph. Regarding code from the examples/ directory: you can run it at the 
command line using the "giraph" script in the bin/ dir. There are many command 
line options (including for IO formats from the io/ dir) and input/output paths 
in your HDFS for your data. Until better docs are up (soon, sorry!) your best 
bet is to read some of the example apps in the examples/ and io/ dirs and read 
GiraphRunner.java and bin/giraph script to get a feel for how user-configured 
command-line runs are performed. You might also then feel comfortable writing 
some of your own application code.

Sorry about the confusion, we will be posting better documentation of 
application-style runs ASAP.
On Mon, Sep 24, 2012 at 1:50 PM, Magyar, Bence (US SSA) 
<[email protected]<mailto:[email protected]>> wrote:
Hello Giraph User Community,

( I am re-posting this question - I think I tried posting this before I 
confirmed my registration.  Please pardon if this message is a duplicate )

This is my first post to this mailing list - I'm interested in learning more 
about Giraph and to do that I checked out the latest source code from 
https://svn.apache.org/repos/asf/giraph/trunk
and built it with maven.

I am now running the shortestPathBenchMark example that ships with Giraph and 
have a few "high-level" questions:
For the sake of this discussion, I am running the example with the following 
arguments:

hadoop jar giraph.jar org.apache.giraph.benchmark.ShortestPathsBenchmark -c 1 
-e 3 -v -V 50000 -w 4

The example takes about 90 seconds to complete on my 4-node hadoop cluster and 
I don't see any errors or issues.


1.      In computing a Dijkstra shortest path, we are looking for the shortest 
path from one node to another.  What does ShortestPathsBenchmark use as the 
"starting" node?  The "ending" node?

2.      What edge weights are being used?  The arguments don't allow me to 
specify them.

3.      Does ShortestPathsBenchmark produce any output data inside HDFS upon 
completion of this example, or is the example purely meant to visually 
illustrate processing time on my cluster?

4.      Can I feed ShortestPathsBenchmark my own graph?

5.      In the example above, I have specified 3 edges per vertex.  If I were 
to specify only 2 edges per vertex, am I not effectively dealing with a graph 
that most closely resembles a "linked list"?  When I set -e=2, the processing 
time is still somewhat comparable to -e = 3.  Shouldn't the graph be much 
simpler?


I have seen the ShortestPathExample @
https://cwiki.apache.org/confluence/display/GIRAPH/Shortest+Paths+Example

and I was planning on working through that example as well, but I thought I'd 
ask about the benchmarking example first.

Thanks!


Bence


Reply via email to