Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
Avery Ching wrote: It shouldn't be, your code looks very similar to the unittests (i.e. TestManualCheckpoint.java). So, you're trying to run your test with the local hadoop (similar to the unittests)? Or are you using an actual hadoop setup?

Hi Avery,

here are a few more details on what I am trying to do, in order to run my Giraph jobs on a local Hadoop (for testing and debugging locally):

    GiraphJob job = new GiraphJob("shortest paths");
    Configuration conf = job.getConfiguration();
    conf.setBoolean(GiraphJob.SPLIT_MASTER_WORKER, false);
    conf.setBoolean(GiraphJob.LOCAL_TEST_MODE, true);
    // conf.set(GiraphJob.ZOOKEEPER_JAR, "file://target/dependency/zookeeper-3.3.3.jar");
    job.setWorkerConfiguration(1, 1, 100.0f);
    job.setVertexClass(SimpleShortestPathsVertex.class);
    job.setVertexInputFormatClass(SimpleShortestPathsVertexInputFormat.class);
    job.setVertexOutputFormatClass(SimpleShortestPathsVertexOutputFormat.class);
    FileInputFormat.addInputPath(job.getInternalJob(), new Path("src/main/resources/giraph1.txt"));
    Path outputPath = new Path("target/giraph1");
    FileSystem hdfs = FileSystem.get(conf);
    hdfs.delete(outputPath, true);
    FileOutputFormat.setOutputPath(job.getInternalJob(), outputPath);
    job.run(true);

Am I doing something wrong/stupid here? Am I missing something important? (probably! but I do not see what I am missing)

This is what I think happens...

In GraphMapper something goes wrong during setup(context), probably because GiraphJob.ZOOKEEPER_JAR is not set(?), an exception other than IOException is thrown, and I do not see any useful error message:

    try {
        setup(context);
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
    } catch (IOException e) {
        if (mapFunctions == MapFunctions.WORKER_ONLY) {
            serviceWorker.failureCleanup();
        }
        throw new IllegalStateException(
            "run: Caught an unrecoverable exception " + e.getMessage(), e);
    }

My question is: is it possible to run a Giraph job as I am trying to do above (for testing only), or do developers need to have a Hadoop cluster (either remote or local) and ZooKeeper running (either remote or local)?

Thanks,
Paolo
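The try/catch quoted above handles only IOException, so an unchecked exception raised during setup would escape without the "unrecoverable exception" message. A minimal, self-contained sketch of that effect follows. It is not Giraph code: RunLoopSketch, Step and the exception text are hypothetical names introduced only for illustration, and it simply shows one way to wrap unchecked failures as well so that the cause is always reported.

```java
import java.io.IOException;

/** Hypothetical stand-in for a mapper run loop; this is not Giraph code. */
class RunLoopSketch {

    /** Hypothetical step that may fail with a checked or an unchecked exception. */
    interface Step {
        void run() throws IOException, InterruptedException;
    }

    /**
     * Runs the step and wraps any failure, checked or unchecked, so that the
     * message and the original cause are always reported to the caller.
     */
    static void runWrapped(Step step) {
        try {
            step.run();
        } catch (IOException | RuntimeException e) {
            throw new IllegalStateException(
                "run: Caught an unrecoverable exception " + e.getMessage(), e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the failure.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("run: Interrupted", e);
        }
    }

    public static void main(String[] args) {
        try {
            // Hypothetical failure: an unchecked exception thrown during setup.
            runWrapped(() -> { throw new NullPointerException("hypothetical setup failure"); });
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
            System.out.println("cause: " + e.getCause().getClass().getName());
        }
    }
}
```

With a catch clause limited to IOException, the NullPointerException in this sketch would propagate unwrapped, which matches the symptom of not seeing the expected error message.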
Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
Paolo Castagna wrote: [...] Am I doing something wrong/stupid here? Am I missing something important? (probably! but I do not see what I am missing)

This is a better way:

    Iterable<String> results = InternalVertexRunner.run(
        SimpleShortestPathsVertex.class,
        SimpleShortestPathsVertex.SimpleShortestPathsVertexInputFormat.class,
        SimpleShortestPathsVertex.SimpleShortestPathsVertexOutputFormat.class,
        params, graph);

... which starts a local ZooKeeper properly.

However, I still have a question: when I run it in a unit test everything is fine. When I run it on a Java main method, it hangs towards the end.

Paolo
Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
Paolo Castagna wrote: However, I still have a question: when I run it in a unit test everything is fine. When I run it on a Java main method, it hangs towards the end.

I am using Hadoop 1.0.1, Pig 0.9.2, ZooKeeper 3.4.3 and Giraph from trunk:

    [INFO] +- org.apache.hadoop:hadoop-core:jar:1.0.1:compile
    ...
    [INFO] +- org.apache.pig:pig:jar:0.9.2:compile
    ...
    [INFO] +- org.apache.hbase:hbase:jar:0.92.1:compile
    ...
    [INFO] +- org.apache.zookeeper:zookeeper:jar:3.4.3:compile
    ...
    [INFO] +- org.apache.giraph:giraph:jar:0.2-SNAPSHOT:compile

Paolo
Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
Hi Paolo,

Can you try something for me? I was able to get the PageRankBenchmark to work running in local mode just fine on my side.

I think we should have some kind of a helper script (similar to bin/giraph) to run simple tests in LocalJobRunner.

I believe that for LocalJobRunner to run, we need to do -Dgiraph.SplitMasterWorker=false -Dlocal.test.mode=true. In the case of PageRankBenchmark, I also have to set the workers to 1 (LocalJobRunner can only run one task at a time).

So I got the class path that bin/giraph was using to run (just added an echo $CLASSPATH at the end) and then inserted the giraph-0.2-SNAPSHOT-jar-with-dependencies.jar in front of it (this is necessary for the ZooKeeper jar inclusion). Then I just ran a normal java command and got the output below.

One thing to remember is that if you rerun it, you'll have to remove the _bsp directories that are created, otherwise it will think it has already been completed.

Hope that helps,

Avery

java -cp target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar:/Users/aching/git/git_svn_giraph_trunk/conf:/Users/aching/.m2/repository/ant/ant/1.6.5/ant-1.6.5.jar:/Users/aching/.m2/repository/com/google/guava/guava/r09/guava-r09.jar:/Users/aching/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/Users/aching/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/Users/aching/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/Users/aching/.m2/repository/commons-codec/commons-codec/1.4/commons-codec-1.4.jar:/Users/aching/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/Users/aching/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/Users/aching/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/Users/aching/.m2/repository/commons-el/commons-el/1.0/commons-el-1.0.jar:/Users/aching/.m2/repository/commons-httpclient/commons-httpclient/3.0.1/commons-httpclient-3.0.1.jar:/Users/aching/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/Users/aching/.m2/repository/commons-logging/commons-logging/1.0.3/commons-logging-1.0.3.jar:/Users/aching/.m2/repository/commons-net/commons-net/1.4.1/commons-net-1.4.1.jar:/Users/aching/.m2/repository/hsqldb/hsqldb/1.8.0.10/hsqldb-1.8.0.10.jar:/Users/aching/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:/Users/aching/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar:/Users/aching/.m2/repository/jline/jline/0.9.94/jline-0.9.94.jar:/Users/aching/.m2/repository/junit/junit/3.8.1/junit-3.8.1.jar:/Users/aching/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar:/Users/aching/.m2/repository/net/iharder/base64/2.3.8/base64-2.3.8.jar:/Users/aching/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/Users/aching/.m2/repository/net/sf/kosmosfs/kfs/0.3/kfs-0.3.jar:/Users/aching/.m2/repository/org/apache/commons/commons-io/1.3.2/commons-io-1.3.2.jar:/Users/aching/.m2/repository/org/apache/commons/commons-math/2.1/commons-math-2.1.jar:/Users/aching/.m2/repository/org/apache/hadoop/hadoop-core/0.20.203.0/hadoop-core-0.20.203.0.jar:/Users/aching/.m2/repository/org/apache/mahout/mahout-collections/1.0/mahout-collections-1.0.jar:/Users/aching/.m2/repository/org/apache/zookeeper/zookeeper/3.3.3/zookeeper-3.3.3.jar:/Users/aching/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.8.0/jackson-core-asl-1.8.0.jar:/Users/aching/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.8.0/jackson-mapper-asl-1.8.0.jar:/Users/aching/.m2/repository/org/eclipse/jdt/core/3.1.1/core-3.1.1.jar:/Users/aching/.m2/repository/org/json/json/20090211/json-20090211.jar:/Users/aching/.m2/repository/org/mockito/mockito-all/1.8.5/mockito-all-1.8.5.jar:/Users/aching/.m2/repository/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.jar:/Users/aching/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/Users/aching/.m2/repository/org/mortbay/jetty/jsp-2.1/6.1.14/jsp-2.1-6.1.14.jar:/Users/aching/.m2/repository/org/mortbay/jetty/jsp-api-2.1/6.1.14/jsp-api-2.1-6.1.14.jar:/Users/aching/.m2/repository/org/mortbay/jetty/servlet-api/2.5-20081211/servlet-api-2.5-20081211.jar:/Users/aching/.m2/repository/org/mortbay/jetty/servlet-api-2.5/6.1.14/servlet-api-2.5-6.1.14.jar:/Users/aching/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/Users/aching/.m2/repository/tomcat/jasper-compiler/5.5.12/jasper-compiler-5.5.12.jar:/Users/aching/.m2/repository/tomcat/jasper-runtime/5.5.12/jasper-runtime-5.5.12.jar:/Users/aching/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.SplitMasterWorker=false -Dlocal.test.mode=true -c 1 -e 2 -s 2 -V 10 -w 1

2012-04-13 09:30:27.261 java[45785:1903] Unable to load realm mapping info from SCDynamicStore
12/04/13 09:30:27 INFO benchmark.PageRankBenchmark: Using class org.apache.giraph.benchmark.PageRankBenchmark
Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
Hi Avery,

nope, no luck.

I have changed all my log.debug(...) into log.info(...). Same behavior.

I have a log4j.properties [1] file in my classpath and it has:

    log4j.logger.org.apache.jena.grande=DEBUG
    log4j.logger.org.apache.jena.grande.giraph=DEBUG

I also tried to change that to:

    log4j.logger.org.apache.jena.grande=INFO
    log4j.logger.org.apache.jena.grande.giraph=INFO

No luck.

My Giraph job has:

    GiraphJob job = new GiraphJob(getConf(), getClass().getName());
    job.setVertexClass(getClass());
    job.setVertexInputFormatClass(TurtleVertexInputFormat.class);
    job.setVertexOutputFormatClass(TurtleVertexOutputFormat.class);

But if I run in debug mode with a breakpoint in the TurtleVertexInputFormat constructor, it is never instantiated. How can that be?

So perhaps the problem is not the logging but the fact that my GiraphJob is not using TurtleVertexInputFormat.class and TurtleVertexOutputFormat.class, but I don't see what I am doing wrong. :-/

Thanks,
Paolo

[1] https://github.com/castagna/jena-grande/blob/master/src/test/resources/log4j.properties

Avery Ching wrote: I think the issue might be that Hadoop only logs INFO and above messages by default. Can you retry with INFO level logging?
Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
I am using hadoop-core-1.0.1.jar ... could that be a problem?

Paolo
Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
It shouldn't be, your code looks very similar to the unittests (i.e. TestManualCheckpoint.java). So, you're trying to run your test with the local hadoop (similar to the unittests)? Or are you using an actual hadoop setup?

Avery

On 4/10/12 11:41 PM, Paolo Castagna wrote: I am using hadoop-core-1.0.1.jar ... could that be a problem?
Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
Avery Ching wrote: It shouldn't be, your code looks very similar to the unittests (i.e. TestManualCheckpoint.java). So, you're trying to run your test with the local hadoop (similar to the unittests)? Or are you using an actual hadoop setup?

Hi Avery,

while I am learning and writing the first examples, I am trying to run with a local hadoop (similar to the unit tests). This way, I can easily run and debug the code from the IDE.

Tomorrow, I'll look at the unit tests again trying to see if I can spot what I am doing wrong.

Thanks,
Paolo
A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
Hi,

I am still learning Giraph, so, please, be patient with me and forgive my trivial questions.

As a simple initial use case, I want to compute the shortest paths from a single source in a social graph in RDF format using the FOAF [1] vocabulary. This example also will hopefully inform GIRAPH-170 [2] and related issues, such as: GIRAPH-141 [3].

Here is an example in Turtle [4] format of a tiny graph using FOAF:

    @prefix : <http://example.org/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    :alice
        a foaf:Person ;
        foaf:name "Alice" ;
        foaf:mbox <mailto:al...@example.org> ;
        foaf:knows :bob ;
        foaf:knows :charlie ;
        foaf:knows :snoopy ;
        .

    :bob
        foaf:name "Bob" ;
        foaf:knows :charlie ;
        .

    :charlie
        foaf:name "Charlie" ;
        foaf:knows :alice ;
        .

This is nice, human friendly (RDF without angle brackets!), but not easily splittable to be processed with MapReduce (or Giraph).

Here is the same graph in N-Triples [5] format:

    <http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
    <http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:al...@example.org> .
    <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
    <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .
    <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/snoopy> .
    <http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie" .
    <http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> .
    <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
    <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .

This is more verbose and ugly, but splittable.
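Because each N-Triples line is a complete statement, the foaf:knows edges can be pulled out one line at a time. The following is a minimal sketch of that idea in plain Java, assuming standard N-Triples syntax with IRIs written in angle brackets. KnowsEdgeExtractor and its methods are hypothetical names, and the regular expression only handles triples whose three terms are all IRIs (it is not a full N-Triples parser).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts foaf:knows edges from N-Triples lines. */
class KnowsEdgeExtractor {
    static final String KNOWS = "http://xmlns.com/foaf/0.1/knows";

    // Matches a triple whose subject, predicate and object are all IRIs.
    private static final Pattern IRI_TRIPLE =
        Pattern.compile("^\\s*<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*$");

    /** Returns {subject, object} for a foaf:knows triple, or null for any other line. */
    static String[] parseKnows(String line) {
        Matcher m = IRI_TRIPLE.matcher(line);
        if (m.matches() && KNOWS.equals(m.group(2))) {
            return new String[] { m.group(1), m.group(3) };
        }
        return null;
    }

    /** Collects the foaf:knows edges found in the given lines, in input order. */
    static List<String[]> extract(List<String> lines) {
        List<String[]> edges = new ArrayList<>();
        for (String line : lines) {
            String[] edge = parseKnows(line);
            if (edge != null) {
                edges.add(edge);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> \"Alice\" .",
            "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .",
            "<http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .");
        for (String[] edge : extract(lines)) {
            System.out.println(edge[0] + " knows " + edge[1]);
        }
    }
}
```

Lines with literal objects (names) do not match the pattern and are skipped, which leaves only the person-to-person links.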
The graph I am interested in is the graph represented by foaf:knows relationships/links between people (please note that the --knows--> relationship here has a direction; it isn't symmetric as in centralized social networking websites such as Facebook or LinkedIn. Alice can claim to know Bob without Bob knowing it, and/or it might even be a false claim):

    alice --knows--> bob
    alice --knows--> charlie
    alice --knows--> snoopy
    bob --knows--> charlie
    charlie --knows--> alice

As a first step, I wrote a MapReduce job [6] to transform the RDF graph above into a sort of adjacency list using Turtle syntax. Here is the output (three lines):

    <http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:al...@example.org>; <http://xmlns.com/foaf/0.1/name> "Alice"; <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>; <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>, <http://example.org/bob>, <http://example.org/snoopy>; . <http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>.
    <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob"; <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>; . <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob>.
    <http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie"; <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>; . <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>. <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>.

This is legal Turtle, but it is also splittable. Each line has all the RDF statements (i.e. edges) for a person (there are also incoming edges).

I wrote a TurtleVertexReader [7] which extends TextVertexReader<NodeWritable, Text, NodeWritable, Text> and a TurtleVertexInputFormat [8] which extends TextVertexInputFormat<NodeWritable, Text, NodeWritable, Text>.
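The grouping behind such an adjacency list can be sketched without Hadoop. The example below is not the MapReduce job [6] itself: AdjacencySketch and group are hypothetical names, and short labels stand in for the full IRIs. It only illustrates how the directed knows edges listed above can be grouped per person, once by source (outgoing) and once by target (incoming).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: groups directed edges into per-person adjacency sets. */
class AdjacencySketch {

    /**
     * Groups edges {from, to} by their source (outgoing view) or, when
     * incoming is true, by their target (incoming view).
     */
    static Map<String, SortedSet<String>> group(List<String[]> edges, boolean incoming) {
        Map<String, SortedSet<String>> result = new TreeMap<>();
        for (String[] edge : edges) {
            String key = incoming ? edge[1] : edge[0];
            String value = incoming ? edge[0] : edge[1];
            result.computeIfAbsent(key, k -> new TreeSet<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        // The five knows edges from the message, with short labels instead of IRIs.
        List<String[]> edges = Arrays.asList(
            new String[] { "alice", "bob" },
            new String[] { "alice", "charlie" },
            new String[] { "alice", "snoopy" },
            new String[] { "bob", "charlie" },
            new String[] { "charlie", "alice" });
        System.out.println("outgoing: " + group(edges, false));
        System.out.println("incoming: " + group(edges, true));
    }
}
```

Keeping both views per person mirrors the output described above, where each line carries a person's own statements plus the edges pointing at that person.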
I wrote (copying from the example SimpleShortestPathsVertex) a FoafShortestPathsVertex [9] which extends EdgeListVertex<NodeWritable, IntWritable, NodeWritable, IntWritable>, and I am running it locally using these arguments:

    -Dgiraph.maxWorkers=1 -Dgiraph.SplitMasterWorker=false -DoverwriteOutput=true src/test/resources/data3.ttl target/foaf http://example.org/alice 1

TurtleVertexReader, TurtleVertexInputFormat and FoafShortestPathsVertex are still work in progress and I am sure there are plenty of stupid errors.

However, I do not understand why when I run FoafShortestPathsVertex with the DEBUG level, I see debug statements from FoafShortestPathsVertex:

    19:34:44 DEBUG FoafShortestPathsVertex :: main({-Dgiraph.maxWorkers=1, -Dgiraph.SplitMasterWorker=false, -DoverwriteOutput=true, src/test/resources/data3.ttl, target/foaf, http://example.org/alice, 1})
    19:34:44 DEBUG FoafShortestPathsVertex :: getConf() -- null
    19:34:44 DEBUG FoafShortestPathsVertex :: setConf(Configuration: core-default.xml, core-site.xml)
    19:34:44 DEBUG FoafShortestPathsVertex :: run({src/test/resources/data3.ttl, target/foaf, http://example.org/alice, 1})
    19:34:44 DEBUG FoafShortestPathsVertex :: getConf() -- Configuration: core-default.xml,
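For reference, the computation a shortest-paths vertex is meant to perform can be sketched outside Giraph. The plain-Java simulation below uses no Giraph classes and is not the code of FoafShortestPathsVertex [9]. It assumes every foaf:knows edge has weight 1 and imitates synchronous supersteps: a vertex that receives a smaller distance than it currently holds stores it and sends distance + 1 along its outgoing edges, otherwise it stays halted. The names (KnowsShortestPaths, run) are hypothetical, and the source is seeded with a message of 0 as a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plain-Java simulation of vertex-centric shortest paths; no Giraph classes are used. */
final class KnowsShortestPaths {
    /**
     * Runs synchronous supersteps over a directed graph in which every edge has
     * weight 1 (an assumption of this sketch). Returns the hop count from the
     * source for each reachable vertex; unreachable vertices are absent.
     */
    static Map<String, Integer> run(Map<String, List<String>> knows, String source) {
        Map<String, Integer> distance = new HashMap<>();
        // Messages delivered at the start of the current superstep.
        Map<String, List<Integer>> inbox = new HashMap<>();
        inbox.computeIfAbsent(source, k -> new ArrayList<>()).add(0);

        while (!inbox.isEmpty()) {
            Map<String, List<Integer>> outbox = new HashMap<>();
            for (Map.Entry<String, List<Integer>> entry : inbox.entrySet()) {
                String vertex = entry.getKey();
                int best = Integer.MAX_VALUE;
                for (int candidate : entry.getValue()) {
                    best = Math.min(best, candidate);
                }
                Integer current = distance.get(vertex);
                if (current == null || best < current) {
                    // Improved distance: store it and notify the outgoing neighbours.
                    distance.put(vertex, best);
                    for (String neighbour : knows.getOrDefault(vertex, Collections.emptyList())) {
                        outbox.computeIfAbsent(neighbour, k -> new ArrayList<>()).add(best + 1);
                    }
                }
                // No improvement: the vertex sends nothing, which models voting to halt.
            }
            inbox = outbox;
        }
        return distance;
    }

    public static void main(String[] args) {
        Map<String, List<String>> knows = new HashMap<>();
        knows.put("alice", Arrays.asList("bob", "charlie", "snoopy"));
        knows.put("bob", Arrays.asList("charlie"));
        knows.put("charlie", Arrays.asList("alice"));
        System.out.println(run(knows, "alice"));
    }
}
```

On the tiny graph above with alice as the source, bob, charlie and snoopy all end up at distance 1. Starting from bob instead gives charlie 1, alice 2 and snoopy 3, which shows the effect of edge direction.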
Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
I think the issue might be that Hadoop only logs INFO and above messages by default. Can you retry with INFO level logging?

Avery

On 4/10/12 12:17 PM, Paolo Castagna wrote:
> Hi, I am still learning Giraph, so, please, be patient with me and forgive my trivial questions.
> [...]