Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-13 Thread Paolo Castagna
Avery Ching wrote:
 It shouldn't be, your code looks very similar to the unittests (i.e.
 TestManualCheckpoint.java).  So, you're trying to run your test with the
 local hadoop (similar to the unittests)?  Or are you using an actual
 hadoop setup?

Hi Avery,
here are a few more details on what I am trying to do in order to run my Giraph
jobs on a local Hadoop (for testing and debugging locally):

  GiraphJob job = new GiraphJob("shortest paths");
  Configuration conf = job.getConfiguration();
  conf.setBoolean(GiraphJob.SPLIT_MASTER_WORKER, false);
  conf.setBoolean(GiraphJob.LOCAL_TEST_MODE, true);
  // conf.set(GiraphJob.ZOOKEEPER_JAR,
  //     "file://target/dependency/zookeeper-3.3.3.jar");
  job.setWorkerConfiguration(1, 1, 100.0f);
  job.setVertexClass(SimpleShortestPathsVertex.class);
  job.setVertexInputFormatClass(SimpleShortestPathsVertexInputFormat.class);
  job.setVertexOutputFormatClass(SimpleShortestPathsVertexOutputFormat.class);
  FileInputFormat.addInputPath(job.getInternalJob(),
      new Path("src/main/resources/giraph1.txt"));
  Path outputPath = new Path("target/giraph1");
  FileSystem hdfs = FileSystem.get(conf);
  hdfs.delete(outputPath, true);
  FileOutputFormat.setOutputPath(job.getInternalJob(), outputPath);
  job.run(true);

Am I doing something wrong/stupid here?
Am I missing something important? (probably! but I do not see what I am missing)

This is what I think happens...

In GraphMapper something goes wrong during setup(context), probably because
GiraphJob.ZOOKEEPER_JAR is not set(?) and an exception different from
IOException is thrown and I do not see any useful error message:

try {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(),
        context.getCurrentValue(),
        context);
  }
  cleanup(context);
} catch (IOException e) {
  if (mapFunctions == MapFunctions.WORKER_ONLY) {
    serviceWorker.failureCleanup();
  }
  throw new IllegalStateException(
      "run: Caught an unrecoverable exception " + e.getMessage(), e);
}
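
The suspicion above can be illustrated in isolation. This is a minimal sketch, not Giraph code: riskySetup() is a hypothetical stand-in for setup(context) that fails with an unchecked exception, and the class and method names are made up for illustration. It shows that a catch (IOException e) handler never runs for such a failure, so neither the cleanup nor the wrapped message is reached.

```java
import java.io.IOException;

final class CatchScopeDemo {
  /** Hypothetical stand-in for setup(context); fails with an unchecked exception. */
  static void riskySetup() throws IOException {
    throw new IllegalArgumentException("missing configuration value");
  }

  /** Same shape as the try/catch quoted above: only IOException is handled. */
  static String runNarrow() {
    try {
      riskySetup();
      return "ok";
    } catch (IOException e) {
      return "handled IOException";
    }
  }

  /** Same call, but unchecked exceptions are caught and reported as well. */
  static String runWide() {
    try {
      riskySetup();
      return "ok";
    } catch (IOException | RuntimeException e) {
      return "handled " + e.getClass().getSimpleName() + ": " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(runWide());
    try {
      runNarrow();
    } catch (RuntimeException e) {
      // The narrow handler was skipped, so the exception reaches the caller.
      System.out.println("escaped the IOException handler: " + e);
    }
  }
}
```

Whether this is what actually happens inside GraphMapper is still only a guess; the sketch just shows why such a failure could pass the handler without a useful message.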

My question is: is it possible to run a Giraph job as I am trying to do above
(for testing only), or do developers need a Hadoop cluster (either remote or
local) and ZooKeeper running (either remote or local)?

Thanks,
Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-13 Thread Paolo Castagna
This is a better way:

  Iterable<String> results = InternalVertexRunner.run(
      SimpleShortestPathsVertex.class,
      SimpleShortestPathsVertex.SimpleShortestPathsVertexInputFormat.class,
      SimpleShortestPathsVertex.SimpleShortestPathsVertexOutputFormat.class,
      params, graph);

... which starts a local ZooKeeper properly.

However, I still have a question: when I run it in a unit test everything is
fine, but when I run it from a Java main method it hangs towards the end.

Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-13 Thread Paolo Castagna
I am using Hadoop 1.0.1, Pig 0.9.2, ZooKeeper 3.4.3 and Giraph from trunk:

[INFO] +- org.apache.hadoop:hadoop-core:jar:1.0.1:compile
...
[INFO] +- org.apache.pig:pig:jar:0.9.2:compile
...
[INFO] +- org.apache.hbase:hbase:jar:0.92.1:compile
...
[INFO] +- org.apache.zookeeper:zookeeper:jar:3.4.3:compile
...
[INFO] +- org.apache.giraph:giraph:jar:0.2-SNAPSHOT:compile


Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-13 Thread Avery Ching

Hi Paolo,

Can you try something for me?  I was able to get the PageRankBenchmark 
to work running in local mode just fine on my side.


I think we should have some kind of helper script (similar to
bin/giraph) to run simple tests in LocalJobRunner.


I believe that for LocalJobRunner to run, we need to do 
-Dgiraph.SplitMasterWorker=false -Dlocal.test.mode=true.  In the case of 
PageRankBenchmark, I also have to set the workers to 1 (LocalJobRunner 
can only run one task at a time).
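
As a rough sketch of what such a helper might assemble: the two -D properties and the single worker come from this message, while the class name LocalModeArgs and its build method are hypothetical, and -w is assumed to be the worker-count flag as used in the PageRankBenchmark command in this message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LocalModeArgs {
  /** Builds the argument list for a single-worker LocalJobRunner run. */
  static List<String> build(String mainClass, List<String> jobArgs) {
    List<String> args = new ArrayList<>();
    args.add(mainClass);
    // Properties named in the message above.
    args.add("-Dgiraph.SplitMasterWorker=false");
    args.add("-Dlocal.test.mode=true");
    args.addAll(jobArgs);
    // LocalJobRunner runs one task at a time, so ask for one worker
    // (assumption: -w is the worker-count flag of the job being run).
    args.add("-w");
    args.add("1");
    return args;
  }

  public static void main(String[] unused) {
    List<String> args = build(
        "org.apache.giraph.benchmark.PageRankBenchmark",
        Arrays.asList("-c", "1", "-e", "2", "-s", "2", "-V", "10"));
    System.out.println(String.join(" ", args));
  }
}
```

The resulting list is meant to be appended to a java -cp <classpath> invocation; the classpath itself is left out because it is machine-specific.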


So I got the class path that bin/giraph was using to run (I just added an
echo $CLASSPATH at the end) and then inserted
giraph-0.2-SNAPSHOT-jar-with-dependencies.jar in front of it (this is
necessary for the ZooKeeper jar inclusion).  Then I just ran a normal
java command and got the output below.


One thing to remember is that if you rerun it, you'll have to remove the 
_bsp directories that are created, otherwise it will think it has 
already been completed.
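
A minimal sketch of that cleanup step, under two assumptions that are not confirmed in this thread: the directories are on the local filesystem directly under the directory the job was started from, and their names start with _bsp. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class BspCleanup {
  /** Deletes every entry directly under baseDir whose name starts with "_bsp". */
  static int removeBspDirs(Path baseDir) throws IOException {
    List<Path> matches = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(baseDir, "_bsp*")) {
      for (Path entry : entries) {
        matches.add(entry);
      }
    }
    for (Path match : matches) {
      deleteRecursively(match);
    }
    return matches.size();
  }

  private static void deleteRecursively(Path root) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order puts children before their parent directory.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  public static void main(String[] args) throws IOException {
    Path base = Paths.get(args.length > 0 ? args[0] : ".");
    System.out.println("Removed " + removeBspDirs(base) + " _bsp entries under "
        + base.toAbsolutePath());
  }
}
```

Calling removeBspDirs before job.run(true) in a test would give each run a clean start, if the assumptions above hold for your setup.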


Hope that helps,

Avery

java -cp \
target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar:\
/Users/aching/git/git_svn_giraph_trunk/conf:\
/Users/aching/.m2/repository/ant/ant/1.6.5/ant-1.6.5.jar:\
/Users/aching/.m2/repository/com/google/guava/guava/r09/guava-r09.jar:\
/Users/aching/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:\
/Users/aching/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:\
/Users/aching/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:\
/Users/aching/.m2/repository/commons-codec/commons-codec/1.4/commons-codec-1.4.jar:\
/Users/aching/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:\
/Users/aching/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:\
/Users/aching/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:\
/Users/aching/.m2/repository/commons-el/commons-el/1.0/commons-el-1.0.jar:\
/Users/aching/.m2/repository/commons-httpclient/commons-httpclient/3.0.1/commons-httpclient-3.0.1.jar:\
/Users/aching/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:\
/Users/aching/.m2/repository/commons-logging/commons-logging/1.0.3/commons-logging-1.0.3.jar:\
/Users/aching/.m2/repository/commons-net/commons-net/1.4.1/commons-net-1.4.1.jar:\
/Users/aching/.m2/repository/hsqldb/hsqldb/1.8.0.10/hsqldb-1.8.0.10.jar:\
/Users/aching/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:\
/Users/aching/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar:\
/Users/aching/.m2/repository/jline/jline/0.9.94/jline-0.9.94.jar:\
/Users/aching/.m2/repository/junit/junit/3.8.1/junit-3.8.1.jar:\
/Users/aching/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar:\
/Users/aching/.m2/repository/net/iharder/base64/2.3.8/base64-2.3.8.jar:\
/Users/aching/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:\
/Users/aching/.m2/repository/net/sf/kosmosfs/kfs/0.3/kfs-0.3.jar:\
/Users/aching/.m2/repository/org/apache/commons/commons-io/1.3.2/commons-io-1.3.2.jar:\
/Users/aching/.m2/repository/org/apache/commons/commons-math/2.1/commons-math-2.1.jar:\
/Users/aching/.m2/repository/org/apache/hadoop/hadoop-core/0.20.203.0/hadoop-core-0.20.203.0.jar:\
/Users/aching/.m2/repository/org/apache/mahout/mahout-collections/1.0/mahout-collections-1.0.jar:\
/Users/aching/.m2/repository/org/apache/zookeeper/zookeeper/3.3.3/zookeeper-3.3.3.jar:\
/Users/aching/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.8.0/jackson-core-asl-1.8.0.jar:\
/Users/aching/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.8.0/jackson-mapper-asl-1.8.0.jar:\
/Users/aching/.m2/repository/org/eclipse/jdt/core/3.1.1/core-3.1.1.jar:\
/Users/aching/.m2/repository/org/json/json/20090211/json-20090211.jar:\
/Users/aching/.m2/repository/org/mockito/mockito-all/1.8.5/mockito-all-1.8.5.jar:\
/Users/aching/.m2/repository/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.jar:\
/Users/aching/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:\
/Users/aching/.m2/repository/org/mortbay/jetty/jsp-2.1/6.1.14/jsp-2.1-6.1.14.jar:\
/Users/aching/.m2/repository/org/mortbay/jetty/jsp-api-2.1/6.1.14/jsp-api-2.1-6.1.14.jar:\
/Users/aching/.m2/repository/org/mortbay/jetty/servlet-api/2.5-20081211/servlet-api-2.5-20081211.jar:\
/Users/aching/.m2/repository/org/mortbay/jetty/servlet-api-2.5/6.1.14/servlet-api-2.5-6.1.14.jar:\
/Users/aching/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:\
/Users/aching/.m2/repository/tomcat/jasper-compiler/5.5.12/jasper-compiler-5.5.12.jar:\
/Users/aching/.m2/repository/tomcat/jasper-runtime/5.5.12/jasper-runtime-5.5.12.jar:\
/Users/aching/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar \
org.apache.giraph.benchmark.PageRankBenchmark \
-Dgiraph.SplitMasterWorker=false -Dlocal.test.mode=true -c 1 -e 2 -s 2 -V 10 -w 1


2012-04-13 09:30:27.261 java[45785:1903] Unable to load realm mapping 
info from SCDynamicStore
12/04/13 09:30:27 INFO benchmark.PageRankBenchmark: Using class 
org.apache.giraph.benchmark.PageRankBenchmark

Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-11 Thread Paolo Castagna
Hi Avery,
nope, no luck.

I have changed all my log.debug(...) into log.info(...). Same behavior.

I have a log4j.properties [1] file in my classpath and it has:
log4j.logger.org.apache.jena.grande=DEBUG
log4j.logger.org.apache.jena.grande.giraph=DEBUG
I also tried to change that to:
log4j.logger.org.apache.jena.grande=INFO
log4j.logger.org.apache.jena.grande.giraph=INFO
No luck.

My Giraph job has:
GiraphJob job = new GiraphJob(getConf(), getClass().getName());
job.setVertexClass(getClass());
job.setVertexInputFormatClass(TurtleVertexInputFormat.class);
job.setVertexOutputFormatClass(TurtleVertexOutputFormat.class);

But if I run in debug mode with a breakpoint in the TurtleVertexInputFormat
constructor, it is never instantiated. How can that be?

So perhaps the problem is not the logging, it is the fact that
my GiraphJob is not using TurtleVertexInputFormat.class and
TurtleVertexOutputFormat.class, but I don't see what I am doing
wrong. :-/

Thanks,
Paolo

 [1]
https://github.com/castagna/jena-grande/blob/master/src/test/resources/log4j.properties

Avery Ching wrote:
 I think the issue might be that Hadoop only logs INFO and above messages
 by default.  Can you retry with INFO level logging?
 
 Avery
 
 On 4/10/12 12:17 PM, Paolo Castagna wrote:
 Hi,
 I am still learning Giraph, so, please, be patient with me and forgive my
 trivial questions.

 As a simple initial use case, I want to compute the shortest paths from a
 single source in a social graph in RDF format using the FOAF [1] vocabulary.
 This example will hopefully also inform GIRAPH-170 [2] and related issues,
 such as GIRAPH-141 [3].

 Here is an example in Turtle [4] format of a tiny graph using FOAF:
 
 @prefix :     <http://example.org/> .
 @prefix foaf: <http://xmlns.com/foaf/0.1/> .

 :alice
     a           foaf:Person ;
     foaf:name   "Alice" ;
     foaf:mbox   <mailto:al...@example.org> ;
     foaf:knows  :bob ;
     foaf:knows  :charlie ;
     foaf:knows  :snoopy ;
     .

 :bob
     foaf:name   "Bob" ;
     foaf:knows  :charlie ;
     .

 :charlie
     foaf:name   "Charlie" ;
     foaf:knows  :alice ;
     .
 
 This is nice, human friendly (RDF without angle brackets!), but not easily
 splittable to be processed with MapReduce (or Giraph).

 Here is the same graph in N-Triples [5] format:
 
 <http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
 <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
 <http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:al...@example.org> .
 <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
 <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .
 <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/snoopy> .
 <http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie" .
 <http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> .
 <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
 <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .
 
 This is more verbose and ugly, but splittable.

 The graph I am interested in is the graph represented by foaf:knows
 relationships/links between people (please note that the --knows-->
 relationship has a direction here; it is not symmetric as on centralized
 social networking websites such as Facebook or LinkedIn. Alice can claim to
 know Bob without Bob knowing it, and it might even be a false claim):

 alice --knows--> bob
 alice --knows--> charlie
 alice --knows--> snoopy
 bob --knows--> charlie
 charlie --knows--> alice

 As a first step, I wrote a MapReduce job [6] to transform the RDF graph
 above into a sort of adjacency list using Turtle syntax. Here is the output
 (three lines):
 
 <http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:al...@example.org>; <http://xmlns.com/foaf/0.1/name> "Alice"; <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>; <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>, <http://example.org/bob>, <http://example.org/snoopy>; . <http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>.

 <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob"; <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>; . <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob>.

 <http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie"; <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>; . <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>. <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>.
 
 This is legal Turtle, but it is also splittable. Each line has all the RDF
 statements (i.e. edges) for a person (there are also incoming edges).

 I wrote a TurtleVertexReader [7] which extends
 

Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-11 Thread Paolo Castagna
I am using hadoop-core-1.0.1.jar ... could that be a problem?

Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-11 Thread Avery Ching
It shouldn't be, your code looks very similar to the unittests (i.e. 
TestManualCheckpoint.java).  So, you're trying to run your test with the 
local hadoop (similar to the unittests)?  Or are you using an actual 
hadoop setup?


Avery


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-11 Thread Paolo Castagna
Avery Ching wrote:
 It shouldn't be, your code looks very similar to the unittests (i.e.
 TestManualCheckpoint.java).  So, you're trying to run your test with the
 local hadoop (similar to the unittests)?  Or are you using an actual
 hadoop setup?

Hi Avery,
while I am learning and writing the first examples, I am trying to run with
a local hadoop (similar to the unit tests). This way, I can easily run and
debug the code from the IDE.

Tomorrow, I'll look at the unit tests again trying to see if I can spot what
I am doing wrong.

Thanks,
Paolo

 

A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-10 Thread Paolo Castagna
Hi,
I am still learning Giraph, so, please, be patient with me and forgive my
trivial questions.

As a simple initial use case, I want to compute the shortest paths from a single
source in a social graph in RDF format using the FOAF [1] vocabulary.
This example also will hopefully inform GIRAPH-170 [2] and related issues, such
as: GIRAPH-141 [3].

Here is an example in Turtle [4] format of a tiny graph using FOAF:

@prefix :     <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:alice
    a           foaf:Person ;
    foaf:name   "Alice" ;
    foaf:mbox   <mailto:al...@example.org> ;
    foaf:knows  :bob ;
    foaf:knows  :charlie ;
    foaf:knows  :snoopy ;
    .

:bob
    foaf:name   "Bob" ;
    foaf:knows  :charlie ;
    .

:charlie
    foaf:name   "Charlie" ;
    foaf:knows  :alice ;
    .

This is nice, human friendly (RDF without angle brackets!), but not easily
splittable to be processed with MapReduce (or Giraph).

Here is the same graph in N-Triples [5] format:

<http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:al...@example.org> .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/snoopy> .
<http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie" .
<http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> .
<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
<http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .

This is more verbose and ugly, but splittable.
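
To illustrate why one statement per line makes the data splittable, here is a
minimal sketch in plain Java that handles each N-Triples line on its own and
keeps only the foaf:knows links. It is not the code of the job in [6]: the class
name KnowsTripleFilter and the regular expression are assumptions made for this
example, it only recognizes statements whose subject, predicate and object are
all IRIs in angle brackets, and a real job would more likely rely on a proper
RDF parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Giraph or of the job referenced in [6].
final class KnowsTripleFilter {
    static final String FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows";

    // Matches "<subject> <predicate> <object> ." where all three terms are IRIs.
    private static final Pattern IRI_TRIPLE = Pattern.compile(
        "^\\s*<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*$");

    /** Returns {subject, object} for a foaf:knows statement, or null for any other line. */
    static String[] parseKnows(String line) {
        Matcher m = IRI_TRIPLE.matcher(line);
        if (!m.matches() || !FOAF_KNOWS.equals(m.group(2))) {
            return null;
        }
        return new String[] { m.group(1), m.group(3) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> \"Alice\" .",
            "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .",
            "<http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> ."
        };
        // Each line is processed independently, as a mapper would do with its input split.
        for (String line : lines) {
            String[] edge = parseKnows(line);
            if (edge != null) {
                System.out.println(edge[0] + " --knows--> " + edge[1]);
            }
        }
    }
}
```

No state is carried from one line to the next, which is the property the
Turtle version above lacks.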

The graph I am interested in is the graph represented by foaf:knows
relationships/links between people (please note that the --knows--> relationship
here has a direction; it is not symmetric as in centralized social networking
websites such as Facebook or LinkedIn. Alice can claim to know Bob without Bob
knowing it, and it might even be a false claim):

alice --knows--> bob
alice --knows--> charlie
alice --knows--> snoopy
bob --knows--> charlie
charlie --knows--> alice
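
For reference, the single-source shortest paths computation on this tiny graph
can be sketched in plain Java, without Giraph, as rounds of message passing that
loosely mimic supersteps. This is only an illustration under two assumptions:
every foaf:knows link has length 1, and the class name FoafShortestPathsSketch
and its method are made up for this example (they are not the
FoafShortestPathsVertex mentioned in this thread).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, Giraph-free simulation of superstep-style shortest paths.
final class FoafShortestPathsSketch {

    /** Returns the hop distance from source to every reachable node (edge length 1 assumed). */
    static Map<String, Integer> shortestPaths(Map<String, List<String>> knows, String source) {
        Map<String, Integer> dist = new HashMap<>();
        dist.put(source, 0);
        Set<String> changed = new HashSet<>();
        changed.add(source);
        while (!changed.isEmpty()) {
            // Nodes whose distance improved send "my distance + 1" along outgoing edges.
            Map<String, Integer> messages = new HashMap<>();
            for (String v : changed) {
                for (String target : knows.getOrDefault(v, Collections.<String>emptyList())) {
                    messages.merge(target, dist.get(v) + 1, Integer::min);
                }
            }
            // Receivers keep the smallest value seen and stay active only if it improved.
            changed = new HashSet<>();
            for (Map.Entry<String, Integer> msg : messages.entrySet()) {
                Integer current = dist.get(msg.getKey());
                if (current == null || msg.getValue() < current) {
                    dist.put(msg.getKey(), msg.getValue());
                    changed.add(msg.getKey());
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, List<String>> knows = new LinkedHashMap<>();
        knows.put("alice", Arrays.asList("bob", "charlie", "snoopy"));
        knows.put("bob", Arrays.asList("charlie"));
        knows.put("charlie", Arrays.asList("alice"));
        // Sorted copy for stable printing: {alice=0, bob=1, charlie=1, snoopy=1}
        System.out.println(new TreeMap<>(shortestPaths(knows, "alice")));
    }
}
```

Because the links are directed, starting from bob instead of alice gives
different distances (alice is two hops away, via charlie).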

As a first step, I wrote a MapReduce job [6] to transform the RDF graph above
into a sort of adjacency list using Turtle syntax. Here is the output (three
lines):

<http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:al...@example.org>; <http://xmlns.com/foaf/0.1/name> "Alice"; <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>; <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>, <http://example.org/bob>, <http://example.org/snoopy>; . <http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>.

<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob"; <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>; . <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob>.

<http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie"; <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>; . <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>. <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>.

This is legal Turtle, but it is also splittable. Each line has all the RDF
statements (i.e. edges) for a person, including the incoming edges.
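
The grouping idea can be sketched in plain Java as follows. This is not the
MapReduce job from [6]; the class name PerPersonLines is hypothetical, only
foaf:knows edges are handled, statements are written out in full rather than
abbreviated with ";" and ",", and emitting a line only for nodes that have at
least one outgoing statement is an assumption chosen to match the three-line
output above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of "one line per person, outgoing and incoming edges".
final class PerPersonLines {
    static final String KNOWS = "<http://xmlns.com/foaf/0.1/knows>";

    /** edges holds {subjectIri, objectIri} pairs; the result maps each subject to its line. */
    static Map<String, String> linesPerSubject(List<String[]> edges) {
        Map<String, StringBuilder> byNode = new TreeMap<>();
        Set<String> subjects = new TreeSet<>();
        for (String[] e : edges) {
            String stmt = "<" + e[0] + "> " + KNOWS + " <" + e[1] + "> .";
            subjects.add(e[0]);
            append(byNode, e[0], stmt);          // outgoing edge of the subject
            if (!e[0].equals(e[1])) {
                append(byNode, e[1], stmt);      // incoming edge of the object
            }
        }
        // Assumption: only nodes with outgoing statements get a line of their own.
        Map<String, String> lines = new TreeMap<>();
        for (String s : subjects) {
            lines.put(s, byNode.get(s).toString());
        }
        return lines;
    }

    private static void append(Map<String, StringBuilder> byNode, String node, String stmt) {
        StringBuilder sb = byNode.computeIfAbsent(node, k -> new StringBuilder());
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(stmt);
    }

    public static void main(String[] args) {
        List<String[]> edges = Arrays.asList(
            new String[] { "http://example.org/alice", "http://example.org/bob" },
            new String[] { "http://example.org/alice", "http://example.org/charlie" },
            new String[] { "http://example.org/alice", "http://example.org/snoopy" },
            new String[] { "http://example.org/bob", "http://example.org/charlie" },
            new String[] { "http://example.org/charlie", "http://example.org/alice" });
        // Prints one line per person; each line is a sequence of complete statements.
        for (String line : linesPerSubject(edges).values()) {
            System.out.println(line);
        }
    }
}
```

Since every statement in a line is complete, each line can still be parsed on
its own as Turtle.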

I wrote a TurtleVertexReader [7] which extends TextVertexReader<NodeWritable,
Text, NodeWritable, Text> and a TurtleVertexInputFormat [8] which extends
TextVertexInputFormat<NodeWritable, Text, NodeWritable, Text>.
I wrote (copying from the example SimpleShortestPathsVertex) a
FoafShortestPathsVertex [9] which extends EdgeListVertex<NodeWritable,
IntWritable, NodeWritable, IntWritable> and I am running it locally using these
arguments: -Dgiraph.maxWorkers=1 -Dgiraph.SplitMasterWorker=false
-DoverwriteOutput=true src/test/resources/data3.ttl target/foaf
http://example.org/alice 1

TurtleVertexReader, TurtleVertexInputFormat and FoafShortestPathsVertex are
still work in progress and I am sure there are plenty of stupid errors.
However, I do not understand why when I run FoafShortestPathsVertex with the
DEBUG level, I see debug statements from FoafShortestPathsVertex:
19:34:44 DEBUG FoafShortestPathsVertex   :: main({-Dgiraph.maxWorkers=1,
-Dgiraph.SplitMasterWorker=false, -DoverwriteOutput=true,
src/test/resources/data3.ttl, target/foaf, http://example.org/alice, 1})
19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() --> null
19:34:44 DEBUG FoafShortestPathsVertex   :: setConf(Configuration:
core-default.xml, core-site.xml)
19:34:44 DEBUG FoafShortestPathsVertex   :: run({src/test/resources/data3.ttl,
target/foaf, http://example.org/alice, 1})
19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() --> Configuration:
core-default.xml, 

Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-10 Thread Avery Ching
I think the issue might be that Hadoop only logs INFO and above messages 
by default.  Can you retry with INFO level logging?


Avery
