Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-24 Thread Paolo Castagna
I am forwarding this, as Dan asked below.

Paolo

-------- Original Message --------
Subject: Fwd: failure notice
Date: Mon, 23 Apr 2012 17:27:21 +0200
From: Dan Brickley
To: castagna.li...@googlemail.com

bugger, wrong email account! care to fwd this if you want to reply?
I'll try to fix my setup but can't do it right now... --Dan


Date: Mon, 23 Apr 2012 17:21:27 +0200
From: Dan Brickley
To: giraph-user@incubator.apache.org
Subject: Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

Hi Paolo,

On 10 April 2012 21:17, Paolo Castagna wrote:
> Hi,
> I am still learning Giraph, so, please, be patient with me and forgive my
> trivial questions.
>
> As a simple initial use case, I want to compute the shortest paths from a single
> source in a social graph in RDF format using the FOAF [1] vocabulary.
> This example also will hopefully inform GIRAPH-170 [2] and related issues, such
> as: GIRAPH-141 [3].
>
> Here is an example in Turtle [4] format of a tiny graph using FOAF:
> 
> @prefix :     <http://example.org/> .
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> :alice
>    a           foaf:Person ;
>    foaf:name   "Alice" ;
>    foaf:mbox   <mailto:al...@example.org> ;
>    foaf:knows  :bob ;
>    foaf:knows  :charlie ;
>    foaf:knows  :snoopy ;
>    .
>
> :bob
>    foaf:name   "Bob" ;
>    foaf:knows  :charlie ;
>    .
>
> :charlie
>    foaf:name   "Charlie" ;
>    foaf:knows  :alice ;
>    .
> 
> This is nice, human friendly (RDF without angle brackets!), but not easily
> splittable to be processed with MapReduce (or Giraph).
>
> Here is the same graph in N-Triples [5] format:
> 
> <http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
> <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
> <http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:al...@example.org> .
> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows>
> 

Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-23 Thread Paolo Castagna
Hi Avery,
first of all, apologies for my delay (I was on holiday) and many thanks for your
help. Further comments inline.

Avery Ching wrote:
> I think we should have some kind of a helper script (similar to
> bin/giraph) for running simple tests in LocalJobRunner.

That would be good: new developers might not have a Hadoop cluster at hand, or
may want to debug what they write and test it on their laptop before running it
on a real cluster.

> One thing to remember is that if you rerun it, you'll have to remove the
> _bsp directories that are created, otherwise it will think it has
> already been completed.

These are the program arguments I used to run PageRankBenchmark locally,
directly from Eclipse:

  -libjars target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar -Dgiraph.SplitMasterWorker=false -Dlocal.test.mode=true -c 1 -e 2 -s 2 -V 10 -w 1

As you suggested, each time I need to delete the _bsp directories (not ideal,
but necessary).
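Deleting the _bsp directories by hand before every local rerun can be scripted. Below is a minimal plain-Java sketch; the class name BspCleaner is hypothetical, and it assumes the _bsp directories are created directly under the directory the job was started from (the current working directory by default), which may not match every setup. It deletes files, so point it only at a directory where that is intended.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper: removes leftover "_bsp*" entries directly under a base
 * directory so that a local rerun does not think the job already completed.
 * Assumption: the _bsp directories are created under that base directory.
 */
public final class BspCleaner {

  private BspCleaner() {}

  /** Deletes every top-level entry of baseDir whose name starts with "_bsp"; returns how many. */
  public static int cleanBspDirs(Path baseDir) throws IOException {
    // Collect matches first, then delete, so the directory is not modified while being listed.
    List<Path> matches = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(baseDir, "_bsp*")) {
      for (Path entry : entries) {
        matches.add(entry);
      }
    }
    for (Path match : matches) {
      deleteRecursively(match);
    }
    return matches.size();
  }

  /** Deletes a file or a whole directory tree. */
  private static void deleteRecursively(Path root) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse lexicographic order lists children before their parent directory.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  public static void main(String[] args) throws IOException {
    // Defaults to the current working directory; pass another directory as the first argument.
    Path base = Paths.get(args.length > 0 ? args[0] : ".");
    int removed = cleanBspDirs(base);
    System.out.println("Removed " + removed + " _bsp entries under " + base.toAbsolutePath());
  }
}
```

Running it right before relaunching the job from the IDE keeps the rerun from picking up the previous run's state.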

The -libjars parameter is necessary, otherwise you get a NullPointerException:

12/04/23 15:56:54 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
    at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:398)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:646)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)

I hope this helps others get started and run/debug their Giraph jobs from
Eclipse.

Thanks again,
Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-13 Thread Avery Ching

Hi Paolo,

Can you try something for me?  I was able to get the PageRankBenchmark 
to work running in local mode just fine on my side.


I think we should have some kind of a helper script (similar to 
bin/giraph) for running simple tests in LocalJobRunner.


I believe that for LocalJobRunner to run, we need to do 
-Dgiraph.SplitMasterWorker=false -Dlocal.test.mode=true.  In the case of 
PageRankBenchmark, I also have to set the workers to 1 (LocalJobRunner 
can only run one task at a time).


So I took the classpath that bin/giraph uses to run (I just added an
echo $CLASSPATH at the end of bin/giraph) and inserted
giraph-0.2-SNAPSHOT-jar-with-dependencies.jar at the front of it (this is
necessary for the ZooKeeper jar inclusion). Then I just ran a normal
java command; the command and its output are below.


One thing to remember is that if you rerun it, you'll have to remove the 
_bsp directories that are created, otherwise it will think it has 
already been completed.


Hope that helps,

Avery

 java -cp \
target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar:/Users/aching/git/git_svn_giraph_trunk/conf:/Users/aching/.m2/repository/ant/ant/1.6.5/ant-1.6.5.jar:/Users/aching/.m2/repository/com/google/guava/guava/r09/guava-r09.jar:/Users/aching/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/Users/aching/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/Users/aching/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/Users/aching/.m2/repository/commons-codec/commons-codec/1.4/commons-codec-1.4.jar:/Users/aching/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/Users/aching/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/Users/aching/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/Users/aching/.m2/repository/commons-el/commons-el/1.0/commons-el-1.0.jar:/Users/aching/.m2/repository/commons-httpclient/commons-httpclient/3.0.1/commons-httpclient-3.0.1.jar:/Users/aching/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/Users/aching/.m2/repository/commons-logging/commons-logging/1.0.3/commons-logging-1.0.3.jar:/Users/aching/.m2/repository/commons-net/commons-net/1.4.1/commons-net-1.4.1.jar:/Users/aching/.m2/repository/hsqldb/hsqldb/1.8.0.10/hsqldb-1.8.0.10.jar:/Users/aching/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:/Users/aching/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar:/Users/aching/.m2/repository/jline/jline/0.9.94/jline-0.9.94.jar:/Users/aching/.m2/repository/junit/junit/3.8.1/junit-3.8.1.jar:/Users/aching/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar:/Users/aching/.m2/repository/net/iharder/base64/2.3.8/base64-2.3.8.jar:/Users/aching/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/Users/aching/.m2/repository/net/sf/kosmosfs/kfs/0.3/kfs-0.3.jar:/Users/aching/.m2/repository/org/apache/commons/commons-io/1.3.2/commons-io-1.3.2.jar:/Users/aching/.m2/repository/org/apache/commons/commons-math/2.1/commons-math-2.1.jar:/Users/aching/.m2/repository/org/apache/hadoop/hadoop-core/0.20.203.0/hadoop-core-0.20.203.0.jar:/Users/aching/.m2/repository/org/apache/mahout/mahout-collections/1.0/mahout-collections-1.0.jar:/Users/aching/.m2/repository/org/apache/zookeeper/zookeeper/3.3.3/zookeeper-3.3.3.jar:/Users/aching/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.8.0/jackson-core-asl-1.8.0.jar:/Users/aching/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.8.0/jackson-mapper-asl-1.8.0.jar:/Users/aching/.m2/repository/org/eclipse/jdt/core/3.1.1/core-3.1.1.jar:/Users/aching/.m2/repository/org/json/json/20090211/json-20090211.jar:/Users/aching/.m2/repository/org/mockito/mockito-all/1.8.5/mockito-all-1.8.5.jar:/Users/aching/.m2/repository/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.jar:/Users/aching/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/Users/aching/.m2/repository/org/mortbay/jetty/jsp-2.1/6.1.14/jsp-2.1-6.1.14.jar:/Users/aching/.m2/repository/org/mortbay/jetty/jsp-api-2.1/6.1.14/jsp-api-2.1-6.1.14.jar:/Users/aching/.m2/repository/org/mortbay/jetty/servlet-api/2.5-20081211/servlet-api-2.5-20081211.jar:/Users/aching/.m2/repository/org/mortbay/jetty/servlet-api-2.5/6.1.14/servlet-api-2.5-6.1.14.jar:/Users/aching/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/Users/aching/.m2/repository/tomcat/jasper-compiler/5.5.12/jasper-compiler-5.5.12.jar:/Users/aching/.m2/repository/tomcat/jasper-runtime/5.5.12/jasper-runtime-5.5.12.jar:/Users/aching/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar \
org.apache.giraph.benchmark.PageRankBenchmark \
-Dgiraph.SplitMasterWorker=false -Dlocal.test.mode=true -c 1 -e 2 -s 2 -V 10 -w 1


2012-04-13 09:30:27.261 java[45785:1903] Unable to load realm mapping info from SCDynamicStore
12/04/13 09:30:27 INFO benchmark.PageRankBenchmark: Using class org.apache.giraph.benchmark.PageRankBenchmark
12/04

Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-13 Thread Paolo Castagna
Paolo Castagna wrote:
> However, I still have a question: when I run it in a unit test everything is
> fine. When I run it on a Java main method, it hangs towards the end.

I am using Hadoop 1.0.1, Pig 0.9.2, HBase 0.92.1, ZooKeeper 3.4.3 and Giraph from trunk:

[INFO] +- org.apache.hadoop:hadoop-core:jar:1.0.1:compile
...
[INFO] +- org.apache.pig:pig:jar:0.9.2:compile
...
[INFO] +- org.apache.hbase:hbase:jar:0.92.1:compile
...
[INFO] +- org.apache.zookeeper:zookeeper:jar:3.4.3:compile
...
[INFO] +- org.apache.giraph:giraph:jar:0.2-SNAPSHOT:compile


Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-13 Thread Paolo Castagna
This is a better way:

  // params: extra configuration (Map<String, String>); graph: the input lines (String[])
  Iterable<String> results = InternalVertexRunner.run(
      SimpleShortestPathsVertex.class,
      SimpleShortestPathsVertex.SimpleShortestPathsVertexInputFormat.class,
      SimpleShortestPathsVertex.SimpleShortestPathsVertexOutputFormat.class,
      params, graph);

... which starts a local ZooKeeper properly.

However, I still have a question: when I run it in a unit test everything is
fine. When I run it on a Java main method, it hangs towards the end.

Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-13 Thread Paolo Castagna
Avery Ching wrote:
> It shouldn't be, your code looks very similar to the unittests (i.e.
> TestManualCheckpoint.java).  So, you're trying to run your test with the
> local hadoop (similar to the unittests)?  Or are you using an actual
> hadoop setup?

Hi Avery,
here are a few more details on what I am trying to do, in order to run my Giraph
jobs on a local Hadoop (for testing and debugging locally):

  GiraphJob job = new GiraphJob("shortest paths");
  Configuration conf = job.getConfiguration();
  conf.setBoolean(GiraphJob.SPLIT_MASTER_WORKER, false);
  conf.setBoolean(GiraphJob.LOCAL_TEST_MODE, true);
  // conf.set(GiraphJob.ZOOKEEPER_JAR, "file://target/dependency/zookeeper-3.3.3.jar");
  job.setWorkerConfiguration(1, 1, 100.0f);
  job.setVertexClass(SimpleShortestPathsVertex.class);
  job.setVertexInputFormatClass(SimpleShortestPathsVertexInputFormat.class);
  job.setVertexOutputFormatClass(SimpleShortestPathsVertexOutputFormat.class);
  FileInputFormat.addInputPath(job.getInternalJob(),
      new Path("src/main/resources/giraph1.txt"));
  Path outputPath = new Path("target/giraph1");
  FileSystem hdfs = FileSystem.get(conf);
  hdfs.delete(outputPath, true);
  FileOutputFormat.setOutputPath(job.getInternalJob(), outputPath);
  job.run(true);

Am I doing something wrong/stupid here?
Am I missing something important? (probably! but I do not see what I am missing)

This is what I think happens...

In GraphMapper, something goes wrong during setup(context), probably because
GiraphJob.ZOOKEEPER_JAR is not set (?). The exception thrown is not an
IOException, so the catch block below does not handle it and I do not see any
useful error message:

try {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(),
        context.getCurrentValue(),
        context);
  }
  cleanup(context);
} catch (IOException e) {
  if (mapFunctions == MapFunctions.WORKER_ONLY) {
    serviceWorker.failureCleanup();
  }
  throw new IllegalStateException(
      "run: Caught an unrecoverable exception " + e.getMessage(), e);
}

My question is: is it possible to run a Giraph job as I am trying to do above
(for testing only), or do developers need a Hadoop cluster (remote or local)
and ZooKeeper running (remote or local)?

Thanks,
Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-11 Thread Paolo Castagna
Avery Ching wrote:
> It shouldn't be, your code looks very similar to the unittests (i.e.
> TestManualCheckpoint.java).  So, you're trying to run your test with the
> local hadoop (similar to the unittests)?  Or are you using an actual
> hadoop setup?

Hi Avery,
while I am learning and writing the first examples, I am trying to run with
a local hadoop (similar to the unit tests). This way, I can easily run and
debug the code from the IDE.

Tomorrow, I'll look at the unit tests again trying to see if I can spot what
I am doing wrong.

Thanks,
Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-11 Thread Avery Ching
It shouldn't be, your code looks very similar to the unittests (i.e. 
TestManualCheckpoint.java).  So, you're trying to run your test with the 
local hadoop (similar to the unittests)?  Or are you using an actual 
hadoop setup?


Avery

On 4/10/12 11:41 PM, Paolo Castagna wrote:
> I am using hadoop-core-1.0.1.jar ... could that be a problem?
>
> Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-10 Thread Paolo Castagna
I am using hadoop-core-1.0.1.jar ... could that be a problem?

Paolo


Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-10 Thread Paolo Castagna
Hi Avery,
nope, no luck.

I have changed all my log.debug(...) into log.info(...). Same behavior.

I have a log4j.properties [1] file in my classpath and it has:
log4j.logger.org.apache.jena.grande=DEBUG
log4j.logger.org.apache.jena.grande.giraph=DEBUG
I also tried to change that to:
log4j.logger.org.apache.jena.grande=INFO
log4j.logger.org.apache.jena.grande.giraph=INFO
No luck.

My Giraph job has:
GiraphJob job = new GiraphJob(getConf(), getClass().getName());
job.setVertexClass(getClass());
job.setVertexInputFormatClass(TurtleVertexInputFormat.class);
job.setVertexOutputFormatClass(TurtleVertexOutputFormat.class);

But if I run in debug mode with a breakpoint in the TurtleVertexInputFormat
constructor, it is never instantiated. How can that be?

So perhaps the problem is not the logging, it is the fact that
my GiraphJob is not using TurtleVertexInputFormat.class and
TurtleVertexOutputFormat.class, but I don't see what I am doing
wrong. :-/

Thanks,
Paolo

 [1]
https://github.com/castagna/jena-grande/blob/master/src/test/resources/log4j.properties

Avery Ching wrote:
> I think the issue might be that Hadoop only logs INFO and above messages
> by default.  Can you retry with INFO level logging?
> 
> Avery
> 

Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-10 Thread Avery Ching
I think the issue might be that Hadoop only logs INFO and above messages 
by default.  Can you retry with INFO level logging?


Avery

On 4/10/12 12:17 PM, Paolo Castagna wrote:

Hi,
I am still learning Giraph, so, please, be patient with me and forgive my
trivial questions.

As a simple initial use case, I want to compute the shortest paths from a single
source in a social graph in RDF format using the FOAF [1] vocabulary.
This example will hopefully also inform GIRAPH-170 [2] and related issues, such
as GIRAPH-141 [3].

Here is an example in Turtle [4] format of a tiny graph using FOAF:

@prefix :     <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:alice
 a   foaf:Person ;
 foaf:name   "Alice" ;
 foaf:mbox  ;
 foaf:knows  :bob ;
 foaf:knows  :charlie ;
 foaf:knows  :snoopy ;
 .

:bob
 foaf:name   "Bob" ;
 foaf:knows  :charlie ;
 .

:charlie
 foaf:name   "Charlie" ;
 foaf:knows  :alice ;
 .

This is nice and human-friendly (RDF without angle brackets!), but not easily
split for processing with MapReduce (or Giraph).

Here is the same graph in N-Triples [5] format:

  
  .
    "Alice" .
  
  .
  
  .
  
  .
  
  .
    "Charlie" .
  
  .
    "Bob" .
  
  .

This is more verbose and ugly, but splittable.

The graph I am interested in is the one formed by foaf:knows relationships
(links) between people. Please note that the --knows--> relationship here is
directed: unlike on centralized social networking websites such as Facebook or
LinkedIn, it is not symmetric. Alice can claim to know Bob without Bob knowing
it, and the claim might even be false:

alice --knows--> bob
alice --knows--> charlie
alice --knows--> snoopy
bob --knows--> charlie
charlie --knows--> alice

As a first step, I wrote a MapReduce job [6] to transform the RDF graph above
into a sort of adjacency list using Turtle syntax. Here is the output (three lines):

  
;  "Alice";

;
,,
; .
  .

    "Bob";
  ; .
  
.

    "Charlie";
  ; .
  
.
  .

This is legal Turtle and, unlike the first Turtle version, it is also splittable.
Each line has all the RDF statements (i.e. edges) for a person, including incoming edges.

I wrote a TurtleVertexReader [7], which extends TextVertexReader, and a
TurtleVertexInputFormat [8], which extends TextVertexInputFormat.
I wrote a FoafShortestPathsVertex [9] (copied from the SimpleShortestPathsVertex
example), which extends EdgeListVertex, and I am running it locally with these
arguments: -Dgiraph.maxWorkers=1 -Dgiraph.SplitMasterWorker=false
-DoverwriteOutput=true src/test/resources/data3.ttl target/foaf
http://example.org/alice 1

TurtleVertexReader, TurtleVertexInputFormat and FoafShortestPathsVertex are
still a work in progress, and I am sure they contain plenty of stupid errors.
However, I do not understand why, when I run FoafShortestPathsVertex at DEBUG
level, I see debug statements from FoafShortestPathsVertex:
19:34:44 DEBUG FoafShortestPathsVertex   :: main({-Dgiraph.maxWorkers=1,
-Dgiraph.SplitMasterWorker=false, -DoverwriteOutput=true,
src/test/resources/data3.ttl, target/foaf, http://example.org/alice, 1})
19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() -->  null
19:34:44 DEBUG FoafShortestPathsVertex   :: setConf(Configuration:
core-defa

A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-10 Thread Paolo Castagna
Hi,
I am still learning Giraph, so please be patient with me and forgive my
trivial questions.

As a simple initial use case, I want to compute the shortest paths from a single
source in a social graph in RDF format using the FOAF [1] vocabulary.
This example will hopefully also inform GIRAPH-170 [2] and related issues, such
as GIRAPH-141 [3].

Here is an example in Turtle [4] format of a tiny graph using FOAF:

@prefix :      <http://example.org/> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .

:alice
 a   foaf:Person ;
 foaf:name   "Alice" ;
 foaf:mbox    ;
 foaf:knows  :bob ;
 foaf:knows  :charlie ;
 foaf:knows  :snoopy ;
 .

:bob
 foaf:name   "Bob" ;
 foaf:knows  :charlie ;
 .

:charlie
 foaf:name   "Charlie" ;
 foaf:knows  :alice ;
 .

This is nice and human-friendly (RDF without angle brackets!), but not easily
split for processing with MapReduce (or Giraph).

Here is the same graph in N-Triples [5] format:

 
 .
  "Alice" .
 
 .
 
 .
 
 .
 
 .
  "Charlie" .
 
 .
  "Bob" .
 
 .

This is more verbose and ugly, but splittable.
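To make the "splittable" point concrete: in N-Triples every line is one complete statement, so a mapper can parse whatever lines it is handed without seeing the rest of the file. Below is a minimal Java sketch of such a per-line parser. The class name NTriplesLineParser and the http://example.org/bob URI are hypothetical, and it only handles <URI> terms and plain "..." literals (no escapes, language tags, datatypes or blank nodes), so it is not a conforming N-Triples parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parses ONE N-Triples line in isolation.
// Supports only <URI> terms and plain "..." literals without escapes,
// language tags or datatypes; blank nodes are not handled.
final class NTriplesLineParser {

    // <subject> <predicate> (<object> | "literal") .
    private static final Pattern TRIPLE = Pattern.compile(
            "^\\s*<([^>]*)>\\s+<([^>]*)>\\s+(<[^>]*>|\"[^\"]*\")\\s*\\.\\s*$");

    private NTriplesLineParser() {
    }

    // Returns {subject, predicate, object}, or null if the line does not match.
    // The object keeps its <...> or "..." delimiters so URIs and literals stay distinguishable.
    static String[] parse(String line) {
        Matcher m = TRIPLE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        // Each line is independent, so any split of the file can be parsed on its own.
        String[] lines = {
            "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .",
            "<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> \"Bob\" ."
        };
        for (String line : lines) {
            String[] t = parse(line);
            System.out.println(t[0] + " | " + t[1] + " | " + t[2]);
        }
    }
}
```

The Turtle version above cannot be handled this way, because a statement such as `foaf:knows :charlie ;` only makes sense together with the earlier subject line and the @prefix declarations.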

The graph I am interested in is the one formed by foaf:knows relationships
(links) between people. Please note that the --knows--> relationship here is
directed: unlike on centralized social networking websites such as Facebook or
LinkedIn, it is not symmetric. Alice can claim to know Bob without Bob knowing
it, and the claim might even be false:

alice --knows--> bob
alice --knows--> charlie
alice --knows--> snoopy
bob --knows--> charlie
charlie --knows--> alice
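Because foaf:knows is directed, the natural in-memory form is an adjacency list of outgoing edges only. The sketch below (hypothetical class KnowsGraph, plain Java, no Giraph types) loads the five edges above and shows that "alice knows bob" does not imply "bob knows alice"; snoopy becomes a vertex with no outgoing edges.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the foaf:knows graph as a DIRECTED adjacency list.
// Adding alice -> bob does not add bob -> alice.
final class KnowsGraph {

    private final Map<String, List<String>> out = new LinkedHashMap<>();

    void addKnows(String from, String to) {
        out.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        // Targets become vertices too, even if they know nobody (e.g. snoopy).
        out.computeIfAbsent(to, k -> new ArrayList<>());
    }

    boolean knows(String from, String to) {
        return outgoing(from).contains(to);
    }

    List<String> outgoing(String vertex) {
        return out.getOrDefault(vertex, Collections.emptyList());
    }

    // The five edges listed in the message.
    static KnowsGraph example() {
        KnowsGraph g = new KnowsGraph();
        g.addKnows("alice", "bob");
        g.addKnows("alice", "charlie");
        g.addKnows("alice", "snoopy");
        g.addKnows("bob", "charlie");
        g.addKnows("charlie", "alice");
        return g;
    }

    public static void main(String[] args) {
        KnowsGraph g = example();
        System.out.println("alice knows bob: " + g.knows("alice", "bob"));
        System.out.println("bob knows alice: " + g.knows("bob", "alice"));
    }
}
```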

As a first step, I wrote a MapReduce job [6] to transform the RDF graph above
into a sort of adjacency list using Turtle syntax. Here is the output (three lines):

 
;  "Alice";

; 
, ,
; . 
 .

  "Bob";
 ; .
 
.

  "Charlie";
 ; .
 
. 
 .

This is legal Turtle and, unlike the first Turtle version, it is also splittable.
Each line has all the RDF statements (i.e. edges) for a person, including incoming edges.
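One way such a grouping job could work, simulated in memory: the map step keys every statement by its subject and additionally keys foaf:knows statements by their object (the incoming edge), the shuffle groups by key, and the reduce joins each group into one line. This is not the job from [6]. AdjacencyGrouping and its methods are hypothetical, the output uses plain N-Triples statements joined on one line (still legal Turtle) rather than the ; and , abbreviations shown above, and dropping keys that never appear as a subject (snoopy) is an assumption made so that the example yields three lines.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory sketch of map -> shuffle -> reduce producing one line per person.
final class AdjacencyGrouping {

    static final String KNOWS = "http://xmlns.com/foaf/0.1/knows";

    // Formats {subject, predicate, object} as an N-Triples statement;
    // the object must already carry its <...> or "..." delimiters.
    static String toLine(String[] t) {
        return "<" + t[0] + "> <" + t[1] + "> " + t[2] + " .";
    }

    static Map<String, String> group(List<String[]> triples) {
        Map<String, List<String>> shuffled = new TreeMap<>(); // key -> statements (sorted, like a shuffle)
        Set<String> subjects = new HashSet<>();
        for (String[] t : triples) {
            String line = toLine(t);
            // map: every statement is keyed by its subject (outgoing edge or property)...
            subjects.add(t[0]);
            shuffled.computeIfAbsent(t[0], k -> new ArrayList<>()).add(line);
            // ...and foaf:knows statements are also keyed by their object (incoming edge).
            if (t[1].equals(KNOWS) && t[2].startsWith("<")) {
                String target = t[2].substring(1, t[2].length() - 1);
                if (!target.equals(t[0])) { // avoid duplicating self-loops
                    shuffled.computeIfAbsent(target, k -> new ArrayList<>()).add(line);
                }
            }
        }
        // reduce: join each person's statements into a single line,
        // skipping keys that never occur as a subject (assumption, see above).
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : shuffled.entrySet()) {
            if (subjects.contains(e.getKey())) {
                result.put(e.getKey(), String.join(" ", e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String ex = "http://example.org/";
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {ex + "alice", KNOWS, "<" + ex + "bob>"});
        triples.add(new String[] {ex + "alice", KNOWS, "<" + ex + "charlie>"});
        triples.add(new String[] {ex + "alice", KNOWS, "<" + ex + "snoopy>"});
        triples.add(new String[] {ex + "bob", KNOWS, "<" + ex + "charlie>"});
        triples.add(new String[] {ex + "charlie", KNOWS, "<" + ex + "alice>"});
        group(triples).forEach((person, line) -> System.out.println(line));
    }
}
```

Keeping incoming edges on the same line means a vertex reader sees both directions of a person's links in one record, at the cost of every foaf:knows statement being written twice.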

I wrote a TurtleVertexReader [7], which extends TextVertexReader, and a
TurtleVertexInputFormat [8], which extends TextVertexInputFormat.
I wrote a FoafShortestPathsVertex [9] (copied from the SimpleShortestPathsVertex
example), which extends EdgeListVertex, and I am running it locally with these
arguments: -Dgiraph.maxWorkers=1 -Dgiraph.SplitMasterWorker=false
-DoverwriteOutput=true src/test/resources/data3.ttl target/foaf
http://example.org/alice 1
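For reference, the per-superstep logic of a SimpleShortestPathsVertex-style computation can be simulated in plain Java: a vertex takes the minimum distance among its incoming messages and, only if that improves its current value, sends value + 1 along its outgoing edges; the run ends when no messages are in flight. The class BspShortestPaths below is hypothetical and does not use any Giraph classes (no EdgeListVertex, no messaging API); it assumes every foaf:knows edge has weight 1, so distances are hop counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of vertex-centric single-source shortest paths.
// Each while-loop iteration plays the role of one superstep; edges have weight 1.
final class BspShortestPaths {

    static Map<String, Integer> run(Map<String, List<String>> out, String source) {
        Map<String, Integer> dist = new HashMap<>();
        for (String v : out.keySet()) {
            dist.put(v, Integer.MAX_VALUE); // "infinity" until a shorter path arrives
        }
        // Superstep 0: only the source is active, with distance 0.
        Map<String, List<Integer>> inbox = new HashMap<>();
        inbox.put(source, new ArrayList<>(Collections.singletonList(0)));
        while (!inbox.isEmpty()) {
            Map<String, List<Integer>> next = new HashMap<>();
            for (Map.Entry<String, List<Integer>> e : inbox.entrySet()) {
                String v = e.getKey();
                int min = Collections.min(e.getValue());
                // Only an improvement is propagated to the out-neighbours.
                if (min < dist.getOrDefault(v, Integer.MAX_VALUE)) {
                    dist.put(v, min);
                    for (String w : out.getOrDefault(v, Collections.emptyList())) {
                        next.computeIfAbsent(w, k -> new ArrayList<>()).add(min + 1);
                    }
                }
            }
            // Vertices without messages stay halted; stop when nothing was sent.
            inbox = next;
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, List<String>> out = new HashMap<>();
        out.put("alice", Arrays.asList("bob", "charlie", "snoopy"));
        out.put("bob", Arrays.asList("charlie"));
        out.put("charlie", Arrays.asList("alice"));
        System.out.println(run(out, "alice"));
    }
}
```

Starting from alice, this gives distance 0 for alice and 1 for bob, charlie and snoopy. In the following superstep charlie and alice only receive the non-improving distance 2, so nobody sends anything and the loop stops.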

TurtleVertexReader, TurtleVertexInputFormat and FoafShortestPathsVertex are
still a work in progress, and I am sure they contain plenty of stupid errors.
However, I do not understand why, when I run FoafShortestPathsVertex at DEBUG
level, I see debug statements from FoafShortestPathsVertex:
19:34:44 DEBUG FoafShortestPathsVertex   :: main({-Dgiraph.maxWorkers=1,
-Dgiraph.SplitMasterWorker=false, -DoverwriteOutput=true,
src/test/resources/data3.ttl, target/foaf, http://example.org/alice, 1})
19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() --> null
19:34:44 DEBUG FoafShortestPathsVertex   :: setConf(Configuration:
core-default.xml, core-site.xml)
19:34:44 DEBUG FoafShortestPathsVertex   :: run({src/test/resources/data3.ttl,
target/foaf, http://example.org/alice, 1})
19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() --> Configuration:
core-