Re: [Announcement] Giraph talk in Berlin on May 29th

2012-05-12 Thread Avery Ching
Nice! Avery On 5/12/12 2:58 AM, Sebastian Schelter wrote: Hi, I will give a talk titled Large Scale Graph Processing with Apache Giraph in Berlin on May 29th. Details are available at:

Re: Possible bug when resetting aggregators ? (and missing documentation)

2012-05-02 Thread Avery Ching
I think you're right that the javadoc isn't specific enough. * Use a registered aggregator in current superstep. * Even when the same aggregator should be used in the next * superstep, useAggregator needs to be called at the beginning * of that superstep in preSuperstep(). * *

Re: Please welcome our newest committer and PMC member, Eugene!

2012-05-01 Thread Avery Ching
Awesome! Congrats Eugene, we're excited to have you taking on a big role. Avery On 5/1/12 5:18 PM, Hyunsik Choi wrote: Congrats and welcome Eugene! I'm looking forward to your contribution. -- Hyunsik Choi On Wed, May 2, 2012 at 5:39 AM, Jakob Homan jgho...@gmail.com

Re: Does Giraph support labeled graphs?

2012-04-19 Thread Avery Ching
, On 11 Apr 2012, at 18:37, Avery Ching wrote: There is no preferred way to represent labeled graphs. A close example to your adjacency list idea is LongDoubleDoubleAdjacencyListVertexInputFormat. Exactly. Giraph supports labeled Graphs very easily. My reply is a little bit late, so you probably

Re: Slides for my talk at the Berlin Hadoop Get Together

2012-04-19 Thread Avery Ching
Very nice! Will these be similar to the 'Parallel Processing beyond MapReduce' workshop after Berlin Buzzwords? It would be good to add at least one of them to the page. Avery On 4/19/12 12:31 PM, Sebastian Schelter wrote: Here are the slides of my talk Introducing Apache Giraph for Large

Re: java.lang.RuntimeException [...] msgMap did not exist [...]

2012-04-17 Thread Avery Ching
have no job conf dating of the 13th. Does Hadoop not take the local time to name the files? Thanks, Étienne On 16 April 2012 19:45, Avery Ching ach...@apache.org wrote: Etienne, the task tracker logs are not what I meant, sorry for the confusion

Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-13 Thread Avery Ching
Hi Paulo, Can you try something for me? I was able to get the PageRankBenchmark to work running in local mode just fine on my side. I think we should have some kind of a helper script (similar to bin/giraph) for running simple tests in LocalJobRunner. I believe that for LocalJobRunner to

Re: java.lang.RuntimeException [...] msgMap did not exist [...]

2012-04-13 Thread Avery Ching
Hi Etienne, Thanks for your questions. Giraph uses map tasks to run its master and workers. Can you provide the task output logs? It looks like your workers failed to report status for some reason and we need to find out why. The datanode logs can't help us here. Avery On 4/13/12 3:35

Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-11 Thread Avery Ching
GiraphJob is not using TurtleVertexInputFormat.class and TurtleVertexOutputFormat.class, but I don't see what I am doing wrong. :-/ Thanks, Paolo [1] https://github.com/castagna/jena-grande/blob/master/src/test/resources/log4j.properties Avery Ching wrote: I think the issue might

Re: Does Giraph support labeled graphs?

2012-04-11 Thread Avery Ching
There is no preferred way to represent labeled graphs. A close example to your adjacency list idea is LongDoubleDoubleAdjacencyListVertexInputFormat. Hope that helps, Avery On 4/11/12 10:00 AM, Paolo Castagna wrote: Hi, I am not sure what's the best way to represent labeled graphs in

Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph

2012-04-10 Thread Avery Ching
I think the issue might be that Hadoop only logs INFO and above messages by default. Can you retry with INFO level logging? Avery On 4/10/12 12:17 PM, Paolo Castagna wrote: Hi, I am still learning Giraph, so, please, be patient with me and forgive my trivial questions. As a simple initial

Re: Announcement: 'Parallel Processing beyond MapReduce' workshop after Berlin Buzzwords

2012-04-04 Thread Avery Ching
That is great news Sebastian! Congrats, I wish I was in Berlin to attend. Avery On 4/4/12 2:12 AM, Sebastian Schelter wrote: Hi everybody, I'd like to announce the 'Parallel Processing beyond MapReduce' workshop which will take place directly after the Berlin Buzzwords conference (

Re: Exceptions when establishing RPC

2012-04-03 Thread Avery Ching
If you're using one master and one slave, you need to do -w 1. Did you see any error about the RPC server starting up? Avery On 4/3/12 1:37 PM, Robert Davis wrote: Hello, I was trying to run Giraph on two machines (one master and one slave) but kept getting exceptions when establishing RPC

Re: Incomplete output when running PageRank example

2012-03-31 Thread Avery Ching
As Benjamin mentioned, it depends on the number of map tasks your hadoop install is running with. You could set it proportionally to the number of cores it has if you like, but try using Benjamin's suggestions to get it working with more map tasks. I believe if you don't set the default, the

Re: Problem deploying Giraph job to hadoop cluster: onlineZooKeeperServers connection failure

2012-03-21 Thread Avery Ching
Benjamin, my guess is that your jar might not have all the ZooKeeper dependencies. Can you look at the log for the process that was supposed to start ZooKeeper? I'm thinking it didn't start... Avery On 3/20/12 1:14 PM, Benjamin Heitmann wrote: Hello, after getting my feet wet with the

Re: Pseudo-random number Vertex Reader

2012-03-18 Thread Avery Ching
You can use it for performance testing, although it is not a great simulation of real graphs. Real graphs tend to be more power law distributed (see https://issues.apache.org/jira/browse/GIRAPH-26). Hope that helps, Avery On 3/17/12 8:13 PM, Fleischman, Stephen (ISS SCI - Plano TX) wrote:

Re: Calling BspUtils.createVertexResolver from a TextVertexReader ?

2012-03-16 Thread Avery Ching
basically an abstract class and subclasses can override methods to provide default values for vertices and edges (otherwise values are initialized to null), just like Avery described below. If you think it's useful I can contribute this. On Wed, Mar 14, 2012 at 7:39 AM, Avery Ching ach...@apache.org

Please vote for our Giraph proposal for the upcoming Hadoop Summit

2012-03-16 Thread Avery Ching
Hi Giraphers, We have a submission for the 2012 Hadoop summit and part of deciding whether it gets accepted is based on community voting. It would be great to get more folks interested and involved in what is going on with Giraph so please vote! Here's the link:

Re: Calling BspUtils.createVertexResolver from a TextVertexReader ?

2012-03-14 Thread Avery Ching
. We'd love to have your contributions, it's a great fit. =) Looking forward to your response! Thanks! On Mon, Mar 12, 2012 at 9:09 PM, Avery Ching ach...@apache.org wrote: Benjamin, By the way, you're not the first to ask for a feature of this kind

Re: Question about TextInputFormat pattern for parsing e.g. RDF

2012-03-12 Thread Avery Ching
Sorry for the delayed response. Responses inline. Avery On 3/8/12 7:14 AM, Benjamin Heitmann wrote: Hello again, I am wondering if it would be possible to parse RDF input files from a TextInputFormat class. The most suitable text format for RDF is called NTriples, and it has this very

Re: Calling BspUtils.createVertexResolver from a TextVertexReader ?

2012-03-12 Thread Avery Ching
Benjamin, By the way, you're not the first to ask for a feature of this kind. Perhaps we should consider an alternative format for loading input vertex data that is based on the edges or data of the vertices rather than totally vertex-centric. We could load an edge, or a vertex value and

Re: Error in instantiating custom Vertex class via InternalVertexRunner.run

2012-03-05 Thread Avery Ching
Inline responses. We look forward to hearing about your work Benjamin! On 3/5/12 9:12 AM, Benjamin Heitmann wrote: On 2 Mar 2012, at 23:15, Avery Ching wrote: If I'm reading this right, you're using a public abstract class for the vertex. The vertex class must be instantiable and cannot

Re: PageRankBenchmark failing with zooKeeper.KeeperException

2012-03-05 Thread Avery Ching
Hi Abhishek, Nice to meet you. Can you try it with fewer workers? For instance -w 1 or -w 2? I think the likely issue is that you need to have as many map slots as the number of workers + at least one master. If you don't have enough slots, the job will fail. Also, you might want to dial
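The slot requirement described in this message can be sketched as a small check. This is only an illustration of the arithmetic (workers plus one master); the class and method names are hypothetical and are not part of Giraph or Hadoop.

```java
// Hypothetical helper, not part of Giraph: illustrates the rule of thumb
// that a job needs one map slot per worker plus at least one for the master.
public class MapSlotCheck {

    // Minimum number of map slots for the given worker count (-w).
    static int requiredMapSlots(int workers) {
        return workers + 1; // workers + one master
    }

    // True if the cluster's available map slots can host the job.
    static boolean fits(int workers, int availableMapSlots) {
        return requiredMapSlots(workers) <= availableMapSlots;
    }

    public static void main(String[] args) {
        System.out.println(requiredMapSlots(2)); // 3
        System.out.println(fits(2, 2));          // false
        System.out.println(fits(1, 2));          // true
    }
}
```

With two map slots available, for example, -w 1 would fit while -w 2 would not.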

Re: Giraph input format restrictions

2012-02-19 Thread Avery Ching
Sorry about the old documentation. I just updated the shortest paths example. Before major changes to the graph distribution, the vertex ids were required to be sorted. That is no longer the case. You can input vertices in any order. The only restriction is that the vertex ids must be

Re: how to use SimplePageRankVertex

2012-02-18 Thread Avery Ching
IntIntNullIntTextInputFormat in the examples package (extending TextVertexInputFormat as David suggests) is very similar to what you need I think, although the types might be different for your application. You can start with that perhaps. Avery On 2/18/12 7:48 AM, David Garcia wrote: The

Re: counter limit question

2012-02-16 Thread Avery Ching
Yes, there is a way to disable the counters at runtime. See GiraphJob: /** Use superstep counters? (boolean) */ public static final String USE_SUPERSTEP_COUNTERS = "giraph.useSuperstepCounters"; and set to false. Avery On 2/16/12 1:41 PM, David Garcia wrote: I have a job that could
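A minimal sketch of how such a boolean key might be set and read. To keep it runnable without Hadoop on the classpath, java.util.Properties stands in for the job configuration; the class name and the default of true are assumptions (the message only implies that counters are on unless disabled). In an actual job the value would presumably be set on the job's Hadoop Configuration instead.

```java
import java.util.Properties;

// Hypothetical stand-in: Properties plays the role of the job configuration.
public class SuperstepCountersSketch {

    // Key string as quoted from GiraphJob in the message.
    static final String USE_SUPERSTEP_COUNTERS = "giraph.useSuperstepCounters";

    // Assumption: counters are enabled unless the key is explicitly set to false.
    static boolean useSuperstepCounters(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(USE_SUPERSTEP_COUNTERS, "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(useSuperstepCounters(conf)); // true (assumed default)

        // Disable the per-superstep counters, as suggested in the message.
        conf.setProperty(USE_SUPERSTEP_COUNTERS, "false");
        System.out.println(useSuperstepCounters(conf)); // false
    }
}
```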

Re: maven, hadoop, zookeeper, and giraph!

2012-02-16 Thread Avery Ching
Hi Jeffrey, Best attempt at answers inline. On 2/16/12 6:12 PM, Jeffrey Yunes wrote: Hi Giraph community, I think I followed all of the directions (for a Giraph on a pseudo-cluster), and it looks like mvn clean test -Dprop.mapred.job.tracker=localhost:9001 runs fine. However, I'm new to

Re: Giraph Architecture bug in

2012-02-08 Thread Avery Ching
AFAIK we don't have any SOP for opening issues. Maybe I'll take a crack at this one tonight if I find some time, unless you were planning to work on it David. Avery On 2/8/12 5:46 PM, David Garcia wrote: I opened up GIRAPH-144 (https://issues.apache.org/jira/browse/GIRAPH-144). I apologize

Re: running job with giraph dependency anomaly

2012-02-07 Thread Avery Ching
If you're using GiraphJob, the mapper class should be set for you. That's weird. Avery On 2/7/12 5:58 PM, David Garcia wrote: That's interesting. Yes, I don't need native libraries. The problem I'm having is that after I run job.waitForCompletion(..), the job runs a mapper that is

Re: creating non existing vertices by sending messages

2012-02-03 Thread Avery Ching
Thanks for the comments David. The behavior of what happens is completely defined by the chosen VertexResolver, see (GiraphJob#setWorkerContextClass). Developers can implement any behavior they want. I believe the only reason to bypass was as a performance optimization. Avery On 2/3/12

Re: multi-graph support in giraph

2012-02-03 Thread Avery Ching
We can diverge from the Pregel API as long as we have a good reason for it. I do agree that while we can support multi-graphs with a user-chosen edge type, some built-in support that makes programming easier sounds like a good goal. Andre or Claudio, feel free to open a JIRA to discuss this.

Re: [VOTE] Release Giraph 0.1-incubating (rc0)

2012-01-31 Thread Avery Ching
To address the issues of binaries, could we release multiple binaries of Giraph that coincide with the different versions of Hadoop? On 1/31/12 7:44 PM, David Garcia wrote: I think these concerns preclude the entire idea of a release. A release should be something that users can use as a

Re: giraph stability problem

2012-01-23 Thread Avery Ching
Glad to hear you fixed your problem. It would be great if you could describe any improvements that would have helped you find the issues earlier. Maybe we (or you) could add them =). Avery On 1/23/12 8:31 AM, André Kelpe wrote: Hi all, thanks for all the answers so far, it turns out that

Re: Scalability results for GoldenOrb and comparison with Giraph

2011-12-14 Thread Avery Ching
/ find optimal configurations for various regimes of problems, and would like to see Giraph succeed, so let me know if there's any open issues which I might be able to dig into (I'm on the dev mailing list as well, though haven't posted there). Thanks, Jon On Dec 11, 2011, at 1:02 PM, Avery Ching

Re: Packaging a Giraph application in a jar

2011-11-09 Thread Avery Ching
Would be great if you can document what you did. =) Thanks, Avery On 11/8/11 3:13 PM, Claudio Martella wrote: Sorry guys, my bad. Was calling job.waitForCompletion() directly. I've been coding standard mapreduce whole weekend... Anyway I got a solution for clean packaging of your own

Re: way to run unit tests from inside IDE?

2011-10-29 Thread Avery Ching
I use Eclipse and it's okay for running unit tests, but I need to set the VM args in the junit run configuration for each specific test to -Dprop.jarLocation=target/giraph-0.70-jar-with-dependencies.jar. I assume you need to do the same for IntelliJ. This is done in pom.xml when doing 'mvn

Re: Restriction of VertexInputFormat

2011-10-26 Thread Avery Ching
Hi Gianmarco, Welcome to Giraph! We definitely look forward to having your input/contributions. Answers inline. On 10/26/11 8:07 AM, Gianmarco De Francisci Morales wrote: Hi, First of all let me introduce myself, my name is Gianmarco and I am a researcher. Second, let me congratulate

Re: Message processing

2011-09-09 Thread Avery Ching
The GraphLab model is more asynchronous than BSP. They allow you to update your neighbors rather than the BSP model of messaging per superstep. Rather than one massive barrier in BSP, they implement this with vertex locking. They also allow a vertex to modify the state of its neighbors. We could