Re: Test message

2011-09-05 Thread Jake Mannix
Greetings Giraphians! I'm trying out some simple pagerank tests of Giraph on our cluster here at Twitter, and I'm wondering what the data-size blow-up is usually expected to be for the on-disk to in-memory graph representation. I tried running a pretty tiny (a single part-file, 2GB big, wh

Re: Test message

2011-09-06 Thread Jake Mannix
s are wrapped primitives. > I'm glad to hear you're trying out Giraph at Twitter. Please keep us aware > of any problems you run into and we'll try to help. > Definitely, thanks. We've got some relatively big graphs, I'd be happy to report our "stress-testing"

Primitives vs Objects (the Movie!)

2011-09-06 Thread Jake Mannix
(changing thread title to reflect current discussion topic) On Tue, Sep 6, 2011 at 8:49 AM, Avery Ching wrote: > Answers are inlined. No vacation for you this weekend I guess =). > It was a good "vacation" :) > Which JIRAs? > > https://issues.apache.org/jira/browse/GIRAPH-11 - Balancing m
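The "Primitives vs Objects" discussion is about how much heap a graph takes once every edge is a pair of boxed Writable objects inside a per-vertex SortedMap (each entry also costs a tree node). A minimal sketch of the primitive alternative, assuming weighted out-edges with long target ids and float weights: parallel arrays sorted by target, with binary-search lookup. The class name `PrimitiveEdgeList` is hypothetical and not part of Giraph's API.

```java
import java.util.Arrays;

/** Hypothetical sketch: out-edges as parallel primitive arrays sorted by target id. */
final class PrimitiveEdgeList {
    private final long[] targets;  // target vertex ids, sorted ascending
    private final float[] weights; // weights[i] is the weight of the edge to targets[i]

    PrimitiveEdgeList(long[] targets, float[] weights) {
        if (targets.length != weights.length) {
            throw new IllegalArgumentException("targets and weights differ in length");
        }
        // Sort (target, weight) pairs by target via an index permutation (boxing only at build time).
        Integer[] order = new Integer[targets.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(targets[a], targets[b]));
        this.targets = new long[targets.length];
        this.weights = new float[weights.length];
        for (int i = 0; i < order.length; i++) {
            this.targets[i] = targets[order[i]];
            this.weights[i] = weights[order[i]];
        }
    }

    int size() {
        return targets.length;
    }

    /** Weight of the edge to target, or NaN if absent; O(log n). With duplicate targets, any match may be returned. */
    float weight(long target) {
        int i = Arrays.binarySearch(targets, target);
        return i >= 0 ? weights[i] : Float.NaN;
    }

    /** Raw payload: 8 bytes per long id plus 4 per float weight (array headers ignored). */
    long payloadBytes() {
        return 12L * targets.length;
    }
}
```

The trade-off is that adding an edge after construction costs O(n), so this layout suits edge lists that are loaded once and then read every superstep.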

Re: Primitives vs Objects (the Movie!)

2011-09-06 Thread Jake Mannix
On Wed, Sep 7, 2011 at 1:02 AM, Avery Ching wrote: > > > Yeah, one edge is pretty silly. To get some real numbers, I should try > it out with a more realistic (power-law distributed) bit of synthetic data. > > Agreed. > I'll see if I can write up some simple extensions of the PageRank benchmark

Re: Primitives vs Objects (the Movie!)

2011-09-07 Thread Jake Mannix
On Wed, Sep 7, 2011 at 6:33 AM, Avery Ching wrote: > >I'm not sure that this is precisely the right API, but exposing the > inner SortedMap definitely has a "leaky abstraction" smell to it to me, > especially where there are no examples or algorithm implementations in the > codebase which curr

Re: Primitives vs Objects (the Movie!)

2011-09-07 Thread Jake Mannix
(*sh* Dmitriy don't tell!) > On 9/7/11 12:51 PM, Jake Mannix wrote: > > Maybe a few more examples would help? Cases where you want to do a BSP > computation where the total sort (both the vertexes, and the edges for each > vertex) is required, as is the random access na

Re: Primitives vs Objects (the Movie!)

2011-09-07 Thread Jake Mannix
Methinks you will want an updateEdgeValue(), too. >> >> D >> >> On Wed, Sep 7, 2011 at 3:14 PM, Avery Ching wrote: >> >>> On 9/7/11 3:00 PM, Jake Mannix wrote: >>> >>> On Wed, Sep 7, 2011 at 9:26 PM, Avery Ching wrote: >>> >

Re: Primitives vs Objects (the Movie!)

2011-09-07 Thread Jake Mannix
On Wed, Sep 7, 2011 at 10:43 PM, Dmitriy Ryaboy wrote: > Something else to think about -- in some cases you may want to > implement extreme compression of the edge list. For example, we might > want to calculate n-th degree reach for all nodes in a graph. Given a > graph with a few nodes with lot
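"Extreme compression of the edge list" can take many forms. One common technique, chosen here for illustration and not necessarily what the thread had in mind, is to sort neighbor ids, store the gaps between consecutive ids, and write each gap as a variable-length integer carrying 7 bits per byte. `GapVarintCodec` is a hypothetical name, and ids are assumed non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical sketch: gap (delta) encoding of sorted neighbor ids with varint bytes. */
final class GapVarintCodec {
    static byte[] encode(long[] ids) {
        long[] sorted = ids.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long id : sorted) {
            if (id < 0) {
                throw new IllegalArgumentException("ids must be non-negative");
            }
            long gap = id - prev; // non-negative because ids are sorted
            prev = id;
            // Emit 7 bits at a time; the high bit means "more bytes follow".
            while ((gap & ~0x7FL) != 0) {
                out.write((int) ((gap & 0x7F) | 0x80));
                gap >>>= 7;
            }
            out.write((int) gap);
        }
        return out.toByteArray();
    }

    /** Decodes count ids, rebuilding each id as a running sum of gaps. */
    static long[] decode(byte[] data, int count) {
        long[] ids = new long[count];
        int pos = 0;
        long prev = 0;
        for (int i = 0; i < count; i++) {
            long gap = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                gap |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap;
            ids[i] = prev;
        }
        return ids;
    }
}
```

Neighbor ids that sit close together produce small gaps that fit in a single byte each, versus 8 bytes for a raw long; the price is that lookups become sequential scans.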

Message processing

2011-09-08 Thread Jake Mannix
Question about the message passing API: BaseVertex#sendMsg(I id, M msg) sends messages. And BaseVertex#compute(Iterator msgIterator) deals with them in a big iterator provided by the framework. My question is: where is the Iterator of messages created? If I wanted to, say, subvert the proc
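Whatever Giraph does internally to build that Iterator, the BSP contract it has to honor can be sketched as a double buffer: sendMsg() during superstep S appends to an outgoing store, and only after the barrier do those messages become the Iterator a vertex reads in superstep S+1. `BspInbox` and its methods are hypothetical names for a single-process illustration, not the GraphMapper/BasicRPCCommunications code discussed in the replies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of BSP message delivery with a double buffer. */
final class BspInbox<I, M> {
    private Map<I, List<M>> current = new HashMap<>(); // readable during this superstep
    private Map<I, List<M>> next = new HashMap<>();    // filled by sendMsg during this superstep

    /** Buffers msg for target; it becomes visible only after barrier(). */
    void sendMsg(I target, M msg) {
        next.computeIfAbsent(target, k -> new ArrayList<>()).add(msg);
    }

    /** The Iterator a vertex's compute() would receive in the current superstep. */
    Iterator<M> messagesFor(I vertex) {
        return current.getOrDefault(vertex, Collections.emptyList()).iterator();
    }

    /** Superstep barrier: sent messages become readable, the outgoing buffer starts empty. */
    void barrier() {
        current = next;
        next = new HashMap<>();
    }
}
```

Subverting the processing would mean replacing what sendMsg() stores (for example, folding instead of appending) while keeping barrier() as the only point where messages change visibility.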

Re: Message processing

2011-09-08 Thread Jake Mannix
Avery > > > On 9/8/11 12:59 PM, Jake Mannix wrote: > >> Question about the message passing API: >> >> BaseVertex#sendMsg(I id, M msg) >> >> sends messages. And >> >> BaseVertex#compute(Iterator msgIterator) >> >> deals with

Re: Message processing

2011-09-08 Thread Jake Mannix
cal node), where the GraphMapper then loops through and gives them to each Vertex before they do their compute() calls? On 9/8/11 1:55 PM, Jake Mannix wrote: > > So in particular, each GraphMapper has a BasicRPCCommunications object (and > a CommunicationsInterface proxy object for each of the o

Re: Message processing

2011-09-09 Thread Jake Mannix
On Fri, Sep 9, 2011 at 3:22 AM, Claudio Martella wrote: > One misunderstanding on my side. Isn't it true that the messages have to be > buffered as they all have to be collected before they can be processed (by > definition of superstep)? So you cannot really process them as they come? This is the
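On the buffering question: delivery has to wait for the superstep barrier, but storage need not be a full list when the vertex only needs an associative and commutative aggregate of its messages. Pregel calls this a combiner. A minimal sketch, assuming a sum over Double messages as in PageRank; `CombiningInbox` is a hypothetical name, not a Giraph class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Hypothetical sketch: fold messages as they arrive, still deliver only after the barrier. */
final class CombiningInbox<I, M> {
    private final BinaryOperator<M> combine; // must be associative and commutative
    private Map<I, M> current = new HashMap<>();
    private Map<I, M> next = new HashMap<>();

    CombiningInbox(BinaryOperator<M> combine) {
        this.combine = combine;
    }

    /** Merges msg into the single pending value for target instead of appending to a list. */
    void sendMsg(I target, M msg) {
        next.merge(target, msg, combine);
    }

    /** Combined value delivered this superstep, or null if nothing arrived. */
    M combinedFor(I vertex) {
        return current.get(vertex);
    }

    void barrier() {
        current = next;
        next = new HashMap<>();
    }

    /** Number of stored values: one per target, regardless of how many messages were sent. */
    int pendingEntries() {
        return next.size();
    }
}
```

Memory then scales with the number of distinct targets rather than the number of messages, but the arrival order cannot matter, which is why the operator must be associative and commutative.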

Re: Message processing

2011-09-09 Thread Jake Mannix
On Fri, Sep 9, 2011 at 8:03 AM, Avery Ching wrote: > The GraphLab model is more asynchronous than BSP. They allow you to update > your neighbors rather than the BSP model of messaging per superstep. Rather > than one massive barrier in BSP, they implement this with vertex locking. > They also a

Re: ValueIndexed OutEdgeMap

2011-09-13 Thread Jake Mannix
Hi Claudio, So what you want is to be able to build up your own application-specific data structure for the outbound edges of a vertex (in your case, one that effectively supports fulltext search on the edge values)? Check out: https://issues.apache.org/jira/browse/GIRAPH-31 - this change, if
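Claudio's use case needs out-edges that can be found by their value, not only by target id. Full-text search is beyond a short example, so this sketch, assuming exact-match string labels, keeps a second index from label to target ids next to the usual target-to-value map and updates both inside `addEdge()`. `ValueIndexedEdges` is a hypothetical name, and `getEdgesByValue()` only echoes the method Claudio mentions; neither is the GIRAPH-31 API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: out-edges indexed both by target id and by edge value. */
final class ValueIndexedEdges {
    private final Map<Long, String> valueByTarget = new HashMap<>();
    private final Map<String, Set<Long>> targetsByValue = new HashMap<>();

    /** Adds or relabels the edge to target, keeping the reverse index consistent. */
    void addEdge(long target, String value) {
        String old = valueByTarget.put(target, value);
        if (old != null) {
            Set<Long> oldTargets = targetsByValue.get(old);
            oldTargets.remove(target);
            if (oldTargets.isEmpty()) {
                targetsByValue.remove(old);
            }
        }
        targetsByValue.computeIfAbsent(value, k -> new TreeSet<>()).add(target);
    }

    /** All targets whose edge carries exactly this value (several edges may share a label). */
    Set<Long> getEdgesByValue(String value) {
        return Collections.unmodifiableSet(targetsByValue.getOrDefault(value, Collections.emptySet()));
    }

    int numEdges() {
        return valueByTarget.size();
    }
}
```

The cost is a second map per vertex, which is exactly the kind of memory overhead the earlier "Primitives vs Objects" threads were worried about.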

Re: ValueIndexed OutEdgeMap

2011-09-13 Thread Jake Mannix
he new api? the new > addEdge is still final. I can add my own "getEdgesByValue()" (as I might have > more outedges with the same label) to my own vertex, but I must be able to > modify/override the addEdge() somehow. > > > On Tue, Sep 13, 2011 at 4:55 PM, Jake Mannix wrote:

Re: ValueIndexed OutEdgeMap

2011-09-13 Thread Jake Mannix
arly when you should subclass which one. -jake On Tue, Sep 13, 2011 at 9:37 AM, Dmitriy Ryaboy wrote: > We should add that to Vertex's javadoc... > > > On Tue, Sep 13, 2011 at 9:31 AM, Jake Mannix wrote: > >> Claudio, >> >> If your vertex class has spec

Re: ValueIndexed OutEdgeMap

2011-09-13 Thread Jake Mannix
ertex, just do what you need in your MutableVertex subclass. Vertex has a final addEdge() method, because its subclasses are depending on the fact that it's adding edges to the TreeMap that it contains. -jake > > On Tue, Sep 13, 2011 at 6:31 PM, Jake Mannix > wrote: >

Re: ValueIndexed OutEdgeMap

2011-09-13 Thread Jake Mannix
mitted tonight (according to Avery), so I'd wait until that goes in to implement your MutableVertex subclass. Once you do, let us know how the API feels, as it's in flux, and user feedback is much appreciated! -jake On Wed, Sep 14, 2011 at 1:01 AM, Jake Mannix wrote: > > >

Re: trunk doesn't build because of header license missing

2011-09-14 Thread Jake Mannix
Yeah, my bad as well, I actually had that header in my GitHub branch, but somehow didn't get it into the patch - I did mvn install, I promise! :) On Wed, Sep 14, 2011 at 8:53 AM, Avery Ching wrote: > And committers too (my bad) =). > > > On 9/14/11 8:49 AM, Owen O'Malley wrote: > >> On Wed, Sep

Re: Announcement: Welcome to our new committers and PPMC members - Jake Mannix (Twitter) and Dmitriy Ryaboy (Twitter)!

2011-09-15 Thread Jake Mannix
:) I'm originally a physics nerd, turned mathematician, turned software engineer mostly working on search (I built large parts of this <http://www.linkedin.com/search/fpsearch?type=people&keywords=jake+mannix> search engine, as well as this <http://twitter.com/#!/who_to_follow/sea

JIRA watching?

2011-09-16 Thread Jake Mannix
Hey all, Is there a way to get all comments/activity on the Giraph JIRA sent to me via email (i.e., is there a giraph-jira@ mailing list?) -jake

Re: JIRA watching?

2011-09-16 Thread Jake Mannix
And now I feel pretty stupid, because I already made a gmail subfilter for that! On Fri, Sep 16, 2011 at 1:57 PM, Dmitriy Ryaboy wrote: > giraph-dev works. > > > On Fri, Sep 16, 2011 at 1:54 PM, Jake Mannix wrote: > >> Hey all, >> >> Is there a way to get al

Re: List of Algos implemented on Giraph

2011-09-27 Thread Jake Mannix
Not really. It's really, really early, and they're in the "examples" stage - nothing is really productionized. There are things like PageRank and finding shortest paths, but nothing is really ready for prime time yet. On Tue, Sep 27, 2011 at 11:56 AM, Josh Patterson wrote: > Is there a list of known
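PageRank is the canonical example in this stage. A single-process sketch of the vertex-centric superstep loop, assuming the common formulation rank = (1 - d)/N + d * (sum of incoming shares) and a fixed number of supersteps; it is not the code in Giraph's examples. Vertices with no out-edges simply drop their rank mass in this sketch.

```java
import java.util.Arrays;

/** Hypothetical sketch: PageRank expressed as BSP supersteps over an adjacency list. */
final class ToyPageRank {
    /** adj[v] lists v's out-neighbors; d is the damping factor, e.g. 0.85. */
    static double[] run(int[][] adj, int supersteps, double d) {
        int n = adj.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int s = 0; s < supersteps; s++) {
            // Send phase: every vertex sends rank / outDegree to each out-neighbor.
            // Messages are summed on arrival, i.e. a sum combiner.
            double[] inbox = new double[n];
            for (int v = 0; v < n; v++) {
                if (adj[v].length == 0) {
                    continue; // dangling vertex: its mass is dropped in this sketch
                }
                double share = rank[v] / adj[v].length;
                for (int u : adj[v]) {
                    inbox[u] += share;
                }
            }
            // Compute phase, after the barrier: each vertex updates from its summed messages.
            for (int v = 0; v < n; v++) {
                rank[v] = (1.0 - d) / n + d * inbox[v];
            }
        }
        return rank;
    }
}
```

With no dangling vertices the ranks keep summing to 1, since each superstep redistributes all of the previous mass.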

Re: List of Algos implemented on Giraph

2011-09-27 Thread Jake Mannix
"not ready for prime-time" is *bad*, in case that's what it looked like I was saying. The more examples the better, so we can see where the bottlenecks are, and move them toward the productionized stage! -jake > > Aapo > > On Sep 27, 2011, at 3:03 PM, Jake Mannix wrote:

Re: List of Algos implemented on Giraph

2011-09-27 Thread Jake Mannix
ssues.apache.org/jira/browse/GIRAPH-37 Watch those spaces for upcoming improvements! -jake > Aapo > > On Sep 27, 2011, at 4:14 PM, Jake Mannix wrote: > > > > On Tue, Sep 27, 2011 at 12:31 PM, Aapo Kyrola wrote: > >> >> I have written a very simple Belief P

Re: Can't setClass in GiraphJob

2011-09-29 Thread Jake Mannix
On Thu, Sep 29, 2011 at 7:55 AM, Claudio Martella < claudio.marte...@gmail.com> wrote: > Hello list, > > I see I cannot submit a GiraphJob with my own Vertex as it doesn't > implement Vertex (it extends MutableVertex which extends BasicVertex) > but GiraphJob.setClass() calls: > getConfiguration(

Re: On pre/post Application/Superstep contract

2011-09-30 Thread Jake Mannix
Remember that there's already a "singleton"-like object available to all vertices: the GraphState object, which has a handle on the GraphMapper. Maybe this is the right place to get your handle on the FSDataOutputStream? -jake On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella < claudio.marte..

Re: On pre/post Application/Superstep contract

2011-10-01 Thread Jake Mannix
able workerObject); > > public void preSuperstep(Configurable workerObject); > > public void postSuperstep(Configurable workerObject); > > public Configurable getWorkerObject(); > > > > Anyone else think of a cleaner way to do it? > > > > Avery > >

Re: way to run unit tests from inside IDE?

2011-10-29 Thread Jake Mannix
test' and other mvn commands. > > Avery > > > On 10/28/11 11:21 PM, Jake Mannix wrote: > > I seem to be getting weird stuff like: > > setup: Using local job runner with location for testBspCombiner > 11/10/28 23:21:00 WARN mapred.JobClient: Use GenericOptionsParser for

Re: Please welcome our newest committer and PPMC member, Claudio!

2011-11-18 Thread Jake Mannix
Congrats and welcome, Claudio, looking forward to more of your contributions! -jake On Fri, Nov 18, 2011 at 9:10 AM, Jakob Homan wrote: > I'm very happy to announce we've elected a new committer and PPMC > member, Claudio Martella. It's been great working with Claudio so far > and I'm excite

Re: NullWritable causes IllegalAccessException

2011-12-06 Thread Jake Mannix
We could special-case NullWritable, and if that class is put into the conf, instantiate it separately. Seems like a common case if a user wants to ignore something. -jake On Mon, Dec 5, 2011 at 4:29 PM, Avery Ching wrote: > Actually, isn't that because the default constructor is private? Yo
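Hadoop's NullWritable has a private constructor and exposes its singleton through a static `get()` method, so generic reflective instantiation fails with IllegalAccessException. The special case proposed here can be sketched as follows. To keep the example free of a Hadoop dependency, `NullValue` is a hypothetical stand-in with the same shape as NullWritable, and `ValueFactory.newInstance` is a hypothetical helper, not Giraph's actual code.

```java
import java.lang.reflect.InvocationTargetException;

/** Hypothetical stand-in for NullWritable: a singleton whose constructor is private. */
final class NullValue {
    private static final NullValue INSTANCE = new NullValue();

    private NullValue() {
    }

    static NullValue get() {
        return INSTANCE;
    }
}

/** Hypothetical helper: reflective instantiation with a special case for the singleton. */
final class ValueFactory {
    static <T> T newInstance(Class<T> cls) {
        // The generic path below would throw IllegalAccessException for NullValue,
        // because its constructor is private, so hand back the singleton instead.
        if (cls == NullValue.class) {
            return cls.cast(NullValue.get());
        }
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (NoSuchMethodException | InstantiationException
                | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("cannot instantiate " + cls.getName(), e);
        }
    }
}
```

An identity check on the class keeps the special case cheap; a more general variant could look for a static `get()` method, at the cost of more reflection.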

Re: Comparing BSP and MR

2011-12-09 Thread Jake Mannix
[hama-user to bcc:] Let's not crosspost, please; it makes the thread of conversation totally opaque as to who is talking about what. On Fri, Dec 9, 2011 at 1:42 AM, Praveen Sripati wrote: > Thanks to Thomas and Avery for the response. > > > For Giraph you are quite correct, all the stuff is submi

Re: Comparing BSP and MR

2011-12-09 Thread Jake Mannix
or clustering), then moving around may make sense. Also: if nodes die. -jake > Regards, > Praveen > > On Fri, Dec 9, 2011 at 11:33 PM, Jake Mannix wrote: > >> [hama-user to bcc:] >> >> Let's not crosspost, please; it makes the thread of conversation totally

Re: Use Giraph to simulate Storm ?

2012-01-03 Thread Jake Mannix
In the current form, this is true for Giraph, but I think it's not necessarily *required* of the system. Modifying mapreduce on Hadoop to become realtime would be fundamentally wrong (single batch vs realtime), but pregel on Hadoop is different enough that maybe it would work (whether it *should*

Re: Caching (with LRU or something) strategy in Giraph?

2012-01-31 Thread Jake Mannix
Hi David, The *point* of the Pregel architecture (which Giraph is an implementation of) is that the whole graph is in (distributed) memory. If you are willing to go to disk, doing your calculations via MapReduce (possibly talking to a distributed hashtable of some kind colocated with your hadoo