Greetings Giraphians!
I'm trying out some simple PageRank tests of Giraph on our cluster
here at Twitter, and I'm wondering what the data-size blow-up is usually
expected to be for the on-disk to in-memory graph representation. I tried
running a pretty tiny (a single part-file, 2GB big, wh
s are wrapped primitives.
> I'm glad to hear you're trying out Giraph at Twitter. Please keep us aware
> of any problems you run into and we'll try to help.
>
Definitely, thanks. We've got some relatively big graphs, I'd be happy to
report our "stress-testing
(changing thread title to reflect current discussion topic)
On Tue, Sep 6, 2011 at 8:49 AM, Avery Ching wrote:
> Answers are inlined. No vacation for you this weekend I guess =).
>
It was a good "vacation" :)
> Which JIRAs?
>
> https://issues.apache.org/jira/browse/GIRAPH-11 - Balancing m
On Wed, Sep 7, 2011 at 1:02 AM, Avery Ching wrote:
>
>
> Yeah, one edge is pretty silly. To get some real numbers, I should try
> it out with a more realistic (power-law distributed) bit of synthetic data.
>
> Agreed.
>
I'll see if I can write up some simple extensions of the PageRank benchmark
On Wed, Sep 7, 2011 at 6:33 AM, Avery Ching wrote:
>
> I'm not sure that this is precisely the right API, but exposing the
> inner SortedMap definitely has a "leaky abstraction" smell to it to me,
> especially where there are no examples or algorithm implementations in the
> codebase which curr
(*sh*
Dmitriy don't tell!)
> On 9/7/11 12:51 PM, Jake Mannix wrote:
>
> Maybe a few more examples would help? Cases where you want to do a BSP
> computation where the total sort (both the vertexes, and the edges for each
> vertex) is required, as is the random access na
Methinks you will want an updateEdgeValue(), too.
>>
>> D
>>
>> On Wed, Sep 7, 2011 at 3:14 PM, Avery Ching wrote:
>>
>>> On 9/7/11 3:00 PM, Jake Mannix wrote:
>>>
>>> On Wed, Sep 7, 2011 at 9:26 PM, Avery Ching wrote:
>>>
>>
On Wed, Sep 7, 2011 at 10:43 PM, Dmitriy Ryaboy wrote:
> Something else to think about -- in some cases you may want to
> implement extreme compression of the edge list. For example, we might
> want to calculate n-th degree reach for all nodes in a graph. Given a
> graph with a few nodes with lot
Question about the message passing API:
BaseVertex#sendMsg(I id, M msg)
sends messages. And
BaseVertex#compute(Iterator msgIterator)
deals with them in a big iterator provided by the framework. My question is
where is the Iterator of messages created? If I wanted to, say, subvert the
proc
Avery
>
>
> On 9/8/11 12:59 PM, Jake Mannix wrote:
>
>> Question about the message passing API:
>>
>> BaseVertex#sendMsg(I id, M msg)
>>
>> sends messages. And
>>
>> BaseVertex#compute(Iterator msgIterator)
>>
>> deals with
cal node), where the
GraphMapper then loops through and gives them to each Vertex before they do
their compute() calls?
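
A minimal sketch of the delivery pattern asked about above, assuming messages sent during one superstep are buffered per destination vertex and only handed to compute() in the next superstep. `ToyVertex` and `ToySuperstepRunner` are hypothetical names for illustration, not Giraph classes; only the sendMsg(id, msg) and compute(Iterator) shapes come from the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical vertex: adds incoming messages to its value, then sends
// the updated value to a single target vertex.
class ToyVertex {
    final int id;
    final int target;
    long value;

    ToyVertex(int id, int target, long value) {
        this.id = id;
        this.target = target;
        this.value = value;
    }

    // Same shape as the compute(Iterator msgIterator) call in the thread.
    void compute(Iterator<Long> msgIterator, ToySuperstepRunner runner) {
        while (msgIterator.hasNext()) {
            value += msgIterator.next();
        }
        runner.sendMsg(target, value);
    }
}

// Hypothetical single-process stand-in for the per-worker loop.
class ToySuperstepRunner {
    private final Map<Integer, ToyVertex> vertices = new HashMap<>();
    // Messages that vertices may read during the current superstep.
    private Map<Integer, List<Long>> inbox = new HashMap<>();
    // Messages sent during the current superstep; not visible until the barrier.
    private Map<Integer, List<Long>> outbox = new HashMap<>();

    void addVertex(ToyVertex v) {
        vertices.put(v.id, v);
    }

    void sendMsg(int id, long msg) {
        outbox.computeIfAbsent(id, k -> new ArrayList<>()).add(msg);
    }

    void runSuperstep() {
        for (ToyVertex v : vertices.values()) {
            List<Long> msgs = inbox.getOrDefault(v.id, Collections.emptyList());
            v.compute(msgs.iterator(), this);
        }
        // Barrier: everything sent in this superstep becomes next superstep's input.
        inbox = outbox;
        outbox = new HashMap<>();
    }

    long valueOf(int id) {
        return vertices.get(id).value;
    }

    public static void main(String[] args) {
        ToySuperstepRunner runner = new ToySuperstepRunner();
        runner.addVertex(new ToyVertex(1, 2, 1));
        runner.addVertex(new ToyVertex(2, 1, 10));
        runner.runSuperstep(); // no messages delivered yet
        runner.runSuperstep(); // messages from the first superstep arrive
        System.out.println(runner.valueOf(1) + " " + runner.valueOf(2));
    }
}
```

Because the outbox is swapped in only at the barrier, the result does not depend on the order in which vertices are visited within a superstep.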
On 9/8/11 1:55 PM, Jake Mannix wrote:
>
> So in particular, each GraphMapper has a BasicRPCCommunications object (and
> a CommunicationsInterface proxy object for each of the o
On Fri, Sep 9, 2011 at 3:22 AM, Claudio Martella wrote:
> One misunderstanding on my side. Isn't it true that the messages have to be
> buffered as they all have to be collected before they can be processed (by
> definition of superstep)? So you cannot really process them as they come?
This is the
On Fri, Sep 9, 2011 at 8:03 AM, Avery Ching wrote:
> The GraphLab model is more asynchronous than BSP. They allow you to update
> your neighbors rather than the BSP model of messaging per superstep. Rather
> than one massive barrier in BSP, they implement this with vertex locking.
> They also a
Hi Claudio,
So what you want is to be able to build up your own application-specific
data structure for the outbound edges of a vertex (in your case, one that
effectively supports fulltext search on the edge values)?
Check out: https://issues.apache.org/jira/browse/GIRAPH-31 - this change,
if
he new api? the new
> addEdge is still final. I can add my own "getEdgesByValue()" (as I might have
> more outedges with the same label) to my own vertex, but I must be able to
> modify or override addEdge() somehow.
>
>
> On Tue, Sep 13, 2011 at 4:55 PM, Jake Mannix wrote:
arly when you should subclass which one.
-jake
On Tue, Sep 13, 2011 at 9:37 AM, Dmitriy Ryaboy wrote:
> We should add that to Vertex's javadoc...
>
>
> On Tue, Sep 13, 2011 at 9:31 AM, Jake Mannix wrote:
>
>> Claudio,
>>
>> If your vertex class has spec
ertex, just do what you need
in your MutableVertex subclass.
Vertex has a final addEdge() method, because its subclasses depend
on the fact that it adds edges to the TreeMap that it contains.
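
To illustrate that reasoning, here is a hypothetical sketch (these are not Giraph's actual classes). One class keeps its edges in a TreeMap and marks addEdge() final so anything built on it can rely on that invariant; a sibling class owns its storage and is free to index edges by value instead. `ToyMutableVertex`, `SortedEdgeVertex`, and `LabelIndexedVertex` are made-up names; getEdgesByValue() is the method name Claudio proposed earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical abstract base: says nothing about how edges are stored.
abstract class ToyMutableVertex {
    public abstract boolean addEdge(long destId, String value);
}

// Hypothetical TreeMap-backed vertex. addEdge() is final so that code
// relying on the sorted map (such as firstDestId) cannot be bypassed.
class SortedEdgeVertex extends ToyMutableVertex {
    private final SortedMap<Long, String> edges = new TreeMap<>();

    @Override
    public final boolean addEdge(long destId, String value) {
        // Returns true only when the destination was not present before.
        return edges.put(destId, value) == null;
    }

    public final Long firstDestId() {
        return edges.isEmpty() ? null : edges.firstKey();
    }
}

// Hypothetical application-specific vertex: several out-edges may share a
// label, so it keeps an index from edge value to destination ids.
class LabelIndexedVertex extends ToyMutableVertex {
    private final Map<String, List<Long>> destsByValue = new HashMap<>();

    @Override
    public boolean addEdge(long destId, String value) {
        return destsByValue.computeIfAbsent(value, k -> new ArrayList<>()).add(destId);
    }

    public List<Long> getEdgesByValue(String value) {
        return destsByValue.getOrDefault(value, Collections.emptyList());
    }
}

class EdgeStorageDemo {
    public static void main(String[] args) {
        LabelIndexedVertex v = new LabelIndexedVertex();
        v.addEdge(7L, "knows");
        v.addEdge(3L, "knows");
        v.addEdge(5L, "likes");
        System.out.println(v.getEdgesByValue("knows"));
    }
}
```

The design point is that the custom index lives in the class that owns its storage, rather than in an override of the final method.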
-jake
>
> On Tue, Sep 13, 2011 at 6:31 PM, Jake Mannix
> wrote:
>
mitted tonight (according to Avery), so
I'd wait until that goes in to implement your MutableVertex subclass. Once
you do, let us know how the API feels, as it's in flux, and user feedback is
much appreciated!
-jake
On Wed, Sep 14, 2011 at 1:01 AM, Jake Mannix wrote:
> >
>
Yeah, my bad as well, I actually had that header in my GitHub branch, but
somehow didn't get it into the patch - I did mvn install, I promise! :)
On Wed, Sep 14, 2011 at 8:53 AM, Avery Ching wrote:
> And committers too (my bad) =).
>
>
> On 9/14/11 8:49 AM, Owen O'Malley wrote:
>
>> On Wed, Sep
:)
I'm originally a physics nerd, turned mathematician, turned software
engineer mostly working on search (I built large parts of
this <http://www.linkedin.com/search/fpsearch?type=people&keywords=jake+mannix> search
engine, as well as
this <http://twitter.com/#!/who_to_follow/sea
Hey all,
Is there a way to get all comments/activity on the Giraph JIRA sent to me
via email (i.e., is there a giraph-jira@ mailing list)?
-jake
And now I feel pretty stupid, because I already made a gmail subfilter for
that!
On Fri, Sep 16, 2011 at 1:57 PM, Dmitriy Ryaboy wrote:
> giraph-dev works.
>
>
> On Fri, Sep 16, 2011 at 1:54 PM, Jake Mannix wrote:
>
>> Hey all,
>>
>> Is there a way to get al
Not really. It's really really early, and they're in the "examples" stage -
nothing is really productionized. There are things like PageRank and finding
shortest paths, but nothing is really ready for prime time yet.
On Tue, Sep 27, 2011 at 11:56 AM, Josh Patterson wrote:
> Is there a list of known
"not ready for prime-time" is *bad*, in case
that's what it looked like I was saying.
The more examples the better, so we can see where the bottlenecks are, and move
them toward productionalized stage!
-jake
>
> Aapo
>
> On Sep 27, 2011, at 3:03 PM, Jake Mannix wrote:
ssues.apache.org/jira/browse/GIRAPH-37
Watch those spaces for upcoming improvements!
-jake
> Aapo
>
> On Sep 27, 2011, at 4:14 PM, Jake Mannix wrote:
>
>
>
> On Tue, Sep 27, 2011 at 12:31 PM, Aapo Kyrola wrote:
>
>>
>> I have written a very simple Belief P
On Thu, Sep 29, 2011 at 7:55 AM, Claudio Martella <
claudio.marte...@gmail.com> wrote:
> Hello list,
>
> I see I cannot submit a GiraphJob with my own Vertex as it doesn't
> implement Vertex (it extends MutableVertex which extends BasicVertex)
> but GiraphJob.setClass() calls:
> getConfiguration(
Remember that there's already a "singleton"-like object available to all
vertices: the GraphState object, which has a handle on the GraphMapper.
Maybe this is the right place to get your handle on the FSDataOutputStream?
-jake
On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella <
claudio.marte..
able workerObject);
> > public void preSuperstep(Configurable workerObject);
> > public void postSuperstep(Configurable workerObject);
> > public Configurable getWorkerObject();
> >
> > Anyone else think of a cleaner way to do it?
> >
> > Avery
> >
test' and other mvn commands.
>
> Avery
>
>
> On 10/28/11 11:21 PM, Jake Mannix wrote:
>
> I seem to be getting weird stuff like:
>
> setup: Using local job runner with location for testBspCombiner
> 11/10/28 23:21:00 WARN mapred.JobClient: Use GenericOptionsParser for
Congrats and welcome, Claudio, looking forward to more of your
contributions!
-jake
On Fri, Nov 18, 2011 at 9:10 AM, Jakob Homan wrote:
> I'm very happy to announce we've elected a new committer and PPMC
> member, Claudio Martella. It's been great working with Claudio so far
> and I'm excite
We could special-case NullWritable, and if that class is put into the conf,
instantiate it separately. Seems like a common case if a user wants to
ignore something.
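
A rough sketch of that special case, under stated assumptions. To stay self-contained it uses a stand-in singleton, `NullValue`, with a private constructor and a static get() accessor, which is the same shape as Hadoop's NullWritable.get(). `InstanceFactory`, `Counter`, and `NullValue` are hypothetical names; this is not Giraph's or Hadoop's instantiation code.

```java
import java.lang.reflect.Constructor;

// Stand-in for a singleton type whose constructor is private, so plain
// reflective instantiation from another class would be refused.
final class NullValue {
    private static final NullValue INSTANCE = new NullValue();

    private NullValue() { }

    static NullValue get() {
        return INSTANCE;
    }
}

// Ordinary type with an accessible no-argument constructor.
class Counter {
    int n;

    public Counter() { }
}

// Hypothetical factory: special-cases the singleton, reflects for the rest.
final class InstanceFactory {
    private InstanceFactory() { }

    static <T> T newInstance(Class<T> clazz) {
        if (NullValue.class.equals(clazz)) {
            // Special case: hand back the shared instance instead of reflecting.
            return clazz.cast(NullValue.get());
        }
        try {
            Constructor<T> ctor = clazz.getDeclaredConstructor();
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        NullValue nothing = InstanceFactory.newInstance(NullValue.class);
        Counter counter = InstanceFactory.newInstance(Counter.class);
        System.out.println((nothing == NullValue.get()) + " " + (counter != null));
    }
}
```

Keeping the check inside the factory means user code can put the "ignore this type" class into the configuration like any other class.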
-jake
On Mon, Dec 5, 2011 at 4:29 PM, Avery Ching wrote:
> Actually, isn't that because the default constructor is private? Yo
[hama-user to bcc:]
Let's not crosspost, please, it makes the thread of conversation totally
opaque as to who is talking about what.
On Fri, Dec 9, 2011 at 1:42 AM, Praveen Sripati wrote:
> Thanks to Thomas and Avery for the response.
>
> > For Giraph you are quite correct, all the stuff is submi
or clustering), then moving
around may make sense. Also: if nodes die.
-jake
> Regards,
> Praveen
>
> On Fri, Dec 9, 2011 at 11:33 PM, Jake Mannix wrote:
>
>> [hama-user to bcc:]
>>
>> Let's not crosspost, please, it makes the thread of conversation totally
In the current form, this is true for Giraph, but I think it's not necessarily
*required* of the system. Modifying mapreduce on Hadoop to become realtime
would be fundamentally wrong (single batch vs realtime), but pregel on Hadoop
is different enough that maybe it would work (whether it *should*
Hi David,
The *point* of the Pregel architecture (which Giraph is an implementation
of) is that the whole graph is in (distributed) memory. If you are willing
to go to disk, doing your calculations via MapReduce (possibly talking to a
distributed hashtable of some kind colocated with your hadoo