Re: Multiple node types in Giraph and doing a selective M/R over one of them

Eli Reisman Mon, 28 Jan 2013 09:32:06 -0800

I agree, something like this is possible using the vertex value. In Giraph,
we now have native support for multigraphs, but before we had that support,
I described a kind of "cheat" to process multigraphs. You could use a
variation of that same cheat (it's on the site's Confluence wiki) to do what
you're talking about, I think, even though you're not dealing with a
multigraph in the problem you described. Essentially, you can get clever
about what sort of Writable you use for the vertex value type, and/or what
the values it holds can represent in your dataset.
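
A rough sketch of the vertex value idea, in plain Java so it runs without
Hadoop on the classpath. The class name TypedRowValue, the TableType enum and
its USERS/ORDERS members are all made up for illustration. In a real job the
class would implement org.apache.hadoop.io.Writable; the write/readFields
methods below just mirror that interface's shape.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical vertex value: a "which table did this row come from" tag
// plus whatever payload the row carries.
class TypedRowValue {
    // Assumed node types, one per HBase table in the use case.
    enum TableType { USERS, ORDERS }

    private TableType type = TableType.USERS;
    private String payload = "";

    // Writable-style classes need a no-arg constructor for deserialization.
    TypedRowValue() { }

    TypedRowValue(TableType type, String payload) {
        this.type = type;
        this.payload = payload;
    }

    TableType getType() { return type; }
    String getPayload() { return payload; }

    // A compute step could call this to act only on one node type.
    boolean isOfType(TableType wanted) { return type == wanted; }

    // Same signature shape as Writable.write(DataOutput).
    public void write(DataOutput out) throws IOException {
        out.writeByte(type.ordinal());
        out.writeUTF(payload);
    }

    // Same signature shape as Writable.readFields(DataInput).
    public void readFields(DataInput in) throws IOException {
        type = TableType.values()[in.readByte()];
        payload = in.readUTF();
    }

    // Demo helper: serialize and deserialize through an in-memory buffer.
    static TypedRowValue roundTrip(TypedRowValue value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            value.write(new DataOutputStream(bytes));
            TypedRowValue copy = new TypedRowValue();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a value like this, each vertex knows its own type, so the per-vertex
logic can simply return early for vertices that are not of the type the job
cares about.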

Alternatively, on the off chance that the row keys do not repeat across the
tables, the "row key" can really be a Writable vertex ID, as long as
each is unique. The only repetition would be that other rows, each with
its own unique row key, contain values that mark out-edges to other
unique row keys in the table, possibly more than once, since any row key could
have lots of other rows "pointing" an out-edge value towards it. Thinking
of each row key as a unique vertex ID then just turns this into a vanilla
graph. However, if the row keys are not unique among all your tables,
this oversimplifies the problem and you really are stuck with the above
vertex value option.
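
To make the uniqueness condition concrete, here is a small plain-Java sketch
(no Giraph or HBase calls; RowKeyGraph and its build method are made-up
names). It folds several tables into one adjacency map keyed by row key, and
refuses to do so as soon as a row key shows up a second time, which is the
case where the vanilla-graph view breaks down.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowKeyGraph {
    // Input: table name -> (row key -> row keys referenced by that row).
    // Output: one adjacency map, row key -> out-edge targets.
    // Throws if the same row key appears in more than one table.
    static Map<String, List<String>> build(
            Map<String, Map<String, List<String>>> tables) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        for (Map<String, List<String>> rows : tables.values()) {
            for (Map.Entry<String, List<String>> row : rows.entrySet()) {
                List<String> previous = graph.putIfAbsent(
                        row.getKey(), new ArrayList<>(row.getValue()));
                if (previous != null) {
                    throw new IllegalArgumentException(
                            "row key repeats across tables: " + row.getKey());
                }
            }
        }
        return graph;
    }
}
```

Several rows may list the same target key; that is fine, since it only means
the target vertex has several in-edges.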

My point: Giraph has vertex value, ID, out-edge-to-other-vertex IDs, and
message data types, and as long as the properties required of each for a
graph are met, and each is a Writable, you can think of the problem (often)
in one of several ways that Giraph can support.

One last thought: assuming the graph does not mutate during processing, you
could also write a custom input format that evaluates each row as it builds
it into a graph vertex data structure, and chooses only row keys that are
of a certain classification in your use case to make into graph data for
that job run, simply skipping the other rows as it reads them. Again, this
"solution" depends on the nature of your problem. Just something to play
with.
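
The filtering part of that last idea can be sketched without touching the
Giraph input format classes at all. The Row and FilteringRowReader names
below are invented, and the "type" field stands in for however you classify
a row key; a real custom input format would wrap logic like this around
whatever actually reads your rows.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical row: its key, a classification, and the row keys it points to.
class Row {
    final String key;
    final String type;
    final List<String> targets;

    Row(String key, String type, List<String> targets) {
        this.key = key;
        this.type = type;
        this.targets = targets;
    }
}

// Wraps a row source and hands back only the rows the predicate accepts,
// skipping the others as they are read.
class FilteringRowReader implements Iterator<Row> {
    private final Iterator<Row> source;
    private final Predicate<Row> keep;
    private Row next;

    FilteringRowReader(Iterator<Row> source, Predicate<Row> keep) {
        this.source = source;
        this.keep = keep;
        advance();
    }

    // Moves to the next accepted row, or leaves next == null at the end.
    private void advance() {
        next = null;
        while (source.hasNext()) {
            Row candidate = source.next();
            if (keep.test(candidate)) {
                next = candidate;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public Row next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        Row result = next;
        advance();
        return result;
    }
}
```

Note that a kept row still carries its out-edge targets as IDs, even when the
rows those IDs refer to were skipped.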

Good luck with your use case!

On Mon, Jan 28, 2013 at 7:09 AM, Claudio Martella <
[email protected]> wrote:

> Giraph does not support multipartite graphs in a natural way. But you can
> try to model your different sets through the vertex value. You can then
> propagate it (by composing with the ID?) to the neighbors, and obtain your
> join.
>
>
> On Mon, Jan 28, 2013 at 2:52 PM, David Koch <[email protected]> wrote:
>
>> Hello,
>>
>> In Giraph is it possible to have different node types in a graph and have
>> a Map/Reduce only iterate over nodes of this type and their direct
>> successors?
>>
>> If it sounds a bit cryptic here is something more about our use-case:
>> We have different HBase tables which we want to "pseudo-join" in
>> Map/Reduce computations. The node types I mentioned above correspond to the
>> respective row-key types used in each of those tables, edges are generated
>> by the fact that the KeyValues in each table can contain row-key values
>> found in one of the other tables.
>>
>> The graph would describe these relations. In a Map/Reduce I then want to
>> be able to iterate over all nodes of a given type while also having access
>> to a node's successor nodes in the same Mapper instance or better yet the
>> same map() call. One would then carry out additional Gets to retrieve the
>> data from the tables thus doing a fairly crude join.
>>
>> The Graph is likely to change so it would be nice if it could be updated
>> incrementally.
>>
>> Does all this sound like something that would be possible with Giraph?
>>
>> Thank you,
>>
>> /David
>>
>>
>>
>>
>
>
> --
>    Claudio Martella
>    [email protected]
>
