Abstractly, a lot of graph algorithms fall into this paradigm:
foreach node in the graph
neighbors = getNeighbors(graph, node);
doSomething(node, neighbors);
end
I am trying to parallelize this operation for large graphs with
millions of nodes. So, one approach would be to partition the graph
nodes and have a mapper operate on each partition. The representation
I'm seeking is one which will allow me to access the neighbors of a
node and read/write to their attributes.
On Oct 23, 4:44 pm, Dave Bailey <[EMAIL PROTECTED]> wrote:
> Sounds like fun. Just curious what is actually supposed to be encoded
> in the protobuf message, if the graph is too big to fit in memory?
> Are you sending just a single node, with its edges? It seems like you
> wouldn't be able to avoid scenarios where some of the edges in your
> message point to nodes that are not also serialized in your message,
> given that you can't actually store the entire graph in memory, let
> alone a protobuf message? What graph operations are you looking to
> parallelize?
>
> -dave
>
> On Oct 23, 12:12 am, GDR <[EMAIL PROTECTED]> wrote:
>
> > Thanks. I am trying to use protocol buffers with a map reduce
> > framework to work on large graphs. The graphs are too big to fit in
> > memory so building auxiliary data structures is difficult. Any ideas?
>
> > On Oct 22, 2:48 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
>
> > > Protocol Buffers are a serialization format, rather than general-purpose
> > > data structures. To do computations, you'd probably want to build some
> > > auxiliary data structures, which you populate when you deserialize the
> > > protobuf data. You could have node objects that resemble your original
> > > .proto file, where nodes have references to their neighbors, and you'd
> > > probably need a map from node id to node object reference, which you'd
> > > use during deserialization.
>
> > > --
> > > Jeremy Leader
> > > [EMAIL PROTECTED]
>
> > > GDR wrote:
> > > > my bad. The code snippet should be as follows:
>
> > > > for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
> > > > double sum = 0;
> > > > int count = 0;
> > > > for(UndirectedGraphNodeReference neighbor :
> > > > node.getNeighborsList() ) {
> > > > sum += ????
> > > > count++;
> > > > }
> > > > node.setWeight(sum/count);
>
> > > > }
>
> > > > ----- graph.proto -----
>
> > > > package graph;
>
> > > > option java_package = "graph";
> > > > option java_outer_classname = "UndirectedGraphType";
> > > > option optimize_for = CODE_SIZE;
>
> > > > message UndirectedGraphNodeReference {
> > > > required string id = 1;
> > > > }
>
> > > > message UndirectedGraphNode {
> > > > required string id = 1;
> > > > required double weight = 2;
> > > > repeated UndirectedGraphNodeReference neighbors = 2;
> > > > }
>
> > > > message UndirectedGraph {
> > > > repeated UndirectedGraphNode nodes = 1;
> > > > }
>
> > > > On Oct 22, 2:36 pm, GDR <[EMAIL PROTECTED]> wrote:
> > > >> That does solve the duplicate information problem but it makes updates
> > > >> to node attributes (like weight) difficult. Let's say, I want to
> > > >> assign the weight of each node to the average of its neighbors.
>
> > > >> for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
> > > >> double sum = 0;
> > > >> int count = 0;
> > > >> for(UndirectedGraphNodeReference neighbor :
> > > >> node.getNeighborsList() ) {
> > > >> sum += ????
> > > >> count++;
> > > >> }
> > > >> node.setWeight(sum/count);
>
> > > >> }
>
> > > >> ----- graph.proto -----
>
> > > >> package graph;
>
> > > >> option java_package = "graph";
> > > >> option java_outer_classname = "UndirectedGraphType";
> > > >> option optimize_for = CODE_SIZE;
>
> > > >> message UndirectedGraphNodeReference {
> > > >> required string id = 1;
> > > >> required double weight = 2;
>
> > > >> }
>
> > > >> message UndirectedGraphNode {
> > > >> required string id = 1;
> > > >> repeated UndirectedGraphNodeReference neighbors = 2;
>
> > > >> }
>
> > > >> message UndirectedGraph {
> > > >> repeated UndirectedGraphNode nodes = 1;
>
> > > >> }
>
> > > >> On Oct 22, 2:14 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
>
> > > >>> I was assuming all the properties of a node (weight, label, color,
> > > >>> whatever) would be in UndirectedGraphNode;
> > > >>> UndirectedGraphNodeReference
> > > >>> would only have the id and nothing else.
> > > >>> --
> > > >>> Jeremy Leader
> > > >>> [EMAIL PROTECTED]
> > > >>> GDR wrote:
> > > >>>> Thanks Jeremy. That worked!
> > > >>>> But we now have information about the same node being replicated. For
> > > >>>> instance, let's say we have a field 'weight' attached to each node as
> > > >>>> shown below. This setup will replicate the weight information of a
> > > >>>> node as many times as its degree. If the weight of a node changes, I
> > > >>>> will have update all it's occurrences in the PB. Any way I can avoid
> > > >>>> it?
> > > >>>> package graph;
> > > >>>> option java_package = "graph";
> > > >>>> option java_outer_classname = "UndirectedGraphType";
> > > >>>> option optimize_for = CODE_SIZE;
> > > >>>> message UndirectedGraphNodeReference {
> > > >>>> required string id = 1;
> > > >>>> required double weight = 2;
> > > >>>> }
> > > >>>> message UndirectedGraphNode {
> > > >>>> required string id = 1;
> > > >>>> repeated UndirectedGraphNodeReference neighbors = 2;
> > > >>>> }
> > > >>>> message UndirectedGraph {
> > > >>>> repeated UndirectedGraphNode nodes = 1;
> > > >>>> }
> > > >>>> On Oct 21, 6:37 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
> > > >>>>> Keep in mind that protobufs describe serialized data, and there's no
> > > >>>>> concept of an object reference like Java uses. In your example, if
> > > >>>>> A
> > > >>>>> and B are neighbors, then in your proto, the data representing A
> > > >>>>> contains the data representing B, and the data representing B
> > > >>>>> contains
> > > >>>>> the data representing A!
> > > >>>>> One way around this is to implement your own form of references,
> > > >>>>> perhaps
> > > >>>>> using the node ids like this:
> > > >>>>> package graph;
> > > >>>>> option java_package = "graph";
> > > >>>>> option java_outer_classname = "UndirectedGraph";
> > > >>>>> option optimize_for = CODE_SIZE;
> > > >>>>> message UndirectedGraphNodeReference {
> > > >>>>> required string id = 0;
> > > >>>>> }
> > > >>>>> message UndirectedGraphNode {
> > > >>>>> required string id = 0;
> > > >>>>> repeated UndirectedGraphNodeReference neighbors;
> > > >>>>> }
> > > >>>>> message UndirectedGraph {
> > > >>>>> repeated UndirectedGraphNode nodes;
> > > >>>>> }
> > > >>>>> --
> > > >>>>> Jeremy Leader
> > > >>>>> [EMAIL PROTECTED]
> > > >>>>> GDR wrote:
> > > >>>>>> Hi,
> > > >>>>>> I'm wondering how would one go about implementing self-referential
> > > >>>>>> data structures? As an exercise, I tried to implement a PB
> > > >>>>>> version of
> > > >>>>>> the adjacency list representation of a graph. I'm having a hard
> > > >>>>>> time
> > > >>>>>> getting it work. Any suggestions?
> > > >>>>>> Thanks!
> > > >>>>>> --- graph.proto ---
> > > >>>>>> package graph;
> > > >>>>>> option java_package = "graph";
> > > >>>>>> option java_outer_classname = "UndirectedGraph";
> > > >>>>>> option optimize_for = CODE_SIZE;
> > > >>>>>> message UndirectedGraphNode {
> > > >>>>>> required string id = 0;
> > > >>>>>> repeated UndirectedGraphNode neighbors;
> > > >>>>>> }
> > > >>>>>> message UndirectedGraph {
> > > >>>>>> repeated UndirectedGraphNode nodes;
> > > >>>>>> }
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Protocol Buffers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/protobuf?hl=en
-~----------~----~----~----~------~----~------~--~---