Re: Data structures using protocol buffers

2008-10-24 Thread GDR

Abstractly, a lot of graph algorithms fall into this paradigm:

foreach node in the graph
 neighbors = getNeighbors(graph, node);
 doSomething(node, neighbors);
end

I am trying to parallelize this operation for large graphs with
millions of nodes. So, one approach would be to partition the graph
nodes and have a mapper operate on each partition. The representation
I'm seeking is one which will allow me to access the neighbors of a
node and read/write to their attributes.

On Oct 23, 4:44 pm, Dave Bailey <[EMAIL PROTECTED]> wrote:
> Sounds like fun.  Just curious what is actually supposed to be encoded
> in the protobuf message, if the graph is too big to fit in memory?
> Are you sending just a single node, with its edges?  It seems like you
> wouldn't be able to avoid scenarios where some of the edges in your
> message point to nodes that are not also serialized in your message,
> given that you can't actually store the entire graph in memory, let
> alone a protobuf message?  What graph operations are you looking to
> parallelize?
>
> -dave
>
> On Oct 23, 12:12 am, GDR <[EMAIL PROTECTED]> wrote:
>
> > Thanks. I am trying to use protocol buffers with a map reduce
> > framework to work on large graphs. The graphs are too big to fit in
> > memory so building auxiliary data structures is difficult. Any ideas?
>
> > On Oct 22, 2:48 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
>
> > > Protocol Buffers are a serialization format, rather than general-purpose
> > > data structures.  To do computations, you'd probably want to build some
> > > auxiliary data structures, which you populate when you deserialize the
> > > protobuf data.  You could have node objects that resemble your original
> > > .proto file, where nodes have references to their neighbors, and you'd
> > > probably need a map from node id to node object reference, which you'd
> > > use during deserialization.
>
> > > --
> > > Jeremy Leader
> > > [EMAIL PROTECTED]
>
> > > GDR wrote:
> > > > my bad. The code snippet should be as follows:
>
> > > > for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
> > > >    double sum = 0;
> > > >    int count = 0;
> > > >    for(UndirectedGraphNodeReference neighbor :
> > > > node.getNeighborsList() ) {
> > > >      sum += 
> > > >      count++;
> > > >    }
> > > >    node.setWeight(sum/count);
>
> > > > }
>
> > > > - graph.proto -
>
> > > > package graph;
>
> > > > option java_package = "graph";
> > > > option java_outer_classname = "UndirectedGraphType";
> > > > option optimize_for = CODE_SIZE;
>
> > > > message UndirectedGraphNodeReference {
> > > >    required string id = 1;
> > > > }
>
> > > > message UndirectedGraphNode {
> > > >    required string id = 1;
> > > >    required double weight = 2;
> > > >    repeated UndirectedGraphNodeReference neighbors = 2;
> > > > }
>
> > > > message UndirectedGraph {
> > > >    repeated UndirectedGraphNode nodes = 1;
> > > > }
>
> > > > On Oct 22, 2:36 pm, GDR <[EMAIL PROTECTED]> wrote:
> > > >> That does solve the duplicate information problem but it makes updates
> > > >> to node attributes (like weight) difficult. Let's say, I want to
> > > >> assign the weight of each node to the average of its neighbors.
>
> > > >> for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
> > > >>    double sum = 0;
> > > >>    int count = 0;
> > > >>    for(UndirectedGraphNodeReference neighbor :
> > > >> node.getNeighborsList() ) {
> > > >>      sum += 
> > > >>      count++;
> > > >>    }
> > > >>    node.setWeight(sum/count);
>
> > > >> }
>
> > > >> - graph.proto -
>
> > > >> package graph;
>
> > > >> option java_package = "graph";
> > > >> option java_outer_classname = "UndirectedGraphType";
> > > >> option optimize_for = CODE_SIZE;
>
> > > >> message UndirectedGraphNodeReference {
> > > >>    required string id = 1;
> > > >>    required double weight = 2;
>
> > > >> }
>
> > > >> message UndirectedGraphNode {
> > > >>    required string id = 1;
> > > >>    repeated UndirectedGraphNodeReference neighbors = 2;
>
> > > >> }
>
> > > >> message UndirectedGraph {
> > > >>    repeated UndirectedGraphNode nodes = 1;
>
> > > >> }
>
> > > >> On Oct 22, 2:14 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
>
> > > >>> I was assuming all the properties of a node (weight, label, color,
> > > >>> whatever) would be in UndirectedGraphNode; 
> > > >>> UndirectedGraphNodeReference
> > > >>> would only have the id and nothing else.
> > > >>> --
> > > >>> Jeremy Leader
> > > >>> [EMAIL PROTECTED]
> > > >>> GDR wrote:
> > >  Thanks Jeremy. That worked!
> > >  But we now have information about the same node being replicated. For
> > >  instance, let's say we have a field 'weight' attached to each node as
> > >  shown below. This setup will replicate the weight information of a
> > >  node as many times as its degree. If the weight of a node changes, I
> > >  will have update all it's occurrences in the PB. Any way I can avoid
> 

Re: Data structures using protocol buffers

2008-10-23 Thread Dave Bailey

Sounds like fun.  Just curious what is actually supposed to be encoded
in the protobuf message, if the graph is too big to fit in memory?
Are you sending just a single node, with its edges?  It seems like you
wouldn't be able to avoid scenarios where some of the edges in your
message point to nodes that are not also serialized in your message,
given that you can't actually store the entire graph in memory, let
alone a protobuf message?  What graph operations are you looking to
parallelize?

-dave

On Oct 23, 12:12 am, GDR <[EMAIL PROTECTED]> wrote:
> Thanks. I am trying to use protocol buffers with a map reduce
> framework to work on large graphs. The graphs are too big to fit in
> memory so building auxiliary data structures is difficult. Any ideas?
>
> On Oct 22, 2:48 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
>
> > Protocol Buffers are a serialization format, rather than general-purpose
> > data structures.  To do computations, you'd probably want to build some
> > auxiliary data structures, which you populate when you deserialize the
> > protobuf data.  You could have node objects that resemble your original
> > .proto file, where nodes have references to their neighbors, and you'd
> > probably need a map from node id to node object reference, which you'd
> > use during deserialization.
>
> > --
> > Jeremy Leader
> > [EMAIL PROTECTED]
>
> > GDR wrote:
> > > my bad. The code snippet should be as follows:
>
> > > for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
> > >    double sum = 0;
> > >    int count = 0;
> > >    for(UndirectedGraphNodeReference neighbor :
> > > node.getNeighborsList() ) {
> > >      sum += 
> > >      count++;
> > >    }
> > >    node.setWeight(sum/count);
>
> > > }
>
> > > - graph.proto -
>
> > > package graph;
>
> > > option java_package = "graph";
> > > option java_outer_classname = "UndirectedGraphType";
> > > option optimize_for = CODE_SIZE;
>
> > > message UndirectedGraphNodeReference {
> > >    required string id = 1;
> > > }
>
> > > message UndirectedGraphNode {
> > >    required string id = 1;
> > >    required double weight = 2;
> > >    repeated UndirectedGraphNodeReference neighbors = 2;
> > > }
>
> > > message UndirectedGraph {
> > >    repeated UndirectedGraphNode nodes = 1;
> > > }
>
> > > On Oct 22, 2:36 pm, GDR <[EMAIL PROTECTED]> wrote:
> > >> That does solve the duplicate information problem but it makes updates
> > >> to node attributes (like weight) difficult. Let's say, I want to
> > >> assign the weight of each node to the average of its neighbors.
>
> > >> for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
> > >>    double sum = 0;
> > >>    int count = 0;
> > >>    for(UndirectedGraphNodeReference neighbor :
> > >> node.getNeighborsList() ) {
> > >>      sum += 
> > >>      count++;
> > >>    }
> > >>    node.setWeight(sum/count);
>
> > >> }
>
> > >> - graph.proto -
>
> > >> package graph;
>
> > >> option java_package = "graph";
> > >> option java_outer_classname = "UndirectedGraphType";
> > >> option optimize_for = CODE_SIZE;
>
> > >> message UndirectedGraphNodeReference {
> > >>    required string id = 1;
> > >>    required double weight = 2;
>
> > >> }
>
> > >> message UndirectedGraphNode {
> > >>    required string id = 1;
> > >>    repeated UndirectedGraphNodeReference neighbors = 2;
>
> > >> }
>
> > >> message UndirectedGraph {
> > >>    repeated UndirectedGraphNode nodes = 1;
>
> > >> }
>
> > >> On Oct 22, 2:14 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
>
> > >>> I was assuming all the properties of a node (weight, label, color,
> > >>> whatever) would be in UndirectedGraphNode; UndirectedGraphNodeReference
> > >>> would only have the id and nothing else.
> > >>> --
> > >>> Jeremy Leader
> > >>> [EMAIL PROTECTED]
> > >>> GDR wrote:
> >  Thanks Jeremy. That worked!
> >  But we now have information about the same node being replicated. For
> >  instance, let's say we have a field 'weight' attached to each node as
> >  shown below. This setup will replicate the weight information of a
> >  node as many times as its degree. If the weight of a node changes, I
> >  will have update all it's occurrences in the PB. Any way I can avoid
> >  it?
> >  package graph;
> >  option java_package = "graph";
> >  option java_outer_classname = "UndirectedGraphType";
> >  option optimize_for = CODE_SIZE;
> >  message UndirectedGraphNodeReference {
> >     required string id = 1;
> >     required double weight = 2;
> >  }
> >  message UndirectedGraphNode {
> >     required string id = 1;
> >     repeated UndirectedGraphNodeReference neighbors = 2;
> >  }
> >  message UndirectedGraph {
> >     repeated UndirectedGraphNode nodes = 1;
> >  }
> >  On Oct 21, 6:37 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
> > > Keep in mind that protobufs describe serialized data, and there's no
> > > concept of an object reference

Re: Data structures using protocol buffers

2008-10-23 Thread GDR

Thanks. I am trying to use protocol buffers with a map reduce
framework to work on large graphs. The graphs are too big to fit in
memory so building auxiliary data structures is difficult. Any ideas?

On Oct 22, 2:48 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
> Protocol Buffers are a serialization format, rather than general-purpose
> data structures.  To do computations, you'd probably want to build some
> auxiliary data structures, which you populate when you deserialize the
> protobuf data.  You could have node objects that resemble your original
> .proto file, where nodes have references to their neighbors, and you'd
> probably need a map from node id to node object reference, which you'd
> use during deserialization.
>
> --
> Jeremy Leader
> [EMAIL PROTECTED]
>
> GDR wrote:
> > my bad. The code snippet should be as follows:
>
> > for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
> >    double sum = 0;
> >    int count = 0;
> >    for(UndirectedGraphNodeReference neighbor :
> > node.getNeighborsList() ) {
> >      sum += 
> >      count++;
> >    }
> >    node.setWeight(sum/count);
>
> > }
>
> > - graph.proto -
>
> > package graph;
>
> > option java_package = "graph";
> > option java_outer_classname = "UndirectedGraphType";
> > option optimize_for = CODE_SIZE;
>
> > message UndirectedGraphNodeReference {
> >    required string id = 1;
> > }
>
> > message UndirectedGraphNode {
> >    required string id = 1;
> >    required double weight = 2;
> >    repeated UndirectedGraphNodeReference neighbors = 2;
> > }
>
> > message UndirectedGraph {
> >    repeated UndirectedGraphNode nodes = 1;
> > }
>
> > On Oct 22, 2:36 pm, GDR <[EMAIL PROTECTED]> wrote:
> >> That does solve the duplicate information problem but it makes updates
> >> to node attributes (like weight) difficult. Let's say, I want to
> >> assign the weight of each node to the average of its neighbors.
>
> >> for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
> >>    double sum = 0;
> >>    int count = 0;
> >>    for(UndirectedGraphNodeReference neighbor :
> >> node.getNeighborsList() ) {
> >>      sum += 
> >>      count++;
> >>    }
> >>    node.setWeight(sum/count);
>
> >> }
>
> >> - graph.proto -
>
> >> package graph;
>
> >> option java_package = "graph";
> >> option java_outer_classname = "UndirectedGraphType";
> >> option optimize_for = CODE_SIZE;
>
> >> message UndirectedGraphNodeReference {
> >>    required string id = 1;
> >>    required double weight = 2;
>
> >> }
>
> >> message UndirectedGraphNode {
> >>    required string id = 1;
> >>    repeated UndirectedGraphNodeReference neighbors = 2;
>
> >> }
>
> >> message UndirectedGraph {
> >>    repeated UndirectedGraphNode nodes = 1;
>
> >> }
>
> >> On Oct 22, 2:14 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
>
> >>> I was assuming all the properties of a node (weight, label, color,
> >>> whatever) would be in UndirectedGraphNode; UndirectedGraphNodeReference
> >>> would only have the id and nothing else.
> >>> --
> >>> Jeremy Leader
> >>> [EMAIL PROTECTED]
> >>> GDR wrote:
>  Thanks Jeremy. That worked!
>  But we now have information about the same node being replicated. For
>  instance, let's say we have a field 'weight' attached to each node as
>  shown below. This setup will replicate the weight information of a
>  node as many times as its degree. If the weight of a node changes, I
>  will have update all it's occurrences in the PB. Any way I can avoid
>  it?
>  package graph;
>  option java_package = "graph";
>  option java_outer_classname = "UndirectedGraphType";
>  option optimize_for = CODE_SIZE;
>  message UndirectedGraphNodeReference {
>     required string id = 1;
>     required double weight = 2;
>  }
>  message UndirectedGraphNode {
>     required string id = 1;
>     repeated UndirectedGraphNodeReference neighbors = 2;
>  }
>  message UndirectedGraph {
>     repeated UndirectedGraphNode nodes = 1;
>  }
>  On Oct 21, 6:37 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
> > Keep in mind that protobufs describe serialized data, and there's no
> > concept of an object reference like Java uses.  In your example, if A
> > and B are neighbors, then in your proto, the data representing A
> > contains the data representing B, and the data representing B contains
> > the data representing A!
> > One way around this is to implement your own form of references, perhaps
> > using the node ids like this:
> > package graph;
> > option java_package = "graph";
> > option java_outer_classname = "UndirectedGraph";
> > option optimize_for = CODE_SIZE;
> > message UndirectedGraphNodeReference {
> >    required string id = 0;
> > }
> > message UndirectedGraphNode {
> >    required string id = 0;
> >    repeated UndirectedGraphNodeReference neighbors;
> > }
> > message UndirectedGra

Re: Data structures using protocol buffers

2008-10-22 Thread Jeremy Leader

Protocol Buffers are a serialization format, rather than general-purpose 
data structures.  To do computations, you'd probably want to build some 
auxiliary data structures, which you populate when you deserialize the 
protobuf data.  You could have node objects that resemble your original 
.proto file, where nodes have references to their neighbors, and you'd 
probably need a map from node id to node object reference, which you'd 
use during deserialization.

-- 
Jeremy Leader
[EMAIL PROTECTED]

GDR wrote:
> my bad. The code snippet should be as follows:
> 
> for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
>double sum = 0;
>int count = 0;
>for(UndirectedGraphNodeReference neighbor :
> node.getNeighborsList() ) {
>  sum += 
>  count++;
>}
>node.setWeight(sum/count);
> 
> }
> 
> - graph.proto -
> 
> package graph;
> 
> option java_package = "graph";
> option java_outer_classname = "UndirectedGraphType";
> option optimize_for = CODE_SIZE;
> 
> message UndirectedGraphNodeReference {
>required string id = 1;
> }
> 
> message UndirectedGraphNode {
>required string id = 1;
>required double weight = 2;
>repeated UndirectedGraphNodeReference neighbors = 2;
> }
> 
> message UndirectedGraph {
>repeated UndirectedGraphNode nodes = 1;
> }
> 
> On Oct 22, 2:36 pm, GDR <[EMAIL PROTECTED]> wrote:
>> That does solve the duplicate information problem but it makes updates
>> to node attributes (like weight) difficult. Let's say, I want to
>> assign the weight of each node to the average of its neighbors.
>>
>> for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
>>double sum = 0;
>>int count = 0;
>>for(UndirectedGraphNodeReference neighbor :
>> node.getNeighborsList() ) {
>>  sum += 
>>  count++;
>>}
>>node.setWeight(sum/count);
>>
>> }
>>
>> - graph.proto -
>>
>> package graph;
>>
>> option java_package = "graph";
>> option java_outer_classname = "UndirectedGraphType";
>> option optimize_for = CODE_SIZE;
>>
>> message UndirectedGraphNodeReference {
>>required string id = 1;
>>required double weight = 2;
>>
>> }
>>
>> message UndirectedGraphNode {
>>required string id = 1;
>>repeated UndirectedGraphNodeReference neighbors = 2;
>>
>> }
>>
>> message UndirectedGraph {
>>repeated UndirectedGraphNode nodes = 1;
>>
>> }
>>
>> On Oct 22, 2:14 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
>>
>>> I was assuming all the properties of a node (weight, label, color,
>>> whatever) would be in UndirectedGraphNode; UndirectedGraphNodeReference
>>> would only have the id and nothing else.
>>> --
>>> Jeremy Leader
>>> [EMAIL PROTECTED]
>>> GDR wrote:
 Thanks Jeremy. That worked!
 But we now have information about the same node being replicated. For
 instance, let's say we have a field 'weight' attached to each node as
 shown below. This setup will replicate the weight information of a
 node as many times as its degree. If the weight of a node changes, I
 will have update all it's occurrences in the PB. Any way I can avoid
 it?
 package graph;
 option java_package = "graph";
 option java_outer_classname = "UndirectedGraphType";
 option optimize_for = CODE_SIZE;
 message UndirectedGraphNodeReference {
required string id = 1;
required double weight = 2;
 }
 message UndirectedGraphNode {
required string id = 1;
repeated UndirectedGraphNodeReference neighbors = 2;
 }
 message UndirectedGraph {
repeated UndirectedGraphNode nodes = 1;
 }
 On Oct 21, 6:37 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
> Keep in mind that protobufs describe serialized data, and there's no
> concept of an object reference like Java uses.  In your example, if A
> and B are neighbors, then in your proto, the data representing A
> contains the data representing B, and the data representing B contains
> the data representing A!
> One way around this is to implement your own form of references, perhaps
> using the node ids like this:
> package graph;
> option java_package = "graph";
> option java_outer_classname = "UndirectedGraph";
> option optimize_for = CODE_SIZE;
> message UndirectedGraphNodeReference {
>required string id = 0;
> }
> message UndirectedGraphNode {
>required string id = 0;
>repeated UndirectedGraphNodeReference neighbors;
> }
> message UndirectedGraph {
>repeated UndirectedGraphNode nodes;
> }
> --
> Jeremy Leader
> [EMAIL PROTECTED]
> GDR wrote:
>> Hi,
>> I'm wondering how would one go about implementing self-referential
>> data structures?  As an exercise, I tried to implement a PB version of
>> the adjacency list representation of a graph. I'm having a hard time
>> getting it work. Any suggestions?
>> Thanks!
>> --- graph.proto ---
>> package graph;

Re: Data structures using protocol buffers

2008-10-22 Thread GDR

my bad. The code snippet should be as follows:

for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
   double sum = 0;
   int count = 0;
   for(UndirectedGraphNodeReference neighbor :
node.getNeighborsList() ) {
 sum += 
 count++;
   }
   node.setWeight(sum/count);

}

- graph.proto -

package graph;

option java_package = "graph";
option java_outer_classname = "UndirectedGraphType";
option optimize_for = CODE_SIZE;

message UndirectedGraphNodeReference {
   required string id = 1;
}

message UndirectedGraphNode {
   required string id = 1;
   required double weight = 2;
   repeated UndirectedGraphNodeReference neighbors = 2;
}

message UndirectedGraph {
   repeated UndirectedGraphNode nodes = 1;
}

On Oct 22, 2:36 pm, GDR <[EMAIL PROTECTED]> wrote:
> That does solve the duplicate information problem but it makes updates
> to node attributes (like weight) difficult. Let's say, I want to
> assign the weight of each node to the average of its neighbors.
>
> for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
>    double sum = 0;
>    int count = 0;
>    for(UndirectedGraphNodeReference neighbor :
> node.getNeighborsList() ) {
>      sum += 
>      count++;
>    }
>    node.setWeight(sum/count);
>
> }
>
> - graph.proto -
>
> package graph;
>
> option java_package = "graph";
> option java_outer_classname = "UndirectedGraphType";
> option optimize_for = CODE_SIZE;
>
> message UndirectedGraphNodeReference {
>    required string id = 1;
>    required double weight = 2;
>
> }
>
> message UndirectedGraphNode {
>    required string id = 1;
>    repeated UndirectedGraphNodeReference neighbors = 2;
>
> }
>
> message UndirectedGraph {
>    repeated UndirectedGraphNode nodes = 1;
>
> }
>
> On Oct 22, 2:14 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
>
> > I was assuming all the properties of a node (weight, label, color,
> > whatever) would be in UndirectedGraphNode; UndirectedGraphNodeReference
> > would only have the id and nothing else.
>
> > --
> > Jeremy Leader
> > [EMAIL PROTECTED]
>
> > GDR wrote:
> > > Thanks Jeremy. That worked!
> > > But we now have information about the same node being replicated. For
> > > instance, let's say we have a field 'weight' attached to each node as
> > > shown below. This setup will replicate the weight information of a
> > > node as many times as its degree. If the weight of a node changes, I
> > > will have update all it's occurrences in the PB. Any way I can avoid
> > > it?
>
> > > package graph;
>
> > > option java_package = "graph";
> > > option java_outer_classname = "UndirectedGraphType";
> > > option optimize_for = CODE_SIZE;
>
> > > message UndirectedGraphNodeReference {
> > >    required string id = 1;
> > >    required double weight = 2;
> > > }
>
> > > message UndirectedGraphNode {
> > >    required string id = 1;
> > >    repeated UndirectedGraphNodeReference neighbors = 2;
> > > }
>
> > > message UndirectedGraph {
> > >    repeated UndirectedGraphNode nodes = 1;
> > > }
>
> > > On Oct 21, 6:37 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
> > >> Keep in mind that protobufs describe serialized data, and there's no
> > >> concept of an object reference like Java uses.  In your example, if A
> > >> and B are neighbors, then in your proto, the data representing A
> > >> contains the data representing B, and the data representing B contains
> > >> the data representing A!
>
> > >> One way around this is to implement your own form of references, perhaps
> > >> using the node ids like this:
>
> > >> package graph;
>
> > >> option java_package = "graph";
> > >> option java_outer_classname = "UndirectedGraph";
> > >> option optimize_for = CODE_SIZE;
>
> > >> message UndirectedGraphNodeReference {
> > >>    required string id = 0;
>
> > >> }
>
> > >> message UndirectedGraphNode {
> > >>    required string id = 0;
> > >>    repeated UndirectedGraphNodeReference neighbors;
>
> > >> }
>
> > >> message UndirectedGraph {
> > >>    repeated UndirectedGraphNode nodes;
>
> > >> }
>
> > >> --
> > >> Jeremy Leader
> > >> [EMAIL PROTECTED]
>
> > >> GDR wrote:
> > >>> Hi,
> > >>> I'm wondering how would one go about implementing self-referential
> > >>> data structures?  As an exercise, I tried to implement a PB version of
> > >>> the adjacency list representation of a graph. I'm having a hard time
> > >>> getting it work. Any suggestions?
> > >>> Thanks!
> > >>> --- graph.proto ---
> > >>> package graph;
> > >>> option java_package = "graph";
> > >>> option java_outer_classname = "UndirectedGraph";
> > >>> option optimize_for = CODE_SIZE;
> > >>> message UndirectedGraphNode {
> > >>>   required string id = 0;
> > >>>   repeated UndirectedGraphNode neighbors;
> > >>> }
> > >>> message UndirectedGraph {
> > >>>   repeated UndirectedGraphNode nodes;
> > >>> }
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group

Re: Data structures using protocol buffers

2008-10-22 Thread GDR

That does solve the duplicate information problem but it makes updates
to node attributes (like weight) difficult. Let's say, I want to
assign the weight of each node to the average of its neighbors.

for(UndirectedGraphNode node : UndirectedGraph.getNodesList() ) {
   double sum = 0;
   int count = 0;
   for(UndirectedGraphNodeReference neighbor :
node.getNeighborsList() ) {
 sum += 
 count++;
   }
   node.setWeight(sum/count);
}


- graph.proto -

package graph;

option java_package = "graph";
option java_outer_classname = "UndirectedGraphType";
option optimize_for = CODE_SIZE;

message UndirectedGraphNodeReference {
   required string id = 1;
   required double weight = 2;
}

message UndirectedGraphNode {
   required string id = 1;
   repeated UndirectedGraphNodeReference neighbors = 2;
}

message UndirectedGraph {
   repeated UndirectedGraphNode nodes = 1;
}



On Oct 22, 2:14 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
> I was assuming all the properties of a node (weight, label, color,
> whatever) would be in UndirectedGraphNode; UndirectedGraphNodeReference
> would only have the id and nothing else.
>
> --
> Jeremy Leader
> [EMAIL PROTECTED]
>
> GDR wrote:
> > Thanks Jeremy. That worked!
> > But we now have information about the same node being replicated. For
> > instance, let's say we have a field 'weight' attached to each node as
> > shown below. This setup will replicate the weight information of a
> > node as many times as its degree. If the weight of a node changes, I
> > will have update all it's occurrences in the PB. Any way I can avoid
> > it?
>
> > package graph;
>
> > option java_package = "graph";
> > option java_outer_classname = "UndirectedGraphType";
> > option optimize_for = CODE_SIZE;
>
> > message UndirectedGraphNodeReference {
> >    required string id = 1;
> >    required double weight = 2;
> > }
>
> > message UndirectedGraphNode {
> >    required string id = 1;
> >    repeated UndirectedGraphNodeReference neighbors = 2;
> > }
>
> > message UndirectedGraph {
> >    repeated UndirectedGraphNode nodes = 1;
> > }
>
> > On Oct 21, 6:37 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
> >> Keep in mind that protobufs describe serialized data, and there's no
> >> concept of an object reference like Java uses.  In your example, if A
> >> and B are neighbors, then in your proto, the data representing A
> >> contains the data representing B, and the data representing B contains
> >> the data representing A!
>
> >> One way around this is to implement your own form of references, perhaps
> >> using the node ids like this:
>
> >> package graph;
>
> >> option java_package = "graph";
> >> option java_outer_classname = "UndirectedGraph";
> >> option optimize_for = CODE_SIZE;
>
> >> message UndirectedGraphNodeReference {
> >>    required string id = 0;
>
> >> }
>
> >> message UndirectedGraphNode {
> >>    required string id = 0;
> >>    repeated UndirectedGraphNodeReference neighbors;
>
> >> }
>
> >> message UndirectedGraph {
> >>    repeated UndirectedGraphNode nodes;
>
> >> }
>
> >> --
> >> Jeremy Leader
> >> [EMAIL PROTECTED]
>
> >> GDR wrote:
> >>> Hi,
> >>> I'm wondering how would one go about implementing self-referential
> >>> data structures?  As an exercise, I tried to implement a PB version of
> >>> the adjacency list representation of a graph. I'm having a hard time
> >>> getting it work. Any suggestions?
> >>> Thanks!
> >>> --- graph.proto ---
> >>> package graph;
> >>> option java_package = "graph";
> >>> option java_outer_classname = "UndirectedGraph";
> >>> option optimize_for = CODE_SIZE;
> >>> message UndirectedGraphNode {
> >>>   required string id = 0;
> >>>   repeated UndirectedGraphNode neighbors;
> >>> }
> >>> message UndirectedGraph {
> >>>   repeated UndirectedGraphNode nodes;
> >>> }
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: Data structures using protocol buffers

2008-10-22 Thread Jeremy Leader

I was assuming all the properties of a node (weight, label, color, 
whatever) would be in UndirectedGraphNode; UndirectedGraphNodeReference 
would only have the id and nothing else.

-- 
Jeremy Leader
[EMAIL PROTECTED]

GDR wrote:
> Thanks Jeremy. That worked!
> But we now have information about the same node being replicated. For
> instance, let's say we have a field 'weight' attached to each node as
> shown below. This setup will replicate the weight information of a
> node as many times as its degree. If the weight of a node changes, I
> will have update all it's occurrences in the PB. Any way I can avoid
> it?
> 
> package graph;
> 
> option java_package = "graph";
> option java_outer_classname = "UndirectedGraphType";
> option optimize_for = CODE_SIZE;
> 
> message UndirectedGraphNodeReference {
>required string id = 1;
>required double weight = 2;
> }
> 
> message UndirectedGraphNode {
>required string id = 1;
>repeated UndirectedGraphNodeReference neighbors = 2;
> }
> 
> message UndirectedGraph {
>repeated UndirectedGraphNode nodes = 1;
> }
> 
> On Oct 21, 6:37 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
>> Keep in mind that protobufs describe serialized data, and there's no
>> concept of an object reference like Java uses.  In your example, if A
>> and B are neighbors, then in your proto, the data representing A
>> contains the data representing B, and the data representing B contains
>> the data representing A!
>>
>> One way around this is to implement your own form of references, perhaps
>> using the node ids like this:
>>
>> package graph;
>>
>> option java_package = "graph";
>> option java_outer_classname = "UndirectedGraph";
>> option optimize_for = CODE_SIZE;
>>
>> message UndirectedGraphNodeReference {
>>required string id = 0;
>>
>> }
>>
>> message UndirectedGraphNode {
>>required string id = 0;
>>repeated UndirectedGraphNodeReference neighbors;
>>
>> }
>>
>> message UndirectedGraph {
>>repeated UndirectedGraphNode nodes;
>>
>> }
>>
>> --
>> Jeremy Leader
>> [EMAIL PROTECTED]
>>
>> GDR wrote:
>>> Hi,
>>> I'm wondering how would one go about implementing self-referential
>>> data structures?  As an exercise, I tried to implement a PB version of
>>> the adjacency list representation of a graph. I'm having a hard time
>>> getting it work. Any suggestions?
>>> Thanks!
>>> --- graph.proto ---
>>> package graph;
>>> option java_package = "graph";
>>> option java_outer_classname = "UndirectedGraph";
>>> option optimize_for = CODE_SIZE;
>>> message UndirectedGraphNode {
>>>   required string id = 0;
>>>   repeated UndirectedGraphNode neighbors;
>>> }
>>> message UndirectedGraph {
>>>   repeated UndirectedGraphNode nodes;
>>> }
> 

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: Data structures using protocol buffers

2008-10-22 Thread GDR

Thanks Jeremy. That worked!
But we now have information about the same node being replicated. For
instance, let's say we have a field 'weight' attached to each node as
shown below. This setup will replicate the weight information of a
node as many times as its degree. If the weight of a node changes, I
will have update all it's occurrences in the PB. Any way I can avoid
it?

package graph;

option java_package = "graph";
option java_outer_classname = "UndirectedGraphType";
option optimize_for = CODE_SIZE;

message UndirectedGraphNodeReference {
   required string id = 1;
   required double weight = 2;
}

message UndirectedGraphNode {
   required string id = 1;
   repeated UndirectedGraphNodeReference neighbors = 2;
}

message UndirectedGraph {
   repeated UndirectedGraphNode nodes = 1;
}

On Oct 21, 6:37 pm, Jeremy Leader <[EMAIL PROTECTED]> wrote:
> Keep in mind that protobufs describe serialized data, and there's no
> concept of an object reference like Java uses.  In your example, if A
> and B are neighbors, then in your proto, the data representing A
> contains the data representing B, and the data representing B contains
> the data representing A!
>
> One way around this is to implement your own form of references, perhaps
> using the node ids like this:
>
> package graph;
>
> option java_package = "graph";
> option java_outer_classname = "UndirectedGraph";
> option optimize_for = CODE_SIZE;
>
> message UndirectedGraphNodeReference {
>    required string id = 0;
>
> }
>
> message UndirectedGraphNode {
>    required string id = 0;
>    repeated UndirectedGraphNodeReference neighbors;
>
> }
>
> message UndirectedGraph {
>    repeated UndirectedGraphNode nodes;
>
> }
>
> --
> Jeremy Leader
> [EMAIL PROTECTED]
>
> GDR wrote:
> > Hi,
>
> > I'm wondering how would one go about implementing self-referential
> > data structures?  As an exercise, I tried to implement a PB version of
> > the adjacency list representation of a graph. I'm having a hard time
> > getting it work. Any suggestions?
>
> > Thanks!
>
> > --- graph.proto ---
>
> > package graph;
>
> > option java_package = "graph";
> > option java_outer_classname = "UndirectedGraph";
> > option optimize_for = CODE_SIZE;
>
> > message UndirectedGraphNode {
> >   required string id = 0;
> >   repeated UndirectedGraphNode neighbors;
> > }
>
> > message UndirectedGraph {
> >   repeated UndirectedGraphNode nodes;
> > }
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: Data structures using protocol buffers

2008-10-21 Thread Jeremy Leader

Keep in mind that protobufs describe serialized data, and there's no 
concept of an object reference like Java uses.  In your example, if A 
and B are neighbors, then in your proto, the data representing A 
contains the data representing B, and the data representing B contains 
the data representing A!

One way around this is to implement your own form of references, perhaps 
using the node ids like this:

package graph;

option java_package = "graph";
option java_outer_classname = "UndirectedGraph";
option optimize_for = CODE_SIZE;

message UndirectedGraphNodeReference {
   required string id = 0;
}

message UndirectedGraphNode {
   required string id = 0;
   repeated UndirectedGraphNodeReference neighbors;
}

message UndirectedGraph {
   repeated UndirectedGraphNode nodes;
}

-- 
Jeremy Leader
[EMAIL PROTECTED]

GDR wrote:
> Hi,
> 
> I'm wondering how would one go about implementing self-referential
> data structures?  As an exercise, I tried to implement a PB version of
> the adjacency list representation of a graph. I'm having a hard time
> getting it work. Any suggestions?
> 
> Thanks!
> 
> --- graph.proto ---
> 
> package graph;
> 
> option java_package = "graph";
> option java_outer_classname = "UndirectedGraph";
> option optimize_for = CODE_SIZE;
> 
> message UndirectedGraphNode {
>   required string id = 0;
>   repeated UndirectedGraphNode neighbors;
> }
> 
> message UndirectedGraph {
>   repeated UndirectedGraphNode nodes;
> }
> 
> 

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---