Re: [protobuf] Sorting protocol buffer messages.

2010-02-15 Thread Henner Zeller
Some random idea: repeated fields support stl-style iterators; should
be possible to use regular std::sort() on it with a custom functor.
You probably can hide a lot of code complexity by writing such
functors and avoiding runtime complexity by keeping stuff sorted,
indeed.

On Sun, Feb 14, 2010 at 11:35, J Jack  wrote:
> Hello,
>
> We're using protocol buffers extensively within our application, in
> what I believe is a somewhat strange manner.
>
> We define a message type, that essentially looks like a struct of an
> id, possibly some data, and a type, and add this include the struct
> within the message (i.e, infinite recursability),  to give us a tree-
> like structure.
>
> We add nodes with some data, as well as nodes to those nodes with some
> other data (specifically related to those nodes), etc.
>
> Quite often, we haveto merge two of these structures based on
> arbitrary rules (for instance, add this to container X, but only if it
> doesn't contain Y, or contain "something" already), etc.
> Right now, we're essentially iterating through both containers at the
> top level matching up ids, etc, and then recursing into each sub-node
> based on the id, this works fine, but, the code isn't as nice as it
> could be.
>
> What I'm wondering is if there's any way to store the messages sorted
> in some manner within each record (by the id), this would allow us to
> just do a binary search on merges, turning time into (I believe) log
> N, drastically speeding up both searches, and merges.
>
> I understand that protocol buffers itself offers no such facility, but
> I'm wondering if implementing one on top is advisable? Essentially I
> guess I'd haveto rebuild the structure everytime I do a merge (in
> order to keep it sorted). Each structure individually is fairly small
> (~1kb), but, we deal with billions of these structures on a daily
> basis(merging, searching through them, etc), I'm worried having to
> keep them always sorted (and hence re-building the structure on each
> merge) will cause memory fragmentation in the long run.
>
> On the top level (i.e, right below the root), we have maybe 10-15
> nodes, each of which contain up to 1000 items, (a random amount of
> these will be nodes that contain subnodes too).
>
> Another idea I had was to store the entire thing as a hashmap of
> hashmaps, but, right now, the coding/overhead for it doesn't seem to
> be worth it.
>
> And finally, is there some way Swap could be used to maintain a sorted
> structure efficiently?
>
> Any responses appreciated,
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Protocol Buffers" group.
> To post to this group, send email to proto...@googlegroups.com.
> To unsubscribe from this group, send email to 
> protobuf+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/protobuf?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



[protobuf] Sorting protocol buffer messages.

2010-02-15 Thread J Jack
Hello,

We're using protocol buffers extensively within our application, in
what I believe is a somewhat strange manner.

We define a message type, that essentially looks like a struct of an
id, possibly some data, and a type, and add this include the struct
within the message (i.e, infinite recursability),  to give us a tree-
like structure.

We add nodes with some data, as well as nodes to those nodes with some
other data (specifically related to those nodes), etc.

Quite often, we haveto merge two of these structures based on
arbitrary rules (for instance, add this to container X, but only if it
doesn't contain Y, or contain "something" already), etc.
Right now, we're essentially iterating through both containers at the
top level matching up ids, etc, and then recursing into each sub-node
based on the id, this works fine, but, the code isn't as nice as it
could be.

What I'm wondering is if there's any way to store the messages sorted
in some manner within each record (by the id), this would allow us to
just do a binary search on merges, turning time into (I believe) log
N, drastically speeding up both searches, and merges.

I understand that protocol buffers itself offers no such facility, but
I'm wondering if implementing one on top is advisable? Essentially I
guess I'd haveto rebuild the structure everytime I do a merge (in
order to keep it sorted). Each structure individually is fairly small
(~1kb), but, we deal with billions of these structures on a daily
basis(merging, searching through them, etc), I'm worried having to
keep them always sorted (and hence re-building the structure on each
merge) will cause memory fragmentation in the long run.

On the top level (i.e, right below the root), we have maybe 10-15
nodes, each of which contain up to 1000 items, (a random amount of
these will be nodes that contain subnodes too).

Another idea I had was to store the entire thing as a hashmap of
hashmaps, but, right now, the coding/overhead for it doesn't seem to
be worth it.

And finally, is there some way Swap could be used to maintain a sorted
structure efficiently?

Any responses appreciated,

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.