Message forwarding and partial parsing

2009-10-07 Thread villintehaspam

Hi,

I am wondering about the best way of forwarding received protocol
buffer messages from one entity to another without having to parse the
entire message just to serialize it again.

My scenario is the following: I have a process A connected to process
B using local IPC. B is in turn connected to process C on another
machine using TCP, and C is connected to process D using local IPC,
i.e. A-B-C-D.

Process A wants to send messages to process B, C and D, to control the
operations. Process A has no concept of tcp/ip and uses process B to
forward messages to the 'C' processes running on other machines (each
machine has a unique id). Each machine might have several 'D'
processes running (each has a unique id).

The basic message is similar to this:

message MyMessage {
  extensions 100 to max;
}

and several messages that would make sense to B, C and D are declared
similar to this:

message MyExtension {
  extend MyMessage {
    optional MyExtension my_extension = 100;
  }
  ...
}

In a naive implementation, a message sent from A to D would involve
the message being serialized by A, deserialized by B, serialized by B,
deserialized by C, serialized by C and then finally being deserialized
by D. This seems a bit too much to me, so I am hoping that anyone
would be willing to comment on the possible solutions to routing
messages, while minimizing unnecessary serialization/deserialization
overhead.

I have several options:

Option 1:
Extend the MyMessage message with destination information like this:
message MyMessage {
  optional MyId destination_id = 1;
  ...
}

When process B deserializes the message it can look at the
destination_id to decide where to forward the message. The problem
with this would be that some extensions would be recognized by process
B even though they are aimed at process C, which I'm _guessing_ would
mean that the extension would be parsed and then encoded again when
the message is forwarded. So I'm thinking this approach is out.

Option 2:
Extend MyMessage with an internal message:
message MyMessage {
  optional MyId destination_id = 1;
  optional bytes internal_message = 2;
  ...
}
Now process B would not have to parse the internal message. However,
process A would have to first serialize the message to a byte
sequence, then insert that into another message and serialize that.
This seems awkward to me.
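
To make the cost of Option 2 concrete, here is a minimal sketch of what the envelope looks like on the wire. It hand-encodes the two fields using the standard protobuf wire format (a varint for field 1, a length-delimited value for field 2). It assumes, purely for illustration, that destination_id is a plain unsigned integer; the post declares it as MyId, whose definition is not shown. The function names wrap and unwrap are hypothetical, not library calls.

```python
from typing import Optional, Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def wrap(destination_id: int, inner: bytes) -> bytes:
    """Build the envelope around an already serialized inner message."""
    return (
        b"\x08" + encode_varint(destination_id)  # tag = (1 << 3) | 0 (varint)
        + b"\x12" + encode_varint(len(inner))    # tag = (2 << 3) | 2 (bytes)
        + inner                                  # copied, never interpreted
    )


def unwrap(envelope: bytes) -> Tuple[Optional[int], bytes]:
    """Read both envelope fields without parsing the inner bytes."""
    pos = 0
    destination_id = None
    inner = b""
    while pos < len(envelope):
        tag, pos = decode_varint(envelope, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if field_number == 1 and wire_type == 0:
            destination_id, pos = decode_varint(envelope, pos)
        elif field_number == 2 and wire_type == 2:
            length, pos = decode_varint(envelope, pos)
            inner = envelope[pos:pos + length]
            pos += length
        else:
            raise ValueError("unexpected field in this sketch")
    return destination_id, inner
```

Under these assumptions the envelope adds only a few bytes (two tags, the id and a length) plus one copy of the inner bytes, and a forwarding process can read the destination without touching the payload.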

Option 3:
Extend the header sent on the channel with more information. Right now
I am sending the length of the message first, then the actual
serialized message. This could be extended into more of a header with
the destination id as well. Sounds like a protocol buffer message
would be suitable for use as a header... something like this:

message MyHeader {
  optional MyId destination_id = 1;
  required uint32 message_length = 2;
}

On the wire, I would still need to first send the length of the header
(or possibly make sure that the header has a fixed length), then the
serialized header followed by the serialized message. Process B could
then simply forward the bytes in the message without having to parse
the contents.
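
The forwarding step of Option 3 can be sketched as follows. To stay self-contained, this sketch takes the "fixed length header" variant mentioned above and packs the header with Python's struct module instead of a protobuf MyHeader message. The field layout (two 32-bit unsigned integers in network byte order) and the names send_frame, forward_frame and routes are assumptions made for illustration.

```python
import io
import struct

# Assumed fixed-size header: destination_id, message_length (uint32 each).
HEADER = struct.Struct("!II")


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes from a file-like object or raise EOFError."""
    chunks = []
    remaining = n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended in the middle of a frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_frame(stream, destination_id: int, payload: bytes) -> None:
    """Write the header followed by the already serialized message."""
    stream.write(HEADER.pack(destination_id, len(payload)))
    stream.write(payload)


def forward_frame(incoming, routes) -> int:
    """Read one frame and copy it unchanged to routes[destination_id]."""
    header = read_exact(incoming, HEADER.size)
    destination_id, length = HEADER.unpack(header)
    payload = read_exact(incoming, length)  # opaque bytes, never parsed
    outgoing = routes[destination_id]
    outgoing.write(header)
    outgoing.write(payload)
    return destination_id
```

Here routes stands in for whatever maps an id to an outgoing connection in process B. Only the header is decoded; the serialized message is copied byte for byte.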


Of these three options, I'm thinking that option 3 is the correct way
to go. Am I missing some functionality provided by protocol buffers
(such as the ability to skip parsing extensions even if they are
recognized or similar or only parse as much as needed)? Am I missing
any problems?



On a somewhat related note, is it possible to parse a partially
transmitted message and continue parsing at a later time when more
data is available? I.e. since I cannot guarantee that all data for a
message is available directly, do I need to buffer data until I know
that I have the entire message (which is what I do today) before
allowing protocol buffers to parse it?

Example: the message X is sent on the wire consisting of a number of
fields. It is delivered on the other side of the connection as a
series of chunks. For instance, in a theoretical scenario the first
chunk could contain the first field descriptor, the first data value
and half the second field descriptor. The next chunk could contain the
second half of the second field descriptor and half the second data
value and the last chunk could contain the rest of the message.

Can I allow protocol buffers to parse the chunks of data as they come
in without having to worry about half field descriptors, half data
values and so on? I see that there are ParsePartialFrom... functions
for messages, but the documentation states that the difference between
these and the regular ParseFrom... functions is that they allow
required fields to be missing. I assume this means that there is
no partial parse functionality in the sense that partial field
descriptors or partial values can be continued at a later time?

Sorry for a lengthy post... Any comments on either problem are
appreciated!

Cheers,
V





Protocol Buffers: Protocol message end-group tag did not match expected tag.

2009-10-07 Thread Ramon

Hi there,
I recently started working with Protocol Buffers. I used the
Addressbook Example to become acquainted with PBs
(http://code.google.com/intl/de-DE/apis/protocolbuffers/docs/javatutorial.html).
The only difference is that I use an OutputStream to write the address
book instance (in the example they used a FileOutputStream).
Everything works fine: I compiled the proto file and imported it into my
Java project, and that compiles without errors. But when my client
code tries to get (parse) the addressbook instance from the server, the
following error message appears:

com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
    at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:73)
    at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:105)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:202)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:664)
    at protoc.Example$AddressBook.parseFrom(Example.java:929)
    at protoc.Proto.createConnection(Proto.java:33)
    at protoc.Proto.main(Proto.java:24)


That's the code:

(Server)

...

    Person.Builder person = Person.newBuilder();
    person.setName("Peter");
    person.setId(5);

    AddressBook.Builder addressbook = AddressBook.newBuilder();
    addressbook.addPerson(person.build());

    // client is a Socket object
    addressbook.build().writeTo(client.getOutputStream());
    client.close();



(Client)

...

    public static void createConnection() {
        server = null;
        try {
            server = new Socket("192.168.1.30", 4141);
            System.out.println("Connected to server");
            AddressBook mission2 =
                AddressBook.parseFrom(server.getInputStream());
        } catch (UnknownHostException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        closeConnection();
    }


The proto file ...



package protoc;

option java_package = "protoc";
option java_outer_classname = "Example";

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}


Can anyone tell me what is wrong?
I can't find my mistake ... :(

Thank you very much in advance!
Ramon

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: Save multiple messages to a single file C++

2009-10-07 Thread Petko

Thanks a lot, Kenton. I went for the second option. If you are storing
a lot of data in pb format, wrapping it in an outer message
makes you parse it all at once. It is like DOM vs. SAX for XML, I guess.
Thanks again!
p.

On Oct 5, 10:17 pm, Kenton Varda <ken...@google.com> wrote:
> The easiest solution is to create an outer message like:
>
>   message NodeList {
>     repeated Node node = 1;
>   }
>
> then write a single NodeList to the file, and read it back again as a single
> NodeList.
>
> If you really need to read/write individual messages separately (because the
> file is too big to read/write all at once), see:
>
> http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre...
>
> On Mon, Oct 5, 2009 at 9:08 PM, Petko <petko.bogda...@gmail.com> wrote:
>
> > Hello,
> > Is there a way to save multiple messages to a single file and then
> > read them back? I want to be able to do something like this:
> >
> > message Node {
> >   required int64 id = 1;
> > }
> >
> > main.cc:
> > fstream out("test", ios::out | ios::binary);
> > for (int i = 0; i < 20; i++) {
> >   Node n;
> >   string s;
> >   n.set_id(i);
> >   // SAVE n TO out
> > }
> > out.close();
> >
> > fstream in("test", ios::in | ios::binary);
> > while (!in.eof()) {
> >   Node n;
> >   // READ ONE NODE FROM in TO n
> > }
> > in.close();
> >
> > What should I use in the positions:
> >   // SAVE n TO out
> > and
> >   // READ ONE NODE FROM in TO n ?
> >
> > Any comments and ideas welcome.
> > --petko



Re: Save multiple messages to a single file C++

2009-10-07 Thread Christopher Piggott

This whole topic - how to save multiple messages to a single stream -
comes up frequently enough that I'm starting to think there should be
a more flexible answer than what's in the FAQ.  Declaring a one-byte
"End of Object" marker seems like it would be one way to handle it.
Whatever it is, it should keep in mind that protocol buffers may not be
coming from files, but streaming from sockets (e.g. TCP).

If nothing else, I think this should be addressed for the sake of
consistency.  I've been encoding a 32-bit length before my
protocol buffers... which works just fine but, like I said, consistency
would be helpful.

Just my $0.02.
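
The 32-bit length prefix described above can be sketched like this. It is only an illustration: the big-endian byte order and the names write_message and read_messages are assumptions, and the bytes passed in stand for whatever a message's serialization call returned. The reader also assumes a file-like object whose read(n) returns fewer than n bytes only at end of stream (true for files and io.BytesIO, not for raw sockets).

```python
import io
import struct

# Assumed framing: 32-bit unsigned big-endian length before each message.
LENGTH = struct.Struct(">I")


def write_message(stream, serialized: bytes) -> None:
    """Append one length-prefixed record to the stream."""
    stream.write(LENGTH.pack(len(serialized)))
    stream.write(serialized)


def read_messages(stream):
    """Yield each serialized message until a clean end of stream."""
    while True:
        prefix = stream.read(LENGTH.size)
        if not prefix:
            return  # clean EOF exactly between two records
        if len(prefix) < LENGTH.size:
            raise EOFError("truncated length prefix")
        (size,) = LENGTH.unpack(prefix)
        body = stream.read(size)
        if len(body) < size:
            raise EOFError("truncated message body")
        yield body
```

Each yielded value would then be handed to the message's parse function. Detecting end of input by an empty read before a prefix avoids the usual pitfall of testing for EOF before attempting the read.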





Re: Message forwarding and partial parsing

2009-10-07 Thread Kenton Varda
On Wed, Oct 7, 2009 at 5:46 AM, villintehaspam <villintehas...@gmail.com> wrote:

> I am wondering about the best way of forwarding received protocol
> buffer messages from one entity to another without having to parse the
> entire message just to serialize it again.

It looks like you've figured out all the major options.

One thing I'd encourage you to do if you haven't already is actually profile
your system to find out if repeated parsing and serialization is a real
problem for you.  It may not be a real problem in practice even if it feels
wrong.

> Of these three options, I'm thinking that option 3 is the correct way
> to go.

All three options are reasonable.  Option 3 is the most complicated
solution, but probably the most performant.

> Am I missing some functionality provided by protocol buffers
> (such as the ability to skip parsing extensions even if they are
> recognized or similar or only parse as much as needed)? Am I missing
> any problems?

If you are using C++, then all compiled-in extensions will be eagerly
parsed.  If you only compile in the extensions that each process actually
cares about, that solves your problem.

In Java you provide an ExtensionRegistry listing extensions you care about,
so it's trivial to include only the ones you want.  I'm guessing you aren't
using Java.

> On a somewhat related note, is it possible to parse a partially
> transmitted message and continue parsing at a later time when more
> data is available?


Not without blocking.  The library is designed to parse an entire message at
once.  Allowing partial parsing (without blocking) would be quite
complicated.
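
Since the library parses a whole message at once, a non-blocking receiver has to do the buffering itself, as the original poster already does. A minimal sketch of that buffering, assuming each message is preceded by a 32-bit big-endian length (the framing and the class name FrameBuffer are assumptions for illustration, not part of protocol buffers):

```python
import struct


class FrameBuffer:
    """Collects arbitrary chunks and releases only complete messages."""

    def __init__(self) -> None:
        self._pending = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add received bytes; return the messages completed by this call."""
        self._pending += chunk
        complete = []
        while len(self._pending) >= 4:
            (size,) = struct.unpack_from(">I", self._pending, 0)
            if len(self._pending) < 4 + size:
                break  # body not fully received yet, wait for more data
            complete.append(bytes(self._pending[4:4 + size]))
            del self._pending[:4 + size]
        return complete
```

Chunks may split a length prefix or a message body at any point. feed() returns an empty list until a whole message has arrived, and each returned byte string can then be passed to the regular parse call in one go.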







Message::ByteSize() wrong

2009-10-07 Thread Brenden Matthews

The value returned by Message::ByteSize() does not match the actual
number of bytes that are consumed after writing a message to a
stream.  Example:

some_message m;
/* ... populate m ... */
size_t len = m.ByteSize();
/* save the current position of the stream */
int pos = boost::iostreams::position_to_offset(stream.tellp());
m.SerializeToOstream(&stream);

/* and now this assert will fail */
assert(pos + len == boost::iostreams::position_to_offset(stream.tellp()));

Maybe I'm missing something here, but shouldn't the value returned by
ByteSize() be the same as the actual number of bytes written to the
stream?



Re: Message::ByteSize() wrong

2009-10-07 Thread Brenden Matthews
Here you go...attached is an example that fails quite reliably (for me).
Compile like so:

mkdir build
cd build
cmake ../
make
./test-pb

On Wed, Oct 7, 2009 at 4:49 PM, Kenton Varda <ken...@google.com> wrote:

> ByteSize() definitely returns the right value -- if it didn't, tons of
> stuff would be broken.  Can you provide a complete example program that
> demonstrates your problem?






pbtest.tar.bz2
Description: BZip2 compressed data


Re: Message::ByteSize() wrong

2009-10-07 Thread Kenton Varda
This example is too big for me to debug.  Can't you reproduce this with a
10-line program + proto file?
On Wed, Oct 7, 2009 at 6:11 PM, Brenden Matthews <bren...@diddyinc.com> wrote:

> Oops, had some bad math in that last sample.  This is more correct (but
> still fails).










Re: Message::ByteSize() wrong

2009-10-07 Thread Brenden Matthews
On Wed, Oct 7, 2009 at 6:48 PM, Henner Zeller <h.zel...@acm.org> wrote:

> Still haven't run it (I only seem to have a too-old cmake; a simple
> Makefile with a .proto and .cc file showing the problem would be
> better. Please strip down the example if someone should help debugging
> it ;) )
>
> Anyway, it seems that in write_message() you try to write a header with
> some magic number and the size:
>
>     stream << magic8;
>     stream << (uint8_t)data_types::PB_DATA;
>     stream << _htole32(message.GetCachedSize());
>
> ... but wouldn't this write the output in decimal instead of the binary
> that you intend? So it will not be exactly 4 bytes. Better to write
> this in binary ;)
>
> (And BTW, it is not a good idea to use system macros such as
> _htole32(); better to use htonl().)

Indeed, except I need it to work on Windows with MSVC and MinGW.  I decided
after much fussing around to just roll my own.
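
The point about decimal versus binary output can be illustrated with a small sketch. It uses Python only as an analogy for the C++ stream code quoted above: str() plays the role of formatted stream insertion, and struct.pack with an explicit "<" produces a little-endian 32-bit value on any platform, which is one way to avoid depending on system byte-order macros. The function names are hypothetical.

```python
import struct


def header_as_text(size: int) -> bytes:
    """What formatted output produces: decimal digits, variable width."""
    return str(size).encode("ascii")


def header_as_binary(size: int) -> bytes:
    """A fixed 4-byte little-endian unsigned integer, whatever the host order."""
    return struct.pack("<I", size)
```

A size of 7 becomes one byte of text and a size of 70000 becomes five, while the binary form is always four bytes. Any bookkeeping that assumes a 4-byte size field will therefore disagree with the stream position when the size was written as text.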
