Re: [protobuf] New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-08 Thread Evan Jones
I read this proposal fairly carefully and thought about it for a
couple of days. I think something like this might solve the problem
that many people have with streams of messages. However, I have a
couple of questions about the design:



* It seems to me that this will solve the problem for people who know  
statically at compile time what types they need to handle from a  
stream, so they can define the stream type appropriately. Will users  
find themselves running into the case where they need to handle  
generic messages, and end up needing to roll their own stream  
support anyway?


I ask this question because I built my own RPC system on top of  
protocol buffers, and in this domain it is useful to be able to pass  
unknown messages around, typically as unparsed byte strings. Hence,  
this streams proposal wouldn't be useful to me, so I'm just wondering:  
am I an anomaly here, or could it be that many applications will find  
themselves needing to handle any protocol buffer message in their  
streams?



> The Visitor class has two standard implementations:  Writer and
> Filler.  MyStream::Writer writes the visited fields to a
> CodedOutputStream, using the same wire format as would be used to
> encode MyStream as one big message.


Imagine I wanted a different protocol, e.g. one that checksums each
message, or maybe compresses them. Will I need to
subclass MessageType::Visitor for each stream that I want to encode?  
Or will I need to change the code generator? Maybe this is an unusual  
enough need that the design doesn't need to be flexible enough to  
handle this, but it is worth thinking about a little, since features  
like being able to detect broken streams and resume in the middle  
are useful.
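
To make the checksum idea concrete, here is a minimal Python sketch of one possible framing. The frame layout (4-byte length, 4-byte CRC-32, payload) and the names encode_frame and decode_frames are my own assumptions for illustration; they are not part of the proposal or of the protobuf library. The payload is any already-serialized message.

```python
import struct
import zlib

# Hypothetical frame layout (an assumption, not a protobuf format):
#   4-byte big-endian payload length | 4-byte big-endian CRC-32 | payload
_HEADER = struct.Struct(">II")


def encode_frame(payload: bytes) -> bytes:
    """Wrap one already-serialized message in a length + CRC-32 header."""
    return _HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_frames(data: bytes):
    """Yield payloads in order; raise ValueError at the first damaged frame."""
    offset = 0
    while offset < len(data):
        if offset + _HEADER.size > len(data):
            raise ValueError(f"truncated header at offset {offset}")
        length, expected_crc = _HEADER.unpack_from(data, offset)
        start = offset + _HEADER.size
        payload = data[start:start + length]
        if len(payload) != length:
            raise ValueError(f"truncated payload at offset {offset}")
        if zlib.crc32(payload) != expected_crc:
            raise ValueError(f"checksum mismatch at offset {offset}")
        yield payload
        offset = start + length
```

A checksum like this only detects damage. Resuming after a damaged frame needs some extra way to find the next frame boundary.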


Thanks!

Evan

--
http://evanjones.ca/

--
You received this message because you are subscribed to the Google Groups Protocol 
Buffers group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-08 Thread Evan Jones

On Feb 8, 2011, at 13:34, Kenton Varda wrote:
> I handle user messages by passing them as bytes, embedded in my
> own outer message.


This is what I do as well, as does protobuf-socket-rpc:

http://code.google.com/p/protobuf-socket-rpc/source/browse/trunk/proto/rpc.proto


I guess I was thinking that if you already have to do some sort of  
lookup of the message type that is stored in that byte blob, then  
maybe you don't need the streaming extension. For example, you could  
just build a library that produces a sequence of byte strings, which  
the user of the library can then parse appropriately.
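
A rough Python sketch of such a "sequence of byte strings" library is below. It assumes each byte string is prefixed with its length as a base-128 varint (the same integer encoding protobuf uses on the wire); the function names are hypothetical and nothing here calls the protobuf library itself.

```python
import io


def write_varint(out, value: int) -> None:
    """Write a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([byte | 0x80]))  # more bytes follow
        else:
            out.write(bytes([byte]))
            return


def read_varint(inp):
    """Return the next varint, or None at a clean end of stream."""
    result = 0
    shift = 0
    while True:
        b = inp.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        result |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            return result
        shift += 7


def write_message(out, payload: bytes) -> None:
    """Append one length-prefixed byte string to the stream."""
    write_varint(out, len(payload))
    out.write(payload)


def read_messages(inp):
    """Yield each byte string; the caller decides how to parse it."""
    while True:
        length = read_varint(inp)
        if length is None:
            return
        payload = inp.read(length)
        if len(payload) != length:
            raise EOFError("stream ended inside a message")
        yield payload
```

The reader never looks inside a payload, so it works for any message type, including ones unknown when the library was built.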


I see how you are using it though: it is a friendly wrapper around
this simple sequence-of-byte-strings model that automatically parses
each byte string using the tag and the schema message. This might
be useful for some people.


> This is somewhat inefficient currently, as it will require an extra
> copy of all those bytes.  However, it seems likely that future
> improvements to protocol buffers will allow bytes fields to share
> memory with the original buffer, which will eliminate this concern.


Ah cool. I was considering changing my protocol to be two messages:
the first one is the descriptor (e.g. your CallRequest message), then
the second would be the body of the request, which I would then
parse based on the type passed in the CallRequest.
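
As a sketch of that second step, assuming the descriptor carries a type name and the body arrives as raw bytes: the registry, the parse_body name, and the stand-in parsers below are all hypothetical, chosen only to show the lookup.

```python
from typing import Callable, Dict

# Hypothetical registry type: type name from the descriptor -> body parser.
Parser = Callable[[bytes], object]


def parse_body(type_name: str, body: bytes, parsers: Dict[str, Parser]):
    """Parse the body with whichever parser the descriptor's type selects."""
    try:
        parser = parsers[type_name]
    except KeyError:
        raise ValueError(f"no parser registered for {type_name!r}") from None
    return parser(body)


# Stand-in parsers for illustration; real ones would parse protobuf messages.
example_parsers: Dict[str, Parser] = {
    "text": lambda b: b.decode("utf-8"),
    "count": lambda b: int.from_bytes(b, "big"),
}
```

With generated Python classes, the registry values would presumably be each class's parse-from-bytes entry point (FromString in the Python protobuf runtime).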



> Note that I expect people will generally only stream their
> top-level message.  Although the proposal allows for streaming
> sub-messages as well, I expect that people will normally want to
> parse them into message objects which are handled whole.  So, you
> only have to manually implement the top-level stream, and then you
> can invoke some reflective algorithm from there.


Right, but my concern is that I might want to use this streaming API  
to write messages into files. In this case, I might have a file  
containing the FooStream and another file containing the BarStream.  
I'll have to implement both these ::Writer interfaces, or hack the  
code generator to generate it for me. Although now that I think about  
this, the implementation of these two APIs will be relatively trivial...



>> features like being able to detect broken streams and resume in
>> the middle are useful.
> I'm not sure how this relates.  This seems like it should be handled
> at a lower layer, like in the InputStream -- if the connection is
> lost, it can re-establish and resume, without the parser ever
> knowing what happened.


Sorry, just an example of why you might want a different protocol. If  
I've streamed 10e9 messages to disk, I don't want this stream to break  
if there is some weird corruption in the middle, so I want some  
protocol that can resume from corruption.


Evan

--
http://evanjones.ca/




Re: [protobuf] New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-08 Thread Kenton Varda
On Tue, Feb 8, 2011 at 11:23 AM, Evan Jones ev...@mit.edu wrote:

> Sorry, just an example of why you might want a different protocol. If I've
> streamed 10e9 messages to disk, I don't want this stream to break if there
> is some weird corruption in the middle, so I want some protocol that can
> resume from corruption.


Ah, yes.  This isn't an appropriate protocol for enormous files.  It's
more targeted at network protocols.

Although, you might be able to build a decent seekable file protocol on top
of it, by choosing a random string to use as a sync point, then writing that
string every now and then...

  message FileStream {
    repeated string sync_point = 1;

    repeated Foo foo = 2;
    repeated Bar bar = 3;
    ...
  }

When writing, after every few messages, write a copy of sync_point.  Then,
you can seek to an arbitrary position in the file by looking for a nearby
copy of the sync point byte sequence, and starting to parse immediately
after that.  The sync point just needs to be a 128-bit (or so)
cryptographically random sequence, chosen differently for each file, so that
there's no chance that the bytes will appear in the file by accident.
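
The seek-and-resume step might look roughly like the Python sketch below. To stay self-contained it uses a simplified layout rather than the protobuf encoding of FileStream: the raw marker is written before every few records, and each record is a 4-byte length plus payload. That layout and the names write_stream and read_from are assumptions for illustration only.

```python
import os
import struct

SYNC_LEN = 16  # 128 bits


def new_sync_point() -> bytes:
    """Pick a fresh random marker; a different one is chosen per file."""
    return os.urandom(SYNC_LEN)


def write_stream(records, sync: bytes, every: int = 2) -> bytes:
    """Write the marker before every `every` records.

    Hypothetical record layout: 4-byte big-endian length, then payload.
    """
    out = bytearray()
    for i, record in enumerate(records):
        if i % every == 0:
            out += sync
        out += struct.pack(">I", len(record)) + record
    return bytes(out)


def read_from(data: bytes, sync: bytes, position: int):
    """Find the first marker at or after `position`, then yield records."""
    offset = data.find(sync, position)
    if offset < 0:
        return  # no marker after this position, nothing to resume from
    offset += len(sync)
    while offset < len(data):
        if data.startswith(sync, offset):
            offset += len(sync)  # skip later markers between records
            continue
        if offset + 4 > len(data):
            return  # truncated length header
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        if offset + length > len(data):
            return  # truncated payload
        yield data[offset:offset + length]
        offset += length
```

Calling read_from with a position in the middle of the file skips forward to the next marker and continues from there, which is also how a reader could get past a corrupted region.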
