Re: [protobuf] New protobuf feature proposal: Generated classes for streaming / visitors
I read this proposal somewhat carefully, and thought about it for a couple days. I think something like this might solve the problem that many people have with streams of messages. However, I was wondering a couple things about the design:

* It seems to me that this will solve the problem for people who know statically at compile time what types they need to handle from a stream, so they can define the stream type appropriately. Will users find themselves running into the case where they need to handle generic messages, and end up needing to roll their own stream support anyway?

I ask this question because I built my own RPC system on top of protocol buffers, and in this domain it is useful to be able to pass unknown messages around, typically as unparsed byte strings. Hence, this streams proposal wouldn't be useful to me, so I'm just wondering: am I an anomaly here, or could it be that many applications will find themselves needing to handle any protocol buffer message in their streams?

> The Visitor class has two standard implementations: Writer and Filler. MyStream::Writer writes the visited fields to a CodedOutputStream, using the same wire format as would be used to encode MyStream as one big message.

Imagine I wanted a different protocol. Eg. I want something that checksums each message, or maybe compresses them, etc. Will I need to subclass MessageType::Visitor for each stream that I want to encode? Or will I need to change the code generator? Maybe this is an unusual enough need that the design doesn't need to be flexible enough to handle this, but it is worth thinking about a little, since features like being able to detect broken streams and resume in the middle are useful.

Thanks!

Evan

--
http://evanjones.ca/

--
You received this message because you are subscribed to the Google Groups Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
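
The "checksums each message" idea in the message above could look roughly like the following. This is only a sketch and not part of the proposal: it does not use the generated Visitor or Writer classes, and the frame layout (length, CRC-32, payload) and the names `write_frame`, `read_frames`, and `CorruptFrame` are assumptions made for illustration. Payloads are treated as opaque, already-serialized messages.

```python
import struct
import zlib

# Hypothetical frame layout (an assumption, not the proposal's wire format):
#   4-byte big-endian payload length | 4-byte CRC-32 of payload | payload
_HEADER = struct.Struct(">II")


class CorruptFrame(Exception):
    """Raised when a frame is truncated or fails its checksum."""


def write_frame(out, payload: bytes) -> None:
    """Write one checksummed frame to a binary file-like object."""
    out.write(_HEADER.pack(len(payload), zlib.crc32(payload)))
    out.write(payload)


def read_frames(inp):
    """Yield payloads until a clean end of input; raise CorruptFrame on damage."""
    while True:
        header = inp.read(_HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < _HEADER.size:
            raise CorruptFrame("truncated header")
        length, expected_crc = _HEADER.unpack(header)
        payload = inp.read(length)
        if len(payload) != length or zlib.crc32(payload) != expected_crc:
            raise CorruptFrame("bad length or checksum")
        yield payload
```

Note that this only detects a broken stream. Resuming in the middle needs something extra, because once a length field is damaged the reader no longer knows where the next frame starts.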
Re: [protobuf] New protobuf feature proposal: Generated classes for streaming / visitors
On Feb 8, 2011, at 13:34, Kenton Varda wrote:

> I handle user messages by passing them as bytes, embedded in my own outer message.

This is what I do as well, as does protobuf-socket-rpc:

http://code.google.com/p/protobuf-socket-rpc/source/browse/trunk/proto/rpc.proto

I guess I was thinking that if you already have to do some sort of lookup of the message type that is stored in that byte blob, then maybe you don't need the streaming extension. For example, you could just build a library that produces a sequence of byte strings, which the user of the library can then parse appropriately. I see how you are using it though: it is a friendly wrapper around this simple sequence of byte strings model, that automatically parses that byte string using the tag and schema message. This might be useful for some people.

> This is somewhat inefficient currently, as it will require an extra copy of all those bytes. However, it seems likely that future improvements to protocol buffers will allow bytes fields to share memory with the original buffer, which will eliminate this concern.

Ah cool. I was considering changing my protocol to be two messages: the first one is the descriptor (eg. your CallRequest message), then the second would be the body of the request, which I would then parse based on the type passed in the CallRequest.

> Note that I expect people will generally only stream their top-level message. Although the proposal allows for streaming sub-messages as well, I expect that people will normally want to parse them into message objects which are handled whole. So, you only have to manually implement the top-level stream, and then you can invoke some reflective algorithm from there.

Right, but my concern is that I might want to use this streaming API to write messages into files. In this case, I might have a file containing the FooStream and another file containing the BarStream. I'll have to implement both these ::Writer interfaces, or hack the code generator to generate it for me. Although now that I think about this, the implementation of these two APIs will be relatively trivial...

>> features like being able to detect broken streams and resume in the middle are useful.
>
> I'm not sure how this relates. This seems like it should be handled at a lower layer, like in the InputStream -- if the connection is lost, it can re-establish and resume, without the parser ever knowing what happened.

Sorry, just an example of why you might want a different protocol. If I've streamed 10e9 messages to disk, I don't want this stream to break if there is some weird corruption in the middle, so I want some protocol that can resume from corruption.

Evan

--
http://evanjones.ca/
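
The "sequence of byte strings" model and the descriptor-then-body idea from the message above can be sketched as follows. This is an illustration under stated assumptions, not the proposal's API: each blob is prefixed with its length as a base-128 varint (the same integer encoding the protobuf wire format uses), and every message is written as two blobs, a type name followed by the body. The names `write_blob`, `read_blobs`, `read_typed`, and the `parsers` registry are hypothetical; in real use each parser would be the parse routine of the corresponding generated message class.

```python
def write_varint(out, value: int) -> None:
    """Encode a non-negative int as a base-128 varint, low 7 bits first."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([byte | 0x80]))  # more bytes follow
        else:
            out.write(bytes([byte]))
            return


def read_varint(inp):
    """Return the next varint, or None at a clean end of input."""
    result = shift = 0
    while True:
        b = inp.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_blob(out, blob: bytes) -> None:
    """Write one length-delimited byte string."""
    write_varint(out, len(blob))
    out.write(blob)


def read_blobs(inp):
    """Yield each length-delimited byte string in the input."""
    while True:
        length = read_varint(inp)
        if length is None:
            return
        blob = inp.read(length)
        if len(blob) != length:
            raise EOFError("truncated blob")
        yield blob


def read_typed(inp, parsers):
    """Read (type name, body) blob pairs and parse each body by its type name."""
    blobs = read_blobs(inp)
    for name in blobs:
        body = next(blobs, None)
        if body is None:
            raise EOFError("type name without a body")
        yield parsers[name.decode("utf-8")](body)
```

With this layering the library itself never needs to know the message types; only the `parsers` mapping supplied by the caller does.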
Re: [protobuf] New protobuf feature proposal: Generated classes for streaming / visitors
On Tue, Feb 8, 2011 at 11:23 AM, Evan Jones ev...@mit.edu wrote:

> Sorry, just an example of why you might want a different protocol. If I've streamed 10e9 messages to disk, I don't want this stream to break if there is some weird corruption in the middle, so I want some protocol that can resume from corruption.

Ah, yes. This isn't an appropriate protocol for enormous files. It's more targeted at network protocols.

Although, you might be able to build a decent seekable file protocol on top of it, by choosing a random string to use as a sync point, then writing that string every now and then...

  message FileStream {
    repeated string sync_point = 1;
    repeated Foo foo = 2;
    repeated Bar bar = 3;
    ...
  }

When writing, after every few messages, write a copy of sync_point. Then, you can seek to an arbitrary position in the file by looking for a nearby copy of the sync point byte sequence, and starting to parse immediately after that. The sync point just needs to be a 128-bit (or so) cryptographically random sequence, chosen differently for each file, so that there's effectively no chance that the bytes will appear in the file by accident.
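
The sync-point scheme described above can be sketched without the protobuf library. Everything here is an assumption made for illustration: fields are framed as a 1-byte tag plus a 4-byte length instead of the real protobuf wire format, the tag values and function names are hypothetical, and the reader is handed the sync point directly (in practice it would presumably be read from the first field of the file).

```python
import os
import struct

SYNC_TAG, RECORD_TAG = 1, 2        # hypothetical field numbers
_HEADER = struct.Struct(">BI")     # 1-byte tag, 4-byte big-endian length


def new_sync_point() -> bytes:
    """Return 128 random bits from the OS, to be chosen fresh for each file."""
    return os.urandom(16)


def _field(tag: int, payload: bytes) -> bytes:
    return _HEADER.pack(tag, len(payload)) + payload


def encode_stream(records, sync_point: bytes, every: int = 4) -> bytes:
    """Serialize records, writing the sync point first and after every `every` records."""
    out = [_field(SYNC_TAG, sync_point)]
    for i, record in enumerate(records, start=1):
        out.append(_field(RECORD_TAG, record))
        if i % every == 0:
            out.append(_field(SYNC_TAG, sync_point))
    return b"".join(out)


def parse_from(data: bytes, pos: int):
    """Yield record payloads starting at field boundary `pos`, skipping sync fields."""
    while pos + _HEADER.size <= len(data):
        tag, length = _HEADER.unpack_from(data, pos)
        pos += _HEADER.size
        payload = data[pos:pos + length]
        pos += length
        if tag == RECORD_TAG:
            yield payload


def resume_after(data: bytes, sync_point: bytes, offset: int):
    """Find the first sync point at or after `offset` and parse from just past it."""
    hit = data.find(sync_point, offset)
    if hit < 0:
        return iter(())  # no later sync point, nothing recoverable
    # The sync bytes end exactly at a field boundary, so parsing can restart there.
    return parse_from(data, hit + len(sync_point))
```

For example, `resume_after(data, sync_point, 30)` skips whatever lies between byte 30 and the next copy of the sync point, then yields every record after it. Records between the seek offset and that sync point are lost, so the `every` parameter trades file size against how much data one corrupt region can cost.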