Hi there,

we're using protobuf messages here to do some kind of RPC over HTTP. We now need a service which returns either a metadata structure or a message containing some large binary data.
The caller does not know the return type when it issues the request (e.g. a call getContent(pathname) where pathname is either a directory or a file: for a directory a list of files is returned and for a file the content is returned, both packed into the same structure). No problem so far.

It would be nice to add an attachment without reading its content into memory. But: these binary contents will become very large (hundreds of MBytes), and as I understand the documentation, protobuf is not designed to transfer large binary data (and as the Java developers told me, reading a byte array with protobuf creates 3 copies of it in memory).

So what would be a good solution to this? We thought of the following solutions:

1. Make this a transport issue :)

Replace the buffer in the message which contains the attachment with a unique identifier which references the attachment. Attachments are added to the RpcChannel / RpcService implementation as references to stream instances they can be read from. At transport level the attachments will be delivered as parts of a multipart HTTP request / response, each identified by a Content-Id header. Sure, when attachments are read in the wrong order some of them have to be read into memory, but this should be no problem. This approach does not add full-fledged attachment support to protocol buffers messages, because attachments will be stored externally and methods like 'SerializeToOStream()' will not work anymore because they don't know anything about attachments.

2. Add attachment support to protocol buffers :)

What about adding a new datatype called Attachment, which holds either a reference to a stream or a pointer to a buffer containing the binary data? When a message is serialized, all instances of the Attachment class will be put into a list and appended after the message. The fields of type attachment will become some kind of reference to an attachment (e.g. an integer which marks the attachment index).

Deserializing a message could be done by another method (e.g. parseFromStreamWithoutAttachments) which creates an instance of Attachment for each referenced attachment without loading its content. An application should query the existence of an attachment before accessing it. If it does not exist yet, the application should call 'loadAttachment (attachment_identifier, Attachment*)', which appends the content of the attachment to a previously created instance of Attachment (remember: pointing to a buffer or to a stream).

Methods like parseFromStream or parseFromFile will create Attachment instances automatically and load the content into memory. Of course there should be another method which reads from the stream until the last attachment has been read. Thus we should be able to use all of the common serialization and parsing methods, and if we're interested in handling attachments efficiently, we should use the parseFromStreamWithoutAttachments and loadAttachment methods.

I am not sure how attachments of embedded messages could be handled: should these be serialized at the end of the embedded message itself or at the end of the containing message(s)? Of course, this requires that the application has access to the stream from which the attachments can be read.

What do you think?
- Would this be a solution to the attachment issue and would you like to see this in protocol buffers?
- How would you deal with the issue?

Regards,
Ronny
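
P.S. To make option 2 a bit more concrete, here is a minimal sketch in Python of what the wire layout could look like. Everything in it is an assumption made for illustration: the framing (a 4-byte message length, the message bytes, a 4-byte attachment count, then each attachment with an 8-byte length prefix), the chunk size, and the names write_frame, read_message_without_attachments and load_attachment. None of this is part of the protobuf API, and the serialized message is simply treated as opaque bytes whose attachment fields are assumed to hold indices into the list that follows.

```python
import io
import struct

CHUNK_SIZE = 64 * 1024  # assumed copy granularity; any reasonable size works


def _read_exact(src, n):
    """Read exactly n bytes from src or raise EOFError."""
    buf = bytearray()
    while len(buf) < n:
        chunk = src.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended before %d bytes were read" % n)
        buf.extend(chunk)
    return bytes(buf)


def _copy_exact(src, dst, size):
    """Copy size bytes from src to dst in chunks, never holding more than one chunk."""
    remaining = size
    while remaining:
        chunk = src.read(min(CHUNK_SIZE, remaining))
        if not chunk:
            raise EOFError("attachment stream ended early")
        dst.write(chunk)
        remaining -= len(chunk)


def write_frame(out, message_bytes, attachments):
    """Write the message, then every attachment as a length-prefixed blob.

    attachments is a list of (size, readable_stream) pairs; the message is
    assumed to reference them by their index in this list.
    """
    out.write(struct.pack(">I", len(message_bytes)))
    out.write(message_bytes)
    out.write(struct.pack(">I", len(attachments)))
    for size, src in attachments:
        out.write(struct.pack(">Q", size))
        _copy_exact(src, out, size)


def read_message_without_attachments(src):
    """Return (message_bytes, attachment_count); attachments stay unread in src."""
    (msg_len,) = struct.unpack(">I", _read_exact(src, 4))
    message_bytes = _read_exact(src, msg_len)
    (count,) = struct.unpack(">I", _read_exact(src, 4))
    return message_bytes, count


def load_attachment(src, sink):
    """Stream the next attachment from src into sink and return its size."""
    (size,) = struct.unpack(">Q", _read_exact(src, 8))
    _copy_exact(src, sink, size)
    return size


# Usage: two small in-memory attachments stand in for large files.
wire = io.BytesIO()
write_frame(wire, b"serialized-message",
            [(5, io.BytesIO(b"hello")), (3, io.BytesIO(b"abc"))])
wire.seek(0)

message, count = read_message_without_attachments(wire)
sinks = [io.BytesIO() for _ in range(count)]
sizes = [load_attachment(wire, sink) for sink in sinks]
```

In a real application the sinks would be files or sockets rather than BytesIO objects, so an attachment of several hundred MBytes would only ever pass through memory one chunk at a time. The sketch reads attachments strictly in order and leaves the embedded-message question open.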