I was thinking about this limitation last week and wondered whether it would be feasible to add a new value type, IOStream. In code, one would just get and set input/output streams. In Java, Message.writeTo would then switch between streaming the in-memory object data and streaming the referenced input stream(s) to the output stream. The toByte* methods and toString would of course still be subject to heap problems with really large data.


There is an API in C++, ParseFromIstream, but is there any similar API in

No, there's no Python equivalent right now.

However, the parsed objects are bigger than the original serialized data, so if the serialized data can't fit in memory, the parsed objects definitely can't either.  In general, protocol buffers are designed to encode small to medium-sized messages, generally less than 1MB (usually much less).  If your data is larger than that, you should split it up into multiple small messages and devise some higher-level container format to wrap them, so that you can parse one message at a time.

In your case, you might try separating the messages from the payload.  That is, remove the blk_data field from Block, and instead write all of the data to the stream *after* the DifferUpload message.  Then on the receiving end, you can parse the whole protocol message first and then use it to write the data directly to the final destination as you read it.

package fileupload;

message Range {
  required uint64 start = 1;
  required uint32 len = 2;
}

message Block {
  required Range r = 1;
  required bytes blk_hash = 2;
  required bytes blk_data = 3;
}

message DifferUpload {
  repeated Block blk = 1;
}
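
The suggestion above (drop blk_data from Block and send the raw bytes after the DifferUpload message) could look roughly like the following on the wire. This is a sketch under assumptions: the 4-byte header length prefix, the names send_upload and receive_upload, and the parse_header callback are all hypothetical, and checking each block against blk_hash is left out. The header is treated as opaque bytes so the example does not depend on generated classes.

```python
import io
import struct

CHUNK = 64 * 1024  # copy payload in pieces so no whole block sits in memory


def _read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended early")
    return data


def send_upload(out, header: bytes, payloads) -> None:
    """Write the serialized header message, then each block's raw data in order."""
    out.write(struct.pack("<I", len(header)))  # hypothetical length prefix
    out.write(header)
    for data in payloads:
        out.write(data)


def receive_upload(inp, parse_header, dest) -> None:
    """Parse the small header first, then stream each block straight to dest.

    parse_header is a caller-supplied function (an assumption of this sketch)
    that turns the header bytes into a list of (start, length) pairs.
    """
    (header_len,) = struct.unpack("<I", _read_exact(inp, 4))
    blocks = parse_header(_read_exact(inp, header_len))
    for start, length in blocks:
        dest.seek(start)
        remaining = length
        while remaining:
            piece = inp.read(min(CHUNK, remaining))
            if not piece:
                raise EOFError("stream ended inside a block")
            dest.write(piece)
            remaining -= len(piece)
```

With the generated Python classes, header would be the SerializeToString() output of a DifferUpload whose Block no longer has blk_data, and parse_header could parse it with ParseFromString and return [(b.r.start, b.r.len) for b in msg.blk]. Only the small header is ever parsed as a protocol message; the bulk data goes directly from the input stream to its destination.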
