Re: Question about parsing Protocol Buffers

2009-03-23 Thread Jim Sermersheim




I was thinking about this limitation last week and wondered whether it
would be feasible to add a new value type, IOStream. In code, one would
simply get and set input/output streams. In Java, Message.writeTo would
then alternate between streaming the in-memory object data and the
referenced input stream(s) to the output stream. toByte* and toString
would, of course, still be subject to heap problems with really large data.

Jim

Kenton Varda wrote:

> On Mon, Mar 23, 2009 at 1:38 AM, ode wrote:
>
> > There is an API in C++, ParseFromIstream, but is there a similar API in
> > Python?
>
> No, there's no Python equivalent right now.
>
> But, the parsed objects are bigger than the original serialized data, so if
> the original serialized data can't fit in memory, then the parsed objects
> definitely can't.  In general, protocol buffers are designed to encode small
> to medium-sized messages, generally less than 1MB (usually much less).  If
> your data is larger than that, you should split it up into multiple small
> messages and devise some higher-level container format to wrap them so you
> can parse one at a time.
>
> In your case, you might try separating the messages from the payload.  That
> is, remove the blk_data field from Block, and instead write all of the data
> to the stream *after* the DifferUpload message.  Then on the receiving end,
> you can parse the whole protocol message first and then use it to write the
> data directly to the final destination as you read it.


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups "Protocol Buffers" group. 
To post to this group, send email to protobuf@googlegroups.com 
To unsubscribe from this group, send email to
protobuf+unsubscr...@googlegroups.com 
For more options, visit this group at
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---
  






Re: Question about parsing Protocol Buffers

2009-03-23 Thread Kenton Varda

On Mon, Mar 23, 2009 at 1:38 AM, ode wrote:

> There is an API in C++, ParseFromIstream, but is there a similar API in
> Python?


No, there's no Python equivalent right now.

But, the parsed objects are bigger than the original serialized data, so if
the original serialized data can't fit in memory, then the parsed objects
definitely can't.  In general, protocol buffers are designed to encode small
to medium-sized messages, generally less than 1MB (usually much less).  If
your data is larger than that, you should split it up into multiple small
messages and devise some higher-level container format to wrap them so you
can parse one at a time.
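
As a rough illustration of the container idea, here is a minimal Python
sketch that wraps each serialized message in a length prefix so a reader
can pull messages off a stream one at a time. The framing (a 4-byte
big-endian length) and the helper names write_framed, read_exact, and
read_framed are assumptions made for this example. They are not part of
the protobuf API, and the payloads are treated as opaque bytes such as the
output of SerializeToString().

```python
import io
import struct

# Assumed framing for this sketch: a 4-byte big-endian length, then that
# many bytes of one serialized message.
_LEN = struct.Struct(">I")


def write_framed(stream, payload):
    """Write one serialized message preceded by its length."""
    stream.write(_LEN.pack(len(payload)))
    stream.write(payload)


def read_exact(stream, n):
    """Read exactly n bytes, raising EOFError if the stream ends early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended in the middle of a record")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_framed(stream):
    """Yield serialized messages one at a time until a clean end of stream."""
    while True:
        header = stream.read(_LEN.size)
        if not header:
            return  # no more records
        if len(header) < _LEN.size:
            header += read_exact(stream, _LEN.size - len(header))
        (length,) = _LEN.unpack(header)
        yield read_exact(stream, length)


if __name__ == "__main__":
    buf = io.BytesIO()
    for payload in (b"first", b"second"):
        write_framed(buf, payload)
    buf.seek(0)
    for payload in read_framed(buf):
        # In real use, each payload would be handed to ParseFromString()
        # on a fresh instance of the generated message class.
        print(len(payload))
```

Only one message needs to be held in memory at a time on the reading
side, which is the point of splitting the data up in the first place.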

In your case, you might try separating the messages from the payload.  That
is, remove the blk_data field from Block, and instead write all of the data
to the stream *after* the DifferUpload message.  Then on the receiving end,
you can parse the whole protocol message first and then use it to write the
data directly to the final destination as you read it.
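
One way this could look in Python is sketched below, under several
assumptions: the metadata message is sent first with a 4-byte big-endian
length prefix, the raw block data follows in the same order as the blocks
in the message, and the receiver learns each block's size from the parsed
metadata (for example Range.len). The helper names send_upload,
read_header, and copy_block are hypothetical, and header_bytes stands in
for a DifferUpload serialized after blk_data has been removed from Block.

```python
import io
import struct

CHUNK = 64 * 1024  # copy size; an arbitrary choice for this sketch


def send_upload(out, header_bytes, block_streams):
    """Write the length-prefixed metadata, then each block's raw bytes."""
    out.write(struct.pack(">I", len(header_bytes)))
    out.write(header_bytes)
    for src in block_streams:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            out.write(chunk)


def read_header(inp):
    """Return the serialized metadata bytes that precede the payload.

    Assumes a buffered file-like object whose read(n) returns n bytes
    unless the stream has ended.
    """
    prefix = inp.read(4)
    if len(prefix) != 4:
        raise EOFError("missing header length")
    (length,) = struct.unpack(">I", prefix)
    header = inp.read(length)
    if len(header) != length:
        raise EOFError("header truncated")
    return header


def copy_block(inp, dest, length):
    """Stream exactly `length` payload bytes from inp to dest in chunks."""
    remaining = length
    while remaining > 0:
        chunk = inp.read(min(CHUNK, remaining))
        if not chunk:
            raise EOFError("payload ended early")
        dest.write(chunk)
        remaining -= len(chunk)


if __name__ == "__main__":
    wire = io.BytesIO()
    send_upload(wire, b"metadata", [io.BytesIO(b"aaaa"), io.BytesIO(b"bb")])
    wire.seek(0)
    header = read_header(wire)  # would be parsed as DifferUpload
    for size in (4, 2):         # would come from each Block's Range.len
        dest = io.BytesIO()
        copy_block(wire, dest, size)
        print(dest.getvalue())
```

With this layout the receiver never holds more than one chunk of block
data in memory, regardless of how large the upload is.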






Re: Question about parsing Protocol Buffers

2009-03-23 Thread ode

There is an API in C++, ParseFromIstream, but is there a similar API in
Python?




Question about parsing Protocol Buffers

2009-03-23 Thread ode

Hi,

   I'm going to use protocol buffers for HTTP POST data. It seems
SerializeToString can be used to generate the binary string, but what if
the data is very large? Is all of the data serialized in memory?

   The following is the proto file I defined. It is used for file
upload.

package fileupload;

message Range {
  required uint64 start = 1;
  required uint32 len = 2;
}

message Block {
  required Range r = 1;
  required bytes blk_hash = 2;
  required bytes blk_data = 3;
}

message DifferUpload {
  repeated Block blk = 1;
}


   Any solutions?

  Thanks in advance