Re: [protobuf] incompatible type changes philosophy

2012-05-08 Thread Daniel Wright
On Tue, May 8, 2012 at 4:42 PM, Jeremy Stribling st...@nicira.com wrote:

 I'm working on a project to upgrade- and downgrade-proof a distributed
 system that uses protobufs to communicate data between instances of a C
 ++ program.  I'm trying to cover all possible cases for data schema
 changes between versions of my programs, and I was hoping to get some
 insight from the community on what the best practice is for the
 following tricky scenario.

 To reduce serialization time and protobuf message size, the format of
 a field in a message is changed between incompatible types.  For
 example, a string field gets changed to an int, or perhaps a field
 gets changed from one message type to another.  Because this is being
 done as an optimization, it makes no sense to keep both versions of
 the data around, so I think whether we change the field ID is not
 relevant -- we only ever want to have one version of the field in any
 particular protobuf.


Even though you don't keep both versions of the data around, you should
keep both fields around, and have the code be able to read from whichever
is set during the transition.  You can rename the old one (say put
deprecated in the name) so that people know that it's old, but don't
actually remove it from the .proto file until no old instances of the proto
remain.  To put it more concretely, say you have

  optional string my_data = 1;

Now you come up with a way to encode it as an int64 instead.  You'd change
the .proto to:

  optional string deprecated_my_data = 1;
  optional int64 my_data = 2;

- At this point, you write the data to deprecated_my_data and not
my_data, but when you read, you check has_my_data() and
has_deprecated_my_data() and read from whichever one is present.  It might
help to write wrapper functions for reading and writing during the
transition if the field is accessed in many places.

- Once all instances of the program have been re-compiled so they all know
about the new int64 field, you can start writing to my_data and not
deprecated_my_data.

- Once all of the instances of the program have been recompiled again, you
can remove the code that reads deprecated_my_data, and delete the field.

This is kind of painful, but it's much cleaner than adding a version
number.  It also only ever writes the data to one field, so there's no
bloat during the transition.

Daniel
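
The read-from-whichever-is-set logic can be sketched in plain C++. This is a
toy stand-in for the generated class, not the real protobuf API; the field
names mirror the example above, and std::stoll stands in for whatever real
string-to-int decoding the migration uses:

```cpp
#include <cstdint>
#include <string>

// Toy stand-in for the generated class during the transition, mirroring:
//   optional string deprecated_my_data = 1;
//   optional int64  my_data = 2;
struct MyProto {
  bool has_my_data = false;
  int64_t my_data = 0;
  bool has_deprecated_my_data = false;
  std::string deprecated_my_data;
};

// Read wrapper: prefer the new int64 field, fall back to the old string one.
// std::stoll is a placeholder for the migration's real decoding.
int64_t ReadMyData(const MyProto& p) {
  if (p.has_my_data) return p.my_data;
  if (p.has_deprecated_my_data) return std::stoll(p.deprecated_my_data);
  return 0;  // neither field set
}
```

A matching write wrapper would set only deprecated_my_data in the first phase
and only my_data in the second, so callers never touch the fields directly.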

Of course, this makes communicating between versions of the program
 very difficult, and I think it requires there to be some kind of
 translator code to transform the field from one format to the other.
 Ideally, this transformation would be invisible to the rest of the
 program.  One ugly thought I had was to have a version field in every
 message, and then in the autogenerated C++ serialize code, maybe in
 MergePartialCodedFromStream, I could insert a call to an external
 translator program that would transform the input bytes into something
 that could be decoded by the version of the message expected by this
 instance of the program.  I don't think there's an insertion point
 defined for this part of the code, so I'd have to write my own script
 to do it.  The external translator program could be upgraded
 independently of the main program, so older versions would know how to
 intepret the fields of the newer versions.

 I'm wondering if anyone has experience with a scenario like this, and
 if there's a more elegant way to solve it.  If not, what do folks
 think of this business of an external translator program?  Foolish
 nonsense?  Worthy of a proper insertion point?

 Thanks,

 Jeremy

 --
 You received this message because you are subscribed to the Google Groups
 "Protocol Buffers" group.
 To post to this group, send email to protobuf@googlegroups.com.
 To unsubscribe from this group, send email to
 protobuf+unsubscr...@googlegroups.com.
 For more options, visit this group at
 http://groups.google.com/group/protobuf?hl=en.






Re: [protobuf] Re: suggestions on improving the performance?

2012-01-15 Thread Daniel Wright
)
 {
    int item_count = 0;
    while (1)
    {
        CodedInputStream in(raw_input);
        if (!ReadNextRecord(in, instruments))
            break;
        item_count++;
    }
    cout << "Finished reading file. Total " << item_count << " items read." << endl;
 }


 int _tmain(int argc, _TCHAR* argv[])
 {
    GOOGLE_PROTOBUF_VERIFY_VERSION;

    ZeroCopyInputStream *raw_input;
    CodedInputStream *coded_input;
    stdext::hash_set<std::string> instruments;

    string filename = "S:/users/aaj/sandbox/tickdata/bin/hk/2011/2011.01.04.bin";
    int fd = _open(filename.c_str(), _O_BINARY | O_RDONLY);

    if (fd == -1)
    {
        printf("Error opening the file.\n");
        exit(1);
    }

    raw_input = new FileInputStream(fd);
    coded_input = new CodedInputStream(raw_input);

    uint32 magic_no;

    coded_input->ReadLittleEndian32(&magic_no);

    cout << "HEADER: \t" << magic_no << endl;
    cout << "Reading data objects.." << endl;
    delete coded_input;
    cout << td << '\n';

    ReadAllMessages(raw_input, instruments);

    cout << td << '\n';

    delete raw_input;
    _close(fd);
    google::protobuf::ShutdownProtobufLibrary();

    return 0;
 }

 /code


 On Jan 14, 3:37 am, Henner Zeller henner.zel...@googlemail.com
 wrote:
  On Fri, Jan 13, 2012 at 11:22, Daniel Wright dwri...@google.com wrote:
   It's extremely unlikely that text parsing is faster than binary
 parsing on
   pretty much any message.  My guess is that there's something wrong in
 the
   way you're reading the binary file -- e.g. no buffering, or possibly a
 bug
   where you hand the protobuf library multiple messages concatenated
 together.
 
  In particular, the
 object type, object, object type object ..
  doesn't seem to include headers that describe the length of the
  following message, but such a separator is needed.
  (http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre..
 .)
 
It'd be easier to comment if you post the code.
 
   Cheers
   Daniel
 
   On Fri, Jan 13, 2012 at 1:22 AM, alok alok.jad...@gmail.com wrote:
 
   any suggestions? experiences?
 
   regards,
   Alok
 
   On Jan 11, 1:16 pm, alok alok.jad...@gmail.com wrote:
my point is ..should i have one message something like
 
Message Record{
  required HeaderMessage header;
  optional TradeMessage trade;
  repeated QuoteMessage quotes; // 0 or more
  repeated CustomMessage customs; // 0 or more
 
}
 
or rather should i keep my file plain as
object type, object, objecttype, object
without worrying about the concept of a record.
 
Each message in file is usually header + any 1 type of message
 (trade,
quote or custom) ..  and mostly only 1 quote or custom message not
more.
 
what would be faster to decode?
 
Regards,
Alok
 
On Jan 11, 12:41 pm, alok alok.jad...@gmail.com wrote:
 
 Hi everyone,
 
 My program is taking more time to read binary files than the text
 files. I think the issue is with the structure of the binary files
 that i have designed. (Or could it be possible that binary
 decoding is
 slower than text files parsing? ).
 
 Data file is a large text file with 1 record per row. upto 1.2 GB.
 Binary file is around 900 MB.
 
 **
  - Text file reading takes 3 minutes to read the file.
  - Binary file reading takes 5 minutes.
 
 I saw a very strange behavior.
  - Just to see how long it takes to skim through binary file, i
 started reading header on each message which holds the length of
 the
 message and then skipped that many bytes using the Skip()
 function of
 coded_input object. After making this change, i was expecting that
 reading through file should take less time, but it took more than
 10
 minutes. Is skipping not same as adding n bytes to the file
 pointer?
 is it slower to skip the object than read it?
 
 Are their any guidelines on how the structure should be designed
 to
 get the best performance?
 
 My current structure looks as below
 
 message HeaderMessage {
   required double timestamp = 1;
   required string ric_code = 2;
   required int32 count = 3;
   required int32 total_message_size = 4;
 
 }
 
 message QuoteMessage {
 enum Side {
 ASK = 0;
 BID = 1;
   }
   required Side type = 1;
 required int32 level = 2;
 optional double price = 3;
 optional int64 size = 4;
 optional int32 count = 5;
 optional HeaderMessage header = 6;
 
 }
 
 message CustomMessage {
 required string field_name = 1;
 required double value = 2;
 optional HeaderMessage header = 3;
 
 }
 
 message TradeMessage {
 optional double price = 1;
 optional int64 size

Re: [protobuf] Re: suggestions on improving the performance?

2012-01-13 Thread Daniel Wright
It's extremely unlikely that text parsing is faster than binary parsing on
pretty much any message.  My guess is that there's something wrong in the
way you're reading the binary file -- e.g. no buffering, or possibly a bug
where you hand the protobuf library multiple messages concatenated
together.  It'd be easier to comment if you post the code.

Cheers
Daniel
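
The concatenation pitfall Daniel mentions is a framing problem: without a
length prefix the reader cannot tell where one message ends. The framing idea
can be illustrated without the protobuf library at all; this self-contained
toy writes each record as a varint length followed by the payload, the same
shape protobuf's delimited helpers use (names here are made up for the
sketch):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode v as a varint: 7 payload bits per byte, high bit = "more bytes".
void WriteVarint32(std::vector<uint8_t>* out, uint32_t v) {
  while (v >= 0x80) { out->push_back((v & 0x7F) | 0x80); v >>= 7; }
  out->push_back(static_cast<uint8_t>(v));
}

// One record = varint length + payload, so records can be concatenated
// unambiguously.
void WriteDelimited(std::vector<uint8_t>* out, const std::string& payload) {
  WriteVarint32(out, static_cast<uint32_t>(payload.size()));
  out->insert(out->end(), payload.begin(), payload.end());
}

// Returns false at end of buffer; otherwise extracts the next record.
bool ReadDelimited(const std::vector<uint8_t>& in, size_t* pos,
                   std::string* payload) {
  if (*pos >= in.size()) return false;
  uint32_t len = 0;
  int shift = 0;
  while (true) {
    uint8_t b = in[(*pos)++];
    len |= static_cast<uint32_t>(b & 0x7F) << shift;
    if (!(b & 0x80)) break;
    shift += 7;
  }
  payload->assign(reinterpret_cast<const char*>(in.data()) + *pos, len);
  *pos += len;
  return true;
}
```

With real protobufs, the payload would be a serialized message and the reader
would hand exactly `len` bytes to the parser, never the whole remaining file.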

On Fri, Jan 13, 2012 at 1:22 AM, alok alok.jad...@gmail.com wrote:

 any suggestions? experiences?

 regards,
 Alok

 On Jan 11, 1:16 pm, alok alok.jad...@gmail.com wrote:
  my point is ..should i have one message something like
 
  Message Record{
required HeaderMessage header;
optional TradeMessage trade;
repeated QuoteMessage quotes; // 0 or more
repeated CustomMessage customs; // 0 or more
 
  }
 
  or rather should i keep my file plain as
  object type, object, objecttype, object
  without worrying about the concept of a record.
 
  Each message in file is usually header + any 1 type of message (trade,
  quote or custom) ..  and mostly only 1 quote or custom message not
  more.
 
  what would be faster to decode?
 
  Regards,
  Alok
 
  On Jan 11, 12:41 pm, alok alok.jad...@gmail.com wrote:
 
   Hi everyone,
 
   My program is taking more time to read binary files than the text
   files. I think the issue is with the structure of the binary files
   that i have designed. (Or could it be possible that binary decoding is
   slower than text files parsing? ).
 
   Data file is a large text file with 1 record per row. upto 1.2 GB.
   Binary file is around 900 MB.
 
   **
- Text file reading takes 3 minutes to read the file.
- Binary file reading takes 5 minutes.
 
   I saw a very strange behavior.
- Just to see how long it takes to skim through binary file, i
   started reading header on each message which holds the length of the
   message and then skipped that many bytes using the Skip() function of
   coded_input object. After making this change, i was expecting that
   reading through file should take less time, but it took more than 10
   minutes. Is skipping not same as adding n bytes to the file pointer?
   is it slower to skip the object than read it?
 
   Are their any guidelines on how the structure should be designed to
   get the best performance?
 
   My current structure looks as below
 
   message HeaderMessage {
 required double timestamp = 1;
 required string ric_code = 2;
 required int32 count = 3;
 required int32 total_message_size = 4;
 
   }
 
   message QuoteMessage {
   enum Side {
   ASK = 0;
   BID = 1;
 }
 required Side type = 1;
   required int32 level = 2;
   optional double price = 3;
   optional int64 size = 4;
   optional int32 count = 5;
   optional HeaderMessage header = 6;
 
   }
 
   message CustomMessage {
   required string field_name = 1;
   required double value = 2;
   optional HeaderMessage header = 3;
 
   }
 
   message TradeMessage {
   optional double price = 1;
   optional int64 size = 2;
   optional int64 AccumulatedVolume = 3;
   optional HeaderMessage header = 4;
 
   }
 
   Binary file format is
   object type, object, object type object ...
 
   1st object of a record holds header with n number of objects in that
   record. next n-1 objects will not hold header since they all belong to
   same record (same update time).
   now n+1th object belongs to the new record and it will hold header for
   next record.
 
   Any advices?
 
   Regards,
   Alok







Re: [protobuf] Storing protocol buffers: binary vs. text

2011-12-11 Thread Daniel Wright
The main concern with text format is that it doesn't have nearly as good
backwards- and forwards- compatibility as the binary format.  E.g. what
happens if you release your program, and then in a future update want to
remove or rename a field?  The new binary format code would have no trouble
reading the existing data, but if the existing data was in text format it
would be a problem.

On Sat, Dec 10, 2011 at 11:05 AM, Tom Swirly tom.ritchf...@gmail.comwrote:

 Hello, proto-people.  I suspect some of you know me already...

 I'm just finishing a moderately large project that makes extensive use of
 protocol buffers as a storage format for data files for a consumer desktop
 application.  (The protos are working extremely well, of course, and I have
 a really slick object-oriented persistence mechanism with them that's
 really useful, but that's for another day).

 I have a flag that lets me store the protocol buffers either as text
 (using the Print* and Parse* methods from google::protobuf::TextFormat) or
 serialized.  It's of course much easier to keep them as text when I'm
 developing, and since the files are pretty tiny (though there are a lot of
 them) I'm thinking of keeping them as text files even for the first public
 release.

 But this got me thinking.  If I see a file I haven't seen before that
 might be either a binary proto or a text proto, why can't I try to parse it
 as text, and then if that fails, as binary?

 Yes, yes, this has some spiritual dubiousness.  Nothing in the proto
 definition precludes the idea that the binary form of one proto buffer
 cannot be the text form of another.

 And there's certainly the case of the empty file - which could either be
 the text string representing the default protocol buffer, or the binary
 string representing that same protocol buffer.  But in that case, I don't
 care.

 But practically speaking, I don't see how this would not work.  If I try
 to read a binary format as text, then between the wire types and my
 protocol buffer field IDs (which are all less than 32), the text parsing
 has to run into an unprintable byte very soon and terminate...

 Am I right?  It's not a big deal if not...






Re: [protobuf] Storing a Protocol Buffer inside another Protocol Buffer

2011-08-09 Thread Daniel Wright
Both methods that you mention will work and are fairly common.  The first is
generally preferred because you're less likely to have type errors, and
debugging tools can show you more of RPCBuffer (e.g. the rpc library could
print the DebugString of the RPCBuffer in debugging mode, showing exactly
what's in the message).

In terms of performance, the second approach will make one extra string copy
of messageBuffer -- it shouldn't be a significant issue unless the code is
extremely cpu and memory sensitive.

Within Google we handle the unique id issue by using changelist numbers --
our source control system generates unique numbers for every change, so it's
pretty easy to just use those numbers as the field ids.

Daniel
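
The handler-name dispatch in the second approach can be sketched
self-contained (all names here are hypothetical; a real implementation would
ParseFromString() the payload into the callback's own message type rather
than passing raw bytes around):

```cpp
#include <functional>
#include <map>
#include <string>

// Toy envelope matching the second approach: a handler name plus an opaque
// serialized inner message.
struct RPCBuffer {
  std::string handler;
  std::string message_buffer;
};

using Callback = std::function<void(const std::string&)>;

// Services register one callback per handler name at startup.
std::map<std::string, Callback> registry;

// The RPC layer routes the payload without knowing its type; only the
// callback knows which message to parse it into.
void Dispatch(const RPCBuffer& buf) {
  auto it = registry.find(buf.handler);
  if (it != registry.end()) it->second(buf.message_buffer);
}
```

The extra string copy Daniel mentions is the `message_buffer` field itself:
the inner message is serialized into it, and that bytes field is then
serialized again inside the envelope.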

On Tue, Aug 9, 2011 at 7:52 AM, Dave dave.johns...@me.com wrote:

 Hi,

 We have some legacy code, that provides an RPC library for several
 services.  All RPC messages have a handler name, which is used to
 determine which callback, a message should be sent to when it is
 received (when a service starts up it registers 1 or more handlers).
 I have been working towards replacing the marshalling/unmarshalling
 code with google protocol buffers, and currently have messages like
 this:

 message RPCBuffer {
   required string handler = 1;
   extensions 100 to max;
 }

 message example {

extend RPCBuffer {
optional string name = 110;
optional int32 num = 111;
}
 }

 Now the legacy code behaves the same way as before, when it receives a
 message it uses a ParseFrom method to construct the RPCBuffer protocol
 buffer. It then gets the 'handler', and uses it to send the entire
 buffer to the appropriate callback.  The callback then uses the
 extensions API to access data from inside the buffer.

 This works, but I can see it getting messy, as over time the different
 services that extend RPCBuffer, have to make sure that they use unique
 identifiers.


 An alternative approach maybe this:

 message RPCBuffer {
   required string handler = 1;
   required bytes messageBuffer = 2;
 }

 message example {
required string name = 1;
required int32 num = 2;
 }

 The code will still, use the handler name to determine which callback
 to call, but instead of passing the entire protocol buffer, it would
 pass the 'messageBuffer'.  The callback, would then construct the
 'example' protocol buffer, calling ParseFrom and passing in the
 messageBuffer.  The callback, can then access the data members uses
 the usual google protocol buffer API.

 The only problem I have with the second approach, is that there are
 effectively two unmarshalls (ParseFrom() is called twice, once on
 the RPCBuffer, and then once on the example buffer).   Will this
 provide significant overhead in terms of memory and CPU ?

 Any comments/suggestions ?







Re: [protobuf] Dynamic Message

2010-11-05 Thread Daniel Wright
This is exactly what extensions are for -- see
http://code.google.com/apis/protocolbuffers/docs/proto.html#extensions

It would look something like:

message BaseMessage {
  required MsgType type = 1;
  extensions 100 to 10;
}

Then each module would have a message like:

message Msg1 {
  extend BaseMessage {
optional Msg1 msg1 = some_unique_id
  }
  required int32 field = 1;
}

Then the main program can pass the entire BaseMessage to the right module
based on type, and the module can retrieve the parsed extension.

On Fri, Nov 5, 2010 at 10:31 AM, AdamM adammaga...@gmail.com wrote:

 Hello PB Group,

 I am programming a group of programs in C++ that use PB to
 communicate. In my first version of the program I was using a .proto
 file that looked similar to the example 1 below. I would then run a
 switch statement on a MsgType and create the proper message.

 I am now starting to write my second version of the program to include
 a plugin module architecture. My program calls for a core program and
 then multiple modules that are written against a base module. Well in
 my core program I get packets over the network and then parse them
 info a PB Message.

 I would like a way to have some sort of Base Message that the my core
 program would use to parse with the  base message would contain a
 packet packet operation code and the data for the actual message
 structure. I want to program it so the Core program has no idea what
 the actual message structure is. It would pass it to the respective
 module based on the operation code who would then  parse the actual
 message structure because it knows what that structure should be.

 Does anyone have any suggestions how to do this? Any help would be
 much appreciated.

 Example 1.

 enum MsgType {
   MSG1,
   MSG2,
   MSG3
 }

 Message Msg1 {
   required int32 field = 1 ;
 }

 Message Msg2 {
   required int32 field = 1 ;
 }

 Message Msg3 {
   required int32 field = 1 ;
 }

 Message BaseMessage {
   required MsgType type = 1 ;
   optional Msg1 = 2 ;
   optional Msg2 = 3 ;
   optional Msg3 = 4 ;
 }







Re: [protobuf] Re: Dynamic Message

2010-11-05 Thread Daniel Wright
On Fri, Nov 5, 2010 at 2:12 PM, AdamM adammaga...@gmail.com wrote:

 Thank you Daniel is was not aware of this feature in PB I will give it
 a try.


There is a slight gain for smaller numbers, but it's small enough that it's
not worth the maintenance headache of trying to choose unique small numbers.
 Basically, tags with field numbers less than 16 use 1 byte, 16 to 2047 use
two bytes, 2048 to 262143 use 3 bytes, 262144 to 33554431 use 4 bytes, etc.
(you get 7 bits per byte, except for the first byte, which only gets 4 bits;
http://code.google.com/apis/protocolbuffers/docs/encoding.html has the
details).

Note that the actual "extensions 100 to 10;" line in the .proto file
costs nothing -- the gain I'm talking about above is when you encode a
message with a given field number.

One question though: is there any optimization gain you get from using
 extensions 100 to 200; over extensions 100 to 10;?

 On Nov 5, 3:21 pm, Daniel Wright dwri...@google.com wrote:
  This is exactly what extensions are for -- seehttp://
 code.google.com/apis/protocolbuffers/docs/proto.html#extensions
 
  It would look something like:
 
  message BaseMessage {
required MsgType type = 1;
extensions 100 to 10;
 
  }
 
  Then each module would have a message like:
 
  message Msg1 {
extend BaseMessage {
  optional Msg1 msg1 = some_unique_id
}
required int32 field = 1;
 
  }
 
  Then the main program can pass the entire BaseMessage to the right module
  based on type, and the module can retrieve the parsed extension.
 
  On Fri, Nov 5, 2010 at 10:31 AM, AdamM adammaga...@gmail.com wrote:
   Hello PB Group,
 
   I am programming a group of programs in C++ that use PB to
   communicate. In my first version of the program I was using a .proto
   file that looked similar to the example 1 below. I would then run a
   switch statement on a MsgType and create the proper message.
 
   I am now starting to write my second version of the program to include
   a plugin module architecture. My program calls for a core program and
   then multiple modules that are written against a base module. Well in
   my core program I get packets over the network and then parse them
   info a PB Message.
 
   I would like a way to have some sort of Base Message that the my core
   program would use to parse with the  base message would contain a
   packet packet operation code and the data for the actual message
   structure. I want to program it so the Core program has no idea what
   the actual message structure is. It would pass it to the respective
   module based on the operation code who would then  parse the actual
   message structure because it knows what that structure should be.
 
   Does anyone have any suggestions how to do this? Any help would be
   much appreciated.
 
   Example 1.
 
   enum MsgType {
 MSG1,
 MSG2,
 MSG3
   }
 
   Message Msg1 {
 required int32 field = 1 ;
   }
 
   Message Msg2 {
 required int32 field = 1 ;
   }
 
   Message Msg3 {
 required int32 field = 1 ;
   }
 
   Message BaseMessage {
 required MsgType type = 1 ;
 optional Msg1 = 2 ;
 optional Msg2 = 3 ;
 optional Msg3 = 4 ;
   }
 







Re: [protobuf] Message vs MessageLite

2010-10-26 Thread Daniel Wright
Yes -- the serialized format is identical.

On Tue, Oct 26, 2010 at 2:56 PM, Alsmom2005 gundanu...@gmail.com wrote:

 Hi all,

 Is it ok if the serialization is made using the libprotobuf library and
 the deserialization (on the other end) is made using code built with the
 libprotobuf-lite library?  That means 2 .proto files (the only
 difference between those two files is that one contains 'option optimize_for
 = LITE_RUNTIME') .

 Thank you in advance!







Re: [protobuf] outer classname for C++

2010-10-25 Thread Daniel Wright
No -- in C++ the message classes are placed directly in a namespace named
after the package, so there's no outer class.

On Mon, Oct 25, 2010 at 11:42 AM, Paul mjpabl...@gmail.com wrote:

 Hi,

 In the example of the .proto definition in Java, there is a

 option java_outer_classname = "AddressBookProtos";

 Is there an equivalent statement for C++?

 Thanks,
 Paul







Re: [protobuf] serialize to a file using FileOutputStream

2010-10-11 Thread Daniel Wright
I think it's being buffered in the FileOutputStream -- you should be sure to
delete the output streams (in the reverse order that you created them)
before you close the file.

On Mon, Oct 11, 2010 at 1:07 PM, Paul Yang mjpabl...@gmail.com wrote:

 Hi,
 I am new to protocol buffers, and I am trying to serialize a message
 to a file.  I need to serialize the file so that in can be opened in
 Java using parseDelimitedFrom().  However, when I write to the file
 using SerializeToCodedStream or SerializeWithCachedSizes, nothing
 happens to the serializedMessage.bin file, and it stays at 0 bytes.
 Why is nothing being written to the file?  I have attached the code
 below:

 int fd = open("serializedMessage.bin", O_WRONLY);

 google::protobuf::io::ZeroCopyOutputStream* fileOutput = new
 google::protobuf::io::FileOutputStream(fd);

 google::protobuf::io::CodedOutputStream* codedOutput = new
 google::protobuf::io::CodedOutputStream(fileOutput);

 codedOutput->WriteVarint32(message1.ByteSize());
 message1.SerializeToCodedStream(codedOutput);
 // also tried snap1.SerializeWithCachedSizes(codedOutput)

 close(fd);

 Thanks!







Re: [protobuf] What's the state of TextFormat?

2010-10-04 Thread Daniel Wright
TextFormat is used extensively within Google, often for exactly the purpose
you describe.  It should work well, though I'd recommend keeping the files
in UTF8 to avoid localisation issues.

On Mon, Oct 4, 2010 at 5:08 PM, Dan drtet...@gmail.com wrote:

 Hi there.

 I'm wondering what the state of TextFormat support is in protocol
 buffers and if it is and will remain a fully supported option.

 Among other things, I'm thinking of using it as a way of storing
 configuration/state for objects in a message based system in an easy
 to read format that can also be serialised efficiently to binary when
 needed. Does this seem like a reasonable use for PB or could there be
 problems with this approach? Would there be any issues with
 localisation for string fields by doing this?

 Thanks for the advice.







Re: [protobuf] doubt

2010-09-08 Thread Daniel Wright
They're effectively the same -- it's just a style question.  If B only makes
sense in the context of A, I'd go with the second version; otherwise I'd go
with the first.  But the generated code should be identical except for the
qualified name of B.

2010/9/8 alf alberto@gmail.com

 What is the difference, and which is recommended?

 Are they the same?


 message A
 {
   optional string stuff = 1;
   repeated A a = 2;
 }

 message B
 {
   optional string stuff = 1;
 }

 or


 message A
 {
   optional string stuff = 1;
   repeated A a = 2;
   message B
   {
     optional string stuff = 1;
   }
 }












Re: [protobuf] Re: Status of protobufs

2010-09-02 Thread Daniel Wright
See
http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming

Your solution of trying to parse after each byte received is not just slow,
it's completely incorrect.  It's entirely possible, and quite likely, that
if you break the encoded version of a message in half, each half will parse
successfully on its own, but the contents of the message will be incorrect.

On Thu, Sep 2, 2010 at 7:29 AM, Jean-Sebastien Stoezel js.stoe...@gmail.com
 wrote:

 Hello,

 Thanks for the status update. I guess I will be reusing the message
 delimiter I had developed a couple of years ago =].

 Jean-Sebastien



 On Aug 26, 12:49 pm, Evan Jones ev...@mit.edu wrote:
  On Aug 26, 2010, at 12:07 , Jean-Sebastien Stoezel wrote:
 
   More specifically how they are parsed from real time datastreams?
 
  You should manually insert a leading "length of next message" field
  into the data stream. The Java implementation even has shortcut
  methods for this (see below). In C++ you have to implement it
  yourself, but it is only a few lines of code.
 
  See:
 
  http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre...
 
 
  http://code.google.com/apis/protocolbuffers/docs/reference/java/com/g...
 
  Evan
 
  --
  Evan Joneshttp://evanjones.ca/







Re: [protobuf] Performance analysis of RepeatedField versus generated class

2010-08-17 Thread Daniel Wright
How about this way-more-readable variant of option B:

add_double_vector(deal_pb.bucket(i).bucketdouble(j).data().data(),
                  deal_pb.bucket(i).bucketdouble(j).data().size());

This assumes that add_double_vector only needs a const pointer.  If it needs
a non-const pointer, add the appropriate mutable_ prefixes.

On Tue, Aug 17, 2010 at 8:51 AM, nirajshr niraj...@gmail.com wrote:

 I want to know which of the following implementation is better in
 terms of performance and usability. Option A is much easier and
 portable across different protobuf versions. Is it worth the trouble
 trying to go with Option B? In my tests, I could not find much of a
 performance gain by using Option B.

 Option A uses the protobuf compiler generated class to loop through
 each elements of a repeated doubles.
 On the other hand, Option B uses RepeatedField class to extract a
 pointer to the array of doubles.


 Option A:

 
 int size = bucket.bucketdouble(j).data_size();
 double* doubleVector = new double[size];
 for(int k=0; k<size; k++) {
doubleVector[k] = bucket.bucketdouble(j).data(k);
 }
 // Custom function that takes arrays of doubles
 add_double_vector(doubleVector, size);
 delete [] doubleVector;


 Option B:

 
 using namespace google::protobuf;
 DoubleData* doubledata_m =
 deal_pb.mutable_bucket(i)->mutable_bucketdouble(j);
 RepeatedField<double>* doublearray_m = doubledata_m->mutable_data();
 // Custom function that takes arrays of doubles
 add_double_vector(doublearray_m->mutable_data(),
 doublearray_m->size());



 My .proto file looks something like this:

 
 message DoubleData {
 required string name = 1 [default = "noname"];
repeated double data = 2;
 }

 message Bucket {
   repeated DoubleData bucketDouble = 2;
 }







Re: [protobuf] Encoding/Decoding of data - Question on CodedInputStream CodedOutputStream

2010-08-16 Thread Daniel Wright
I'm not completely sure I understand your question, but if you're asking
about the difference between writeTo(OutputStream) and
writeTo(CodedOutputStream), they're the same -- writeTo(OutputStream) just
wraps the OutputStream in a CodedOutputStream and writes to that.  Here's
the code:

  public void writeTo(final OutputStream output) throws IOException {
final int bufferSize =
CodedOutputStream.computePreferredBufferSize(getSerializedSize());
final CodedOutputStream codedOutput =
CodedOutputStream.newInstance(output, bufferSize);
writeTo(codedOutput);
codedOutput.flush();
  }

Most users can just use OutputStream and let the above wrapper take care of
things for you, unless you're writing lots of tiny messages and the cost of
creating the CodedOutputStream becomes significant, or you're using the
CodedOutputStream to write your own metadata besides the message itself
(e.g. delimiters).

On Mon, Aug 16, 2010 at 8:02 AM, Prakash Rao prakashrao1...@gmail.com wrote:

 Hi,
 I'm new to PB and would like to know whether it is CodedInputStream &
 CodedOutputStream which take care of encoding data while writing into
 streams. In a few APIs I'm directly using the InputStream & OutputStream
 taken from the HTTP URL connection class, and I would like to know if the
 data will be encoded in those cases (not using CodedInputStream &
 CodedOutputStream for those APIs). What other advantages do
 CodedInputStream & CodedOutputStream have compared to the plain
 InputStream & OutputStream taken from an HTTP URL connection?

 Regards,
 Prakash







Re: [protobuf] Multiple messages

2010-07-23 Thread Daniel Wright
The standard solution for this is to length-delimit your messages in the
file with CodedOutputStream.  So you create a CodedOutputStream for the file
(call it coded_output_stream), and then for each message, something like
(untested):

coded_output_stream.WriteVarint32(message.ByteSize());
message.SerializeToCodedStream(&coded_output_stream);

Then to read them in, for each message:

uint32 size;
coded_input_stream.ReadVarint32(&size);
CodedInputStream::Limit old_limit = coded_input_stream.PushLimit(size);
message.ParseFromCodedStream(&coded_input_stream);
coded_input_stream.PopLimit(old_limit);

2010/7/23 Julian González julian@gmail.com

 Hi,

 I am a newbie with Protocol Buffers. I think I can use it in my
 application because right now I am using a CSV file, and it takes a
 while to generate the file; it is also very big (huge, above
 2 GB), so I thought switching to a binary format would be better. I
 found Protocol Buffers and I really like it; it is easy and flexible,
 but I could not find how to write several messages of the same type in
 a single file. In my application I write samples, millions of samples,
 and every sample contains about 6 different fields. I would like to
 use Protocol Buffers, but so far I could not find a way to write and
 read several messages. Can somebody help me?







Re: [protobuf] [protobuff] How to set a field of Message type? (C++)

2010-07-07 Thread Daniel Wright
In C++ you use the mutable_ accessor to set the value.  So for example, you
could do:
  my_message.mutable_auth_resp_msg()->set_foo(1);

On Tue, Jul 6, 2010 at 3:01 PM, Maxim Leonovich lm.b...@gmail.com wrote:

 I have a protocol like that:

 message MSG {
enum MessageType {
//One enum value is needed for each message type
AUTHORIZATION_REQUEST = 1;
AUTHORIZATION_RESPONCE = 2;
}
required MessageType type = 1;
required int32 proto_version = 2 [default = 1];

//One value for each message
optional AuthorizationRequest auth_req_msg = 3;
optional AuthorizationResponce auth_resp_msg = 4;
 }

message AuthorizationRequest {
required string login = 1;
required string password = 2;
 }

 message AuthorizationResponce {
enum Result {
ACCEPTED = 1;
INVALID_LOGIN_PASS = 2;
ACCOUNT_SUSPENDED = 3;
REJECTED = 4;
}
required Result result = 1;
optional string message = 2;
 }


 But in generated C++ class I have setters only for type and
 proto_version. For auth_msg_req and auth_msg_resp I have only

 // optional .proto.AuthorizationResponce auth_resp_msg = 4;
  inline bool has_auth_resp_msg() const;
  inline void clear_auth_resp_msg();
  static const int kAuthRespMsgFieldNumber = 4;
   inline const ::proto::AuthorizationResponce& auth_resp_msg() const;
  inline ::proto::AuthorizationResponce* mutable_auth_resp_msg();

 But there is no setter here.
 OK. How can I set a value for auth_resp and auth_req?
  P.S. I have this problem only in C++. In Java all setters exist.







Re: [protobuf] Tcl: decoding a serialized ProtoBuf

2010-05-26 Thread Daniel Wright
The wire format is documented in
http://code.google.com/apis/protocolbuffers/docs/encoding.html

Daniel

On Wed, May 26, 2010 at 9:11 AM, nedbrek nedb...@yahoo.com wrote:

 Hello all,
   I am interested in decoding a ProtoBuf I read off the network
 (080010001800220100) in Tcl.  Full PB support for Tcl would be nice
 to have, but in the near term, I just want to know what was in the
 message.

   What documentation should I read for this?

 Thanks,
 Ned







Re: [protobuf] File size of the serialized records

2010-03-22 Thread Daniel Wright
The most likely cause is a bug in your code where there's something you
aren't clearing each time you write a record, so at each iteration in your
loop, the record you're writing is getting bigger.  Of course I can't say
for sure without seeing the code.

Daniel

On Mon, Mar 22, 2010 at 1:13 PM, Vinit Mahedia shortempe...@gmail.com wrote:

 Hi Jason,

 Thanks for the quick reply.

 I am not surprised by the increase in file size, but I am under the
 impression that if I insert the same record a thousand times, the size
 of the file should grow accordingly.

 E.g., assume that one record generates a file of 32 bytes; 1024
 records should then sum up to about 32K, but they do not, and that is
 why I am surprised. The growth in size is not linear, and that was the
 reason I posted my findings. I am a student, so I might be missing a
 small concept here; if so, apologies in advance for taking your time.

 Once again appreciate your help,



 On Mon, Mar 22, 2010 at 12:42 PM, Jason Hsueh jas...@google.com wrote:

 If you're measuring using sizeof(), you won't account for memory allocated
 by subobjects (strings and submessages are stored as pointers). You should
 use Message::SpaceUsed() instead. The in-memory vs. serialized size is going
 to depend on your proto definition and how you use it. If you have a lot of
 optional fields, but only set one of them, the serialized size will likely
 be much smaller than the in memory size. If you have lots of large strings,
 they're probably going to be pretty close since both sizes will be dominated
 by the actual bytes of the strings.

 It sounds like you are surprised that the serialized size increases as you
 increase the number of records. What exactly do you expect to happen here?


 On Mon, Mar 22, 2010 at 12:15 PM, Vinit shortempe...@gmail.com wrote:

 I was testing to see the upper limit for numbers of records in one
 file.
 I used the addressbook example, and I noticed that for one record
 it generates file double the size.

 for ex. size of the class I was putting into it was 48 bytes and the
 file
 was of 97 bytes on ubuntu 9.10.

 Now, I go test it with 1000 records bang! it goes many fold and with
 records in hundreds of thousands, file size increases in many folds.

 Has anyone investigated around this area ? I did not note down the
 exact numbers as I thought someone should already have done it.

 Please let me know if you want the detail test numbers, I can run
 through it again and provide you with information.






 --
 Vinit






Re: [protobuf] PreProcessor for TextFormat files

2010-03-17 Thread Daniel Wright
You might want to consider something like GNU m4 as a preprocessor to your
config files.  I've never used it for proto files, but used it successfully
for other things -- it lets you define macros and evaluate expressions.
 See http://en.wikipedia.org/wiki/M4_(computer_language)

On Wed, Mar 17, 2010 at 10:09 AM, Kenton Varda ken...@google.com wrote:

 If you're asking whether text format supports expression evaluation, the
 answer is no.  Implementing this would probably add more complication to the
 parser than it already has, and it would never be good enough to satisfy
 everyone.  If you need computed values, you should write code in a real
 programming language to do the computation.


 On Mon, Mar 15, 2010 at 4:40 PM, nicksun nick...@gmail.com wrote:

 I've been using protobuf as configuration files and status messages to
 deliver to various compute nodes.  The thought was to eventually
 replace our nasty #DEFINE X_PARAM 1020 with elegantly disseminated
 protobuf messages read from a human readable and editable file.

 We've gotten to the point where we're trying to determine relations
 between two ProtoBuf values such as the following preprocessor
 definition:

 #DEFINE Y_PARAM X_PARAM*Z_PARAM

 Is there a solution within protobuf that would allow the TextFormatter
 to parse the explicit message and produce the appropriate message for
 serialization?  Thanks in advance.








Re: [protobuf] Re: Protocol Buffers using Lzip

2009-12-11 Thread Daniel Wright
As I'm sure you can imagine, we store a lot of data in protocol buffer
format at Google, so we often want to store very large files with many
serialized protocol buffers.  The technique we use is to batch a bunch of
records together, compress them and write the compressed block to the file
(with checksumming and positioning markers).  Then if you want to read a
specific record, you need to decompress one block.  That doesn't take much
longer than the disk seek, so it's not a problem unless you have huge
blocks.

The code we use to do this and the file format are honestly a bit of a mess
(they've grown slowly over many years), so they're not suitable to be
open-sourced.  It certainly makes sense to have an open source library to do
this, and it sounds similar to what your code is aiming at.  But I agree
with Kenton that it should not be part of the Protocol Buffer library -- it
should be a separate project.  It doesn't even need to be directly connected
to Protocol Buffers -- you can use the same format for any kind of record.

Daniel

On Fri, Dec 11, 2009 at 2:20 PM, Jacob Rief jacob.r...@gmail.com wrote:

 Hello Chris,

 2009/12/10 Christopher Smith cbsm...@gmail.com:
  One compression algo that I thought would be particularly useful with PB's
  would be LZO. It lines up nicely with PB's goals of being fast and compact.
  Have you thought about allowing an integrated LZO stream?
 
  --Chris

 My goal is to compress huge amounts (>5 GB) of small serialized chunks
 (~150...500 bytes) into a single stream, while still being able to
 randomly access each part of it without having to decompress the whole
 stream. GzipOutputStream (with level 5) reduces the size to about 40%
 compared to the uncompressed binary stream, whereas my
 LzipOutputStream (with level 5) reduces the size to about 20%. The
 difficulty with gzip is finding synchronizing boundaries in the stream
 during decompression.
 If your aim is to exchange small messages, say by RPC, then a fast but
 less efficient algorithm is the right choice. If however you want to
 store huge amounts of data permanently, your requirements may be
 different.

 In my opinion, generic streaming classes such as
 ZeroCopyIn/OutputStream, shall offer different compression algorithms
 for different purposes. LZO has advantages if used for communication
 of small to medium sized chunks of data. LZMA on the other hand has
 advantages if you have to store lots of data for a long term. GZIP is
 somewhere in the middle. Unfortunately Kenton has another opinion
 about adding too many compression streaming classes.

 Today I studied the API of LZO. From what I have seen, I think one
 could implement two LzoIn/OutputStream classes. LZO compression
 however has a small drawback, let me explain why: The LZO API is not
 intended to be used for streams. Instead it always compresses and
 decompresses a whole block. This is different behaviour than gzip and
 lzip, which are intended to compress streams. A compression class has
 a fixed sized buffer of typically 8 or 64kB. If this buffer is filled
 with data, lzip and gzip digest the input and you can start to fill
 the buffer from the beginning. On the other hand, the LZO compressor
 has to compress the whole buffer in one step. The next block then has
 to be concatenated with the already compressed data, which means that
 during decompression you have to fiddle these chunks apart.

 If your intention is to compress a chunk of data with, say less than
 64kB each, and then to put it on the wire, then LZO is the right
 solution for you. For my requirements, as you will understand now, LZO
 does not really fit well.
 If there is a strong interest in an alternative Protocol Buffer
 compression stream, don't hesitate to contact me.

 Jacob









Re: [protobuf] Re: protoc generated .h file has errors in VS2008

2009-12-10 Thread Daniel Wright
Maybe ask the compiler for the pre-processed output (sorry, I don't know the
VS2008 flag), and search through that for the first appearance of those
identifiers.  That'll tell you what header they come from.

Adding a package should fix this kind of problem by putting all of the
messages in a c++ namespace.

On Thu, Dec 10, 2009 at 3:36 PM, bayWeiss bzwr...@yahoo.com wrote:

 Ok, some more info.. thanks to your help

 So I renamed Arc to ArcShape in the proto file, for example, and it all
 works fine.
 Funny thing is, if I rename Square to Polyline, I get the same
 errors.

 If I rename Polyline to PolyLine, all is OK.

 Using Ellipse throws errors also...

 So I guess I can't pin down exactly where the conflict is, but there
 must be some geometry objects included with the C++ console project
 type I created...

 thanks!

 On Dec 10, 4:36 pm, Kenton Varda ken...@google.com wrote:
  The errors suggest to me that somewhere you have #defined Arc as a macro.
   Is this possible?
 
 
 
  On Thu, Dec 10, 2009 at 9:21 AM, bayWeiss bzwr...@yahoo.com wrote:
   I cant post the whole generated .h file, but below is a snippet where
   for some reason the compiler cant identify the Arc class. The
   declaration for Arc is definitely above the below c++ snippet, but it
   acts as if it cant recognize it.
 
   ---the errors are from VS2008--
 
   error C4430: missing type specifier - int assumed. Note: C++ does not
   support default-int c:\googleprototest\googleprototestcppserver
   \person.pb.h669 GoogleProtoTestCPPServer
   error C2143: syntax error : missing ';' before ''
  c:\googleprototest
   \googleprototestcppserver\person.pb.h   669
   error C2838: 'Arc' : illegal qualified name in member declaration
 c:
   \googleprototest\googleprototestcppserver\person.pb.h   669
   error C4430: missing type specifier - int assumed. Note: C++ does not
   support default-int c:\googleprototest\googleprototestcppserver
   \person.pb.h669
 
   -here is the generated header snippet from 2.2.0a protoc, within
   the Entity class declaration
 
// optional .Square square = 3;
inline bool has_square() const;
inline void clear_square();
static const int kSquareFieldNumber = 3;
 inline const ::Square& square() const;
inline ::Square* mutable_square();
 
// optional .Arc arc = 4;
inline bool has_arc() const;
inline void clear_arc();
static const int kArcFieldNumber = 4;
 inline const ::Arc& arc() const;  // <- the problem is here, line 669
 inline ::Arc* mutable_arc();      // and here as well
 
   ---here is the .proto file-
 
   message Point
   {
  required double X = 1;
  required double Y = 2;
   }
 
   message Circle
   {
  required Point point1 = 1;
  required Point point2 = 2;
  required Point point3 = 3;
  required Point point4 = 4;
   }
 
   message Square
   {
  repeated Point pointList = 1;
   }
 
   message Arc
   {
  required Point center = 1;
  required double startAngle = 2;
  required double endAngle = 3;
   }
 
   message Parameter
   {
  required double somethingHere = 1;
   }
 
   message Entity
   {
  enum Type { CIRCLE = 1; SQUARE = 2; ARC = 3; }
 
  // Identifies which entity it is.
  required Type type = 1;
 
  optional Circle circle = 2;
  optional Square square = 3;
  optional Arc arc = 4;
   }
 
   message Job
   {
  repeated Entity entity = 1;
   }
 
   message Initialize
   {
   }
 
   message Start
   {
   }
 
   message MainMessage
   {
  enum Type { NEW_JOB = 1; INITIALIZE = 2; START = 3;}
 
  // Identifies which field is filled in.
  required Type type = 1;
 
  // One of the following will be filled in.
  optional Job job = 2;
  optional Initialize initialize = 3;
  optional Start start = 4;
   }
 





--

You received this message because you are subscribed to the Google Groups