Re: Message forwarding and partial parsing

2009-10-08 Thread villintehaspam

Hi Kenton,

Thank you for your quick response and your feedback.

I'm going to use option 3, since as you say it will probably be the
fastest solution and I think it will fit best with our application.
You are probably right that this will not be an issue for most
messages that are going to be forwarded (most will be quite small),
but since the complexity of the different options is roughly the
same, I might as well go with the solution that feels best.

Thanks,
V

On Oct 7, 9:44 pm, Kenton Varda ken...@google.com wrote:
 On Wed, Oct 7, 2009 at 5:46 AM, villintehaspam 
 villintehas...@gmail.comwrote:

  I am wondering about the best way of forwarding received protocol
  buffer messages from one entity to another without having to parse the
  entire message just to serialize it again.

 It looks like you've figured out all the major options.

 One thing I'd encourage you to do if you haven't already is actually profile
 your system to find out if repeated parsing and serialization is a real
 problem for you.  It may not be a real problem in practice even if it feels
 wrong.

  Of these three options, I'm thinking that option 3 is the correct way to go.

 All three options are reasonable.  Option 3 is the most complicated
 solution, but probably the most performant.

  Am I missing some functionality provided by protocol buffers
  (such as the ability to skip parsing extensions even if they are
  recognized or similar or only parse as much as needed)? Am I missing
  any problems?

 If you are using C++, then all compiled-in extensions will be eagerly
 parsed.  If you only compile in the extensions that each process
 actually cares about, that solves your problem.

 In Java you provide an ExtensionRegistry listing extensions you care about,
 so it's trivial to include only the ones you want.  I'm guessing you aren't
 using Java.

  On a somewhat related note, is it possible to parse a partially
  transmitted message and continue parsing at a later time when more
  data is available?

 Not without blocking.  The library is designed to parse an entire message at
 once.  Allowing partial parsing (without blocking) would be quite
 complicated.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Generating .proto files from Java source files

2009-10-08 Thread grasshopper
Hi all, a guy who isn't working at my company anymore defined some
protocol buffer messages that we still use. We need to extend these
messages now, but we don't have the .proto file. Is there a
straightforward way to generate the .proto files from the Java
classes? How could I do this?
Thanks




arrays??

2009-10-08 Thread sergei175


 Hi,

 I've looked at protocol buffers, and I've noticed that there is no
 support for arrays of values (doubles, integers). This is a
 significant drawback; JSON, HDF5, etc. all have this, for example.

 One post suggested putting an array into a single string field.
 I did this, and the performance was very bad in Java and very
 memory-consuming (compared to standard Java serialization). I wrote
 the same array (10,000 double numbers) 500 times, and after array
 500 my computer was out of memory.

 Secondly, all tutorials suggest that the file should be written at
 once, i.e. at the end of the program, once the messages are filled.
 I want to write data to disk in several steps: say I write one
 record first (say, one array), then append data to the existing
 file, and so on, so that I do not need to keep all records in
 memory. The merge mechanism shown in the tutorial seems to parse the
 old file first, then add the new record and write a new file.

 Do I understand this correctly? If yes, then protocol buffers are
 not very good for large data volumes, especially with numerical
 arrays.

 best wishes, Sergei




Re: arrays??

2009-10-08 Thread Constantinos Michael
On Thu, Oct 8, 2009 at 5:35 PM, sergei175 sergei...@googlemail.com wrote:



  Hi,

  I've looked at protocol buffers, and I've noted that there is no
 support for arrays
  of values (double, integers). This is a significant drawback, for
 example
  JSOM, HDF5 etc they all have this.


Have you looked at repeated fields? You can define one like so:

repeated double my_number = 1;







Re: arrays??

2009-10-08 Thread sergei175


Hi,

 This is exactly what I did before putting the arrays into a string.
 When I implemented the arrays via repeated fields, the program was
 even slower, and the file size was too large (compared to the Java
 serialization mechanism + zip). This is why I moved my array into a
 string, thinking that there would be no significant overhead in
 storing such an object. I guess each repeated field carries some
 additional bytes to store its elements.

 Yes, I used [packed=true] for the double field. I did not check what
 happens after removing it (probably the file size will be even
 bigger!).

 cheers, Sergei

On Oct 8, 10:48 am, Marc Gravell marc.grav...@gmail.com wrote:
 For basic types, you can also use packed encoding to reduce the space
 required; just add [packed=true] to a repeated element.

 Marc




Re: arrays??

2009-10-08 Thread Henner Zeller

Hi,

  This is exactly what I've done before putting arrays into a string.
  When I've implemented arrays via repeated fields, the program was
 even slower,
  and the file size was too large (compare to Java serialization
 mechanism+ zip).

If you put the values in a string and do your own array management on
top, as compared to using a repeated field with the packed option,
there should not be a significant difference, because it is
essentially the same thing. Protobufs don't come with compression, so
if you compare sizes, you need to compare compressed Java
serialization with compressed proto serialization.
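In proto2 syntax, such a packed repeated field might be declared like this (the message and field names here are only illustrative):

```proto
message NamedArray {
  required string name = 1;
  // packed=true encodes all values in one length-delimited block,
  // avoiding a separate tag byte per element.
  repeated double value = 2 [packed=true];
}
```

Since each double is a fixed 8 bytes on the wire, the packed form is close in size to raw bytes packed into a string, which is why the two approaches should perform similarly.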

If you provide an example of what you want to do and which current
solutions you are comparing it against, people on this list might be
able to help.

-h




Re: arrays??

2009-10-08 Thread sergei175


 Ok, here is a simple example of a protocol buffers file.
 I want to write 1000 Records. Each record has its name and a
 NamedArray.

 Each array has its name and a set of double numbers. For my example,
 I filled the array with 10,000 numbers for each of the 1000 Records.

 There are three things you will see:

 1) After record 500, even 200MB of memory is not enough.
 2) It's slower by a factor of ~5 compared to Java serialization with
    compression.
 3) The file size is very large. I do not know how to write
    compressed records on the fly using this package.

 Finally, there is not even a sensible approach to appending new
 Records to an existing file (without a merge, which in fact has to
 parse the existing file first!).

 So, I do not see any superiority of Protocol Buffers compared to
 other file formats; it's actually much worse when it comes to such
 situations.


// organize in repeated records
message Record {

  optional string name = 1;

  message NamedArray {
    required string name = 1 [default = "none"];
    repeated double value = 2 [packed=true];
  }
  optional NamedArray array = 2;
}

message PBuffer {
  repeated Record record = 1;
}




Re: arrays??

2009-10-08 Thread Henner Zeller

Hi,
On Thu, Oct 8, 2009 at 10:57, sergei175 sergei...@googlemail.com wrote:


  Ok, this is a simple example of proto buffers file.
  I want to write 1000 Records. Each record has its name and
 NamedArray

  Each array has its name and a set of double numbers,  For my example,
  I've filled array with 10 000 numbers for all 1000 Records.

  There are 2 things you will see:

  1) After event 500, even 200MB memory is not enough.
  2) It's slower by factor ~5 compare to the java serialization with
 the
    compression.

So for Java serialization, you have a class that contains an
ArrayList<NamedArray>, with each NamedArray object containing a
Vector<Double>, and then you serialize the whole ArrayList<NamedArray>
to disk?

  3) File size is very large. I do not know how to fill
     compressed recorsd on fly using this package.

If you want to write independent records, you should write them
delimited to a file rather than putting everything in memory.
Regarding compression: you eventually write the bytes to a stream, so
you can wrap that stream in a GZipOutputStream - I guess that is what
you do with the compressed Java serialization as well.

  Finally, there is no even sensible approach to append new Records
  to the existing file (without merge, which in fact has to parse
 the
  existing file first!)

Protocol buffers don't provide the transport or storage layer; they
provide the encoding. You have to provide the storage yourself. A
simple default implementation might be useful to get started, but many
people would still need to write their own way of storing things.
OTOH, it is only a handful of lines to write it yourself.

For things like this (and it has been discussed many times on this
list), you should write out a delimiter giving the size of the next
record, followed by the record itself. I think something was even
added recently to the API to make this simpler (I don't know, I use my
own implementation ;) )
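A minimal sketch of this length-prefixed framing, using only JDK streams (the 4-byte length prefix here is one workable delimiter choice, not the list's canonical one; the record payloads stand in for serialized protobuf messages):

```java
import java.io.*;

public class DelimitedRecords {
    // Append one record, prefixed by its length, so records can be
    // written one at a time and read back without loading the whole file.
    static void writeRecord(DataOutputStream out, byte[] record) throws IOException {
        out.writeInt(record.length);  // fixed 4-byte length prefix
        out.write(record);
    }

    // Read the next record: length first, then exactly that many bytes.
    static byte[] readRecord(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] record = new byte[len];
        in.readFully(record);
        return record;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeRecord(out, "record one".getBytes("UTF-8"));
        writeRecord(out, "record two".getBytes("UTF-8"));

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(new String(readRecord(in), "UTF-8"));
        System.out.println(new String(readRecord(in), "UTF-8"));
    }
}
```

Appending to an existing file then just means opening the FileOutputStream in append mode and writing the next length-prefixed record; nothing already on disk needs to be parsed.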

-h




Re: arrays??

2009-10-08 Thread Kenton Varda
On Thu, Oct 8, 2009 at 10:57 AM, sergei175 sergei...@googlemail.com wrote:

  1) After event 500, even 200MB memory is not enough.
  2) It's slower by factor ~5 compare to the java serialization with
 the
compression.


Protocol Buffers do not include compression, so to make this comparison fair
you would need to add compression on top of them too.  If your speed is
dominated by file I/O time (likely!) then you might find that this makes
protocol buffers faster.


  3) File size is very large. I do not know how to fill
 compressed recorsd on fly using this package.


Use java.util.zip.GZIPOutputStream.
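A minimal round-trip sketch of that suggestion, assuming the serialized message bytes are already in hand (a real message's writeTo() could target the gzip stream directly):

```java
import java.io.*;
import java.util.zip.*;

public class GzipRoundTrip {
    public static void main(String[] args) throws IOException {
        // Stand-in for a serialized protobuf record.
        byte[] record = "example serialized message bytes".getBytes("UTF-8");

        // Compress: wrap the destination stream in GZIPOutputStream.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(buf);
        gz.write(record);
        gz.finish();  // flush the gzip trailer before reading back

        // Decompress on the way back in.
        GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; ) {
            restored.write(chunk, 0, n);
        }

        System.out.println(new String(restored.toByteArray(), "UTF-8"));
    }
}
```

The same wrapping works around a FileOutputStream when writing records to disk.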


  Finally, there is no even sensible approach to append new Records
  to the existing file (without merge, which in fact has to parse
 the
  existing file first!)


http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming

Protocol Buffers convert between raw bytes and structures.  They are not
intended to provide a mechanism for managing multiple individually-loadable
records.  If you have a very large data set, you need to split that set into
individual records in order to avoid reading/writing the whole thing at
once.  Each individual record can be encoded using protobufs, but you should
not encode the entire file as a protobuf.


  So, I do not see any superiority of Protocol Buffers compare
  to use file formats, it's actually much worst as it come to such
 situations..


By all means, don't use them then.




Re: Generating .proto files from Java source files

2009-10-08 Thread Kenton Varda
Yikes.  That's kind of like someone left you with just .class files without
the .java files.
If you look at the code, though, you will notice that there are comments in
it defining each field, like:
  // optional int32 i = 1;

These should be the exact field definitions as they might appear in the
.proto file.  So if you extract those -- keeping track of which inner class
each comment appeared in -- you should be able to reproduce the original
.proto file.
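As a rough starting point (assuming the generated file follows the comment style described above), those definition comments can be pulled out with grep; reconstructing the message nesting is still manual. The file below is a hypothetical stand-in for a generated class:

```shell
# Create a stand-in generated file to demonstrate the extraction.
cat > /tmp/Example.java <<'EOF'
public final class Example {
  // optional int32 i = 1;
  private int i_;
  // repeated string name = 2;
}
EOF

# Pull out the field-definition comments.
grep -oE '// (optional|required|repeated).*' /tmp/Example.java
# prints:
# // optional int32 i = 1;
# // repeated string name = 2;
```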





Re: Generating .proto files from Java source files

2009-10-08 Thread Henner Zeller

On Thu, Oct 8, 2009 at 11:32, Kenton Varda ken...@google.com wrote:
 Yikes.  That's kind of like someone left you with just .class files without
 the .java files.
 If you look at the code, though, you will notice that there are comments in
 it defining each field, like:
   // optional int32 i = 1;
 These should be the exact field definitions as they might appear in the
 .proto file.  So if you extract those -- keeping track of which inner class
 each comment appeared in -- you should be able to reproduce the original
 .proto file.

Shouldn't it be possible to extract the descriptor from the Java class
and then use reflection to emit a .proto file?




Re: Generating .proto files from Java source files

2009-10-08 Thread Kenton Varda
Hmm, that's true.  Although I'm not sure if there's actual code for writing
the .proto file in Java.  In C++, descriptors have a DebugString() method
which returns a compilable .proto file.





Compile error: must implement the inherited abstract method

2009-10-08 Thread TD

Hi,

I've just downloaded and built a fresh Protocol Buffers package.  I'm
planning to use the Java version.  I've added protoc to my path and
compiled two .proto files successfully.  I've created an Eclipse
project and added the protobuf-java-2.2.0.jar file to the build path.
The two generated source files go in the src directory.  Now when
Eclipse tries to build the classes, I get an error like the one below
for all inner classes in the generated class:

The type OneDircontent.DirectoryContent must implement the inherited
abstract method Message.toBuilder()

Any idea what's wrong?  I've set my workspace to build for JDK 1.6.

Thanks for any pointers you can provide!

Regards,
Tom.





Re: arrays??

2009-10-08 Thread sergei175


Thanks, I've started to understand this better. Indeed, I have to
implement my own approach for I/O - protobuf alone is not enough. My
only worry is that my own I/O for reading and writing records will
not be cross-platform, so I could not benefit from the strengths of
this package.

