Re: [protobuf] File size of the serialized records

2010-03-22 Thread Jason Hsueh
If you're measuring using sizeof(), you won't account for memory allocated
by subobjects (strings and submessages are stored as pointers). You should
use Message::SpaceUsed() instead. The in-memory vs. serialized size is going
to depend on your proto definition and how you use it. If you have a lot of
optional fields but only set one of them, the serialized size will likely
be much smaller than the in-memory size. If you have lots of large strings,
they're probably going to be pretty close, since both sizes will be dominated
by the actual bytes of the strings.

It sounds like you are surprised that the serialized size increases as you
increase the number of records. What exactly do you expect to happen here?

On Mon, Mar 22, 2010 at 12:15 PM, Vinit shortempe...@gmail.com wrote:

 I was testing to see the upper limit on the number of records in one
 file.
 I used the addressbook example, and I noticed that for one record
 it generates a file double the size.

 For example, the size of the class I was putting into it was 48 bytes,
 and the file was 97 bytes on Ubuntu 9.10.

 Now, when I test it with 1000 records, the size jumps many-fold, and with
 records in the hundreds of thousands, the file size increases many times
 over.

 Has anyone investigated this area? I did not note down the exact numbers,
 as I thought someone must already have done this.

 Please let me know if you want the detailed test numbers; I can run
 through it again and provide the information.




-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] File size of the serialized records

2010-03-22 Thread Vinit Mahedia
Hi Jason,

Thanks for the quick reply.

I am not surprised by the increase in file size, but I was under the
impression that if I insert the same record a thousand times, the file
size should grow accordingly.

For example, assume one record generates a file of 32 bytes; then 1024
records should add up to about 32K. But it does not, and that is why I am
surprised: the growth in size is not linear, and that is the reason I
posted my findings. I am a student, so I might be missing a small concept
here; if so, apologies in advance for taking your time.

Once again, I appreciate your help.




-- 
Vinit




Re: [protobuf] File size of the serialized records

2010-03-22 Thread Daniel Wright
The most likely cause is a bug in your code where there's something you
aren't clearing each time you write a record, so at each iteration in your
loop, the record you're writing is getting bigger.  Of course I can't say
for sure without seeing the code.

Daniel




Re: [protobuf] File size of the serialized records

2010-03-22 Thread Vinit Mahedia
I am using add_person.cc provided with the examples. The only change I have
made is a while loop around this code, so the same record is inserted
multiple times:

// Write the new address book back to disk.
fstream output(argv[1], ios::out | ios::trunc | ios::binary);
int nRecords = 10;
while (nRecords--) {
  if (!address_book.SerializeToOstream(&output)) {
    cerr << "Failed to write address book." << endl;
    return -1;
  }
}


-- 
Vinit




Re: [protobuf] File size of the serialized records

2010-03-22 Thread Kenton Varda
That's not the right way to write multiple records.  What you're doing is
writing multiple address books without proper boundaries between them.  The
right thing to do would be to add multiple persons to one address book,
then write it once.

That said, the file produced by your code should grow proportionally to
nRecords, since you're just writing the exact same bytes nRecords times.  So
there must be some other problem in some part of your code which you didn't
show us.
