Re: [protobuf] File size of the serialized records

2010-03-22 Thread Kenton Varda
That's not the right way to write multiple records.  What you're doing is
writing multiple address books without proper boundaries between them.  The
right thing to do would be to add multiple "person" entries to one address
book, then write it once.

That said, the file produced by your code should grow proportionally to
nRecords, since you're just writing the exact same bytes nRecords times.  So
there must be some other problem in some part of your code which you didn't
show us.


Re: [protobuf] File size of the serialized records

2010-03-22 Thread Vinit Mahedia
I am using the add_person.cc provided with the samples. The only change I
have made is a while loop around this code, so the same record is inserted
multiple times.


// Write the new address book back to disk.
fstream output(argv[1], ios::out | ios::trunc | ios::binary);
int nRecords = 10;
while (nRecords--) {
  if (!address_book.SerializeToOstream(&output)) {
    cerr << "Failed to write address book." << endl;
    return -1;
  }
}



-- 
Vinit

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] File size of the serialized records

2010-03-22 Thread Daniel Wright
The most likely cause is a bug in your code where there's something you
aren't clearing each time you write a record, so at each iteration in your
loop, the record you're writing is getting bigger.  Of course I can't say
for sure without seeing the code.

Daniel





Re: [protobuf] File size of the serialized records

2010-03-22 Thread Vinit Mahedia
Hi Jason,

Thanks for the quick reply.

I am not surprised by the increase in file size, but I am under the
impression that if I insert the same record a thousand times, the file size
should grow accordingly.

For example, if one record generates a file of 32 bytes, then 1024 records
should add up to about 32K, but they do not, and that is why I am surprised.
The growth in size is not linear, and that was the reason I posted my
findings. I am a student, so I might be missing a small concept here; if so,
apologies in advance for taking your time.

Once again, I appreciate your help.





-- 
Vinit




Re: [protobuf] File size of the serialized records

2010-03-22 Thread Jason Hsueh
If you're measuring using sizeof(), you won't account for memory allocated
by subobjects (strings and submessages are stored as pointers). You should
use Message::SpaceUsed() instead. The in-memory vs. serialized size is going
to depend on your proto definition and how you use it. If you have a lot of
optional fields but only set one of them, the serialized size will likely
be much smaller than the in-memory size. If you have lots of large strings,
they're probably going to be pretty close, since both sizes will be dominated
by the actual bytes of the strings.

It sounds like you are surprised that the serialized size increases as you
increase the number of records. What exactly do you expect to happen here?





[protobuf] File size of the serialized records

2010-03-22 Thread Vinit
I was testing to find the upper limit on the number of records in one file.
I used the addressbook example, and I noticed that for one record it
generates a file double the size.

For example, the size of the class I was putting into it was 48 bytes, and
the file was 97 bytes on Ubuntu 9.10.

Then I tested it with 1000 records and, bang, the file size grew many-fold;
with records in the hundreds of thousands it increases many-fold again.

Has anyone investigated this area? I did not note down the exact numbers,
as I thought someone must already have done it.

Please let me know if you want the detailed test numbers; I can run through
it again and provide the information.
