Re: [protobuf] How realistic are benchmarks such as NorthWind ?

2010-05-15 Thread Marc Gravell
As I say, you can do that *now* (not parsing unwanted fields, and scanning
through the data without buffering everything in memory). For protobuf-net,
there is an example of the *second* part of this (a "streaming demo" or
something similar - I don't have the code handy). The first part is simple:
pass in a T that defines only the fields you do want and doesn't implement
IExtensible.

Interestingly, this is perhaps something that could be done even more
frugally in the "v2" code, but that is incomplete.

Let me know if you want an example of streaming and filtering; I'm tied up
at the moment, but should be able to add something later today if needed.

I stress: this is all specific to the protobuf-net implementation; I can't
comment on the other implementations. But it sounds like you /are/ talking
about protobuf-net.
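To illustrate why skipping unwanted fields can be cheap, here is a minimal sketch - in Python, not protobuf-net - of what the wire format makes possible: every field is prefixed by a key encoding its number and wire type, so a reader can step over fields it does not care about without copying their payloads. The field numbers and message layout below are invented purely for illustration.

```python
def read_varint(buf, i):
    """Decode a base-128 varint starting at buf[i]; return (value, next_index)."""
    shift = value = 0
    while True:
        b = buf[i]
        value |= (b & 0x7F) << shift
        i += 1
        if not (b & 0x80):
            return value, i
        shift += 7

def extract_fields(buf, wanted):
    """Scan one serialized message, keeping only the field numbers in
    `wanted`; other fields are stepped over and their payloads never copied."""
    out, i = {}, 0
    while i < len(buf):
        key, i = read_varint(buf, i)
        field, wire_type = key >> 3, key & 7
        if wire_type == 0:                      # varint
            value, i = read_varint(buf, i)
        elif wire_type == 2:                    # length-delimited (string/bytes/message)
            length, i = read_varint(buf, i)
            start, i = i, i + length
            value = buf[start:i] if field in wanted else None
        elif wire_type in (1, 5):               # fixed 64-bit / 32-bit
            size = 8 if wire_type == 1 else 4
            start, i = i, i + size
            value = buf[start:i] if field in wanted else None
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        if field in wanted:
            out.setdefault(field, []).append(value)
    return out
```

For example, a message carrying field 1 = varint 150, field 2 = string "Acme", and field 3 = varint 7 can be filtered to just fields 1 and 2; field 3 is skipped at the cost of one varint decode. This mirrors what happens when a deserializer is given a type that simply does not declare the other fields.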

Marc

On 15 May 2010 05:45, Kevin Apte  wrote:


Re: [protobuf] How realistic are benchmarks such as NorthWind ?

2010-05-14 Thread Kevin Apte
Marc:

Thanks for your input. I think your comment helps me clarify my query:

Most applications or services that are "producers" will generate data with
N fields in it. Consumers may be interested in only m fields: m could be 5
and N could be 20. For example, an address book service will generate an
address with 25 fields in it; an application that consumes the service will
want only 3 - say name, phone number, and zip code.

In the current implementation, there is a way of picking only the fields you
want. Ideally, the time taken to pick only 3 fields should be a lot less than
the time taken to pick all 25.

An even better implementation would screen records based on field values. I
do not agree that this is "making it a database". XML has allowed query
processing for at least 10 years; XML even allows joining 2 XML records
based on a common key. In a database, whether a traditional RDBMS or a
NoSQL kind, one has to pay the price for ACID properties or for "CAP" -
consistency, availability, and partition tolerance. These problems do not
exist if one is screening 10,000 protocol buffers looking for a particular
field value.

I would imagine that there are many applications which read thousands of
Protocol Buffer records, picking only a small fraction of them.

I appreciate the simplicity of Protocol Buffers, but adding features like
these has nothing to do with complicating the original simplicity; it is
like a layer that adds value without overhead. Those applications that want
to screen based on field values can screen.
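The screening scenario above can be sketched at the wire-format level without any database machinery. The snippet below (Python, purely illustrative; field 1 is a hypothetical varint "product code") steps over every field except the one being tested and keeps only the records whose value satisfies a predicate.

```python
def read_varint(buf, i):
    """Decode a base-128 varint at buf[i]; return (value, next_index)."""
    shift = value = 0
    while True:
        b = buf[i]
        value |= (b & 0x7F) << shift
        i += 1
        if not (b & 0x80):
            return value, i
        shift += 7

def first_varint_field(buf, wanted_field):
    """Return the first varint value stored under `wanted_field`,
    stepping over every other field; None if the field is absent."""
    i = 0
    while i < len(buf):
        key, i = read_varint(buf, i)
        field, wire_type = key >> 3, key & 7
        if wire_type == 0:                  # varint: decode (cheap)
            value, i = read_varint(buf, i)
            if field == wanted_field:
                return value
        elif wire_type == 1:                # fixed 64-bit: skip
            i += 8
        elif wire_type == 2:                # length-delimited: skip payload
            length, i = read_varint(buf, i)
            i += length
        elif wire_type == 5:                # fixed 32-bit: skip
            i += 4
        else:
            raise ValueError("unsupported wire type")
    return None

def screen(records, field, predicate):
    """Yield only the serialized records whose `field` satisfies `predicate`."""
    for record in records:
        if predicate(first_varint_field(record, field)):
            yield record
```

Nothing here needs ACID or CAP guarantees: it is a single pass over opaque byte strings, touching only the one field being tested in each record.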

Kevin

On Fri, May 14, 2010 at 11:52 PM, Marc Gravell wrote:


-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] How realistic are benchmarks such as NorthWind ?

2010-05-14 Thread Marc Gravell
Firstly, I must note that those benchmarks are specific to protobuf-net (a
specific implementation), not "protocol buffers" (which covers a range of
implementations). Re "is it not more realistic": well, that depends entirely
on what your use-case *is*. It /sounds/ like you are really talking about
querying ad-hoc data; if so, a file-based database may be more appropriate.
But it depends entirely on your scenario.

It /would/ be possible (with protobuf-net at least; I can't comment beyond
that) to construct a type that represents only the data that you *are*
interested in - the other fields would be quietly dropped without having to
fully process them, saving some CPU. Likewise, it is possible to read items
in a non-buffered way (i.e. you only have 1 object directly available in
memory; any others are discarded immediately, available for GC). However,
again, it sounds like you *really* want a database - which "protocol
buffers" isn't.
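The non-buffered reading described above can be sketched (again in Python, not protobuf-net's actual API) as a generator over varint-length-prefixed records: each record is framed by a varint length, so the reader holds only the current payload in memory. The framing is an assumption for illustration; the protocol buffer format itself does not mandate a particular record delimiter.

```python
import io

def read_varint(stream):
    """Read one base-128 varint from a binary stream; return None at EOF."""
    shift = value = 0
    while True:
        byte = stream.read(1)
        if not byte:
            return None
        value |= (byte[0] & 0x7F) << shift
        if not (byte[0] & 0x80):
            return value
        shift += 7

def iter_delimited(stream):
    """Yield raw message payloads from a stream of varint-length-prefixed
    records; only one record is buffered at a time."""
    while True:
        length = read_varint(stream)
        if length is None:
            return
        yield stream.read(length)
```

A consumer would iterate `for payload in iter_delimited(stream): ...` (deserializing or discarding each payload in turn), so a multi-gigabyte file never needs to fit in memory.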

Marc Gravell

On 14 May 2010 11:31, Kevin Apte- SOA and Cloud Computing Architect <
technicalarchitect2...@gmail.com> wrote:

>    I saw that ProtoBuf has been benchmarked using the Northwind data
> set: a data set of size 130K, with 3000 objects, including orders and
> order line items.
>
> This is an excellent review:
> http://code.google.com/p/protobuf-net/wiki/Performance
>
> Is it not more realistic to have a benchmark with a much larger file,
> in which we are interested in only a few records, and a few fields
> within those records?
>
> For example: out of 10,000 order line items, we want only the line
> items with a particular product code.
> Or we want to pick orders for a particular customer type, or with a
> particular description.
>
> Are there use cases where data is stored in Protocol Buffer Format in
> a file, and read into memory?
>
> Another issue is that the size seems rather small: it is only 256
> bytes per object; I would imagine there are many use cases where the
> objects are much bigger.
>
> Many use cases are going to involve much larger objects and will
> select m out of N fields, where m will be 5 and N will be 20. This is
> because very rarely does an application want all of the information in
> a protocol buffer generated by another program.
>
> Any comments?


-- 
Regards,

Marc
