Re: Protocol Buffers Vs. XML Fast Infoset

2009-04-13 Thread Alexander Philippou

On Apr 10, 10:19 pm, Kenton Varda ken...@google.com wrote:
 I think we define compression differently.  In my book, redundancy
 elimination and compression are pretty much synonymous.  It sounds like
 you are using a more specific definition (LZW?).

If that were true, then string interning would also be classified as
compression ;-)

What you are actually referring to is compaction, not compression.
Compaction reduces the amount of data used to represent a given amount
of information. For example, an XML encoder can perform compaction by
eliminating unnecessary redundancy, removing irrelevancy, or using a
special representation such as a restricted alphabet; all of these are
part of the encoder's work. Compression does not reduce the amount of
data used to represent a given amount of information, as compaction
does; it reduces the space taken by that data. Unlike an XML
encoder, a compressor cannot create a representation of any
information; it can only be fed an existing representation, and its
output is the same representation packed into a denser format.
Fast Infoset is a compact encoding of the XML Infoset. GZIP is a
compressed data format. The binary XML community uses the term
compactness when considering the size of a representation of the XML
Infoset; the term compression is used when GZIP or another compression
format is applied to further reduce the size of a binary XML
representation.
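
To make the distinction concrete, here is a minimal Python sketch. The
compact_encode function is a toy, hypothetical vocabulary-style encoder
and is not the Fast Infoset algorithm: it starts from the information
itself (a list of strings), writes each distinct string once, and refers
to repeats by index. zlib.compress, by contrast, can only be fed bytes
that some encoder has already produced. The sample values are made up.

```python
import zlib

def compact_encode(values):
    """Toy vocabulary encoder (illustrative only, not Fast Infoset).
    Each distinct string is written once; repeats become a one-byte index.
    Assumes fewer than 255 distinct values, each under 256 bytes."""
    vocab = {}
    out = bytearray()
    for v in values:
        if v in vocab:
            out.append(vocab[v])        # reference to an earlier entry
        else:
            vocab[v] = len(vocab)
            data = v.encode("utf-8")
            out.append(0xFF)            # marker: a literal string follows
            out.append(len(data))
            out += data
    return bytes(out)

def compact_decode(blob):
    """Inverse of compact_encode: rebuilds the vocabulary while reading."""
    vocab, values, i = [], [], 0
    while i < len(blob):
        if blob[i] == 0xFF:
            n = blob[i + 1]
            s = blob[i + 2:i + 2 + n].decode("utf-8")
            vocab.append(s)
            values.append(s)
            i += 2 + n
        else:
            values.append(vocab[blob[i]])
            i += 1
    return values

# Made-up data with many repeating values.
values = ["pending", "shipped", "pending", "pending", "shipped"] * 200

naive = "\n".join(values).encode("utf-8")  # a plain textual representation
compact = compact_encode(values)           # compaction: done by the encoder
compressed = zlib.compress(naive)          # compression: packs existing bytes

print(len(naive), len(compact), len(compressed))
```

Nothing stops you from running zlib.compress over the compact output as
well, which is what applying GZIP to a binary XML document amounts to.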

 Sure, but FI wasn't smaller than protobuf either, was it?

In the few tests that we performed, FI was smaller than protobuf, but
not by a large margin. However, each format has the potential to be
considerably more compact than the other under different
circumstances: for example, protobuf with small datasets, and FI with
medium/large datasets containing repeating values.

 I would expect
 that after applying some sort of LZW compression to *both* documents, they'd
 come out roughly the same size.  (FI would probably have some overhead for
 self-description but for large documents that wouldn't matter.)

In the same tests as those mentioned above, applying GZIP compression to
the Fast Infoset and protobuf documents produced compressed documents of
roughly the same size.

 Without the LZW applied, perhaps FI is smaller due to its redundancy
 elimination -- I still don't know enough about FI to really understand how
 it works.  However, I suspect protobuf will be much faster to parse and
 encode, by virtue of being simpler.

Yes, protobuf is much faster; I stated so in an earlier post.

Alexander
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: Protocol Buffers Vs. XML Fast Infoset

2009-04-10 Thread Kenton Varda
On Fri, Apr 10, 2009 at 5:24 AM, Alexander Philippou 
alexander.philip...@gmail.com wrote:

 The redundancy elimination mechanism of FI is actually a vocabulary
 and it works differently than compression algorithms do.


I think we define compression differently.  In my book, redundancy
elimination and compression are pretty much synonymous.  It sounds like
you are using a more specific definition (LZW?).


 FI documents
 are good candidates for compression irrespective of whether a
 vocabulary is used or not. We've done a few tests with medium/large-
 sized documents and protobuf wasn't more compact than FI.


Sure, but FI wasn't smaller than protobuf either, was it?  I would expect
that after applying some sort of LZW compression to *both* documents, they'd
come out roughly the same size.  (FI would probably have some overhead for
self-description but for large documents that wouldn't matter.)

Without the LZW applied, perhaps FI is smaller due to its redundancy
elimination -- I still don't know enough about FI to really understand how
it works.  However, I suspect protobuf will be much faster to parse and
encode, by virtue of being simpler.




Re: Protocol Buffers Vs. XML Fast Infoset

2009-04-08 Thread Jon Skeet sk...@pobox.com

On Apr 3, 10:40 am, ShirishKul shirish...@gmail.com wrote:
 I worked to see the difference between *XML Fast Infoset* and
 *Protocol Buffers* (although I'm not aware of what happens
 internally in either).

 I found that typical data to be transferred across the wire, which
 takes 500KB as an XML file, has a corresponding file size of 300KB as
 a PB binary and around 130KB as an XML Fast Infoset binary file.

Just going back to these numbers, a less-than-50% benefit for going
from XML to PB is surprisingly bad.
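
For reference, the arithmetic behind that remark, as a small Python
sketch. The three sizes are the ones quoted above; the reduction helper
is just a name chosen for this illustration.

```python
def reduction(original_kb, encoded_kb):
    """Percentage by which the encoded form is smaller than the original."""
    return 100.0 * (original_kb - encoded_kb) / original_kb

xml_kb, pb_kb, fi_kb = 500, 300, 130   # sizes reported in the quoted post

print(f"PB: {reduction(xml_kb, pb_kb):.0f}% smaller than XML")  # PB: 40% smaller than XML
print(f"FI: {reduction(xml_kb, fi_kb):.0f}% smaller than XML")  # FI: 74% smaller than XML
```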

Do you have a sample file with non-confidential data that we could
look at?

Jon





Re: Protocol Buffers Vs. XML Fast Infoset

2009-04-08 Thread Kenton Varda
On Tue, Apr 7, 2009 at 10:15 PM, ShirishKul shirish...@gmail.com wrote:

 I do not have any sample file to share with you. But I think FI
 handles the repetitive attribute values.


OK, well, I call that compression.  Try gzipping the final protobuf and FI
documents and comparing the compressed sizes.  The protobuf will probably
compress better, so I'd expect the final results to be roughly even.
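
A minimal Python sketch of that comparison, using only the standard
gzip module. The compare helper and the stand-in byte strings are
assumptions made for illustration; in practice you would read the real
encoded files in binary mode and pass their contents instead.

```python
import gzip

def size_report(name, encoded):
    """Return (name, raw size, gzip size) for one encoded document."""
    return name, len(encoded), len(gzip.compress(encoded))

def compare(documents):
    """documents: mapping of format name -> encoded bytes."""
    return [size_report(name, data) for name, data in documents.items()]

# Stand-in payloads; replace with e.g. open(path, "rb").read() for the
# protobuf and FI files you actually want to compare.
docs = {
    "format A": b"status=pending;" * 500,
    "format B": b"\x01\x02\x03\x04" * 500,
}

for name, raw, gz in compare(docs):
    print(f"{name}: raw={raw} bytes, gzip={gz} bytes")
```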




Re: Protocol Buffers Vs. XML Fast Infoset

2009-04-03 Thread Kenton Varda
On Fri, Apr 3, 2009 at 2:40 AM, ShirishKul shirish...@gmail.com wrote:

 I found that typical data to be transferred across the wire, which
 takes 500KB as an XML file, has a corresponding file size of 300KB as
 a PB binary and around 130KB as an XML Fast Infoset binary file.


What kind of data were you encoding?

I'm guessing you enabled some kind of compression for the FI encoding?  Note
that protocol buffers, while compact, do not actually apply any sort of
compression themselves.  For repetitive data or data containing a lot of
text strings, applying zlib compression to the encoded message can make it
much smaller.
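
As a rough sketch in Python, assuming you already have the serialized
message as bytes (for example from SerializeToString() in the Python
protobuf API). The pack and unpack helpers are hypothetical names, and
the payload below is a hand-built stand-in rather than output from a
real generated class.

```python
import zlib

def pack(encoded_message: bytes, level: int = 6) -> bytes:
    """Compress an already-encoded message before sending or storing it."""
    return zlib.compress(encoded_message, level)

def unpack(packed: bytes) -> bytes:
    """Recover the encoded message; parse it as usual afterwards."""
    return zlib.decompress(packed)

# Stand-in for a serialized message holding the same string many times:
# tag byte 0x0a, length byte 0x0b (11), then the 11-byte string.
encoded = b"".join(b"\x0a\x0bhello world" for _ in range(1000))

packed = pack(encoded)
assert unpack(packed) == encoded   # lossless round trip
print(len(encoded), len(packed))
```

The receiver has to know the payload is zlib-compressed, since nothing in
the encoded message itself says so.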


 Parsing and serializing times are extremely good for Protocol
 buffers.


:)

(Don't forget to use optimize_for = SPEED if performance is important --
this will be the default in the next version.)

 What are the differences between XML Fast Infoset binary and
 PB binary in terms of size, parsing speed, etc.?


I don't actually know much about FI.  My guess based on reading some
descriptions of FI is that PB is similar to FI's non-self-describing,
no-compression mode.  I would also guess that because XML is a much more
complicated format than protocol buffers, FI probably has more overhead when
encoding simple structured data, especially number-heavy data.  For
string-heavy data, though, XML works pretty well and so this overhead may
not be an issue in that case.
