On Apr 10, 10:19 pm, Kenton Varda <ken...@google.com> wrote:
> I think we define "compression" differently.  In my book, "redundancy
> elimination" and "compression" are pretty much synonymous.  It sounds like
> you are using a more specific definition (LZW?).

If that were true, then string interning would also be classified as
compression ;-)

What you are actually referring to is "compaction", not "compression".
Compaction reduces the amount of data used to represent a given amount
of information. For example, an XML encoder can perform compaction by
eliminating unnecessary redundancy, removing irrelevancy, or using a
special representation such as a restricted alphabet; all of these are
part of the encoder's work.

Compression, by contrast, does not reduce the amount of data used to
represent a given amount of information; it reduces the space taken by
that data. Unlike an XML encoder, a compressor cannot create a
representation of any information; it can only be fed an existing
representation, and its output is that same representation packed into
a denser format. Fast Infoset is a compact encoding of the XML
Infoset; GZIP is a compressed data format. The binary XML community
uses the term "compactness" when considering the size of a
representation of the XML Infoset; the term "compression" is used when
GZIP or another compression format is applied to further reduce the
size of a binary XML representation.
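
To make the distinction concrete, here is a minimal Python sketch (my
own illustration, unrelated to either code base): the same sequence of
integers is given two different representations (compaction), and then
each representation is gzipped (compression):

    import gzip
    import struct

    values = list(range(1000))

    # Compaction: choose a denser *representation* of the same
    # information, e.g. XML-ish text vs. fixed 4-byte integers.
    text_repr = "".join("<v>%d</v>" % v for v in values).encode("ascii")
    binary_repr = b"".join(struct.pack("<I", v) for v in values)

    # Compression: pack an *existing* representation into a denser
    # format without changing what it represents.
    for name, data in (("text", text_repr), ("binary", binary_repr)):
        print(name, len(data), "->", len(gzip.compress(data)))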

> Sure, but FI wasn't smaller than protobuf either, was it?

In the few tests that we performed, FI was smaller than protobuf, but
not by a large margin. However, each format can be considerably more
compact than the other under different circumstances: protobuf tends
to win on small datasets, and FI on medium/large datasets containing
repeating values.
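
As I understand it, FI's advantage on repeating values comes from its
vocabulary tables: a value is written literally the first time and
referenced by a small index afterwards, much like the string interning
mentioned above. A toy sketch of that general idea (illustration only,
not the actual FI wire format):

    def encode(values):
        table, out = {}, []
        for v in values:
            if v in table:
                out.append(("ref", table[v]))   # later occurrence: small index
            else:
                table[v] = len(table)
                out.append(("lit", v))          # first occurrence: literal
        return out

    print(encode(["red", "green", "red", "red", "green"]))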

> I would expect
> that after applying some sort of LZW compression to *both* documents, they'd
> come out roughly the same size.  (FI would probably have some overhead for
> self-description but for large documents that wouldn't matter.)

In the same tests as those mentioned above, applying GZIP compression
to the Fast Infoset and protobuf documents did produce compressed
documents of "roughly the same size".

> Without the LZW applied, perhaps FI is smaller due to its "redundancy
> elimination" -- I still don't know enough about FI to really understand how
> it works.  However, I suspect protobuf will be much faster to parse and
> encode, by virtue of being simpler.

Yes, protobuf is much faster; I said as much in an earlier post.

Alexander