On Apr 10, 10:19 pm, Kenton Varda <ken...@google.com> wrote:
> I think we define "compression" differently. In my book, "redundancy
> elimination" and "compression" are pretty much synonymous. It sounds like
> you are using a more specific definition (LZW?).
If that were true, then string interning would also be classified as compression ;-) What you are actually referring to is "compaction", not "compression". Compaction reduces the amount of data used to represent a given amount of information. For example, an XML encoder can perform compaction by eliminating unnecessary redundancy, removing irrelevancy, or using a special representation such as a restricted alphabet; all of these are part of the encoder's work.

Compression, unlike compaction, does not reduce the amount of data used to represent a given amount of information; it reduces the space taken by that data. Contrary to an XML encoder, a compressor cannot create a representation of any information; it can only be fed an existing representation, and its output is the same representation packed into a denser format. Fast Infoset is a compact encoding of the XML Infoset; GZIP is a compressed data format. The binary XML community uses the term "compactness" when considering the size of a representation of the XML Infoset; the term "compression" is used when GZIP or another compression format is used to further reduce the size of a binary XML representation.

> Sure, but FI wasn't smaller than protobuf either, was it?

In the few tests that we performed, FI was smaller than protobuf, but not by a large margin. However, each format has the potential to be considerably more compact than the other under different circumstances: for example, protobuf with small datasets, FI with medium/large datasets containing repeating values.

> I would expect
> that after applying some sort of LZW compression to *both* documents, they'd
> come out roughly the same size. (FI would probably have some overhead for
> self-description but for large documents that wouldn't matter.)

In the same tests as those mentioned above, applying GZIP compression to the Fast Infoset and protobuf documents did indeed produce compressed documents of "roughly the same size".
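To make the compaction/compression distinction concrete, here is a minimal Python sketch (the function names and the toy table-based encoding are mine, not taken from Fast Infoset or protobuf). The first function performs compaction: the encoder itself chooses a denser representation of the information, replacing repeated strings with small integer indexes into a vocabulary table, in the same spirit as string interning or FI's vocabulary tables. The second simply hands an existing byte representation to GZIP, which can only repack it, not choose it.

```python
import gzip

def compact_encode(records):
    """Compaction (hypothetical): emit a denser *representation* by
    replacing repeated string values with indexes into a vocabulary."""
    table = {}    # value -> index
    vocab = []    # index -> value
    encoded = []
    for value in records:
        if value not in table:
            table[value] = len(vocab)
            vocab.append(value)
        encoded.append(table[value])
    return vocab, encoded

def compress(data: bytes) -> bytes:
    """Compression: a generic packer squeezes an *existing*
    representation into a denser byte stream."""
    return gzip.compress(data)

records = ["temperature", "temperature", "pressure", "temperature"]
vocab, encoded = compact_encode(records)
print(vocab, encoded)   # ['temperature', 'pressure'] [0, 0, 1, 0]

raw = ",".join(records).encode()
print(len(raw), len(compress(raw)))  # compressed size varies by gzip version
```

The point of the sketch: `compact_encode` changes what the representation *is* (an encoder's job), while `compress` only changes how many bytes an already-fixed representation occupies, which is why the two techniques compose, as in the GZIP-over-FI and GZIP-over-protobuf measurements mentioned above.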
> Without the LZW applied, perhaps FI is smaller due to its "redundancy
> elimination" -- I still don't know enough about FI to really understand how
> it works. However, I suspect protobuf will be much faster to parse and
> encode, by virtue of being simpler.

Yes, protobuf is much faster, as I stated in an earlier post.

Alexander

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.