Re: Protocol Buffers Vs. XML Fast Infoset
On Apr 10, 10:19 pm, Kenton Varda <ken...@google.com> wrote:
> I think we define compression differently. In my book, redundancy elimination and compression are pretty much synonymous. It sounds like you are using a more specific definition (LZW?).

If that were true, then string interning would also be classified as compression ;-) What you are actually referring to is compaction, not compression. Compaction reduces the amount of data used to represent a given amount of information. For example, an XML encoder can perform compaction by eliminating unnecessary redundancy, removing irrelevancy, or using a special representation such as a restricted alphabet; all of these are part of the encoder's work. Compression does not reduce the amount of data used to represent a given amount of information, as compaction does; it reduces the space taken by that data. Unlike an XML encoder, a compressor cannot create a representation of any information; it can only be fed an existing representation, and its output is the same representation packed into a denser format. Fast Infoset is a compact encoding of the XML Infoset; GZIP is a compressed data format. The binary XML community uses the term compactness when considering the size of a representation of the XML Infoset; the term compression is used when GZIP or another compression format is applied to further reduce the size of a binary XML representation.

> Sure, but FI wasn't smaller than protobuf either, was it?

In the few tests that we performed, FI was smaller than protobuf, but not by a large margin. However, either format can be considerably more compact than the other under different circumstances; for example, protobuf with small datasets, FI with medium/large datasets containing repeating values.

> I would expect that after applying some sort of LZW compression to *both* documents, they'd come out roughly the same size. (FI would probably have some overhead for self-description but for large documents that wouldn't matter.)

In the same tests as those mentioned above, using GZIP compression on Fast Infoset and protobuf documents resulted in roughly the same size of compressed docs.

> Without the LZW applied, perhaps FI is smaller due to its redundancy elimination -- I still don't know enough about FI to really understand how it works. However, I suspect protobuf will be much faster to parse and encode, by virtue of being simpler.

Yes, protobuf is much faster; I stated so in an earlier post.

Alexander

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---
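[The compaction-vs-compression distinction above can be made concrete with a toy Python sketch. The "compact" form below is a hand-rolled stand-in for a vocabulary-style encoding, not real Fast Infoset output, and the data is synthetic:]

```python
import gzip

# A verbose XML-ish representation with a repeated tag name.
verbose = b"<temperature>21</temperature>" * 200

# "Compaction": re-encode the same information more densely, here by
# replacing the repeated tag name with a one-byte vocabulary index
# (a toy stand-in for FI's vocabulary tables, not the real format).
compact = b"<1>21</1>" * 200

# "Compression": pack an existing representation into a denser format.
gz_verbose = gzip.compress(verbose)
gz_compact = gzip.compress(compact)

print(len(verbose), len(compact))        # compaction shrinks the data itself
print(len(gz_verbose), len(gz_compact))  # after gzip, the gap largely closes
```

[This mirrors the test result quoted above: the uncompressed sizes differ substantially, but once GZIP has squeezed out the remaining redundancy in both, the compressed sizes end up in the same ballpark.]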
Re: Protocol Buffers Vs. XML Fast Infoset
On Fri, Apr 10, 2009 at 5:24 AM, Alexander Philippou <alexander.philip...@gmail.com> wrote:
> The redundancy elimination mechanism of FI is actually a vocabulary, and it works differently than compression algorithms do.

I think we define compression differently. In my book, redundancy elimination and compression are pretty much synonymous. It sounds like you are using a more specific definition (LZW?).

> FI documents are good candidates for compression irrespective of whether a vocabulary is used or not. We've done a few tests with medium/large-sized documents and protobuf wasn't more compact than FI.

Sure, but FI wasn't smaller than protobuf either, was it? I would expect that after applying some sort of LZW compression to *both* documents, they'd come out roughly the same size. (FI would probably have some overhead for self-description, but for large documents that wouldn't matter.) Without the LZW applied, perhaps FI is smaller due to its redundancy elimination -- I still don't know enough about FI to really understand how it works. However, I suspect protobuf will be much faster to parse and encode, by virtue of being simpler.
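[A vocabulary of the kind Alexander describes can be sketched in a few lines of Python. This is a toy model of the idea only; the actual Fast Infoset vocabulary tables and wire format are more involved:]

```python
def encode_with_vocabulary(values):
    """Toy sketch of vocabulary-based redundancy elimination:
    the first occurrence of a string is emitted literally and
    assigned an index; later occurrences emit only the index."""
    vocab = {}
    out = []
    for v in values:
        if v in vocab:
            out.append(("ref", vocab[v]))   # repeat: emit the small index
        else:
            vocab[v] = len(vocab)
            out.append(("lit", v))          # first occurrence: emit literal
    return out

def decode_with_vocabulary(tokens):
    """Rebuild the original values, growing the vocabulary as we go."""
    vocab = []
    values = []
    for kind, payload in tokens:
        if kind == "lit":
            vocab.append(payload)
            values.append(payload)
        else:
            values.append(vocab[payload])
    return values

data = ["currency", "EUR", "currency", "USD", "currency", "EUR"]
encoded = encode_with_vocabulary(data)
assert decode_with_vocabulary(encoded) == data
```

[Note that this is lossless re-encoding of the same information, which is why it is best described as compaction: no general-purpose compressor is involved, yet repeated values cost only an index after their first appearance.]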
Re: Protocol Buffers Vs. XML Fast Infoset
On Apr 3, 10:40 am, ShirishKul <shirish...@gmail.com> wrote:
> I worked to see the difference between *XML Fast Infoset* and *Protocol Buffers* (although I'm not aware of the internal things happening therein). I found that typical data which, transferred across the wire as XML, would be a 500KB file comes to about 300KB as a PB binary and around 130KB as an XML Fast Infoset binary.

Just going back to these numbers, a less-than-50% size benefit for going from XML to PB is surprisingly bad. Do you have a sample file with non-confidential data that we could look at?

Jon
Re: Protocol Buffers Vs. XML Fast Infoset
On Tue, Apr 7, 2009 at 10:15 PM, ShirishKul <shirish...@gmail.com> wrote:
> I do not have any sample file to share with you. But I think FI handles repetitive attribute values.

OK, well, I call that compression. Try gzipping the final protobuf and FI documents and comparing the compressed sizes. The protobuf will probably compress better, so I'd expect the final results to be roughly even.
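[A quick way to run the comparison Kenton suggests, sketched in Python; the commented-out file names are placeholders for your own serialized documents:]

```python
import gzip

def gzipped_size(path):
    """Return (raw_size, gzipped_size) in bytes for a file on disk."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data), len(gzip.compress(data))

# Hypothetical file names -- substitute your own serialized documents:
# for path in ("message.pb", "message.finf"):
#     raw, packed = gzipped_size(path)
#     print(f"{path}: {raw} bytes raw, {packed} bytes gzipped")
```

[Comparing the gzipped sizes, rather than the raw ones, factors out each format's own redundancy elimination and shows how much genuinely distinct information each encoding carries.]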
Re: Protocol Buffers Vs. XML Fast Infoset
On Fri, Apr 3, 2009 at 2:40 AM, ShirishKul <shirish...@gmail.com> wrote:
> I found that typical data which, transferred across the wire as XML, would be a 500KB file comes to about 300KB as a PB binary and around 130KB as an XML Fast Infoset binary.

What kind of data were you encoding? I'm guessing you enabled some kind of compression for the FI encoding? Note that protocol buffers, while compact, do not actually apply any sort of compression themselves. For repetitive data, or data containing a lot of text strings, applying zlib compression to the encoded message can make it much smaller.

> Timings for parsing and serializing are extremely good for Protocol Buffers. :)

(Don't forget to use optimize_for = SPEED if performance is important -- this will be the default in the next version.)

> What makes a difference if we consider XML Fast Infoset binary against PB binary in terms of sizes, speed to parse them, etc.?

I don't actually know much about FI. My guess, based on reading some descriptions of FI, is that PB is similar to FI's non-self-describing, no-compression mode. I would also guess that because XML is a much more complicated format than protocol buffers, FI probably has more overhead when encoding simple structured data, especially number-heavy data. For string-heavy data, though, XML works pretty well, so this overhead may not be an issue in that case.
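[Applying zlib to an encoded message, as suggested above, is a one-liner. A minimal sketch, using a synthetic byte string standing in for a serialized message (this is not a real protobuf wire encoding):]

```python
import zlib

# Stand-in for a serialized message with many repeated text strings;
# real protobuf messages of this shape show similar gains.
encoded = b"\x0a\x07example" * 1000

# Deflate before sending; level 6 trades speed against ratio.
compressed = zlib.compress(encoded, 6)
print(len(encoded), len(compressed))

# The receiver inflates before handing the bytes to the parser.
assert zlib.decompress(compressed) == encoded
```

[Compression is applied outside the protobuf layer entirely, which is the point Kenton makes: the format itself is compact but does no compression, so the choice and cost of a compressor stays under the application's control. The `option optimize_for = SPEED;` setting mentioned above is independent of this and affects only the generated parsing/serialization code.]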