In article <[email protected]>,
Adam Tauno Williams <[email protected]> wrote:
> XML works extremely well for large datasets.
Barf. I'll agree that there are some nice points to XML. It is
portable. It is (to a certain extent) human readable, and in a pinch
you can use standard text tools to do ad-hoc queries (e.g. grep for a
particular entry). And, yes, there are plenty of toolsets for dealing
with XML files.
On the other hand, the verbosity is unbelievable. I'm currently working
with a data feed we get from a supplier in XML. Every day we get
incremental updates of about 10-50 MB each. The total data set at this
point is 61 GB. It's got stuff like this in it:
<Parental-Advisory>FALSE</Parental-Advisory>
That's 54 bytes to store a single bit of information. I'm all for
human-readable formats, but bloating the data by a factor of 432 is
rather excessive. Of course, that's an extreme example. A more
efficient example would be:
<Id>1173722</Id>
which is 26 bytes to store an integer. Assuming a 4-byte integer, that's
only a bloat factor of 6-1/2.
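
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The byte counts 54 and 26 are the figures quoted in this post; the helper name bloat_factor and the 32-bit integer size are assumptions made for illustration. Note that the tag text alone, as shown in this message, comes to 44 and 16 bytes; the extra 10 bytes in each quoted figure are presumably indentation and a line terminator in the actual feed, which is a guess on my part.

```python
def bloat_factor(stored_bytes: int, payload_bits: int) -> float:
    """Ratio of bits actually stored to bits of information carried."""
    return stored_bytes * 8 / payload_bits


# The two elements as they appear in this message (no indentation).
flag = "<Parental-Advisory>FALSE</Parental-Advisory>"
ident = "<Id>1173722</Id>"

# Length of the tag text alone, in bytes.
flag_len = len(flag.encode("utf-8"))    # 44
ident_len = len(ident.encode("utf-8"))  # 16

# Figures quoted in the post: 54 bytes for one bit, 26 bytes for an
# integer (assumed here to be 32 bits wide).
print(bloat_factor(54, 1))    # 432.0
print(bloat_factor(26, 32))   # 6.5

# Same calculation using only the visible tag text.
print(bloat_factor(flag_len, 1))
print(bloat_factor(ident_len, 32))
```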
Of course, one advantage of XML is that with so much redundant text, it
compresses well. We typically see gzip compression ratios of 20:1.
But that just means you can archive them efficiently; you can't do
anything useful until you unzip them.
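
A rough way to see the compression effect for yourself, sketched with the standard gzip module. The records below are synthetic (a made-up <Item> wrapper around the two elements quoted above), not the supplier's feed, so the ratio you get will not match the 20:1 figure; the helper name compression_ratio is also just for illustration.

```python
import gzip


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by gzip-compressed size."""
    return len(data) / len(gzip.compress(data))


# Synthetic, highly repetitive XML: 10,000 hypothetical records that
# reuse the two element names from the examples above.
records = "".join(
    f"<Item><Id>{1173722 + i}</Id>"
    f"<Parental-Advisory>FALSE</Parental-Advisory></Item>\n"
    for i in range(10_000)
).encode("utf-8")

ratio = compression_ratio(records)
print(f"{len(records)} bytes raw, about {ratio:.1f}:1 under gzip")
```

Most of what gzip removes here is the repeated tag names, which is the same redundancy that makes the uncompressed files so large.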