On Tue, Dec 21, 2010 at 5:19 AM, Stefan Behnel <stefan...@behnel.de> wrote:
> Alan Gauld, 21.12.2010 10:58:
>>
>> "David Hutto" wrote
>>>
>>> http://www.google.com/search?client=ubuntu&channel=fs&q=parsing+gigabyte+xml+python&ie=utf-8&oe=utf-8
>>
>> Eeek! One of the listings says:
>>
>>> 22 Jan 2009 ... Stripping Illegal Characters from XML in Python
>>
>> ... I'd be asking Python to process 6.4 gigabytes of CSV into
>> 6.5 gigabytes of XML 1. ..... In fact, what happened was that
>> the parsing didn't work and the whole db was ...
>>
>> And I thought a 1G file was extreme... Do these people stop to think that
>> with XML as much as 80% of their "data" is just description (ie the tags).
>
> As I already said, it compresses well. In run-length compressed XML files,
> the tags can easily take up a negligible amount of space compared to the
> more widely varying data content (although that also commonly tends to
> compress rather well). And depending on how fast your underlying storage is,
> decompressing and parsing the file may still be faster than parsing a huge
> uncompressed file directly. So, again, the sheer uncompressed file size is
> *not* a very interesting argument.
>
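To make the compression argument concrete, here is a minimal sketch using only the standard library (`gzip`, `xml.etree.ElementTree`). The helpers `make_xml` and `count_records`, the tag names and the record counts are hypothetical, and the data is synthetic. It gzips the same records marked up once with long tags and once with short tags, then stream-parses the compressed bytes with `iterparse`, so the parser reads straight from the decompressing stream rather than from a fully decompressed copy on disk.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def make_xml(record_tag, field_tag, n):
    """Build a synthetic XML document with n records (hypothetical test data)."""
    parts = ["<root>"]
    for i in range(n):
        value = i * 7919 % 100003  # pseudo-varied numbers standing in for real data
        parts.append(f"<{record_tag}><{field_tag}>{value}</{field_tag}></{record_tag}>")
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


def count_records(gz_bytes, record_tag):
    """Stream-parse gzip-compressed XML, counting elements named record_tag."""
    count = 0
    # gzip.open accepts an existing file object and decompresses as it is read
    with gzip.open(io.BytesIO(gz_bytes), "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            if elem.tag == record_tag:
                count += 1
                elem.clear()  # discard the finished record's children and text
    return count


long_xml = make_xml("customer_record", "account_balance", 10000)
short_xml = make_xml("r", "b", 10000)

for label, raw, tag in (("long tags", long_xml, "customer_record"),
                        ("short tags", short_xml, "r")):
    packed = gzip.compress(raw)
    print(f"{label}: {len(raw)} bytes raw, {len(packed)} bytes gzipped, "
          f"{count_records(packed, tag)} records parsed")
```

Running it prints the raw and gzipped sizes for both variants; exact numbers depend on the zlib build and compression level, so none are quoted here. On repetitive markup like this, the size gap between long and short tags usually shrinks considerably after compression, but it is worth measuring on your own data before renaming tags.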
However, couldn't they (as mentioned elsewhere, and by others in another form) mitigate the damage by using shorter tags exclusively? And the compression also covers the formatting, tags included, correct?

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor