On 21 December 2010 14:11, Alan Gauld <alan.ga...@btinternet.com> wrote:
> But I don't understand how uncompressing a file before parsing it can
> be faster than parsing the original uncompressed file?

Because of I/O overhead. The parsing itself is no faster, of course (it is what it is). The point is that the total time for (read + decompress + parse) can be less than the time for (read + parse), because the time saved by reading fewer bytes from disk can exceed the time spent decompressing them in RAM.

Generally speaking, compared to your CPU and memory, the disk is usually the bottleneck, though it does depend on exactly how much data we're talking about, how fast your CPU is, and so on. In general computing this is less of an issue nowadays than it was a few years ago, and the gains can be small, as you say, or sometimes not so small. It depends on how much data you've got, how well it compresses, how fast the decompressor is, how slow your I/O channel is, etc. The point nevertheless stands.

Case in point: this technique is used regularly on the web. Many web servers send their HTML gzip- or deflate-compressed (both are LZ77-based) when the client says it accepts that, since (as above) it's often quicker to compress, stream, decompress and parse than it is to stream the data uncompressed. (And, of course, thanks to zlib + urllib you can use this from Python should you wish: ask for gzip with an Accept-Encoding header and decompress the response body with zlib yourself, since urllib won't do that for you.)

Anyway, just my $0.02!

Walter
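
P.S. For anyone who wants to play with the numbers, here is a rough sketch of the trade-off. It is only a back-of-envelope model, not a benchmark: the function names, the artificial sample data and the 1 MB/s channel speed are all made up for illustration, and real files will usually compress less well than this repetitive sample. The decompress call uses the same zlib setting (wbits of 16 + zlib.MAX_WBITS) that you would use on the body of a gzip-encoded HTTP response.

```python
import gzip
import time
import zlib


def make_sample(n_lines=20000):
    """Build repetitive, HTML-like bytes (an artificial stand-in for a real file)."""
    return b"".join(
        b"<tr><td>row %d</td><td>value</td></tr>\n" % i for i in range(n_lines)
    )


def estimated_read_seconds(n_bytes, bytes_per_second):
    """Crude I/O model: pure transfer time, ignoring seeks and OS caching."""
    return n_bytes / bytes_per_second


raw = make_sample()
packed = gzip.compress(raw)

# Time only the decompression step; 16 + MAX_WBITS tells zlib to expect
# a gzip header and trailer rather than a raw zlib stream.
start = time.perf_counter()
unpacked = zlib.decompress(packed, 16 + zlib.MAX_WBITS)
decompress_seconds = time.perf_counter() - start

# Assumed channel speed (1 MB/s); change this to model your own disk or link.
channel = 1_000_000

plain_total = estimated_read_seconds(len(raw), channel)
packed_total = estimated_read_seconds(len(packed), channel) + decompress_seconds

print("raw size:        %d bytes" % len(raw))
print("compressed size: %d bytes" % len(packed))
print("read only:              %.4f s (estimated)" % plain_total)
print("read + decompress:      %.4f s (estimated read, measured decompress)" % packed_total)
```

Try raising the channel speed: once the channel is fast enough that the read time is comparable to the decompress time, the advantage shrinks or disappears, which is the "it depends" part of the argument above.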