Am I missing something? I don't see where the poster described the operation as CPU intensive. He does mention that a 10 GB file cannot be loaded into memory in its entirety. If you discount swapfile paging and assume a "normal" PC with maybe 1 or 2 GB of RAM, is that assumption really out of line?
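For what it's worth, here is a minimal sketch of the kind of stream processing being discussed, using the SAX parser from the standard library so that no tree is ever built in memory. The element name "record" and the names RecordCounter and count_records are my own placeholders, since the original post doesn't say what the file contains.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Counts elements with a given tag name; keeps no tree in memory."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag      # hypothetical element name to look for
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the input.
        if name == self.tag:
            self.count += 1


def count_records(source, tag="record"):
    """Stream-parse `source` (a filename or file-like object) and count tags."""
    handler = RecordCounter(tag)
    xml.sax.parse(source, handler)
    return handler.count


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file.
    sample = io.BytesIO(b"<root><record/><record/><other/></root>")
    print(count_records(sample))
```

Memory use here depends on the handler's own state rather than on the size of the file, which is the whole point of the streaming approach. How fast it runs on a 10 GB input is a separate question that this sketch doesn't settle.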
And I don't doubt that Python is as efficient as possible for I/O operations. But since it is an interpreted scripting language, how could it be "just as fast as any other language", as you claim? C would have to be faster. Machine language would have to be faster. And even other interpreted languages *could* be faster, given certain conditions. A generalization like that kind of invalidates the remainder of your assertion.

fuzzylollipop wrote:
> K.S.Sreeram wrote:
> > Diez B. Roggisch wrote:
> > > What the OP needs is a different approach to XML-documents that won't
> > > parse the whole file into one giant tree - but I'm pretty sure that
> > > (c)ElementTree will do the job as well as expat. And I don't recall the
> > > OP musing about performances woes, btw.
> >
> > There's just NO WAY that the 10gb xml file can be loaded into memory as
> > a tree on any normal machine, irrespective of whether we use C or
> > Python. So the *only* way is to perform some kind of 'stream' processing
> > on the file. Perhaps using a SAX like API. So (c)ElementTree is ruled
> > out for this.
> >
> > Diez B. Roggisch wrote:
> > > No what exactly makes C grok a 10Gb file where python will fail to do so?
> >
> > In most typical cases where there's any kind of significant python code,
> > its possible to achieve a *minimum* of a 10x speedup by using C. In most
> > cases, the speedup is not worth it and we just trade it for the
> > increased flexiblity/power of the python language. But in this situation
> > using a bit of tight C code could make the difference between the
> > process taking just 15mins or taking a few hours!
> >
> > Ofcourse I'm not asking him to write the entire application in C. It
> > makes sense to just write the performance critical sections in C, and
> > wrap it in Python, and write the rest of the application in Python.
>
> you got no idea what you are talking about, anyone knows that something
> like this is IO bound.
> CPU is the least of his worries. And for IO bound applications Python
> is just as fast as any other language.

--
http://mail.python.org/mailman/listinfo/python-list