On 12/12/11 13:59, Carl (CBM) wrote:
> This is correct, but the overall memory usage depends on the XML
> library and programming technique being used. For XML that is too
> large to comfortably fit in memory, there are techniques to allow for
> the script to process the data before the entire XML file is parsed
> (google "SAX" or "stream-oriented parsing"). But this requires more
> advanced programming techniques, such as callbacks, compared to the
> more naive method of parsing all the XML into a data structure and
> then returning the data structure. That naive technique can result in
> large memory use if, say, the program tries to create a memory array
> of every page revision on enwiki.
>
> Of course if the perl script is doing the parsing itself, by just
> matching regular expressions, this is not hard to do in a
> stream-oriented way.
>
> - Carl
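For the archives, the callback technique Carl mentions can be sketched with Python's standard-library xml.sax module. This is only an illustration of the stream-oriented approach, not the actual dump schema; the `<revision>` element name and the tiny inline document are assumptions for the example:

```python
# Minimal sketch of SAX / stream-oriented parsing: the handler's
# callbacks fire as each element is encountered, so memory use stays
# roughly constant no matter how large the input file is.
import io
import xml.sax

class RevisionCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.revisions = 0

    def startElement(self, name, attrs):
        # Called once per opening tag; nothing is kept in memory
        # beyond the running count.
        if name == "revision":
            self.revisions += 1

# A stand-in for a (much larger) dump file.
xml_data = io.StringIO(
    "<mediawiki><page><revision/><revision/></page></mediawiki>"
)
handler = RevisionCounter()
xml.sax.parse(xml_data, handler)
print(handler.revisions)  # 2
```

The same handler would work unchanged on a multi-gigabyte file, since the parser never builds the full tree.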
Obviously. No matter whether it's read from a .xml or a .xml.bz2, if it tried to build an XML tree in memory, the memory usage would be enormous. I would expect such an app to get killed for that.

_______________________________________________
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette