On 12/12/11 13:59, Carl (CBM) wrote:
> This is correct, but the overall memory usage depends on the XML
> library and programming technique being used. For XML that is too
> large to fit comfortably in memory, there are techniques that let the
> script process the data before the entire XML file has been parsed
> (google "SAX" or "stream-oriented parsing"). But this requires more
> advanced programming techniques, such as callbacks, compared to the
> more naive method of parsing all the XML into a data structure and
> then returning it. That naive technique can result in large memory
> use if, say, the program tries to build an in-memory array of every
> page revision on enwiki.
> 
> Of course, if the Perl script is doing the parsing itself by just
> matching regular expressions, this is not hard to do in a
> stream-oriented way.
> 
> - Carl

Obviously. Whether it is read from a .xml or a .xml.bz2, anything that
tried to build an XML tree in memory would have an enormous memory
footprint; I would expect such an app to get killed for that.
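
For what it's worth, here is a minimal sketch of the stream-oriented,
callback-based approach Carl describes, using Perl's XML::Parser. The
element names (<page>, <title>, <revision>) follow the MediaWiki dump
schema, but the script itself is only illustrative, not taken from any
existing tool:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Count revisions per page in a MediaWiki dump without building a tree.
# The only state kept is the current page's title and a counter.
my $current_title  = '';
my $revision_count = 0;
my $in_title       = 0;

my $parser = XML::Parser->new(Handlers => {
    Start => sub {
        my ($expat, $element) = @_;
        $in_title = 1     if $element eq 'title';
        $revision_count++ if $element eq 'revision';
    },
    Char => sub {
        my ($expat, $text) = @_;
        $current_title .= $text if $in_title;
    },
    End => sub {
        my ($expat, $element) = @_;
        $in_title = 0 if $element eq 'title';
        if ($element eq 'page') {
            print "$current_title: $revision_count revisions\n";
            $current_title  = '';
            $revision_count = 0;
        }
    },
});

# Reading from STDIN keeps a .xml.bz2 dump streaming too, e.g.
#   bzcat enwiki-pages-meta-history.xml.bz2 | perl count_revisions.pl
$parser->parse(\*STDIN);

Memory use stays roughly constant no matter how big the dump is, since
nothing survives past the current page. The regex-matching approach
Carl mentions is streaming by construction, since it only ever looks at
the current line.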

