Hello,

I need to parse two large XML files, one after the other (30+ MB each).
I have tried both REXML and Hpricot, and both work. The problem is that
with either library, parsing each file takes a huge amount of memory:
more than 700 MB per file!

So I was wondering:
- Is it normal that parsing a 30 MB file takes 700 MB of memory? Could
it be that something is wrong with the file? Is there an alternative
way to deal with such big files?
- Is there a way to force the release of the memory once I no longer
need the file? At the moment it is not released right after the first
file, so I end up using about 1.5 GB of memory.
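
On the first question, one alternative I am aware of is stream (SAX-style) parsing, where the parser calls back for each element instead of building the whole tree in memory. Here is a minimal sketch using REXML's stream listener. The ElementCounter class and the count_elements helper are names I made up for illustration, and counting tags is only a stand-in for whatever the real processing would be:

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener: counts elements by name instead of building a tree.
class ElementCounter
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Called by the parser each time an opening tag is read.
  def tag_start(name, _attrs)
    @counts[name] += 1
  end
end

# Hypothetical helper: source can be an open File (or a String of XML).
def count_elements(source)
  listener = ElementCounter.new
  REXML::Document.parse_stream(source, listener)
  listener.counts
end
```

It would be called as File.open("myfile.xml") { |f| count_elements(f) }. I have not measured how much memory this saves on my files.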

I have reduced the code to the minimum to isolate the memory issue:

require "rexml/document"   # or: require "hpricot"

xml = File.read("myfile.xml")
doc = REXML::Document.new(xml)   # or: doc = Hpricot.XML(xml)
doc = nil                        # drop the only reference to the parsed tree

and repeat with the second file.
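
On the second question, the only thing I know to try is asking for a garbage collection explicitly with GC.start once nothing references the document any more. A minimal sketch, where parse_and_release is a hypothetical helper name and the block is assumed to return only small data (not the document itself):

```ruby
# Hypothetical helper: reads the file, lets the block do the parsing,
# then requests a full GC pass once the method's locals are out of use.
def parse_and_release(path)
  xml = File.read(path)
  # The block should build the document and return only what is needed,
  # so that no reference to the parsed tree survives this method.
  yield(xml)
ensure
  xml = nil
  GC.start
end
```

Usage would look like parse_and_release("myfile.xml") { |xml| REXML::Document.new(xml).root.name }. Even if the Ruby objects are collected, the interpreter may keep the freed heap for reuse instead of handing it back to the operating system, so the process size may not drop.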

Also, I tried libxml just in case. I get an error message that I can't
explain:
LibXML::XML::Error (Fatal error: Input is not proper UTF-8, indicate
encoding !
Yet the file is UTF-8 as far as I can tell.
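
To check the "as far as I can tell" part, the raw bytes can be tested directly. This is a minimal sketch assuming Ruby 1.9 or later (where strings carry an encoding); first_invalid_utf8_offset is a hypothetical helper name:

```ruby
# Returns the byte offset of the first invalid UTF-8 sequence,
# or nil when the whole input is valid UTF-8.
def first_invalid_utf8_offset(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return nil if text.valid_encoding?

  offset = 0
  text.each_char do |ch|
    # Invalid bytes are yielded as characters that fail this check.
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end
```

It would be called as first_invalid_utf8_offset(File.binread("myfile.xml")). A non-nil result points at the first byte that a strict parser would reject, for example a stray Latin-1 character.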

Thanks a lot for your help.
Pierre
