Hi all,

Thank you so much for pointing me in the right direction.

I used a REXML SAX2Parser: it solved my problem. It's a bit more code indeed, but it uses a fraction of the memory and it seems quite fast to me.

Thanks a lot,
Pierre

On Jun 11, 5:22 pm, Maurício Linhares <[email protected]> wrote:
> Hi Pierre,
>
> I had a 45~50mb file to parse using Ruby libraries but to no avail,
> the DOM based libraries were slow to death and the SAX based one that
> I tried (libxml-ruby) had some serious memory leaks. Now there's this
> SaxMachine from paul dix that looks usable
> -http://www.pauldix.net/2009/01/sax-machine-sax-parsing-made-easy.html
>
> As to my problem, I wrote a StAX based parser using Java to get it to
> run in reasonable time :(
>
> -
> Maurício Linhares
> http://codeshooter.wordpress.com/ | http://twitter.com/mauriciojr
>
> On Thu, Jun 11, 2009 at 5:41 AM, PierreW<[email protected]> wrote:
>
> > Hello,
> >
> > I need to parse two big XML files in a row (30+MB each). I have tried
> > both REXML and Hpricot. They do work. Thing is, with both libraries,
> > the parsing of each file takes a huge amount of memory: more than
> > 700MB each!
> >
> > So I was wondering:
> > - is it normal that parsing a 30MB file takes 700MB of memory? Could
> > it be that something is wrong with the file? Is there an alternative
> > way to deal with such big files?
> > - is there a way to force the release of the memory when I don't need
> > the file anymore? At the moment it is not released instantly after the
> > first file, so I end up with 1.5GB memory use.
> >
> > I have reduced the code to the minimum to isolate the memory issue:
> >
> > xml = File.read("myfile.xml")
> > doc = REXML::Document.new(xml) or doc = Hpricot.XML(xml)
> > doc = nil
> >
> > and repeat with the second file.
> >
> > Also, I tried libxml in case. I get an error message that I can't
> > explain:
> > LibXML::XML::Error (Fatal error: Input is not proper UTF-8, indicate
> > encoding ! yet the file is UTF-8 as far as I can tell.
> >
> > Thanks a lot for your help.
> > Pierre
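
For anyone finding this thread later, here is a rough sketch of the kind of REXML SAX2Parser code the reply at the top refers to. It is not the code actually used there, just a minimal illustration of the streaming approach: the parser fires a callback per element instead of building the whole document in memory. The helper name count_elements and the sample XML are made up for the example; with a real file you would pass something like File.open("myfile.xml") instead of the StringIO.

```ruby
require "rexml/parsers/sax2parser"
require "stringio"

# Hypothetical helper: counts elements by name while streaming through
# the input. No document tree is built, so memory use stays roughly
# independent of the file size.
def count_elements(io)
  counts = Hash.new(0)
  parser = REXML::Parsers::SAX2Parser.new(io)

  # The block is called once per opening tag.
  parser.listen(:start_element) do |_uri, localname, _qname, _attributes|
    counts[localname] += 1
  end

  parser.parse
  counts
end

# Made-up sample input standing in for a large file.
sample = StringIO.new("<items><item id='1'>a</item><item id='2'>b</item></items>")
counts = count_elements(sample)
puts counts.inspect
```

The same listen call also accepts :end_element and :characters, so per-record work can be done as each element closes and then discarded.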

