Hi all,

Thank you so much for pointing me in the right direction.

I used a REXML SAX2Parser: it solved my problem. It's a bit more code
indeed, but it uses a fraction of the memory and it seems quite fast
to me.
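
For anyone finding this thread later, here is a minimal sketch of the
streaming approach, not the exact code I ended up with. It assumes the
goal is to collect the text of every element with a given name; the
helper name collect_text and the element name "item" are made up for
illustration. On recent Ruby versions REXML ships as a separate gem, so
it may need to be installed first.

```ruby
require 'rexml/parsers/sax2parser'
require 'stringio'

# Hypothetical helper: streams through +io+ and collects the text of every
# <element_name> element without building a document tree in memory.
def collect_text(io, element_name)
  parser  = REXML::Parsers::SAX2Parser.new(io)
  results = []
  buffer  = nil # non-nil only while we are inside a matching element

  # Start buffering when a matching element opens.
  parser.listen(:start_element, [element_name]) do |_uri, _local, _qname, _attrs|
    buffer = +''
  end

  # Append character data, but only while inside a matching element.
  parser.listen(:characters) { |text| buffer << text if buffer }

  # Store the buffered text when the matching element closes.
  parser.listen(:end_element, [element_name]) do |_uri, _local, _qname|
    results << buffer
    buffer = nil
  end

  parser.parse
  results
end

# Small in-memory example; for a real file, pass File.new("myfile.xml").
xml = '<items><item>a</item><other>x</other><item>b</item></items>'
p collect_text(StringIO.new(xml), 'item')
```

Only the current element's text is held at any time, which is why the
memory use stays small compared with REXML::Document or Hpricot.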

Thanks a lot,
Pierre

On Jun 11, 5:22 pm, Maurício Linhares <[email protected]>
wrote:
> Hi Pierre,
>
> I had a 45-50 MB file to parse using Ruby libraries, but to no avail:
> the DOM-based libraries were painfully slow, and the SAX-based one that
> I tried (libxml-ruby) had some serious memory leaks. Now there's
> SAX Machine from Paul Dix, which looks usable:
> http://www.pauldix.net/2009/01/sax-machine-sax-parsing-made-easy.html
>
> As to my problem, I wrote a StAX based parser using Java to get it to
> run in reasonable time :(
>
> -
> Maurício Linhares
> http://codeshooter.wordpress.com/ | http://twitter.com/mauriciojr
>
>
>
> On Thu, Jun 11, 2009 at 5:41 AM, PierreW<[email protected]> wrote:
>
> > Hello,
>
> > I need to parse two big XML files in a row (30+MB each). I have tried
> > both REXML and Hpricot. They do work. Thing is, with both libraries,
> > the parsing of each file takes a huge amount of memory: more than
> > 700MB each!
>
> > So I was wondering:
> > - is it normal that parsing a 30MB file takes 700MB of memory? Could
> > it be that something is wrong with the file? Is there an alternative
> > way to deal with such big files?
> > - is there a way to force the release of the memory when I don't need
> > the file anymore? At the moment it is not released instantly after the
> > first file, so I end up with 1.5GB memory use.
>
> > I have reduced the code to the minimum to isolate the memory issue:
>
> > xml = File.read("myfile.xml")
> > doc = REXML::Document.new(xml)  # or: doc = Hpricot.XML(xml)
> > doc = nil
>
> > and repeat with the second file.
>
> > I also tried libxml, just in case. I get an error message that I
> > can't explain:
> > LibXML::XML::Error (Fatal error: Input is not proper UTF-8, indicate
> > encoding !)
> > Yet the file is UTF-8 as far as I can tell.
>
> > Thanks a lot for your help.
> > Pierre
