On Sat, Apr 11, 2009 at 10:06 PM, Moos Heintzen <[email protected]> wrote:
> Mark Tolonen <[email protected]> wrote:
>> Your data looks like XML. If it is actually well-formed XML, have you tried
>> ElementTree?
>
> It is XML. I used minidom from xml.dom, and it worked fine, except it
> was ~16 times slower. I'm parsing a ~70mb file, and the difference is
> 3 minutes to 10 seconds with re's.
>
> I used separate re's for each field I wanted, and it worked nicely.
> (1-1 between DOM calls and re.search and re.finditer)
>
> This problem raised when I tried to do the match in one re.
>
> I guess instead of minidom I could try lxml, which uses libxml2, which
> is written in C.

ElementTree is likely faster than minidom; it has a C implementation.

> Kent Johnson <[email protected]> wrote:
>> This re doesn't have to match anything after </ship> so it doesn't.
>> You can force it to match to the end by adding $ at the end but that
>> is not enough, you have to make the "</ship>.*?" *not* match <title>.
>> One way to do that is to use [^<]*? instead of .*?:
>
> Ah. Thanks.
> Unfortunately, the input string is multi-line, and doesn't end in </title>

Perhaps you should show your actual input then.

Kent
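
For illustration only, here is a minimal sketch of the `.*?` versus `[^<]*?` point quoted above. The real input was never posted, so the sample strings and all three patterns are made up. Only the `<ship>` and `<title>` tag names come from the thread. The third pattern is an assumption on my part, not something suggested in the thread: it moves the gap inside the optional group so that no `$` is needed, which may help when the input is multi-line and does not end in `</title>`.

```python
import re

# Hypothetical sample; the poster's actual input was not shown.
data = "<ship>Enterprise</ship>\n<title>Captain</title>\n"

# A lazy gap followed by an optional group: both may match the empty
# string, so the match ends right after </ship> and no title is captured.
lazy = re.compile(
    r"<ship>(.*?)</ship>.*?(?:<title>(.*?)</title>)?", re.DOTALL)

# With $ the match must reach the end, and with [^<]*? the gap cannot
# swallow the <title> tag, so the optional group has to match it.
anchored = re.compile(
    r"<ship>(.*?)</ship>[^<]*?(?:<title>(.*?)</title>)?$", re.DOTALL)

# Assumed variant: put the gap inside the optional group. The group is
# tried greedily, so a title directly after </ship> is captured, and a
# record without one still matches with None for the title.
grouped = re.compile(
    r"<ship>([^<]*)</ship>(?:[^<]*<title>([^<]*)</title>)?")


def ship_and_title(pattern, text):
    """Return (ship, title) for the first match, or None if no match."""
    m = pattern.search(text)
    return m.groups() if m else None


print(ship_and_title(lazy, data))      # title is None
print(ship_and_title(anchored, data))  # title is captured

# Multi-line input that does not end in </title> (also made up).
records = data + "<ship>Voyager</ship>\n"
for m in grouped.finditer(records):
    print(m.groups())
```

If the file really is well-formed XML, a parser is still the safer route; the patterns above only work while the text between `</ship>` and `<title>` contains no other tags.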
