On Sat, Apr 11, 2009 at 10:06 PM, Moos Heintzen <[email protected]> wrote:
> Mark Tolonen <[email protected]> wrote:
>> Your data looks like XML.  If it is actually well-formed XML, have you tried
>> ElementTree?
>
> It is XML. I used minidom from xml.dom, and it worked fine, except it
> was ~16 times slower. I'm parsing a ~70mb file, and the difference is
> 3 minutes to 10 seconds with re's.
>
> I used separate re's for each field I wanted, and it worked nicely.
> (1-1 between DOM calls and re.search and re.finditer)
>
> This problem arose when I tried to do the match in one re.
>
> I guess instead of minidom I could try lxml, which uses libxml2, which
> is written in C.

ElementTree is likely faster than minidom; it has a C implementation.
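
A rough sketch of what that could look like, since I haven't seen your file. The element names (<items>, <item>, <ship>, <title>) and the sample data are my assumptions, not your actual format. It uses iterparse so each record can be cleared after it is read, and it assumes a recent Python, where xml.etree.ElementTree uses its C accelerator automatically when available:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real file's structure was not shown.
SAMPLE = b"""<items>
  <item><ship>5.00</ship><title>First</title></item>
  <item><ship>7.50</ship><title>Second</title></item>
</items>"""


def iter_items(source):
    """Yield (ship, title) text pairs, one per <item> element."""
    # "end" events fire once an element has been parsed completely.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield elem.findtext("ship"), elem.findtext("title")
            elem.clear()  # drop the children we no longer need


pairs = list(iter_items(io.BytesIO(SAMPLE)))
print(pairs)
```

For a real file you would pass the filename (or an open binary file) to iter_items instead of the BytesIO object.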

> Kent Johnson <[email protected]> wrote:
>> This re doesn't have to match anything after </ship> so it doesn't.
>> You can force it to match to the end by adding $ at the end but that
>> is not enough, you have to make the "</ship>.*?" *not* match <title>.
>> One way to do that is to use [^<]*? instead of .*?:
>
> Ah. Thanks.
> Unfortunately, the input string is multi-line, and doesn't end in </title>

Perhaps you should show your actual input then.
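
In the meantime, here is a small sketch of the general idea on made-up data. The input string and both patterns are my own guesses for illustration, not your real data or your real re. It is a variation on the [^<]*? suggestion quoted above: a greedy [^<]* between the two fields, so no $ is needed and the input may continue after </title>:

```python
import re

# Hypothetical multi-line input: the first record has no <title>,
# and the text does not end in </title>.
text = "<ship>1.00</ship>\n<ship>2.00</ship>\n  <title>B</title>\ntrailing"

# Lazy .*? is happy to match nothing, so the optional title group is skipped.
lazy = re.compile(r"<ship>(.*?)</ship>.*?(?:<title>(.*?)</title>)?")

# [^<]* can skip whitespace but cannot cross into another tag, so the
# title is only picked up when it is the very next element.
bounded = re.compile(r"<ship>(.*?)</ship>[^<]*(?:<title>(.*?)</title>)?")

lazy_result = [m.groups() for m in lazy.finditer(text)]
bounded_result = [m.groups() for m in bounded.finditer(text)]
print(lazy_result)
print(bounded_result)
```

With the lazy pattern the title group should come back as None for both records; with the bounded one the second record gets its title and the first does not borrow it.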

Kent
_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
