Mark Tolonen wrote:
> Your data looks like XML.  If it is actually well-formed XML, have you tried
> ElementTree?

It is XML. I used minidom from xml.dom, and it worked fine, except that
it was ~16 times slower. I'm parsing a ~70 MB file, and the difference
is about 3 minutes with minidom versus 10 seconds with re's.

I used a separate re for each field I wanted, and it worked nicely
(a one-to-one mapping between my DOM calls and re.search/re.finditer calls).
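A minimal sketch of that one-to-one mapping, assuming a hypothetical `<ship>` field (the real file's structure isn't shown). Each `getElementsByTagName` call on the parsed DOM corresponds to one precompiled pattern scanned with `re.finditer`. The regex shortcut only holds while the field contains plain text: no attributes on the tag, no nested elements, no entities.

```python
import re
from xml.dom import minidom

# Hypothetical sample; the element name "ship" is an assumption.
DATA = "<ships><ship>Argo</ship><ship>Beagle</ship></ships>"

# DOM version: parse the whole document, then collect each element's text.
doc = minidom.parseString(DATA)
dom_ships = [node.firstChild.data for node in doc.getElementsByTagName("ship")]

# Regex version: one pattern per field, scanned with re.finditer.
SHIP_RE = re.compile(r"<ship>(.*?)</ship>")
re_ships = [m.group(1) for m in SHIP_RE.finditer(DATA)]

print(dom_ships == re_ships)  # True
```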

The problem arose when I tried to do the whole match with a single re.

Instead of minidom, I guess I could try lxml, which is built on libxml2,
a library written in C.
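Since Mark's suggestion was ElementTree, here is a minimal sketch of its streaming `iterparse` API, which walks a large file element by element instead of building the whole tree first. The `<record>`, `<ship>` and `<title>` names are assumptions, since the thread doesn't show the file's layout. lxml.etree also provides an `iterparse`, so a similar loop should carry over; either is worth timing against the regex version on the real 70 MB file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: the <record>/<ship>/<title> layout is an assumption.
SAMPLE = b"""<records>
  <record><ship>Argo</ship><title>First voyage</title></record>
  <record><ship>Beagle</ship><title>Second voyage</title></record>
</records>"""

def extract_fields(source):
    """Yield (ship, title) pairs, one per <record>, while parsing incrementally."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            yield elem.findtext("ship"), elem.findtext("title")
            elem.clear()  # drop the finished record's children to limit memory use

pairs = list(extract_fields(io.BytesIO(SAMPLE)))
print(pairs)  # [('Argo', 'First voyage'), ('Beagle', 'Second voyage')]
```

For a real file, `extract_fields` can be given a filename or an open binary file instead of the `BytesIO` wrapper.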

Kent Johnson wrote:
> This re doesn't have to match anything after </ship> so it doesn't.
> You can force it to match to the end by adding $ at the end but that
> is not enough, you have to make the "</ship>.*?" *not* match <title>.
> One way to do that is to use [^<]*? instead of .*?:

Ah. Thanks.
Unfortunately, the input string is multi-line and doesn't end in </title>.
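On multi-line input, Python's re module treats the two candidate fixes differently: `.` does not match a newline unless `re.DOTALL` is passed, whereas a negated class such as `[^<]` matches newlines with no flag at all. The sketch below assumes a hypothetical layout of `<ship>` elements, only some of which are followed by a `<title>` (the thread doesn't show the real pattern or data), and compares three variants with `re.findall`.

```python
import re

# Hypothetical multi-line input; "Beagle" has no <title> of its own.
TEXT = (
    "<ship>Argo</ship>\n<title>First</title>\n"
    "<ship>Beagle</ship>\n<note>none</note>\n"
    "<ship>Cutty Sark</ship>\n<title>Third</title>\n"
)

# 1. Plain ".*?" cannot cross the newline after </ship>, so nothing matches.
no_flags = re.findall(r"<ship>(.*?)</ship>.*?<title>(.*?)</title>", TEXT)

# 2. re.DOTALL lets "." match newlines, but the lazy ".*?" then runs past
#    Beagle's missing title and pairs Beagle with the next ship's title.
dotall = re.findall(r"<ship>(.*?)</ship>.*?<title>(.*?)</title>", TEXT, re.DOTALL)

# 3. "[^<]*?" matches newlines without any flag, yet can never step over
#    another tag, so a ship with no title simply fails to match.
bounded = re.findall(r"<ship>([^<]*?)</ship>[^<]*?<title>([^<]*?)</title>", TEXT)

print(no_flags)  # []
print(dotall)    # [('Argo', 'First'), ('Beagle', 'Third')]
print(bounded)   # [('Argo', 'First'), ('Cutty Sark', 'Third')]
```

Because `findall` scans the whole string for successive matches, the third variant doesn't depend on the input ending in </title>, and no trailing `$` is needed.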


Moos

P.S.

I'm still relatively new to RE's, or IRE's. sed, awk, grep, and Perl
each use a different syntax for RE's, and grep alone has four different
flavors!

Since the only form of re I'm using is "start(.*?)end", I was thinking
about writing a C program to do that.
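Before writing a C program, it may be worth trying a plain string search: a literal start/end extraction needs no regex, and CPython's `str.find` is itself implemented in C. A minimal sketch follows; the helper name `between` is hypothetical. It behaves like `start(.*?)end` with `re.DOTALL` (so matches may span newlines), not like the flagless pattern, and whether it beats the regex on the real file would need timing.

```python
def between(text, start, end):
    """Yield the text between successive non-overlapping start/end pairs.

    Intended to mirror re.finditer on re.escape(start) + "(.*?)" + re.escape(end)
    with re.DOTALL, but uses only str.find.
    """
    if not start or not end:
        raise ValueError("start and end must be non-empty")  # avoids an endless loop
    pos = 0
    while True:
        i = text.find(start, pos)
        if i == -1:
            return
        i += len(start)
        j = text.find(end, i)  # first end after this start, like the lazy .*?
        if j == -1:
            return
        yield text[i:j]
        pos = j + len(end)  # resume after the match, so results don't overlap

sample = "<title>A</title>\n<title>B\nC</title><title>unclosed"
print(list(between(sample, "<title>", "</title>")))  # ['A', 'B\nC']
```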
_______________________________________________
Tutor maillist
http://mail.python.org/mailman/listinfo/tutor
