Re: [Tutor] Another regular expression question

Bernard Lebel Wed, 14 Sep 2005 07:39:16 -0700

Thanks for that pointer Kent, I'll check it out. Also thanks for
letting me know I'm not nuts! :-)


Alan's suggestion about BeautifulSoup is actually excellent. The
documentation is nice and the tool is very easy to use.

However, is it normal for it to take 20-30 seconds or so to parse a
2618-line XML file?
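
One way to narrow this down is to time the parse step on its own, apart
from file reading and any later processing. The following is only a
minimal sketch under stated assumptions: it uses ElementTree as shipped in
current Python (xml.etree.ElementTree; in 2005 it was a separate download),
and it builds a synthetic document in memory. The tag names and the helper
names build_sample_xml and timed_parse are hypothetical, not taken from the
actual file or script.

```python
import time
import xml.etree.ElementTree as ET


def build_sample_xml(n_items):
    """Build a synthetic XML document with n_items <item> elements.

    The tag names are made up for illustration only.
    """
    lines = ["<root>"]
    for i in range(n_items):
        lines.append('  <item id="%d">value %d</item>' % (i, i))
    lines.append("</root>")
    return "\n".join(lines)


def timed_parse(xml_text):
    """Parse xml_text and return (root element, elapsed seconds)."""
    start = time.perf_counter()
    root = ET.fromstring(xml_text)
    elapsed = time.perf_counter() - start
    return root, elapsed


if __name__ == "__main__":
    # 2616 items plus the two wrapper lines gives a 2618-line document.
    text = build_sample_xml(2616)
    root, seconds = timed_parse(text)
    print("parsed %d items in %.4f s" % (len(root), seconds))
```

Running the same measurement against the real file with each parser would
show whether the time is spent in parsing or somewhere else in the script.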


Thanks
Bernard



On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Bernard Lebel wrote:
> > Thanks Alan,
> >
> > I'll check BeautifulSoup asap.
> >
> > I'm using regex simply because I have no clue where to start to parse
> > XML. I have read the various xml tools available in the Python
> > library, however I'm at a complete loss as to what to make of them. Many
> > of them seem to use some programming standards, which I am completely
> > unfamiliar with (this is the first time that I dig into XML writing
> > and parsing).
> >
> > I don't know where to start to learn about all these standards, and as
> > usual with new programming things, the documentation is hard to
> > swallow (it usually is written more as a reference than a proper user
> > guide/tutorial). I have to admit this is very frustrating, so if I'm
> > looking at things from a wrong perspective please advise me, I need
> > it.
> 
> I agree that the Python XML story is confusing even for the files in the 
> standard library. Worse, the (IMO) best solutions are not to be found in the 
> standard lib or PyXML at all.
> 
> The std lib and PyXML are based on the DOM and SAX standards. These standards 
> were designed to be "language-neutral" - there are implementations in Python, 
> Java and other languages. The good side of this is, if you learn how to use 
> them, the knowledge is pretty portable to other languages. The bad side is, 
> the APIs defined by the standard are IMO clunky and painful to use, 
> especially in Python.
> 
> There is a current thread on comp.lang.python discussing this with good 
> suggestions and pointers to more info:
> http://groups.google.com/group/comp.lang.python/browse_frm/thread/a48891aa645ead13/dcd8fdc20b4b191b?hl=en#dcd8fdc20b4b191b
> 
> My personal preference is ElementTree. Beautiful Soup is good too though I 
> have only tried it with HTML. If I was running on Linux I would try lxml 
> which uses the ElementTree API and adds full XPath support. Amara looks like 
> the Cadillac solution - big and cushy. I haven't tried it. Uche's articles 
> (referenced in the thread above) have pointers to many other choices but 
> these seem to be the most popular.
> 
> My favorite XML lib is actually dom4j which is in Java. It works great with 
> Jython.
> 
> Kent
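
For readers new to ElementTree, here is a rough sketch of what finding a
specific tag looks like with it. The assumptions are mine, not Kent's: the
sample document, its tag and attribute names, and the helper
parameter_values are all hypothetical, and the import path is the one used
by current Python (xml.etree.ElementTree).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real file's structure is not shown
# anywhere in this thread.
SAMPLE = """<scene>
  <object name="cube">
    <parameter name="size">2.0</parameter>
  </object>
  <object name="sphere">
    <parameter name="radius">1.5</parameter>
  </object>
</scene>"""


def parameter_values(xml_text, param_name):
    """Return the text of every <parameter> whose name attribute matches."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nesting depth does not matter.
    return [p.text for p in root.iter("parameter")
            if p.get("name") == param_name]


if __name__ == "__main__":
    print(parameter_values(SAMPLE, "radius"))
```

The parser handles attribute order, whitespace and nesting, so the lookup
itself is just a loop over elements.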
> 
> >
> > So right now I'm just taking a shortcut and using ultra-simple
> > re-based parser to retrieve the tags I'm looking for. I know it will
> > probably be slow, but hopefully I'll get familiar with sophisticated
> > parsing in the future and improve my code. As it stands right now,
> > even the re syntax is not super easy to learn.
> 
> For what you are doing re seems fine to me. You can get in trouble using re's 
> with XML because of nested tags, variations in spelling and order, probably a 
> bunch of other things. But for simple stuff it can work fine.
> 
> Kent
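
The nested-tag problem mentioned above can be illustrated with a small
sketch. This is not the regex from the original question (which is not
quoted in this message); the pattern, the two sample strings and the
function names are hypothetical, chosen only to show where a simple
re-based approach and a parser start to disagree.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical inputs: one flat, one with a <tag> nested inside a <tag>.
FLAT = '<root><tag id="1">a</tag><tag id="2">b</tag></root>'
NESTED = '<root><tag id="1">outer <tag id="2">inner</tag> tail</tag></root>'

# A deliberately simple pattern: an opening <tag ...>, then the shortest
# run of characters up to the next </tag>.
TAG_RE = re.compile(r'<tag[^>]*>(.*?)</tag>', re.S)


def regex_contents(xml_text):
    """Return what the simple regex captures between <tag> and </tag>."""
    return TAG_RE.findall(xml_text)


def parser_contents(xml_text):
    """Return the full text content of every <tag> element, via a parser."""
    root = ET.fromstring(xml_text)
    return ["".join(el.itertext()) for el in root.iter("tag")]


if __name__ == "__main__":
    for label, doc in (("flat", FLAT), ("nested", NESTED)):
        print(label, "regex: ", regex_contents(doc))
        print(label, "parser:", parser_contents(doc))
```

On the flat input both approaches agree. On the nested input the regex
stops at the first closing tag, so it returns a truncated fragment and
never reports the inner element separately, while the parser returns both
elements.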
> 
> >
> >
> > Kent: That works (of course!). Thanks a bunch once again!
> >
> >
> > Thanks
> > Bernard
> >
> > On 9/14/05, Alan G <[EMAIL PROTECTED]> wrote:
> >
> >>Hi Bernard,
> >>
> >>
> >>>Hello, yet another regular expression question :-)
> >>>
> >>>So I have this xml file that I'm trying to find a
> >>>specific tag in.
> >>
> >>I'm always suspicious when I see regular expression
> >>and xml/html in the same context. regex are not good
> >>for parsing xml/html files and it's usually much easier
> >>to use a proper parser - such as beautiful soup.
> >>
> >>http://www.crummy.com/software/BeautifulSoup/
> >>
> >>Is there any special reason why you are using a regex
> >>sledgehammer to crack this particular nut? Or is it
> >>just to gain experience using regex?
> >>
> >>Alan G.
> >>
> >
> > _______________________________________________
> > Tutor maillist  -  Tutor@python.org
> > http://mail.python.org/mailman/listinfo/tutor
> >
> >
> 