Hi Kent,
Well, even before reading your last email I gave ElementTree a go, just parsing
the XML file and trying out some basic functions. It ran in less than
two seconds. I don't know why BeautifulSoup is taking so long...
Thanks for the "to get you started"!
Bernard
On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Bernard Lebel wrote:
> > The file size is 112 KB. Most lines look like this:
> >
> > <parameter name="roty" type="Parameter" sourceclassname="nosource">
> >
> >
> > I'll give ElementTree a try.
>
> To get you started:
>
> from elementtree import ElementTree
> doc = ElementTree.parse('myfile.xml')
> for sceneobject in doc.findall('//sceneobject'):
>     if sceneobject.get('type') == 'CameraRoot':
>         # this is a sceneobject that you want
>         print sceneobject.get('name')
>
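For anyone trying this on a current Python, the same idea can be sketched with the standard-library xml.etree.ElementTree module (ElementTree has shipped in the standard library since Python 2.5). This is only a sketch: the sample document, the names in it, and the helper camera_root_names are made up for illustration. Only the sceneobject tag, the parameter line and the CameraRoot type come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; a real file would be loaded with
# ET.parse('myfile.xml').getroot() instead.
SAMPLE = """\
<scene>
  <sceneobject name="Camera_Root" type="CameraRoot">
    <parameter name="roty" type="Parameter" sourceclassname="nosource"/>
  </sceneobject>
  <sceneobject name="cube" type="Mesh"/>
  <group>
    <sceneobject name="Camera_Root1" type="CameraRoot"/>
  </group>
</scene>
"""


def camera_root_names(root):
    """Return the name of every sceneobject whose type is CameraRoot."""
    # './/sceneobject' searches all descendants of root, at any depth.
    return [obj.get('name')
            for obj in root.findall('.//sceneobject')
            if obj.get('type') == 'CameraRoot']


root = ET.fromstring(SAMPLE)
names = camera_root_names(root)
print(names)
```

The path is written as './/sceneobject' here because current versions of ElementTree warn about paths that start with a bare '/'.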
> One gotcha - if your XML uses namespaces, you have to prefix the namespace to
> the tag name in findall(). It will look something like
> doc.findall('//{http://www.imsproject.org/xsd/imscp_rootv1p1p2}resource')
>
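A small sketch of the namespace gotcha, using the standard-library xml.etree.ElementTree. The namespace URI is the one quoted above; the sample document, its element and attribute names, and the 'ims' prefix are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

NS = 'http://www.imsproject.org/xsd/imscp_rootv1p1p2'

# Hypothetical document that declares a default namespace.
SAMPLE = """\
<manifest xmlns="http://www.imsproject.org/xsd/imscp_rootv1p1p2">
  <resources>
    <resource identifier="R1"/>
    <resource identifier="R2"/>
  </resources>
</manifest>
"""

root = ET.fromstring(SAMPLE)

# Without the namespace the search finds nothing, because the parsed
# tag names are really '{namespace-uri}resource'.
plain = root.findall('.//resource')

# Prefixing the tag with '{uri}' matches the elements.
qualified = root.findall('.//{%s}resource' % NS)

# Same search using a prefix-to-URI mapping; the prefix itself is arbitrary.
mapped = root.findall('.//ims:resource', {'ims': NS})

print(len(plain), len(qualified), len(mapped))
```

An empty result from findall() on a file that clearly contains the tag is often a sign that a namespace is involved.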
> Let us know how long that takes...
>
> Kent
>
> >
> >
> > Bernard
> >
> >
> >
> > On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> >
> >>Bernard Lebel wrote:
> >>
> >>>Thanks for that pointer Kent, I'll check it out. Also thanks for
> >>>letting me know I'm not nuts! :-)
> >>>
> >>>Alan's suggestion about BeautifulSoup is actually excellent. The
> >>>documentation is nice and the tool is very easy to use.
> >>>
> >>>However, is it normal that parsing a 2618-line XML file takes
> >>>20-30 seconds or so?
> >>
> >>That seems slow to me unless the lines are really long! How many bytes is
> >>the file? But I don't have much experience with BeautifulSoup.
> >>
> >>ElementTree is fast and cElementTree (the C implementation) is really fast.
> >>I have used it to read, process and write a 28 MB XML file; it took about
> >>10 seconds.
> >>
> >>Kent
> >>
> >>
> >>>
> >>>Thanks
> >>>Bernard
> >>>
> >>>
> >>>
> >>>On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> >>>
> >>>
> >>>>Bernard Lebel wrote:
> >>>>
> >>>>
> >>>>>Thanks Alan,
> >>>>>
> >>>>>I'll check BeautifulSoup asap.
> >>>>>
> >>>>>I'm using regex simply because I have no clue where to start to parse
> >>>>>XML. I have read about the various XML tools available in the Python
> >>>>>library, however I'm at a complete loss as to what to make of them. Many
> >>>>>of them seem to use some programming standards, which I am completely
> >>>>>unfamiliar with (this is the first time that I dig into XML writing
> >>>>>and parsing).
> >>>>>
> >>>>>I don't know where to start to learn about all these standards, and as
> >>>>>usual with new programming things, the documentation is hard to
> >>>>>swallow (it usually is written more as a reference than a proper user
> >>>>>guide/tutorial). I have to admit this is very frustrating, so if I'm
> >>>>>looking at things from a wrong perspective please advise me, I need
> >>>>>it.
> >>>>
> >>>>I agree that the Python XML story is confusing even for the files in the
> >>>>standard library. Worse, the (IMO) best solutions are not to be found in
> >>>>the standard lib or PyXML at all.
> >>>>
> >>>>The std lib and PyXML are based on the DOM and SAX standards. These
> >>>>standards were designed to be "language-neutral" - there are
> >>>>implementations in Python, Java and other languages. The good side of
> >>>>this is, if you learn how to use them, the knowledge is pretty portable
> >>>>to other languages. The bad side is, the APIs defined by the standard are
> >>>>IMO clunky and painful to use, especially in Python.
> >>>>
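To make the difference in API style concrete, here is a rough side-by-side sketch of one small task done with the DOM (xml.dom.minidom) and with ElementTree, both from the standard library of current Python versions. The sample document, the comment element and the two function names are hypothetical, and this is one illustration rather than a verdict on either API.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document.
SAMPLE = (
    '<scene>'
    '<sceneobject name="cam" type="CameraRoot">'
    '<comment>main camera</comment></sceneobject>'
    '<sceneobject name="cube" type="Mesh">'
    '<comment>test cube</comment></sceneobject>'
    '</scene>'
)


def comments_via_dom(text):
    """Map each sceneobject name to its comment text using the DOM API."""
    dom = xml.dom.minidom.parseString(text)
    result = {}
    for node in dom.getElementsByTagName('sceneobject'):
        comment = node.getElementsByTagName('comment')[0]
        # In the DOM, text lives in a separate child node of the element.
        result[node.getAttribute('name')] = comment.firstChild.data
    return result


def comments_via_elementtree(text):
    """Same mapping using ElementTree."""
    root = ET.fromstring(text)
    # findtext() returns the text of the first matching child directly.
    return {el.get('name'): el.findtext('comment')
            for el in root.iter('sceneobject')}


print(comments_via_dom(SAMPLE))
print(comments_via_elementtree(SAMPLE))
```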
> >>>>There is a current thread on comp.lang.python discussing this with good
> >>>>suggestions and pointers to more info:
> >>>>http://groups.google.com/group/comp.lang.python/browse_frm/thread/a48891aa645ead13/dcd8fdc20b4b191b?hl=en#dcd8fdc20b4b191b
> >>>>
> >>>>My personal preference is ElementTree. Beautiful Soup is good too though
> >>>>I have only tried it with HTML. If I was running on Linux I would try
> >>>>lxml which uses the ElementTree API and adds full XPath support. Amara
> >>>>looks like the Cadillac solution - big and cushy. I haven't tried it.
> >>>>Uche's articles (referenced in the thread above) have pointers to many
> >>>>other choices but these seem to be the most popular.
> >>>>
> >>>>My favorite XML lib is actually dom4j which is in Java. It works great
> >>>>with Jython.
> >>>>
> >>>>Kent
> >>>>
> >>>>
> >>>>
> >>>>>So right now I'm just taking a shortcut and using an ultra-simple
> >>>>>re-based parser to retrieve the tags I'm looking for. I know it will
> >>>>>probably be slow, but hopefully I'll get familiar with sophisticated
> >>>>>parsing in the future and improve my code. As it stands right now,
> >>>>>even the re syntax is not super easy to learn.
> >>>>
> >>>>For what you are doing re seems fine to me. You can get in trouble using
> >>>>re's with XML because of nested tags, variations in spelling and order,
> >>>>and probably a bunch of other things. But for simple stuff it can work fine.
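A quick sketch of the kind of trouble meant here, assuming the variation is in attribute order and quoting. The first sample line is the parameter line quoted earlier in the thread (self-closed so it parses on its own); the second line, the pattern and both helper names are hypothetical. The parser used is the standard-library xml.etree.ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Two spellings of the same element: attribute order and quote style differ.
LINE_A = '<parameter name="roty" type="Parameter" sourceclassname="nosource"/>'
LINE_B = "<parameter type='Parameter' sourceclassname='nosource' name='roty'/>"

# A simple pattern written with LINE_A in mind.
PATTERN = re.compile(r'<parameter name="(\w+)" type="Parameter"')


def name_via_regex(text):
    """Return the name attribute found by the regex, or None if no match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


def name_via_parser(text):
    """Return the name attribute as read by an XML parser."""
    return ET.fromstring(text).get('name')


for line in (LINE_A, LINE_B):
    print(name_via_regex(line), name_via_parser(line))
```

The regex only matches the spelling it was written for, while the parser reads both lines the same way. A pattern can be made more tolerant, but each new variation needs its own fix.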
> >>>>
> >>>>Kent
> >>>>
> >>>>
> >>>>
> >>>>>Kent: That works (of course!). Thanks a bunch once again!
> >>>>>
> >>>>>
> >>>>>Thanks
> >>>>>Bernard
> >>>>>
> >>>>>On 9/14/05, Alan G <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>Hi Bernard,
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>Hello, yet another regular expression question :-)
> >>>>>>>
> >>>>>>>So I have this xml file that I'm trying to find a
> >>>>>>>specific tag in.
> >>>>>>
> >>>>>>I'm always suspicious when I see regular expression
> >>>>>>and xml/html in the same context. Regexes are not good
> >>>>>>for parsing XML/HTML files and it's usually much easier
> >>>>>>to use a proper parser - such as Beautiful Soup.
> >>>>>>
> >>>>>>http://www.crummy.com/software/BeautifulSoup/
> >>>>>>
> >>>>>>Is there any special reason why you are using a regex
> >>>>>>sledgehammer to crack this particular nut? Or is it
> >>>>>>just to gain experience using regex?
> >>>>>>
> >>>>>>Alan G.
> >>>>>>
> >>>>>
> >>>>>_______________________________________________
> >>>>>Tutor maillist - [email protected]
> >>>>>http://mail.python.org/mailman/listinfo/tutor