subject:"HTML parsing confusion"

Re: HTML parsing confusion

2008-01-23 Thread Gabriel Genellina

On Wed, 23 Jan 2008 10:40:14 -0200, Alnilam <[EMAIL PROTECTED]> wrote: > Skipping past html validation, and html to xhtml 'cleaning', and > instead starting with the assumption that I have files that are valid > XHTML, can anyone give me a good example of how I would use _ htmllib, > HTMLParser

Re: HTML parsing confusion

2008-01-23 Thread Jerry Hill

On Jan 23, 2008 7:40 AM, Alnilam <[EMAIL PROTECTED]> wrote: > Skipping past html validation, and html to xhtml 'cleaning', and > instead starting with the assumption that I have files that are valid > XHTML, can anyone give me a good example of how I would use _ htmllib, > HTMLParser, or ElementTre

Re: HTML parsing confusion

2008-01-23 Thread Alnilam

On Jan 23, 3:54 am, "M.-A. Lemburg" <[EMAIL PROTECTED]> wrote: > >> I was asking this community if there was a simple way to use only the > >> tools included with Python to parse a bit of html. > > There are lots of ways doing HTML parsing in Python. A common > one is e.g. using mxTidy to convert

Re: HTML parsing confusion

2008-01-23 Thread cokofreedom

> The pages I'm trying to write this code to run against aren't in the > wild, though. They are static html files on my company's lan, are very > consistent in format, and are (I believe) valid html. Obvious way to check this is to go to http://validator.w3.org/ and see what it tells you about you

Re: HTML parsing confusion

2008-01-23 Thread M.-A. Lemburg

On 2008-01-23 01:29, Gabriel Genellina wrote: > On Tue, 22 Jan 2008 19:20:32 -0200, Alnilam <[EMAIL PROTECTED]> wrote: > >> On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote: >>> Alnilam wrote: On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: >> Pardon me, but the

Re: HTML parsing confusion

2008-01-22 Thread Alnilam

On Jan 22, 7:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote: > > > I was asking this community if there was a simple way to use only the > > tools included with Python to parse a bit of html. > > If you *know* that your document is valid HTML, you can use the HTMLParser > module in the stan

Re: HTML parsing confusion

2008-01-22 Thread [EMAIL PROTECTED]

On Jan 22, 7:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote: > > > I was asking this community if there was a simple way to use only the > > tools included with Python to parse a bit of html. > > If you *know* that your document is valid HTML, you can use the HTMLParser > module in the stand
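
A minimal sketch of the standard-library approach suggested in the quoted reply, assuming a current Python 3 interpreter, where the HTMLParser class lives in `html.parser` (in the Python 2.x releases discussed in this thread it was the top-level `HTMLParser` module). The names `ParagraphCollector` and `extract_paragraphs` are hypothetical, chosen only for illustration, and the choice of collecting `<p>` text is an assumption about what "parse a bit of html" might mean here.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text content of every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # greater than 0 while inside a <p> element
        self._chunks = []      # text pieces of the paragraph being read
        self.paragraphs = []   # finished paragraphs, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Paragraph closed: join its pieces and start a new buffer.
                self.paragraphs.append("".join(self._chunks).strip())
                self._chunks = []

    def handle_data(self, data):
        # Keep text only while inside a paragraph; inline tags such as <b>
        # are ignored, but their text is still collected.
        if self._depth:
            self._chunks.append(data)


def extract_paragraphs(html_text):
    """Return the text of each <p> element found in html_text."""
    parser = ParagraphCollector()
    parser.feed(html_text)
    parser.close()
    return parser.paragraphs
```

Calling `extract_paragraphs()` on the contents of one of the static files would return a list of paragraph strings. The sketch relies on every `<p>` having a matching `</p>`, which is why it fits the "known valid" case described above rather than HTML found in the wild.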

Re: HTML parsing confusion

2008-01-22 Thread Gabriel Genellina

On Tue, 22 Jan 2008 19:20:32 -0200, Alnilam <[EMAIL PROTECTED]> wrote: > On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote: >> Alnilam wrote: >> > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: >> >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, >>

Re: HTML parsing confusion

2008-01-22 Thread Alnilam

On Jan 22, 11:39 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote: > Alnilam wrote: > > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: > >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, > >> > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous > >>

Re: HTML parsing confusion

2008-01-22 Thread Diez B. Roggisch

Alnilam wrote: > On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: >> > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, >> > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous >> > 200-modules PyXML package installed. And you don't want the 75Kb >> > Beau

Re: HTML parsing confusion

2008-01-22 Thread Alnilam

On Jan 22, 8:44 am, Alnilam <[EMAIL PROTECTED]> wrote: > > Pardon me, but the standard issue Python 2.n (for n in range(5, 2, > > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous > > 200-modules PyXML package installed. And you don't want the 75Kb > > BeautifulSoup? > > I wasn'

Re: HTML parsing confusion

2008-01-22 Thread Paul McGuire

On Jan 22, 7:44 am, Alnilam <[EMAIL PROTECTED]> wrote: > ...I move from computer to > computer regularly, and while all have a recent copy of Python, each > has different (or no) extra modules, and I don't always have the > luxury of downloading extras. That being said, if there's a simple way > of

Re: HTML parsing confusion

2008-01-22 Thread Alnilam

> Pardon me, but the standard issue Python 2.n (for n in range(5, 2, > -1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous > 200-modules PyXML package installed. And you don't want the 75Kb > BeautifulSoup? I wasn't aware that I had PyXML installed, and can't find a reference to

Re: HTML parsing confusion

2008-01-22 Thread Paul Boddie

On 22 Jan, 06:31, Alnilam <[EMAIL PROTECTED]> wrote: > Sorry for the noob question, but I've gone through the documentation > on python.org, tried some of the diveintopython and boddie's examples, > and looked through some of the numerous posts in this group on the > subject and I'm still rather co

Re: HTML parsing confusion

2008-01-22 Thread John Machin

On Jan 22, 4:31 pm, Alnilam <[EMAIL PROTECTED]> wrote: > Sorry for the noob question, but I've gone through the documentation > on python.org, tried some of the diveintopython and boddie's examples, > and looked through some of the numerous posts in this group on the > subject and I'm still rather

HTML parsing confusion

2008-01-21 Thread Alnilam

Sorry for the noob question, but I've gone through the documentation on python.org, tried some of the diveintopython and boddie's examples, and looked through some of the numerous posts in this group on the subject and I'm still rather confused. I know that there are some great tools out there for

16 matches
