subject:"Looking for a decent HTML parser for Python..."

Re: Looking for a decent HTML parser for Python...

2006-12-06 Thread hubritic

Agreed that the web sites are probably broken. Try running the HTML through HTMLTidy (http://tidy.sourceforge.net/). Doing that has allowed me to parse pages where I had problems such as yours. I have also had luck with BeautifulSoup, which also includes a tidy function in it. Just Another Victim of t

Re: Looking for a decent HTML parser for Python...

2006-12-06 Thread Stephen Eilert

Fredrik Lundh wrote: > > Except it appears to be buggy or, at least, not very robust. There are > > websites for which it falsely terminates early in the parsing. > > which probably means that the sites are broken. the amount of broken > HTML on the net is staggering, as is the amount of

Re: Looking for a decent HTML parser for Python...

2006-12-05 Thread Fredrik Lundh

> Except it appears to be buggy or, at least, not very robust. There are > websites for which it falsely terminates early in the parsing. which probably means that the sites are broken. the amount of broken HTML on the net is staggering, as is the amount of code in a typical web browser

Re: Looking for a decent HTML parser for Python...

2006-12-05 Thread Just Another Victim of the Ambient Morality

"Just Another Victim of the Ambient Morality" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > >Okay, I think I found what I'm looking for in HTMLParser in the > HTMLParser module. Except it appears to be buggy or, at least, not very robust. There are websites for which i

Re: Looking for a decent HTML parser for Python...

2006-12-05 Thread Just Another Victim of the Ambient Morality

"Just Another Victim of the Ambient Morality" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] >I'm trying to parse HTML in a very generic way. >So far, I'm using SGMLParser in the sgmllib module. The problem is > that it forces you to parse very specific tags through object

Looking for a decent HTML parser for Python...

2006-12-05 Thread Just Another Victim of the Ambient Morality

I'm trying to parse HTML in a very generic way. So far, I'm using SGMLParser in the sgmllib module. The problem is that it forces you to parse very specific tags through object methods like start_a(), start_p() and the like, forcing you to know exactly which tags you want to handle. I
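A minimal sketch of the generic approach the poster seems to be after, assuming Python 3's html.parser module (the thread itself predates it and refers to the older HTMLParser and sgmllib modules). Instead of one start_a() or start_p() method per tag, a single handle_starttag() callback receives every tag name. The names TagCollector and collect_tags are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag, whatever its name (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Called once for each opening tag; attrs is a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))


def collect_tags(html):
    """Feed a string of HTML to the parser and return the tags it saw."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Calling collect_tags() on a page returns a list of (tag, attributes) pairs without the caller having to know in advance which tags occur. Whether this copes with the badly broken pages mentioned in the replies is a separate question; the replies suggest cleaning the markup first or using BeautifulSoup.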


6 matches
