Re: [xml] HTMLparser

2010-04-30 Thread Sergio Monteiro Basto
On Thu, 2010-04-29 at 15:17 +0200, Daniel Veillard wrote:
> On Wed, Apr 28, 2010 at 07:08:49PM +0100, Sergio Monteiro Basto wrote:
> > Who is the maintainer of HTMLparser? I reported a bug and no one
> > has replied.
>
> I do handle the bugs, but I'm busy with other projects, so I don't
> a

Re: [xml] HTMLparser

2010-04-29 Thread Daniel Veillard
On Wed, Apr 28, 2010 at 07:08:49PM +0100, Sergio Monteiro Basto wrote:
> Who is the maintainer of HTMLparser? I reported a bug and no one
> has replied.
I do handle the bugs, but I'm busy with other projects, so I don't answer in a timely fashion.
> What can I do about that?
provide

Re: [xml] HTMLparser

2010-04-28 Thread Stefan Behnel
Sergio Monteiro Basto, 28.04.2010 20:08:
> Who is the maintainer of HTMLparser? I reported a bug and no one has replied. What can I do about that? Should HTMLparser parse bad, broken HTML or not?
IIRC, the last thing I read was that the HTML parser should basically follow HTML5 where poss

Re: [xml] HTMLparser

2010-04-28 Thread Darko Miletic
Probably not. You should clean up that HTML with tidy before passing it through the XML parser.
Sergio Monteiro Basto wrote:
> Who is the maintainer of HTMLparser? I reported a bug and no one has replied. What can I do about that? Should HTMLparser parse bad, broken HTML or not? Thanks,

[xml] HTMLparser

2010-04-28 Thread Sergio Monteiro Basto
Who is the maintainer of HTMLparser? I reported a bug and no one has replied. What can I do about that? Should HTMLparser parse bad, broken HTML or not? Thanks, -- Sérgio M. B.

[xml] HTMLparser strips blank chars after some elements

2010-04-12 Thread Benoit Boissinot
The following happens with lxml (sorry, it's easier for me with lxml):
In [25]: etree.tostring(etree.fromstring('Article 1er bis (nouveau)', etree.HTMLParser()))
Out[25]: 'Article 1erbis (nouveau)'
In [26]: etree.tostring(etree.fromstring('Article 1er bis (nouveau)'))
Out[26]: 'Article 1er bis
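
The session above can be rerun as a small script. This is only a sketch: the markup inside the quoted strings was lost in this archive preview, so the sample input below (an inline `<sup>` element followed by a space) is an assumption, not the reporter's original test case. The helper name `compare_parsers` is likewise made up for illustration. It relies on the third-party lxml package and returns the serialization produced by the HTML parser next to the one produced by the XML parser, so the whitespace after the inline element can be compared by eye.

```python
def compare_parsers(markup):
    """Parse `markup` with lxml's HTML and XML parsers; return both serializations."""
    # Imported lazily: lxml is a third-party binding to libxml2, not stdlib.
    from lxml import etree

    html_root = etree.fromstring(markup, etree.HTMLParser())  # libxml2 HTML parser
    xml_root = etree.fromstring(markup)                       # default XML parser
    return etree.tostring(html_root), etree.tostring(xml_root)


if __name__ == "__main__":
    # Hypothetical input: the tags in the original report were stripped by the archive.
    sample = "<p>Article 1<sup>er</sup> bis (nouveau)</p>"
    html_out, xml_out = compare_parsers(sample)
    print("HTML parser:", html_out)
    print("XML parser: ", xml_out)
```

If the space after the inline element survives in the XML output but not in the HTML output, the installed libxml2 shows the behaviour described in the report; the result may differ between libxml2 versions.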

Re: [xml] HTMLparser: comments in

2007-04-12 Thread Daniel Veillard
On Thu, Apr 12, 2007 at 01:14:20PM +1000, Michael Day wrote:
> Hi Daniel,
>
> Here is the patch to stop htmlParseScript() interpreting

Re: [xml] HTMLparser: comments in

2007-04-11 Thread Michael Day
Hi Daniel, Here is the patch to stop htmlParseScript() interpreting

Re: [xml] HTMLparser: comments in

2007-04-09 Thread Michael Day
Hi Daniel,
> http://www.w3.org/TR/html4/types.html#type-cdata
> says nothing about comments, someone supposedly must know SGML specifics
> on the topic and sorry I never studied SGML. If you have a pointer to a
> description explaining that comments are not to be interpreted in CDATA
> a patch sh

Re: [xml] HTMLparser: comments in

2007-04-09 Thread Daniel Veillard
On Mon, Apr 09, 2007 at 06:29:08PM +1000, Michael Day wrote:
> Hi,
>
> Currently the HTML parser seems to incorrectly parse comments in the
> element. For example:
>
>