On Thu, 2010-04-29 at 15:17 +0200, Daniel Veillard wrote:
> On Wed, Apr 28, 2010 at 07:08:49PM +0100, Sergio Monteiro Basto wrote:
> > Who is the maintainer of HTMLparser? I reported a bug, and no one
> > has replied.
>
> I do handle the bugs, but I'm busy with other projects, so I don't
> a
On Wed, Apr 28, 2010 at 07:08:49PM +0100, Sergio Monteiro Basto wrote:
> Who is the maintainer of HTMLparser? I reported a bug, and no one
> has replied.
I do handle the bugs, but I'm busy with other projects, so I don't
answer in a timely fashion.
> What can I do about that?
provide
Sergio Monteiro Basto, 28.04.2010 20:08:
Who is the maintainer of HTMLparser? I reported a bug, and no one
has replied.
What can I do about that?
Should HTMLparser parse badly broken HTML or not?
IIRC, the last thing I read was that the HTML parser should basically
follow HTML5 where poss
Probably not. You should clean up that HTML with tidy before passing it
through the XML parser.
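
To make the lenient-versus-strict question concrete, here is a minimal sketch using only the Python standard library. It is a stand-in, not libxml2's HTMLparser or tidy: `html.parser` plays the tolerant HTML parser and `xml.etree.ElementTree` plays the strict XML parser. The `BROKEN` fragment and the `TagAndTextCollector` and `strict_parse_ok` names are made up for illustration and do not come from the bug report.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical input: <p> is never closed and <br> has no end tag.
BROKEN = "<p>Article 1<sup>er</sup> bis<br>(nouveau)"


class TagAndTextCollector(HTMLParser):
    """Records start tags and text as the tolerant parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def strict_parse_ok(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


collector = TagAndTextCollector()
collector.feed(BROKEN)
collector.close()

lenient_tags = collector.tags
lenient_text = "".join(collector.text)
strict_ok = strict_parse_ok(BROKEN)

print(lenient_tags, repr(lenient_text), strict_ok)
```

The tolerant parser reports every tag and text chunk without complaint, while the strict parser rejects the same input outright. That gap is why cleaning the markup first is suggested before handing it to an XML parser.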
Sergio Monteiro Basto wrote:
Who is the maintainer of HTMLparser? I reported a bug, and no one
has replied.
What can I do about that?
Should HTMLparser parse badly broken HTML or not?
Thanks,
--
Sérgio M. B.
The following happens with lxml (sorry it's easier for me with lxml):
In [25]: etree.tostring(etree.fromstring('Article 1er
bis (nouveau)', etree.HTMLParser()))
Out[25]: 'Article 1erbis
(nouveau)'
In [26]: etree.tostring(etree.fromstring('Article 1er
bis (nouveau)'))
Out[26]: 'Article 1er bis
On Thu, Apr 12, 2007 at 01:14:20PM +1000, Michael Day wrote:
> Hi Daniel,
>
> Here is the patch to stop htmlParseScript() interpreting
Hi Daniel,
Here is the patch to stop htmlParseScript() interpreting
Hi Daniel,
> http://www.w3.org/TR/html4/types.html#type-cdata
> says nothing about comments, so one supposedly must know the SGML specifics
> on the topic, and sorry, I never studied SGML. If you have a pointer to a
> description explaining that comments are not to be interpreted in CDATA
> a patch sh
On Mon, Apr 09, 2007 at 06:29:08PM +1000, Michael Day wrote:
> Hi,
>
> Currently the HTML parser seems to incorrectly parse comments in the
> element. For example:
>
>