I also built lxml 4.2.5 with pristine libxml2 2.9.8 (using a variation of
the above command), and got the same results. So I don't think it's a
distro-specific problem.
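
To double-check which libxml2 a given lxml build is really using, something like the following sketch could help. It relies on the `LIBXML_VERSION` and `LIBXML_COMPILED_VERSION` attributes of `lxml.etree`; the helper names `format_version` and `libxml2_versions` are my own, not part of lxml.

```python
def format_version(version):
    """Turn a version tuple such as (2, 9, 8) into the string '2.9.8'."""
    return ".".join(str(part) for part in version)


def libxml2_versions():
    """Return the libxml2 versions that lxml reports, as dotted strings."""
    # Imported lazily so format_version() stays usable without lxml installed.
    from lxml import etree

    return {
        # Version of the libxml2 library in use at runtime.
        "running": format_version(etree.LIBXML_VERSION),
        # Version of libxml2 that lxml was compiled against.
        "compiled": format_version(etree.LIBXML_COMPILED_VERSION),
    }


if __name__ == "__main__":
    try:
        for label, version in libxml2_versions().items():
            print(f"libxml2 ({label}): {version}")
    except ImportError:
        print("lxml is not installed in this environment")
```

If both values say 2.9.8 for the wheel and for the self-built copy, that at least rules out a version mix-up, though it cannot show whether extra patches were applied.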

I tried to reproduce it with only xmllint as you suggest, but I'm not
having much luck. It produces correct results with "--html --debug
bad.html", "--html --debug --stream bad.html", "--html --debug --push
bad.html", and "--html --debug --sax bad.html".
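
For anyone who wants to repeat these four runs, here is a small sketch that drives xmllint from Python. The flag combinations are exactly the ones listed above; the function names `build_commands` and `run_all` are hypothetical, and it assumes `xmllint` is on the PATH and `bad.html` is in the current directory.

```python
import shutil
import subprocess

# The four flag combinations tried in this message.
FLAG_SETS = [
    ["--html", "--debug"],
    ["--html", "--debug", "--stream"],
    ["--html", "--debug", "--push"],
    ["--html", "--debug", "--sax"],
]


def build_commands(path, xmllint="xmllint"):
    """Return one argv list per flag combination for the given input file."""
    return [[xmllint, *flags, path] for flags in FLAG_SETS]


def run_all(path):
    """Run every combination and collect (exit code, stdout) keyed by flags."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint not found on PATH")
    results = {}
    for command in build_commands(path):
        completed = subprocess.run(command, capture_output=True)
        flags = " ".join(command[1:-1])
        results[flags] = (completed.returncode, completed.stdout)
    return results


if __name__ == "__main__":
    # Only print the commands here; call run_all("bad.html") to execute them.
    for command in build_commands("bad.html"):
        print(" ".join(command))
```

Keeping the outputs in a dictionary makes it easy to diff the modes against each other instead of reading them by eye.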

Maybe I'm just not using the right flags - I don't know if lxml uses SAX
mode, or streaming, etc. But at this point I wouldn't be too surprised if
it depended on the size of some internal input buffer that's different in
lxml vs xmllint. I'd welcome any advice about what else I should try, or
how I can find out what calls are being made from lxml to libxml2.
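
One way to probe the buffer-size idea from the Python side might be to feed the file to lxml in chunks of different sizes and compare each result with a one-shot parse. This is only a sketch: it uses lxml's feed parser interface (`HTMLParser.feed()` and `.close()`), the helper names are made up, and it rests on the unverified assumption that the chunk size passed to `feed()` influences how libxml2 fills its input buffer.

```python
def split_chunks(data, size):
    """Split a bytes object into consecutive pieces of at most `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [data[i:i + size] for i in range(0, len(data), size)]


def parse_in_chunks(data, size):
    """Parse HTML via lxml's feed interface and return the serialized tree."""
    from lxml import etree  # lazy import: only needed when actually parsing

    parser = etree.HTMLParser()
    for chunk in split_chunks(data, size):
        parser.feed(chunk)
    root = parser.close()
    return etree.tostring(root)


def find_divergent_sizes(data, sizes):
    """Return the chunk sizes whose result differs from a one-shot parse."""
    from lxml import etree

    reference = etree.tostring(etree.fromstring(data, etree.HTMLParser()))
    return [size for size in sizes if parse_in_chunks(data, size) != reference]
```

A call such as `find_divergent_sizes(open("bad.html", "rb").read(), [1, 512, 4096, 8192])` would list the chunk sizes that produce a different tree; the sizes here are arbitrary guesses, not known libxml2 buffer sizes.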

Other than that: It's not ideal, but could you please check if you can also
reproduce the bug with the first set of commands I posted? Just to verify
it's not just me.


On Tue, Jan 22, 2019 at 5:11 PM Nick Wellnhofer <wellnho...@aevum.de> wrote:

> On 22/01/2019 15:43, Tomi Belan via xml wrote:
> > After a lot of debugging, I determined the problem is in libxml2 and not the
> > other libraries in my stack, and that it only seems to happen on version
> > 2.9.8. But I don't see any related changes in news.html for 2.9.9, nor in the
> > diff between them, so I am still worried: I don't know if the bug is really
> > fixed, or just dormant. I hope you can find the root cause, and maybe add a
> > regression test if you do.
>
> I also don't see any directly related changes in either 2.9.8 or 2.9.9.
>
> > This will download the manylinux binary build of lxml 4.2.5, which is
> > statically linked to libxml2 2.9.8.
>
> Are you sure that a pristine 2.9.8 build was used? Maybe there are additional
> patches added by a distro?
>
> > I couldn't shorten the file very much, because if I delete even a single
> > character, the bug stops triggering. (Could it be some buffer boundary issue?)
>
> Yes, a buffer boundary issue seems likely.
>
> > I also built my own lxml 4.2.5 with libxml2 2.9.9 and it was not affected. So
> > I believe this is a bug in libxml2 2.9.8 specifically, and not in a particular
> > version of lxml.
>
> Did you also try your own build with the official libxml2 2.9.8 sources?
>
> > I hope you can solve the mystery. Please let me know if I can be of any help.
>
> It would help if you could reproduce the issue with xmllint and no Python code
> involved. git-bisect might also be useful.
>
> Nick
xml mailing list, project page  http://xmlsoft.org/
