While waiting for a proper test file, I did some more checking, this time
with Valgrind (unfortunately, Valgrind and Python tend to work in a very
adversarial manner, so this isn't very easy).  I think I now understand
where the original trouble comes from.

The pythonDocLoaderFuncWrapper function in python/libxslt.c creates a
parser context, pctxt, using xmlNewParserCtxt().  Apparently the reason
for doing this is so that later, after calling the user's loader, some
additional error checking and cleanup can be done.  pctxt is then
converted into a Python object [pctxtobj =
libxml_xmlParserCtxtPtrWrap(pctxt)] which is passed to the user's loader
function (the routine 'loader' in your test program).

Within the loader function which you posted, you create a new object:
  parserContext = libxml2.parserCtxt(_obj=pctx)
where 'pctx' is pctxtobj.  However, parserContext is only a local Python
object, so when the loader function returns Python very kindly calls
upon its garbage collector to dispose of it.  That action causes
xmlFreeParserCtxt to be called for the underlying parser context pointer,
which in this instance is the (original C) variable pctxt.  That, in
turn, causes nothing but trouble for the remainder of the code within
pythonDocLoaderFuncWrapper, which still expects pctxt to be valid.
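
To make the sequence concrete, here is a small pure-Python sketch of it.
It does not use libxml2 or libxslt at all: FakeCtxt, fake_free, Wrapper,
c_side_wrapper and both loaders are made-up stand-ins for pctxt,
xmlFreeParserCtxt, libxml2.parserCtxt, pythonDocLoaderFuncWrapper and the
user's loader.  The "detach" step in careful_loader is only an
illustration of the ownership question; whether an equivalent trick is
safe with the real bindings is an assumption I have not tested.

```python
import gc


class FakeCtxt:
    """Hypothetical stand-in for the C parser context (pctxt)."""

    def __init__(self):
        self.freed = False


def fake_free(ctxt):
    """Stand-in for xmlFreeParserCtxt; a second call models a double free."""
    if ctxt.freed:
        raise RuntimeError("double free")
    ctxt.freed = True


class Wrapper:
    """Stand-in for a Python wrapper that frees its pointer when disposed of."""

    def __init__(self, _obj=None):
        self._o = _obj

    def __del__(self):
        if self._o is not None:
            fake_free(self._o)
        self._o = None


def c_side_wrapper(loader):
    """Stand-in for the C wrapper: create context, call loader, clean up."""
    pctxt = FakeCtxt()
    loader(pctxt)
    gc.collect()      # make sure the loader's locals are really gone
    fake_free(pctxt)  # the wrapper still believes it owns pctxt
    return "ok"


def buggy_loader(pctx):
    # The local wrapper silently takes ownership of the caller's context
    # and frees it when it is disposed of at the end of this function.
    parser_context = Wrapper(_obj=pctx)


def careful_loader(pctx):
    parser_context = Wrapper(_obj=pctx)
    # Detach before returning, so disposing of the wrapper frees nothing.
    parser_context._o = None


def run(loader):
    """Run one loader through the fake wrapper and report what happened."""
    try:
        return c_side_wrapper(loader)
    except RuntimeError as exc:
        return str(exc)


print(run(buggy_loader))    # double free
print(run(careful_loader))  # ok
```

In the sketch the trouble appears only when both sides think they own the
context, which is the situation described above.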

That's about as far as I can go, since this now appears to be a problem
with the basic design of the loader code.  Please let me know if I have
made an error in the analysis above, or if I can assist further in any
way.

Bill

William M. Brack wrote:
> Could you provide the file ("file2.html") that you are using for this
> test which fails?  If I use a file like libxml2/test/HTML/doc2.htm:
>
> [EMAIL PROTECTED] ~/gnomesvn/work $ ln -s HTML/doc2.htm file2.html
> [EMAIL PROTECTED] ~/gnomesvn/work $ python bug.py
> ./file2.html:10: HTML parser error : Misplaced DOCTYPE declaration
> <!-- END Naviscope Javascript --><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Tra
>                                  ^
> <?xml version="1.0"?>
> <html>
>   <head/>
>   <body>
>     <div>
> <!-- saved from url=(0016)http://intranet/ -->
> <!-- BEGIN Naviscope Javascript -->
> <!-- END Naviscope Javascript -->
> <!-- saved from url=(0027)http://www.agents-tech.com/ -->
>     </div>
>     <div>this is xml</div>
>   </body>
> </html>
>
> which seems to indicate that at least something is working :-).
>
> (note that I'm using the latest SVN for both libxslt and libxml2)
>
>
> Bill
>
> Nic James Ferrier wrote:
>> Daniel Veillard <[EMAIL PROTECTED]> writes:
>>
>>> Nic said:
>>>>  *** glibc detected *** double free or corruption (!prev): 0x081b6300 ***
>>>>  Aborted
>>>>
>>>   But did you update libxslt too and make install for it too? Please
>>> do, he fixed the problems in libxslt not in libxml2,
>>
>> Ah!
>>
>> Yes. It stopped segfaulting. I can't get it to parse the HTML... but
>> it has stopped segfaulting.
>>
>>   doc.dump(sys.stdout)
>>
>> shows this for every document I get back that parses:
>>
>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
>> "http://www.w3.org/TR/REC-html40/loose.dtd">
>>
>> Here's the relevant bit of the loader again:
>>
>>   def loader(url, pctx, ctx, type):
>>       doc = None
>>       context_object = None
>>       if type:
>>           context_object = libxslt.stylesheet(_obj=ctx)
>>       else:
>>           context_object = libxslt.transformCtxt(_obj=ctx)
>>       # The parserContext and resulting document
>>       parserContext = libxml2.parserCtxt(_obj=pctx)
>>       doc = None
>>       if url == "/one":
>>           doc = parserContext.htmlCtxtReadFile("file2.html", "UTF8", 1)
>>       else:
>>           doc = parserContext.ctxtReadDoc("""<document>
>>   <h1>this is xml</h1>
>>   </document>""", url, "UTF8", 0)
>>       return doc
>>
>>
>> so when I ask for "/one" from my stylesheet I get back (practically)
>> nothing.
>>
>> --
>> Nic Ferrier
>> http://www.tapsellferrier.co.uk   for all your tapsell ferrier needs
>> _______________________________________________
>> xml mailing list, project page  http://xmlsoft.org/
>> [email protected]
>> http://mail.gnome.org/mailman/listinfo/xml
>>

