Re: [xml] Questions on usage: xmlTextReaderCurrentDoc, XPath and xmlTextReaderRead

William M. Brack Mon, 24 Dec 2007 11:51:31 -0800

I'm responding to your questions on Valgrind, but leaving the
questions on Text Reader vs. XPath to others:


Eric West wrote:
> [disclaimer: I am new to coding with libxml2]

Good reason to rely heavily on libxml2/doc/examples for code models.

> I have a test program to write and read some sample XML. It "works",
> but I have noticed that valgrind reports some problems related to
> xmlTextReaderRead and xmlNewTextReaderFilename.

Using Valgrind on your programs is a GOOD thing.  I only wish more
people would follow your example.

> [Details below]
>
> My test program uses the TextReader APIs to extract XML content. At
> parent nodes, it uses XPath queries to extract child and grandchild
> content. In pseudocode, this is:
>
>      reader = xmlNewTextReaderFilename();
>      ret = xmlTextReaderRead( reader);
>      doc = xmlTextReaderCurrentDoc( reader);
>      while ( ret == 1) {
>            processNode( reader, doc);
>            ret = xmlTextReaderRead( reader);
>       }
>
>       xmlFree( doc);

There is room for improvement here - check what reader3.c and
reader4.c do with the doc returned by xmlTextReaderCurrentDoc. 
(Hint: they *don't* use xmlFree for this).

>       xmlFreeTextReader( reader);
>       xmlCleanupParser();
>
> Within processNode(), I get an XPath context via the doc handle and
> then make a series of XPath queries. The appropriate libxml2 free
> routines are called on the allocated memory. valgrind finds no issues
> in processNode().
>
> Now, at the risk of solving my own problem... If I comment out the
> call to processNode, valgrind still flags memory mismanagement. If I
> also comment out the call to xmlTextReaderCurrentDoc, voilà! --
> valgrind is happy.
>
> Q: Thus I must conclude that there is an order-of-operations problem
> here. I noticed that the sample code textReader3.c does the parsing
> with xmlTextReader and then calls xmlTextReaderCurrentDoc. That
> observation and the documentation suggest that the appropriate
> approach is (a) parse the entire file via xmlTextReader and then (b)
> get a doc pointer to process the in-memory data. Is this correct?

A: follow the sequence(s) used by the example programs.

> Q: Can xmlTextReader calls be interwoven with XPath queries? (I think
> not.) The Python example does this, but the equivalent in C is not
> apparent to me. As best I can tell, one needs a doc pointer to call
> the xmlXPath API:
>
>       node = xmlTextReaderExpand( reader);
>       ctx = xmlXPathNewContext( docPtr );
>       ctx->node = node;
>       pObj = xmlXPathEval( BAD_CAST xmlXPathQuery, ctx);
>
> Q: Does mixing xmlTextReader API calls with XPath APIs defeat the
> memory-utilization benefits of the xmlTextReader implementation? At
> least as per the example, the series of xmlTextReader calls will
> build a tree in memory, so that the subsequent call to
> xmlTextReaderCurrentDoc returns a pointer to the complete tree.
>
> Thanks in advance.
>
>
>   --Eric
>
>
> ###################
>
> $ valgrind --leak-check=full ./xmlTest
>     ...
> ==4610== 23 bytes in 3 blocks are definitely lost in loss record 1 of 6
> ==4610==    at 0x4C21D06: malloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
> ==4610==    by 0x4ED27BF: xmlStrndup (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x4E8483E: xmlNewDoc (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x4F20BDD: xmlSAX2StartDocument (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x4E7BBB8: xmlParseChunk (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x4F0C99D: (within /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x4F0D5CD: xmlTextReaderRead (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x40221E: main (xmlTest.c:492)
> ==4610==
> ==4610==
> ==4610== 4,312 (48 direct, 4,264 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 6
> ==4610==    at 0x4C21D06: malloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
> ==4610==    by 0x4F1DA89: xmlDictCreate (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x4E65F94: xmlInitParserCtxt (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x4E6600D: xmlNewParserCtxt (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x4E7E195: xmlCreatePushParserCtxt (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x4F0E11C: xmlNewTextReader (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x4F0E587: xmlNewTextReaderFilename (in /usr/lib64/libxml2.so.2.6.30)
> ==4610==    by 0x402206: main (xmlTest.c:488)
> ==4610==
> ==4610== LEAK SUMMARY:
> ==4610==    definitely lost: 71 bytes in 4 blocks.
> ==4610==    indirectly lost: 4,264 bytes in 5 blocks.
> ==4610==      possibly lost: 0 bytes in 0 blocks.
> ==4610==    still reachable: 0 bytes in 0 blocks.
> ==4610==         suppressed: 0 bytes in 0 blocks.
>
> The problem points seem to be related to xml
>
>
>
> --
> E r i c   W e s t
> Spark! Creative Group
> Boston, MA 02134-1406
> http://www.sparkcg.com

Bill

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
