I'm responding to your questions on Valgrind, but leaving the
questions on Text Reader vs. XPath to others:
Eric West wrote:
> [disclaimer: I am new to coding with libxml2]
That is a good reason to lean heavily on the programs in
libxml2/doc/examples as models for your code.
> I have a test program to write and read some sample xml. It "works",
> but I have noticed that
> valgrind reports some problems related to xmlTextReaderRead and
> xmlNewTextReaderFilename.
Using Valgrind on your programs is a GOOD thing. I only wish more
people would follow your example.
> [Details below]
>
> My test program uses the TextReader APIs to extract XML content. At
> parent nodes, it utilizes
> XPath queries to extract child and grandchild content. In
> pseudocode, this is
>
> reader = xmlNewReaderFilename();
> ret = xmlReaderRead( reader);
> doc = xmlTextReaderCurrentDoc( reader);
> while ( ret == 1) {
> processNode( reader, doc);
> xmlTextReaderRead( reader);
> }
>
> xmlFree( doc);
There is room for improvement here: check what reader3.c and
reader4.c do with the doc returned by xmlTextReaderCurrentDoc.
(Hint: they *don't* use xmlFree for this; they call xmlFreeDoc.)
> xmlFreeTextReader( reader);
> xmlCleanupParser();
>
> Within processNode(), I get path context via the doc handle and then
> make
> a series of XPath queries. The various libxml2 free routines are
> called on
> memory allocated as appropriate. valgrind finds no issues in
> processNode().
>
> Now at the risk of solving my own problem... If I comment out the
> call to processNode, valgrind
> still flags memory mismanagement. If I also comment out the call to
> xmlTextReaderCurrentDoc,
> voilà! -- valgrind is happy.
>
> Q: Thus I must conclude that there is an order of operations problem
> here. I noticed that the sample code
> textReader3.c does the parsing with xmlTextReader and then calls
> xmlTextReaderCurrentDoc. That
> observation and the documentation suggests that the appropriate
> approach is (a) parse the entire
> file via xmlTextReader and then (b) get a doc pointer to process the
> in-memory data. Is this correct?
A: Follow the sequence(s) used by the example programs. As you
noticed, reader3.c reads to the end before it calls
xmlTextReaderCurrentDoc.
> Q: Can xmlTextReader calls be interwoven with XPath queries? (I
> think not.) The Python
> example does this, but the equivalent in C is not apparent to me. As
> best I can tell one needs a
> doc pointer to call xmlXPath API:
>
> node = xmlTextReaderExpand( reader);
> ctx = xmlXPathNewContext( docPtr );
> ctx->node = node
> pObj = xmlXPathEval( BAD_CAST xmlXPathQuery, ctx);
>
> Q: Does mixing xmlTextReader API calls with XPath APIs defeat the
> memory utilization benefits
> of the xmlTextReader implementation? At least as per the example,
> the series of xmlTextReader calls
> will build a tree in-memory so that the subsequent call to
> xmlTextReaderCurrentDoc returns a
> pointer to the complete tree.
>
> Thanks in advance.
>
>
> --Eric
>
>
> ###################
>
> $ valgrind --leak-check=full ./xmlTest
> ...
> ==4610== 23 bytes in 3 blocks are definitely lost in loss record 1
> of 6
> ==4610== at 0x4C21D06: malloc (in
> /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
> ==4610== by 0x4ED27BF: xmlStrndup (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x4E8483E: xmlNewDoc (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x4F20BDD: xmlSAX2StartDocument (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x4E7BBB8: xmlParseChunk (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x4F0C99D: (within /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x4F0D5CD: xmlTextReaderRead (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x40221E: main (xmlTest.c:492)
> ==4610==
> ==4610==
> ==4610== 4,312 (48 direct, 4,264 indirect) bytes in 1 blocks are
> definitely lost in loss record 3 of 6
> ==4610== at 0x4C21D06: malloc (in
> /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
> ==4610== by 0x4F1DA89: xmlDictCreate (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x4E65F94: xmlInitParserCtxt (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x4E6600D: xmlNewParserCtxt (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x4E7E195: xmlCreatePushParserCtxt (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x4F0E11C: xmlNewTextReader (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x4F0E587: xmlNewTextReaderFilename (in
> /usr/lib64/libxml2.so.2.6.30)
> ==4610== by 0x402206: main (xmlTest.c:488)
> ==4610==
> ==4610== LEAK SUMMARY:
> ==4610== definitely lost: 71 bytes in 4 blocks.
> ==4610== indirectly lost: 4,264 bytes in 5 blocks.
> ==4610== possibly lost: 0 bytes in 0 blocks.
> ==4610== still reachable: 0 bytes in 0 blocks.
> ==4610== suppressed: 0 bytes in 0 blocks.
>
> The problem points seem to be related to xml
>
>
>
> --
> E r i c W e s t
> Spark! Creative Group
> Boston, MA 02134-1406
> http://www.sparkcg.com
>
Bill
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml