Re: Simple XPath Extraction

David Bertoni Tue, 24 Feb 2009 17:48:26 -0800

Mark Schmit wrote:

I'm trying what I thought would be a very simple task: given an XPath
and a string containing an XML file's contents, produce a string
containing only the XPath-matching excerpts.  I tried the following,
based heavily on the SimpleXPathAPI and XPathWrapper samples:


================
XMLPlatformUtils::Initialize();
XPathEvaluator::initialize();
XalanNode* context_node = NULL;
{
  XPathEvaluator xpath_evaluator;
  XalanSourceTreeDOMSupport dom_support;
  XalanSourceTreeParserLiaison parser_liaison;
  dom_support.setParserLiaison(&parser_liaison);

  MemBufInputSource input_source(
      reinterpret_cast<const XMLByte*>(full_xml_contents.c_str()),
      full_xml_contents.length(), "what_is_buf_id");

  XalanDocument* doc = parser_liaison.parseXMLStream(input_source);
  ASSERT(doc) << "Failed to create doc";
  XalanDocumentPrefixResolver prefix_resolver(doc);
  context_node = xpath_evaluator.selectSingleNode(dom_support, doc,
                                                  XalanDOMString(context).c_str(),
                                                  prefix_resolver);
  if (!context_node) {
    LOG << "Failed to find node for " << context;
  } else {
    const XObjectPtr result(xpath_evaluator.evaluate(dom_support,
                                                     context_node,
                                                     XalanDOMString("*").c_str(),
                                                     prefix_resolver));
    ASSERT(!result.null());
    LOG << "Result type: " << result->getTypeString();
    const NodeRefListBase& nodeset = result->nodeset();
    for (int i = 0; i < nodeset.getLength(); ++i) {
      XalanNode* node = nodeset.item(i);
      LOG << "Node " << i << ": " << node->getNodeName();
      XalanDOMString str;
      DOMServices::getNodeData(*node, str);
      LOG << "Node " << i << " contents: " << str;
      // TODO: Append to a single return string
    }
  }
}
XPathEvaluator::terminate();
XMLPlatformUtils::Terminate();
================

This prints each of the nodes' contents, albeit with all of the tags'
contents stripped out.  My problems are that: A) I want to keep all
the XML tags, and B) I want to put them into one aggregated string.
For example, given the following XML file:

I'm a little confused by what you mean by "all of the tags' contents
stripped out." Do you mean the tags are stripped out? If so, that's
expected, because you're working with the XPath data model, and not
with markup.


================
<?xml version="1.0" encoding="UTF-8"?>
<GenInfo xmlns="urn:mynamespace">
  <EntityId>ABC123</EntityId>
  <EntityName>My Favorite Entity</EntityName>
  <MembersInfo>
    <Member ID="123456">
      <Name>Bob Smith</Name>
    </Member>
    <Member ID="234567">
      <Name>Jane Doe</Name>
    </Member>
  </MembersInfo>
</GenInfo>
================

I'd like to produce this:
================
<Member ID="123456">
  <Name>Bob Smith</Name>
</Member>
<Member ID="234567">
  <Name>Jane Doe</Name>
</Member>
================

I think you're trying to take the nodes and serialize an external
parsed entity. Take a look at the SerializeNodeSet sample for more
information. Unfortunately, Xalan-C's serializer does not implement
namespace fixup, so you will have trouble serializing documents that
use namespaces.
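
To make the difference between the XPath data model and markup concrete,
here is a minimal, self-contained C++17 sketch. It does not use Xalan-C
at all: ToyNode, stringValue, serialize, serializeAll and makeMember are
hypothetical names invented for this illustration. It only shows why a
string-value call returns text without tags, while serialization has to
walk the subtree and regenerate the markup. Namespace declarations are
ignored entirely, which is exactly the fixup problem mentioned above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a parsed XML tree. This is NOT Xalan-C's XalanNode.
struct ToyNode {
    std::string name;   // element name; empty for a text node
    std::string text;   // character data; used only by text nodes
    std::vector<std::pair<std::string, std::string>> attrs;
    std::vector<ToyNode> children;  // requires C++17 (incomplete type)
};

// XPath-style string-value: the concatenated text of all descendant
// text nodes, with no markup.
std::string stringValue(const ToyNode& n) {
    if (n.name.empty()) {
        assert(n.children.empty());  // text nodes are leaves
        return n.text;
    }
    std::string out;
    for (const ToyNode& c : n.children) out += stringValue(c);
    return out;
}

// Escape the characters that are special in XML text and attributes.
std::string escape(const std::string& s) {
    std::string out;
    for (char ch : s) {
        switch (ch) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += ch;
        }
    }
    return out;
}

// Serialization: walk the subtree and regenerate start tags,
// attributes, content and end tags.
std::string serialize(const ToyNode& n) {
    if (n.name.empty()) return escape(n.text);
    std::string out = "<" + n.name;
    for (const auto& a : n.attrs)
        out += " " + a.first + "=\"" + escape(a.second) + "\"";
    out += ">";
    for (const ToyNode& c : n.children) out += serialize(c);
    out += "</" + n.name + ">";
    return out;
}

// Aggregate a node-set into one string, one serialized node per line.
std::string serializeAll(const std::vector<ToyNode>& nodes) {
    std::string out;
    for (const ToyNode& n : nodes) {
        out += serialize(n);
        out += '\n';
    }
    return out;
}

// Builds <Member ID="..."><Name>...</Name></Member> for the example.
ToyNode makeMember(const std::string& id, const std::string& name) {
    ToyNode text{"", name, {}, {}};
    ToyNode nameEl{"Name", "", {}, {text}};
    return ToyNode{"Member", "", {{"ID", id}}, {nameEl}};
}
```

With a node built by makeMember("123456", "Bob Smith"), stringValue
gives only the text "Bob Smith", while serialize gives the element with
its tags and ID attribute. In Xalan-C the equivalent of serialize is the
job of the formatter classes used in the SerializeNodeSet sample, not of
DOMServices::getNodeData.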


I looked at the DOMServices and DOMSupport classes and couldn't find
anything that produced a string of XML from a given XalanNode.  Do I
need to use the XalanTransformer class in some way?  Should I
essentially be generating an XSL transformation rather than using
XPathEvaluator?

Well, running a stylesheet would certainly take care of a lot of the
messy details of using XPathEvaluator and doing serialization, but it
depends on whether your XPath expressions are dynamic or static.


Also, how does the 'urn:mynamespace' aspect figure into this?  Does
that get passed to the prefix resolver?

Yes, the XPath processor will need to know the bindings for namespaces.
Note there's a simple PrefixResolver implementation, called
ElementPrefixResolverProxy, that will take an instance of a
XalanElement and resolve namespace bindings using the prefixes defined
on that element. Note this only works if all of the prefixes you need
are defined on a single element. Note also that it doesn't work with
documents that use default namespace bindings, because XPath 1.0
requires you to use a prefix for QNames.
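
The point about default namespaces can be shown with a small,
self-contained C++ sketch of how an XPath 1.0 name test is expanded.
This is a toy model, not Xalan-C's PrefixResolver interface: PrefixMap,
ExpandedName and expandNameTest are hypothetical names, and the "ns"
prefix is an arbitrary choice. An unprefixed name always expands to "no
namespace", so "Member" can never match an element that lives in
urn:mynamespace; the expression has to use a prefix that the resolver
binds to that URI.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a prefix resolver: prefix -> namespace URI.
using PrefixMap = std::map<std::string, std::string>;

// An expanded name is a (namespace URI, local name) pair.
using ExpandedName = std::pair<std::string, std::string>;

// Expand a QName used as an XPath 1.0 name test.
// Returns false if the QName uses a prefix with no binding.
bool expandNameTest(const std::string& qname,
                    const PrefixMap& bindings,
                    ExpandedName& out) {
    const std::string::size_type colon = qname.find(':');
    if (colon == std::string::npos) {
        // No prefix: XPath 1.0 does NOT apply the default namespace,
        // so the namespace URI is empty.
        out = ExpandedName("", qname);
        return true;
    }
    const PrefixMap::const_iterator it =
        bindings.find(qname.substr(0, colon));
    if (it == bindings.end()) return false;  // unbound prefix
    out = ExpandedName(it->second, qname.substr(colon + 1));
    return true;
}
```

With "ns" bound to "urn:mynamespace", the test "ns:Member" expands to
(urn:mynamespace, Member), which is the expanded name of the Member
elements in the sample document. The test "Member" expands to
("", Member) and selects nothing there.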


Finally, is there any documentation online that features descriptions
of the classes or do I need to infer everything from class names?  I
spent a ton of time trying to figure this out from the API docs but
the classes feature almost no descriptions of what role they actually
serve, or how to perform activities that seem to me to be pretty
fundamental (e.g. seeing the XML text representation of a given node
and its children).

We really need a "programmer's guide," but no one has ever volunteered
to write one, and I just don't have the time to do it. The sample
applications are really the best place to start. As for your specific
example, "serialization" is the common term for generating markup from
an instance of the data model, hence the "SerializeNodeSet" sample
application.


Dave
