Re: XPath: text() ignores CDATA

Henry Zongaro Mon, 10 Mar 2008 07:45:29 -0700

Hi, Ahmed.

Ahmed Ashour <[EMAIL PROTECTED]> wrote on 2008-03-06 01:04:57 PM:
> 1- Xalan doesn't concatenate the text into one, but returns the text
> before CDATA only.
> 
>        javax.xml.xpath.XPathExpression expression =
>            xpath.compile("/root/text()");
> 
>        org.w3c.dom.NodeList result = (NodeList) expression.evaluate(
>            new org.xml.sax.InputSource("foo.xml"), XPathConstants.NODESET);
>        for( int i=0; i < result.getLength(); i++ ) {
>            // prints 'a' only, without CDATA or 'b'
>            System.out.println(((Text) result.item(i)).getTextContent());
>        }


You didn't show how you created the JAXP XPathFactory in this example, but 
I'm assuming you created one that supports DOM as the object model.  In 
fact, that's the only model supported by Xalan-J's implementation of the 
JAXP XPath API.

A DOM tree can have adjacent DOM Text nodes and DOM CDATASection nodes, 
and that's almost certainly what your DOM tree looks like in this case:  a 
DOM Element node named root with three children:  a DOM Text node with 
string value "a", a DOM CDATASection node with string value " CDATA " and 
a DOM Text node with string value "b".  But, as Joe and David have pointed 
out, the XPath data model has to present a logical view in which a text 
node never has a sibling that is another text node, so in XPath the three 
DOM nodes are seen as a single XPath text node, with string value 
"a CDATA b".

When you ask for the result of /root/text() as a node-set, what is 
returned is only the first of the three DOM nodes that make up the 
logical XPath text node, which is why you see just "a".
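
If what you want is the whole string, here is a minimal sketch of two 
possible approaches, assuming the JAXP parser and XPath implementation 
bundled with the JDK.  The class name CdataText and its helper methods 
are made up for illustration.  The first approach asks XPath for a string 
instead of a node-set; the second asks the parser to coalesce CDATA 
sections into adjacent text, so the DOM tree holds a single Text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class CdataText {
    static final String XML = "<root>a<![CDATA[ CDATA ]]>b</root>";

    // Parse the sample document.  With coalesce=true the parser converts
    // CDATA sections to text and appends them to adjacent text nodes.
    static Document parse(boolean coalesce) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(coalesce);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluate the string value of the root element, i.e. the
    // concatenation of all of its descendant text, as a String.
    static String rootString(Document doc) {
        try {
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate("string(/root)", doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document plain = parse(false);
        // Three DOM children: Text, CDATASection, Text
        System.out.println(plain.getDocumentElement().getChildNodes().getLength());
        // The XPath string value still covers all three
        System.out.println(rootString(plain));

        Document merged = parse(true);
        // One DOM child after coalescing
        System.out.println(merged.getDocumentElement().getChildNodes().getLength());
        System.out.println(merged.getDocumentElement().getFirstChild().getNodeValue());
    }
}
```

Note that string(/root) gives the string value of the whole element, 
which matches the text here only because root has no child elements.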

> 2- I see the point raised in section 5.7 of XPath 1.0, but wonder why
> both Internet Explorer and Firefox do not behave accordingly, please
> find below:
> 
> <html><head><title>foo</title><script>
>   function test() {
>     var text='<root>a<![CDATA[ CDATA ]]>b</root>';
>     if (window.ActiveXObject) {
>       var doc=new ActiveXObject('Microsoft.XMLDOM');
>       doc.async=false;
>       doc.loadXML(text);
>     } else {
>       var parser=new DOMParser();
>       var doc=parser.parseFromString(text,'text/xml');
>     }
>  alert(doc.documentElement.childNodes.length);
>   }
> </script></head><body onload='test()'>
> </body></html>

I'm a bit confused by this example.  It doesn't look like you're working 
with any XPath expression here, but with the DOM tree only, so you don't 
get the opportunity to notice the discrepancy between DOM and the XPath 
data model.

Thanks,

Henry
------------------------------------------------------------------
Henry Zongaro
XML Transformation & Query Development
IBM Toronto Lab   T/L 313-6044;  Phone +1 905 413-6044
mailto:[EMAIL PROTECTED]
