I've stumbled across a behaviour I don't quite understand. I've been executing XPath queries against rc1 (using JDK 1.3.1, Red Hat Linux 7.1) and looking at the results via DOM after calling ResourceSet's getMembersAsResource(). One of the nodes I expect to get back should contain the text "Harry Potter and the Philosopher's Stone", but it comes back as "Harry Potter and the Philosopher" (i.e., truncated at the apostrophe). Has anyone experienced anything like this?

In detail...

My XML documents contain elements like this:

<title>Harry Potter and the Philosopher's Stone</title>

When I use the command line tools, I see results like:

xindice xpath -c /db/books -q "/book[body/p[contains(.,'experience')]]/title"

<?xml version="1.0"?>
<title xmlns:src="http://xml.apache.org/xindice/Query" src:col="/db/books" src:key="potter.xml">Harry Potter and the Philosopher&apos;s Stone</title>
<?xml version="1.0"?>
<title xmlns:src="http://xml.apache.org/xindice/Query" src:col="/db/books" src:key="whywebuy.xml">Why We Buy: The Science of Shopping</title>


When I code up that query in Java I end up with code like this:

// A method to run the query and return a DOM:
public Document getResultsAsDOM() throws XMLDBException
{
    XPathQueryService service =
        (XPathQueryService)collection.getService("XPathQueryService", "1.0");
    ResourceSet resultSet = service.query(xpath);

    // Any results?
    if (resultSet == null || resultSet.getSize() == 0)
        return null;

    // We want all the results as an XML document:
    Resource xml = resultSet.getMembersAsResource();

    // Sanity check:
    if (xml.getResourceType() != XMLResource.RESOURCE_TYPE)
        throw new XMLDBException(ErrorCodes.VENDOR_ERROR, "Unexpected result type");

    return ((XMLResource)xml).getContentAsDOM().getOwnerDocument();
}


And then I call the above method to get a DOM and test the results:

// We know what the titles are:
String title1 = "Harry Potter and the Philosopher's Stone";
String title2 = "Why We Buy: The Science of Shopping";

assertEquals("Wrong second title", title2, titles.item(1).getFirstChild().getNodeValue());

assertEquals("Wrong first title", title1, titles.item(0).getFirstChild().getNodeValue());

... and I get a failure on this last assert (the one for the Harry Potter title):


Wrong first title expected:<Harry Potter and the Philosopher's Stone> but was:<Harry Potter and the Philosopher>

The DOM seems fine (two results in it, as expected, and the "Why We Buy" test passes). So I'm wondering if there's something I don't understand about how ' and &apos; are handled.

NB. If I remove the ' from the title in my original XML and reimport the file, there are no problems.
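One thing that might be worth ruling out (this is only a guess; nothing above confirms it) is that the title text ends up as several adjacent text nodes, split at the &apos;, in which case getFirstChild().getNodeValue() would only see the first piece. Below is a minimal, standalone sketch of that situation. The class name TitleTextCheck and the helpers collectText and buildSplitTitle are made up for illustration, the split is simulated by hand rather than produced by Xindice, and it is written for a current JDK (on 1.3 it would need StringBuffer and a separate JAXP jar).

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TitleTextCheck {

    // Hypothetical helper: concatenates every text/CDATA child of a node,
    // instead of reading only the first child.
    static String collectText(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: builds a <title> whose text is deliberately split
    // into two adjacent text nodes, to simulate the suspected situation.
    static Element buildSplitTitle() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element title = doc.createElement("title");
            title.appendChild(doc.createTextNode("Harry Potter and the Philosopher"));
            title.appendChild(doc.createTextNode("'s Stone"));
            return title;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element title = buildSplitTitle();

        // Reading only the first child sees just the first text node.
        System.out.println(title.getFirstChild().getNodeValue());

        // Walking all children recovers the whole string.
        System.out.println(collectText(title));

        // normalize() merges adjacent text nodes, so the first child is now complete.
        title.normalize();
        System.out.println(title.getFirstChild().getNodeValue());
    }
}
```

If the real result DOM behaves like this, then calling normalize() on the document (or collecting all text children) before the asserts should make the first test pass; if it doesn't, the text is being lost somewhere earlier.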

Any clues much appreciated
Richard




