OK, working it through step by step:

The synthesized node is now returning the proper ownerDocument node. We're
failing when trying to proceed past that point, when you're calling
            NodeList resultNodeSet = XPathAPI.selectNodeList(nl.item(i),
                                        "self::text()");
starting from this node.
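
For reference, the shape of that call can be sketched with the standard JAXP
XPath API instead of Xalan's XPathAPI (a deliberate swap so the snippet needs
only the JDK). It uses ordinary parsed DOM nodes, so it does not reproduce the
failure; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SelfTextDemo {
    /** Counts the children of the root element that match self::text(). */
    static int countTextChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nl = doc.getDocumentElement().getChildNodes();
            int matches = 0;
            for (int i = 0; i < nl.getLength(); i++) {
                // Evaluate relative to each child in turn, as in the call above.
                NodeList result = (NodeList) xpath.evaluate(
                        "self::text()", nl.item(i), XPathConstants.NODESET);
                matches += result.getLength();
            }
            return matches;
        } catch (Exception e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countTextChildren("<a>one<b/>two</a>"));
    }
}
```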

In DTMManagerDefault.getDTMHandleFromNode(org.w3c.dom.Node node), we're
searching for a DTM which contains this node -- appropriate for DOM2DTM.
But the DTM has apparently never been added to this DTMManager -- m_dtms is
empty.  So we proceed to try to find the Document node and create a new
DTM.

But because that new DTM is generated from the original DOM document, it
creates a new synthesized node... so the one we started from, quite
correctly, is _NOT_ in this new document, and the lookup returns
can't-be-found because the DTM Manager is looking in the wrong place.

Sigh. The DTM code itself is essentially working as designed; this just
happens to be a fringe case where the facts that the synthesized node is
not actually part of the source DOM _AND_ that the DTM wasn't registered
with the manager are combining badly.
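
A toy model of that interaction, as I understand it, may make it easier to
see. Nothing here is Xalan code: ToyNode, ToyDtm and ToyManager are
hypothetical stand-ins, and the only behavior modeled is "each DTM build
synthesizes its own extra node" plus "the fallback builds a fresh,
unregistered DTM from the source".

```java
import java.util.ArrayList;
import java.util.List;

class SynthNodeLookupDemo {
    /** Hypothetical stand-in for a DOM node; compared by identity only. */
    static final class ToyNode {
        final String label;
        ToyNode(String label) { this.label = label; }
    }

    /** Hypothetical stand-in for a DTM built over a source document. */
    static final class ToyDtm {
        final List<ToyNode> nodes = new ArrayList<>();
        final ToyNode synthesized;

        ToyDtm(List<ToyNode> sourceNodes) {
            nodes.addAll(sourceNodes);
            // Every build creates its own synthesized node, which is not
            // part of the source document.
            synthesized = new ToyNode("synthesized");
            nodes.add(synthesized);
        }

        boolean contains(ToyNode n) {
            for (ToyNode candidate : nodes) {
                if (candidate == n) return true;
            }
            return false;
        }
    }

    /** Hypothetical stand-in for the manager's lookup-with-fallback. */
    static final class ToyManager {
        final List<ToyDtm> registered = new ArrayList<>();
        final List<ToyNode> source;

        ToyManager(List<ToyNode> source) { this.source = source; }

        boolean canResolve(ToyNode n) {
            for (ToyDtm dtm : registered) {
                if (dtm.contains(n)) return true;
            }
            // Fallback: build a fresh DTM from the source and look there.
            // Its synthesized node is a different object from any earlier one.
            ToyDtm fresh = new ToyDtm(source);
            return fresh.contains(n);
        }
    }
}
```

In this model a real source node is still found through the fallback, while
a synthesized node from an earlier, unregistered ToyDtm is not; registering
that earlier ToyDtm makes the lookup succeed.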



So why is this DTM not being put into the DTMManager so it can be found and
reused? Well, that seems to be because it was created by this
getDTMHandleFromNode fallback in the first place.

The question is, how best to fix this?

   We could have the implicitly created DTMs added to the manager (and
   probably should, since otherwise their size will be limited)... but that
   gets us into the question of determining when they can be deregistered
   and discarded.

   We can change XPathAPI to always request a new DTM for the node (pass in
   a real DOMSource) rather than using the implicit creation mechanism...
   but that would mean that calling XPath recursively from within an
   extension function would create a new DTM, which is wasteful and
   perpetuates the above bug.

   We can have separate XPathAPI calls for "assume it's a new DOM and
   create a DTM" versus "assume it's an existing DTM and try to handle it
   that way." But if you assume wrong, you're in trouble.

   We can remove the implicit DTM creation, have getDTMHandleFromNode
   return DTM.NULL in that case, and make it the caller's responsibility to
   create DTMs when they don't exist and try again. That's somewhat more
   ugly to use, _BUT_ it has the advantage that the caller knows when they
   have created a DTM and can actively decide when to discard it again.


   ... Any other ideas?
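
To make the last option above concrete (lookup returns a null handle, caller
creates the DTM and owns its lifetime), here is a rough sketch of what the
calling pattern might look like. It is a toy model only: ToyManager,
createAndRegister, release and NO_HANDLE are hypothetical names, not the
real DTMManager API.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class CallerManagedDtmDemo {
    /** Plays the role of DTM.NULL in this toy model. */
    static final int NO_HANDLE = -1;

    /** Hypothetical manager with no implicit creation: lookups never build anything. */
    static final class ToyManager {
        private final Map<Object, Integer> handles = new IdentityHashMap<>();
        private int nextHandle = 0;

        int getHandleFromNode(Object node) {
            Integer h = handles.get(node);
            return h == null ? NO_HANDLE : h;
        }

        int createAndRegister(Object node) {
            int h = nextHandle++;
            handles.put(node, h);
            return h;
        }

        void release(Object node) { handles.remove(node); }

        int registeredCount() { return handles.size(); }
    }

    /** Caller-side pattern: look up, create on a miss, release only what we created. */
    static int evaluateWith(ToyManager mgr, Object node) {
        int handle = mgr.getHandleFromNode(node);
        boolean createdHere = (handle == NO_HANDLE);
        if (createdHere) {
            handle = mgr.createAndRegister(node);
        }
        try {
            // ... XPath evaluation against the handle would go here ...
            return handle;
        } finally {
            // The caller knows it created this entry, so it can discard it.
            if (createdHere) mgr.release(node);
        }
    }
}
```

The extra lookup-then-create step is the "more ugly to use" part; the
createdHere flag is what lets the caller decide when to discard.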


I _can_ simply kluge past this one failure case. But as noted above, we now
Really Want DTMs to be managed, which requires that we explicitly deal with
when they're discarded... and I'd prefer to see us solve this basic issue
rather than applying a band-aid.



General observation: getHandleFromNode is potentially hugely expensive in
DOM2DTM, if the node is late in the document. As the comments indicate,
there's a possible speedup... but even that would be costly the first time
this node is accessed. Unfortunately I haven't yet found an inexpensive and
portable alternative.
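
To illustrate why position matters, here is a small sketch that finds a node
by walking a DOM in document order and counts the nodes visited. It is not
the DOM2DTM algorithm, just a stand-in for any lookup that has to scan
forward from the start; the class and method names are invented.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DocOrderScanDemo {
    /** Builds a root element with the given number of empty child elements. */
    static Element buildFlatDocument(int children) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            for (int i = 0; i < children; i++) {
                root.appendChild(doc.createElement("item"));
            }
            return root;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the node after cur in document order within root's subtree, or null. */
    static Node nextInDocumentOrder(Node root, Node cur) {
        if (cur.getFirstChild() != null) return cur.getFirstChild();
        while (cur != null && cur != root) {
            if (cur.getNextSibling() != null) return cur.getNextSibling();
            cur = cur.getParentNode();
        }
        return null;
    }

    /** Number of nodes visited before target is found, or -1 if it is absent. */
    static int stepsToFind(Node root, Node target) {
        int steps = 0;
        for (Node cur = root; cur != null; cur = nextInDocumentOrder(root, cur)) {
            steps++;
            if (cur.isSameNode(target)) return steps;
        }
        return -1;
    }
}
```

With a flat document of 1000 children, the first child is found after
visiting 2 nodes and the last after visiting 1001, so the cost grows
linearly with how late the node appears.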

