Re: Xalan transform hand-built DOM

David Bertoni Mon, 05 Jun 2006 14:40:01 -0700

[EMAIL PROTECTED] wrote:


Hi Dave,

Thank you once again for your helpful reply.

 >>
 >> I have, as suggested, created a trivial program that illustrates the
 >> problem.
 >> It is based on the XalanTransform example with minimal changes.
 >> It is attached.
 >>
 >> Hopefully it shows where I am going wrong.
 >> Maybe the problem is with my approach rather than any bug in Xalan.
 >
 >OK, the bad news is this is really something we haven't supported in a
 >very long time.  In fact, I'm not sure if XalanTransformer ever
 >supported using a pre-built tree in this way.  We should probably remove
 >the XalanNode constructor from XSLTInputSource.

Well, from my point of view that would be very helpful
as it would have saved me from some frustration.

 >The good news is the way you can do this is fairly straightforward.  You
 >need to create your own derivative of XalanParsedSource, and wrap your
 >document instance in that.  You really need to do a bit more work to get
 >your own XalanDocument implementation to work anyway.

OK. I had started down the path of using XalanParsedSource
and given up. However, given your suggestion, I tried it again
and have finally succeeded in getting a trivial translation to
work. There were a number of problems that I had which
meant it took a lot longer than I would have liked but
nothing too obscure.

I am not sure exactly what you meant about
getting my implementation of XalanDocument to work.
I know (now) about the bug in getFirstChild/getLastChild
and I plan to implement the getElementById function.
But was there something else that I have missed?

Xalan-C does not use the getElementById() function, so please don't
implement it for the sake of doing XSLT. What I meant to point out was
that XSLT needs to order node-sets in document order, which can be very
expensive. Xalan-C's default implementation optimizes this by keeping
an integer in each node that indicates the node's position in the
document. If you want good performance for your XalanDOM
implementation, you should really do this too.

 >To get your own custom implementation of XalanDOM to work you need to
 >provide a derivative of the DOMSupport class that understands your
 >implementation. You can probably just use DOMSupportDefault, but you
 >should create your own if you can optimize node-ordering for your
 >implementation.  For example, XalanSourceTree (the default
 >implementation) uses indexing to support fast node ordering.  I would
 >highly recommend you do something like that, if you want to see
 >reasonable performance.

I am interested in this point.
I can't realistically provide (global) node indexing.
(My DOM is a virtualization of a database and I can't
order everything without reading the whole
database - which is never going to happen.) However I plan to index
my nodes locally (that is, I can find the index of a node wrt its parent
quickly) and therefore I can order nodes much faster than the
implementation I find in DOMServices.cpp (which has to do a linear
search of the children).


I assume that you mean that I should provide my own implementation of
"DOMSupport" and provide my own version of "isNodeAfter" which I can
make run a lot faster. This I understand - no problem.


Exactly.
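
A minimal sketch of the comparison discussed above, assuming each node
can report its parent and its index among its siblings. The Node struct
and isAfterInDocumentOrder() are hypothetical names, not Xalan classes;
the real hook would be the isNodeAfter member of a DOMSupport
derivative, whose exact signature should be taken from the Xalan-C
headers. Attribute and namespace nodes are ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type for illustration; not a Xalan class.
struct Node {
    const Node* parent;        // null for the root
    std::size_t siblingIndex;  // position among the parent's children
};

// Collect the chain of nodes from the root down to n.
static std::vector<const Node*> pathFromRoot(const Node* n) {
    std::vector<const Node*> path;
    for (; n != nullptr; n = n->parent) path.push_back(n);
    return std::vector<const Node*>(path.rbegin(), path.rend());
}

// True if b comes strictly after a in document order.
// Both nodes must belong to the same tree.
bool isAfterInDocumentOrder(const Node& a, const Node& b) {
    if (&a == &b) return false;
    const std::vector<const Node*> pa = pathFromRoot(&a);
    const std::vector<const Node*> pb = pathFromRoot(&b);
    assert(pa.front() == pb.front());  // same root

    // Skip the shared ancestors.
    std::size_t i = 0;
    while (i < pa.size() && i < pb.size() && pa[i] == pb[i]) ++i;

    if (i == pa.size()) return true;   // a is an ancestor of b
    if (i == pb.size()) return false;  // b is an ancestor of a

    // pa[i] and pb[i] are distinct children of the same parent, so one
    // index comparison replaces a linear scan of the sibling list.
    return pa[i]->siblingIndex < pb[i]->siblingIndex;
}
```

The cost is proportional to the depth of the two nodes rather than to
the number of children under their common ancestor, which matters when
a table element has millions of row children.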


What's not immediately clear to me is why this would make Xalan run
a lot faster. Does Xalan spend a lot of time checking the ordering
of nodes? Is this obvious? (Is there some reference I could read to
understand why this should be the case?)

XPath states that operations applied to node-sets assume that nodes
appear in document order. This happens all the time, and can be very
expensive. For example, the XPath expression:


foo/bar[1]

would result in searching for all of the "foo" nodes, ordering them,
then searching for all of the "bar" nodes and ordering them. Of course,
this simple case is optimized, but more complex expressions are not
optimized, because detecting them and implementing the optimization is
difficult. If you have lots of union operations (foo/bar | foo/baz) in
your expressions, then that can be very expensive. This may not affect
you, since you will optimize when nodes are siblings.
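
To illustrate why a cheap ordering test matters for unions, here is a
rough sketch of merging two node-sets that are each already in document
order. It assumes every node carries a global position index, as the
default source tree is described as doing earlier in this thread.
IndexedNode, NodeSet and unionInDocumentOrder() are hypothetical names,
and this is not how Xalan-C itself implements the union operator.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical node carrying its position in the document.
struct IndexedNode {
    unsigned long docIndex;
};

using NodeSet = std::vector<const IndexedNode*>;

// Ordering test: a single integer comparison per pair of nodes.
static bool before(const IndexedNode* a, const IndexedNode* b) {
    return a->docIndex < b->docIndex;
}

// Union of two node-sets, each already in document order.
// Nodes present in both inputs appear once in the result.
NodeSet unionInDocumentOrder(const NodeSet& lhs, const NodeSet& rhs) {
    assert(std::is_sorted(lhs.begin(), lhs.end(), before));
    assert(std::is_sorted(rhs.begin(), rhs.end(), before));
    NodeSet out;
    out.reserve(lhs.size() + rhs.size());
    std::set_union(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
                   std::back_inserter(out), before);
    return out;
}
```

The merge calls the ordering test roughly once per node, so whatever
that test costs is multiplied by the size of the node-sets involved.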


Is there anything else that I should try and do
that would particularly help performance?

I hesitate to recommend anything off-the-cuff, because guessing about
optimization is usually a waste of time. I would suggest you do your
implementation, then feed it through some sample data sets using your
favorite profiling tool.


 >Feel free to post again if this isn't clear, or you have more questions.

OK. One more question.
As I mentioned my DOM is a virtualization of
a database. (Not a relational database BTW). So I (plan to) have an
element for a table containing many elements
(potentially 10s of millions of elements) for the rows.
I plan to support user queries expressed as XPaths.
Now if an XPath like ".../tableName/rowName[13]" appears then
what Xalan appears to do is to walk all of the children
using getFirstChild and getNextSibling and then apply the predicate.
OK. I can understand this as it has to apply a node test
and it doesn't know that all the children are "rowName" without
access to something like the PSVI. But even if I do something
like ".../tableName/node()[13]" it looks like it will always
enumerate all of the children in XPath::findChildren
I guess I would prefer some way to get it to use the
DOM's XalanNodeList.item function to find child 13.
(I know that things are slightly more complicated because
the axis might affect the ordering but this doesn't
seem to make it undoable.)
I assume that the reason that Xalan works this way
is because the Xalan DOM classes themselves don't support "item"
and therefore the XPath implementation can't expect
to use this function. However the question remains:
Is there any way to get Xalan to evaluate
XPaths whilst avoiding enumerating all of the children?

We don't use XalanNode::getChildNodes() because, in most cases,
implementing XalanNodeList is more expensive than implementing
getFirstChild()/getNextSibling() in a manner that optimizes for space.
Unfortunately for you, it's probably cheaper to implement XalanNodeList,
so you might want to modify XPath::findChildren() to use your own
implementation.

We used to provide an abstract interface in XPath, but the cost was too
great, given that very few users were interested in implementing their
own source tree (DOM).
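
The trade-off above can be sketched with a stand-in for a table element
whose rows are stored contiguously. Row, Table, nthChildByWalking() and
nthChildByIndex() are hypothetical names for illustration only; they do
not reproduce XalanNode, XalanNodeList or XPath::findChildren().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a row element; not the XalanNode interface.
struct Row {
    int id;
    const Row* nextSibling;
};

// Hypothetical stand-in for a table element with many row children.
struct Table {
    std::vector<Row> rows;

    explicit Table(std::size_t n) : rows(n) {
        for (std::size_t i = 0; i < n; ++i) {
            rows[i].id = static_cast<int>(i) + 1;
            rows[i].nextSibling = (i + 1 < n) ? &rows[i + 1] : nullptr;
        }
    }
    // Copying would leave the sibling pointers aimed at the old storage.
    Table(const Table&) = delete;
    Table& operator=(const Table&) = delete;

    const Row* firstChild() const { return rows.empty() ? nullptr : &rows[0]; }

    // Indexed access in the style of a node list's item(): constant time here.
    const Row* item(std::size_t index) const {
        return index < rows.size() ? &rows[index] : nullptr;
    }
};

// XPath positions are 1-based: node()[13] selects the 13th child.
// Walks the sibling chain and reports how many children were visited.
const Row* nthChildByWalking(const Table& t, std::size_t position,
                             std::size_t* visited) {
    std::size_t seen = 0;
    for (const Row* r = t.firstChild(); r != nullptr; r = r->nextSibling) {
        ++seen;
        if (seen == position) { *visited = seen; return r; }
    }
    *visited = seen;
    return nullptr;
}

// Jumps straight to the requested child without touching the others.
const Row* nthChildByIndex(const Table& t, std::size_t position) {
    return position == 0 ? nullptr : t.item(position - 1);
}
```

For a database-backed tree, each visited child may mean a record fetch,
so the indexed form is attractive whenever the step is a plain
positional predicate on the child axis.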


(Oh, I know about the "id" function and I plan
to try and use that in the short term.
But that relies on the client generating efficient queries.)

And finally... It so happens that my underlying database
holds string data in UTF-8. At the moment this means that
whenever I implement a node it has to copy the data
from UTF-8 into XalanDOMStrings. Is there any chance that
in future XalanDOMString will become an interface?
That would allow me to avoid the copies.

Xerces-C and Xalan-C were implemented using UTF-16, since that's what
many of the W3C recommendations chose, and what SAX chose. There was
some discussion of turning XalanDOMString into an abstract base class,
but we've always hesitated, because the cost of doing that would be
great, in comparison to the benefit in the typical cases.
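
For reference, the per-node copy being discussed amounts to something
like the following sketch, which decodes UTF-8 into UTF-16 code units.
It uses std::u16string as a stand-in for XalanDOMString, assumes the
input is well-formed UTF-8, and does no validation; utf8ToUtf16() is a
hypothetical helper, not a Xalan or Xerces function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode well-formed UTF-8 into UTF-16 code units (no validation).
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());  // never more UTF-16 units than UTF-8 bytes
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte gives the sequence length and the top bits.
        if (b0 < 0x80)      { cp = b0;        len = 1; }
        else if (b0 < 0xE0) { cp = b0 & 0x1F; len = 2; }
        else if (b0 < 0xF0) { cp = b0 & 0x0F; len = 3; }
        else                { cp = b0 & 0x07; len = 4; }
        assert(i + len <= in.size());
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Every byte of the source string is read and a new buffer is allocated,
which is the cost incurred each time a node's text is materialized.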

BTW, can you please modify your newsreader to post in text format,
rather than in HTML? Your last post came through with some very strange
formatting, including absurdly tiny fonts.


Dave
