[EMAIL PROTECTED] wrote:
Hi Dave,
Thank you once again for your helpful reply.
>>
>> I have, as suggested, created a trivial program that illustrates the
>> problem.
>> It is based on the XalanTransform example with minimal changes.
>> It is attached.
>>
>> Hopefully it shows where I am going wrong.
>> Maybe the problem is with my approach rather than any bug in Xalan.
>
>OK, the bad news is this is really something we haven't supported in a
>very long time. In fact, I'm not sure if XalanTransformer ever
>supported using a pre-built tree in this way. We should probably remove
>the XalanNode constructor from XSLTInputSource.
Well, from my point of view that would be very helpful
as it would have saved me from some frustration.
>The good news is the way you can do this is fairly straightforward. You
>need to create your own derivative of XalanParsedSource, and wrap your
>document instance in that. You really need to do a bit more work to get
>your own XalanDocument implementation to work anyway.
OK. I had started down the path of using XalanParsedSource
and given up. However, given your suggestion, I tried it again
and have finally succeeded in getting a trivial translation to
work. There were a number of problems that I had which
meant it took a lot longer than I would have liked but
nothing too obscure.
I am not sure exactly what you meant about
getting my implementation of XalanDocument to work.
I know (now) about the bug in getFirstChild/getLastChild
and I plan to implement the getElementById function.
But was there something else that I have missed?
Xalan-C does not use the getElementById() function, so please don't
implement it for the sake of doing XSLT. What I meant to point out was
that XSLT needs to order node-sets in document order, which can be very
expensive. Xalan-C's default implementation optimizes this by keeping
an integer in each node that indicates the node's position in the
document. If you want good performance for your XalanDOM
implementation, you should really do this too.
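
A minimal sketch of the indexing idea described above, not Xalan-C code. The IndexedNode type and the function names are hypothetical stand-ins. It assumes the whole tree can be walked once so that every node can be given its position in document order.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a source-tree node; not a Xalan-C class.
struct IndexedNode {
    std::size_t index = 0;  // position of this node in document order
    std::vector<std::unique_ptr<IndexedNode>> children;

    IndexedNode* addChild() {
        children.push_back(std::make_unique<IndexedNode>());
        return children.back().get();
    }
};

// Assign indices in a single pre-order walk (parent first, then its
// children left to right). Returns the next unused index.
inline std::size_t assignIndices(IndexedNode& node, std::size_t next = 0) {
    node.index = next++;
    for (auto& child : node.children) {
        next = assignIndices(*child, next);
    }
    return next;
}

// With indices in place, a document-order test is one integer comparison.
inline bool precedes(const IndexedNode* a, const IndexedNode* b) {
    return a->index < b->index;
}

// Put an arbitrary node-set into document order.
inline void sortInDocumentOrder(std::vector<const IndexedNode*>& nodeSet) {
    std::sort(nodeSet.begin(), nodeSet.end(), precedes);
}
```

The trade-off is one extra integer per node and a full walk of the tree up front, which is exactly what a virtualized or lazily loaded tree may not be able to afford.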
>To get your own custom implementation of XalanDOM to work you need to
>provide a derivative of the DOMSupport class that understands your
>implementation. You can probably just use DOMSupportDefault, but you
>should create your own if you can optimize node-ordering for your
>implementation. For example, XalanSourceTree (the default
>implementation) uses indexing to support fast node ordering. I would
>highly recommend you do something like that, if you want to see
>reasonable performance.
I am interested in this point.
I can't realistically provide (global) node indexing.
(My DOM is a virtualization of a database and I can't
order everything without reading the whole database,
which is never going to happen.) However, I plan to index
my nodes locally (that is, I can quickly find the index of a
node with respect to its parent) and therefore I can order nodes
much faster than the implementation I find in DOMServices.cpp
(which has to do a linear search of the children).
I assume that you mean that I should provide my own implementation of
"DOMSupport" and provide my own version of "isNodeAfter" which I can
make run a lot faster. This I understand - no problem.
Exactly.
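
A rough sketch of how a document-order test could be built from local (per-parent) indices alone. LocalNode and precedesInDocumentOrder are hypothetical names, not part of Xalan-C, and the exact contract of isNodeAfter (argument order, handling of equal nodes) should be checked against the Xalan-C headers before adapting this. Attribute and namespace nodes are ignored here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node: knows its parent and its position among its siblings.
struct LocalNode {
    const LocalNode* parent = nullptr;
    std::size_t indexInParent = 0;
};

inline std::size_t depthOf(const LocalNode* n) {
    std::size_t depth = 0;
    while (n->parent != nullptr) {
        n = n->parent;
        ++depth;
    }
    return depth;
}

// True if `a` comes strictly before `b` in document order.
// Both nodes are assumed to belong to the same tree.
inline bool precedesInDocumentOrder(const LocalNode* a, const LocalNode* b) {
    if (a == b) {
        return false;
    }
    const std::size_t depthA = depthOf(a);
    const std::size_t depthB = depthOf(b);

    // Lift the deeper node until both are at the same depth.
    const LocalNode* x = a;
    const LocalNode* y = b;
    for (std::size_t d = depthA; d > depthB; --d) x = x->parent;
    for (std::size_t d = depthB; d > depthA; --d) y = y->parent;

    if (x == y) {
        // One node is an ancestor of the other; the ancestor comes first.
        return depthA < depthB;
    }
    // Climb in lock-step until the two nodes are siblings.
    while (x->parent != y->parent) {
        x = x->parent;
        y = y->parent;
    }
    assert(x->parent != nullptr);  // would fail for nodes in different trees
    return x->indexInParent < y->indexInParent;
}
```

The cost is proportional to the depth of the two nodes rather than to the number of siblings, so a table with millions of rows costs no more than a table with ten.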
What's not immediately clear to me is why this would make Xalan run
a lot faster. Does Xalan spend a lot of time checking the ordering
of nodes? Is this obvious? (Is there some reference I could read to
understand why this should be the case?)
XPath states that operations applied to node-sets assume that nodes
appear in document order. This happens all the time, and can be very
expensive. For example, the XPath expression:
foo/bar[1]
would result in searching for all of the "foo" nodes, ordering them,
then searching for all of the "bar" nodes and ordering them. Of course,
this simple case is optimized, but more complex expressions are not
optimized, because detecting them and implementing the optimization is
difficult. If you have lots of union operations (foo/bar | foo/baz) in
your expressions, then that can be very expensive. This may not affect
you, since you will optimize when nodes are siblings.
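
To illustrate why unions lean on the ordering test, here is a small sketch that merges two node-sets that are each already in document order. It is not how Xalan-C implements the union operator; OrderedNode, docOrderLess and unionInDocumentOrder are hypothetical names, and the integer index stands in for whatever ordering test the DOM provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical node type: `index` is its position in document order.
struct OrderedNode {
    std::size_t index;
};

using NodeSet = std::vector<const OrderedNode*>;

// The ordering test. Every step of the merge below calls it, so its cost
// dominates when the comparison itself is slow.
inline bool docOrderLess(const OrderedNode* a, const OrderedNode* b) {
    return a->index < b->index;
}

// Union of two node-sets, each already in document order. Nodes present
// in both inputs appear once in the result.
inline NodeSet unionInDocumentOrder(const NodeSet& lhs, const NodeSet& rhs) {
    assert(std::is_sorted(lhs.begin(), lhs.end(), docOrderLess));
    assert(std::is_sorted(rhs.begin(), rhs.end(), docOrderLess));
    NodeSet result;
    result.reserve(lhs.size() + rhs.size());
    std::set_union(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
                   std::back_inserter(result), docOrderLess);
    return result;
}
```

The merge makes a number of comparisons proportional to the combined size of the two sets, so a comparison that needs a linear search of siblings multiplies directly into the cost of every union.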
Is there anything else that I should try and do
that would particularly help performance?
I hesitate to recommend anything off-the-cuff, because guessing about
optimization is usually a waste of time. I would suggest you do your
implementation, then feed it through some sample data sets using your
favorite profiling tool.
>Feel free to post again if this isn't clear, or you have more questions.
OK. One more question.
As I mentioned my DOM is a virtualization of
a database. (Not a relational database BTW). So I (plan to) have an
element for a table containing many elements
(potentially 10s of millions of elements) for the rows.
I plan to support user queries expressed as XPaths.
Now if an XPath like ".../tableName/rowName[13]" appears then
what Xalan appears to do is to walk all of the children
using getFirstChild and getNextSibling and then apply the predicate.
OK. I can understand this as it has to apply a node test
and it doesn't know that all the children are "rowName" without
access to something like the PSVI. But even if I do something
like ".../tableName/node()[13]" it looks like it will always
enumerate all of the children in XPath::findChildren.
I guess I would prefer some way to get it to use the
DOM's XalanNodeList.item function to find child 13.
(I know that things are slightly more complicated because
the axis might affect the ordering but this doesn't
seem to make it undoable.)
I assume that the reason that Xalan works this way
is because the Xalan DOM classes themselves don't support "item"
and therefore the XPath implementation can't expect
to use this function. However the question remains:
Is there any way to get Xalan to evaluate
XPaths whilst avoiding enumerating all of the children?
We don't use XalanNode::getChildNodes() because, in most cases,
implementing XalanNodeList is more expensive than implementing
getFirstChild()/getNextSibling() in a manner that optimizes for space.
Unfortunately for you, it's probably cheaper to implement XalanNodeList,
so you might want to modify XPath::findChildren() to use your own
implementation.
We used to provide an abstract interface in XPath, but the cost was too
great, given that very few users were interested in implementing their
own source tree (DOM).
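
The difference between the two access patterns discussed above can be sketched as follows. This does not reproduce XPath::findChildren(); RowNode, TableNode and both lookup functions are hypothetical. The shortcut is only valid when the step is a child-axis node() test with a purely positional predicate, so that every child counts towards the position.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row element; `id` only serves to identify rows here.
struct RowNode {
    std::size_t id = 0;
    const RowNode* nextSibling = nullptr;
};

// Hypothetical table element whose children are stored contiguously,
// so a child can be reached by index in constant time.
class TableNode {
public:
    explicit TableNode(std::size_t rowCount) : rows_(rowCount) {
        for (std::size_t i = 0; i < rowCount; ++i) {
            rows_[i].id = i;
            if (i + 1 < rowCount) {
                rows_[i].nextSibling = &rows_[i + 1];
            }
        }
    }
    TableNode(const TableNode&) = delete;             // rows hold pointers
    TableNode& operator=(const TableNode&) = delete;  // into rows_

    const RowNode* firstChild() const {
        return rows_.empty() ? nullptr : &rows_.front();
    }
    // Zero-based lookup, in the style of a DOM node list's item().
    const RowNode* item(std::size_t i) const {
        return i < rows_.size() ? &rows_[i] : nullptr;
    }

private:
    std::vector<RowNode> rows_;
};

// XPath positions are 1-based. Walks the sibling chain: O(position).
inline const RowNode* childAtByWalk(const TableNode& table, std::size_t position) {
    if (position == 0) {
        return nullptr;
    }
    const RowNode* node = table.firstChild();
    for (std::size_t i = 1; node != nullptr && i < position; ++i) {
        node = node->nextSibling;
    }
    return node;
}

// Same result through indexed access: O(1) for this storage layout.
inline const RowNode* childAtByItem(const TableNode& table, std::size_t position) {
    return position == 0 ? nullptr : table.item(position - 1);
}
```

For something like node()[13] both functions return the same row, but only the second avoids touching the first twelve.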
(Oh, I know about the "id" function and I plan
to try and use that in the short term.
But that relies on the client generating efficient queries.)
And finally... It so happens that my underlying database
holds string data in UTF-8. At the moment this means that
whenever I implement a node it has to copy the data
from UTF-8 into XalanDOMStrings. Is there any chance that
in future XalanDOMString will become an interface?
That would allow me to avoid the copies.
Xerces-C and Xalan-C were implemented using UTF-16, since that's what
many of the W3 recommendations chose, and what SAX chose. There was
some discussion of turning XalanDOMString into an abstract base class,
but we've always hesitated, because the cost of doing that would be
great, in comparison to the benefit in the typical cases.
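
For reference, the copy being discussed amounts to a conversion along these lines. This is a standalone sketch, not the Xalan-C or Xerces-C transcoding API: utf8ToUtf16 is a hypothetical helper, and std::u16string stands in for XalanDOMString, whose character type depends on the platform. Malformed bytes are replaced with U+FFFD; overlong encodings and encoded surrogates are not rejected.

```cpp
#include <cstddef>
#include <string>

// Convert UTF-8 bytes to UTF-16 code units (hypothetical helper).
inline std::u16string utf8ToUtf16(const std::string& in) {
    const char16_t replacement = 0xFFFD;
    std::u16string out;
    out.reserve(in.size());  // UTF-16 never needs more units than UTF-8 bytes

    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(replacement); ++i; continue; }

        // Fold in the continuation bytes, checking for truncation.
        bool ok = (i + extra < in.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) {
                ok = false;
            } else {
                cp = (cp << 6) | (b & 0x3F);
            }
        }
        if (!ok || cp > 0x10FFFF) {
            out.push_back(replacement);
            ++i;  // resynchronize on the next byte
            continue;
        }
        i += extra + 1;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Every string handed to the processor pays for a pass like this plus an allocation, which is the cost an abstract string interface would avoid.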
BTW, can you please modify your newsreader to post in text format, rather
than in HTML? Your last post came through with some very strange
formatting, including absurdly tiny fonts.
Dave