I've been running a *lot* of benchmarks and profiles of Xalan
[DTM_EXP] over the last two weeks.  One of the real hotspots that has come
up is variable and parameter execution.  Basically, our current mechanism
sucks and has got to go.

What we do now is create a stack frame for each template and use it as a
proper stack, pushing Arg objects onto it (complications having to do with
params are omitted here for the sake of brevity).  When we execute a
variable in an expression, we do a sequential search in that stack for an
Arg that holds the variable name, doing an expensive comparison of the
QName objects.
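For contrast, here is a minimal sketch of the lookup just described.  It
assumes simplified, hypothetical stand-ins for the QName and Arg classes
(the fields and methods are illustrative, not Xalan's actual API), and it
assumes the search runs from the top of the stack so that inner
declarations shadow outer ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a qualified name; not Xalan's actual QName class.
final class QName {
    final String namespace;
    final String localName;

    QName(String namespace, String localName) {
        this.namespace = namespace;
        this.localName = localName;
    }

    // Each candidate on the stack costs up to two string comparisons.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof QName)) return false;
        QName other = (QName) o;
        return Objects.equals(namespace, other.namespace)
            && Objects.equals(localName, other.localName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespace, localName);
    }
}

// Hypothetical stand-in for an Arg pushed onto the template's stack frame.
final class Arg {
    final QName name;
    final Object value;

    Arg(QName name, Object value) {
        this.name = name;
        this.value = value;
    }
}

final class LinearVarStack {
    private final List<Arg> stack = new ArrayList<>();

    void push(Arg arg) {
        stack.add(arg);
    }

    // Every variable reference scans from the top of the stack (assumed, so
    // inner declarations shadow outer ones): O(n) QName comparisons per lookup.
    Object lookup(QName name) {
        for (int i = stack.size() - 1; i >= 0; i--) {
            Arg arg = stack.get(i);
            if (arg.name.equals(name)) {
                return arg.value;
            }
        }
        throw new IllegalStateException("unbound variable: " + name.localName);
    }
}
```

The cost grows with the number of bindings in scope and is paid again on
every single variable reference.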

OK, so here's a redesign that I plan to implement tomorrow.  Please let me
know if you have better ideas.

1) ElemTemplate will hold three fields:  frameSize, the stack frame size
for the template, which equals the maximum number of params and variables
that can be in scope in the template at one time; inArgsSize, the size of
the portion of the stack frame that holds parameter arguments; and
argsQNameIDs, a list of unique qname identifiers for the arguments, each
namespace/local-name pair expressed as an integer (using the
ExtendedNameTable class in the DTM).  The position of a given qname in the
list is the argument ID, and thus its position in the stack frame.

1b) Give the global space (the stylesheet) the same fields as
ElemTemplate.

2) In ProcessorTemplate#startElement, begin maintaining a list of variable
qnames, and track the maximum length it reaches.

3) Have ElemVariable hold a stack frame index ID.

4) For each ElemVariable (including ElemParam), add the qname to the list
of variable qnames.  For ElemParam, also add the qname ID to the template's
list of param names.  (Note that xsl:param elements are constrained to be
the first children of xsl:template.)

5) In XSLTAttributeDef#processEXPR, processPATTERN, and processAVT, call a
new method, fixUpVariableRefs(Vector qnames, int top), on each Expression.
In XPath land, this will be called recursively down the expression tree,
and each Variable expression will search backwards to match the QName and
then assign the position index to a Variable member variable.  [This is
where a proper abstract syntax tree would be much better.]

6) In each ProcessorTemplateElem#endElement, pop off the variables added
since the matching ProcessorTemplateElem#startElement.

7) For xsl:call-template, xsl:with-param (ElemWithParam) can assign its
parameters by index, since it knows at stylesheet build time which template
it's going to execute.  [However, I'm a little worried that xsl:include and
xsl:import will complicate this enough that I'll have to do the index
assignment in the compose() method.]

8) xsl:apply-templates is a bit nastier.  The same apply-templates can end
up calling 20 different templates, all with different xsl:param setups.  So
each xsl:with-param in apply-templates will be assigned a qname ID, and the
argsQNameIDs list of the template being called will be searched
sequentially to find the slot for that param.  This is not perfect, but
since the comparisons are of integers and are only done at call time
rather than at variable execution time, it's not too bad.  Also, you only
have to search again if the template being called changes.

9) Each execution of xsl:variable and xsl:param will store its value into
the stack frame by index.

10) Each execution of Variable will be a straight, quick lookup into the
stack frame array.

10b) But I'll have to take globals into account, so the index will point
into either the local stack frame or the global stack frame.

11) Each stack frame will be a separate object, kept in a pool, so we'll
only need to create new stack frames the first time a given depth is
reached.  I think this is preferable to a single array that has to be
resized, which is how it is implemented now.
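The build-time bookkeeping of steps 1 through 6 (plus the local/global
split of 10b) can be sketched as follows.  This is a minimal sketch under
stated assumptions: NameTable is a hypothetical stand-in for the DTM's
ExtendedNameTable, TemplateScope, VariableRef and Plus are hypothetical
simplifications of ElemTemplate, the Variable expression and an arbitrary
XPath operator, and fixUpVariableRefs here takes two scopes rather than the
proposed (Vector qnames, int top) signature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the DTM's ExtendedNameTable: maps a
// namespace/local-name pair to a unique small integer.
final class NameTable {
    private final Map<String, Integer> ids = new HashMap<>();

    int getId(String namespace, String localName) {
        String key = "{" + (namespace == null ? "" : namespace) + "}" + localName;
        // The next free ID is the current number of entries.
        return ids.computeIfAbsent(key, k -> ids.size());
    }
}

// Hypothetical build-time bookkeeping for one template (or the global space).
final class TemplateScope {
    final List<Integer> inScope = new ArrayList<>();      // qname IDs; position = frame slot
    final List<Integer> argsQNameIDs = new ArrayList<>(); // param qname IDs, in order
    int frameSize;                                        // high-water mark of inScope

    // xsl:variable / xsl:param: the slot is the current depth of the list.
    int declare(int qnameID, boolean isParam) {
        int slot = inScope.size();
        inScope.add(qnameID);
        if (isParam) {
            argsQNameIDs.add(qnameID); // in a template, params come first
        }
        frameSize = Math.max(frameSize, inScope.size());
        return slot;
    }

    // Remember the depth at startElement...
    int mark() {
        return inScope.size();
    }

    // ...and drop everything declared since then at endElement.
    void popTo(int mark) {
        inScope.subList(mark, inScope.size()).clear();
    }

    // Search backwards so the innermost declaration wins; -1 if absent.
    int resolve(int qnameID) {
        return inScope.lastIndexOf(qnameID);
    }
}

// Hypothetical expression tree with a simplified fixUpVariableRefs.
interface Expr {
    void fixUpVariableRefs(TemplateScope locals, TemplateScope globals);
}

final class VariableRef implements Expr {
    final int qnameID;
    int index = -1;  // slot in the local or global frame
    boolean global;  // which frame the index refers to

    VariableRef(int qnameID) {
        this.qnameID = qnameID;
    }

    @Override
    public void fixUpVariableRefs(TemplateScope locals, TemplateScope globals) {
        index = locals.resolve(qnameID);
        if (index < 0) {
            index = globals.resolve(qnameID);
            global = true;
        }
        if (index < 0) {
            throw new IllegalStateException("unbound variable, qname ID " + qnameID);
        }
    }
}

// Any non-leaf node just recurses into its children.
final class Plus implements Expr {
    final Expr left;
    final Expr right;

    Plus(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public void fixUpVariableRefs(TemplateScope locals, TemplateScope globals) {
        left.fixUpVariableRefs(locals, globals);
        right.fixUpVariableRefs(locals, globals);
    }
}
```

Because a slot is just the depth of the in-scope list at declaration time,
slots released by popTo are reused by later siblings, so frameSize is the
high-water mark rather than the total number of declarations.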
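The run-time side of steps 8 through 11 might look roughly like this.
FramePool, VariableAccess and WithParamSlots are hypothetical names, and
two details are assumptions of this sketch rather than part of the
proposal: a frame is cleared when popped (so an unset param slot reads as
null and a stale value is never mistaken for a passed argument), and "the
template being called changes" is detected by the identity of its
argsQNameIDs list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pool of stack frames, one Object[] per active template call.
final class FramePool {
    private final List<Object[]> frames = new ArrayList<>();
    private int depth;

    // Allocate only when this depth is reached for the first time, or when
    // the pooled frame is too small for the template being called.
    Object[] push(int frameSize) {
        if (depth == frames.size()) {
            frames.add(new Object[frameSize]);
        } else if (frames.get(depth).length < frameSize) {
            frames.set(depth, new Object[frameSize]);
        }
        return frames.get(depth++);
    }

    // Clear on the way out so stale values are neither read nor kept reachable.
    void pop() {
        Arrays.fill(frames.get(--depth), null);
    }
}

// A variable reference is a single array access into one of two frames.
final class VariableAccess {
    static Object get(int index, boolean global, Object[] localFrame, Object[] globalFrame) {
        return global ? globalFrame[index] : localFrame[index];
    }
}

// Per-apply-templates cache of with-param slots, recomputed only when the
// called template (identified by its argsQNameIDs list) changes.
final class WithParamSlots {
    private final int[] paramIDs;
    private List<Integer> cachedFor;
    private int[] cachedSlots;

    WithParamSlots(int... paramIDs) {
        this.paramIDs = paramIDs;
    }

    int[] slotsFor(List<Integer> argsQNameIDs) {
        if (argsQNameIDs != cachedFor) {
            cachedSlots = new int[paramIDs.length];
            for (int i = 0; i < paramIDs.length; i++) {
                // Integer comparisons only; -1 means the template has no such param.
                cachedSlots[i] = argsQNameIDs.indexOf(paramIDs[i]);
            }
            cachedFor = argsQNameIDs;
        }
        return cachedSlots;
    }
}
```

A call site would push a frame sized to the callee's frameSize, write each
with-param value at its cached slot (skipping -1), run the template, and
then pop.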

Does this sound reasonable?  Can I do it without screwing things up?  This
should improve performance a fair amount for certain classes of
stylesheets, and improve performance a little bit for all stylesheets,
even those that don't use variables (because we won't have to call
push/popContextMarker for every element).  The variable mechanism tends to
be very delicate, but I think this should be pretty robust out of the box,
and it shouldn't affect the actual variable evaluation (i.e. lazy
evaluation and stuff like that).  Or am I deluding myself?

-scott


