I've been running a *lot* of benchmarks and profiles of Xalan [DTM_EXP] over the last two weeks. One of the real hotspots that has come up is variable and parameter execution. Basically, our current mechanism sucks, and has got to go.

What we do now is create a stack frame for each template and use it as a proper stack, pushing Arg objects onto it (complications having to do with params omitted for the sake of brevity). When we execute a variable in an expression, we do a sequential search of that stack for an Arg that holds the variable name, doing an expensive comparison of the QName objects.

OK, so here's a redesign that I plan to implement tomorrow. Please let me know if you have better ideas.

1) ElemTemplate will hold three variables:
   frameSize: the stack frame size for the template, equal to the maximum number of params and variables that can be declared in the template at one time.
   inArgsSize: the size of the portion of the stack frame that can hold parameter arguments.
   argsQNameIDs: a list of namespace/local-name pairs expressed as integers (using the ExtendedNameTable class in the DTM), which are unique qname identifiers for the arguments. The position of a given qname in the list is the argument ID, and thus the position in the stack frame.

1b) Treat the global space (stylesheet) with the same variables as in ElemTemplate.

2) In ProcessorTemplate#startElement, begin maintaining a list of variable qnames, and a maximum value.

3) Have ElemVariable hold a stack frame index ID.

4) For each ElemVariable (including ElemParam), add the qname to the list of variable qnames. For ElemParam, also add the qname ID to the list of param names in the template. (Note that xsl:param elements are constrained to be the first children of xsl:template.)

5) In XSLTAttributeDef#processEXPR, processPATTERN, and processAVT, call a new method fixUpVariableRefs(Vector qnames, int top) on each Expression.
In XPath land, this will be called recursively down the expression tree, and each Variable expression will search backwards to match the QName, and then assign the position index to a Variable member variable. [This is where a proper abstract syntax tree would be much better.]

6) In each ProcessorTemplateElem#endElement, pop off the variables assigned since ProcessorTemplateElem#startElement.

7) For xsl:call-template, xsl:with-param (ElemWithParam) can assign its parameters based on index, since it knows at stylesheet build time which template it's going to execute. [However, I'm a little worried that xsl:include and xsl:import will complicate this enough that I'll have to do the index assignment in the compose() method.]

8) xsl:apply-templates is a bit nastier. The same apply-templates can end up calling 20 different templates, all with different xsl:param setups. So each xsl:with-param in apply-templates will be assigned a qname ID, and the argsQNameIDs list of the template being called will be searched sequentially to find the slot for that param. This is not perfect, but since the comparison will be of integers and will only be done at call time instead of variable execution time, it's not too bad. Also, you only have to search again if the template being called changes.

9) Each execution of xsl:variable and xsl:param will stick its data via index into the stack frame.

10) Each execution of Variable will be a straight, quick lookup into the stack frame array.

10b) But I'll have to take globals into account. So the index will be either into the local stack frame or into the global stack frame.

11) Each stack frame will be a separate object, stored in a pool, so we'll only need to create new stack frames the first time the depth increases. I think this is preferable to a single array that has to be resized, which is how it is implemented now.

Does this sound reasonable? Can I do it without screwing things up?

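A minimal sketch of the slot-index scheme described above, for illustration only. ScopeBuilder, TemplateLayout, and FramePool are hypothetical names, not existing Xalan classes, and the qname IDs are plain ints standing in for whatever ExtendedNameTable would hand back. It covers build-time slot assignment with a backwards search (steps 2-6), the integer search for with-param slots (step 8), and pooled frames (step 11); globals, imports, and lazy evaluation are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Build-time scope tracker (hypothetical): assigns each variable a frame slot. */
final class ScopeBuilder {
    private final List<Integer> inScope = new ArrayList<>(); // qname IDs in declaration order
    private int maxSize = 0;                                 // becomes the template's frameSize

    /** Declares a variable or param and returns its slot index. */
    int declare(int qnameID) {
        inScope.add(qnameID);
        maxSize = Math.max(maxSize, inScope.size());
        return inScope.size() - 1;
    }

    /** Searches backwards so the innermost declaration wins; -1 if not in scope. */
    int resolve(int qnameID) {
        for (int i = inScope.size() - 1; i >= 0; i--) {
            if (inScope.get(i) == qnameID) return i;
        }
        return -1;
    }

    /** Pops variables declared since the enclosing element started. */
    void popTo(int size) {
        while (inScope.size() > size) inScope.remove(inScope.size() - 1);
    }

    int size() { return inScope.size(); }
    int frameSize() { return maxSize; }
}

/** Per-template layout (hypothetical stand-in for the fields on ElemTemplate). */
final class TemplateLayout {
    final int[] argsQNameIDs; // qname IDs of the params; position == frame slot
    final int frameSize;

    TemplateLayout(int[] argsQNameIDs, int frameSize) {
        this.argsQNameIDs = argsQNameIDs;
        this.frameSize = frameSize;
    }

    /** Call-time search for a with-param's slot; integer compares only. */
    int slotForParam(int qnameID) {
        for (int i = 0; i < argsQNameIDs.length; i++) {
            if (argsQNameIDs[i] == qnameID) return i;
        }
        return -1; // the called template declares no such param
    }
}

/** Pool of frames indexed by call depth; a frame is allocated only the first time a depth is reached. */
final class FramePool {
    private final List<Object[]> frames = new ArrayList<>();
    private int depth = 0;

    Object[] push(int frameSize) {
        if (depth == frames.size()) {
            frames.add(new Object[frameSize]);
        } else if (frames.get(depth).length < frameSize) {
            frames.set(depth, new Object[frameSize]); // assumption: replace a too-small frame
        }
        Object[] frame = frames.get(depth++);
        Arrays.fill(frame, null); // clear values left over from an earlier call
        return frame;
    }

    void pop() { depth--; }

    int allocated() { return frames.size(); }
}
```

With this shape, executing a Variable would amount to reading frame[slot], and the sequential search that remains is the integer scan in slotForParam at call time.
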
This should improve performance a fair amount for certain classes of stylesheets, and actually improve performance a little bit for all stylesheets, even if they don't use variables (because we won't have to do push/popContextMarker for every element). The variable mechanism tends to be very delicate, but I think this should be pretty robust out of the box, and shouldn't affect the actual variable evaluation (i.e. lazy evaluation and stuff like that). Or am I deluding myself?

-scott
