[ http://issues.apache.org/jira/browse/XALANC-593?page=comments#action_12361134 ]
David Bertoni commented on XALANC-593:
--------------------------------------
This stylesheet has constructs that cannot scale. For example:
<xsl:for-each select="//UPN[ ( . = preceding::UPN ) and
                             not( . = following::UPN ) ]">
  <xsl:for-each select="//PupilIdentifiers[ UPN = current() ]">
    <xsl:call-template name="Error">
      <xsl:with-param name="err_num" select="60"/>
      <xsl:with-param name="data" select="concat('UPN:', UPN,
          '|Surname:', Surname, '|Forename:', Forename,
          '|Gender:', Gender, '|DOB:', DOB)"/>
    </xsl:call-template>
  </xsl:for-each>
</xsl:for-each>
The first problem is using "//" in XPath expressions. "//" forces the
processor to search the _entire_ document for UPN elements. That means the
processor has to look at every element in the source tree, which is never going
to scale.
It looks to me like the element UPN only appears in the following paths:
/Message/Pupils/PupilsOnRoll/PupilOnRoll/PupilIdentifiers/UPN
/Message/Pupils/PupilsNoLongerOnRoll/PupilNoLongerOnRoll/PupilIdentifiers/UPN
So you could re-write this as a union of those two paths:
(/Message/Pupils/PupilsOnRoll/PupilOnRoll/PupilIdentifiers/UPN |
/Message/Pupils/PupilsNoLongerOnRoll/PupilNoLongerOnRoll/PupilIdentifiers/UPN)
However, the predicate for this XPath expression is an even bigger problem:
[ ( . = preceding::UPN ) and not( . = following::UPN ) ]
The preceding and following axes won't scale, because the cost of evaluating
them is not linear in the size of the document. For every UPN, you are again
forcing the processor to look at all the elements preceding and following it
to look for other UPN elements, so the total work can grow roughly with the
square of the number of elements. I think you should look at xsl:key and modify
your stylesheet to use keys for these identity constraint cases. Without
spending too much time trying to understand the semantics of your stylesheet, I
suspect you are using brute force lookup to find duplicate UPN elements. This
is trivial to do with keys, and is much faster because the processor builds
lookup tables for each key.
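A minimal sketch of the key-based approach follows. It is untested against the
attached files and rests on several assumptions: each PupilIdentifiers has at
most one UPN child, PupilIdentifiers only appears at the two paths listed
above, the key name "pupil-by-upn" is made up for illustration, and the Error
template shown here is only a stub standing in for the one in the real
stylesheet. It selects the last PupilIdentifiers for each duplicated UPN
value, to mirror the original predicate, and then reports every pupil sharing
that value:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical key name: indexes every PupilIdentifiers element
       by the string value of its UPN child. -->
  <xsl:key name="pupil-by-upn" match="PupilIdentifiers" use="UPN"/>

  <xsl:template match="/">
    <!-- Pick one PupilIdentifiers per duplicated UPN value: the key
         returns more than one node for it, and this node is the last
         of them in document order. -->
    <xsl:for-each select="
        (/Message/Pupils/PupilsOnRoll/PupilOnRoll/PupilIdentifiers |
         /Message/Pupils/PupilsNoLongerOnRoll/PupilNoLongerOnRoll/PupilIdentifiers)
        [count(key('pupil-by-upn', UPN)) &gt; 1]
        [generate-id() = generate-id(key('pupil-by-upn', UPN)[last()])]">
      <!-- Report every pupil that shares this UPN. -->
      <xsl:for-each select="key('pupil-by-upn', UPN)">
        <xsl:call-template name="Error">
          <xsl:with-param name="err_num" select="60"/>
          <xsl:with-param name="data" select="concat('UPN:', UPN,
              '|Surname:', Surname, '|Forename:', Forename,
              '|Gender:', Gender, '|DOB:', DOB)"/>
        </xsl:call-template>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

  <!-- Stub only: stands in for the Error template of the real stylesheet. -->
  <xsl:template name="Error">
    <xsl:param name="err_num"/>
    <xsl:param name="data"/>
    <xsl:value-of select="concat($err_num, ' ', $data, '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Each key() call is a table lookup rather than a scan of the document, so the
cost per UPN no longer depends on how many elements precede or follow it.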
As a final comment, I would also like to point out that much of the work you're
doing with your stylesheet is validating the content of the document, which
would be much better done with an XML schema that validates the instance
document while it's being parsed. You can, of course, write a stylesheet to do
this, but I think what you're seeing is that the performance of that approach
is not really optimal.
> Poor performance with a complex XSL stylesheet and large XML file
> -----------------------------------------------------------------
>
> Key: XALANC-593
> URL: http://issues.apache.org/jira/browse/XALANC-593
> Project: XalanC
> Type: Bug
> Components: XalanC
> Versions: 1.9, 1.10
> Environment: Platform: Windows XP Professional
> Processor: 2GHz
> RAM: 1Gb
> Reporter: [EMAIL PROTECTED]
> Attachments: SchoolCensus06-ErrorList-v1.4.xsl,
> SchoolCensus06-ValidationRules-v1.4.xsl, TEMP_XI_Y1219154918.xml,
> TEMP_XI_Y1219154918_halved.xml
>
> Xalan is performing poorly for a complex XSL transform on a large XML file.
> I have the details below, and I am attaching files for XML input and the XSL
> files.
> There are 2 problems - one is that a 1.5MB XML file takes about 2 minutes to
> be transformed.
> This could be solved by changing the XSL? - any suggestions welcome!
> The second problem is that the performance does not 'scale' with the size of
> the XML input - I took the same XML file and halved the size, and the
> performance more than doubled.
> So it looks like performance worsens with the size of the XML input.
> Performance in 1_10 is slightly worse than 1_9.
> ===============================
> Xalan-C_1_9_0-win32-msvc_60
> Xalan -t:
> 1.5MB XML:
> Source tree parsing time: 340.398336 milliseconds.
> Stylesheet compilation time: 133.1826288 milliseconds.
> Transformation time: 119932.1820512 milliseconds.
> 733Kb XML:
> Source tree parsing time: 158.737142 milliseconds.
> Stylesheet compilation time: 67.1794638 milliseconds.
> Transformation time: 36380.30150 milliseconds.
> ===============================
> Xalan-C_1_10_0-win32-msvc_60
> Xalan -t:
> 1.5MB XML:
> Source tree parsing time: 255.852040 milliseconds.
> Stylesheet compilation time: 68.236948 milliseconds.
> Transformation time: 134556.299906 milliseconds.
> 733Kb XML:
> Source tree parsing time: 142.380952 milliseconds.
> Stylesheet compilation time: 68.1692120 milliseconds.
> Transformation time: 41232.867330 milliseconds.
> ===============================