DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=11831>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=11831

Extremely slow with long attribute values





------- Additional Comments From [EMAIL PROTECTED]  2002-09-11 20:21 -------
Neil,

I'm not used to sending bug reports, so sorry for the diff stuff. The contrived 
data for the parser is a large (e.g. 130K) file which looks like:

<a a="
aaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaa
...
aaaaaaaaaaaaaaaaaaaaaaaaa
"/>
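
A file of this shape can be generated rather than kept around. The following is
a minimal sketch, not the actual test case: the class and method names are made
up for illustration, and 5000 rows is just one assumed way to reach roughly
130K. It also parses the text once through javax.xml.parsers.DocumentBuilder
and reports the elapsed time.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Xerces or of the submitted patch.
final class LongAttributeDoc {

    // One row of the contrived attribute value, copied from the sample above.
    private static final String ROW = "aaaaaaaaaaaaaaaaaaaaaaaaa";

    /** Builds one empty element whose attribute value spans the given number of rows. */
    static String build(int rows) {
        StringBuilder sb = new StringBuilder(rows * (ROW.length() + 1) + 16);
        sb.append("<a a=\"\n");
        for (int i = 0; i < rows; i++) {
            sb.append(ROW).append('\n');
        }
        sb.append("\"/>\n");
        return sb.toString();
    }

    /** Parses the text once with a fresh parser and returns the length of attribute "a". */
    static int parseAttributeLength(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("a").length();
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = build(5000); // assumed row count, roughly 130K characters
        long start = System.nanoTime();
        int attrLength = parseAttributeLength(xml);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(xml.length() + " chars, attribute length " + attrLength
                + ", parsed in " + millis + " ms");
    }
}
```

The timing printed by main includes parser creation, so it is only a rough
indication and depends on the parser version that the JAXP factory picks up.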

In the current version of Xerces, the first time a given parser instance parses 
this file, the parsing takes a lot of time. Once an instance has processed the 
file, each subsequent parse is much faster. There is another problem associated 
with that, though: each parser instance that has parsed the file keeps holding 
a 130K buffer. In a multithreaded environment in which each thread holds its 
own parser instance (that scenario is the recommended approach, see 
javax.xml.parsers.DocumentBuilder), this can lead to running out of memory. The 
patch I submitted does not resolve that problem, but it enables a parser 
instance to release the buffer after each parse without real performance 
degradation.
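
The idea of letting an instance give its buffer back after each parse can be
sketched as follows. This is only an illustration under assumptions, not the
code of the patch: the class name, the initial size of 32 and the release()
method are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical scratch buffer; it does not mirror any actual Xerces class.
final class ReleasableBuffer {

    private static final int INITIAL_SIZE = 32; // assumed starting capacity

    private char[] data = new char[INITIAL_SIZE];
    private int length;

    /** Appends count chars from src, growing the backing array by doubling when needed. */
    void append(char[] src, int offset, int count) {
        int needed = length + count;
        if (needed > data.length) {
            int newSize = data.length;
            while (newSize < needed) {
                newSize *= 2; // overflow handling omitted in this sketch
            }
            data = Arrays.copyOf(data, newSize);
        }
        System.arraycopy(src, offset, data, length, count);
        length += count;
    }

    int length() {
        return length;
    }

    int capacity() {
        return data.length;
    }

    /** Meant to be called after a parse: drops a grown array so it can be collected. */
    void release() {
        length = 0;
        if (data.length > INITIAL_SIZE) {
            data = new char[INITIAL_SIZE];
        }
    }
}
```

With something like this, an idle parser instance in each thread holds only the
small initial array instead of the largest buffer it ever needed, at the price
of growing the buffer again on the next large document.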

As to your questions: the data has already been shown above. In each run I 
parsed the file only once. The time dropped from 20s to 0.5s (including class 
loading). I also noticed some improvement (6%) when parsing the XNI-CONFIG.XML 
file 100 times using a new parser instance each time (5.540ms->5.189ms). On the 
other hand, I also noticed minor performance degradation on some other files. 
So the patch only shows an approach to resolving the problem and can surely be 
optimized further.

As to the patch itself: do not be worried about the method calls. Since the 
methods are private, they can all be inlined by modern VMs. The loop might look 
a bit suspicious :) But in fact it can be executed at most 30 times during the 
lifetime of a parser instance, so it's more like an if statement. I used a 
standard approach to extending an array. Instead of extending it in constant 
size chunks (which is an O(n^2) process), it's better to extend the array by 
multiplying its size (which is an O(n) process, no matter what the multiplier 
is, as long as it is greater than 1).
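
To make the difference concrete, the sketch below counts how many characters
get copied while a buffer grows to hold n characters appended one at a time,
once with constant size chunks and once with doubling. The class, the method
names and the sizes used in main are illustrative assumptions, not taken from
the patch.

```java
// Hypothetical cost model for the two growth strategies; no real buffer is allocated.
final class GrowthCost {

    /** Chars copied when the capacity grows by a fixed chunk each time it is full. */
    static long copiedWithChunks(int n, int chunk) {
        long copied = 0;
        int capacity = chunk;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copied += size;    // the current contents move into the new array
                capacity += chunk; // constant-size extension
            }
        }
        return copied;
    }

    /** Chars copied when the capacity doubles each time it is full. */
    static long copiedWithDoubling(int n, int initial) {
        long copied = 0;
        int capacity = initial;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copied += size; // the current contents move into the new array
                capacity *= 2;  // multiplicative extension
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        int n = 130_000; // roughly the size of the contrived attribute value
        int start = 32;  // assumed initial capacity and chunk size
        System.out.println("constant chunks: " + copiedWithChunks(n, start) + " chars copied");
        System.out.println("doubling:        " + copiedWithDoubling(n, start) + " chars copied");
    }
}
```

With doubling, the total copied stays below the final capacity, so it is linear
in n; with constant chunks, the total grows with the square of n.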

Hope this will help
Regards
-Andrzej

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
