http://nagoya.apache.org/bugzilla/show_bug.cgi?id=897

Memory leak reading large XML-files with SAX parser





------- Additional Comments From [EMAIL PROTECTED]  2003-11-10 04:01 -------
This is still a problem in the latest version of Xerces (2.5).  The number of
"java.io.StringReader" instances increases until the JVM runs out of memory;
they can never be garbage collected.

Here's some sample RDF/XML:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY math  "http://kowari.org/math#">
<!ENTITY owl   "http://www.w3.org/2002/07/owl#">
<!ENTITY rdf   "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs  "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY xsd   "http://www.w3.org/2001/XMLSchema#">
]>

<rdf:RDF xmlns:math ="&math;"
         xmlns:owl  ="&owl;"
         xmlns:rdf  ="&rdf;"
         xmlns:rdfs ="&rdfs;">
<rdf:Description>
  <owl:sameIndividualAs rdf:datatype="&xsd;integer">14</owl:sameIndividualAs>
  <rdfs:label xml:lang="en">fourteen</rdfs:label>
  <math:roman>XIV</math:roman>
  <math:square rdf:datatype="&xsd;integer">196</math:square>
  <math:primeFactorization>
    <rdf:Bag>
      <rdf:li rdf:datatype="&xsd;integer">2</rdf:li>
      <rdf:li rdf:datatype="&xsd;integer">7</rdf:li>
    </rdf:Bag>
  </math:primeFactorization>
</rdf:Description>
<rdf:Description>
  <owl:sameIndividualAs rdf:datatype="&xsd;integer">15</owl:sameIndividualAs>
  <rdfs:label xml:lang="en">fifteen</rdfs:label>
  <math:roman>XV</math:roman>
  <math:square rdf:datatype="&xsd;integer">225</math:square>
  <math:primeFactorization>
    <rdf:Bag>
      <rdf:li rdf:datatype="&xsd;integer">3</rdf:li>
      <rdf:li rdf:datatype="&xsd;integer">5</rdf:li>
    </rdf:Bag>
  </math:primeFactorization>
  <rdf:type rdf:resource="&math;TriangularNumber"/>
</rdf:Description>
</rdf:RDF>

When all of the entity references are inlined, only 4 objects are ever
allocated.  For example:
<rdf:Description>
  <owl:sameIndividualAs rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">14</owl:sameIndividualAs>
  <rdfs:label xml:lang="en">fourteen</rdfs:label>
  <math:roman>XIV</math:roman>
  <math:square rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">196</math:square>
  <math:primeFactorization>
    <rdf:Bag>
      <rdf:li rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">2</rdf:li>
      <rdf:li rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">7</rdf:li>
    </rdf:Bag>
  </math:primeFactorization>
</rdf:Description>


Here's a report from Optimize It after parsing a large amount of this XML:
2509 instances of java.io.StringReader allocated.
   100.0% org.apache.xerces.impl.XMLEntityManager.startEntity()
      100.0% org.apache.xerces.impl.XMLScanner.scanAttributeValue()
         100.0% org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute()
            100.0% org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement()
               99.84% org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch()
                  99.84% org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument()
                     99.84% org.apache.xerces.parsers.DTDConfiguration.parse()
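
For anyone who wants to try this, here is a minimal sketch of a test driver, not
the code that produced the report above.  It generates a document shaped like
the sample (entity references inside attribute values) and pushes it through a
JAXP SAX parser.  The class and method names (EntityAttributeRepro,
buildDocument, countDatatypeAttributes) are made up for illustration.  It
assumes Xerces is the parser that SAXParserFactory picks up from the classpath;
whether the StringReader count actually grows will depend on the parser version,
and it has to be checked separately with a profiler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EntityAttributeRepro {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

    // Builds a document with `count` rdf:Description blocks. Each block has one
    // attribute value that uses the &xsd; entity reference, as in the sample.
    static String buildDocument(int count) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n")
          .append("<!DOCTYPE rdf:RDF [\n")
          .append("<!ENTITY math \"http://kowari.org/math#\">\n")
          .append("<!ENTITY rdf  \"").append(RDF_NS).append("\">\n")
          .append("<!ENTITY xsd  \"http://www.w3.org/2001/XMLSchema#\">\n")
          .append("]>\n")
          .append("<rdf:RDF xmlns:math=\"&math;\" xmlns:rdf=\"&rdf;\">\n");
        for (int i = 0; i < count; i++) {
            sb.append("<rdf:Description>\n")
              .append("  <math:square rdf:datatype=\"&xsd;integer\">")
              .append((long) i * i)
              .append("</math:square>\n")
              .append("</rdf:Description>\n");
        }
        sb.append("</rdf:RDF>\n");
        return sb.toString();
    }

    // Parses the document with SAX and returns how many rdf:datatype attributes
    // arrived with the &xsd; reference expanded to the full XSD integer URI.
    static int countDatatypeAttributes(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        final int[] seen = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (XSD_INTEGER.equals(atts.getValue(RDF_NS, "datatype"))) {
                    seen[0]++;
                }
            }
        });
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        // Default kept small: parsers bundled with recent JDKs cap the number of
        // entity expansions per document, so a large count may be rejected there.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        int seen = countDatatypeAttributes(buildDocument(count));
        System.out.println("rdf:datatype attributes seen: " + seen);
    }
}
```

Run it with a large count as the first argument while watching the
java.io.StringReader instance count in a profiler.  Replacing "&xsd;integer"
in buildDocument with the full URI gives the inlined variant for comparison.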
