Hi, 
 
  I'm using Xerces2 2.2.1 (DOMParser) with JDK 1.4.1 (Sun) on Linux,
and I have been fighting a strange error for two days. It looks
like a Java bug, but I'm not sure.
  I have a simple but large XML file (approx. 4.6 MB) and am just
trying to read all elements of the form <content>XXX</content>,
where XXX is international text in UTF-8 encoding. The total
number of such elements is 1799, but after reading element
no. 123 the program crashes with "Exception in thread "main"
java.lang.OutOfMemoryError". If I increase the maximum heap size
it can read a few more records, but it crashes anyway. Here is my
test code:
import org.apache.xerces.parsers.DOMParser; 
import org.w3c.dom.*; 
import org.xml.sax.*; 
import java.io.*; 
 
/**
 * Test class (throws java.lang.OutOfMemoryError on Linux with JDK 1.4.1)
 */
public class Test 
{ 
   ///////////////////////////////////////////////////////////// 
   // C O N S T A N T S 
   ///////////////////////////////////////////////////////////// 
    
   /** Test file name */ 
   public static final String TEST_FILE_NAME = "test.xml"; 
    
   ///////////////////////////////////////////////////////////// 
   // M E T H O D S 
   ///////////////////////////////////////////////////////////// 
    
   /** Constructor 
    */ 
   public Test() 
   { 
   } 
    
   /** Import default file 
    */ 
   public void runTest() 
   { 
      try 
      { 
         // New parser 
         DOMParser parser = new DOMParser(); 
          
         // Create input source and parse file 
         InputSource is = new InputSource(
            new InputStreamReader(new FileInputStream(TEST_FILE_NAME), "UTF-8"));
         parser.parse(is);
         System.out.println("Document parsed ..."); 
          
         // Get document 
         Document doc = parser.getDocument(); 
 
         // Get all <content ...> nodes 
         NodeList list = doc.getElementsByTagName("content"); 
         System.out.println("Total number of <content...> elements: "
            + list.getLength());
         // Just read all of them 
         for(int i=0; i<list.getLength(); i++) 
         {             
            System.out.println("Item # " + i); 
            Node data_node = list.item(i).getFirstChild(); 
            String content = data_node.getNodeValue(); 
         } 
          
      } 
      catch(Exception e) 
      { 
         e.printStackTrace(); 
      } 
   } 
    
   /** Main 
    */ 
   public static void main(String args[]) 
   { 
      // Create object 
      Test test = new Test(); 
       
      // Run the test 
      test.runTest(); 
   } 
} 
 
My investigation shows: 
 
* The test does not crash under Windows 2000 with JDK 1.4.1
* The test does not crash under Linux with JDK 1.3.1
 
I have traced the error down. The trouble seems to be in the
java.lang.StringBuffer class. The class
org.apache.xerces.dom.DeferredDocumentImpl has a final
StringBuffer field fBufferStr. In the method
getNodeValueString(...) this StringBuffer is reset by calling

fBufferStr.setLength(0); 
 
but somehow the memory is not freed, even if the garbage
collector is invoked immediately after the reset. I removed the
"final" keyword from the fBufferStr declaration and replaced the
reset with:
fBufferStr = new StringBuffer(); 
 
and it worked fine (no memory error). Another way to make the
example work is to slightly change the method synchronizeData in
the class org.apache.xerces.dom.DeferredTextImpl. I replaced the
line
data = ownerDocument.getNodeValueString(fNodeIndex); 
 
with 
 
data = new String(ownerDocument.getNodeValueString(fNodeIndex)); 
 
and it works, but it is still much, much slower than on
Windows 2000 or with JDK 1.3.1.
Any idea? 
 
  Radomir 

