Hi,
I'm using Xerces2 2.2.1 (DOMParser) with JDK 1.4.1 (Sun) on Linux,
and I have been fighting a strange error for two days. It
looks like a Java bug, but I'm not sure.
I have a simple but big XML file (approx. 4.6 MB) and I'm just trying
to read all elements of the form <content>XXX</content>. The XXX
are international texts encoded in UTF-8. The file contains 1799 such
elements, but after reading element number 123 the program crashes
with 'Exception in thread "main" java.lang.OutOfMemoryError'. If
I increase the maximum heap size, it reads a few more
records but still crashes. Here is my test code:
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

/**
 * Test class (throws java.lang.OutOfMemoryError on Linux with
 * JDK 1.4.1)
 */
public class Test
{
    /////////////////////////////////////////////////////////////
    // C O N S T A N T S
    /////////////////////////////////////////////////////////////

    /** Test file name */
    public static final String TEST_FILE_NAME = "test.xml";

    /////////////////////////////////////////////////////////////
    // M E T H O D S
    /////////////////////////////////////////////////////////////

    /** Constructor */
    public Test()
    {
    }

    /** Parse the default file and read every <content> element */
    public void runTest()
    {
        try
        {
            // New parser
            DOMParser parser = new DOMParser();

            // Create input source (decoded as UTF-8) and parse file
            InputSource is = new InputSource(new InputStreamReader(
                new FileInputStream(TEST_FILE_NAME), "UTF-8"));
            parser.parse(is);
            System.out.println("Document parsed ...");

            // Get document
            Document doc = parser.getDocument();

            // Get all <content ...> nodes
            NodeList list = doc.getElementsByTagName("content");
            System.out.println("Total number of <content...> elements: "
                + list.getLength());

            // Just read all of them
            for (int i = 0; i < list.getLength(); i++)
            {
                System.out.println("Item # " + i);
                // The first child is the text node holding the element's text
                Node data_node = list.item(i).getFirstChild();
                String content = data_node.getNodeValue();
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    /** Main */
    public static void main(String args[])
    {
        // Create object
        Test test = new Test();
        // Run the test
        test.runTest();
    }
}
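To see how the heap grows from item to item before the crash, a small helper along these lines could be called inside the loop. This is only a sketch: the class HeapProbe and its method names are hypothetical and not part of the test above, and the numbers it prints are approximate because the garbage collector may run at any time.

```java
// Hypothetical helper, not part of the original Test class.
final class HeapProbe {
    private HeapProbe() {
    }

    /** Returns the heap currently in use, in bytes (total minus free). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Formats a one-line report for the given item index and byte count. */
    static String report(int item, long usedBytes) {
        long megabytes = usedBytes / (1024L * 1024L);
        return "Item # " + item + ": " + megabytes + " MB used";
    }
}
```

Inside the read loop it could replace the plain "Item # " output, for example: System.out.println(HeapProbe.report(i, HeapProbe.usedBytes()));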
My investigation shows:
* The test does not crash under Windows 2000 with JDK 1.4.1
* The test does not crash under Linux with JDK 1.3.1
I have traced the error down. The trouble seems to be in the
java.lang.StringBuffer class. The class
org.apache.xerces.dom.DeferredDocumentImpl has a final
StringBuffer field, fBufferStr. In the method
getNodeValueString(...), this StringBuffer is reset by calling
fBufferStr.setLength(0);
but somehow the memory is not freed, even if you invoke the garbage
collector immediately after the reset. I removed the "final"
keyword from the fBufferStr declaration and replaced the reset with:
fBufferStr = new StringBuffer();
and it worked fine (no memory error). Another way to make
the example work is to slightly change the method synchronizeData in
the class org.apache.xerces.dom.DeferredTextImpl. I replaced the
line
data = ownerDocument.getNodeValueString(fNodeIndex);
with
data = new String(ownerDocument.getNodeValueString(fNodeIndex));
and it works, but it is still much, much slower than on Windows 2000 or
with JDK 1.3.1.
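For reference, the two reset strategies can be shown in isolation, outside Xerces. This is a minimal sketch with hypothetical names (BufferResetSketch, joinWithReuse, joinWithFreshBuffer), not the actual Xerces code. Both variants return the same strings; they differ only in whether one StringBuffer is reused or a new one is allocated per value. Whether that changes how much memory each resulting String keeps alive depends on the JDK's StringBuffer implementation, so on a JDK where toString() copies the characters no difference should be expected.

```java
// Isolated sketch of the two reset strategies; all names are hypothetical.
final class BufferResetSketch {
    private BufferResetSketch() {
    }

    /** Variant 1: one shared buffer, cleared with setLength(0) before each value. */
    static String[] joinWithReuse(String[][] chunks) {
        StringBuffer shared = new StringBuffer();
        String[] out = new String[chunks.length];
        for (int i = 0; i < chunks.length; i++) {
            // Clears the content; the buffer's capacity is not reduced.
            shared.setLength(0);
            for (int j = 0; j < chunks[i].length; j++) {
                shared.append(chunks[i][j]);
            }
            out[i] = shared.toString();
        }
        return out;
    }

    /** Variant 2: a fresh buffer for every value, as in the modified field. */
    static String[] joinWithFreshBuffer(String[][] chunks) {
        String[] out = new String[chunks.length];
        for (int i = 0; i < chunks.length; i++) {
            // A new, small buffer per value; the old one becomes garbage.
            StringBuffer fresh = new StringBuffer();
            for (int j = 0; j < chunks[i].length; j++) {
                fresh.append(chunks[i][j]);
            }
            out[i] = fresh.toString();
        }
        return out;
    }
}
```

The second workaround, wrapping the result in new String(...), is aimed at the same thing from the other side: the intent is that the String stored in the text node gets its own copy of the characters instead of holding on to the buffer's array.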
Any idea?
Radomir