http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2336

*** shadow/2336 Tue Jun 26 12:56:20 2001
--- shadow/2336.tmp.8454        Tue Jun 26 12:56:20 2001
***************
*** 0 ****
--- 1,80 ----
+ +============================================================================+
+ | Large data problem with SAX: characters method chops value when offset at  |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 2336                        Product: Xerces-J                |
+ |       Status: NEW                         Version: 1.4.1                   |
+ |   Resolution:                            Platform: PC                      |
+ |     Severity: Major                    OS/Version: Windows NT/2K           |
+ |     Priority: Other                     Component: SAX                     |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: [EMAIL PROTECTED]                                  |
+ |  Reported By: [EMAIL PROTECTED]                                         |
+ +----------------------------------------------------------------------------+
+ |          URL:                                                              |
+ +============================================================================+
+ |                              DESCRIPTION                                   |
+ Using JDK 1.3.1 on Win2K and SAX 2.0 (also with SAX 1.x).
+ 
+ Steps to reproduce:
+ 
+ 1. run against the file: http://www.geocities.com/ascii_text/sax_example.xml
+ 2. as the file is parsed, a character array/buffer of size 16384 is passed
+    to the characters method in the implementation class
+ 3. the offset value passed to the characters method is moved along to match 
+    text contained within the elements
+ 4. when a text value lies partly in one buffer and partly in the next
+    buffer, each characters call delivers only part of the text value
+ 
+ For example, in the file referenced above, we have the following xml snippet:
+ 
+ "...
+     <column>
+       <column_name>COLUMN_5</column_name>
+       <column_value>VALUE_5</column_value>
+     </column>
+ ..."
+ 
+ It turns out that as the file is processed, the current buffer contains:
+ 
+ "...
+     <column>
+       <column_name>CO" [END OF BUFFER]
+ 
+ and the next buffer contains:
+ 
+ [BEGINNING OF BUFFER]"LUMN_5</column_name>
+       <column_value>VALUE_5</column_value>
+     </column>
+ ..."
+ 
+ The corresponding buffer-related values passed to the characters 
+ method are as follows.  
+ 
+ * For the current buffer:
+ 
+ OFFSET: 16382
+ LENGTH: 2
+ CHARACTER ARRAY LENGTH: 16384
+ 
+ * For the next buffer:
+ 
+ OFFSET: 0
+ LENGTH: 6
+ CHARACTER ARRAY LENGTH: 16384
+ 
+ We can see that the text value COLUMN_5 is chopped into "CO" and "LUMN_5"
+ and the reason is that the first part of the value ("CO") lies in one
+ buffer and the second part ("LUMN_5") lies in the next buffer.
+ 
+ As a result, no single characters call reports the complete text value.
+ 
+ This doesn't happen for every buffer; in fact, it took an xml file
+ 45,000 lines long to get the problem to show up.  But for large xml
+ files, it almost always happens to me.  This is a serious problem
+ because it prevents me from using and testing large data sets.
+ 
+ Please e-mail me if there is a work-around or if I am mis-using
+ the API.
+ 
+ I was unable to find another reference to this problem, but I
+ would be surprised if others haven't encountered it.
\ No newline at end of file
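
A possible work-around, sketched under assumptions: the SAX documentation for the characters callback allows a parser to deliver contiguous character data in several chunks, so a handler can append each chunk to a buffer and read the complete value only when the element ends. The class name ColumnTextHandler and the getValues method are hypothetical names for illustration and are not part of Xerces-J; the sketch uses generics and StringBuilder, which postdate JDK 1.3.1 (on that JDK, StringBuffer and an untyped List would be needed instead).

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of column_name and
// column_value elements even when the parser splits it across calls.
class ColumnTextHandler extends DefaultHandler {
    // Text gathered since the most recent start tag.
    private final StringBuilder text = new StringBuilder();
    // Completed values, in document order.
    private final List<String> values = new ArrayList<String>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // Discard anything gathered before this element opened.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start .. start+length-1] is valid; append it rather
        // than treating this call as the whole value.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The value is complete only once the end tag is reached.
        if ("column_name".equals(qName) || "column_value".equals(qName)) {
            values.add(text.toString());
        }
        text.setLength(0);
    }

    public List<String> getValues() {
        return values;
    }
}
```

With a handler along these lines, the two calls described in the report (offset 16382, length 2, then offset 0, length 6) would be joined into the single value COLUMN_5 when the closing column_name tag arrives.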
