[jira] Commented: (XERCESC-1207) XMLScanner::scanCharData fills XMLBuffer until out of memory

jira Wed, 12 May 2004 14:38:09 -0700

The following comment has been added to this issue:

     Author: Dan Rosen
    Created: Wed, 12 May 2004 2:38 PM
       Body:
I have a fix that's just about ready. You can specify an optional maximum size for an 
XMLBuffer, along with a handler to be invoked when that limit is reached. The handler 
should make its best attempt to empty the buffer as appropriate to the task. In this 
case, the handler for fCDataBuf (a.k.a. "toUse" in XMLScanner::scanCharData) invokes 
XMLScanner::sendCharData to send a characters() callback and flush the buffer.


This is the cleanest design I can think of: it requires no changes to any of the 
scanning code in sendCharData or movePlainContentChars to handle the special case of 
the buffer being full, and it should add only minimal performance overhead to 
XMLBuffer's internals (in most cases, a single comparison against zero).
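
For illustration only, here is a minimal sketch of the idea in standalone C++. It is 
not the actual patch: BoundedBuffer, FullHandler, and chunkText are hypothetical names, 
and the real XMLBuffer works on XMLCh with Xerces memory management rather than 
std::u16string. A limit of zero is assumed to mean "no limit", which keeps the common 
path down to one comparison.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for XMLBuffer; not the Xerces-C++ class.
class BoundedBuffer {
public:
    using FullHandler = std::function<void(BoundedBuffer&)>;

    // maxSize == 0 is assumed to mean "unbounded".
    explicit BoundedBuffer(std::size_t maxSize = 0, FullHandler onFull = nullptr)
        : fMaxSize(maxSize), fOnFull(std::move(onFull)) {}

    void append(char16_t ch) {
        // Unbounded buffers fall through after the first comparison.
        if (fMaxSize != 0 && fData.size() >= fMaxSize && fOnFull) {
            fOnFull(*this);  // handler makes its best attempt to drain the buffer
        }
        fData.push_back(ch);
    }

    const std::u16string& data() const { return fData; }
    void reset() { fData.clear(); }

private:
    std::u16string fData;
    std::size_t fMaxSize;
    FullHandler fOnFull;
};

// Feeds text through a buffer capped at `limit` and returns the chunks that a
// characters()-style callback would have received, including the final flush.
std::vector<std::u16string> chunkText(const std::u16string& text, std::size_t limit) {
    std::vector<std::u16string> chunks;
    BoundedBuffer buf(limit, [&chunks](BoundedBuffer& b) {
        chunks.push_back(b.data());  // stands in for sending a characters() event
        b.reset();
    });
    for (char16_t ch : text) {
        buf.append(ch);
    }
    if (!buf.data().empty()) {
        chunks.push_back(buf.data());  // normal end-of-data flush
    }
    return chunks;
}
```

In this sketch, a handler that fails to drain the buffer simply lets it keep growing, 
which matches the "best attempt" wording above; the scanning loop itself never needs 
to know whether a limit is set.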

What I haven't implemented yet is a way to pass this size limit as a parameter at 
instantiation time. I'd like some feedback on how best to do this.
---------------------------------------------------------------------
View this comment:
  http://issues.apache.org/jira/browse/XERCESC-1207?page=comments#action_35512

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESC-1207

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1207
    Summary: XMLScanner::scanCharData fills XMLBuffer until out of memory
       Type: Bug

     Status: Unassigned
   Priority: Critical

    Project: Xerces-C++
 Components: 
             Non-Validating Parser
   Versions:
             2.5.0

   Assignee: 
   Reporter: Dan Rosen

    Created: Mon, 10 May 2004 10:51 AM
    Updated: Wed, 12 May 2004 2:38 PM

Description:
When parsing an XML file consisting primarily of very large (hundreds of megabytes) 
blocks of contiguous character data, XMLScanner::scanCharData() happily attempts to 
build a single XMLBuffer containing all the data. Eventually the buffer becomes so 
large that the reallocation within XMLBuffer::insureCapacity() fails, causing 
std::bad_alloc to be thrown, or a crash in memcpy (depending on compiler). The 
fundamental problem seems to be that there is no upper bound imposed on buffer length.

In the SAX model, it is acceptable to issue multiple ContentHandler::characters() 
callbacks for a single contiguous block of data. The only restriction on how this 
should be implemented is that all characters in any single event must come from the 
same external entity; no further behavior is specified. So it would be perfectly 
conformant to the SAX model to set an upper bound on the size of a single characters() 
event.
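
To illustrate what this means for client code, here is a minimal sketch of a handler 
that tolerates split events by accumulating text until the element ends. TextCollector 
and its methods are hypothetical names that only mirror the shape of the SAX callbacks; 
the class does not derive from the Xerces-C++ ContentHandler interface.

```cpp
#include <cstddef>
#include <string>

// Hypothetical handler; mirrors the shape of SAX callbacks but is standalone.
class TextCollector {
public:
    void startElement() {
        fPending.clear();
        fEvents = 0;
    }

    // May be called any number of times for one contiguous block of text.
    void characters(const char16_t* chars, std::size_t length) {
        fPending.append(chars, length);
        ++fEvents;
    }

    // Only at the end of the element is the accumulated text treated as complete.
    void endElement() {
        fLastText = fPending;
        fPending.clear();
    }

    const std::u16string& lastText() const { return fLastText; }
    std::size_t eventCount() const { return fEvents; }

private:
    std::u16string fPending;
    std::u16string fLastText;
    std::size_t fEvents = 0;
};
```

A client written this way sees the same text whether the parser delivers one large 
event or many bounded ones.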

(As far as I understand, imposing an upper bound in XMLScanner::scanCharData() would 
not affect the DOM.)

I'd propose adding an upper bound on character buffer size as an optional parameter 
(with some reasonable default value), either in the constructor of the parser or in 
useScanner(). XMLScanner::scanCharData() would then use that parameter to decide when 
to force a call to sendCharData() to dump the buffer to its client.


---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


