The following comment has been added to this issue:
Author: Dan Rosen
Created: Thu, 13 May 2004 10:48 AM
Body:
Hi Neil,
Thanks for the advice ...
> On the other hand, if you can design a handler that knows how to
> make appropriate calls to the scanner's sendChars() method so that
> the buffer gets flushed when a maximum buffer size is reached, then
> perhaps a pluggable handler wouldn't be necessary since the default
> behaviour would always work when an application has chosen to set
> this limit.
The code I have, as written, does have the pluggable handler notion, since I didn't
want to arbitrarily couple the XMLBuffer class implementation (or worse, its
interface) to the scanner. I didn't go so overboard as to allow multiple registered
handlers, or anything like that; this seemed simple enough without being a hack.
> I'd also observe that XMLBuffer has to check to use the
> infelicitously named "insureCapacity()" method to make sure it's
> large enough
Yes, this is where I implemented the full-handler invocation. It's more or less as you
describe. Once I get everything cleaned up, I'll post the patch here; you'll probably
find it to be entirely unsurprising.
I think what I'll end up doing to allow a user-configurable buffer size limit will be to
add a setter method on Parser and AbstractDOMParser, something like
setInputBufferSize(). I think that would be most consistent with the existing API.
Sound ok?
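For illustration only, the idea discussed above (a size limit plus a single pluggable "buffer full" handler invoked from the capacity check) might be sketched roughly as follows. This is not the actual patch and not the Xerces-C++ API: BoundedBuffer, setFullHandler(), flush() and ensureCapacity() are hypothetical names, and char16_t merely stands in for XMLCh.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for XMLBuffer; NOT the real Xerces-C++ class.
class BoundedBuffer {
public:
    // Receives the buffered characters whenever the size limit is reached.
    using FullHandler = std::function<void(const std::u16string&)>;

    explicit BoundedBuffer(std::size_t maxSize) : fMaxSize(maxSize) {
        assert(maxSize > 0);
    }

    // Exactly one handler can be registered; the buffer knows nothing
    // about who is listening (for example, a scanner).
    void setFullHandler(FullHandler handler) { fHandler = std::move(handler); }

    void append(char16_t ch) {
        ensureCapacity(1);
        fData.push_back(ch);
    }

    // Hands any remaining characters to the handler (end of a text run).
    void flush() {
        if (fHandler && !fData.empty()) {
            fHandler(fData);
            fData.clear();
        }
    }

    std::size_t size() const { return fData.size(); }

private:
    // Assumed analogue of insureCapacity(): rather than growing past the
    // limit, pass the current contents to the handler and start over.
    // Without a handler the buffer simply keeps growing, as before.
    void ensureCapacity(std::size_t extra) {
        if (fHandler && fData.size() + extra > fMaxSize) {
            fHandler(fData);
            fData.clear();
        }
    }

    std::size_t fMaxSize;
    FullHandler fHandler;
    std::u16string fData;
};
```

In this sketch a scanner would register a handler that forwards the characters to its client, so the buffer never holds more than the configured limit and the limit itself could be exposed through a parser-level setter.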
---------------------------------------------------------------------
View this comment:
http://issues.apache.org/jira/browse/XERCESC-1207?page=comments#action_35530
---------------------------------------------------------------------
View the issue:
http://issues.apache.org/jira/browse/XERCESC-1207
Here is an overview of the issue:
---------------------------------------------------------------------
Key: XERCESC-1207
Summary: XMLScanner::scanCharData fills XMLBuffer until out of memory
Type: Bug
Status: Unassigned
Priority: Critical
Project: Xerces-C++
Components:
Non-Validating Parser
Versions:
2.5.0
Assignee:
Reporter: Dan Rosen
Created: Mon, 10 May 2004 10:51 AM
Updated: Thu, 13 May 2004 10:48 AM
Description:
When parsing an XML file consisting primarily of very large (hundreds of megabytes)
blocks of contiguous character data, XMLScanner::scanCharData() happily attempts to
build a single XMLBuffer containing all the data. Eventually the buffer becomes so
large that the reallocation within XMLBuffer::insureCapacity() fails, causing
std::bad_alloc to be thrown, or a crash in memcpy (depending on compiler). The
fundamental problem seems to be that there is no upper bound imposed on buffer length.
In the SAX model, it is acceptable to issue multiple ContentHandler::characters()
callbacks for a single contiguous block of data. The only restriction on how this
should be implemented is that all characters in any single event must come from the
same external entity; no further behavior is specified. So it would be perfectly
conformant to the SAX model to set an upper bound on the size of a single characters()
event.
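As a rough sketch of what this means for a client: a handler that accumulates text across callbacks sees the same data whether it arrives in one event or in many. The names below (TextCollector, deliverInChunks) are hypothetical and only loosely modelled on a SAX content handler; char16_t stands in for XMLCh.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical client-side handler; NOT the Xerces-C++ ContentHandler.
class TextCollector {
public:
    // May be called several times for one contiguous block of text,
    // so the handler appends instead of assuming a single event.
    void characters(const char16_t* chars, std::size_t length) {
        assert(chars != nullptr);
        fText.append(chars, length);
        ++fCalls;
    }

    const std::u16string& text() const { return fText; }
    std::size_t calls() const { return fCalls; }

private:
    std::u16string fText;
    std::size_t fCalls = 0;
};

// Hypothetical producer: delivers `text` as a series of events, each
// carrying at most `maxEvent` characters.
inline void deliverInChunks(const std::u16string& text,
                            std::size_t maxEvent,
                            TextCollector& handler) {
    assert(maxEvent > 0);
    for (std::size_t pos = 0; pos < text.size(); pos += maxEvent) {
        const std::size_t n = std::min(maxEvent, text.size() - pos);
        handler.characters(text.data() + pos, n);
    }
}
```

A client written this way is unaffected by where the event boundaries fall, which is why bounding the size of a single event should not break conforming SAX applications.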
(As far as I understand, allowing an upper bound in XMLScanner::scanCharData() would
not affect the DOM.)
I'd propose that an upper bound for character buffer size be added as an optional
parameter (with some reasonable value as a default), either in the constructor of the
parser or in useScanner(), and that that parameter be used to inform
XMLScanner::scanCharData() when to force a call to sendCharData() to dump the buffer
to its client.
---------------------------------------------------------------------