I'll file a bug, and will be working on this over the next couple of days. Looking at the annotated code in CVS, it looks like the primary people to ask for help on this would be peiyongz, neilg and knoaman. (All IBM guys?) Would any of you be able to give me some pointers up front?
Thanks,
dr

-----Original Message-----
From: Dean Roddey [mailto:[EMAIL PROTECTED]
Sent: Friday, May 07, 2004 9:45 PM
To: [EMAIL PROTECTED]
Subject: RE: Chunking of characters() callbacks

Personally, I don't think there's much to be gained by the chunk size *ever* being more than a few K. So an easy and flexible fix would be to cap it at, say, 8K and be done with it. Everyone *has* to be prepared for the possibility of multiple chunks, even if there are only two characters' worth of data, so no one can complain that this breaks their code; if it does, their code wasn't compliant anyway. And there's probably not much performance gain or loss either way. Besides, if you are looking at the data in a streaming way, you'll find errors sooner and avoid further parsing of data you'd only throw away.

-------------------------------------
Dean Roddey
The Charmed Quark Controller
[EMAIL PROTECTED]
www.charmedquark.com

-----Original Message-----
From: Sean Kelly [mailto:[EMAIL PROTECTED]
Sent: Friday, May 07, 2004 8:42 PM
To: [EMAIL PROTECTED]
Subject: Re: Chunking of characters() callbacks

Dan Rosen wrote:
> The trouble I'm running into is that, when parsing, a buffer for the
> characters in this tremendous block of data is maintained in memory
> and grown as necessary by XMLBuffer::insureCapacity. This buffer gets
> so large that at some point the allocation in insureCapacity fails,
> and parsing can't continue. What I'd like to be able to do is tell
> Xerces to buffer only a certain maximum amount of character data
> before calling sendCharData (in IGXMLScanner::scanCharData), rather
> than waiting until it has everything.
>
> As far as I can tell, there isn't a way to do this currently. But I'd
> like some feedback on how easily people think this might be
> implemented, whether it's reasonable to do so, etc., and (as a newbie
> to the Xerces codebase) hopefully get some assistance in implementing
> it.
I'm quite interested in this as well. I asked this question a few months ago and didn't get a response. I tend to work with very large XML streams that carry substantial chunks of Base64-encoded data as character content. All I'd really like is to be able to set the character buffer size to, say, 4K and not allow it to grow beyond that. This would obviously be for the SAX parser.

Sean

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
