Hi Glenn,

I figured it had to be a buffer issue. Thanks for clueing me in that this is indeed the case.

thx
carl

-----Original Message-----
From: Glenn Marcy [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 26, 2001 6:44 PM
To: [EMAIL PROTECTED]
Subject: Re: SAXParser question

Carl,

It is frequently the case that XML processors read documents using buffered I/O. This leads to occasions where character data crosses buffer boundaries. To avoid costly data copying and memory allocation, SAX permits the parser to generate multiple character data "events" for a single run of text. If you consider the case of larger documents with sparse markup, the alternative could be very expensive. For example, imagine a several meg text file with a single start and end tag wrapped around it being read using a fairly typical 8k buffer size !!

-Glenn


Carl Christianson <CChristianson@knowledgeplanet.com>
10/26/2001 09:21 AM
Please respond to xerces-j-dev

To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:
Subject: SAXParser question

Hi all,

this might be a dumb question but I'm wondering if something I'm seeing is "normal". I have noticed that, with the ContentHandler I'm using, when I'm parsing a fairly large file the characters method gets fired multiple times for the data in an element. Let me explain. I have an XML record that looks like this:

    <PersonRecord>
      <ForCompany>PRUD</ForCompany>
      <Operation>S</Operation>
      <XStudentId>carl</XStudentId>
      <FirstName>Carl</FirstName>
      <LastName>Christianson</LastName>
      <UserName>carl</UserName>
      <Password>carl</Password>
      <StudentStatus>Active</StudentStatus>
      <XRegionId>xcbn</XRegionId>
      <XField1>xField1</XField1>
      <XField2>xField2</XField2>
    </PersonRecord>

To do a load test of my application I copied this simple record over and over again (4000 times). I have noticed that the

    public void characters(char[] ch, int start, int end)

method will occasionally get fired multiple times on some of those elements, i.e. it will fire for the <StudentStatus> element and pass in "Activ" and then fire again passing in "e". The start and end events for the element (correctly) get fired only once.
I upgraded to 1.4.3 and the behaviour is the same. It will consistently do it on records 751, 2480, 2800, 3830. All the records are identical. The element that gets split up is consistent but different for each of those, i.e. record 2480 will split up the <ForCompany> element into "PR" and "UD". I've coded around this but is this normal?

thx
Carl

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
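
For anyone finding this thread later: the usual way to code around split character events is to append every characters() call to a buffer and only read the buffer in endElement(). The following is a minimal sketch of that idea in current Java, not necessarily what Carl ended up writing. The class name BufferingHandler, the parse() helper and the "name=value" output format are made up for illustration. Note that in the SAX ContentHandler interface the third argument of characters() is a length, not an end index.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (name and output format are assumptions, not from the thread).
public class BufferingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Drop anything collected so far (e.g. whitespace between sibling tags).
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May fire several times for one element; just accumulate each chunk.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only here is the element's text known to be complete.
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            values.add(qName + "=" + value);
        }
        text.setLength(0);
    }

    public List<String> getValues() {
        return values;
    }

    // Convenience helper: parse an XML string and return the collected values.
    public static List<String> parse(String xml) throws Exception {
        BufferingHandler handler = new BufferingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getValues();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<PersonRecord><ForCompany>PRUD</ForCompany>"
                + "<StudentStatus>Active</StudentStatus></PersonRecord>";
        System.out.println(parse(xml)); // [ForCompany=PRUD, StudentStatus=Active]
    }
}
```

With this shape it does not matter whether the parser delivers "Active" in one call or as "Activ" followed by "e"; the value is assembled before it is used. The sketch only handles leaf elements with plain text; mixed content would need a stack of buffers instead of a single one.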
