Carl,

XML processors frequently read documents using buffered I/O, so character
data can cross a buffer boundary.  To avoid costly data copying and memory
allocation, SAX permits the parser to report one run of character data as
multiple characters() events.  For larger documents with sparse markup,
the alternative (gathering all of the text into one contiguous array
before reporting it) could be very expensive.  For example, imagine a
several-megabyte text file with a single start and end tag wrapped around
it, being read using a fairly typical 8k buffer size!
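
Since a handler cannot assume one characters() call per element, the usual
approach is to append each chunk to a buffer and only read the buffer in
endElement().  Below is a minimal sketch of that idea, not necessarily what
you have already coded.  The class name TextCollector, its collect() helper
and the <Records> wrapper element are made up for illustration; it uses the
JAXP SAXParserFactory rather than the Xerces parser classes directly, and it
assumes leaf elements that contain text only (no mixed content).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects the complete text of every StudentStatus element. */
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Discard whitespace seen between elements; start each element empty.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks: append, never assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only here is the element's text known to be complete.
        if ("StudentStatus".equals(qName)) {
            values.add(text.toString());
        }
        text.setLength(0);
    }

    /** Parses the given XML string and returns the collected values in document order. */
    static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        // Build a document similar to the load test: the same record 4000 times.
        StringBuilder xml = new StringBuilder("<Records>");
        for (int i = 0; i < 4000; i++) {
            xml.append("<PersonRecord><StudentStatus>Active</StudentStatus></PersonRecord>");
        }
        xml.append("</Records>");
        List<String> values = collect(xml.toString());
        System.out.println(values.size() + " values, all complete: "
                + values.stream().allMatch("Active"::equals));
    }
}
```

With this pattern it does not matter whether "Active" arrives in one call
or as "Activ" followed by "e"; the handler sees the same string either way.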

-Glenn



                                                                                       
                                      
Carl Christianson <CChristianson@knowledgeplanet.com>
10/26/2001 09:21 AM
Please respond to xerces-j-dev

To:      "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:
Subject: SAXParser question
                                      
                                                                                       
                                      
                                                                                       
                                      



Hi all,
this might be a dumb question but I'm wondering if something I'm seeing is
"normal".

I have noticed that when I'm parsing a fairly large file, the characters
method of the ContentHandler I'm using gets fired multiple times for the
data in a single element.  Let me explain.

I have an XML record that looks like this:

<PersonRecord>
                     <ForCompany>PRUD</ForCompany>
                     <Operation>S</Operation>
                     <XStudentId>carl</XStudentId>
                     <FirstName>Carl</FirstName>
                     <LastName>Christianson</LastName>
                     <UserName>carl</UserName>
                     <Password>carl</Password>
                     <StudentStatus>Active</StudentStatus>
                     <XRegionId>xcbn</XRegionId>
                     <XField1>xField1</XField1>
                     <XField2>xField2</XField2>
</PersonRecord>


To do a load test of my application I copied this simple record over and
over again (4000 times).

I have noticed that the public void characters(char[] ch, int start, int
length) method will occasionally get fired multiple times on some of those
elements, i.e. it will fire for the <StudentStatus> element and pass in
"Activ" and then fire again passing in "e".
The start and end of the element (correctly) get fired only once.

I upgraded to 1.4.3 and the behaviour is the same.  It will consistently do
it on records 751, 2480, 2800, 3830.
All the records are identical.  The element that gets split up is consistent
for a given record but different for each of those, i.e. record 2480 will
split up the <ForCompany> element into "PR" and "UD".
I've coded around this, but is this normal?
thx
Carl

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





