Hi,

        I have not heard from the person who originally responded to me about this 
issue for a couple of weeks.   Does anybody on this list happen to have any insight into 
it?

Thanks,
David

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 22, 2003 2:56 PM
To: [EMAIL PROTECTED]
Subject: DO NOT REPLY [Bug 21415] - bug in XercesDOMParser treatment of
whitespace and empty elements


DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21415>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21415

bug in XercesDOMParser treatment of whitespace and empty elements





------- Additional Comments From [EMAIL PROTECTED]  2003-07-22 19:55 -------
PeiYong,

    Thanks for the reply.

    I have embedded my comments inside a copy of your e-mail on lines that do 
not begin with > below.

 >   1. First I modify the sample MemParse,  
 >      . add the line 
 >        #include <xercesc/parsers/XercesDOMParser.hpp>
 >      . change the line 
 >        SAXParser* parser = new SAXParser;       to
 >        XercesDOMParser *parser = new XercesDOMParser;
 >      . comment out 
 >        parser->setValidationScheme(valScheme);  and
 >        parser->setValidationScheme(valScheme);
 >      . change the line
 >        static const char*  gXMLInMemBuf = "<outer> <a></a><b>\n</b> </outer>";
 > 
 >       and result the sample, the result is something like this
 >        <outer> <a></a><b>
 >        </b> </outer>

 Looking at MemParse.cpp reveals that the output it gives comes from the 
following code:
"
        cout << "\nFinished parsing the memory buffer containing the following "
             << "XML statements:\n\n"
             << gXMLInMemBuf
" (line 338 in the original MemParse.cpp)
...it's just outputting the exact value that we set earlier (basically whatever 
you specified in the last step of "1.").   So it appears that this particular 
test doesn't exercise the functionality in question.

 >    2. Second, I put your string into a file, and feed it to the sample DOMPrint 
 >       and i have the same result as #1.
Sorry, I forgot to specify that "pretty print" should be turned on.   So I 
guess a good test here would be:
    1) Create a file named test.xml with the following:
"
<outer> <a></a> <b>
</b> </outer>
"
(in other words: "<outer> <a></a> <b>\n</b> </outer>")
    2) run the following at a command prompt, redirecting the output into 
test_2.xml:
       domprint -wfpp=on test.xml > test_2.xml
    3) run the following at a command prompt, redirecting the output into 
test_3.xml:
       domprint -wfpp=on test_2.xml > test_3.xml

    You should see the following:
---test.xml--->
"
<outer> <a></a> <b>
</b> </outer>
"
---test_2.xml--->
"
<outer>

  <a/>

  <b></b>

</outer>
"
---test_3.xml--->
"
<outer>

  <a/>

  <b/>

</outer>
"
   Note that test_3.xml is merely the output of test_2.xml passed through the 
same code again.   I have not had time to figure out how Xerces works or where 
the code for this is located, but it's as if, somewhere in the Xerces 
code, something takes a first pass that changes occurrences of "<b></b>" 
into "<b/>", and only *then* drops the whitespace.   That would leave us in a 
situation where "<b>\n</b>" has been changed into "<b></b>" but never gets 
converted to "<b/>", because that conversion happened in a step that had 
already run.   If that is the case, then perhaps the step that 
changes "<b></b>" into "<b/>" just needs to be moved to somewhere after the step 
that removes the whitespace (and after anything else that could introduce that 
condition)?
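
   To make the ordering guess above concrete, here is a minimal sketch in 
plain C++.   It is a toy model on strings and does *not* use or reflect the 
actual Xerces code; the function names (collapseEmptyElements, 
dropWhitespaceOnlyText, collapseThenStrip, stripThenCollapse) are hypothetical 
names I made up for illustration, and the regular expressions only handle 
simple tags without attributes.

```cpp
#include <regex>
#include <string>

// Toy pass (hypothetical): rewrite "<name></name>" as "<name/>".
// Only handles simple tag names with no attributes.
std::string collapseEmptyElements(const std::string& xml) {
    static const std::regex emptyPair(R"(<([A-Za-z_][A-Za-z0-9_.-]*)></\1>)");
    return std::regex_replace(xml, emptyPair, "<$1/>");
}

// Toy pass (hypothetical): drop whitespace-only text between two tags.
std::string dropWhitespaceOnlyText(const std::string& xml) {
    static const std::regex gap(R"(>\s+<)");
    return std::regex_replace(xml, gap, "><");
}

// The order I suspect: collapse first, then drop whitespace.
std::string collapseThenStrip(const std::string& xml) {
    return dropWhitespaceOnlyText(collapseEmptyElements(xml));
}

// The proposed order: drop whitespace first, then collapse.
std::string stripThenCollapse(const std::string& xml) {
    return collapseEmptyElements(dropWhitespaceOnlyText(xml));
}
```

   In this toy model, feeding "<outer> <a></a> <b>\n</b> </outer>" through 
collapseThenStrip leaves "<b></b>" in the result, and only a second run turns 
it into "<b/>", which mirrors the test_2.xml / test_3.xml difference (minus the 
pretty-print indentation).   stripThenCollapse gives "<b/>" on the first run.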

 >    3. Last, in fact, the pair <b></b> is equivalent to <b/>, and there is 
 >       **no space** in between <b></b>.
  Sorry, could you clarify this last point for me?   I agree that, 
given "<b></b>" and "<b/>", Xerces gives a consistent "<b/>" output for both 
of them, but the issue arises *when there's whitespace*: in the example I gave, 
"<b>\n</b>" and "<b/>" are *not* being treated as the same thing.   Also, 
what exactly are you referring to when you say "in between <b></b>"?   I 
apologize if I'm just misreading something...
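
  To spell out the distinction I am drawing, here is a small sketch in plain 
C++ (again not Xerces code; the names Content, isWhitespaceOnly and classify 
are hypothetical).   It treats the character content of an element as one of 
three cases, using the four whitespace characters that XML 1.0 defines (space, 
tab, carriage return, line feed):

```cpp
#include <string>

// Hypothetical classification of an element's character content.
enum class Content { Empty, WhitespaceOnly, HasText };

// True if `text` is non-empty and made up only of XML whitespace
// characters (space, tab, carriage return, line feed).
bool isWhitespaceOnly(const std::string& text) {
    return !text.empty() &&
           text.find_first_not_of(" \t\r\n") == std::string::npos;
}

// "<b></b>" and "<b/>" have content "", while "<b>\n</b>" has content "\n".
Content classify(const std::string& text) {
    if (text.empty()) return Content::Empty;
    if (isWhitespaceOnly(text)) return Content::WhitespaceOnly;
    return Content::HasText;
}
```

  Under this sketch "<b></b>" and "<b/>" both fall into Empty, while 
"<b>\n</b>" falls into WhitespaceOnly; my question is about what should 
happen to that second case when pretty print is on.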

Thanks again,
David

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

