Hi Joseph,

Thanks for your reply. I have tried a couple of things, and this is
what I found:

  • I tried defining an internal subset and using just character references in it.
        My XML header looked something like this:

<?xml version="1.0" ?>
<!DOCTYPE myRootElement [<!ENTITY eacute "&#233;"> ] >

Then I removed all special characters from my input file and ran my Java application,
which converts the input to XML (with a header as shown above), parses it, and
then validates it. I got the following error message: Element type myRootElement
must be declared. I then added a DTD instead of directly adding entities, and ended
up creating an entire DTD with all the elements that appear in my XML!

How do I actually define an internal subset? How do I define only character entities
in it? Where am I going wrong?
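
One possible explanation, sketched below under the assumption that DTD validation was switched on in the parser: once a DOCTYPE is present, a DTD-validating parser expects every element to be declared, so an internal subset that declares only entities triggers a validity error. With DTD validation off, the same document parses and the entity is expanded. The class name InternalSubsetDemo, the sample document, and the use of the JDK's JAXP DocumentBuilderFactory are all illustrative assumptions, not the application from this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class InternalSubsetDemo {

    // Hypothetical sample document: the internal subset declares one
    // entity and no elements at all.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE myRootElement [ <!ENTITY eacute \"&#233;\"> ]>\n"
        + "<myRootElement>caf&eacute;</myRootElement>";

    // Validity errors reported during the most recent call to parse().
    static final List<String> errors = new ArrayList<>();

    // Parses XML with DTD validation on or off and returns the root
    // element's text. Validity errors are collected, not thrown.
    static String parse(boolean dtdValidation) {
        errors.clear();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(dtdValidation);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            Document doc = builder.parse(
                new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("non-validating text: " + parse(false));
        System.out.println("non-validating errors: " + errors);
        parse(true);
        System.out.println("validating errors: " + errors);
    }
}
```

If the real goal is validation against a schema rather than a DTD, it may be enough to leave DTD validation off and configure schema validation separately (in JAXP 1.3 and later, for example, through DocumentBuilderFactory.setSchema), so that the internal subset only supplies the entity definitions.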

  • Following your suggestion, I tried declaring the UTF-8 encoding in my document header:
       <?xml version="1.0" encoding="utf-8" ?>
      and put the special characters back into my document. However, the following error showed
      up this time:

      Invalid byte 2 of 3-byte UTF-8 sequence.

     I am using Xalan (which I believe in turn uses Xerces) to do the parsing and validation. Why
     could this error be occurring?
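
A common cause of this kind of error, offered here as an assumption rather than a diagnosis, is that the header says UTF-8 while the bytes were actually written in a single-byte encoding such as ISO-8859-1 (for example, by a writer that uses the platform default). In ISO-8859-1, é is the single byte 0xE9, whose bit pattern 1110xxxx announces a 3-byte sequence to a UTF-8 decoder; the byte that follows is ordinary markup, so decoding fails at byte 2. The sketch below reproduces the mismatch; Utf8MismatchDemo and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

final class Utf8MismatchDemo {

    // Hypothetical sample document that declares UTF-8 and contains
    // one non-ASCII character (U+00E9, e with acute accent).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
        + "<myRootElement>caf\u00e9</myRootElement>";

    // Encodes XML with the given charset, then parses the raw bytes.
    // The parser trusts the declaration, so it always decodes as UTF-8.
    // Returns true if the parse succeeds.
    static boolean parses(Charset actualEncoding) {
        byte[] bytes = XML.getBytes(actualEncoding);
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException | IOException e) {
            // Malformed byte sequences surface as one of these two types.
            System.out.println("parse failed: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes really are UTF-8: declaration and content agree.
        System.out.println("UTF-8 bytes parse: " + parses(StandardCharsets.UTF_8));
        // Bytes are ISO-8859-1 but the declaration still claims UTF-8.
        System.out.println("ISO-8859-1 bytes parse: " + parses(StandardCharsets.ISO_8859_1));
    }
}
```

If this is what is happening, the usual remedy is to write the file through a writer opened with an explicit UTF-8 charset, or to change the encoding declaration to match the bytes that are actually produced.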


Thanks for your help.


Regards,

Chetan





From: Joseph Kesselman <[EMAIL PROTECTED]>
Date: 08/17/2004 08:04 PM
To: [EMAIL PROTECTED]
Subject: Re: Special characters in XML being validated against a schema

Alternatively, you can use an internal subset (DTD syntax inside the
document itself, which keeps things self-contained)...

or you skip the mnemonic names and just use numeric character references...

or you skip those and just use the characters themselves and pick an
encoding which supports them and which your processor understands. (If in
doubt, UTF-8 supports everything and is supported by all XML
processors.)
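
As a rough illustration of the numeric character reference option, the sketch below rewrites every non-ASCII character in a string as a decimal reference, so the resulting text is pure ASCII and needs neither entity declarations nor a particular encoding. The class and method names (NumericCharRefs, toNumericReferences) are made up for this example.

```java
final class NumericCharRefs {

    // Replaces each non-ASCII code point with a decimal numeric character
    // reference such as &#233;. ASCII characters pass through unchanged,
    // so markup characters like & and < still need the usual escaping.
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericReferences("caf\u00e9"));
        System.out.println(toNumericReferences("fa\u00e7ade"));
    }
}
```

Numeric references are only recognized in character data and attribute values, not in element or attribute names, so this approach covers content but not markup that itself uses special characters.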


(Schema considered adding an Entity-like macro/import facility, but decided
to leave this for other standards to deal with. Unfortunately there hasn't
yet been a clear consensus on what the best replacement is...)

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




