Re: xerces-c createdocs.bat and the BOM character
Boris Kolpackov wrote:

> There is no such thing as BOM for UTF-8.

http://unicode.org/faq/utf_bom.html#25

a BOM can be used as a signature no matter how the Unicode text is transformed: UTF-16, UTF-8, UTF-7, etc. The exact bytes comprising the BOM will be whatever the Unicode character FEFF is converted into by that transformation format. In that form, the BOM serves to indicate both that it is a Unicode file, and which of the formats it is in.

Scott

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
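As a rough illustration of the FAQ passage quoted in this message, the sketch below encodes U+FEFF in a few encodings and prints the resulting bytes. The helper name `bom_bytes` is made up for this example. The explicit-endian codec names are used because Python's plain `utf-16` codec would prepend a BOM of its own.

```python
# Encode the character U+FEFF in several Unicode encodings and show
# the byte sequence each one produces. That sequence is the "BOM"
# (or signature) for the encoding.
BOM_CHAR = "\ufeff"


def bom_bytes(encoding: str) -> str:
    """Return the bytes of U+FEFF in `encoding` as spaced uppercase hex."""
    return BOM_CHAR.encode(encoding).hex(" ").upper()


for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(f"{name:10} {bom_bytes(name)}")
```

For UTF-8 the result is the three bytes EF BB BF. They carry no byte-order information and act only as a signature.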
RE: xerces-c createdocs.bat and the BOM character
Actually, the XML spec discusses the UTF-8 BOM. See http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing-no-ext-info. Whether it makes sense is another question. I suppose it could be used to quickly distinguish UTF-8 from ASCII and similar encodings. Since conforming processors are required to handle UTF-8 and UTF-16, but no other encodings, this might have some value.

-Original Message-
From: Boris Kolpackov [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 30, 2007 10:37 AM
To: c-dev@xerces.apache.org
Subject: Re: xerces-c createdocs.bat and the BOM character

Hi Justin,

Justin Dearing [EMAIL PROTECTED] writes:

> Gerald, the author of XML Copy Editor, seems to think the BOM should be there as the docs are UTF-8 and it is a UTF-8 BOM.

BOM (byte order marker) does not make any sense for UTF-8 since it is a 1-byte encoding.

> 1) What is the intended encoding of the documentation? Since the documents are written in English my understanding is UTF-8 would work just fine, but I don't know a lot about Unicode.

UTF-8.

> 2) Does the Java tool that builds the documentation handle BOMs correctly for UTF-8, or is my editor at fault?

There is no such thing as BOM for UTF-8.

> 3) As a developer working on a Windows platform, how would I get encoding information about a file?

I assume you are talking about .xml files in the doc/ directory. In this case: those XML files do not explicitly state their encoding (in the XML declaration), so it defaults to UTF-8.

> 4) As a developer working on a Unix platform, how would I get encoding information about a file?

Ditto.

Boris

--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
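Questions 3 and 4 in this message ask how to find out a file's encoding. One platform-independent way to check at least for a BOM is to look at the first few bytes. The following is a minimal sketch, not a full encoding detector: `sniff_bom` and `sniff_file` are hypothetical names, and a result of `None` only means no BOM was found, in which case the XML declaration (or the UTF-8 default) still applies.

```python
import codecs

# Check the 4-byte signatures before the 2-byte ones: the UTF-32LE BOM
# (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def sniff_bom(head: bytes):
    """Return the encoding named by a leading BOM in `head`, or None."""
    for bom, name in _SIGNATURES:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path: str):
    """Read the first four bytes of the file at `path` and check for a BOM."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

Because it only reads raw bytes, the same check behaves identically on Windows and Unix.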
Re: xerces-c createdocs.bat and the BOM character
On 10/30/07, David Bertoni [EMAIL PROTECTED] wrote:

> It would help to see the error message you're getting, and to know what tool is issuing it.

My apologies. This is the Java tool building version 2 of the xerces-c library, on Windows of course:

[XalanProcessor] Applying XSL sheet sbk:/style/stylesheets/any2project.xsl
[XercesParser] The markup in the document preceding the root element must be well-formed. [File: sbk:/sources/build-winunix.xml Line: 1 Column: 1]
org.apache.stylebook.CreationException: SAXException caught while parsing
    at org.apache.stylebook.parsers.XercesParser.parse(XercesParser.java:55)
    at org.apache.stylebook.parsers.CachingParser.parse(CachingParser.java:92)
    at org.apache.stylebook.parsers.AbstractParser.parse(AbstractParser.java:28)
    at org.apache.stylebook.producers.ParserProducer.produce(ParserProducer.java:26)
    at org.apache.stylebook.Project.processEntry(Project.java:110)
    at org.apache.stylebook.Project.processNodeList(Project.java:54)
    at org.apache.stylebook.Project.init(Project.java:42)
    at org.apache.stylebook.Loader.load(Loader.java:57)
    at org.apache.stylebook.StyleBook.getProject(StyleBook.java:122)
    at org.apache.stylebook.StyleBook.main(StyleBook.java:85)
[StyleBook] Caught org.apache.stylebook.LoadingException
org.apache.stylebook.LoadingException: Processing Entry (SAXException caught while parsing)
    at org.apache.stylebook.Project.processEntry(Project.java:126)
    at org.apache.stylebook.Project.processNodeList(Project.java:54)
    at org.apache.stylebook.Project.init(Project.java:42)
    at org.apache.stylebook.Loader.load(Loader.java:57)
    at org.apache.stylebook.StyleBook.getProject(StyleBook.java:122)
    at org.apache.stylebook.StyleBook.main(StyleBook.java:85)
[StyleBook] Error creating project

Java version:

C:\Documents and Settings\justin.dearing\My Documents\xercesjava -version
java version 1.6.0_03
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) Client VM (build 1.6.0_03-b05, mixed mode, sharing)
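The thread does not confirm the cause, but an error at line 1, column 1 is what one might expect if the parser treats a leading UTF-8 BOM as stray content before the root element. Assuming that is what happened here, one possible workaround is to strip the BOM from the source file before running the build. The sketch below shows the idea; `strip_utf8_bom` and `strip_file` are hypothetical helpers and not part of StyleBook or Xerces.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_file(path: str) -> bool:
    """Rewrite the file at `path` without its BOM; return True if it changed."""
    with open(path, "rb") as f:
        original = f.read()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        with open(path, "wb") as f:
            f.write(cleaned)
        return True
    return False
```

Working on raw bytes leaves the rest of the file untouched, so no re-encoding takes place.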
XNI-based Xerces C++
Hi,

Bringing back a question from 2004 - is there a C++ version of Xerces based on XNI? From looking at the Xerces 2.8 code it does not appear that it is based on the XNI framework.

Thanks in advance,

Venky Raju
Senior Staff Engineer
Samsung Telecommunications America
RE: XNI-based Xerces C++
Thanks, Michael!

Venky

-Original Message-
From: Michael Glavassevich [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 30, 2007 1:36 PM
To: c-dev@xerces.apache.org
Subject: Re: XNI-based Xerces C++

Hi Venky,

Venky Raju [EMAIL PROTECTED] wrote on 10/30/2007 02:28:02 PM:

> Hi,
>
> Bringing back a question from 2004 - is there a C++ version of Xerces based on XNI? From looking at the Xerces 2.8 code it does not appear that it is based on the XNI framework.

No, XNI only exists in Xerces-J.

> Thanks in advance,
>
> Venky Raju
> Senior Staff Engineer
> Samsung Telecommunications America

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
Re: xerces-c createdocs.bat and the BOM character
Scott Morgan [EMAIL PROTECTED] writes:

> http://unicode.org/faq/utf_bom.html#25
>
> a BOM can be used as a signature no matter how the Unicode text is transformed: UTF-16, UTF-8, UTF-7, etc. The exact bytes comprising the BOM will be whatever the Unicode character FEFF is converted into by that transformation format. In that form, the BOM serves to indicate both that it is a Unicode file, and which of the formats it is in.

I guess this illustrates a major difference between mathematics and software. In math, if someone tells you that 2 + 2 = 5, then you can call BS without looking into any external sources. In software, if someone tells you there is a byte order marker for an encoding which by definition does not have a notion of byte order, you actually need to check whether some standards body came up with such a thing. I can't seem to learn the lesson ;-).

Boris

--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding