Re: xerces-c createdocs.bat and the BOM character

2007-10-30 Thread Scott Morgan
Boris Kolpackov wrote:
 There is no such thing as BOM for UTF-8.
   

http://unicode.org/faq/utf_bom.html#25

a BOM can be used as a signature no matter how the Unicode text is
transformed: UTF-16, UTF-8, UTF-7, etc. The exact bytes comprising the
BOM will be whatever the Unicode character FEFF is converted into by
that transformation format. In that form, the BOM serves to indicate
both that it is a Unicode file, and which of the formats it is in.

Scott



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: xerces-c createdocs.bat and the BOM character

2007-10-30 Thread Jesse Pelton
Actually, the XML spec discusses the UTF-8 BOM.  See
http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing-no-ext-info.

Whether it makes sense is another question.  I suppose it could be used
to quickly distinguish UTF-8 from ASCII and similar encodings.  Since
conforming processors are required to handle UTF-8 and UTF-16, but no
other encodings, this might have some value.

-----Original Message-----
From: Boris Kolpackov [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 30, 2007 10:37 AM
To: c-dev@xerces.apache.org
Subject: Re: xerces-c createdocs.bat and the BOM character

Hi Justin,

Justin Dearing [EMAIL PROTECTED] writes:

 Gerald, the author of XML Copy Editor, seems to think the BOM should be
 there, as the docs are UTF-8 and it is a UTF-8 BOM.

BOM (byte order mark) does not make any sense for UTF-8 since it is
a byte-oriented encoding: its code unit is a single byte.


 1) What is the intended encoding of the documentation? Since the documents
 are written in English, my understanding is that UTF-8 would work just fine,
 but I don't know a lot about Unicode.

UTF-8.

 2) Does the Java tool that builds the documentation handle BOMs correctly
 for UTF-8, or is my editor at fault?

There is no such thing as BOM for UTF-8.

 3) As a developer working on a Windows platform, how would I get encoding
 information about a file?

I assume you are talking about the .xml files in the doc/ directory. In that
case, those XML files do not explicitly state their encoding (in the XML
declaration), so it defaults to UTF-8.

 4) As a developer working on a Unix platform, how would I get encoding
 information about a file?

Ditto.

Boris

-- 
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding




Re: xerces-c createdocs.bat and the BOM character

2007-10-30 Thread Justin Dearing
On 10/30/07, David Bertoni [EMAIL PROTECTED] wrote:
 It would help to see the error message you're getting, and to know what
 tool is issuing it.

My apologies. This is the Java tool building version 2 of the xerces-c
library, on Windows of course:

[XalanProcessor] Applying XSL sheet sbk:/style/stylesheets/any2project.xsl
[XercesParser] The markup in the document preceding the root element must be well-formed. [File: sbk:/sources/build-winunix.xml Line: 1 Column: 1]
org.apache.stylebook.CreationException: SAXException caught while parsing
at org.apache.stylebook.parsers.XercesParser.parse(XercesParser.java:55)
at org.apache.stylebook.parsers.CachingParser.parse(CachingParser.java:92)
at org.apache.stylebook.parsers.AbstractParser.parse(AbstractParser.java:28)
at org.apache.stylebook.producers.ParserProducer.produce(ParserProducer.java:26)
at org.apache.stylebook.Project.processEntry(Project.java:110)
at org.apache.stylebook.Project.processNodeList(Project.java:54)
at org.apache.stylebook.Project.init(Project.java:42)
at org.apache.stylebook.Loader.load(Loader.java:57)
at org.apache.stylebook.StyleBook.getProject(StyleBook.java:122)
at org.apache.stylebook.StyleBook.main(StyleBook.java:85)
[StyleBook] Caught org.apache.stylebook.LoadingException
org.apache.stylebook.LoadingException: Processing Entry (SAXException caught while parsing)
at org.apache.stylebook.Project.processEntry(Project.java:126)
at org.apache.stylebook.Project.processNodeList(Project.java:54)
at org.apache.stylebook.Project.init(Project.java:42)
at org.apache.stylebook.Loader.load(Loader.java:57)
at org.apache.stylebook.StyleBook.getProject(StyleBook.java:122)
at org.apache.stylebook.StyleBook.main(StyleBook.java:85)
[StyleBook] Error creating project

Java version:

C:\Documents and Settings\justin.dearing\My Documents\xerces>java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) Client VM (build 1.6.0_03-b05, mixed mode, sharing)




XNI-based Xerces C++

2007-10-30 Thread Venky Raju
Hi,

Bringing back a question from 2004 - is there a C++ version of Xerces
based on XNI?  From looking at the Xerces 2.8 code it does not appear
that it is based on the XNI framework.

Thanks in advance,

Venky Raju
Senior Staff Engineer
Samsung Telecommunications America




RE: XNI-based Xerces C++

2007-10-30 Thread Venky Raju
Thanks, Michael!

Venky

-----Original Message-----
From: Michael Glavassevich [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 30, 2007 1:36 PM
To: c-dev@xerces.apache.org
Subject: Re: XNI-based Xerces C++

Hi Venky,

Venky Raju [EMAIL PROTECTED] wrote on 10/30/2007 02:28:02 PM:

 Hi,

 Bringing back a question from 2004 - is there a C++ version of Xerces
 based on XNI?  From looking at the Xerces 2.8 code it does not appear
 that it is based on the XNI framework.

No, XNI only exists in Xerces-J.

 Thanks in advance,

 Venky Raju
 Senior Staff Engineer
 Samsung Telecommunications America


Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]





Re: xerces-c createdocs.bat and the BOM character

2007-10-30 Thread Boris Kolpackov
Scott Morgan [EMAIL PROTECTED] writes:

 http://unicode.org/faq/utf_bom.html#25

 a BOM can be used as a signature no matter how the Unicode text is
 transformed: UTF-16, UTF-8, UTF-7, etc. The exact bytes comprising the
 BOM will be whatever the Unicode character FEFF is converted into by
 that transformation format. In that form, the BOM serves to indicate
 both that it is a Unicode file, and which of the formats it is in.

I guess this illustrates a major difference between mathematics and
software. In math, if someone tells you that 2 + 2 = 5, you can
call BS without looking into any external sources. In software, if
someone tells you there is a byte order mark for an encoding which
by definition has no notion of byte order, you actually
need to check whether some standards body came up with such a
thing. I can't seem to learn the lesson ;-).


Boris

-- 
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
