At 08:13 PM 3/1/00 , Michael Moore wrote:
>I don't think I can just look for the string "<?XML" for two reasons.
>First, the XML document might start with comments and second, the XML
>document might start with ".<.?.X.M.L" (where . = null) if using UCS-2
>encoding. Furthermore I am told that the REAL first character might be
>a byte-order marker like (0xFEFF). I suppose that given a different
>byte order I might actually be looking at "<?.X.M.L." as my first
>characters.

This is the "normal" approach: you analyse the stream comparing it with the
various possibilities until you hit a "match". Yes, when you factor in the
different byte orders, it means there are several options to look for. Note
that byte-order is normally handled by the I/O library (provided it's
Unicode aware) so it's normally not a problem.
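
When the I/O library does not do this for you, the matching can be sketched
along the following lines. This is only an illustration in Python, not a
complete detector: the function name sniff_xml_encoding and the labels it
returns are made up for this example, the byte patterns cover only UTF-8 and
the two 16-bit byte orders (UTF-32 and EBCDIC are left out), and a document
with no XML declaration and no byte-order mark comes back as "unknown".

```python
import codecs

# Patterns with a byte-order mark come first, then the patterns for a bare
# XML declaration. In XML 1.0 the declaration is spelled in lowercase: <?xml
_PATTERNS = [
    (codecs.BOM_UTF8, "utf-8 (BOM)"),
    (codecs.BOM_UTF16_BE, "utf-16-be (BOM)"),
    (codecs.BOM_UTF16_LE, "utf-16-le (BOM)"),
    (b"\x00<\x00?", "utf-16-be"),       # ".<.?" where . = null
    (b"<\x00?\x00", "utf-16-le"),       # "<.?." where . = null
    (b"<?xm", "ascii-compatible"),      # UTF-8, ASCII, ISO-8859-x, ...
]


def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding family from the first few bytes of a stream."""
    for prefix, label in _PATTERNS:
        if head.startswith(prefix):
            return label
    return "unknown"


if __name__ == "__main__":
    print(sniff_xml_encoding(b"<?xml version='1.0'?><order/>"))
    print(sniff_xml_encoding("<?xml version='1.0'?>".encode("utf-16-be")))
    print(sniff_xml_encoding(b"ISA*00*          *00*"))
```

Once the family is known, the encoding declaration inside the XML declaration
can be read to find the exact encoding.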

It appears that your system does not have a Unicode I/O library. I'd
suggest you double-check for this (if you have a standard C compiler, its
I/O library should be Unicode-aware).

However, I find what you are trying to do dangerous. Think of what will
happen when you add a new format! You will need to look for EDI, XML and
something else... You should investigate using a multi-file protocol similar
to MIME. Such a protocol would take care of separating the various files
from one another and of labelling each one with its type.
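
To give an idea of what a MIME-like envelope buys you, here is a rough sketch
using the email package from the Python standard library. The function names
make_bundle and split_bundle, the file names and the sample payloads are
assumptions made for this example; a real system would pick its own content
types and transport.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def make_bundle(edi: bytes, xml: bytes) -> bytes:
    """Pack an EDI file and an XML file into one multipart/mixed message."""
    bundle = EmailMessage()
    # Each part carries its own Content-Type, so the receiver never has to
    # guess the format from the first bytes of the payload.
    bundle.add_attachment(edi, maintype="application", subtype="EDI-X12",
                          filename="order.edi")
    bundle.add_attachment(xml, maintype="application", subtype="xml",
                          filename="order.xml")
    return bundle.as_bytes()


def split_bundle(raw: bytes) -> list:
    """Return (content type, payload bytes) for every part in the bundle."""
    msg = message_from_bytes(raw, policy=policy.default)
    return [(part.get_content_type(), part.get_payload(decode=True))
            for part in msg.iter_parts()]


if __name__ == "__main__":
    raw = make_bundle(b"ISA*00*sample", b"<?xml version='1.0'?><order/>")
    for content_type, payload in split_bundle(raw):
        print(content_type, len(payload))
```

Adding a third format then means adding a new content type, not a new set of
byte patterns to sniff for.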

>A fellow programmer wants to convert the XML to ASCII with some sort of
>shift-in/Shift-out escape byte to indicate when multibyte characters
>are in the stream. (The object being that all down stream process can
>then process the file as though it were ASCII and not be concerned with
>UCS-2 formatting.) But I don't like this idea because I can't believe
>that there is not some standard way to handle this sort of thing. I
>hate to write some home grown solution when there might be an industry
>standard that will do the job.

It's not a bad idea provided you stick to standard encodings. Unicode
(www.unicode.org) defines encodings, such as UTF-8, that leave ASCII
characters unchanged and mark every other character unambiguously, so there
is no need for a home-grown shift-in/shift-out scheme.
An alternative would be to rely on character entities (for example &#233;
for é) to encode the odd non-ASCII character in the XML stream.
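
Both options can be tried quickly in Python. This is only a sketch: the
function name to_ascii_xml is invented for the example, and the
character-reference trick applies to text and attribute values only, not to
element or attribute names.

```python
import xml.etree.ElementTree as ET


def to_ascii_xml(text: str) -> bytes:
    """Encode text as pure ASCII, using numeric character references
    (&#NNN;) for every character outside the ASCII range."""
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    name = "Benoît"

    # Option 1: UTF-8. ASCII letters stay single bytes; î becomes two bytes
    # that both have the high bit set.
    print(name.encode("utf-8"))

    # Option 2: character references. The result is plain 7-bit ASCII.
    ascii_doc = b"<name>" + to_ascii_xml(name) + b"</name>"
    print(ascii_doc)

    # An XML parser turns the reference back into the original character.
    print(ET.fromstring(ascii_doc).text == name)
```

Here to_ascii_xml("Benoît") gives b"Beno&#238;t", which any downstream
process can treat as ASCII.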

Hope it helps

--ben
Benoît Marchal, Pineapplesoft

As e-commerce Grows, Understanding XML Becomes a Key Job Skill
XML by Example / $24.99 / ISBN 0-7897-2242-9 / www.worth-it.com
