Byte Order Marks
What is the default character set for openEHR ADL?
ASCII? UTF-8?
I ask because ADL itself does not declare a character set anywhere, & we
have had a number of ADL files which have failed (either to be opened or
to be transformed into XML) & on each occasion the reason has been the
presence of a byte order mark (hex bytes EF BB BF), e.g.
    Exception in thread "main" se.acode.openehr.parser.TokenMgrError: Lexical error at line 1, column 1. Encountered: "\u00ef" (239), after : ""
        at se.acode.openehr.parser.ADLParserTokenManager.getNextToken(ADLParserTokenManager.java:27554)
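The lexer reports "\u00ef" (239) because it sees the first BOM byte, 0xEF, as an ordinary character. That value suggests the stream was decoded with a single-byte charset such as ISO-8859-1; decoded as UTF-8, the BOM would arrive as U+FEFF instead. Until the parsers handle this themselves, one workaround is to drop the BOM before the text reaches the lexer. This is a minimal sketch, not part of any openEHR tool. AdlInput and its methods are hypothetical names. It assumes the files are UTF-8 and that the parser can be fed a java.io.Reader (JavaCC-generated parsers usually offer such a constructor, but check the version in use).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the openEHR Java parser.
final class AdlInput {
    // U+FEFF is what the three bytes EF BB BF decode to in UTF-8.
    private static final char BOM = '\uFEFF';

    // Opens a file as UTF-8 and skips a leading BOM if there is one.
    static Reader openSkippingBom(Path path) throws IOException {
        BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        reader.mark(1);                // remember the start of the stream
        if (reader.read() != BOM) {    // no BOM (or an empty file)
            reader.reset();            // rewind so the first character is not lost
        }
        return reader;
    }

    // The same idea for text that is already in memory.
    static String stripBom(String text) {
        return !text.isEmpty() && text.charAt(0) == BOM ? text.substring(1) : text;
    }
}
```

The Reader returned by openSkippingBom would then be handed to the parser in place of the raw file.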
Equally, if a text editor opens an ADL file, assumes UTF-8 & adds a BOM,
then the Archetype Editor dies, in addition to the Java ADL parser & the
Windows ADL2XML converter.
Our standard for XML is UTF-8, & I am wondering: if the standard for ADL
is ASCII, then how does (or will) ADL support extended character sets?
For example, one of our ADL files contains some Dutch, including
"Een patiënt in rolstoel moet zonder hulp met hoeken en deuren kunnen
omgaan" ("A patient in a wheelchair must be able to cope with corners and
doors without help"), where "patiënt" is mis-rendered (though, given that
mail clients are pretty good, it will probably render correctly here).
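It isn't clear from the report which tool damaged the text. One common way a "?" appears is a save step that encodes into a charset with no "ë". Java's String.getBytes(Charset) silently replaces unmappable characters with the charset's replacement bytes, which for US-ASCII is "?". A minimal illustration (ManglingDemo is a made-up name):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how a lossy save/load cycle turns the e-diaeresis into '?'.
final class ManglingDemo {
    // Encodes text in the given charset and decodes it back, as a save then load would.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable chars become the replacement bytes
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String dutch = "Een pati\u00EBnt in rolstoel";
        System.out.println(roundTrip(dutch, StandardCharsets.US_ASCII));            // Een pati?nt in rolstoel
        System.out.println(roundTrip(dutch, StandardCharsets.UTF_8).equals(dutch)); // true
    }
}
```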
With other tooling we have always had problems with people pasting in
content from other tools (mostly Word) that used a non-UTF-8 code set.
Equally, we have a fair amount of existing content that might also be
pasted into ADL files via the Archetype Editor.
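Text pasted from Word is often in a Windows code page rather than UTF-8. Take windows-1252 as an illustrative assumption: there "ë" is the single byte 0xEB, which is not a valid UTF-8 sequence. A strict CharsetDecoder can locate such bytes before they turn into garbage further down the chain. The class and method names below are illustrative only. A BOM is itself valid UTF-8, so this check does not flag it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative check for bytes that are not valid UTF-8 (e.g. pasted windows-1252 text).
final class Utf8Check {
    // Returns the byte offset of the first invalid UTF-8 sequence, or -1 if there is none.
    static int firstInvalidUtf8Offset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = decoder.decode(in, out, true);
        // On an error the decoder stops with the input positioned at the bad bytes.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // windows-1252 bytes for the Dutch word, with 0xEB for the e-diaeresis
        byte[] cp1252 = {'p', 'a', 't', 'i', (byte) 0xEB, 'n', 't'};
        System.out.println(firstInvalidUtf8Offset(cp1252)); // 4
    }
}
```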
How does the Archetype Editor deal with non-ASCII characters when they
are pasted in?
Is there a possible loss of fidelity when converting between ADL (in
ASCII) and XML (in UTF-8)?
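Only the tool authors can say what the ADL-to-XML path does with non-ASCII input. It is easy, though, to list which characters in a given file would be at risk if some step assumes ASCII. The rough sketch below uses a hypothetical helper that reports each non-ASCII code point with its line and column. Columns are counted in code points, and a leading BOM shows up as U+FEFF at 1:1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists characters that an ASCII-only step could not represent.
final class NonAsciiReport {
    // Returns entries of the form "line:column U+XXXX", both 1-based.
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // handles surrogate pairs as one character
            if (cp == '\n') {
                line++;
                column = 1;
            } else {
                if (cp > 0x7F) {                    // outside 7-bit ASCII
                    hits.add(line + ":" + column + " U+" + String.format("%04X", cp));
                }
                column++;
            }
            i += Character.charCount(cp);
        }
        return hits;
    }
}
```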
So...
A) In general, what are the standards for character sets & ADL?
B) The various parsers and tools should not blow up when they run into a
standard byte order mark.
C) Would saving ADL in a UTF encoding without a BOM (e.g. UTF-16LE) be OK?
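On C): the UTF-16LE encoding scheme does not use a BOM. However, it is not ASCII-compatible, because every ASCII character becomes two bytes, one of them zero. A lexer that expects ASCII or UTF-8 would trip over that just as it does over EF BB BF. UTF-8 without a BOM, by contrast, leaves pure-ASCII text byte-for-byte unchanged. A small hex dump (illustrative helper) makes the difference visible. Note that Java's "UTF-16" charset writes a big-endian BOM when encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper: shows the bytes a string turns into under a given charset.
final class EncodingBytes {
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // unsigned two-digit hex
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("adl", StandardCharsets.UTF_8));    // 61 64 6C
        System.out.println(hex("adl", StandardCharsets.UTF_16LE)); // 61 00 64 00 6C 00
        System.out.println(hex("adl", StandardCharsets.UTF_16));   // FE FF 00 61 00 64 00 6C
    }
}
```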
Right now I can simply clean up each ADL file by running it through a
CharsetEncoder/Decoder; however, I would prefer this to be fixed at source.
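A minimal sketch of that CharsetEncoder/Decoder clean-up might look like the following. It assumes the source encoding of each file is known, for example UTF-8 with a BOM, or windows-1252 for pasted Word content. It drops a leading U+FEFF and writes UTF-8 without a BOM. CodingErrorAction.REPORT makes a wrong guess about the source encoding fail loudly instead of quietly producing "?". AdlCleaner is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical clean-up tool: re-encodes an ADL file as UTF-8 without a BOM.
final class AdlCleaner {
    static byte[] toUtf8WithoutBom(byte[] data, Charset source) throws CharacterCodingException {
        // Strict decode: malformed or unmappable input throws instead of becoming '?'.
        CharBuffer chars = source.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
        // Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF; skip it.
        if (chars.length() > 0 && chars.charAt(0) == '\uFEFF') {
            chars.position(chars.position() + 1);
        }
        ByteBuffer out = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    // Usage: java AdlCleaner.java <file.adl> <source-charset>, e.g. windows-1252
    public static void main(String[] args) throws IOException {
        Path path = Path.of(args[0]);
        byte[] cleaned = toUtf8WithoutBom(Files.readAllBytes(path), Charset.forName(args[1]));
        Files.write(path, cleaned); // overwrites in place, so keep a backup
    }
}
```

If the source encoding is not known per file, Utf8Check-style validation could decide between UTF-8 and a fallback, but that choice is a guess and is best made per content source.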
Adam
***********************************************************************
This message may contain confidential and privileged information.
If you are not the intended recipient you should not disclose, copy
or distribute information in this e-mail or take any action in reliance
on its contents. To do so is strictly prohibited and may be unlawful.
Please inform the sender that this message has gone astray before
deleting it. Thank you.
2008 marks the 60th anniversary of the NHS. It's an opportunity to pay
tribute to the NHS staff and volunteers who help shape the service, and
celebrate their achievements.
If you work for the NHS and would like an NHSmail email account, go
to: www.connectingforhealth.nhs.uk/nhsmail
***********************************************************************