Byte Order Marks

What is the default character set for OpenEHR ADL?

ASCII? UTF-8?

I ask because ADL itself does not declare a character set anywhere, and we
have had a number of ADL files that failed either to open or to be
transformed into XML. On each occasion the cause was the presence of a
byte order mark (hex bytes EF BB BF), e.g.:

Exception in thread "main" se.acode.openehr.parser.TokenMgrError: 
Lexical error at line 1, column 1.  Encountered: "\u00ef" (239), after : ""
    at se.acode.openehr.parser.ADLParserTokenManager.getNextToken(ADLParserTokenManager.java:27554)
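The bytes EF BB BF are U+FEFF (the byte order mark) encoded as UTF-8. The illustrative sketch below assumes the parser decoded the file with a single-byte charset such as ISO-8859-1 or a Windows default code page; under that assumption the first byte becomes the character U+00EF, which matches the value 239 in the error. The class name BomDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: shows where the 239 in the lexer error can come from.
final class BomDemo {
    // The UTF-8 byte order mark: U+FEFF encoded as UTF-8.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    public static void main(String[] args) {
        // Encoding the single character U+FEFF as UTF-8 yields exactly EF BB BF.
        byte[] encoded = String.valueOf((char) 0xFEFF).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(encoded, UTF8_BOM)); // true

        // Assumption: the parser read the bytes with a single-byte charset.
        // Each byte then becomes one char, and the first one is U+00EF (239).
        String misread = new String(UTF8_BOM, StandardCharsets.ISO_8859_1);
        System.out.println((int) misread.charAt(0)); // 239
    }
}
```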


Equally, if a text editor opens an ADL file, assumes UTF-8 and adds a BOM,
then the Archetype Editor fails, as do the Java ADL parser and the
Windows ADL2XML converter.
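A minimal sketch of how a Java tool could tolerate a UTF-8 BOM: wrap the input in a PushbackInputStream, consume EF BB BF if present, and otherwise push the bytes back unchanged. The names BomSkipping, skipUtf8Bom and openAdl are hypothetical and not part of the se.acode.openehr.parser API; readNBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any existing ADL parser API.
final class BomSkipping {
    // Returns a stream yielding the same bytes as `in`, minus a leading
    // UTF-8 BOM (EF BB BF) if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pb.readNBytes(head, 0, 3); // Java 9+; returns 0..3
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: push the bytes back
        }
        return pb;
    }

    // Decodes the BOM-free stream as UTF-8 for a hypothetical Reader-based lexer.
    static Reader openAdl(InputStream raw) throws IOException {
        return new InputStreamReader(skipUtf8Bom(raw), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'r', 'c', 'h'};
        try (Reader r = openAdl(new ByteArrayInputStream(withBom))) {
            System.out.println((char) r.read()); // a
        }
    }
}
```

Files without a BOM, and files shorter than three bytes, pass through untouched.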

  
Our standard for XML is UTF-8, and I am wondering: if the standard for ADL
is ASCII, how does (or will) ADL support extended character sets?

For example, one of our ADL files contains some Dutch, including:

"Een patiënt in rolstoel moet zonder hulp met hoeken en deuren kunnen
omgaan" ("A patient in a wheelchair must be able to manage corners and
doors without help"), where "patiënt" is mis-rendered (though, given
that mail clients are pretty good, it will probably render correctly here).
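One way a "?" like this can appear (an assumption; we have not traced the actual tool): Java's String.getBytes replaces any character the target charset cannot represent with that charset's replacement byte, which for US-ASCII is '?'. In UTF-8 the e-diaeresis is two bytes (C3 AB), so nothing is lost. AsciiLossDemo and viaAscii are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
final class AsciiLossDemo {
    // Round-trips text through US-ASCII. String.getBytes(Charset) replaces
    // every character US-ASCII cannot represent with the charset's default
    // replacement byte, which for US-ASCII is '?'.
    static String viaAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String word = "pati\u00ebnt"; // "pati" + e-diaeresis + "nt"
        System.out.println(viaAscii(word)); // pati?nt
        // In UTF-8 the e-diaeresis takes two bytes: 7 chars, 8 bytes.
        System.out.println(word.getBytes(StandardCharsets.UTF_8).length); // 8
    }
}
```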

With other tooling we have always had problems with people pasting in
content from other tools (mostly Word) that used a non-UTF-8 character set.
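Legacy single-byte content usually encodes "ë" as the byte EB (ISO-8859-1 and Windows-1252 agree here), which is malformed UTF-8 when followed by an ordinary ASCII letter. A sketch of a strict check, assuming the tool has the raw bytes: a CharsetDecoder set to CodingErrorAction.REPORT throws rather than silently substituting U+FFFD. Utf8Check and isValidUtf8 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper.
final class Utf8Check {
    // Returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // throw, don't replace
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // e-diaeresis as UTF-8 (C3 AB) versus as a single legacy byte (EB).
        byte[] utf8 = {'p', 'a', 't', 'i', (byte) 0xC3, (byte) 0xAB, 'n', 't'};
        byte[] legacy = {'p', 'a', 't', 'i', (byte) 0xEB, 'n', 't'};
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(legacy)); // false
    }
}
```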
Equally, we have a fair amount of existing content that might also be
pasted into ADL files via the Archetype Editor.
How does the Archetype Editor handle non-ASCII characters when they are
pasted in?
 
Is there a possible loss of fidelity when converting between ADL (in
ASCII) and XML (in UTF-8)?
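A converter could detect potential loss before writing, for instance with CharsetEncoder.canEncode, which reports whether every character in a string is representable in the target charset. A sketch under that assumption; FidelityCheck and canRepresent are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper.
final class FidelityCheck {
    // True if every character in text can be represented in the target
    // charset, i.e. converting to it would not need a replacement character.
    static boolean canRepresent(String text, Charset target) {
        // A fresh encoder per call: canEncode must not be called mid-encode.
        return target.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String word = "pati\u00ebnt";
        System.out.println(canRepresent(word, StandardCharsets.US_ASCII)); // false
        System.out.println(canRepresent(word, StandardCharsets.UTF_8));    // true
    }
}
```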
 
So...

A) In general, what are the standards for character sets in ADL?
B) The various parsers and tools should not fail when they encounter a
   standard byte order mark.
C) Would saving ADL files in a UTF encoding without a BOM (e.g. UTF-16LE)
   be OK?
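For comparison on question C, the bytes Java produces for the ASCII text "ADL" in three encodings. Java's UTF_16LE encoder writes no BOM, but every ASCII character becomes two bytes, one of them 00, so a parser expecting ASCII-compatible input would presumably still fail; UTF-8 without a BOM is byte-identical to ASCII for ASCII-only text. Java's plain UTF_16 encoder writes a big-endian BOM (FE FF). EncodingBytes and hex are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
final class EncodingBytes {
    // Hex dump of text encoded in the given charset, e.g. "41 44 4C".
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("ADL", StandardCharsets.UTF_8));    // 41 44 4C
        System.out.println(hex("ADL", StandardCharsets.UTF_16LE)); // 41 00 44 00 4C 00
        System.out.println(hex("ADL", StandardCharsets.UTF_16));   // FE FF 00 41 00 44 00 4C
    }
}
```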
 
 
Right now I can clean up each ADL file by running it through a
CharsetDecoder/CharsetEncoder; however, I would prefer this to be fixed at
source.
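A sketch of that clean-up step, assuming the source charset of each file is known (ISO-8859-1 is used below only because every Java runtime guarantees it; Word-pasted content may well be Windows-1252). It decodes strictly, drops a leading U+FEFF (Java's UTF-8 decoder does not strip a BOM itself), and re-encodes as UTF-8, for which Java's encoder writes no BOM. AdlCleanup and normalise are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical clean-up helper.
final class AdlCleanup {
    // Strictly decodes raw bytes from `source`, drops a leading U+FEFF,
    // and re-encodes the rest as UTF-8 without a BOM.
    static byte[] normalise(byte[] raw, Charset source) throws CharacterCodingException {
        CharBuffer chars = source.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw));
        if (chars.length() > 0 && chars.charAt(0) == 0xFEFF) {
            chars.position(chars.position() + 1); // skip the BOM character
        }
        ByteBuffer out = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // A UTF-8 file with a BOM: EF BB BF followed by "adl".
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'd', 'l'};
        System.out.println(normalise(withBom, StandardCharsets.UTF_8).length); // 3

        // Legacy single-byte content: "pati" + EB + "nt" in ISO-8859-1.
        byte[] legacy = {'p', 'a', 't', 'i', (byte) 0xEB, 'n', 't'};
        System.out.println(normalise(legacy, StandardCharsets.ISO_8859_1).length); // 8
    }
}
```

Because decoding uses REPORT, a file whose declared charset is wrong fails loudly instead of being silently corrupted.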
 
Adam
