2013-11-04 21:00, Jennifer Wong wrote:
> The use case is that customers want to integrate data from our enterprise solution to their ASCII-based downstream systems.
This is very different from the question about removing accents while conforming to language standards. The very goal makes it impossible to conform to language standards. The next question should be what the data will be used for, and how.
> Thus all accents need to be removed.
I would not jump to that conclusion. Just because some system is ASCII-based does not mean that you cannot in any way handle non-ASCII data. You can encode non-ASCII characters in ASCII in many ways. To take a trivial example, you could convert È to E` and later possibly convert it back, though in such approaches you need to be careful to make the conversion reversible (if it needs to be). In some cases, out-of-band information could be included, e.g. entering a name in a simplified form in ASCII but accompanied by a note (in ASCII) describing accents that have been omitted.
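To make the reversibility point concrete, here is a minimal sketch in Python of one possible scheme of this kind. Everything in it is an assumption made for illustration: the names to_ascii and from_ascii, the choice of ASCII characters for the accents, the backslash as escape character, and the \u{...} fallback for characters that have no accent decomposition. The important detail is that ASCII characters reused as accent marks must be escaped when they occur literally in the data; otherwise the conversion cannot be undone.

```python
import unicodedata

# Hypothetical notation: each combining accent becomes one ASCII character
# written after the base letter (so È becomes E`).
MARKS = {
    "\u0300": "`",   # combining grave accent
    "\u0301": "'",   # combining acute accent
    "\u0302": "^",   # combining circumflex
    "\u0303": "~",   # combining tilde
    "\u0308": '"',   # combining diaeresis
}
UNMARKS = {ascii_mark: accent for accent, ascii_mark in MARKS.items()}
ESCAPE = "\\"
# ASCII characters that have a special meaning in the encoded form.
RESERVED = set(UNMARKS) | {ESCAPE}

def to_ascii(text):
    """Encode text as pure ASCII without losing information."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in MARKS:
            out.append(MARKS[ch])            # accent -> ASCII mark
        elif ch in RESERVED:
            out.append(ESCAPE + ch)          # literal ` ' ^ ~ " or backslash
        elif ord(ch) < 128:
            out.append(ch)                   # ordinary ASCII
        else:
            out.append("%su{%x}" % (ESCAPE, ord(ch)))  # anything else
    return "".join(out)

def from_ascii(data):
    """Invert to_ascii; the result is in normalization form NFC."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == ESCAPE:
            if data[i + 1] == "u":           # \u{hex} fallback
                end = data.index("}", i)
                out.append(chr(int(data[i + 3:end], 16)))
                i = end + 1
            else:                            # escaped reserved character
                out.append(data[i + 1])
                i += 2
        elif ch in UNMARKS:
            out.append(UNMARKS[ch])          # ASCII mark -> accent
            i += 1
        else:
            out.append(ch)
            i += 1
    return unicodedata.normalize("NFC", "".join(out))
```

With this sketch, to_ascii("È") gives E`, and from_ascii(to_ascii(s)) returns s in normalization form NFC, so the round trip is exact only up to canonical equivalence. Whether that is acceptable depends, again, on how the data will be used.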
Even if it is acceptable to do lossy mappings (like just dropping all accents, or mapping, say, Ä to AE without worrying about possible AE in original data), the crucial question is how the data will be used, now and in the future.
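For comparison, a lossy mapping might look like the following sketch in Python. The function names strip_accents and fold and the table GERMAN_STYLE are invented here for illustration; the approach (decompose, then drop nonspacing marks) is one common way to do it, not the only one.

```python
import unicodedata

def strip_accents(text):
    """Drop accents by decomposing (NFD) and removing nonspacing marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")

# Hypothetical replacement table for mappings such as Ä -> AE.
GERMAN_STYLE = {"\u00c4": "AE", "\u00d6": "OE", "\u00dc": "UE"}

def fold(text, table):
    """Apply table replacements first, then strip remaining accents."""
    text = unicodedata.normalize("NFC", text)  # so precomposed keys match
    for source, replacement in table.items():
        text = text.replace(source, replacement)
    return strip_accents(text)
```

Two things are worth noticing. First, fold("Ärger", GERMAN_STYLE) and fold("AErger", GERMAN_STYLE) produce the same string, so the original can no longer be recovered. Second, strip_accents leaves letters like ø and ß unchanged, because they have no decomposition into a base letter and an accent; the output is therefore not guaranteed to be ASCII without further rules.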
Yucca

