I hope some of the gurus with programming experience on this list can offer additional insight, or pointers to resources, on the following (NOTE: I've already looked at Markus Kuhn's FAQ):
QUESTIONS:

(1) Is examining the LC_CTYPE environment variable a sufficient way of detecting the locale on UNIX-like systems?

(2) Are there any UTF-8-capable terminals available for OS X/Darwin, or should one just use an X-based terminal such as mlterm or xterm?

(3) Aside from xterm or mlterm running under Cygwin, are there other UTF-8-capable terminals available on Win32? Which ones are "the best"? I don't mind subjective answers about which terminals are "the best". For example, I would personally rank mlterm as much more capable than xterm, since it handles Arabic, Hebrew, and Indic scripts.

DETAILS:

I'm writing an interactive console-based program (i.e., one started from xterm, mlterm, or another terminal emulator) for UNIX-like environments (this includes Cygwin on Win32 and Mac OS X/Darwin, in addition to the obvious ones like Solaris and Linux). It will support just two "locales": the ASCII subset of UTF-8, and UTF-8. That's it! For UTF-8, the program will initially support plane 0 (the BMP); support beyond plane 0 probably won't ever be necessary.

My initial plan for detecting the current locale is that the program will, at startup, look at the LC_CTYPE environment variable. If that variable is defined and contains the substring "UTF-8" or a regex-matchable variant of it (like "utf8" on Linux), then everything is fine. If not, the program prints a warning suggesting that the user set a UTF-8 locale, along with an example of how to do that. If the locale is not set properly, the program still functions, but of course any UTF-8-encoded data will not be displayed properly on the terminal.

(Of course, even if the locale *is* set to a UTF-8 locale, that doesn't guarantee UTF-8 data will display properly, because (1) glyphs may still be missing from the fonts on the system, and (2) the terminal may not handle the script properly (e.g., when I last checked, xterm didn't handle Indic or RTL scripts).)
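The detection plan described above might be sketched in C roughly as follows. This is a minimal, untested sketch rather than a finished implementation, and it goes slightly beyond the plan: it follows the POSIX precedence rules (a non-empty LC_ALL overrides LC_CTYPE, and LANG is the fallback when neither is set), because a check of LC_CTYPE alone would miss users who set only LANG or LC_ALL. The function names `effective_ctype`, `looks_like_utf8`, and `check_utf8_locale`, and the `en_US.UTF-8` locale in the warning text, are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the locale name that governs LC_CTYPE, following POSIX
   precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG.
   Returns NULL if none of them is set to a non-empty value. */
const char *effective_ctype(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return NULL;
}

/* Case-insensitively look for "utf-8" or "utf8" in a locale name,
   so "en_US.UTF-8" and "en_US.utf8" both match. */
int looks_like_utf8(const char *locale)
{
    char lower[256];
    size_t i;

    if (locale == NULL)
        return 0;
    /* Lowercase a copy; real locale names are far shorter than the
       buffer, and anything longer is simply truncated. */
    for (i = 0; locale[i] != '\0' && i < sizeof lower - 1; i++)
        lower[i] = (char)tolower((unsigned char)locale[i]);
    lower[i] = '\0';
    return strstr(lower, "utf-8") != NULL || strstr(lower, "utf8") != NULL;
}

/* Hypothetical start-up check: print a warning (but keep running) when
   the locale does not look like UTF-8. Returns 1 for UTF-8, else 0. */
int check_utf8_locale(FILE *err)
{
    if (looks_like_utf8(effective_ctype()))
        return 1;
    fprintf(err,
            "warning: your locale does not appear to use UTF-8, so UTF-8\n"
            "text may not display correctly. In sh/bash, for example:\n"
            "    LC_CTYPE=en_US.UTF-8; export LC_CTYPE\n");
    return 0;
}
```

A program would call `check_utf8_locale(stderr)` once at startup and carry on either way, as the plan describes. Note that a purely name-based check like this cannot tell whether the named locale is actually installed on the system.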
If anybody sees limitations to this approach (actually, I'm hoping you will!), please let me know. The approach seems to work with xterm under Cygwin and with mlterm on Linux and OpenBSD; I haven't gotten around to testing it on Solaris yet. There may, for example, be much better ways to do this on Cygwin/Win32 that I don't know about. Also, I have no idea how to do it on OS X.
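One commonly recommended alternative to matching locale names by hand is to let the C library resolve the locale with `setlocale()` and then ask it for the encoding name with `nl_langinfo(CODESET)`. The sketch below assumes a system that provides `<langinfo.h>` (glibc, Solaris, and the BSDs do; support on any particular Cygwin or OS X release is an assumption worth checking), and `locale_is_utf8` is a hypothetical name.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Let the C library resolve the locale (setlocale applies the
   LC_ALL / LC_CTYPE / LANG precedence itself), then ask which
   encoding it uses. Returns 1 if the encoding is UTF-8, 0 otherwise. */
int locale_is_utf8(void)
{
    const char *codeset;

    /* "" means "take the locale from the environment"; a NULL return
       means the requested locale is not available on this system. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;

    /* CODESET names the encoding of the current LC_CTYPE locale.
       glibc reports "UTF-8"; if another system spells it differently,
       a more tolerant, case-insensitive comparison may be needed. */
    codeset = nl_langinfo(CODESET);
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}
```

One design consequence: `setlocale(LC_CTYPE, "")` changes the program's own character-type locale, which a UTF-8-aware program usually wants anyway, since functions such as `mbstowcs()` depend on it. And because `setlocale` fails when the requested locale isn't installed, this version also catches a misconfiguration that inspecting environment variables alone cannot detect.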

