H.S. wrote:

Yes, but only if the unicode (utf8 or utf16 or others) text support has
been polished and verified that it works in the OSes in their languages.

I would say that goes without saying. But people were computing in those languages, except for Gujarati, long before Unicode.

Having said this, the reality is the operating systems have been
predominantly ascii based. Their source code is predominantly ascii
based (if you are a programmer, how many programs have you made which
support unicode?). It is just not realistic to simply ignore that.

Well, I suppose you could claim that even Unicode systems are ASCII based, since Unicode contains all the ASCII characters in the right order. But in fact even the earliest PCs already contained a full 256 characters in each character set. So did the earliest Macs. Pure 7-bit ASCII was already effectively dead by then, and the Web killed it off almost entirely.

It is true that source code for traditional computer languages is traditionally ASCII based. But what has that to do with file names that don’t contain computer code in those languages?

Secondly, for a person who has *absolutely* no idea about any English
characters and want to input strings to the operating system safely, I
would suggest s/he not use any OS which is predominantly ASCII based. It
would be better to use an OS created in his/her language.

So they aren’t supposed to use Linux, Macintosh, Windows, or Unix Solaris or Novell Unix? What should they use? Even in Japan those are the predominant operating systems. What should they use instead in Iraq and Iran?

In any case, if someone is programming, say in Japanese, on some system using a legacy character set, then normally they will be creating files in this set, usually without difficulty, unless perhaps they are using Mojikyo or perhaps GT Code. But Mojikyo and GT Code are not normal character sets and aren’t supported by most computers in use in Japan.

In my own view, unicode support is coming along quite well. But it is
just not there yet so that I can recommend a Gujrati user to change
his/her locale and simply forget all English s/he knows. Some European
languages are much better supported, though.

I never recommended this. That was my extreme example of an unusual script. However, the language has a Wikipedia outlet at <http://gu.wiktionary.org/wiki/%E0%AA%AE%E0%AB%81%E0%AA%96%E0%AA%AA%E0%AB%83%E0%AA%B7%E0%AB%8D%E0%AA%A0> and several newspapers online. Obviously the language does work under Unicode well enough to allow this. If you suggest that those using Gujarati should create files (but not file names) in the Gujarati language, I would say that was bad advice. The problem characters are not Gujarati characters at all, but some of the basic ASCII characters.

People who wish to do so can create Gujarati files with Gujarati file names right now, without difficulty, and transfer them to another Unicode system without a problem, as long as they avoid those same ASCII characters in the names. Even then, they will probably have no problems if they stick to Linux.

But if you translate the recommendation I made in my earlier post to a
different language than English, it still holds!

It may, because the dangerous characters are all ASCII characters and are therefore likely to be easily available in other languages, even those that don’t use Latin characters, as a holdover from earlier times and their use in programming languages.

Please don't get me wrong, I know where you are coming from, but it is
wishful thinking that ignoring the underlying realities of basic
computer usage will not result in any problems.

I never said that. No-one here has said that. But it is not wishful thinking to point out that a large number of minority languages have their own websites these days and that hardly anyone is having any problems with such characters as “þ” and “æ” and “ŵ”.

Here is where I will repeat that users should first get comfortable
with the limitations of their computer instead of continuing to use it
in blissful ignorance. Some of the special characters should be avoided,
no matter what language is being used.

Yes: avoid / ? < > \ : * | " ' ^ # , ~ ` space, period except immediately before file extension, and any control character. That’s it. Not all of the above are illegal on every system, but each may be illegal on some operating systems or file systems, or in some utilities that may process files. Also, avoid producing file names which differ only in casing.

That’s all. But in fact a file name like “$1.00.txt” will work fine on almost any system today (whether you include the quotes or don’t include them).
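
For anyone who wants to check names mechanically, here is a rough Python sketch of the list above. The function names and the exact handling of the period are my own assumptions, not any standard: I treat a period as acceptable only when it is the single period in the name and not the last character. The case check uses Python's casefold(), which only approximates what a real case-insensitive file system does.

```python
import unicodedata

# The ASCII characters named in the list above, including the space.
RISKY_CHARS = set('/?<>\\:*|"\'^#,~` ')


def risky_characters(name):
    """Return a sorted list of the characters in `name` that the list advises avoiding."""
    found = set()
    for ch in name:
        # Category "Cc" covers the control characters.
        if ch in RISKY_CHARS or unicodedata.category(ch) == "Cc":
            found.add(ch)
    # Assumed reading of the period rule: allow one period only,
    # and not as the final character of the name.
    if name.count(".") > 1 or name.endswith("."):
        found.add(".")
    return sorted(found)


def case_collisions(names):
    """Return groups of names that differ only in casing."""
    groups = {}
    for n in names:
        groups.setdefault(n.casefold(), []).append(n)
    return [g for g in groups.values() if len(g) > 1]
```

Under these assumptions a Gujarati name such as મુખપૃષ્ઠ.txt comes back clean, while $1.00.txt is flagged only for its extra period, which matches the point that the list is cautious rather than a statement of what will actually fail.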

In any case, I see what you mean by the application throwing a useful
warning. I am just trying to point out that there are all kinds of users
out there and they may not even understand what a particular warning is
about if they are completely computer-illiterate. The underlying problem
is making people understand that they need to get some basic education
about computer usage.

I agree. And it is a reasonably good suggestion to generally avoid common ASCII symbol/punctuation characters as these are the characters among which the problem characters are to be found.

Jim Allan

