Re: UTF-8 vs. current locale charset mess...

2000-08-11 Thread Nick Lamb

On Thu, Aug 10, 2000 at 09:24:42PM +0300, Tor Lillqvist wrote:
 GTK+ 1.3 (and 2.0) uses UTF-8 internally, while the file system
 related C runtime calls like stat(), open() and opendir() uses a
 "current codepage" (the Windows term, on Unix you want to use whatever
 encoding/charset the user's locale uses).

For Linux at least the filesystems speak UTF8. I don't see a problem
(well, OK as usual Windows doesn't work, but Tor you're used to hacking
around that without needing to consult us) how about *BSD, Solaris etc?

Nick.



Re: UTF-8 vs. current locale charset mess...

2000-08-11 Thread Marc Lehmann

On Fri, Aug 11, 2000 at 12:50:12PM +0100, Nick Lamb wrote:
 For Linux at least the filesystems speak UTF8.

While this is the proposed standard, there exist about zero systems in
practice that follow it, and the kernel neither checks nor enforces it.

 around that without needing to consult us) how about *BSD, Solaris etc?

"unix", in general, only supports characters from the portable filename
character set, so "in theory" there is no problem at all, as characters
> 127 do not exist in that set.

So there is no way around supporting native character sets.

-- 
Marc Lehmann
The choice of a GNU generation



Re: UTF-8 vs. current locale charset mess...

2000-08-11 Thread Tor Lillqvist

Marc Lehmann writes:
  "unix", in general, only supports characters from the portable filename
  character set, so "in theory" there is no problem at all, as characters
  > 127 do not exist in that set.

True, but in real life, I would assume most Unix systems are quite
happy with using any bytes in path names except '/' and '\0'. It's
then up to the site-specific or user-specific locale what charset
these are interpreted to be in.
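
To illustrate the point that the bytes alone do not say which charset
they are in, here is a minimal sketch of a check for whether a path
name happens to be well-formed UTF-8. The function name
name_is_valid_utf8() is hypothetical and not part of GTK+ or GIMP; a
name that fails the check is presumably in some other locale charset,
but a name that passes is not guaranteed to have been meant as UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the NUL-terminated byte string
 * is well-formed UTF-8 (no overlong forms, no surrogates, nothing
 * above U+10FFFF). Plain ASCII names always pass. */
bool
name_is_valid_utf8 (const char *name)
{
  const unsigned char *p = (const unsigned char *) name;

  while (*p)
    {
      unsigned int c = *p++;
      unsigned int min;
      int extra;

      if (c < 0x80)
        continue;                       /* ASCII byte */
      else if ((c & 0xE0) == 0xC0)
        { extra = 1; min = 0x80;    c &= 0x1F; }
      else if ((c & 0xF0) == 0xE0)
        { extra = 2; min = 0x800;   c &= 0x0F; }
      else if ((c & 0xF8) == 0xF0)
        { extra = 3; min = 0x10000; c &= 0x07; }
      else
        return false;                   /* stray continuation or bad lead byte */

      while (extra-- > 0)
        {
          if ((*p & 0xC0) != 0x80)
            return false;               /* truncated sequence (also stops at NUL) */
          c = (c << 6) | (*p++ & 0x3F);
        }

      if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return false;                   /* overlong, out of range, or surrogate */
    }

  return true;
}
```

For example, the ISO-8859-1 bytes "caf\xE9" fail this check, while the
UTF-8 bytes "caf\xC3\xA9" for the same visible name pass it.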

I would also guess that very few Unix installations actually use UTF-8
locales now and in the immediate future. However, GTK+ 2.0 will use
UTF-8. (And GTK+ 1.3 as used on Windows uses it already.) It's good to
start thinking a bit on the implications now.

What I was looking for with my message was mostly an okay (or strong
opposition) to adding code to GIMP at this point to convert back and
forth between the file system charset and UTF-8. (As I said, said code
would expand to semantically no-op g_strdup() calls on GTK+ 1.2.x, and
thus would mostly be just a cosmetic issue.)

One decision would be good to make now: Are the file names passed
around in PDB calls in UTF-8 or in the "file system" charset (the
current locale's charset)? Both approaches have pros and cons:

- pass around UTF-8: GIMP and plug-ins have to convert to the current
  locale charset before doing system calls with path names. OTOH the
  strings can be directly passed to GTK+.

- pass around path names as they are in the system: GIMP and plug-ins
  have to convert to UTF-8 when passing path names to GTK+ for
  display, and from UTF-8 when receiving pathnames from user input.
  OTOH they can be directly used in file system calls.
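
As a rough sketch of what the second approach asks of GIMP and
plug-ins, the two helpers below convert a path name to UTF-8 for
display and back again for user input. They assume, purely for
illustration, that the file system charset is ISO-8859-1; the names
latin1_path_to_utf8() and utf8_path_to_latin1() are hypothetical, and
real code would look up the locale's charset and use a general
converter such as iconv() instead of hard-coding one.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: ISO-8859-1 path name -> UTF-8, for display.
 * Returns a malloc()ed string, or NULL if allocation fails. */
char *
latin1_path_to_utf8 (const char *path)
{
  const unsigned char *in = (const unsigned char *) path;
  /* Worst case: every input byte expands to two output bytes. */
  char *out = malloc (2 * strlen (path) + 1);
  unsigned char *o = (unsigned char *) out;

  if (out == NULL)
    return NULL;

  for (; *in; in++)
    {
      if (*in < 0x80)
        *o++ = *in;                     /* ASCII is unchanged */
      else
        {
          *o++ = 0xC0 | (*in >> 6);     /* lead byte: 0xC2 or 0xC3 */
          *o++ = 0x80 | (*in & 0x3F);   /* continuation byte */
        }
    }
  *o = '\0';
  return out;
}

/* Hypothetical helper: UTF-8 user input -> ISO-8859-1 path name.
 * Returns NULL if the input is malformed or contains a character
 * above U+00FF, which ISO-8859-1 cannot represent. */
char *
utf8_path_to_latin1 (const char *utf8)
{
  const unsigned char *in = (const unsigned char *) utf8;
  char *out = malloc (strlen (utf8) + 1);
  unsigned char *o = (unsigned char *) out;

  if (out == NULL)
    return NULL;

  while (*in)
    {
      if (*in < 0x80)
        *o++ = *in++;
      else if ((*in == 0xC2 || *in == 0xC3) && (in[1] & 0xC0) == 0x80)
        {
          *o++ = ((*in & 0x03) << 6) | (in[1] & 0x3F);
          in += 2;
        }
      else
        {
          free (out);                   /* not representable in ISO-8859-1 */
          return NULL;
        }
    }
  *o = '\0';
  return out;
}
```

Note that the second conversion can fail: the user may type a
character that the file system charset cannot represent, so callers
have to handle a NULL result. The path name itself stays untouched and
goes to open() or stat() as is.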

My guess is that the second approach is preferable?

--tml