Ienup Sung <[EMAIL PROTECTED]> wrote:

> I think distinguishing between UTF-8 and ISO8859-? codesets by
> examining byte values or patterns used in file names is quite difficult
> and not always possible. I'd be interested to hear from you on
> what would be the best way of achieving that.

I did not consider other encodings, only ISO-8859-1, as it is the most
popular single-byte encoding. I have not yet thought the idea through
completely, but there is another idea that (mostly) deals with the problem.
See below....
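
For what it is worth, the usual heuristic is to check whether the bytes of
a name form valid UTF-8 sequences; if they do not, an 8-bit codeset such as
ISO-8859-1 is the likely source. A minimal sketch of that check (my own
illustration, not code from any existing file system; it ignores overlong
forms and the surrogate range):

#include <stddef.h>

/*
 * Return 1 if the byte string could be UTF-8, 0 if it cannot.
 * A name that fails this test but contains bytes >= 0x80 is most
 * likely in an 8-bit codeset such as ISO-8859-1.
 */
static int
looks_like_utf8(const unsigned char *s)
{
    while (*s != '\0') {
        int follow;

        if (*s < 0x80) {                        /* plain ASCII */
            s++;
            continue;
        } else if ((*s & 0xE0) == 0xC0) {       /* 2-byte sequence */
            follow = 1;
        } else if ((*s & 0xF0) == 0xE0) {       /* 3-byte sequence */
            follow = 2;
        } else if ((*s & 0xF8) == 0xF0) {       /* 4-byte sequence */
            follow = 3;
        } else {
            return (0);                         /* invalid lead byte */
        }
        for (s++; follow-- > 0; s++) {
            if ((*s & 0xC0) != 0x80)
                return (0);                     /* not a continuation byte */
        }
    }
    return (1);
}

The test is of course not conclusive: a short ISO-8859-1 name can
accidentally form valid UTF-8 sequences, which is exactly why detection is
"not always possible".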


> PS. BTW, I think we have about three (or perhaps more) file name length
> restrictions or constraints, and various problems stem from any possible
> combination of the three:
>
> - Different locales/codesets use different numbers of bytes to
>    represent the same characters.
> - Multiple userland-side max filename length definitions.
> - Multiple per-file-system max filename length definitions.

To avoid unneeded problems, I recommend changing MAXNAMELEN in
usr/src/uts/common/fs/lookup.c (and maybe a few other files) to 1024.
This would already allow hsfs with Joliet to be used without limitations.
If you would like to test this, use the undocumented mount option "jolietlong"
and a Joliet CD with very long file names. If you do not change lookup.c,
you will be able to see long file names (up to 330 bytes, i.e. 110 UCS-2
characters at up to 3 UTF-8 bytes each), but not to stat() or open() the
files. If you change MAXNAMELEN, you may also use them.
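
To see the effect from user land, a small test program is enough. This is
only an illustrative sketch (the directory argument and the output format
are mine, not from any existing test): it lists a directory with readdir()
and then tries to lstat() each entry, which is where the over-long Joliet
names fail (typically with ENAMETOOLONG) as long as MAXNAMELEN is unchanged.

#include <sys/stat.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
    DIR *dp;
    struct dirent *d;
    struct stat sb;
    char path[4096];

    if (argc != 2) {
        fprintf(stderr, "usage: %s directory\n", argv[0]);
        return (1);
    }
    if ((dp = opendir(argv[1])) == NULL) {
        perror(argv[1]);
        return (1);
    }
    while ((d = readdir(dp)) != NULL) {
        /* Build the full path and try to reach the file by name. */
        snprintf(path, sizeof (path), "%s/%s", argv[1], d->d_name);
        if (lstat(path, &sb) < 0)
            printf("FAIL %3u bytes: %s (%s)\n",
                (unsigned)strlen(d->d_name), d->d_name, strerror(errno));
        else
            printf("ok   %3u bytes: %s\n",
                (unsigned)strlen(d->d_name), d->d_name);
    }
    (void) closedir(dp);
    return (0);
}

Run it against the mount point of a Joliet CD mounted with "jolietlong";
every name longer than the current MAXNAMELEN should show up in the
directory listing but fail the lstat().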

> And I think we already have this problem of "mismatch" between the
> number of bytes and the number of characters allowed in file names at
> various levels, vertically and horizontally.
>
> While people usually don't see the problem very often (since not that
> many people create and use really lengthy file names daily), I think the
> problem does exist today, and not just on traditional file systems such
> as UFS but also on rather new Unicode file systems such as NTFS, HFS+,
> UDF, and so on. For instance, NTFS allows 255 16-bit units for a
> filename, which translates into 255 characters when they are all in the
> BMP, or as few as 127 characters when surrogate pairs are needed. It is
> similar for UDF; it could be 127 or 254 characters depending on the
> compression id used.

If NTFS allows 255 UTF-16 characters, you need to set MAXNAMELEN to at least 765.

I did not check the ZFS on-disk structures, but on UFS the maximum file name
length could be raised to 503 bytes to allow longer Unicode names. With a
limit of 503 bytes, we could allow 251 ISO-8859-1 characters from the 8-bit
range, or 167 katakana characters.
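
To make the arithmetic behind these numbers explicit (a back-of-the-envelope
sketch of my own, not code taken from any file system), the limits come from
the worst-case UTF-8 expansion per character:

#include <stdio.h>

int
main(void)
{
    /* NTFS: 255 16-bit units; a BMP character needs up to 3 UTF-8 bytes. */
    printf("255 UTF-16 units -> up to %d UTF-8 bytes\n", 255 * 3);

    /* A 503-byte name limit expressed in characters of common scripts. */
    printf("503 bytes -> %d ISO-8859-1 high-range chars (2 UTF-8 bytes each)\n",
        503 / 2);
    printf("503 bytes -> %d katakana chars (3 UTF-8 bytes each)\n",
        503 / 3);
    return (0);
}

This prints 765, 251 and 167, the figures used above.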


Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED]                (uni)  
       [EMAIL PROTECTED]     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily