Re: ffs and utf8

2014-12-03 Thread Joel Rees
Dmitrij had some questions about my intent, I'll try to clarify. 2014/12/02 18:57 Joel Rees joel.r...@gmail.com: (apologies for the html.) 2014/12/02 9:52 Dmitrij D. Czarkoff czark...@gmail.com: [ ... and others Snipped context: There was some discussion of what kind of file names should be

Re: ffs and utf8

2014-12-03 Thread Anthony J. Bentley
Joel Rees writes: You can even handle broken UTF-8 and unconverted UTF-16/32 of whatever byte order spit into the file name as a sequence of bytes if and only if you escape NUL, slash, and your escape character properly, restoring the escaped characters when putting the file names on the
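The escaping idea quoted above can be sketched roughly as follows. This is only an illustration under assumptions: the choice of '%' as the escape byte and the helper names escape_name/unescape_name are hypothetical and do not come from the thread.

```python
ESC = 0x25  # '%', a hypothetical choice of escape byte


def escape_name(raw: bytes) -> bytes:
    """Escape NUL, slash and the escape byte itself as %XX; pass all other bytes through."""
    out = bytearray()
    for b in raw:
        if b in (0x00, 0x2F, ESC):
            out += b"%%%02X" % b  # e.g. 0x2F becomes b"%2F"
        else:
            out.append(b)
    return bytes(out)


def unescape_name(stored: bytes) -> bytes:
    """Reverse escape_name; raises ValueError on a malformed escape sequence."""
    out = bytearray()
    i = 0
    while i < len(stored):
        if stored[i] == ESC:
            out.append(int(stored[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(stored[i])
            i += 1
    return bytes(out)
```

Because the escape byte is escaped as well, the mapping is reversible for any input, including byte sequences that are not valid UTF-8.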

Re: ffs and utf8

2014-12-03 Thread Dmitrij D. Czarkoff
Anthony J. Bentley said: I haven't used Apple OSes since around 10.4, but Mac OS X was doing a thing where certain well-known directory names were aliased according to the current locale. For instance, the user's music directory was shown as 「音楽」 when the locale was set to ja_JP.UTF-8.

Re: ffs and utf8

2014-12-03 Thread Joel Rees
On Wed, Dec 3, 2014 at 9:09 PM, Dmitrij D. Czarkoff czark...@gmail.com wrote: Anthony J. Bentley said: I haven't used Apple OSes since around 10.4, but Mac OS X was doing a thing where certain well-known directory names were aliased according to the current locale. For instance, the user's

Re: ffs and utf8

2014-12-03 Thread Dmitrij D. Czarkoff
First of all, I really don't believe that preservation of non-canonical form should be a consideration for any software. There is no single reason to allow non-canonical forms to exist at all, while there are several reasons to avoid them. More so for foreign encodings in filenames - if you are

Re: ffs and utf8

2014-12-03 Thread Joel Rees
2014/12/03 22:23 Dmitrij D. Czarkoff czark...@gmail.com: First of all, I really don't believe that preservation of non-canonical form should be a consideration for any software. There is no particular canonical form for some kinds of software. Unix, in particular, happens to have file name

Re: ffs and utf8

2014-12-03 Thread Anthony J. Bentley
Joel Rees writes: 2014/12/03 22:23 Dmitrij D. Czarkoff czark...@gmail.com: First of all, I really don't believe that preservation of non-canonical form should be a consideration for any software. There is no particular canonical form for some kinds of software. Unix, in particular,

Re: ffs and utf8

2014-12-03 Thread Theo de Raadt
Joel Rees writes: 2014/12/03 22:23 Dmitrij D. Czarkoff czark...@gmail.com: First of all, I really don't believe that preservation of non-canonical form should be a consideration for any software. There is no particular canonical form for some kinds of software. Unix, in particular,

Re: ffs and utf8

2014-12-03 Thread Dmitrij D. Czarkoff
Joel Rees said: Maybe it would be better just to not make those directories until they are needed by an application, and then ask the user to name them instead of providing standard names. Actually, it is still workable if you carry your ~/.config/user-dirs.dir around, so that you could

Re: ffs and utf8

2014-12-02 Thread Joel Rees
(apologies for the html.) 2014/12/02 9:52 Dmitrij D. Czarkoff czark...@gmail.com: Joel Rees said: Now, what would you do with this? ジョエル Why not decompose it to the following? ジョエル Because it is not what Unicode normalization is. Well, it definitely isn't

Re: ffs and utf8

2014-12-01 Thread Anthony J. Bentley
Hi Ingo, Ingo Schwarze writes: While the article is old, the essence of what Schneier said here still stands, and it is not likely to fall in the future: https://www.schneier.com/crypto-gram-0007.html#9 The most interesting sentence here is: Unicode is just too complex to ever be secure.

Re: ffs and utf8

2014-12-01 Thread pizdelect
On Sat, Nov 29, 2014 at 09:48:53PM +0100, Dmitrij D. Czarkoff wrote: That said, the standard provides just enough facilities to make filesystem-related aspects of Unicode work nicely, particularly in case of utf-8. E.g. ability to enforce NFD for all operations on file names could actually

Re: ffs and utf8

2014-12-01 Thread Dmitrij D. Czarkoff
pizdel...@gmail.com said: How do you 'enforce' NFD? Let the kernel normalize (ie /destructively/ transform) the file names behind user's back, so that a file will be listed with a different name than that with which it was created? That's very nice and secure, indeed. I would enforce
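The objection about a file being listed under a different name than it was created with can be made concrete with a small sketch. It uses Python's standard unicodedata module; the helper name nfd_bytes is made up for illustration, and nothing here reflects how any kernel actually behaves.

```python
import unicodedata


def nfd_bytes(name: str) -> bytes:
    """Return the UTF-8 bytes of the NFD (canonically decomposed) form of name."""
    return unicodedata.normalize("NFD", name).encode("utf-8")


composed = "\u00c4"                  # A WITH DIAERESIS as one precomposed code point
created = composed.encode("utf-8")   # bytes the application passed when creating the file
listed = nfd_bytes(composed)         # bytes a normalizing layer would store and list

print(created)  # b'\xc3\x84'
print(listed)   # b'A\xcc\x88'
```

An application that compares the bytes it wrote against the bytes it later reads back from the directory would see a mismatch, which is the breakage discussed in the replies.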

Re: ffs and utf8

2014-12-01 Thread Janne Johansson
2014-12-01 10:20 GMT+01:00 Dmitrij D. Czarkoff czark...@gmail.com: pizdel...@gmail.com said: How do you 'enforce' NFD? Let the kernel normalize (ie /destructively/ transform) the file names behind user's back, so that a file will be listed with a different name than that with which it

Re: ffs and utf8

2014-12-01 Thread Stefan Sperling
On Mon, Dec 01, 2014 at 10:38:40AM +0200, pizdel...@gmail.com wrote: On Sat, Nov 29, 2014 at 09:48:53PM +0100, Dmitrij D. Czarkoff wrote: That said, the standard provides just enough facilities to make filesystem-related aspects of Unicode work nicely, particularly in case of utf-8. E.g.

Re: ffs and utf8

2014-12-01 Thread Stefan Sperling
On Mon, Dec 01, 2014 at 10:20:08AM +0100, Dmitrij D. Czarkoff wrote: I would enforce normalization at filename access time (open(), fopen(), readdir(), etc). Yes, destructively transform. I would reject filenames that won't decode. If this is documented, I just don't see how it is behind

Re: ffs and utf8

2014-12-01 Thread Dmitrij D. Czarkoff
Stefan Sperling said: Bad idea. See my other post. Apple did this and broke existing applications. OpenBSD changed time_t and broke existing applications, but hardly anyone thinks it was a bad idea. Fancy filenames are long known to be problematic, so filename policy enforcement is a breakage

Re: ffs and utf8

2014-12-01 Thread Janne Johansson
2014-12-01 12:05 GMT+01:00 Dmitrij D. Czarkoff czark...@gmail.com: Stefan Sperling said: Bad idea. See my other post. Apple did this and broke existing applications. OpenBSD changed time_t and broke existing applications, but hardly anyone thinks it was a bad idea. Fancy filenames are

Re: ffs and utf8

2014-12-01 Thread Dmitrij D. Czarkoff
Janne Johansson said: There is quite a bit of difference between changing the storage format and making some dates impossible that previously did work. Don't think so. Something got changed, things got broken and need to be fixed. The only real question is: is the change worth the trouble. I

Re: ffs and utf8

2014-12-01 Thread Joel Rees
On Mon, Dec 1, 2014 at 8:43 PM, Dmitrij D. Czarkoff czark...@gmail.com wrote: Janne Johansson said: There is quite a bit of difference between changing the storage format and making some dates impossible that previously did work. Don't think so. Something got changed, things got broken and

Re: ffs and utf8

2014-12-01 Thread Dmitrij D. Czarkoff
Joel Rees said: Hmm. What would you suggest doing with the following file name? /etc (You may need a Japanese font to display it.) If you try to normalize it on a *nix box, it will hopefully conflict with your system file permissions. But, then what do you do with it? If you throw it
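A sketch of why such a name is troublesome, under a loud assumption: the archive shows the name as plain ASCII, and this example assumes the original message spelled it with fullwidth forms (U+FF0F, U+FF45, U+FF54, U+FF43), which the remark about needing a Japanese font suggests but does not prove. The helper name fold_compat is hypothetical.

```python
import unicodedata

# Assumed spelling of the name: FULLWIDTH SOLIDUS followed by fullwidth "etc".
fullwidth = "\uff0f\uff45\uff54\uff43"


def fold_compat(name: str) -> str:
    """Apply NFKC, which folds compatibility characters such as fullwidth forms."""
    return unicodedata.normalize("NFKC", name)


print(unicodedata.normalize("NFC", fullwidth) == fullwidth)  # canonical forms leave it alone
print(fold_compat(fullwidth))                                # compatibility folding changes it
```

Canonical normalization (NFC or NFD) leaves the fullwidth string untouched, while compatibility normalization turns it into the ASCII string "/etc", slash included.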

Re: ffs and utf8

2014-12-01 Thread Ted Unangst
On Mon, Dec 01, 2014 at 12:43, Dmitrij D. Czarkoff wrote: Janne Johansson said: There is quite a bit of difference between changing the storage format and making some dates impossible that previously did work. Don't think so. Something got changed, things got broken and need to be fixed.

Re: ffs and utf8

2014-12-01 Thread frantisek holop
Joel Rees, 01 Dec 2014 22:04: Hmm. What would you suggest doing with the following file name? /etc (You may need a Japanese font to display it.) If you try to normalize it on a *nix box, it will hopefully conflict with your system file permissions. But, then what do you do with it?

Re: ffs and utf8

2014-12-01 Thread frantisek holop
Stefan Sperling, 29 Nov 2014 18:17: Are you aware of 'detox' package? There's also converters/convmv $ touch »´ÁÉǑÄ« $ convmv * wrong/unknown from encoding! $ convmv -f utf8 -t latin1 * Starting a dry run without changes... iso-8859-1 doesn't cover all needed characters for: ./»´ÁÉǑÄ« To

Re: ffs and utf8

2014-12-01 Thread Joel Rees
On Mon, Dec 1, 2014 at 11:13 PM, Dmitrij D. Czarkoff czark...@gmail.com wrote: Joel Rees said: Hmm. What would you suggest doing with the following file name? /etc (You may need a Japanese font to display it.) If you try to normalize it on a *nix box, it will hopefully conflict with your

Re: ffs and utf8

2014-12-01 Thread Anthony J. Bentley
Ted Unangst writes: On Mon, Dec 01, 2014 at 12:43, Dmitrij D. Czarkoff wrote: Janne Johansson said: There is quite a bit of difference between changing the storage format and making some dates impossible that previously did work. Don't think so. Something got changed, things got

Re: ffs and utf8

2014-12-01 Thread Dmitrij D. Czarkoff
Joel Rees said: Now, what would you do with this? ジョエル Why not decompose it to the following? ジョエル Because it is not what Unicode normalization is. I know what the Unicode rules say, but my boss says, if I'm going to play with file names, he wants it done his way. And now you
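For reference, here is a sketch of what canonical decomposition actually does to the katakana name under discussion, using Python's standard unicodedata module. The variable names are illustrative, and the string is written with escapes so the code points are unambiguous.

```python
import unicodedata

# The four-character name with the first character as precomposed U+30B8.
name = "\u30b8\u30e7\u30a8\u30eb"

nfd = unicodedata.normalize("NFD", name)
nfc = unicodedata.normalize("NFC", nfd)

print([hex(ord(c)) for c in nfd])  # U+30B8 splits into U+30B7 plus combining U+3099
print(nfc == name)                 # recomposition restores the original string
```

Only the voiced-sound mark is separated out. The small kana U+30E7 has no canonical decomposition, so normalization does not rewrite it into some other spelling.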

Re: ffs and utf8

2014-11-30 Thread Dmitrij D. Czarkoff
Joel Rees said: That said, the standard provides just enough facilities to make filesystem-related aspects of Unicode work nicely, particularly in case of utf-8. E.g. ability to enforce NFD for all operations on file names could actually make several things more secure by preventing homograph

Re: ffs and utf8

2014-11-30 Thread Dmitrij D. Czarkoff
Thomas Bohl said: # ls | cat Will display the characters right. Not entirely sure why though. From ls(1) manual: | -q Force printing of non-graphic characters in file names as the | character `?'; this is the default when output is to a terminal. -- Dmitrij D. Czarkoff
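The -q behaviour quoted from the manual can be approximated with the sketch below. It assumes a C/ASCII locale in which every byte outside the printable ASCII range counts as non-graphic; the function name q_filter is made up, and this is not the actual ls source.

```python
def q_filter(name: bytes) -> str:
    """Replace each byte outside printable ASCII (0x20-0x7E) with '?'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "?" for b in name)


print(q_filter(b"notes.txt"))
print(q_filter("\u00c1\u00c9".encode("utf-8")))
```

When the output is a pipe rather than a terminal, the filter is not applied by default, so the raw bytes reach the terminal emulator, which can decode them as UTF-8 itself.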

Re: ffs and utf8

2014-11-30 Thread Joel Rees
On Sun, Nov 30, 2014 at 6:31 PM, Dmitrij D. Czarkoff czark...@gmail.com wrote: Joel Rees said: That said, the standard provides just enough facilities to make filesystem-related aspects of Unicode work nicely, particularly in case of utf-8. E.g. ability to enforce NFD for all operations on

Re: ffs and utf8

2014-11-30 Thread Christian Weisgerber
On 2014-11-29, Ingo Schwarze schwa...@usta.de wrote: But Unicode must never be allowed near anything that might get executed as program code, including scripts in interpreted languages, including, but not limited to, the shell. In particular, that means trying to handle Unicode in filenames

ffs and utf8

2014-11-29 Thread frantisek holop
[...] the files were created/read/renamed/deleted without problems. is it true to say then, that ffs is entirely utf8 safe, and/or that ffs is actually an utf-8 encoded filesystem as IIRC Mac OS is? or is it some kind of happy accident that it works? :) -f -- mips = meaningless index of processor speed

Re: ffs and utf8

2014-11-29 Thread Ville Valkonen
Hello, On 29 November 2014 at 14:02, frantisek holop min...@obiit.org wrote: i have written for myself a small python3 script that removes accented characters and all utf8 symbols from filenames, a kind of utf-8 to ascii sanitizer. Are you aware of 'detox' package? -- Regards, Ville
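The script itself is not shown in the thread, so the following is only a guess at the general technique: decompose each character, drop everything that is not plain ASCII, and lowercase the rest. The function name to_ascii_name and the set of characters kept are assumptions.

```python
import unicodedata


def to_ascii_name(name: str) -> str:
    """Strip accents and non-ASCII symbols, keeping ASCII letters, digits and '._-'."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    kept = [c for c in decomposed if c.isascii() and (c.isalnum() or c in "._-")]
    return "".join(kept).lower()


print(to_ascii_name("\u00bb\u00b4\u00c1\u00c9\u01d1\u00c4\u00ab"))  # aeoa
print(to_ascii_name("My File.txt"))                                 # myfile.txt
```

Dropping characters this way can make two different names collide, so a real renaming tool would need to check for an existing target before renaming.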

Re: ffs and utf8

2014-11-29 Thread frantisek holop
frantisek holop, 29 Nov 2014 13:02: while working on it, i created some strange test cases (e.g. »´ÁÉǑÄ«) for filenames and i was pleasantly surprised that the files were created/read/renamed/deleted without problems. i think i should clarify this a bit: they show perfect in midnight

Re: ffs and utf8

2014-11-29 Thread frantisek holop
Ville Valkonen, 29 Nov 2014 14:08: Are you aware of 'detox' package? $ touch »´ÁÉǑÄ« $ detox * $ ls A_A_A_A_C_A_A_ $ touch »´ÁÉǑÄ« $ my_silly_script $ ls aeoa perhaps with some massaging detox can be made to work like my script, i don't know. but that is actually beside the point. i wrote my

Re: ffs and utf8

2014-11-29 Thread Paolo Aglialoro
Shouldn't the aim in 2014 be to have everything working in utf-8?

Re: ffs and utf8

2014-11-29 Thread frantisek holop
Paolo Aglialoro, 29 Nov 2014 13:56: Shouldn't the aim in 2014 be to have everything working in utf-8? sure. but i like my filenames ascii and whitespaceless. shows my age. -f -- what a nice night for an evening. -- steven wright

Re: ffs and utf8

2014-11-29 Thread Dmitrij D. Czarkoff
frantisek holop said: is it true to say then, that ffs is entirely utf8 safe, and/or that ffs is actually an utf-8 encoded filesystem as IIRC Mac OS is? or is it some kind of happy accident that it works? :) As I get it, ffs is entirely utf8 safe because it is not encoding aware

Re: ffs and utf8

2014-11-29 Thread Lars
Hi, On 29.11.2014 13:20, frantisek holop wrote: i think i should clarify this a bit: they show perfect in midnight commander, not in shell. $ touch »´ÁÉǑÄ« $ ls ?? -f I had a similar problem some time ago and have been told that the ls tool is not aware of UTF-8. See here for

Re: ffs and utf8

2014-11-29 Thread Ted Unangst
On Sat, Nov 29, 2014 at 13:02, frantisek holop wrote: is it true to say then, that ffs is entirely utf8 safe, and/or that ffs is actually an utf-8 encoded filesystem as IIRC Mac OS is? or is it some kind of happy accident that it works? :) FFS stores filenames as bytes.

Re: ffs and utf8

2014-11-29 Thread Christian Weisgerber
On 2014-11-29, frantisek holop min...@obiit.org wrote: is it true to say then, that ffs is entirely utf8 safe, and/or that ffs is actually an utf-8 encoded filesystem as IIRC Mac OS is? The former. Unix filesystems accept all bytes for filenames with the exception of 0x2f, which serves
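The byte-level rule can be sketched as a tiny check. It covers only the two excluded byte values (0x2f, the path separator, and 0x00, the C string terminator) and ignores length limits and the reserved names "." and ".."; the function name is hypothetical.

```python
def is_valid_component(name: bytes) -> bool:
    """True if name is non-empty and contains neither '/' (0x2f) nor NUL (0x00)."""
    return len(name) > 0 and b"/" not in name and b"\x00" not in name


print(is_valid_component("\u00c1\u00c9".encode("utf-8")))  # valid UTF-8 is accepted
print(is_valid_component(b"\xff\xfe\x80"))                 # so are bytes that are not UTF-8
print(is_valid_component(b"a/b"))                          # a slash is rejected
```

No encoding appears anywhere in the check, which is why UTF-8 names work without the filesystem knowing anything about UTF-8.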

Re: ffs and utf8

2014-11-29 Thread Christian Weisgerber
On 2014-11-29, frantisek holop min...@obiit.org wrote: $ touch »´ÁÉǑÄ« $ ls ?? If you need a locale-aware ls(1), use the one from the colorls package. (Don't worry, colored output is entirely optional.) -- Christian naddy Weisgerber na...@mips.inka.de

Re: ffs and utf8

2014-11-29 Thread Ingo Schwarze
Hi, Paolo Aglialoro wrote on Sat, Nov 29, 2014 at 01:56:23PM +0100: Shouldn't the aim in 2014 be to have everything working in utf-8? Most definitely not, that would directly run contrary to some of OpenBSD's most important project goals: Correctness, simplicity, security. While the article is old, the

Re: ffs and utf8

2014-11-29 Thread Jan Stary
On Nov 29 13:02:34, min...@obiit.org wrote: is it true to say then, that ffs is entirely utf8 safe, and/or that ffs is actually an utf-8 encoded filesystem The file names are just strings of bytes. There is nothing UTF8 about them. On Nov 29 14:23:35, czark...@gmail.com wrote: (Interestingly

Re: ffs and utf8

2014-11-29 Thread Stefan Sperling
On Sat, Nov 29, 2014 at 02:08:32PM +0200, Ville Valkonen wrote: Hello, On 29 November 2014 at 14:02, frantisek holop min...@obiit.org wrote: i have written for myself a small python3 script that removes accented characters and all utf8 symbols from filenames, a kind of utf-8 to ascii

Re: ffs and utf8

2014-11-29 Thread Dmitrij D. Czarkoff
Ingo Schwarze said: While the article is old, the essence of what Schneier said here still stands, and it is not likely to fall in the future: https://www.schneier.com/crypto-gram-0007.html#9 Sorry, but this article is mostly based on lack of understanding of Unicode. that would directly

Re: ffs and utf8

2014-11-29 Thread Joel Rees
On Sun, Nov 30, 2014 at 5:48 AM, Dmitrij D. Czarkoff czark...@gmail.com wrote: Ingo Schwarze said: While the article is old, the essence of what Schneier said here still stands, and it is not likely to fall in the future: https://www.schneier.com/crypto-gram-0007.html#9 Sorry, but this

Re: ffs and utf8

2014-11-29 Thread Thomas Bohl
Am 29.11.2014 um 13:20 schrieb frantisek holop: i think i should clarify this a bit: they show perfect in midnight commander, not in shell. $ touch »´ÁÉǑÄ« $ ls ?? # ls | cat Will display the characters right. Not entirely sure why though.