Re: Let's discuss about unicode compositions for filenames!

2012-02-17 Thread Vincent Lefevre
On 2012-02-17 13:54:35 +0900, Hiroaki Nakamura wrote: Actually, whether a filename is in NFC or NFD depends on the way filenames are input. If you type all characters, it is in NFC. No, or actually, perhaps this depends on the user configuration (e.g. keyboard configuration / input method).

Re: Let's discuss about unicode compositions for filenames!

2012-02-16 Thread Vincent Lefevre
On 2012-01-30 21:29:41 +0100, Stefan Sperling wrote: On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote: Are you seriously proposing that we /support/ such broken, hackish nonsense? How do you expect users to tell the difference between file names that look identical on the

Re: Let's discuss about unicode compositions for filenames!

2012-02-16 Thread Hiroaki Nakamura
2012/2/17 Vincent Lefevre vincent-...@vinc17.net: On 2012-01-30 21:29:41 +0100, Stefan Sperling wrote: On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote: Are you seriously proposing that we /support/ such broken, hackish nonsense? How do you expect users to tell the difference

AW: [RFC] Non-normalizing Unicode Composition Awareness (was: Let's discuss about unicode compositions for filenames!)

2012-02-14 Thread Markus Schaber
Title: Non-normalizing Unicode Composition Awareness Version: 0.1 (2012-02-14) Context === Within the Unicode standard, some characters can be represented in two different ways (composed/decomposed), while

[RFC] Non-normalizing Unicode Composition Awareness (was: Let's discuss about unicode compositions for filenames!)

2012-02-13 Thread Thomas Åkesson
Title: Non-normalizing Unicode Composition Awareness Version: 0.1 (2012-02-14) Context === Within the Unicode standard, some characters can be represented in two different ways (composed/decomposed), while rendered equally on screen or in print. A Unicode string (e.g. a file name)
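
To make the composed/decomposed distinction in the RFC excerpt above concrete, here is a minimal Python sketch using the standard unicodedata module. It is only an illustration and not code from Subversion or from the RFC; the helper name same_after_normalization is made up for this example.

```python
import unicodedata

# Two spellings of the same visible character "ö".
composed = "\u00f6"      # single code point, LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # "o" followed by COMBINING DIAERESIS


def same_after_normalization(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A plain code point comparison treats the two spellings as different names.
print(composed == decomposed)
# Comparing after normalization treats them as the same name.
print(same_after_normalization(composed, decomposed))
# The UTF-8 byte lengths differ as well.
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))
```

A byte-wise path comparison therefore sees two distinct file names even though both render identically.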

Re: Let's discuss about unicode compositions for filenames!

2012-02-12 Thread Thomas Åkesson
On 11 feb 2012, at 13:10, Hiroaki Nakamura wrote: Hi, 2012/2/9 Thomas Åkesson tho...@akesson.cc: Hi, I have been interested in this issue for a couple of years and I remember it was discussed briefly at Subconf in Germany a couple of years ago. Branching the thread here because I'd

Re: Let's discuss about unicode compositions for filenames!

2012-02-12 Thread Stefan Sperling
On Sun, Feb 12, 2012 at 04:47:45PM +0100, Thomas Åkesson wrote: Would it make sense to formalize the different approaches into a couple of RFCs attempting to summarize the respective implications of each approach? I could try to write one up for the Non-normalizing approach. Detailed design

Re: Let's discuss about unicode compositions for filenames!

2012-02-12 Thread Thomas Åkesson
On 12 feb 2012, at 16:59, Stefan Sperling wrote: On Sun, Feb 12, 2012 at 04:47:45PM +0100, Thomas Åkesson wrote: Would it make sense to formalize the different approaches into a couple of RFCs attempting to summarize the respective implications of each approach? I could try to write one up

Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Hiroaki Nakamura
2012/2/9 Markus Schaber m.scha...@3s-software.com: Hi, From: Stefan Sperling [mailto:s...@elego.de] On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: [Upgrade options / backwards compatibility for proposed unicode normalization fix] - Need to re-checkout existing working

Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Hiroaki Nakamura
Hi, 2012/2/9 Thomas Åkesson tho...@akesson.cc: Hi, I have been interested in this issue for a couple of years and I remember it was discussed briefly at Subconf in Germany a couple of years ago. Branching the thread here because I'd like to propose a different approach than Hiroaki. This

Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Branko Čibej
On 11.02.2012 13:05, Hiroaki Nakamura wrote: 2012/2/9 Markus Schaber m.scha...@3s-software.com: Hi, From: Stefan Sperling [mailto:s...@elego.de] On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: [Upgrade options / backwards compatibility for proposed unicode normalization

Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Hiroaki Nakamura
2012/2/11 Branko Čibej br...@apache.org: On 11.02.2012 13:05, Hiroaki Nakamura wrote: 2012/2/9 Markus Schaber m.scha...@3s-software.com: From: Stefan Sperling [mailto:s...@elego.de] On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: [Upgrade options / backwards compatibility for

Re: Let's discuss about unicode compositions for filenames!

2012-02-08 Thread Stefan Sperling
On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: 2012/1/30 Stefan Sperling s...@elego.de: I think the following caveats would be acceptable if they help with fixing the issue:  - An upgrade path which optionally requires people to check all   working copies out again,

Re: Let's discuss about unicode compositions for filenames!

2012-02-08 Thread Hiroaki Nakamura
Hi, thanks for your review. 2012/2/9 Stefan Sperling s...@elego.de: Open questions: Here I try to answer these. Of course, I welcome everyone to answer.  - How can the client retrieve the configuration from the server?   This is related to server-dictated configuration, see  

Re: Let's discuss about unicode compositions for filenames!

2012-02-08 Thread Daniel Shahaf
Hiroaki Nakamura wrote on Thu, Feb 09, 2012 at 07:16:57 +0900: 2012/2/9 Stefan Sperling s...@elego.de:  - What happens if NFC/NFD is enabled in repository config, but the   repository contains non-normalised paths (i.e. did not go through   a dump/load cycle to normalise all paths)? I

Re: Let's discuss about unicode compositions for filenames!

2012-02-08 Thread Thomas Åkesson
Hi, I have been interested in this issue for a couple of years and I remember it was discussed briefly at Subconf in Germany a couple of years ago. Branching the thread here because I'd like to propose a different approach than Hiroaki. This proposition is not very different from the note

AW: Let's discuss about unicode compositions for filenames!

2012-02-08 Thread Markus Schaber
Hi, From: Stefan Sperling [mailto:s...@elego.de] On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: [Upgrade options / backwards compatibility for proposed unicode normalization fix] - Need to re-checkout existing working copies of the repository? = Yes, but only if

Re: Let's discuss about unicode compositions for filenames!

2012-02-07 Thread Hiroaki Nakamura
2012/2/7 Branko Čibej br...@apache.org: On 06.02.2012 22:26, Hiroaki Nakamura wrote: The Unicode Standard says canonical equivalent sequences should be interpreted the same way. * 1.1 Canonical and Compatibility Equivalence   http://unicode.org/reports/tr15/#Canonical_Equivalence * 2.12

Re: Let's discuss about unicode compositions for filenames!

2012-02-07 Thread Stefan Sperling
On Tue, Feb 07, 2012 at 02:43:19PM +0100, Branko Čibej wrote: The client-side mapping table is a more general solution, if a lot harder to implement. But it brings additional benefits in that we could use it to, e.g., transliterate characters that are allowed by some file systems, but not

Re: Let's discuss about unicode compositions for filenames!

2012-02-07 Thread Branko Čibej
On 07.02.2012 15:00, Stefan Sperling wrote: On Tue, Feb 07, 2012 at 02:43:19PM +0100, Branko Čibej wrote: The client-side mapping table is a more general solution, if a lot harder to implement. But it brings additional benefits in that we could use it to, e.g., transliterate characters that

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Hiroaki Nakamura
Hi, all. It seems there is no further discussion. I think the conclusion for the short term solution is: We convert unnormalized paths to NFC normalized paths on clients only, that is, svn_path_cstring_to_utf8. It is the same approach as utf8precompose_macosx_2.patch in

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Branko Čibej
On 06.02.2012 14:10, Hiroaki Nakamura wrote: Hi, all. It seems there is no further discussion. I think the conclusion for the short term solution is: We convert unnormalized paths to NFC normalized paths on clients only, that is, svn_path_cstring_to_utf8. It is the same approach as

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Stefan Sperling
On Mon, Feb 06, 2012 at 02:28:40PM +0100, Branko Čibej wrote: On 06.02.2012 14:10, Hiroaki Nakamura wrote: Hi, all. It seems there is no further discussion. I think the conclusion for the short term solution is: We convert unnormalized paths to NFC normalized paths on clients only,

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Hiroaki Nakamura
2012/2/6 Stefan Sperling s...@elego.de: On Mon, Feb 06, 2012 at 02:28:40PM +0100, Branko Čibej wrote: On 06.02.2012 14:10, Hiroaki Nakamura wrote: Hi, all. It seems there is no further discussion. I think the conclusion for the short term solution is: We convert unnormalized paths to

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Branko Čibej
On 06.02.2012 22:26, Hiroaki Nakamura wrote: The Unicode Standard says canonical equivalent sequences should be interpreted the same way. * 1.1 Canonical and Compatibility Equivalence http://unicode.org/reports/tr15/#Canonical_Equivalence * 2.12 Equivalent Sequences and Normalization

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Stefan Sperling
On Tue, Feb 07, 2012 at 06:26:54AM +0900, Hiroaki Nakamura wrote: 2012/2/6 Stefan Sperling s...@elego.de: 2) Do something else that affects repositories, too, and provide a clean upgrade path for everyone (servers and clients). AFAIK nobody has made a suggestion as to what could be

Re: Let's discuss about unicode compositions for filenames!

2012-02-04 Thread Hiroaki Nakamura
2012/2/3 Julian Foad julianf...@btopenworld.com: You may well be correct that NFC is never longer than NFD, but that's not the question.  The question is whether NFC may be longer than the current paths (which are not normalized to normalization form C or to form D).  And the answer is yes
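
One case consistent with the point quoted above (that NFC can be longer than a path that is currently unnormalized) is a character excluded from composition. The Python sketch below uses U+0958 as such a case; this character is chosen here for illustration and is not taken from the thread, and utf8_len is a hypothetical helper.

```python
import unicodedata


def utf8_len(s: str) -> int:
    """Length of the string in bytes when encoded as UTF-8."""
    return len(s.encode("utf-8"))


# U+0958 DEVANAGARI LETTER QA is a composition exclusion: NFC does not keep
# the single code point but yields U+0915 followed by U+093C.
qa = "\u0958"
qa_nfc = unicodedata.normalize("NFC", qa)

print([hex(ord(c)) for c in qa_nfc])
print(utf8_len(qa), utf8_len(qa_nfc))  # the NFC form takes more bytes
```

So a store that assumes names never grow when rewritten in place cannot rely on NFC being the shorter form for every input.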

Re: Let's discuss about unicode compositions for filenames!

2012-02-03 Thread Julian Foad
Hiroaki Nakamura wrote: It would be nice if we could normalize paths in the repository without having to perform a dump/reload cycle, but I don't know how that would work in FSFS. It won't. Changing the encoding increases the length (in bytes) of the string (in the dirents hash, for

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Peter Samuelson
[Hiroaki Nakamura] In option (2), we do n12n on all clients on all platforms, and we include web_dav_svn in clients. So we convert all input paths to the server encoding, which is NFC. Indeed. But the very concept of a server encoding means we are involving the server side. Which invokes a

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Hiroaki Nakamura
2012/2/3 Peter Samuelson pe...@p12n.org: [Hiroaki Nakamura] In option (2), we do n12n on all clients on all platforms, and we include web_dav_svn in clients. So we convert all input paths to the server encoding, which is NFC. Indeed.  But the very concept of a server encoding means we are

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Branko Čibej
On 02.02.2012 20:59, Hiroaki Nakamura wrote: So we need to change servers too. When servers read filenames from repositories, they first convert to NFC and then process commands. That won't work. You have to do the initial lookup in a normalization-agnostic way, and neither BDB nor FSFS makes

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Daniel Shahaf
Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100: On 02.02.2012 20:22, Peter Samuelson wrote: [Hiroaki Nakamura] In option (2), we do n12n on all clients on all platforms, and we include web_dav_svn in clients. So we convert all input paths to the server encoding, which is NFC.

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Hiroaki Nakamura
2012/2/3 Branko Čibej br...@xbc.nu: On 02.02.2012 20:59, Hiroaki Nakamura wrote: So we need to change servers too. When servers read filenames from repositories, they first convert to NFC and then process commands. That won't work. You have to do the initial lookup in a

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Peter Samuelson
[Hiroaki Nakamura] Existing repositories, I think it would be better to convert them too using svndump/svnload. And we change svnload to convert filenames to NFC. However in reality we cannot force users to convert every existing repository. Also note that if you convert a repository (via

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Hiroaki Nakamura
2012/2/3 Daniel Shahaf danie...@elego.de: Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100: On 02.02.2012 20:22, Peter Samuelson wrote: [Hiroaki Nakamura] In option (2), we do n12n on all clients on all platforms, and we include web_dav_svn in clients. So we convert all input

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Peter Samuelson
On 02.02.2012 20:22, Peter Samuelson wrote: By proposing a client-only solution, I hope to avoid _all_ those questions. [Branko Cibej] Can't see how that works, unless you either make the client-side solution optional, create a mapping table, or make name lookup on the server agnostic to

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Daniel Shahaf
Hiroaki Nakamura wrote on Fri, Feb 03, 2012 at 05:33:02 +0900: 2012/2/3 Daniel Shahaf danie...@elego.de: Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100: On 02.02.2012 20:22, Peter Samuelson wrote: [Hiroaki Nakamura] In option (2), we do n12n on all clients on all platforms,

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Branko Čibej
On 02.02.2012 21:28, Hiroaki Nakamura wrote: 2012/2/3 Branko Čibej br...@xbc.nu: On 02.02.2012 20:59, Hiroaki Nakamura wrote: So we need to change servers too. When servers read filenames from repositories, they first convert to NFC and then process commands. That won't work. You have to do

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Hiroaki Nakamura
2012/2/3 Peter Samuelson pe...@p12n.org: [Hiroaki Nakamura] Existing repositories, I think it would be better to convert them too using svndump/svnload. And we change svnload to convert filenames to NFC. However in reality we cannot force users to convert every existing repository. Also

Re: Let's discuss about unicode compositions for filenames!

2012-01-31 Thread Peter Samuelson
[reordering the conversation flow slightly] [Peter Samuelson] That's the implementation I would like to see, to be honest. Start with the observation that we can treat Mac OS X NFD paths as a client character encoding. Now observe that it is lossy. But ... almost all non-Unicode

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling
On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote: Hi folks! I read the note about unicode compositions for filenames http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames and would like to drive the discussion. Hi, I am very happy to hear

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej
On 30.01.2012 13:30, Stefan Sperling wrote: On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote: Hi folks! I read the note about unicode compositions for filenames http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames and would like to drive

AW: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Markus Schaber
Hi, From: Stefan Sperling [mailto:s...@elego.de] On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote: I read the note about unicode compositions for filenames http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames and would like to drive the

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Peter Samuelson
[Stefan Sperling] We could also open the parent directory, read all the filenames within it, normalise them all, and then search the resulting list. This works, except if a name exists twice, once in NFC form and once in NFD form. We'd somehow have to solve the name collision in the
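
The lookup described in the quote above can be sketched in Python as follows. This is only a rough illustration under stated assumptions: find_entry is a hypothetical function, the directory listing is passed in as a plain list of names, and a collision is reported by raising an error rather than resolved, since the excerpt does not say how it should be handled.

```python
import unicodedata
from collections import defaultdict


def find_entry(names, wanted, form="NFC"):
    """Find `wanted` among `names`, ignoring NFC/NFD differences.

    Returns the name exactly as stored, or None if there is no match.
    Raises ValueError if several stored names normalize to the same string.
    """
    buckets = defaultdict(list)
    for name in names:
        buckets[unicodedata.normalize(form, name)].append(name)

    matches = buckets.get(unicodedata.normalize(form, wanted), [])
    if len(matches) > 1:
        raise ValueError("name collision after normalization: %r" % matches)
    return matches[0] if matches else None


listing = ["f\u00f6\u00f6", "bar"]  # "föö" stored in composed form
print(find_entry(listing, "fo\u0308o\u0308") == "f\u00f6\u00f6")
```

The sketch returns the stored spelling unchanged, so nothing on disk or in the repository has to be renamed for the lookup to succeed.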

AW: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Markus Schaber
Hi, Peter, From: Peter Samuelson [mailto:pe...@p12n.org] [Stefan Sperling] We could also open the parent directory, read all the filenames within it, normalise them all, and then search the resulting list. This works, except if a name exists twice, once in NFC form and once in NFD

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Julian Foad
Let me just note some of the main similarities and differences between this issue of Unicode compositions and the issue of case-sensitivity in file names. Differences:   * NFC and NFD look the same when displayed, and most users haven't heard of them and don't expect that a computer might

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Neels J Hofmeyr
On 01/30/2012 02:00 PM, Markus Schaber wrote: Maybe the best solution to this issue is a client-only solution, in a similar way the case sensitivity problem is tackled. Spinning the client-only thought a bit: Imagine a repos with a un*x user adding a file called föö. Now an OSX user checks it

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling
On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote: 2012/1/30 Stefan Sperling s...@elego.de: My friend is not willing to upgrade to a new client version yet, which is fine because all 1.x releases of Subversion clients are supposed to be compatible with all 1.y releases of

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Johan Corveleyn
On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling s...@elego.de wrote: On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote: 2012/1/30 Stefan Sperling s...@elego.de: [ ... ] And mixing various unicode forms works fine today if the filesystem used by the client supports this. The

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej
On 30.01.2012 21:00, Johan Corveleyn wrote: On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling s...@elego.de wrote: On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote: 2012/1/30 Stefan Sperling s...@elego.de: [ ... ] And mixing various unicode forms works fine today if the

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Johan Corveleyn
On Mon, Jan 30, 2012 at 9:09 PM, Branko Čibej br...@xbc.nu wrote: On 30.01.2012 21:00, Johan Corveleyn wrote: On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling s...@elego.de wrote: On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote: 2012/1/30 Stefan Sperling s...@elego.de: [ ...

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling
On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote: Are you seriously proposing that we /support/ such broken, hackish nonsense? How do you expect users to tell the difference between file names that look identical on the character level, but are not on the code point level?

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej
On 30.01.2012 21:29, Stefan Sperling wrote: On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote: Are you seriously proposing that we /support/ such broken, hackish nonsense? How do you expect users to tell the difference between file names that look identical on the character level,

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling
On Mon, Jan 30, 2012 at 09:34:03PM +0100, Branko Čibej wrote: Sure, if you want to turn on such normalization, you pretty much have to dump and reload the repository as well as upgrading all working copies (again). Either that, or use form-independent comparison on the server, which isn't such

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Peter Samuelson
[Stefan Sperling] It is indeed harder because we are passing paths verbatim to sqlite. I doubt having more than one form of a given path in wc.db is fun... That's the implementation I would like to see, to be honest. Start with the observation that we can treat Mac OS X NFD paths as a client

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej
On 31.01.2012 00:14, Peter Samuelson wrote: [Stefan Sperling] It is indeed harder because we are passing paths verbatim to sqlite. I doubt having more than one form of a given path in wc.db is fun... That's the implementation I would like to see, to be honest. Start with the observation that

RE: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Bert Huijben
-Original Message- From: Branko Čibej [mailto:br...@xbc.nu] Sent: Monday 30 January 2012 16:11 To: dev@subversion.apache.org Subject: Re: Let's discuss about unicode compositions for filenames! On 31.01.2012 00:14, Peter Samuelson wrote: [Stefan Sperling] It is indeed harder

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej
On 31.01.2012 02:47, Bert Huijben wrote: Last time we discussed this in depth (a few years ago), Windows didn't perform the normalization you describe here. Was this added later? (Any documentation pointers?) Ouch, you're right ... Windows API doesn't normalize the paths. -- Brane

Let's discuss about unicode compositions for filenames!

2012-01-29 Thread Hiroaki Nakamura
Hi folks! I read the note about unicode compositions for filenames http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames and would like to drive the discussion. First, for me, the short term solution (4) seems too difficult to implement. It is very complex and