On 2012-02-17 13:54:35 +0900, Hiroaki Nakamura wrote:
Actually, whether a filename is in NFC or NFD depends on how the
filename was entered.
If you type all characters, it is in NFC.
No, or actually, perhaps this depends on the user configuration
(e.g. keyboard configuration / input method).
On 2012-01-30 21:29:41 +0100, Stefan Sperling wrote:
On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote:
Are you seriously proposing that we /support/ such broken, hackish
nonsense? How do you expect users to tell the difference between file
names that look identical on the
2012/2/17 Vincent Lefevre vincent-...@vinc17.net:
On 2012-01-30 21:29:41 +0100, Stefan Sperling wrote:
On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote:
Are you seriously proposing that we /support/ such broken, hackish
nonsense? How do you expect users to tell the difference
Composition Awareness (was: Let's
discuss about unicode compositions for filenames!)
Title: Non-normalizing Unicode Composition Awareness
Version: 0.1 (2012-02-14)
Context
===
In the Unicode standard, some characters can be represented in two
different ways (composed/decomposed), while
Title: Non-normalizing Unicode Composition Awareness
Version: 0.1 (2012-02-14)
Context
===
In the Unicode standard, some characters can be represented in two
different ways (composed/decomposed), while rendered identically on screen or in
print. A Unicode string (e.g. a file name)
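The composed/decomposed distinction described in this snippet can be illustrated with a minimal Python sketch. It uses only the standard-library `unicodedata` module; the helper name `same_name` is hypothetical and is not part of Subversion or of the proposal.

```python
import unicodedata

# Two encodings of the same visible character "é".
composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code point comparison treats the two strings as different,
# while a comparison after normalization treats them as the same name.
print(composed == decomposed)           # code point comparison
print(same_name(composed, decomposed))  # comparison after normalization
```

The sketch only demonstrates the equivalence itself; it takes no position on whether the stored form should be normalized, which is the question the thread debates.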
On 11 feb 2012, at 13:10, Hiroaki Nakamura wrote:
Hi,
2012/2/9 Thomas Åkesson tho...@akesson.cc:
Hi,
I have been interested in this issue for a couple of years and I remember it
was discussed briefly at Subconf in Germany a couple of years ago.
Branching the thread here because I'd
On Sun, Feb 12, 2012 at 04:47:45PM +0100, Thomas Åkesson wrote:
Would it make sense to formalize the different approaches into a
couple of RFCs attempting to summarize the respective implications of
each approach? I could try to write one up for the Non-normalizing
approach.
Detailed design
On 12 feb 2012, at 16:59, Stefan Sperling wrote:
On Sun, Feb 12, 2012 at 04:47:45PM +0100, Thomas Åkesson wrote:
Would it make sense to formalize the different approaches into a
couple of RFCs attempting to summarize the respective implications of
each approach? I could try to write one up
2012/2/9 Markus Schaber m.scha...@3s-software.com:
Hi,
From: Stefan Sperling [mailto:s...@elego.de]
On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
[Upgrade options / backwards compatibility for proposed unicode
normalization fix]
- Need to re-checkout existing working
Hi,
2012/2/9 Thomas Åkesson tho...@akesson.cc:
Hi,
I have been interested in this issue for a couple of years and I remember it
was discussed briefly at Subconf in Germany a couple of years ago.
Branching the thread here because I'd like to propose a different approach
than Hiroaki. This
On 11.02.2012 13:05, Hiroaki Nakamura wrote:
2012/2/9 Markus Schaber m.scha...@3s-software.com:
Hi,
From: Stefan Sperling [mailto:s...@elego.de]
On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
[Upgrade options / backwards compatibility for proposed unicode
normalization
2012/2/11 Branko Čibej br...@apache.org:
On 11.02.2012 13:05, Hiroaki Nakamura wrote:
2012/2/9 Markus Schaber m.scha...@3s-software.com:
From: Stefan Sperling [mailto:s...@elego.de]
On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
[Upgrade options / backwards compatibility for
On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
2012/1/30 Stefan Sperling s...@elego.de:
I think the following caveats would be acceptable if they help
with fixing the issue:
- An upgrade path which optionally requires people to check all
working copies out again,
Hi, thanks for your review.
2012/2/9 Stefan Sperling s...@elego.de:
Open questions:
Here I try to answer these. Of course, I welcome everyone to answer.
- How can the client retrieve the configuration from the server?
This is related to server-dictated configuration, see
Hiroaki Nakamura wrote on Thu, Feb 09, 2012 at 07:16:57 +0900:
2012/2/9 Stefan Sperling s...@elego.de:
- What happens if NFC/NFD is enabled in repository config, but the
repository contains non-normalised paths (i.e. did not go through
a dump/load cycle to normalise all paths)?
I
Hi,
I have been interested in this issue for a couple of years and I remember it
was discussed briefly at Subconf in Germany a couple of years ago.
Branching the thread here because I'd like to propose a different approach than
Hiroaki. This proposition is not very different from the note
Hi,
From: Stefan Sperling [mailto:s...@elego.de]
On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
[Upgrade options / backwards compatibility for proposed unicode
normalization fix]
- Need to re-checkout existing working copies of the repository?
= Yes, but only if
2012/2/7 Branko Čibej br...@apache.org:
On 06.02.2012 22:26, Hiroaki Nakamura wrote:
The Unicode Standard says canonical equivalent sequences should be
interpreted the same way.
* 1.1 Canonical and Compatibility Equivalence
http://unicode.org/reports/tr15/#Canonical_Equivalence
* 2.12
On Tue, Feb 07, 2012 at 02:43:19PM +0100, Branko Čibej wrote:
The client-side mapping table is a more general solution, if a
lot harder to implement.
But it brings additional benefits in that we could use it to, e.g.,
transliterate characters that are allowed by some file systems, but not
On 07.02.2012 15:00, Stefan Sperling wrote:
On Tue, Feb 07, 2012 at 02:43:19PM +0100, Branko Čibej wrote:
The client-side mapping table is a more general solution, if a
lot harder to implement.
But it brings additional benefits in that we could use it to, e.g.,
transliterate characters that
Hi, all.
It seems there is no further discussion.
I think the conclusion for the short term solution is:
We convert unnormalized paths to NFC normalized paths on clients only,
that is, svn_path_cstring_to_utf8.
It is the same approach as utf8precompose_macosx_2.patch in
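As a rough illustration of the client-only idea in this snippet, the following Python sketch normalizes a local path to NFC before it would be handed to the rest of the client. It is not the Subversion implementation: `path_to_repository_form` is a hypothetical stand-in for the conversion step the message places in svn_path_cstring_to_utf8, and the example path is made up.

```python
import unicodedata


def path_to_repository_form(local_path: str) -> str:
    """Hypothetical client-side step: return the NFC form of a local path.

    Assumption: the path has already been decoded to Unicode text, so only
    the normalization step is shown here.
    """
    return unicodedata.normalize("NFC", local_path)


# A decomposed name, as a file system that stores decomposed names might
# report it: "o" followed by COMBINING DIAERESIS, twice.
reported = "trunk/fo\u0308o\u0308.txt"
normalized = path_to_repository_form(reported)

# The normalized path uses the precomposed "ö" (U+00F6) instead.
print(len(reported), len(normalized))
```

Paths that are already in NFC pass through unchanged, which is why the approach only alters input from clients that produce decomposed names.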
On 06.02.2012 14:10, Hiroaki Nakamura wrote:
Hi, all.
It seems there is no further discussion.
I think the conclusion for the short term solution is:
We convert unnormalized paths to NFC normalized paths on clients only,
that is, svn_path_cstring_to_utf8.
It is the same approach as
On Mon, Feb 06, 2012 at 02:28:40PM +0100, Branko Čibej wrote:
On 06.02.2012 14:10, Hiroaki Nakamura wrote:
Hi, all.
It seems there is no further discussion.
I think the conclusion for the short term solution is:
We convert unnormalized paths to NFC normalized paths on clients only,
2012/2/6 Stefan Sperling s...@elego.de:
On Mon, Feb 06, 2012 at 02:28:40PM +0100, Branko Čibej wrote:
On 06.02.2012 14:10, Hiroaki Nakamura wrote:
Hi, all.
It seems there is no further discussion.
I think the conclusion for the short term solution is:
We convert unnormalized paths to
On 06.02.2012 22:26, Hiroaki Nakamura wrote:
The Unicode Standard says canonical equivalent sequences should be
interpreted the same way.
* 1.1 Canonical and Compatibility Equivalence
http://unicode.org/reports/tr15/#Canonical_Equivalence
* 2.12 Equivalent Sequences and Normalization
On Tue, Feb 07, 2012 at 06:26:54AM +0900, Hiroaki Nakamura wrote:
2012/2/6 Stefan Sperling s...@elego.de:
2) Do something else that affects repositories, too, and provide
a clean upgrade path for everyone (servers and clients).
AFAIK nobody has made a suggestion as to what could be
2012/2/3 Julian Foad julianf...@btopenworld.com:
You may well be correct that NFC is never longer than NFD, but that's not the
question. The question is whether NFC may be longer than the current paths
(which are not normalized to normalization form C or to form D). And the
answer is yes
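The point about length can be checked with a small Python sketch. The chosen character, U+0958 (DEVANAGARI LETTER QA), is one of the code points excluded from composition, so its NFC form is the two-code-point sequence U+0915 U+093C; the helper name `utf8_len` is hypothetical.

```python
import unicodedata


def utf8_len(text: str) -> int:
    """Hypothetical helper: length of the UTF-8 encoding in bytes."""
    return len(text.encode("utf-8"))


# An unnormalized name consisting of a single code point that NFC does not keep.
name = "\u0958"
nfc = unicodedata.normalize("NFC", name)

# NFC replaces the single code point with a base letter plus a combining
# mark, so the normalized name takes more bytes than the stored one.
print(utf8_len(name), utf8_len(nfc))
```

This is only one example of an existing path growing under NFC; it says nothing about how common such names are in real repositories.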
Hiroaki Nakamura wrote:
It would be nice if we could normalize paths in the repository without
having to perform a dump/reload cycle, but I don't know how that
would work in FSFS.
It won't. Changing the encoding can increase the length (in bytes) of the
string (in the dirents hash, for
[Hiroaki Nakamura]
In option (2), we do n12n on all clients on all platforms, and we
include web_dav_svn in clients. So we convert all input paths to
the server encoding, which is NFC.
Indeed. But the very concept of a server encoding means we are
involving the server side. Which invokes a
2012/2/3 Peter Samuelson pe...@p12n.org:
[Hiroaki Nakamura]
In option (2), we do n12n on all clients on all platforms, and we
include web_dav_svn in clients. So we convert all input paths to
the server encoding, which is NFC.
Indeed. But the very concept of a server encoding means we are
On 02.02.2012 20:59, Hiroaki Nakamura wrote:
So we need to change servers too. When servers read filenames from
repositories, they first convert to NFC and then process commands.
That won't work. You have to do the initial lookup in a
normalization-agnostic way, and neither BDB nor FSFS makes
Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100:
On 02.02.2012 20:22, Peter Samuelson wrote:
[Hiroaki Nakamura]
In option (2), we do n12n on all clients on all platforms, and we
include web_dav_svn in clients. So we convert all input paths to
the server encoding, which is NFC.
2012/2/3 Branko Čibej br...@xbc.nu:
On 02.02.2012 20:59, Hiroaki Nakamura wrote:
So we need to change servers too. When servers read filenames from
repositories, they first convert to NFC and then process commands.
That won't work. You have to do the initial lookup in a
[Hiroaki Nakamura]
For existing repositories, I think it would be better to convert them too using
svndump/svnload. And we change svnload to convert filenames to NFC.
However in reality we cannot force users to convert every existing repository.
Also note that if you convert a repository (via
2012/2/3 Daniel Shahaf danie...@elego.de:
Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100:
On 02.02.2012 20:22, Peter Samuelson wrote:
[Hiroaki Nakamura]
In option (2), we do n12n on all clients on all platforms, and we
include web_dav_svn in clients. So we convert all input
On 02.02.2012 20:22, Peter Samuelson wrote:
By proposing a client-only solution, I hope to avoid _all_ those
questions.
[Branko Cibej]
Can't see how that works, unless you either make the client-side
solution optional, create a mapping table, or make name lookup on the
server agnostic to
Hiroaki Nakamura wrote on Fri, Feb 03, 2012 at 05:33:02 +0900:
2012/2/3 Daniel Shahaf danie...@elego.de:
Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100:
On 02.02.2012 20:22, Peter Samuelson wrote:
[Hiroaki Nakamura]
In option (2), we do n12n on all clients on all platforms,
On 02.02.2012 21:28, Hiroaki Nakamura wrote:
2012/2/3 Branko Čibej br...@xbc.nu:
On 02.02.2012 20:59, Hiroaki Nakamura wrote:
So we need to change servers too. When servers read filenames from
repositories, they first convert to NFC and then process commands.
That won't work. You have to do
2012/2/3 Peter Samuelson pe...@p12n.org:
[Hiroaki Nakamura]
For existing repositories, I think it would be better to convert them too using
svndump/svnload. And we change svnload to convert filenames to NFC.
However in reality we cannot force users to convert every existing
repository.
Also
[reordering the conversation flow slightly]
[Peter Samuelson]
That's the implementation I would like to see, to be honest. Start
with the observation that we can treat Mac OS X NFD paths as a
client character encoding. Now observe that it is lossy. But
... almost all non-Unicode
On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote:
Hi folks!
I read the note about unicode compositions for filenames
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
and would like to drive the discussion.
Hi,
I am very happy to hear
On 30.01.2012 13:30, Stefan Sperling wrote:
On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote:
Hi folks!
I read the note about unicode compositions for filenames
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
and would like to drive
Hi,
From: Stefan Sperling [mailto:s...@elego.de]
On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote:
I read the note about unicode compositions for filenames
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
and would like to drive the
[Stefan Sperling]
We could also open the parent directory, read all the filenames
within it, normalise them all, and then search the resulting
list. This works, except if a name exists twice, once in NFC form
and once in NFD form. We'd somehow have to solve the name collision
in the
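The directory-scan lookup described in this snippet could look roughly like the Python sketch below. The names `find_entry` and `AmbiguousNameError` are hypothetical, the directory listing is passed in as a plain list, and raising an error is just one possible way to surface the collision case; the message itself leaves that case open.

```python
import unicodedata


class AmbiguousNameError(Exception):
    """Hypothetical error: two entries differ only in normalization form."""


def find_entry(entries, wanted):
    """Return the stored entry equal to `wanted` after NFC normalization.

    Returns None when nothing matches and raises AmbiguousNameError when
    more than one stored name normalizes to the same string.
    """
    key = unicodedata.normalize("NFC", wanted)
    matches = [e for e in entries if unicodedata.normalize("NFC", e) == key]
    if not matches:
        return None
    if len(matches) > 1:
        raise AmbiguousNameError(f"{len(matches)} entries match {wanted!r}")
    return matches[0]


# The directory stores a decomposed name; the lookup uses the composed form.
listing = ["fo\u0308o\u0308", "bar"]
found = find_entry(listing, "f\u00f6\u00f6")
print(found == "fo\u0308o\u0308")
```

The function returns the name exactly as stored, so the caller can keep using the on-disk form. The cost is a scan of the whole parent directory for each lookup.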
Hi, Peter,
From: Peter Samuelson [mailto:pe...@p12n.org]
[Stefan Sperling]
We could also open the parent directory, read all the filenames
within it, normalise them all, and then search the resulting list.
This works, except if a name exists twice, once in NFC form and once
in NFD
Let me just note some of the main similarities and differences between this
issue of Unicode compositions and the issue of case-sensitivity in file names.
Differences:
* NFC and NFD look the same when
displayed, and most users haven't heard of them and don't expect that a
computer might
On 01/30/2012 02:00 PM, Markus Schaber wrote:
Maybe the best solution to this issue is a client-only solution, in a similar
way the case sensitivity problem is tackled.
Spinning the client-only thought a bit: Imagine a repos with a un*x user
adding a file called föö. Now an OSX user checks it
On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote:
2012/1/30 Stefan Sperling s...@elego.de:
My friend is not willing to upgrade to a new client version yet, which
is fine because all 1.x releases of Subversion clients are supposed
to be compatible with all 1.y releases of
On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling s...@elego.de wrote:
On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote:
2012/1/30 Stefan Sperling s...@elego.de:
[ ... ]
And mixing various unicode forms works fine today if the filesystem
used by the client supports this. The
On 30.01.2012 21:00, Johan Corveleyn wrote:
On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling s...@elego.de wrote:
On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote:
2012/1/30 Stefan Sperling s...@elego.de:
[ ... ]
And mixing various unicode forms works fine today if the
On Mon, Jan 30, 2012 at 9:09 PM, Branko Čibej br...@xbc.nu wrote:
On 30.01.2012 21:00, Johan Corveleyn wrote:
On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling s...@elego.de wrote:
On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote:
2012/1/30 Stefan Sperling s...@elego.de:
[ ...
On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote:
Are you seriously proposing that we /support/ such broken, hackish
nonsense? How do you expect users to tell the difference between file
names that look identical on the character level, but are not on the
code point level?
On 30.01.2012 21:29, Stefan Sperling wrote:
On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote:
Are you seriously proposing that we /support/ such broken, hackish
nonsense? How do you expect users to tell the difference between file
names that look identical on the character level,
On Mon, Jan 30, 2012 at 09:34:03PM +0100, Branko Čibej wrote:
Sure, if you want to turn on such normalization, you pretty much have to
dump and reload the repository as well as upgrading all working copies
(again). Either that, or use form-independent comparison on the server,
which isn't such
[Stefan Sperling]
It is indeed harder because we are passing paths verbatim to sqlite.
I doubt having more than one form of a given path in wc.db is fun...
That's the implementation I would like to see, to be honest. Start
with the observation that we can treat Mac OS X NFD paths as a client
On 31.01.2012 00:14, Peter Samuelson wrote:
[Stefan Sperling]
It is indeed harder because we are passing paths verbatim to sqlite.
I doubt having more than one form of a given path in wc.db is fun...
That's the implementation I would like to see, to be honest. Start
with the observation that
-Original Message-
From: Branko Čibej [mailto:br...@xbc.nu]
Sent: Monday, 30 January 2012 16:11
To: dev@subversion.apache.org
Subject: Re: Let's discuss about unicode compositions for filenames!
On 31.01.2012 00:14, Peter Samuelson wrote:
[Stefan Sperling]
It is indeed harder
On 31.01.2012 02:47, Bert Huijben wrote:
Last time we discussed this in depth (a few years ago), Windows didn't
perform the normalization you describe here.
Was this added later? (Any documentation pointers?)
Ouch, you're right ... Windows API doesn't normalize the paths.
-- Brane
Hi folks!
I read the note about unicode compositions for filenames
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
and would like to drive the discussion.
First, for me, the short term solution (4) seems too difficult to implement.
It is very complex and