Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-11-09 Thread Thomas Åkesson
Revisiting this thread after a few months. Last spring, I did some work in the wiki designing a proposal for resolving the Mac Unicode issues in a non-normalizing manner. I ran out of time, but the thought process has continued. A couple of weeks ago at Subversion Live in London, I had the

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-11-09 Thread Branko Čibej
On 09.11.2012 12:28, Thomas Åkesson wrote: Today, I noticed that Branko started some implementation in a branch. Looks like a collation based on utf8proc is in the making? I think that would make a lot of sense because the ICU extension poses some challenges in the build process and we

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-11-09 Thread C. Michael Pilato
On 11/09/2012 07:49 AM, Branko Čibej wrote: On 09.11.2012 12:28, Thomas Åkesson wrote: I'm currently doing the grunt work of implementing the collation (done) and the LIKE and GLOB operators that we'll need (in progress). The next, and biggest, step will be to review the client and WC
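To illustrate what a composition-aware collation does, here is a minimal sketch using Python's sqlite3 and unicodedata modules. It is not the code in the branch (which, per the thread, builds on utf8proc); the collation name unicode_nfc and the table name nodes are made up for the example, and local_relpath is borrowed from the discussion only as a column name.

```python
import sqlite3
import unicodedata


def nfc_compare(a, b):
    """Three-way comparison of two strings after normalizing copies to NFC."""
    a_n = unicodedata.normalize("NFC", a)
    b_n = unicodedata.normalize("NFC", b)
    return (a_n > b_n) - (a_n < b_n)


conn = sqlite3.connect(":memory:")
# Hypothetical collation name; the stored text itself is never rewritten.
conn.create_collation("unicode_nfc", nfc_compare)
conn.execute("CREATE TABLE nodes (local_relpath TEXT COLLATE unicode_nfc)")

# Store the decomposed spelling: 'A' followed by U+030A COMBINING RING ABOVE.
conn.execute("INSERT INTO nodes VALUES (?)", ("A\u030akesson.txt",))

# Look it up with the composed spelling: U+00C5.
rows = conn.execute(
    "SELECT local_relpath FROM nodes WHERE local_relpath = ?",
    ("\u00c5kesson.txt",),
).fetchall()
```

The lookup matches even though the two spellings differ byte for byte, and the row comes back in the form in which it was stored, which is the non-normalizing part of the idea.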

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-11-09 Thread Branko Čibej
On 09.11.2012 14:28, C. Michael Pilato wrote: On 11/09/2012 07:49 AM, Branko Čibej wrote: On 09.11.2012 12:28, Thomas Åkesson wrote: I'm currently doing the grunt work of implementing the collation (done) and the LIKE and GLOB operators that we'll need (in progress). The next, and biggest,

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-11-09 Thread Thomas Åkesson
On 9 nov 2012, at 14:28, C. Michael Pilato cmpil...@collab.net wrote: On 11/09/2012 07:49 AM, Branko Čibej wrote: On 09.11.2012 12:28, Thomas Åkesson wrote: I'm currently doing the grunt work of implementing the collation (done) and the LIKE and GLOB operators that we'll need (in progress).

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-04-23 Thread Thomas Åkesson
Hi Philip, Thanks for your comments in the wiki article. They raised some important points and potentially an idea that might simplify the solution. All three paths are in UTF-8 but NFC/NFD is not currently specified. local_relpath/parent_relpath get converted from UTF-8 to whatever

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-04-23 Thread Philip Martin
Thomas Åkesson tho...@akesson.cc writes: If you, or someone else with WC insight, could provide some details on when/how conversions in the opposite direction are performed (e.g. svn stat and most commands taking path arguments), that would be incredibly useful to me. I would like to explore

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-04-17 Thread Stefan Sperling
On Tue, Apr 17, 2012 at 05:24:53AM +0200, Thomas Åkesson wrote: I intend to use this script to take the design to the next level of detail. First, I would like some feedback from people with in-depth knowledge of the WC and preferably get some idea on what the community thinks about the

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-04-16 Thread Thomas Åkesson
Hi, A bit of a status update on the wiki article: http://wiki.apache.org/subversion/NonNormalizingUnicodeCompositionAwareness Received some comments from Daniel, which I have tried to address. Thanks. I have written a bash script which demonstrates the concept of Alternative 1 with regard to

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-03-25 Thread Thomas Åkesson
Hi, Sorry about the delay, had a release to sort out... I have moved the proposal into the wiki: http://wiki.apache.org/subversion/NonNormalizingUnicodeCompositionAwareness The comments from Julian and Markus have been implemented and I have added more information to the Client Changes section

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-02-21 Thread Daniel Shahaf
I've granted you write access to the wiki. Thomas Åkesson wrote on Tue, Feb 14, 2012 at 12:36:23 +0100: Thanks Julian and Markus for providing feedback. I am not commenting below because all the feedback is very good and I will try to address it as best I can in the next iteration.

Re: [RFC] Non-normalizing Unicode Composition Awareness (was: Let's discuss about unicode compositions for filenames!)

2012-02-14 Thread Markus Schaber
From: Thomas Åkesson [mailto:tho...@akesson.cc] Sent: Tuesday, 14 February 2012 01:35 To: Subversion Development Cc: Hiroaki Nakamura; Stefan Sperling Subject: [RFC] Non-normalizing Unicode

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-02-14 Thread Julian Foad
Hi Thomas. It's fantastic that you're taking the trouble to write up this proposal. That's just what we need. Just a few initial comments below... Thomas Åkesson wrote: Context === [...] A Unicode string (e.g. a file name) can be represented in 2 normalized forms (NFC/NFD) or mixed,

Re: [RFC] Non-normalizing Unicode Composition Awareness

2012-02-14 Thread Thomas Åkesson
Thanks Julian and Markus for providing feedback. I am not commenting below because all the feedback is very good and I will try to address it as best I can in the next iteration. Describing the behaviour changes to the WC is the most challenging since I lack that kind of detailed knowledge. I

[RFC] Non-normalizing Unicode Composition Awareness (was: Let's discuss about unicode compositions for filenames!)

2012-02-13 Thread Thomas Åkesson
Title: Non-normalizing Unicode Composition Awareness Version: 0.1 (2012-02-14) Context === In the Unicode standard, some characters can be represented in two different ways (composed/decomposed) while rendering identically on screen or in print. A Unicode string (e.g. a file name)
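For readers new to the issue, the composed/decomposed distinction can be seen with Python's standard unicodedata module. This is only an illustrative sketch and not part of the proposal; the helper same_name is a hypothetical name.

```python
import unicodedata

composed = "\u00c5kesson"      # 'Å' as the single code point U+00C5
decomposed = "A\u030akesson"   # 'A' followed by U+030A COMBINING RING ABOVE

# Both render the same, yet they are different code point sequences.
differ_raw = composed != decomposed

# NFD splits the composed character; NFC joins the decomposed pair.
nfd_of_composed = unicodedata.normalize("NFD", composed)
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)


def same_name(a, b):
    """Compare two names on normalized copies, leaving the inputs as they are."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

A comparison like same_name treats the two spellings as one file name without rewriting either of them.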