Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-08-27 Thread Torsten Bögershausen
On 2013-08-27 18.27, Junio C Hamano wrote:
> Torsten Bögershausen  writes:
> 
 2)
 When we access a repo from Windows/Linux using SAMBA,
>>> You mean s/repo/repository that resides on HFS+/?
>> Sorry being unclear here, trying being clearer with an example:
>> I have a /data/Docs on my linux box, which is handled by git
>>
>> I export /data/Docs via SAMBA, and use the Finder under Mac OS to have it
>> mounted on my Mac OS X box:
>> //tb@Linux/Docs on /Volumes/Docs (smbfs, nodev, nosuid, mounted by tb)
 readdir() will return decomposed.
 When the repo is created by nonMacOS, core.precomposeunicode is undefined.
 The precomposition is off, but should be on, 
 precomposed_unicode = -1, but should be = 1
>>> I do not think UTF-8-MAC is widely available; even if you flip the
>>> bit on, would it help much?
>> In the above example
>> /data/Docs/.git/config was created by Linux, so it does not have
>> core.precomposeunicode set, neither false nor true.
>> The Linux box does not have  UTF-8-MAC under iconv,
>> but will ignore core.precomposeunicode anyway (since the code is not 
>> compiled here)
>>
>> The Mac OS machine sees it under /Volumes/Docs/.git/config
>> And here we want the precomposition, even if core.precomposeunicode
>> is not present in the config.
> 
> It almost makes me wonder if you want not a per-repository but a
> per-machine configuration, i.e. "Whichever repository I am
> accessing, either on a local filesystem or shared over the network,
> readdir() on my box reports wrong paths, and I need correction."
> 
> That, or "if it hurts, don't do the remote mount then."
No, we don't need to be that restrictive.
We already have repository/user/system wide configuration files,
allowing tweeks and this is a good thing.

Re-Re-reading $gmane/188940: 
I am happy having the V2 patch from today being queued, thanks.

As a next step I will have a look into the advice machine.
 





--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-08-27 Thread Junio C Hamano
Torsten Bögershausen  writes:

>>> 2)
>>> When we access a repo from Windows/Linux using SAMBA,
>> You mean s/repo/repository that resides on HFS+/?
> Sorry being unclear here, trying being clearer with an example:
> I have a /data/Docs on my linux box, which is handled by git
>
> I export /data/Docs via SAMBA, and use the Finder under Mac OS to have it
> mounted on my Mac OS X box:
> //tb@Linux/Docs on /Volumes/Docs (smbfs, nodev, nosuid, mounted by tb)
>>> readdir() will return decomposed.
>>> When the repo is created by nonMacOS, core.precomposeunicode is undefined.
>>> The precomposition is off, but should be on, 
>>> precomposed_unicode = -1, but should be = 1
>> I do not think UTF-8-MAC is widely available; even if you flip the
>> bit on, would it help much?
> In the above example
> /data/Docs/.git/config was created by Linux, so it does not have
> core.precomposeunicode set, neither false nor true.
> The Linux box does not have  UTF-8-MAC under iconv,
> but will ignore core.precomposeunicode anyway (since the code is not compiled 
> here)
>
> The Mac OS machine sees it under /Volumes/Docs/.git/config
> And here we want the precomposition, even if core.precomposeunicode
> is not present in the config.

It almost makes me wonder if you want not a per-repository but a
per-machine configuration, i.e. "Whichever repository I am
accessing, either on a local filesystem or shared over the network,
readdir() on my box reports wrong paths, and I need correction."

That, or "if it hurts, don't do the remote mount then."
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-08-27 Thread Torsten Bögershausen
On 27.08.13 16:49, Junio C Hamano wrote:
> Torsten Bögershausen  writes:
>
>>> ... see if that path can be seen under its alias.  Why do we set it
>>> to "false"?  Isn't this the true culprit?
>>>
>>> After all, this is not in the "reinit" codepath, so we know we are
>>> dealing with a repository that was created afresh.
>> There is nothing wrong with the auto-sensing as such.
>> The problem for many users today is that we set core.precomposeunicode
>> to false, when it should be true.
> I think we are in agreement then.
>
> The code detects a broken filesystem just fine, but what it does
> when it finds the filesystem is broken is wrong---it sets the
> variable to false.  That makes the whole auto-sensing wrong, and I
> think it makes sense to correct that behaviour.
>
>> Let's look what precomposed_unicode does and go through a couple
>> of git operations.
>>
>> 1)
>> When we create a repo under Mac OS using HFS+,
>> we want to have precomposed_unicode = 1
> Yes.
>
>> 2)
>> When we access a repo from Windows/Linux using SAMBA,
> You mean s/repo/repository that resides on HFS+/?
Sorry being unclear here, trying being clearer with an example:
I have a /data/Docs on my linux box, which is handled by git

I export /data/Docs via SAMBA, and use the Finder under Mac OS to have it
mounted on my Mac OS X box:
//tb@Linux/Docs on /Volumes/Docs (smbfs, nodev, nosuid, mounted by tb)
>> readdir() will return decomposed.
>> When the repo is created by nonMacOS, core.precomposeunicode is undefined.
>> The precomposition is off, but should be on, 
>> precomposed_unicode = -1, but should be = 1
> I do not think UTF-8-MAC is widely available; even if you flip the
> bit on, would it help much?
In the above example
/data/Docs/.git/config was created by Linux, so it does not have
core.precomposeunicode set, neither false nor true.
The Linux box does not have  UTF-8-MAC under iconv,
but will ignore core.precomposeunicode anyway (since the code is not compiled 
here)

The Mac OS machine sees it under /Volumes/Docs/.git/config
And here we want the precomposition, even if core.precomposeunicode
is not present in the config.

 







--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-08-27 Thread Junio C Hamano
Torsten Bögershausen  writes:

>>... see if that path can be seen under its alias.  Why do we set it
>>to "false"?  Isn't this the true culprit?
>>
>>After all, this is not in the "reinit" codepath, so we know we are
>>dealing with a repository that was created afresh.
>
> There is nothing wrong with the auto-sensing as such.
> The problem for many users today is that we set core.precomposeunicode
> to false, when it should be true.

I think we are in agreement then.

The code detects a broken filesystem just fine, but what it does
when it finds the filesystem is broken is wrong---it sets the
variable to false.  That makes the whole auto-sensing wrong, and I
think it makes sense to correct that behaviour.

> Let's look what precomposed_unicode does and go through a couple
> of git operations.
>
> 1)
> When we create a repo under Mac OS using HFS+,
> we want to have precomposed_unicode = 1

Yes.

> 2)
> When we access a repo from Windows/Linux using SAMBA,

You mean s/repo/repository that resides on HFS+/?

> readdir() will return decomposed.
> When the repo is created by nonMacOS, core.precomposeunicode is undefined.
> The precomposition is off, but should be on, 
> precomposed_unicode = -1, but should be = 1

I do not think UTF-8-MAC is widely available; even if you flip the
bit on, would it help much?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-08-27 Thread Torsten Bögershausen
(Sorry for the somewhat late reply, thanks for review)
>Torsten Bögershausen  writes:
>
>> When core.precomposeunicode was introduced, it was set to false
>> by default, to be compatible with older versions of Git.
>>
>> Whenever UTF-8 file names are used in a mixed environment,
>> the Mac OS users need to find out that this configuration exist
>> and set it to true manually.
>>
>> There is no measurable performance impact between false and true.
>
>The real reason we default it to auto-sensing in the current code is
>for correctness, I think. the new precompose code could be buggy,
>and by auto-sensing, we hoped that we would enable it only on
>filesystems that the codepath matters.
>
>> A smoother workflow can be achieved for new Git users,
>> so change the default to true:
>>
>> - Remove the auto-sensing
>
>Why?
>
>> - Rename the internal variable into precompose_unicode,
>>   and set it to 1 meaning true.
>
>Why the rename?
>
>> - Adjust and clean up test cases
>>
>> The configuration core.precomposeunicode is still supported.
>
>Sorry, but I do not quite understand the change.  Is this because
>the auto-sensing is not working, or after auto-sensing we do a wrong
>thing?  If that is the case, perhaps that is what we should fix?
>
>> diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
>> index 7980abd..5396b91 100644
>> --- a/compat/precompose_utf8.c
>> +++ b/compat/precompose_utf8.c
>> @@ -36,30 +36,6 @@ static size_t has_non_ascii(const char *s, size_t maxlen, 
>> size_t *strlen_c)
>>  }
>>  
>>  
>> -void probe_utf8_pathname_composition(char *path, int len)
>> -{
>> -static const char *auml_nfc = "\xc3\xa4";
>> -static const char *auml_nfd = "\x61\xcc\x88";
>> -int output_fd;
>> -if (precomposed_unicode != -1)
>> -return; /* We found it defined in the global config, respect it 
>> */
>> -strcpy(path + len, auml_nfc);
>> -output_fd = open(path, O_CREAT|O_EXCL|O_RDWR, 0600);
>
>So we try to create a path under one name, and ...
>
>> -if (output_fd >= 0) {
>> -close(output_fd);
>> -strcpy(path + len, auml_nfd);
>> -/* Indicate to the user, that we can configure it to true */
>> -if (!access(path, R_OK))
>> -git_config_set("core.precomposeunicode", "false");
>
>... see if that path can be seen under its alias.  Why do we set it
>to "false"?  Isn't this the true culprit?
>
>After all, this is not in the "reinit" codepath, so we know we are
>dealing with a repository that was created afresh.
>

There is nothing wrong with the auto-sensing as such.
The problem for many users today is that we set core.precomposeunicode
to false, when it should be true.

A patch for that comes out in a minute. But first look back and 
collect some experience with core.precomposeunicode.

Lets have a look at the variable "precomposed_unicode",
(the one I wanted to rename to be more consistant).
It is controlled by the git config files and
depending on the config it is set like this:
core.precomposeuinicode false -> precomposed_unicode = 0
core.precomposeuinicode true  -> precomposed_unicode = 1
core.precomposeuinicode  -> precomposed_unicode = -1.

Let's look what precomposed_unicode does and go through a couple
of git operations.

1)
When we create a repo under Mac OS using HFS+,
we want to have precomposed_unicode = 1

2)
When we access a repo from Windows/Linux using SAMBA,
readdir() will return decomposed.
When the repo is created by nonMacOS, core.precomposeunicode is undefined.
The precomposition is off, but should be on, 
precomposed_unicode = -1, but should be = 1

3)
When we access a repo from another Mac OS system using 
SAMBA, NFS or AFP readdir() will return decomposed.
As the repo is created under Mac OS, we have the same case as (1)

4)
When we access a repo from Linux using NFS we can have
precomposed_unicode = 0 (which is technically more correct).
If Linux users do not use decomposed unicode in their file names,
(according to my understanding this is the case), we can use 1
as well as 0:
precomposing an already precomposed string is a no-op, so it doesn't
harm.


5)
When we create a repo under Linux/Windows on a USB-drive,
and run "git status" under Mac OS, we want precomposed_unicode = 1.

There are few cases where we want to use precomposed_unicode=0:
a) To work around bugs. This may be a short term solution,
  I would rather see bugs to be fixed.
  I'm not aware of any bugs, so please remind me if I missed something.

b) Working with foreign vcs:  E.g. bzr and hg use decomposed unicode,
   so it may be better to use decomposed unicode in git as well.

The simplified V2 patch looks like this (I send it in a seperate mail):

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 7980abd..95fe849 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -48,11 +48,8 @@ void probe_utf8_pathname_composition(char *path, int len)
if (output_fd >= 0) {
  

Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-07-29 Thread Junio C Hamano
Torsten Bögershausen  writes:

> When core.precomposeunicode was introduced, it was set to false
> by default, to be compatible with older versions of Git.
>
> Whenever UTF-8 file names are used in a mixed environment,
> the Mac OS users need to find out that this configuration exist
> and set it to true manually.
>
> There is no measurable performance impact between false and true.

The real reason we default it to auto-sensing in the current code is
for correctness, I think. the new precompose code could be buggy,
and by auto-sensing, we hoped that we would enable it only on
filesystems that the codepath matters.

> A smoother workflow can be achieved for new Git users,
> so change the default to true:
>
> - Remove the auto-sensing

Why?

> - Rename the internal variable into precompose_unicode,
>   and set it to 1 meaning true.

Why the rename?

> - Adjust and clean up test cases
>
> The configuration core.precomposeunicode is still supported.

Sorry, but I do not quite understand the change.  Is this because
the auto-sensing is not working, or after auto-sensing we do a wrong
thing?  If that is the case, perhaps that is what we should fix?

> diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
> index 7980abd..5396b91 100644
> --- a/compat/precompose_utf8.c
> +++ b/compat/precompose_utf8.c
> @@ -36,30 +36,6 @@ static size_t has_non_ascii(const char *s, size_t maxlen, 
> size_t *strlen_c)
>  }
>  
>  
> -void probe_utf8_pathname_composition(char *path, int len)
> -{
> - static const char *auml_nfc = "\xc3\xa4";
> - static const char *auml_nfd = "\x61\xcc\x88";
> - int output_fd;
> - if (precomposed_unicode != -1)
> - return; /* We found it defined in the global config, respect it 
> */
> - strcpy(path + len, auml_nfc);
> - output_fd = open(path, O_CREAT|O_EXCL|O_RDWR, 0600);

So we try to create a path under one name, and ...

> - if (output_fd >= 0) {
> - close(output_fd);
> - strcpy(path + len, auml_nfd);
> - /* Indicate to the user, that we can configure it to true */
> - if (!access(path, R_OK))
> - git_config_set("core.precomposeunicode", "false");

... see if that path can be seen under its alias.  Why do we set it
to "false"?  Isn't this the true culprit?

After all, this is not in the "reinit" codepath, so we know we are
dealing with a repository that was created afresh.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-07-27 Thread Duy Nguyen
On Sun, Jul 28, 2013 at 5:53 AM, Torsten Bögershausen  wrote:
>> Does this have any effects on non-utf8 users? I'm on utf-8, so this is
>> not really my concern, that is unless it changes something on LANG=C..
>>
> Not sure if I fully understand the question.
>
> Mac OS will always use UTF-8, and we can choose between
> precomposesd and decomposed.
>
> Windows ("Git for Windows" == msysgit) uses UTF-8 (precomposed)
> Git under cygwin 1.7 uses UTF-8, precomposed.
> Git under cygwin 1.5 or git compiled under mingw does not use
> UTF-8, but a Windows code page"
>
> Linux may use UTF-8 or ISO-8859 or whatever you configure.
>
> This change affects only Mac OS,
> (should this be stated better in the commit MSG?)

Ah yes, I see precompose_unicode changes default value from -1 to 1,
but I did not see that the whole precompose unicode code is only
active in Mac OS.

>
> And if somebody wants to change a repo between Linux, Windows
> and/or Mac OS, everybody should use UTF-8 (precomposed) for filenames,
> directories and branches.
> (or stick to ASCII)
> Does this answer the question?

It does, thanks.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-07-27 Thread Torsten Bögershausen
On 2013-07-27 17.23, Duy Nguyen wrote:
> On Sat, Jul 27, 2013 at 8:21 AM, Torsten Bögershausen  wrote:
>> When core.precomposeunicode was introduced, it was set to false
>> by default, to be compatible with older versions of Git.
>>
>> Whenever UTF-8 file names are used in a mixed environment,
>> the Mac OS users need to find out that this configuration exist
>> and set it to true manually.
>>
>> There is no measurable performance impact between false and true.
>> A smoother workflow can be achieved for new Git users,
>> so change the default to true:
>>
>> - Remove the auto-sensing
>> - Rename the internal variable into precompose_unicode,
>>   and set it to 1 meaning true.
>> - Adjust and clean up test cases
>>
>> The configuration core.precomposeunicode is still supported.
> 
> Does this have any effects on non-utf8 users? I'm on utf-8, so this is
> not really my concern, that is unless it changes something on LANG=C..
> 
Not sure if I fully understand the question.

Mac OS will always use UTF-8, and we can choose between
precomposesd and decomposed.

Windows ("Git for Windows" == msysgit) uses UTF-8 (precomposed)
Git under cygwin 1.7 uses UTF-8, precomposed.
Git under cygwin 1.5 or git compiled under mingw does not use
UTF-8, but a Windows code page"

Linux may use UTF-8 or ISO-8859 or whatever you configure.

This change affects only Mac OS, 
(should this be stated better in the commit MSG?)

And if somebody wants to change a repo between Linux, Windows
and/or Mac OS, everybody should use UTF-8 (precomposed) for filenames,
directories and branches.
(or stick to ASCII)
Does this answer the question?
/Torsten


 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-07-27 Thread Duy Nguyen
On Sat, Jul 27, 2013 at 8:21 AM, Torsten Bögershausen  wrote:
> When core.precomposeunicode was introduced, it was set to false
> by default, to be compatible with older versions of Git.
>
> Whenever UTF-8 file names are used in a mixed environment,
> the Mac OS users need to find out that this configuration exist
> and set it to true manually.
>
> There is no measurable performance impact between false and true.
> A smoother workflow can be achieved for new Git users,
> so change the default to true:
>
> - Remove the auto-sensing
> - Rename the internal variable into precompose_unicode,
>   and set it to 1 meaning true.
> - Adjust and clean up test cases
>
> The configuration core.precomposeunicode is still supported.

Does this have any effects on non-utf8 users? I'm on utf-8, so this is
not really my concern, that is unless it changes something on LANG=C..
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html