Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-08-27 Thread Torsten Bögershausen
(Sorry for the somewhat late reply, thanks for review)
Torsten Bögershausen tbo...@web.de writes:

 When core.precomposeunicode was introduced, it was set to false
 by default, to be compatible with older versions of Git.

 Whenever UTF-8 file names are used in a mixed environment,
 the Mac OS users need to find out that this configuration exist
 and set it to true manually.

 There is no measurable performance impact between false and true.

The real reason we default it to auto-sensing in the current code is
for correctness, I think. the new precompose code could be buggy,
and by auto-sensing, we hoped that we would enable it only on
filesystems that the codepath matters.

 A smoother workflow can be achieved for new Git users,
 so change the default to true:

 - Remove the auto-sensing

Why?

 - Rename the internal variable into precompose_unicode,
   and set it to 1 meaning true.

Why the rename?

 - Adjust and clean up test cases

 The configuration core.precomposeunicode is still supported.

Sorry, but I do not quite understand the change.  Is this because
the auto-sensing is not working, or after auto-sensing we do a wrong
thing?  If that is the case, perhaps that is what we should fix?

 diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
 index 7980abd..5396b91 100644
 --- a/compat/precompose_utf8.c
 +++ b/compat/precompose_utf8.c
 @@ -36,30 +36,6 @@ static size_t has_non_ascii(const char *s, size_t maxlen, 
 size_t *strlen_c)
  }
  
  
 -void probe_utf8_pathname_composition(char *path, int len)
 -{
 -static const char *auml_nfc = \xc3\xa4;
 -static const char *auml_nfd = \x61\xcc\x88;
 -int output_fd;
 -if (precomposed_unicode != -1)
 -return; /* We found it defined in the global config, respect it 
 */
 -strcpy(path + len, auml_nfc);
 -output_fd = open(path, O_CREAT|O_EXCL|O_RDWR, 0600);

So we try to create a path under one name, and ...

 -if (output_fd = 0) {
 -close(output_fd);
 -strcpy(path + len, auml_nfd);
 -/* Indicate to the user, that we can configure it to true */
 -if (!access(path, R_OK))
 -git_config_set(core.precomposeunicode, false);

... see if that path can be seen under its alias.  Why do we set it
to false?  Isn't this the true culprit?

After all, this is not in the reinit codepath, so we know we are
dealing with a repository that was created afresh.


There is nothing wrong with the auto-sensing as such.
The problem for many users today is that we set core.precomposeunicode
to false, when it should be true.

A patch for that comes out in a minute. But first look back and 
collect some experience with core.precomposeunicode.

Lets have a look at the variable precomposed_unicode,
(the one I wanted to rename to be more consistant).
It is controlled by the git config files and
depending on the config it is set like this:
core.precomposeuinicode false - precomposed_unicode = 0
core.precomposeuinicode true  - precomposed_unicode = 1
core.precomposeuinicode not set - precomposed_unicode = -1.

Let's look what precomposed_unicode does and go through a couple
of git operations.

1)
When we create a repo under Mac OS using HFS+,
we want to have precomposed_unicode = 1

2)
When we access a repo from Windows/Linux using SAMBA,
readdir() will return decomposed.
When the repo is created by nonMacOS, core.precomposeunicode is undefined.
The precomposition is off, but should be on, 
precomposed_unicode = -1, but should be = 1

3)
When we access a repo from another Mac OS system using 
SAMBA, NFS or AFP readdir() will return decomposed.
As the repo is created under Mac OS, we have the same case as (1)

4)
When we access a repo from Linux using NFS we can have
precomposed_unicode = 0 (which is technically more correct).
If Linux users do not use decomposed unicode in their file names,
(according to my understanding this is the case), we can use 1
as well as 0:
precomposing an already precomposed string is a no-op, so it doesn't
harm.


5)
When we create a repo under Linux/Windows on a USB-drive,
and run git status under Mac OS, we want precomposed_unicode = 1.

There are few cases where we want to use precomposed_unicode=0:
a) To work around bugs. This may be a short term solution,
  I would rather see bugs to be fixed.
  I'm not aware of any bugs, so please remind me if I missed something.

b) Working with foreign vcs:  E.g. bzr and hg use decomposed unicode,
   so it may be better to use decomposed unicode in git as well.

The simplified V2 patch looks like this (I send it in a seperate mail):

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 7980abd..95fe849 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -48,11 +48,8 @@ void probe_utf8_pathname_composition(char *path, int len)
if (output_fd = 0) {
close(output_fd);
strcpy(path + len, auml_nfd);
-   /* Indicate to the 

Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-08-27 Thread Junio C Hamano
Torsten Bögershausen tbo...@web.de writes:

... see if that path can be seen under its alias.  Why do we set it
to false?  Isn't this the true culprit?

After all, this is not in the reinit codepath, so we know we are
dealing with a repository that was created afresh.

 There is nothing wrong with the auto-sensing as such.
 The problem for many users today is that we set core.precomposeunicode
 to false, when it should be true.

I think we are in agreement then.

The code detects a broken filesystem just fine, but what it does
when it finds the filesystem is broken is wrong---it sets the
variable to false.  That makes the whole auto-sensing wrong, and I
think it makes sense to correct that behaviour.

 Let's look what precomposed_unicode does and go through a couple
 of git operations.

 1)
 When we create a repo under Mac OS using HFS+,
 we want to have precomposed_unicode = 1

Yes.

 2)
 When we access a repo from Windows/Linux using SAMBA,

You mean s/repo/repository that resides on HFS+/?

 readdir() will return decomposed.
 When the repo is created by nonMacOS, core.precomposeunicode is undefined.
 The precomposition is off, but should be on, 
 precomposed_unicode = -1, but should be = 1

I do not think UTF-8-MAC is widely available; even if you flip the
bit on, would it help much?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-08-27 Thread Torsten Bögershausen
On 27.08.13 16:49, Junio C Hamano wrote:
 Torsten Bögershausen tbo...@web.de writes:

 ... see if that path can be seen under its alias.  Why do we set it
 to false?  Isn't this the true culprit?

 After all, this is not in the reinit codepath, so we know we are
 dealing with a repository that was created afresh.
 There is nothing wrong with the auto-sensing as such.
 The problem for many users today is that we set core.precomposeunicode
 to false, when it should be true.
 I think we are in agreement then.

 The code detects a broken filesystem just fine, but what it does
 when it finds the filesystem is broken is wrong---it sets the
 variable to false.  That makes the whole auto-sensing wrong, and I
 think it makes sense to correct that behaviour.

 Let's look what precomposed_unicode does and go through a couple
 of git operations.

 1)
 When we create a repo under Mac OS using HFS+,
 we want to have precomposed_unicode = 1
 Yes.

 2)
 When we access a repo from Windows/Linux using SAMBA,
 You mean s/repo/repository that resides on HFS+/?
Sorry being unclear here, trying being clearer with an example:
I have a /data/Docs on my linux box, which is handled by git

I export /data/Docs via SAMBA, and use the Finder under Mac OS to have it
mounted on my Mac OS X box:
//tb@Linux/Docs on /Volumes/Docs (smbfs, nodev, nosuid, mounted by tb)
 readdir() will return decomposed.
 When the repo is created by nonMacOS, core.precomposeunicode is undefined.
 The precomposition is off, but should be on, 
 precomposed_unicode = -1, but should be = 1
 I do not think UTF-8-MAC is widely available; even if you flip the
 bit on, would it help much?
In the above example
/data/Docs/.git/config was created by Linux, so it does not have
core.precomposeunicode set, neither false nor true.
The Linux box does not have  UTF-8-MAC under iconv,
but will ignore core.precomposeunicode anyway (since the code is not compiled 
here)

The Mac OS machine sees it under /Volumes/Docs/.git/config
And here we want the precomposition, even if core.precomposeunicode
is not present in the config.

 







--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-08-27 Thread Junio C Hamano
Torsten Bögershausen tbo...@web.de writes:

 2)
 When we access a repo from Windows/Linux using SAMBA,
 You mean s/repo/repository that resides on HFS+/?
 Sorry being unclear here, trying being clearer with an example:
 I have a /data/Docs on my linux box, which is handled by git

 I export /data/Docs via SAMBA, and use the Finder under Mac OS to have it
 mounted on my Mac OS X box:
 //tb@Linux/Docs on /Volumes/Docs (smbfs, nodev, nosuid, mounted by tb)
 readdir() will return decomposed.
 When the repo is created by nonMacOS, core.precomposeunicode is undefined.
 The precomposition is off, but should be on, 
 precomposed_unicode = -1, but should be = 1
 I do not think UTF-8-MAC is widely available; even if you flip the
 bit on, would it help much?
 In the above example
 /data/Docs/.git/config was created by Linux, so it does not have
 core.precomposeunicode set, neither false nor true.
 The Linux box does not have  UTF-8-MAC under iconv,
 but will ignore core.precomposeunicode anyway (since the code is not compiled 
 here)

 The Mac OS machine sees it under /Volumes/Docs/.git/config
 And here we want the precomposition, even if core.precomposeunicode
 is not present in the config.

It almost makes me wonder if you want not a per-repository but a
per-machine configuration, i.e. Whichever repository I am
accessing, either on a local filesystem or shared over the network,
readdir() on my box reports wrong paths, and I need correction.

That, or if it hurts, don't do the remote mount then.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-08-27 Thread Torsten Bögershausen
On 2013-08-27 18.27, Junio C Hamano wrote:
 Torsten Bögershausen tbo...@web.de writes:
 
 2)
 When we access a repo from Windows/Linux using SAMBA,
 You mean s/repo/repository that resides on HFS+/?
 Sorry being unclear here, trying being clearer with an example:
 I have a /data/Docs on my linux box, which is handled by git

 I export /data/Docs via SAMBA, and use the Finder under Mac OS to have it
 mounted on my Mac OS X box:
 //tb@Linux/Docs on /Volumes/Docs (smbfs, nodev, nosuid, mounted by tb)
 readdir() will return decomposed.
 When the repo is created by nonMacOS, core.precomposeunicode is undefined.
 The precomposition is off, but should be on, 
 precomposed_unicode = -1, but should be = 1
 I do not think UTF-8-MAC is widely available; even if you flip the
 bit on, would it help much?
 In the above example
 /data/Docs/.git/config was created by Linux, so it does not have
 core.precomposeunicode set, neither false nor true.
 The Linux box does not have  UTF-8-MAC under iconv,
 but will ignore core.precomposeunicode anyway (since the code is not 
 compiled here)

 The Mac OS machine sees it under /Volumes/Docs/.git/config
 And here we want the precomposition, even if core.precomposeunicode
 is not present in the config.
 
 It almost makes me wonder if you want not a per-repository but a
 per-machine configuration, i.e. Whichever repository I am
 accessing, either on a local filesystem or shared over the network,
 readdir() on my box reports wrong paths, and I need correction.
 
 That, or if it hurts, don't do the remote mount then.
No, we don't need to be that restrictive.
We already have repository/user/system wide configuration files,
allowing tweeks and this is a good thing.

Re-Re-reading $gmane/188940: 
I am happy having the V2 patch from today being queued, thanks.

As a next step I will have a look into the advice machine.
 





--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-07-29 Thread Junio C Hamano
Torsten Bögershausen tbo...@web.de writes:

 When core.precomposeunicode was introduced, it was set to false
 by default, to be compatible with older versions of Git.

 Whenever UTF-8 file names are used in a mixed environment,
 the Mac OS users need to find out that this configuration exist
 and set it to true manually.

 There is no measurable performance impact between false and true.

The real reason we default it to auto-sensing in the current code is
for correctness, I think. the new precompose code could be buggy,
and by auto-sensing, we hoped that we would enable it only on
filesystems that the codepath matters.

 A smoother workflow can be achieved for new Git users,
 so change the default to true:

 - Remove the auto-sensing

Why?

 - Rename the internal variable into precompose_unicode,
   and set it to 1 meaning true.

Why the rename?

 - Adjust and clean up test cases

 The configuration core.precomposeunicode is still supported.

Sorry, but I do not quite understand the change.  Is this because
the auto-sensing is not working, or after auto-sensing we do a wrong
thing?  If that is the case, perhaps that is what we should fix?

 diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
 index 7980abd..5396b91 100644
 --- a/compat/precompose_utf8.c
 +++ b/compat/precompose_utf8.c
 @@ -36,30 +36,6 @@ static size_t has_non_ascii(const char *s, size_t maxlen, 
 size_t *strlen_c)
  }
  
  
 -void probe_utf8_pathname_composition(char *path, int len)
 -{
 - static const char *auml_nfc = \xc3\xa4;
 - static const char *auml_nfd = \x61\xcc\x88;
 - int output_fd;
 - if (precomposed_unicode != -1)
 - return; /* We found it defined in the global config, respect it 
 */
 - strcpy(path + len, auml_nfc);
 - output_fd = open(path, O_CREAT|O_EXCL|O_RDWR, 0600);

So we try to create a path under one name, and ...

 - if (output_fd = 0) {
 - close(output_fd);
 - strcpy(path + len, auml_nfd);
 - /* Indicate to the user, that we can configure it to true */
 - if (!access(path, R_OK))
 - git_config_set(core.precomposeunicode, false);

... see if that path can be seen under its alias.  Why do we set it
to false?  Isn't this the true culprit?

After all, this is not in the reinit codepath, so we know we are
dealing with a repository that was created afresh.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-07-27 Thread Duy Nguyen
On Sat, Jul 27, 2013 at 8:21 AM, Torsten Bögershausen tbo...@web.de wrote:
 When core.precomposeunicode was introduced, it was set to false
 by default, to be compatible with older versions of Git.

 Whenever UTF-8 file names are used in a mixed environment,
 the Mac OS users need to find out that this configuration exist
 and set it to true manually.

 There is no measurable performance impact between false and true.
 A smoother workflow can be achieved for new Git users,
 so change the default to true:

 - Remove the auto-sensing
 - Rename the internal variable into precompose_unicode,
   and set it to 1 meaning true.
 - Adjust and clean up test cases

 The configuration core.precomposeunicode is still supported.

Does this have any effects on non-utf8 users? I'm on utf-8, so this is
not really my concern, that is unless it changes something on LANG=C..
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-07-27 Thread Torsten Bögershausen
On 2013-07-27 17.23, Duy Nguyen wrote:
 On Sat, Jul 27, 2013 at 8:21 AM, Torsten Bögershausen tbo...@web.de wrote:
 When core.precomposeunicode was introduced, it was set to false
 by default, to be compatible with older versions of Git.

 Whenever UTF-8 file names are used in a mixed environment,
 the Mac OS users need to find out that this configuration exist
 and set it to true manually.

 There is no measurable performance impact between false and true.
 A smoother workflow can be achieved for new Git users,
 so change the default to true:

 - Remove the auto-sensing
 - Rename the internal variable into precompose_unicode,
   and set it to 1 meaning true.
 - Adjust and clean up test cases

 The configuration core.precomposeunicode is still supported.
 
 Does this have any effects on non-utf8 users? I'm on utf-8, so this is
 not really my concern, that is unless it changes something on LANG=C..
 
Not sure if I fully understand the question.

Mac OS will always use UTF-8, and we can choose between
precomposesd and decomposed.

Windows (Git for Windows == msysgit) uses UTF-8 (precomposed)
Git under cygwin 1.7 uses UTF-8, precomposed.
Git under cygwin 1.5 or git compiled under mingw does not use
UTF-8, but a Windows code page

Linux may use UTF-8 or ISO-8859 or whatever you configure.

This change affects only Mac OS, 
(should this be stated better in the commit MSG?)

And if somebody wants to change a repo between Linux, Windows
and/or Mac OS, everybody should use UTF-8 (precomposed) for filenames,
directories and branches.
(or stick to ASCII)
Does this answer the question?
/Torsten


 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] core.precomposeunicode is true by default

2013-07-27 Thread Duy Nguyen
On Sun, Jul 28, 2013 at 5:53 AM, Torsten Bögershausen tbo...@web.de wrote:
 Does this have any effects on non-utf8 users? I'm on utf-8, so this is
 not really my concern, that is unless it changes something on LANG=C..

 Not sure if I fully understand the question.

 Mac OS will always use UTF-8, and we can choose between
 precomposesd and decomposed.

 Windows (Git for Windows == msysgit) uses UTF-8 (precomposed)
 Git under cygwin 1.7 uses UTF-8, precomposed.
 Git under cygwin 1.5 or git compiled under mingw does not use
 UTF-8, but a Windows code page

 Linux may use UTF-8 or ISO-8859 or whatever you configure.

 This change affects only Mac OS,
 (should this be stated better in the commit MSG?)

Ah yes, I see precompose_unicode changes default value from -1 to 1,
but I did not see that the whole precompose unicode code is only
active in Mac OS.


 And if somebody wants to change a repo between Linux, Windows
 and/or Mac OS, everybody should use UTF-8 (precomposed) for filenames,
 directories and branches.
 (or stick to ASCII)
 Does this answer the question?

It does, thanks.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC] core.precomposeunicode is true by default

2013-07-26 Thread Torsten Bögershausen
When core.precomposeunicode was introduced, it was set to false
by default, to be compatible with older versions of Git.

Whenever UTF-8 file names are used in a mixed environment,
the Mac OS users need to find out that this configuration exist
and set it to true manually.

There is no measurable performance impact between false and true.
A smoother workflow can be achieved for new Git users,
so change the default to true:

- Remove the auto-sensing
- Rename the internal variable into precompose_unicode,
  and set it to 1 meaning true.
- Adjust and clean up test cases

The configuration core.precomposeunicode is still supported.

Signed-off-by: Torsten Bögershausen tbo...@web.de
---
 Documentation/config.txt |  2 +-
 builtin/init-db.c|  1 -
 cache.h  |  2 +-
 compat/precompose_utf8.c | 28 ++--
 compat/precompose_utf8.h |  1 -
 config.c |  2 +-
 environment.c|  2 +-
 git-compat-util.h|  1 -
 t/t0050-filesystem.sh|  1 +
 t/t3910-mac-os-precompose.sh | 14 --
 t/t7400-submodule-basic.sh   |  1 -
 11 files changed, 7 insertions(+), 48 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e0b923f..abe4cfa 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -240,7 +240,7 @@ core.precomposeunicode::
This option is only used by Mac OS implementation of Git.
When core.precomposeunicode=true, Git reverts the unicode decomposition
of filenames done by Mac OS. This is useful when sharing a repository
-   between Mac OS and Linux or Windows.
+   between Mac OS and Linux or Windows. True by default.
(Git for Windows 1.7.10 or higher is needed, or Git under cygwin 1.7).
When false, file names are handled fully transparent by Git,
which is backward compatible with older versions of Git.
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 78aa387..08854d5 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -290,7 +290,6 @@ static int create_default_files(const char *template_path)
strcpy(path + len, CoNfIg);
if (!access(path, F_OK))
git_config_set(core.ignorecase, true);
-   probe_utf8_pathname_composition(path, len);
}
 
return reinit;
diff --git a/cache.h b/cache.h
index 4c606ce..abb2cad 100644
--- a/cache.h
+++ b/cache.h
@@ -593,7 +593,7 @@ extern int read_replace_refs;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
-extern int precomposed_unicode;
+extern int precompose_unicode;
 
 /*
  * The character that begins a commented line in user-editable file
diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 7980abd..5396b91 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -36,30 +36,6 @@ static size_t has_non_ascii(const char *s, size_t maxlen, 
size_t *strlen_c)
 }
 
 
-void probe_utf8_pathname_composition(char *path, int len)
-{
-   static const char *auml_nfc = \xc3\xa4;
-   static const char *auml_nfd = \x61\xcc\x88;
-   int output_fd;
-   if (precomposed_unicode != -1)
-   return; /* We found it defined in the global config, respect it 
*/
-   strcpy(path + len, auml_nfc);
-   output_fd = open(path, O_CREAT|O_EXCL|O_RDWR, 0600);
-   if (output_fd = 0) {
-   close(output_fd);
-   strcpy(path + len, auml_nfd);
-   /* Indicate to the user, that we can configure it to true */
-   if (!access(path, R_OK))
-   git_config_set(core.precomposeunicode, false);
-   /* To be backward compatible, set precomposed_unicode to 0 */
-   precomposed_unicode = 0;
-   strcpy(path + len, auml_nfc);
-   if (unlink(path))
-   die_errno(_(failed to unlink '%s'), path);
-   }
-}
-
-
 void precompose_argv(int argc, const char **argv)
 {
int i = 0;
@@ -67,7 +43,7 @@ void precompose_argv(int argc, const char **argv)
char *newarg;
iconv_t ic_precompose;
 
-   if (precomposed_unicode != 1)
+   if (!precompose_unicode)
return;
 
ic_precompose = iconv_open(repo_encoding, path_encoding);
@@ -130,7 +106,7 @@ struct dirent_prec_psx *precompose_utf8_readdir(PREC_DIR 
*prec_dir)
prec_dir-dirent_nfc-d_ino  = res-d_ino;
prec_dir-dirent_nfc-d_type = res-d_type;
 
-   if ((precomposed_unicode == 1)  has_non_ascii(res-d_name, 
(size_t)-1, NULL)) {
+   if (precompose_unicode  has_non_ascii(res-d_name, 
(size_t)-1, NULL)) {
if (prec_dir-ic_precompose == (iconv_t)-1) {
die(iconv_open(%s,%s) failed, but needed:\n
precomposed unicode is not