Re: [PATCH v2] macos: lazily initialize iconv

2012-08-10 Thread Junio C Hamano
Torsten Bögershausen  writes:

> After cloning, I set "git config core.precomposeunicode true".
>
> git is v1.7.12.rc2,
> git_git/git is with your patch.
> 
> git v1.7.12.rc2:
> tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time git status  ; 
> done 2>&1 | egrep "^user|^real|^sys"
> real0m0.460s
> user0m0.283s
> sys 0m0.176s
>
> With lazy init of iconv:
> tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time 
> ~/projects/git/git_git/git status  ; done 2>&1 | egrep "^user|^real|^sys"
>
> real0m0.462s
> user0m0.281s
> sys 0m0.177s

Thanks.

So in short, even in a tree that theoretically should benefit from
lazy initialization, it does not make things better in any measuable
way.




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-08-10 Thread Torsten Bögershausen
On 10.08.12 20:18, Junio C Hamano wrote:
> Torsten Bögershausen  writes:
>
>> On 31.07.12 20:37, Junio C Hamano wrote:
>>> In practice, the majority of paths do not have utf8 that needs
>>> the canonicalization. Lazily call iconv_open()/iconv_close() to
>>> avoid unnecessary overhead.
>>>
>>> Signed-off-by: Junio C Hamano 
>>> Helped-by: Ralf Thielow 
>>> Helped-by: Linus Torvalds 
>>> ---
>>>
>>>  * This is not even compile tested, so it needs testing and
>>>benchmarking, as I do not even know how costly the calls to
>>>open/close are when we do not have to call iconv() itself.
>>> ...
>> Hi Junio,
>>
>> thanks for the optimization.
>> Tested-by: Torsten Bögershausen 
> Well, I didn't mean the correctness testing without numbers.  The
> correctness of the patch after a couple of people eyeballed it was
> no longer a question.
>
> If the patch does not give any measuable performance difference to
> people who exercise this codepath, it is not worth merging.  And
> that is not something I can't do myself without a Mac (nor I wish to
> have one to be able to do so myself).
>
> Thanks.
Really sorry for the confusion with the percentage, I should have written:
The patch is tested OK, and we can even remove 2 lines of code,
where we save and restore errno.
They are no longer needed.

Here some benchmarks on my system
(Intel Core 2 Duo @   3,06 GHz, L2-Cache:3 MB,  Memory:4 GB)
running git status on the linux source tree.

After cloning, I set "git config core.precomposeunicode true".

git is v1.7.12.rc2,
git_git/git is with your patch.

git v1.7.12.rc2:
tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time git status  ; 
done 2>&1 | egrep "^user|^real|^sys"
real0m0.469s
user0m0.283s
sys 0m0.175s

real0m0.461s
user0m0.283s
sys 0m0.170s

real0m0.460s
user0m0.283s
sys 0m0.170s

real0m0.461s
user0m0.283s
sys 0m0.177s

real0m0.460s
user0m0.283s
sys 0m0.176s

With lazy init of iconv:
tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time 
~/projects/git/git_git/git status  ; done 2>&1 | egrep "^user|^real|^sys"
real0m0.463s
user0m0.282s
sys 0m0.173s

real0m0.460s
user0m0.281s
sys 0m0.172s

real0m0.460s
user0m0.281s
sys 0m0.172s

real0m0.463s
user0m0.281s
sys 0m0.175s

real0m0.462s
user0m0.281s
sys 0m0.177s




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-08-10 Thread Junio C Hamano
Torsten Bögershausen  writes:

> On 31.07.12 20:37, Junio C Hamano wrote:
>> In practice, the majority of paths do not have utf8 that needs
>> the canonicalization. Lazily call iconv_open()/iconv_close() to
>> avoid unnecessary overhead.
>>
>> Signed-off-by: Junio C Hamano 
>> Helped-by: Ralf Thielow 
>> Helped-by: Linus Torvalds 
>> ---
>>
>>  * This is not even compile tested, so it needs testing and
>>benchmarking, as I do not even know how costly the calls to
>>open/close are when we do not have to call iconv() itself.
>> ...
> Hi Junio,
>
> thanks for the optimization.
> Tested-by: Torsten Bögershausen 

Well, I didn't mean the correctness testing without numbers.  The
correctness of the patch after a couple of people eyeballed it was
no longer a question.

If the patch does not give any measuable performance difference to
people who exercise this codepath, it is not worth merging.  And
that is not something I can't do myself without a Mac (nor I wish to
have one to be able to do so myself).

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-08-10 Thread Torsten Bögershausen
On 31.07.12 20:37, Junio C Hamano wrote:
> In practice, the majority of paths do not have utf8 that needs
> the canonicalization. Lazily call iconv_open()/iconv_close() to
> avoid unnecessary overhead.
>
> Signed-off-by: Junio C Hamano 
> Helped-by: Ralf Thielow 
> Helped-by: Linus Torvalds 
> ---
>
>  * This is not even compile tested, so it needs testing and
>benchmarking, as I do not even know how costly the calls to
>open/close are when we do not have to call iconv() itself.
>
>This reroll corrects an obvious thinko pointed out by Ralf, and
>gets rid of an extra iconv_opened field added unnecessarily in
>the previous round.
>
>This was brought up by Linus in http://goo.gl/INWVc
>
>  compat/precompose_utf8.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
> index d40d1b3..79b5528 100644
> --- a/compat/precompose_utf8.c
> +++ b/compat/precompose_utf8.c
> @@ -67,7 +67,7 @@ void probe_utf8_pathname_composition(char *path, int len)
>  
>  void precompose_argv(int argc, const char **argv)
>  {
> - int i = 0;
> + int i;
>   const char *oldarg;
>   char *newarg;
>   iconv_t ic_precompose;
> @@ -75,11 +75,19 @@ void precompose_argv(int argc, const char **argv)
>   if (precomposed_unicode != 1)
>   return;
>  
> + /* Avoid iconv_open()/iconv_close() if there is nothing to convert */
> + for (i = 0; i < argc; i++) {
> + if (has_utf8(argv[i], (size_t)-1, NULL))
> + break;
> + }
> + if (argc <= i)
> + return; /* no utf8 found */
> +
>   ic_precompose = iconv_open(repo_encoding, path_encoding);
>   if (ic_precompose == (iconv_t) -1)
>   return;
>  
> - while (i < argc) {
> + for (i = 0; i < argc; i++) {
>   size_t namelen;
>   oldarg = argv[i];
>   if (has_utf8(oldarg, (size_t)-1, &namelen)) {
> @@ -87,7 +95,6 @@ void precompose_argv(int argc, const char **argv)
>   if (newarg)
>   argv[i] = newarg;
>   }
> - i++;
>   }
>   iconv_close(ic_precompose);
>  }
> @@ -106,8 +113,7 @@ PREC_DIR *precompose_utf8_opendir(const char *dirname)
>   return NULL;
>   } else {
>   int ret_errno = errno;
> - prec_dir->ic_precompose = iconv_open(repo_encoding, 
> path_encoding);
> - /* if iconv_open() fails, die() in readdir() if needed */
> + prec_dir->ic_precompose = (iconv_t)-1;
>   errno = ret_errno;
>   }
>  
> @@ -136,6 +142,9 @@ struct dirent_prec_psx *precompose_utf8_readdir(PREC_DIR 
> *prec_dir)
>   prec_dir->dirent_nfc->d_type = res->d_type;
>  
>   if ((precomposed_unicode == 1) && has_utf8(res->d_name, 
> (size_t)-1, NULL)) {
> + if (prec_dir->ic_precompose == (iconv_t)-1)
> + prec_dir->ic_precompose =
> + iconv_open(repo_encoding, 
> path_encoding);
>   if (prec_dir->ic_precompose == (iconv_t)-1) {
>   die("iconv_open(%s,%s) failed, but needed:\n"
>   "precomposed unicode is not 
> supported.\n"
Hi Junio,

thanks for the optimization.
Tested-by: Torsten Bögershausen 

We can optimize the optimization with another 0.01 %  ;-)

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 79b5528..93ae5de 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -112,9 +112,7 @@ PREC_DIR *precompose_utf8_opendir(const char *dirname)
free(prec_dir);
return NULL;
} else {
-   int ret_errno = errno;
prec_dir->ic_precompose = (iconv_t)-1;
-   errno = ret_errno;
}


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-08-01 Thread Torsten Bögershausen

Am 2012-07-31 22:39, schrieb Linus Torvalds:

On Tue, Jul 31, 2012 at 1:16 PM, Junio C Hamano  wrote:


Eek.


Oh, I agree. Doing a full character set conversion both ways is quite
a bit more work.


Not just write_entry() codepath that creates the final paths on the
filesystem, you would need to touch lstat() calls that check the
existence and freshness of the path, once you go that route.  I am
sure such a change can be made to work, but I am not sure how much
we would gain from one.


I think it might be interesting. I doubt it matters all that much any
more in Western Europe (Unicode really does seem to have largely taken
over), but I think Japan still uses Shift-JIS a lot.

Although maybe that is starting to fade too.

And it really is just a generalization of the OS X filesystem damage.

 Linus


Hi,
(I'm on vacation myself, so I migth apologize for answering very late.)

I had done a fully fledges conversion back-and-force of file names.

Having e.g. ISO-8859-1 on disk and UTF-8 in the repo.

The basic idea came from Linus, and it needs conversion for
open() fopen(), unlink(), readdir(), stat(), lstat(), rename()
(and may be one or two more, I even added support in readlink()).

After doing the whole thing ready, I realized that UTF-8 became more and 
more standard.


At the same time people started to enhance msysgit to use UTF-8 as well. 
(Thanks to the contributors)


My feeling was that the "market window" for such a generic file name 
conversion might be closed.


So, is there still a need for such a feature in git?

I wouldn't mind to re-base my pathch to master and post it within a 
couple of days/weeks.


Thanks for git and all enhancements.
/Torsten




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-07-31 Thread Linus Torvalds
On Tue, Jul 31, 2012 at 1:16 PM, Junio C Hamano  wrote:
>
> Eek.

Oh, I agree. Doing a full character set conversion both ways is quite
a bit more work.

> Not just write_entry() codepath that creates the final paths on the
> filesystem, you would need to touch lstat() calls that check the
> existence and freshness of the path, once you go that route.  I am
> sure such a change can be made to work, but I am not sure how much
> we would gain from one.

I think it might be interesting. I doubt it matters all that much any
more in Western Europe (Unicode really does seem to have largely taken
over), but I think Japan still uses Shift-JIS a lot.

Although maybe that is starting to fade too.

And it really is just a generalization of the OS X filesystem damage.

Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-07-31 Thread Junio C Hamano
Linus Torvalds  writes:

> On Tue, Jul 31, 2012 at 11:37 AM, Junio C Hamano  wrote:
>>
>>  * This is not even compile tested, so it needs testing and
>>benchmarking, as I do not even know how costly the calls to
>>open/close are when we do not have to call iconv() itself.
>
> Ok, so it's easily compile-tested: just add
>
> +   COMPAT_OBJS += compat/precompose_utf8.o
> +   BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
>
> to the makefile for Linux too.
>
> Actually testing how well it *works* is hard, since I don't really
> have a mac (well, I do, but it no longer has OS X on it ;), and the
> whole "utf-8-mac" thing does not make sense.

Also the motivation for this change (not the original utf-8-mac one,
which is not my code) is about not paying unnecessary iconv_open()
overhead when we do not have to, so the measurement has to happen on
Mac, not on Linux.

> HOWEVER. I actually tested it with the conversion being from Latin1 to
> UTF-8 instead, and it does interesting things, and kind of works. I
> say "kind of", because for the case of the filesystem being in Latin1,
> we actually have to convert things back to the filesystem character
> set ...

Eek.

Not just write_entry() codepath that creates the final paths on the
filesystem, you would need to touch lstat() calls that check the
existence and freshness of the path, once you go that route.  I am
sure such a change can be made to work, but I am not sure how much
we would gain from one.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-07-31 Thread Linus Torvalds
On Tue, Jul 31, 2012 at 11:37 AM, Junio C Hamano  wrote:
>
>  * This is not even compile tested, so it needs testing and
>benchmarking, as I do not even know how costly the calls to
>open/close are when we do not have to call iconv() itself.

Ok, so it's easily compile-tested: just add

+   COMPAT_OBJS += compat/precompose_utf8.o
+   BASIC_CFLAGS += -DPRECOMPOSE_UNICODE

to the makefile for Linux too.

Actually testing how well it *works* is hard, since I don't really
have a mac (well, I do, but it no longer has OS X on it ;), and the
whole "utf-8-mac" thing does not make sense.

HOWEVER. I actually tested it with the conversion being from Latin1 to
UTF-8 instead, and it does interesting things, and kind of works. I
say "kind of", because for the case of the filesystem being in Latin1,
we actually have to convert things back to the filesystem character
set when doing "open()" and "lstat()", and this patch obviously
doesn't do that, because OS X does the conversion back to NFD on its
own.

But ACK on the patch.

If I had more time, I'd actually be interested to do the generic case
of namespace conversion, and we could make this a generic git feature
- it's something I wanted to do long ago. However, right now I'm in
the merge window and will go on a vacation to Finland after that, so I
probably won't get around to it.

I do have one suggestion: please rename the "has_utf8()" function to
"has_nonascii()" too. Why? Because that's what the function actually
does. It has nothing to do with testing for UTF-8 (the utf-8 rules are
more complex than just "high bit set"), and *if* I ever get around to
doing a more generic character set conversion for the filenames, the
decision really would be about non-ASCII, not about non-UTF8.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-07-31 Thread Junio C Hamano
Ralf Thielow  writes:

> On Tue, Jul 31, 2012 at 8:37 PM, Junio C Hamano  wrote:
>> +   /* Avoid iconv_open()/iconv_close() if there is nothing to convert */
>> +   for (i = 0; i < argc; i++) {
>> +   if (has_utf8(argv[i], (size_t)-1, NULL))
>> +   break;
>> +   }
>> +   if (argc <= i)
>> +   return; /* no utf8 found */
>
> sorry, but "argc" can never be smaller than "i", right?

Yeah, but it is idiomatic to have an inverse of the exit condition
of the preceding for loop here to catch an early exit, and writing
it as "if (i == argc)", while technically correct, would break the
pattern.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] macos: lazily initialize iconv

2012-07-31 Thread Ralf Thielow
On Tue, Jul 31, 2012 at 8:37 PM, Junio C Hamano  wrote:
> +   /* Avoid iconv_open()/iconv_close() if there is nothing to convert */
> +   for (i = 0; i < argc; i++) {
> +   if (has_utf8(argv[i], (size_t)-1, NULL))
> +   break;
> +   }
> +   if (argc <= i)
> +   return; /* no utf8 found */

sorry, but "argc" can never be smaller than "i", right?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html