Re: [PATCH v2] macos: lazily initialize iconv
Torsten Bögershausen writes: > After cloning, I set "git config core.precomposeunicode true". > > git is v1.7.12.rc2, > git_git/git is with your patch. > > git v1.7.12.rc2: > tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time git status ; > done 2>&1 | egrep "^user|^real|^sys" > real0m0.460s > user0m0.283s > sys 0m0.176s > > With lazy init of iconv: > tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time > ~/projects/git/git_git/git status ; done 2>&1 | egrep "^user|^real|^sys" > > real0m0.462s > user0m0.281s > sys 0m0.177s Thanks. So in short, even in a tree that theoretically should benefit from lazy initialization, it does not make things better in any measuable way. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
On 10.08.12 20:18, Junio C Hamano wrote: > Torsten Bögershausen writes: > >> On 31.07.12 20:37, Junio C Hamano wrote: >>> In practice, the majority of paths do not have utf8 that needs >>> the canonicalization. Lazily call iconv_open()/iconv_close() to >>> avoid unnecessary overhead. >>> >>> Signed-off-by: Junio C Hamano >>> Helped-by: Ralf Thielow >>> Helped-by: Linus Torvalds >>> --- >>> >>> * This is not even compile tested, so it needs testing and >>>benchmarking, as I do not even know how costly the calls to >>>open/close are when we do not have to call iconv() itself. >>> ... >> Hi Junio, >> >> thanks for the optimization. >> Tested-by: Torsten Bögershausen > Well, I didn't mean the correctness testing without numbers. The > correctness of the patch after a couple of people eyeballed it was > no longer a question. > > If the patch does not give any measuable performance difference to > people who exercise this codepath, it is not worth merging. And > that is not something I can't do myself without a Mac (nor I wish to > have one to be able to do so myself). > > Thanks. Really sorry for the confusion with the percentage, I should have written: The patch is tested OK, and we can even remove 2 lines of code, where we save and restore errno. They are no longer needed. Here some benchmarks on my system (Intel Core 2 Duo @ 3,06 GHz, L2-Cache:3 MB, Memory:4 GB) running git status on the linux source tree. After cloning, I set "git config core.precomposeunicode true". git is v1.7.12.rc2, git_git/git is with your patch. git v1.7.12.rc2: tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time git status ; done 2>&1 | egrep "^user|^real|^sys" real0m0.469s user0m0.283s sys 0m0.175s real0m0.461s user0m0.283s sys 0m0.170s real0m0.460s user0m0.283s sys 0m0.170s real0m0.461s user0m0.283s sys 0m0.177s real0m0.460s user0m0.283s sys 0m0.176s With lazy init of iconv: tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time ~/projects/git/git_git/git status ; done 2>&1 | egrep "^user|^real|^sys" real0m0.463s user0m0.282s sys 0m0.173s real0m0.460s user0m0.281s sys 0m0.172s real0m0.460s user0m0.281s sys 0m0.172s real0m0.463s user0m0.281s sys 0m0.175s real0m0.462s user0m0.281s sys 0m0.177s -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
Torsten Bögershausen writes: > On 31.07.12 20:37, Junio C Hamano wrote: >> In practice, the majority of paths do not have utf8 that needs >> the canonicalization. Lazily call iconv_open()/iconv_close() to >> avoid unnecessary overhead. >> >> Signed-off-by: Junio C Hamano >> Helped-by: Ralf Thielow >> Helped-by: Linus Torvalds >> --- >> >> * This is not even compile tested, so it needs testing and >>benchmarking, as I do not even know how costly the calls to >>open/close are when we do not have to call iconv() itself. >> ... > Hi Junio, > > thanks for the optimization. > Tested-by: Torsten Bögershausen Well, I didn't mean the correctness testing without numbers. The correctness of the patch after a couple of people eyeballed it was no longer a question. If the patch does not give any measuable performance difference to people who exercise this codepath, it is not worth merging. And that is not something I can't do myself without a Mac (nor I wish to have one to be able to do so myself). Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
On 31.07.12 20:37, Junio C Hamano wrote: > In practice, the majority of paths do not have utf8 that needs > the canonicalization. Lazily call iconv_open()/iconv_close() to > avoid unnecessary overhead. > > Signed-off-by: Junio C Hamano > Helped-by: Ralf Thielow > Helped-by: Linus Torvalds > --- > > * This is not even compile tested, so it needs testing and >benchmarking, as I do not even know how costly the calls to >open/close are when we do not have to call iconv() itself. > >This reroll corrects an obvious thinko pointed out by Ralf, and >gets rid of an extra iconv_opened field added unnecessarily in >the previous round. > >This was brought up by Linus in http://goo.gl/INWVc > > compat/precompose_utf8.c | 19 ++- > 1 file changed, 14 insertions(+), 5 deletions(-) > > diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c > index d40d1b3..79b5528 100644 > --- a/compat/precompose_utf8.c > +++ b/compat/precompose_utf8.c > @@ -67,7 +67,7 @@ void probe_utf8_pathname_composition(char *path, int len) > > void precompose_argv(int argc, const char **argv) > { > - int i = 0; > + int i; > const char *oldarg; > char *newarg; > iconv_t ic_precompose; > @@ -75,11 +75,19 @@ void precompose_argv(int argc, const char **argv) > if (precomposed_unicode != 1) > return; > > + /* Avoid iconv_open()/iconv_close() if there is nothing to convert */ > + for (i = 0; i < argc; i++) { > + if (has_utf8(argv[i], (size_t)-1, NULL)) > + break; > + } > + if (argc <= i) > + return; /* no utf8 found */ > + > ic_precompose = iconv_open(repo_encoding, path_encoding); > if (ic_precompose == (iconv_t) -1) > return; > > - while (i < argc) { > + for (i = 0; i < argc; i++) { > size_t namelen; > oldarg = argv[i]; > if (has_utf8(oldarg, (size_t)-1, &namelen)) { > @@ -87,7 +95,6 @@ void precompose_argv(int argc, const char **argv) > if (newarg) > argv[i] = newarg; > } > - i++; > } > iconv_close(ic_precompose); > } > @@ -106,8 +113,7 @@ PREC_DIR *precompose_utf8_opendir(const char *dirname) > return NULL; > } else { > int ret_errno = errno; > - prec_dir->ic_precompose = iconv_open(repo_encoding, > path_encoding); > - /* if iconv_open() fails, die() in readdir() if needed */ > + prec_dir->ic_precompose = (iconv_t)-1; > errno = ret_errno; > } > > @@ -136,6 +142,9 @@ struct dirent_prec_psx *precompose_utf8_readdir(PREC_DIR > *prec_dir) > prec_dir->dirent_nfc->d_type = res->d_type; > > if ((precomposed_unicode == 1) && has_utf8(res->d_name, > (size_t)-1, NULL)) { > + if (prec_dir->ic_precompose == (iconv_t)-1) > + prec_dir->ic_precompose = > + iconv_open(repo_encoding, > path_encoding); > if (prec_dir->ic_precompose == (iconv_t)-1) { > die("iconv_open(%s,%s) failed, but needed:\n" > "precomposed unicode is not > supported.\n" Hi Junio, thanks for the optimization. Tested-by: Torsten Bögershausen We can optimize the optimization with another 0.01 % ;-) diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c index 79b5528..93ae5de 100644 --- a/compat/precompose_utf8.c +++ b/compat/precompose_utf8.c @@ -112,9 +112,7 @@ PREC_DIR *precompose_utf8_opendir(const char *dirname) free(prec_dir); return NULL; } else { - int ret_errno = errno; prec_dir->ic_precompose = (iconv_t)-1; - errno = ret_errno; } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
Am 2012-07-31 22:39, schrieb Linus Torvalds: On Tue, Jul 31, 2012 at 1:16 PM, Junio C Hamano wrote: Eek. Oh, I agree. Doing a full character set conversion both ways is quite a bit more work. Not just write_entry() codepath that creates the final paths on the filesystem, you would need to touch lstat() calls that check the existence and freshness of the path, once you go that route. I am sure such a change can be made to work, but I am not sure how much we would gain from one. I think it might be interesting. I doubt it matters all that much any more in Western Europe (Unicode really does seem to have largely taken over), but I think Japan still uses Shift-JIS a lot. Although maybe that is starting to fade too. And it really is just a generalization of the OS X filesystem damage. Linus Hi, (I'm on vacation myself, so I migth apologize for answering very late.) I had done a fully fledges conversion back-and-force of file names. Having e.g. ISO-8859-1 on disk and UTF-8 in the repo. The basic idea came from Linus, and it needs conversion for open() fopen(), unlink(), readdir(), stat(), lstat(), rename() (and may be one or two more, I even added support in readlink()). After doing the whole thing ready, I realized that UTF-8 became more and more standard. At the same time people started to enhance msysgit to use UTF-8 as well. (Thanks to the contributors) My feeling was that the "market window" for such a generic file name conversion might be closed. So, is there still a need for such a feature in git? I wouldn't mind to re-base my pathch to master and post it within a couple of days/weeks. Thanks for git and all enhancements. /Torsten -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
On Tue, Jul 31, 2012 at 1:16 PM, Junio C Hamano wrote: > > Eek. Oh, I agree. Doing a full character set conversion both ways is quite a bit more work. > Not just write_entry() codepath that creates the final paths on the > filesystem, you would need to touch lstat() calls that check the > existence and freshness of the path, once you go that route. I am > sure such a change can be made to work, but I am not sure how much > we would gain from one. I think it might be interesting. I doubt it matters all that much any more in Western Europe (Unicode really does seem to have largely taken over), but I think Japan still uses Shift-JIS a lot. Although maybe that is starting to fade too. And it really is just a generalization of the OS X filesystem damage. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
Linus Torvalds writes: > On Tue, Jul 31, 2012 at 11:37 AM, Junio C Hamano wrote: >> >> * This is not even compile tested, so it needs testing and >>benchmarking, as I do not even know how costly the calls to >>open/close are when we do not have to call iconv() itself. > > Ok, so it's easily compile-tested: just add > > + COMPAT_OBJS += compat/precompose_utf8.o > + BASIC_CFLAGS += -DPRECOMPOSE_UNICODE > > to the makefile for Linux too. > > Actually testing how well it *works* is hard, since I don't really > have a mac (well, I do, but it no longer has OS X on it ;), and the > whole "utf-8-mac" thing does not make sense. Also the motivation for this change (not the original utf-8-mac one, which is not my code) is about not paying unnecessary iconv_open() overhead when we do not have to, so the measurement has to happen on Mac, not on Linux. > HOWEVER. I actually tested it with the conversion being from Latin1 to > UTF-8 instead, and it does interesting things, and kind of works. I > say "kind of", because for the case of the filesystem being in Latin1, > we actually have to convert things back to the filesystem character > set ... Eek. Not just write_entry() codepath that creates the final paths on the filesystem, you would need to touch lstat() calls that check the existence and freshness of the path, once you go that route. I am sure such a change can be made to work, but I am not sure how much we would gain from one. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
On Tue, Jul 31, 2012 at 11:37 AM, Junio C Hamano wrote: > > * This is not even compile tested, so it needs testing and >benchmarking, as I do not even know how costly the calls to >open/close are when we do not have to call iconv() itself. Ok, so it's easily compile-tested: just add + COMPAT_OBJS += compat/precompose_utf8.o + BASIC_CFLAGS += -DPRECOMPOSE_UNICODE to the makefile for Linux too. Actually testing how well it *works* is hard, since I don't really have a mac (well, I do, but it no longer has OS X on it ;), and the whole "utf-8-mac" thing does not make sense. HOWEVER. I actually tested it with the conversion being from Latin1 to UTF-8 instead, and it does interesting things, and kind of works. I say "kind of", because for the case of the filesystem being in Latin1, we actually have to convert things back to the filesystem character set when doing "open()" and "lstat()", and this patch obviously doesn't do that, because OS X does the conversion back to NFD on its own. But ACK on the patch. If I had more time, I'd actually be interested to do the generic case of namespace conversion, and we could make this a generic git feature - it's something I wanted to do long ago. However, right now I'm in the merge window and will go on a vacation to Finland after that, so I probably won't get around to it. I do have one suggestion: please rename the "has_utf8()" function to "has_nonascii()" too. Why? Because that's what the function actually does. It has nothing to do with testing for UTF-8 (the utf-8 rules are more complex than just "high bit set"), and *if* I ever get around to doing a more generic character set conversion for the filenames, the decision really would be about non-ASCII, not about non-UTF8. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
Ralf Thielow writes: > On Tue, Jul 31, 2012 at 8:37 PM, Junio C Hamano wrote: >> + /* Avoid iconv_open()/iconv_close() if there is nothing to convert */ >> + for (i = 0; i < argc; i++) { >> + if (has_utf8(argv[i], (size_t)-1, NULL)) >> + break; >> + } >> + if (argc <= i) >> + return; /* no utf8 found */ > > sorry, but "argc" can never be smaller than "i", right? Yeah, but it is idiomatic to have an inverse of the exit condition of the preceding for loop here to catch an early exit, and writing it as "if (i == argc)", while technically correct, would break the pattern. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] macos: lazily initialize iconv
On Tue, Jul 31, 2012 at 8:37 PM, Junio C Hamano wrote: > + /* Avoid iconv_open()/iconv_close() if there is nothing to convert */ > + for (i = 0; i < argc; i++) { > + if (has_utf8(argv[i], (size_t)-1, NULL)) > + break; > + } > + if (argc <= i) > + return; /* no utf8 found */ sorry, but "argc" can never be smaller than "i", right? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html