Re: [PATCH 3/4] pathspec: apply "*.c" optimization from exclude
Nguyen Thai Ngoc Duy writes: >> When you come to strcmp(), you see that string_len is 1, pattern_len >> is 3, and pattern is "oob". string+string_len-pattern_len = "oob", >> one past the beginning of the original string "foob". They match. >> >> Oops? > > Oops indead. I'll need to check exclude code too :( Ok, the one queued on 'pu' has been locally fixed when I sent the message you are responding to. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] pathspec: apply "*.c" optimization from exclude
On Tue, Nov 20, 2012 at 4:20 AM, Junio C Hamano wrote: >> $ time git rev-list --quiet HEAD -- '*.c' >> >> real0m40.770s >> user0m40.290s >> sys 0m0.256s >> >> With the patch >> >> $ time ~/w/git/git rev-list --quiet HEAD -- '*.c' >> >> real0m34.288s >> user0m33.997s >> sys 0m0.205s >> >> The above command is not supposed to be widely popular. > > Hrm, perhaps. I use "git grep -- \*.c" quite a lot, but > haven't seen use case for \*.c in the context of the "log" family. "git diff *.c" is also helpful (maybe "git diff *.[ch]" is more often used). But I suspect I/O dominates in both grep and diff cases. I just try to make sure matching code won't show up in profile. >> +#define PSF_ONESTAR 1 > > Together with the GF_ prefix in the previous, PSF_ prefix needs a > bit of in-code explanation. Is it just an RC3L (random combination > of 3 letters?) I'm pretty sure "PS" stands for pathspec. "F" is probably from fnmatch. Any suggestions? >> @@ -46,6 +46,12 @@ inline int git_fnmatch(const char *pattern, const char >> *string, >> pattern += prefix; >> string += prefix; >> } >> + if (flags & GF_ONESTAR) { >> + int pattern_len = strlen(++pattern); >> + int string_len = strlen(string); >> + return strcmp(pattern, >> + string + string_len - pattern_len); >> + } > > What happens when pattern="foo*oob" and string="foob"? > > The prefix match before this code determines that the literal prefix > in the pattern matches with the early part of the string, and makes > pattern="*oob" and string="b". > > When you come to strcmp(), you see that string_len is 1, pattern_len > is 3, and pattern is "oob". string+string_len-pattern_len = "oob", > one past the beginning of the original string "foob". They match. > > Oops? Oops indead. I'll need to check exclude code too :( > return (string_len < pattern_len) || > strcmp(pattern, string + string_len - pattern_len); > > perhaps? Yeah. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] pathspec: apply "*.c" optimization from exclude
Nguyễn Thái Ngọc Duy writes: > -O2 build on linux-2.6, without the patch: Before the result, can you briefly explain what '"*.c" optimization from exclude' the title talks about is? When a pattern contains only a single asterisk, e.g. "foo*bar", after literally comparing the leading part "foo" with the string, we can compare the tail of the string and make sure it matches "bar", instead of running fnmatch() on "*bar" against the remainder of the string. The funny thing was that trying to explain what the logic should do makes one realize potential issues in the implementation of that logic ;-) See below. > $ time git rev-list --quiet HEAD -- '*.c' > > real0m40.770s > user0m40.290s > sys 0m0.256s > > With the patch > > $ time ~/w/git/git rev-list --quiet HEAD -- '*.c' > > real0m34.288s > user0m33.997s > sys 0m0.205s > > The above command is not supposed to be widely popular. Hrm, perhaps. I use "git grep -- \*.c" quite a lot, but haven't seen use case for \*.c in the context of the "log" family. > diff --git a/cache.h b/cache.h > index bf031f1..d18f584 100644 > --- a/cache.h > +++ b/cache.h > @@ -473,6 +473,8 @@ extern int index_name_is_other(const struct index_state > *, const char *, int); > extern int ie_match_stat(const struct index_state *, struct cache_entry *, > struct stat *, unsigned int); > extern int ie_modified(const struct index_state *, struct cache_entry *, > struct stat *, unsigned int); > > +#define PSF_ONESTAR 1 Together with the GF_ prefix in the previous, PSF_ prefix needs a bit of in-code explanation. Is it just an RC3L (random combination of 3 letters?) > @@ -46,6 +46,12 @@ inline int git_fnmatch(const char *pattern, const char > *string, > pattern += prefix; > string += prefix; > } > + if (flags & GF_ONESTAR) { > + int pattern_len = strlen(++pattern); > + int string_len = strlen(string); > + return strcmp(pattern, > + string + string_len - pattern_len); > + } What happens when pattern="foo*oob" and string="foob"? The prefix match before this code determines that the literal prefix in the pattern matches with the early part of the string, and makes pattern="*oob" and string="b". When you come to strcmp(), you see that string_len is 1, pattern_len is 3, and pattern is "oob". string+string_len-pattern_len = "oob", one past the beginning of the original string "foob". They match. Oops? return (string_len < pattern_len) || strcmp(pattern, string + string_len - pattern_len); perhaps? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] pathspec: apply "*.c" optimization from exclude
-O2 build on linux-2.6, without the patch: $ time git rev-list --quiet HEAD -- '*.c' real0m40.770s user0m40.290s sys 0m0.256s With the patch $ time ~/w/git/git rev-list --quiet HEAD -- '*.c' real0m34.288s user0m33.997s sys 0m0.205s The above command is not supposed to be widely popular. It's chosen because it exercises pathspec matching a lot. The point is it cuts down matching time for popular patterns like *.c, which could be used as pathspec in other places. Signed-off-by: Nguyễn Thái Ngọc Duy --- cache.h | 3 +++ dir.c | 17 +++-- dir.h | 1 + tree-walk.c | 6 -- 4 files changed, 23 insertions(+), 4 deletions(-) diff --git a/cache.h b/cache.h index bf031f1..d18f584 100644 --- a/cache.h +++ b/cache.h @@ -473,6 +473,8 @@ extern int index_name_is_other(const struct index_state *, const char *, int); extern int ie_match_stat(const struct index_state *, struct cache_entry *, struct stat *, unsigned int); extern int ie_modified(const struct index_state *, struct cache_entry *, struct stat *, unsigned int); +#define PSF_ONESTAR 1 + struct pathspec { const char **raw; /* get_pathspec() result, not freed by free_pathspec() */ int nr; @@ -483,6 +485,7 @@ struct pathspec { const char *match; int len; int nowildcard_len; + int flags; } *items; }; diff --git a/dir.c b/dir.c index e4e6ca1..d00f240 100644 --- a/dir.c +++ b/dir.c @@ -46,6 +46,12 @@ inline int git_fnmatch(const char *pattern, const char *string, pattern += prefix; string += prefix; } + if (flags & GF_ONESTAR) { + int pattern_len = strlen(++pattern); + int string_len = strlen(string); + return strcmp(pattern, + string + string_len - pattern_len); + } return fnmatch(pattern, string, fnm_flags); } @@ -246,7 +252,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix, } if (item->nowildcard_len < item->len && - !git_fnmatch(match, name, 0, item->nowildcard_len - prefix)) + !git_fnmatch(match, name, +item->flags & PSF_ONESTAR ? GF_ONESTAR : 0, +item->nowildcard_len - prefix)) return MATCHED_FNMATCH; return 0; @@ -1446,8 +1454,13 @@ int init_pathspec(struct pathspec *pathspec, const char **paths) item->match = path; item->len = strlen(path); item->nowildcard_len = simple_length(path); - if (item->nowildcard_len < item->len) + item->flags = 0; + if (item->nowildcard_len < item->len) { pathspec->has_wildcard = 1; + if (path[item->nowildcard_len] == '*' && + no_wildcard(path + item->nowildcard_len + 1)) + item->flags |= PSF_ONESTAR; + } } qsort(pathspec->items, pathspec->nr, diff --git a/dir.h b/dir.h index 4cd5074..590532b 100644 --- a/dir.h +++ b/dir.h @@ -143,6 +143,7 @@ extern int fnmatch_icase(const char *pattern, const char *string, int flags); * The prefix part of pattern must not contains wildcards. */ #define GF_PATHNAME 1 +#define GF_ONESTAR 2 extern int git_fnmatch(const char *pattern, const char *string, int flags, int prefix); diff --git a/tree-walk.c b/tree-walk.c index 2fcf3c0..42fe610 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -628,7 +628,8 @@ enum interesting tree_entry_interesting(const struct name_entry *entry, if (item->nowildcard_len < item->len) { if (!git_fnmatch(match + baselen, entry->path, -0, item->nowildcard_len - baselen)) +item->flags & PSF_ONESTAR ? GF_ONESTAR : 0, +item->nowildcard_len - baselen)) return entry_interesting; /* @@ -654,7 +655,8 @@ match_wildcards: strbuf_add(base, entry->path, pathlen); if (!git_fnmatch(match, base->buf + base_offset, -0, item->nowildcard_len)) { +item->flags & PSF_ONESTAR ? GF_ONESTAR : 0, +item->nowildcard_len)) { strbuf_setlen(base, base_offset + baselen); return entry_interesting; } -- 1.8.0.rc2.23.g1fb49df -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html