Re: [PATCH 3/4] pathspec: apply "*.c" optimization from exclude

2012-11-20 Thread Junio C Hamano
Nguyen Thai Ngoc Duy  writes:

>> When you come to strcmp(), you see that string_len is 1, pattern_len
>> is 3, and pattern is "oob".  string+string_len-pattern_len = "oob",
>> one past the beginning of the original string "foob".  They match.
>>
>> Oops?
>
> Oops indead. I'll need to check exclude code too :(

Ok, the one queued on 'pu' has been locally fixed when I sent the
message you are responding to.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] pathspec: apply "*.c" optimization from exclude

2012-11-20 Thread Nguyen Thai Ngoc Duy
On Tue, Nov 20, 2012 at 4:20 AM, Junio C Hamano  wrote:
>> $ time git rev-list --quiet HEAD -- '*.c'
>>
>> real0m40.770s
>> user0m40.290s
>> sys 0m0.256s
>>
>> With the patch
>>
>> $ time ~/w/git/git rev-list --quiet HEAD -- '*.c'
>>
>> real0m34.288s
>> user0m33.997s
>> sys 0m0.205s
>>
>> The above command is not supposed to be widely popular.
>
> Hrm, perhaps.  I use "git grep  -- \*.c" quite a lot, but
> haven't seen use case for \*.c in the context of the "log" family.

"git diff *.c" is also helpful (maybe "git diff *.[ch]" is more often
used). But I suspect I/O dominates in both grep and diff cases. I just
try to make sure matching code won't show up in profile.


>> +#define PSF_ONESTAR 1
>
> Together with the GF_ prefix in the previous, PSF_ prefix needs a
> bit of in-code explanation.  Is it just an RC3L (random combination
> of 3 letters?)

I'm pretty sure "PS" stands for pathspec. "F" is probably from
fnmatch. Any suggestions?


>> @@ -46,6 +46,12 @@ inline int git_fnmatch(const char *pattern, const char 
>> *string,
>>   pattern += prefix;
>>   string += prefix;
>>   }
>> + if (flags & GF_ONESTAR) {
>> + int pattern_len = strlen(++pattern);
>> + int string_len = strlen(string);
>> + return strcmp(pattern,
>> +   string + string_len - pattern_len);
>> + }
>
> What happens when pattern="foo*oob" and string="foob"?
>
> The prefix match before this code determines that the literal prefix
> in the pattern matches with the early part of the string, and makes
> pattern="*oob" and string="b".
>
> When you come to strcmp(), you see that string_len is 1, pattern_len
> is 3, and pattern is "oob".  string+string_len-pattern_len = "oob",
> one past the beginning of the original string "foob".  They match.
>
> Oops?

Oops indead. I'll need to check exclude code too :(


> return (string_len < pattern_len) ||
> strcmp(pattern, string + string_len - pattern_len);
>
> perhaps?

Yeah.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] pathspec: apply "*.c" optimization from exclude

2012-11-19 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> -O2 build on linux-2.6, without the patch:

Before the result, can you briefly explain what '"*.c" optimization
from exclude' the title talks about is?

When a pattern contains only a single asterisk, e.g. "foo*bar",
after literally comparing the leading part "foo" with the
string, we can compare the tail of the string and make sure it
matches "bar", instead of running fnmatch() on "*bar" against
the remainder of the string.

The funny thing was that trying to explain what the logic should do
makes one realize potential issues in the implementation of that
logic ;-)  See below.

> $ time git rev-list --quiet HEAD -- '*.c'
>
> real0m40.770s
> user0m40.290s
> sys 0m0.256s
>
> With the patch
>
> $ time ~/w/git/git rev-list --quiet HEAD -- '*.c'
>
> real0m34.288s
> user0m33.997s
> sys 0m0.205s
>
> The above command is not supposed to be widely popular.

Hrm, perhaps.  I use "git grep  -- \*.c" quite a lot, but
haven't seen use case for \*.c in the context of the "log" family.

> diff --git a/cache.h b/cache.h
> index bf031f1..d18f584 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -473,6 +473,8 @@ extern int index_name_is_other(const struct index_state 
> *, const char *, int);
>  extern int ie_match_stat(const struct index_state *, struct cache_entry *, 
> struct stat *, unsigned int);
>  extern int ie_modified(const struct index_state *, struct cache_entry *, 
> struct stat *, unsigned int);
>  
> +#define PSF_ONESTAR 1

Together with the GF_ prefix in the previous, PSF_ prefix needs a
bit of in-code explanation.  Is it just an RC3L (random combination
of 3 letters?)

> @@ -46,6 +46,12 @@ inline int git_fnmatch(const char *pattern, const char 
> *string,
>   pattern += prefix;
>   string += prefix;
>   }
> + if (flags & GF_ONESTAR) {
> + int pattern_len = strlen(++pattern);
> + int string_len = strlen(string);
> + return strcmp(pattern,
> +   string + string_len - pattern_len);
> + }

What happens when pattern="foo*oob" and string="foob"?

The prefix match before this code determines that the literal prefix
in the pattern matches with the early part of the string, and makes
pattern="*oob" and string="b".

When you come to strcmp(), you see that string_len is 1, pattern_len
is 3, and pattern is "oob".  string+string_len-pattern_len = "oob",
one past the beginning of the original string "foob".  They match.

Oops?

return (string_len < pattern_len) ||
strcmp(pattern, string + string_len - pattern_len);

perhaps?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] pathspec: apply "*.c" optimization from exclude

2012-11-18 Thread Nguyễn Thái Ngọc Duy
-O2 build on linux-2.6, without the patch:

$ time git rev-list --quiet HEAD -- '*.c'

real0m40.770s
user0m40.290s
sys 0m0.256s

With the patch

$ time ~/w/git/git rev-list --quiet HEAD -- '*.c'

real0m34.288s
user0m33.997s
sys 0m0.205s

The above command is not supposed to be widely popular. It's chosen
because it exercises pathspec matching a lot. The point is it cuts
down matching time for popular patterns like *.c, which could be used
as pathspec in other places.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 cache.h |  3 +++
 dir.c   | 17 +++--
 dir.h   |  1 +
 tree-walk.c |  6 --
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index bf031f1..d18f584 100644
--- a/cache.h
+++ b/cache.h
@@ -473,6 +473,8 @@ extern int index_name_is_other(const struct index_state *, 
const char *, int);
 extern int ie_match_stat(const struct index_state *, struct cache_entry *, 
struct stat *, unsigned int);
 extern int ie_modified(const struct index_state *, struct cache_entry *, 
struct stat *, unsigned int);
 
+#define PSF_ONESTAR 1
+
 struct pathspec {
const char **raw; /* get_pathspec() result, not freed by 
free_pathspec() */
int nr;
@@ -483,6 +485,7 @@ struct pathspec {
const char *match;
int len;
int nowildcard_len;
+   int flags;
} *items;
 };
 
diff --git a/dir.c b/dir.c
index e4e6ca1..d00f240 100644
--- a/dir.c
+++ b/dir.c
@@ -46,6 +46,12 @@ inline int git_fnmatch(const char *pattern, const char 
*string,
pattern += prefix;
string += prefix;
}
+   if (flags & GF_ONESTAR) {
+   int pattern_len = strlen(++pattern);
+   int string_len = strlen(string);
+   return strcmp(pattern,
+ string + string_len - pattern_len);
+   }
return fnmatch(pattern, string, fnm_flags);
 }
 
@@ -246,7 +252,9 @@ static int match_pathspec_item(const struct pathspec_item 
*item, int prefix,
}
 
if (item->nowildcard_len < item->len &&
-   !git_fnmatch(match, name, 0, item->nowildcard_len - prefix))
+   !git_fnmatch(match, name,
+item->flags & PSF_ONESTAR ? GF_ONESTAR : 0,
+item->nowildcard_len - prefix))
return MATCHED_FNMATCH;
 
return 0;
@@ -1446,8 +1454,13 @@ int init_pathspec(struct pathspec *pathspec, const char 
**paths)
item->match = path;
item->len = strlen(path);
item->nowildcard_len = simple_length(path);
-   if (item->nowildcard_len < item->len)
+   item->flags = 0;
+   if (item->nowildcard_len < item->len) {
pathspec->has_wildcard = 1;
+   if (path[item->nowildcard_len] == '*' &&
+   no_wildcard(path + item->nowildcard_len + 1))
+   item->flags |= PSF_ONESTAR;
+   }
}
 
qsort(pathspec->items, pathspec->nr,
diff --git a/dir.h b/dir.h
index 4cd5074..590532b 100644
--- a/dir.h
+++ b/dir.h
@@ -143,6 +143,7 @@ extern int fnmatch_icase(const char *pattern, const char 
*string, int flags);
  * The prefix part of pattern must not contains wildcards.
  */
 #define GF_PATHNAME 1
+#define GF_ONESTAR  2
 
 extern int git_fnmatch(const char *pattern, const char *string,
   int flags, int prefix);
diff --git a/tree-walk.c b/tree-walk.c
index 2fcf3c0..42fe610 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -628,7 +628,8 @@ enum interesting tree_entry_interesting(const struct 
name_entry *entry,
 
if (item->nowildcard_len < item->len) {
if (!git_fnmatch(match + baselen, entry->path,
-0, item->nowildcard_len - 
baselen))
+item->flags & PSF_ONESTAR ? 
GF_ONESTAR : 0,
+item->nowildcard_len - 
baselen))
return entry_interesting;
 
/*
@@ -654,7 +655,8 @@ match_wildcards:
strbuf_add(base, entry->path, pathlen);
 
if (!git_fnmatch(match, base->buf + base_offset,
-0, item->nowildcard_len)) {
+item->flags & PSF_ONESTAR ? GF_ONESTAR : 0,
+item->nowildcard_len)) {
strbuf_setlen(base, base_offset + baselen);
return entry_interesting;
}
-- 
1.8.0.rc2.23.g1fb49df

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html