Re: [PATCH 3/3] match_basename: use strncmp instead of strcmp
On Fri, Mar 08, 2013 at 11:50:04PM -0800, Junio C Hamano wrote: At the same time, I wonder if we can take advantage of the fact that these call sites only care about equality and not ordering. I did an RFC-patch for that (that I mistakenly didn't sent as a reply to this e-mail). And I believe that you're correct. My solution is inspired of curl's strequal. Is the reason for git not to care about lower/upper-case for beeing able to support windows? Or is there any other smart reason? I was also thinking about discarding files by looking at their modification date. If the modification timestamp is older than/or equal to the latest commit, there's probably no reason for examine that file any further. I'm not sure about the side effects this may imply though. I think they can be quite nasty. Is this something worth digging more in or am I already on the wrong path? -- Med vänliga hälsningar Fredrik Gustafsson tel: 0733-608274 e-post: iv...@iveqy.com -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] match_basename: use strncmp instead of strcmp
On Sat, Mar 9, 2013 at 2:50 PM, Junio C Hamano gits...@pobox.com wrote: Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: strncmp provides length information, compared to strcmp, which could be taken advantage by the implementation. Even better, we could check if the lengths are equal before calling strncmp, eliminating a bit of strncmp calls. I think I am a bit slower than my usual self tonight, but I am utterly confused by the above. strncmp() compares _only_ up to the first n bytes, so when you are using it for equality, it is not we could check length, but is we MUST check they match to the length of the shorter string, if you want to obtain not just faster but correct result. Am I mistaken? Yeap, the description is a bit misleading. Although you could get away with length check by doing !strncmp(a, b, strlen(a)+1). Even if you are using strcmp() that yields ordering not just equality, it can return a correct result as soon as it hits the first bytes that are different; I doubt using strncmp() contributes to the performance very much. Comparing lengths before doing byte-for-byte comparison could help because you can reject two strings with different lengths without looking at them. At the same time, I wonder if we can take advantage of the fact that these call sites only care about equality and not ordering. I tried to push it further and compare hash before do the actual string comparison. It slowed things down (hopefully because the cost of hashing, the same one from name-hash.c, not because I did it wrong). -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] match_basename: use strncmp instead of strcmp
strncmp provides length information, compared to strcmp, which could be taken advantage by the implementation. Even better, we could check if the lengths are equal before calling strncmp, eliminating a bit of strncmp calls. before after user0m0.519s0m0.489s user0m0.521s0m0.504s user0m0.523s0m0.507s user0m0.532s0m0.510s user0m0.534s0m0.513s user0m0.536s0m0.514s user0m0.537s0m0.522s user0m0.545s0m0.523s user0m0.546s0m0.527s user0m0.548s0m0.529s While at there, fix an inconsistency about pattern/patternlen in how attr handles EXC_FLAG_MUSTBEDIR. When parse_exclude_pattern detects this flag, it sets patternlen _not_ to include the trailing slash and expects the caller to trim it. add_exclude does, parse_attr_line does not. In attr.c, the pattern could be foo/ while patternlen tells us it only has 3 chars. Some functions do not care about patternlen and will see the pattern as foo/ while others may see it as foo. This patch makes patternlen 4 in this case. (Although for a piece of mind, perhaps we should trim it to foo like exclude, and never pass a pathname like abc/ to match_{base,path}name) Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com --- attr.c | 2 ++ dir.c | 8 +--- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/attr.c b/attr.c index e2f9377..1818ba5 100644 --- a/attr.c +++ b/attr.c @@ -255,6 +255,8 @@ static struct match_attr *parse_attr_line(const char *line, const char *src, res-u.pat.patternlen, res-u.pat.flags, res-u.pat.nowildcardlen); + if (res-u.pat.flags EXC_FLAG_MUSTBEDIR) + res-u.pat.patternlen++; if (res-u.pat.flags EXC_FLAG_NEGATIVE) { warning(_(Negative patterns are ignored in git attributes\n Use '\\!' for literal leading exclamation.)); diff --git a/dir.c b/dir.c index f58320d..2a91d14 100644 --- a/dir.c +++ b/dir.c @@ -610,12 +610,14 @@ int match_basename(const char *basename, int basenamelen, int flags) { if (prefix == patternlen) { - if (!strcmp_icase(pattern, basename)) + if (patternlen == basenamelen + !strncmp_icase(pattern, basename, patternlen)) return 1; } else if (flags EXC_FLAG_ENDSWITH) { if (patternlen - 1 = basenamelen - !strcmp_icase(pattern + 1, - basename + basenamelen - patternlen + 1)) + !strncmp_icase(pattern + 1, + basename + basenamelen - patternlen + 1, + patternlen - 1)) return 1; } else { if (fnmatch_icase(pattern, basename, 0) == 0) -- 1.8.1.2.536.gf441e6d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] match_basename: use strncmp instead of strcmp
Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: strncmp provides length information, compared to strcmp, which could be taken advantage by the implementation. Even better, we could check if the lengths are equal before calling strncmp, eliminating a bit of strncmp calls. I think I am a bit slower than my usual self tonight, but I am utterly confused by the above. strncmp() compares _only_ up to the first n bytes, so when you are using it for equality, it is not we could check length, but is we MUST check they match to the length of the shorter string, if you want to obtain not just faster but correct result. Am I mistaken? Even if you are using strcmp() that yields ordering not just equality, it can return a correct result as soon as it hits the first bytes that are different; I doubt using strncmp() contributes to the performance very much. Comparing lengths before doing byte-for-byte comparison could help because you can reject two strings with different lengths without looking at them. At the same time, I wonder if we can take advantage of the fact that these call sites only care about equality and not ordering. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html