Re: [PATCH 3/3] match_basename: use strncmp instead of strcmp

2013-03-09 Thread Fredrik Gustafsson
On Fri, Mar 08, 2013 at 11:50:04PM -0800, Junio C Hamano wrote:
 At the same time, I wonder if we can take advantage of the fact that
 these call sites only care about equality and not ordering.

I did an RFC-patch for that (that I mistakenly didn't sent as a reply to
this e-mail). And I believe that you're correct. My solution is inspired
of curl's strequal.

Is the reason for git not to care about lower/upper-case for beeing able
to support windows? Or is there any other smart reason?

I was also thinking about discarding files by looking at their
modification date. If the modification timestamp is older than/or equal to
the latest commit, there's probably no reason for examine that file any
further. I'm not sure about the side effects this may imply though. I
think they can be quite nasty. Is this something worth digging more in
or am I already on the wrong path?

-- 
Med vänliga hälsningar
Fredrik Gustafsson

tel: 0733-608274
e-post: iv...@iveqy.com
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] match_basename: use strncmp instead of strcmp

2013-03-09 Thread Duy Nguyen
On Sat, Mar 9, 2013 at 2:50 PM, Junio C Hamano gits...@pobox.com wrote:
 Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:

 strncmp provides length information, compared to strcmp, which could
 be taken advantage by the implementation. Even better, we could check
 if the lengths are equal before calling strncmp, eliminating a bit of
 strncmp calls.

 I think I am a bit slower than my usual self tonight, but I am
 utterly confused by the above.

 strncmp() compares _only_ up to the first n bytes, so when you are
 using it for equality, it is not we could check length, but is we
 MUST check they match to the length of the shorter string, if you
 want to obtain not just faster but correct result.

 Am I mistaken?

Yeap, the description is a bit misleading. Although you could get away
with length check by doing !strncmp(a, b, strlen(a)+1).

 Even if you are using strcmp() that yields ordering not just
 equality, it can return a correct result as soon as it hits the
 first bytes that are different; I doubt using strncmp() contributes
 to the performance very much.  Comparing lengths before doing
 byte-for-byte comparison could help because you can reject two
 strings with different lengths without looking at them.

 At the same time, I wonder if we can take advantage of the fact that
 these call sites only care about equality and not ordering.

I tried to push it further and compare hash before do the actual
string comparison. It slowed things down (hopefully because the cost
of hashing, the same one from name-hash.c, not because I did it
wrong).
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] match_basename: use strncmp instead of strcmp

2013-03-08 Thread Nguyễn Thái Ngọc Duy
strncmp provides length information, compared to strcmp, which could
be taken advantage by the implementation. Even better, we could check
if the lengths are equal before calling strncmp, eliminating a bit of
strncmp calls.

before  after
user0m0.519s0m0.489s
user0m0.521s0m0.504s
user0m0.523s0m0.507s
user0m0.532s0m0.510s
user0m0.534s0m0.513s
user0m0.536s0m0.514s
user0m0.537s0m0.522s
user0m0.545s0m0.523s
user0m0.546s0m0.527s
user0m0.548s0m0.529s

While at there, fix an inconsistency about pattern/patternlen in how
attr handles EXC_FLAG_MUSTBEDIR. When parse_exclude_pattern detects
this flag, it sets patternlen _not_ to include the trailing slash and
expects the caller to trim it. add_exclude does, parse_attr_line does
not.

In attr.c, the pattern could be foo/ while patternlen tells us it
only has 3 chars. Some functions do not care about patternlen and will
see the pattern as foo/ while others may see it as foo. This patch
makes patternlen 4 in this case. (Although for a piece of mind,
perhaps we should trim it to foo like exclude, and never pass a
pathname like abc/ to match_{base,path}name)

Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
---
 attr.c | 2 ++
 dir.c  | 8 +---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/attr.c b/attr.c
index e2f9377..1818ba5 100644
--- a/attr.c
+++ b/attr.c
@@ -255,6 +255,8 @@ static struct match_attr *parse_attr_line(const char *line, 
const char *src,
  res-u.pat.patternlen,
  res-u.pat.flags,
  res-u.pat.nowildcardlen);
+   if (res-u.pat.flags  EXC_FLAG_MUSTBEDIR)
+   res-u.pat.patternlen++;
if (res-u.pat.flags  EXC_FLAG_NEGATIVE) {
warning(_(Negative patterns are ignored in git 
attributes\n
  Use '\\!' for literal leading 
exclamation.));
diff --git a/dir.c b/dir.c
index f58320d..2a91d14 100644
--- a/dir.c
+++ b/dir.c
@@ -610,12 +610,14 @@ int match_basename(const char *basename, int basenamelen,
   int flags)
 {
if (prefix == patternlen) {
-   if (!strcmp_icase(pattern, basename))
+   if (patternlen == basenamelen 
+   !strncmp_icase(pattern, basename, patternlen))
return 1;
} else if (flags  EXC_FLAG_ENDSWITH) {
if (patternlen - 1 = basenamelen 
-   !strcmp_icase(pattern + 1,
- basename + basenamelen - patternlen + 1))
+   !strncmp_icase(pattern + 1,
+  basename + basenamelen - patternlen + 1,
+  patternlen - 1))
return 1;
} else {
if (fnmatch_icase(pattern, basename, 0) == 0)
-- 
1.8.1.2.536.gf441e6d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] match_basename: use strncmp instead of strcmp

2013-03-08 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:

 strncmp provides length information, compared to strcmp, which could
 be taken advantage by the implementation. Even better, we could check
 if the lengths are equal before calling strncmp, eliminating a bit of
 strncmp calls.

I think I am a bit slower than my usual self tonight, but I am
utterly confused by the above.

strncmp() compares _only_ up to the first n bytes, so when you are
using it for equality, it is not we could check length, but is we
MUST check they match to the length of the shorter string, if you
want to obtain not just faster but correct result.

Am I mistaken?

Even if you are using strcmp() that yields ordering not just
equality, it can return a correct result as soon as it hits the
first bytes that are different; I doubt using strncmp() contributes
to the performance very much.  Comparing lengths before doing
byte-for-byte comparison could help because you can reject two
strings with different lengths without looking at them.

At the same time, I wonder if we can take advantage of the fact that
these call sites only care about equality and not ordering.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html