Re: [PATCH 3/3] match_basename: use strncmp instead of strcmp

2013-03-09 Thread Duy Nguyen
On Sat, Mar 9, 2013 at 2:50 PM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> strncmp provides length information, compared to strcmp, which could
>> be taken advantage of by the implementation. Even better, we could check
>> if the lengths are equal before calling strncmp, eliminating some
>> strncmp calls.
>
> I think I am a bit slower than my usual self tonight, but I am
> utterly confused by the above.
>
> strncmp() compares _only_ up to the first n bytes, so when you are
> using it for equality, it is not "we could check length", but is "we
> MUST check they match to the length of the shorter string", if you
> want to obtain not just faster but correct result.
>
> Am I mistaken?

Yep, the description is a bit misleading. Although you could get away
without a separate length check by doing !strncmp(a, b, strlen(a)+1).
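
To spell out why the "+1" matters, here is a minimal sketch. The function names are made up for illustration and are not from the patch. Including the terminating NUL of a in the count forces b to end at the same position, whereas a count of strlen(a) only tests whether b starts with a.

```c
#include <string.h>

/* Equality via strncmp: n = strlen(a) + 1 covers a's terminating NUL,
 * so b must end at the same position for the result to be 0.
 * (Hypothetical helper name, for illustration only.) */
int equal_with_nul(const char *a, const char *b)
{
	return !strncmp(a, b, strlen(a) + 1);
}

/* Prefix test only: n = strlen(a) stops before the NUL, so any b that
 * starts with a compares as "equal".
 * (Hypothetical helper name, for illustration only.) */
int prefix_only(const char *a, const char *b)
{
	return !strncmp(a, b, strlen(a));
}
```

With these, "foo" against "foobar" is rejected by the first helper but accepted by the second, which is the correctness problem raised above.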

> Even if you are using strcmp() that yields ordering not just
> equality, it can return a correct result as soon as it hits the
> first bytes that are different; I doubt using strncmp() contributes
> to the performance very much.  Comparing lengths before doing
> byte-for-byte comparison could help because you can reject two
> strings with different lengths without looking at them.
>
> At the same time, I wonder if we can take advantage of the fact that
> these call sites only care about equality and not ordering.

I tried to push it further and compare hashes before doing the actual
string comparison. It slowed things down (hopefully because of the cost
of hashing, using the same hash as name-hash.c, not because I did it
wrong).
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] match_basename: use strncmp instead of strcmp

2013-03-09 Thread Fredrik Gustafsson
On Fri, Mar 08, 2013 at 11:50:04PM -0800, Junio C Hamano wrote:
> At the same time, I wonder if we can take advantage of the fact that
> these call sites only care about equality and not ordering.

I did an RFC patch for that (which I mistakenly didn't send as a reply to
this e-mail), and I believe that you're correct. My solution is inspired
by curl's strequal.

Is the reason git does not care about lower/upper case that it needs to
support Windows? Or is there some other smart reason?

I was also thinking about discarding files by looking at their
modification date. If the modification timestamp is older than or equal to
the latest commit, there's probably no reason to examine that file any
further. I'm not sure about the side effects this may have, though. I
think they can be quite nasty. Is this something worth digging into
further, or am I already on the wrong path?

-- 
Kind regards
Fredrik Gustafsson

tel: 0733-608274
e-post: iv...@iveqy.com


Re: [PATCH 3/3] match_basename: use strncmp instead of strcmp

2013-03-08 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> strncmp provides length information, compared to strcmp, which could
> be taken advantage of by the implementation. Even better, we could check
> if the lengths are equal before calling strncmp, eliminating some
> strncmp calls.

I think I am a bit slower than my usual self tonight, but I am
utterly confused by the above.

strncmp() compares _only_ up to the first n bytes, so when you are
using it for equality, it is not "we could check length", but is "we
MUST check they match to the length of the shorter string", if you
want to obtain not just faster but correct result.

Am I mistaken?

Even if you are using strcmp() that yields ordering not just
equality, it can return a correct result as soon as it hits the
first bytes that are different; I doubt using strncmp() contributes
to the performance very much.  Comparing lengths before doing
byte-for-byte comparison could help because you can reject two
strings with different lengths without looking at them.
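
As a rough sketch of that idea (not the code from the patch; the helper name and the assumption that both lengths are already known to the caller are mine), an equality-only comparison can reject on length first and fall back to memcmp only when the lengths agree:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the two strings are equal, 0 otherwise.
 * Assumes the caller already knows both lengths. */
int same_string(const char *a, size_t alen, const char *b, size_t blen)
{
	/* Strings of different lengths can never be equal, so reject
	 * them without reading a single byte. */
	if (alen != blen)
		return 0;
	/* Lengths match: compare exactly alen bytes. */
	return !memcmp(a, b, alen);
}
```

Whether this is actually faster depends on how cheaply the lengths can be obtained; if they have to be computed with strlen() each time, the saving may disappear.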

At the same time, I wonder if we can take advantage of the fact that
these call sites only care about equality and not ordering.
