Re: [PATCH v2 3/6] match_basename: use strncmp instead of strcmp

2013-03-12 Thread Duy Nguyen
On Wed, Mar 13, 2013 at 3:59 AM, Junio C Hamano  wrote:
>>> strncmp is provided length information which could be taken advantage
>>> by the underlying implementation.
>
> After all, strcmp() could also be optimized to fetch word-at-a-time
> while being careful about not overstepping the page boundary.

It boils down to giving more information to the underlying
implementation with hope that it can be useful for something. Although
at this point I think strncmp vs strcmp vs memcmp may be not worth
doing (we keep explicit length comparison to reduce *cmp calls
though). The gain is relatively small and will become even smaller
after we avoid running exclude on tracked files (which eliminates like
2/3 of the processed entries).
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/6] match_basename: use strncmp instead of strcmp

2013-03-12 Thread Junio C Hamano
Duy Nguyen  writes:

> glibc's C strncmp version does 4-byte comparison at a time when n >=4,
> then fall back to 1-byte for the rest. I don't know if it's faster
> than a plain always 1-byte comparison though. There's also the hand
> written assembly version that compares n from 1..16, not exactly sure
> how this version works yet.

It sounds to me more like "a very popular implementation of
strcmp/strncmp pair found to have more optimized strncmp than
strcmp".  While that may be a good reason to justify this patch,
I do not think it justifies this:

>> strncmp is provided length information which could be taken advantage
>> by the underlying implementation.

After all, strcmp() could also be optimized to fetch word-at-a-time
while being careful about not overstepping the page boundary.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/6] match_basename: use strncmp instead of strcmp

2013-03-10 Thread Duy Nguyen
On Sun, Mar 10, 2013 at 7:11 PM, Antoine Pelisse  wrote:
>>> By the way, if we know the length of the string, we could use memcmp.
>>> This one is allowed to compare 4-bytes at a time (he doesn't care
>>> about end of string). This is true because the value of the length
>>> parameter is no longer "at most".
>>
>> We still need to worry about access violation after NUL when two
>> strings have different lengths. That could be avoided in this
>> particular case, but I think it's too fragile.
>
> Why would we need to compare if the strings don't have the same length
> ? We already do that in combine-diff.c:append_lost().

Watching movie and replying to git@ don't mix. You're right we don't
need to compare if lengths are different. What was I thinking..
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/6] match_basename: use strncmp instead of strcmp

2013-03-10 Thread Antoine Pelisse
>> By the way, if we know the length of the string, we could use memcmp.
>> This one is allowed to compare 4-bytes at a time (he doesn't care
>> about end of string). This is true because the value of the length
>> parameter is no longer "at most".
>
> We still need to worry about access violation after NUL when two
> strings have different lengths. That could be avoided in this
> particular case, but I think it's too fragile.

Why would we need to compare if the strings don't have the same length
? We already do that in combine-diff.c:append_lost().
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/6] match_basename: use strncmp instead of strcmp

2013-03-10 Thread Duy Nguyen
On Sun, Mar 10, 2013 at 6:54 PM, Antoine Pelisse  wrote:
> On Sun, Mar 10, 2013 at 12:43 PM, Antoine Pelisse  wrote:
>> On Sun, Mar 10, 2013 at 11:38 AM, Duy Nguyen  wrote:
>>> glibc's C strncmp version does 4-byte comparison at a time when n >=4,
>>> then fall back to 1-byte for the rest.
>>
>> Looking at this
>> (http://fossies.org/dox/glibc-2.17/strncmp_8c_source.html), it's not
>> exactly true.
>>
>> It would rather be while (n >= 4), manually unroll the loop.
>
> By the way, if we know the length of the string, we could use memcmp.
> This one is allowed to compare 4-bytes at a time (he doesn't care
> about end of string). This is true because the value of the length
> parameter is no longer "at most".

We still need to worry about access violation after NUL when two
strings have different lengths. That could be avoided in this
particular case, but I think it's too fragile.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/6] match_basename: use strncmp instead of strcmp

2013-03-10 Thread Antoine Pelisse
On Sun, Mar 10, 2013 at 12:43 PM, Antoine Pelisse  wrote:
> On Sun, Mar 10, 2013 at 11:38 AM, Duy Nguyen  wrote:
>> glibc's C strncmp version does 4-byte comparison at a time when n >=4,
>> then fall back to 1-byte for the rest.
>
> Looking at this
> (http://fossies.org/dox/glibc-2.17/strncmp_8c_source.html), it's not
> exactly true.
>
> It would rather be while (n >= 4), manually unroll the loop.

By the way, if we know the length of the string, we could use memcmp.
This one is allowed to compare 4-bytes at a time (he doesn't care
about end of string). This is true because the value of the length
parameter is no longer "at most".
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/6] match_basename: use strncmp instead of strcmp

2013-03-10 Thread Antoine Pelisse
On Sun, Mar 10, 2013 at 11:38 AM, Duy Nguyen  wrote:
> glibc's C strncmp version does 4-byte comparison at a time when n >=4,
> then fall back to 1-byte for the rest.

Looking at this
(http://fossies.org/dox/glibc-2.17/strncmp_8c_source.html), it's not
exactly true.

It would rather be while (n >= 4), manually unroll the loop.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/6] match_basename: use strncmp instead of strcmp

2013-03-10 Thread Duy Nguyen
On Sun, Mar 10, 2013 at 2:34 PM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> strncmp is provided length information which could be taken advantage
>> by the underlying implementation.
>
> I may be missing something fundamental, but I somehow find the above
> does not make any sense.
>
> strcmp(a, b) has to pay attention to NUL in these strings and stop
> comparison.  strncmp(a, b, n) not only has to pay the same attention
> to NUL in the strings, but also needs to stop comparing at n bytes.
>
> In what situation can the latter take advantage of that extra thing
> that it needs to keep track of and operate faster, when n is the
> length of shorter of the two strings?

glibc's C strncmp version does 4-byte comparison at a time when n >=4,
then fall back to 1-byte for the rest. I don't know if it's faster
than a plain always 1-byte comparison though. There's also the hand
written assembly version that compares n from 1..16, not exactly sure
how this version works yet.

>> diff --git a/dir.c b/dir.c
>> index 9960a37..46b24db 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -610,12 +610,14 @@ int match_basename(const char *basename, int 
>> basenamelen,
>>  int flags)
>>  {
>>   if (prefix == patternlen) {
>> - if (!strcmp_icase(pattern, basename))
>> + if (patternlen == basenamelen &&
>> + !strncmp_icase(pattern, basename, patternlen))
>>   return 1;
>
> What happens if you replace this with
>
> if (patternlen == baselen &&
> !strcmp_icase(pattern, basename, patternlen))
>
> and drop the other hunk and run the benchmark again?
>

before  after
user0m0.533s0m0.522s
user0m0.549s0m0.530s
user0m0.550s0m0.534s
user0m0.551s0m0.545s
user0m0.556s0m0.550s
user0m0.557s0m0.552s
user0m0.559s0m0.554s
user0m0.564s0m0.561s
user0m0.567s0m0.565s
user0m0.567s0m0.565s
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/6] match_basename: use strncmp instead of strcmp

2013-03-09 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> strncmp is provided length information which could be taken advantage
> by the underlying implementation.

I may be missing something fundamental, but I somehow find the above
does not make any sense.

strcmp(a, b) has to pay attention to NUL in these strings and stop
comparison.  strncmp(a, b, n) not only has to pay the same attention
to NUL in the strings, but also needs to stop comparing at n bytes.

In what situation can the latter take advantage of that extra thing
that it needs to keep track of and operate faster, when n is the
length of shorter of the two strings?

> diff --git a/dir.c b/dir.c
> index 9960a37..46b24db 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -610,12 +610,14 @@ int match_basename(const char *basename, int 
> basenamelen,
>  int flags)
>  {
>   if (prefix == patternlen) {
> - if (!strcmp_icase(pattern, basename))
> + if (patternlen == basenamelen &&
> + !strncmp_icase(pattern, basename, patternlen))
>   return 1;

What happens if you replace this with

if (patternlen == baselen &&
!strcmp_icase(pattern, basename, patternlen))

and drop the other hunk and run the benchmark again?

>   } else if (flags & EXC_FLAG_ENDSWITH) {
>   if (patternlen - 1 <= basenamelen &&
> - !strcmp_icase(pattern + 1,
> -   basename + basenamelen - patternlen + 1))
> + !strncmp_icase(pattern + 1,
> +basename + basenamelen - patternlen + 1,
> +patternlen - 1))
>   return 1;
>   } else {
>   if (fnmatch_icase(pattern, basename, 0) == 0)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html