Re: [PATCH 2/4] Remove silent clamp of renameLimit

2017-11-11 Thread brian m. carlson
On Sat, Nov 11, 2017 at 08:39:11AM -0800, Elijah Newren wrote:
> Thanks for pointing out unsigned_mult_overflows; I was unaware of it.
> I think I'd prefer to not use it though; the fact that I had a case
> that genuinely needed a value greater than 2^31 (at least before my
> performance patches) meant that a slightly bigger repo is going to
> eventually need a value over 2^32, so I think we should just cast to a
> type that can hold it.

That's fine.  I had considered this in the context of 64-bit values, but
I suppose it's unlikely that we'd ever hit 2**64 iterations (and still
perform reasonably while doing so).

> I'm curious why you suggest size_t, though.  I have always associated
> that with an amount of memory that will be used, and there's no
> allocation based on this result.  Was I wrong to make that
> association, or is there another good reason here?

I usually like size_t for values that are unsigned and don't need to be a
fixed size, because it's generally the largest efficient unsigned type.
However, if you need something that can hold values larger than 2**32,
then I agree that it's maybe not a great idea if we want this to work on
32-bit systems.
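
To make that concrete, here is a tiny standalone illustration (not from the
patch): size_t tracks the platform's address width, so on a 32-bit build it
tops out at 2**32 - 1, whereas uint64_t is 64 bits everywhere.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* size_t follows the address width: 2**32 - 1 on a 32-bit build,
         * 2**64 - 1 on a 64-bit one; uint64_t is 64 bits everywhere */
        printf("SIZE_MAX   = %ju\n", (uintmax_t)SIZE_MAX);
        printf("UINT64_MAX = %ju\n", (uintmax_t)UINT64_MAX);
        return 0;
    }
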
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204



Re: [PATCH 2/4] Remove silent clamp of renameLimit

2017-11-11 Thread Elijah Newren
Hi,

On Fri, Nov 10, 2017 at 3:42 PM, brian m. carlson wrote:
> On Fri, Nov 10, 2017 at 10:36:17AM -0800, Elijah Newren wrote:

>> Further, the later patch used uint64_t instead of double.  While
>> double works, perhaps I should change the double here to uint64_t for
>> consistency?
>
> I'm wondering if maybe you want to use size_t.  If you end up using an
> unsigned type, you might be able to leverage unsigned_mult_overflows to
> avoid having to write this by hand.

Thanks for pointing out unsigned_mult_overflows; I was unaware of it.
I think I'd prefer to not use it though; the fact that I had a case
that genuinely needed a value greater than 2^31 (at least before my
performance patches) meant that a slightly bigger repo is going to
eventually need a value over 2^32, so I think we should just cast to a
type that can hold it.
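
Roughly, I'm picturing something like the following (just a sketch;
within_rename_limit is a made-up helper name, and the variables are the
ones from diffcore-rename.c):

    #include <stdint.h>

    /*
     * Sketch only: the same check as in the patch, but with uint64_t casts
     * instead of double.  The product of two non-negative ints cannot
     * overflow a uint64_t, so no clamp is needed.
     */
    static int within_rename_limit(int num_create, int num_src,
                                   int rename_limit)
    {
        return (num_create <= rename_limit || num_src <= rename_limit) &&
               ((uint64_t)num_create * (uint64_t)num_src <=
                (uint64_t)rename_limit * (uint64_t)rename_limit);
    }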

I'm curious why you suggest size_t, though.  I have always associated
that with an amount of memory that will be used, and there's no
allocation based on this result.  Was I wrong to make that
association, or is there another good reason here?


Re: [PATCH 2/4] Remove silent clamp of renameLimit

2017-11-10 Thread brian m. carlson
On Fri, Nov 10, 2017 at 10:36:17AM -0800, Elijah Newren wrote:
> Thanks for taking a look!
> 
> On Fri, Nov 10, 2017 at 10:26 AM, Stefan Beller  wrote:
> > From a technical perspective, I would think that if
> > (num_create <= rename_limit || num_src <= rename_limit)
> > holds true, that the double-cast condition would also be always true?
> > Could we just remove that last check?
> 
> Not necessarily.  For example, if num_create = rename_limit-1 and
> num_src = rename_limit+2, then the first condition will be satisfied
> but the second won't.  If it was && rather than ||, then the second
> condition would be superfluous.
> 
> > Or phrased differently, if we can cast to double and extend the check
> > here, do we have to adapt code at other places as well?
> 
> Good point, and yes.  Perhaps I should have re-ordered my patch series
> because I came back to it later and realized that the progress code
> was broken due to overflow/wraparound, and patch 3 fixed that.
> 
> Further, the later patch used uint64_t instead of double.  While
> double works, perhaps I should change the double here to uint64_t for
> consistency?

I'm wondering if maybe you want to use size_t.  If you end up using an
unsigned type, you might be able to leverage unsigned_mult_overflows to
avoid having to write this by hand.
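
Something in this direction is what I have in mind (a rough sketch only; it
assumes the counts become unsigned, and note that it treats a rename_limit
whose square would overflow as exceeded, which is stricter than the double
version):

    #include "git-compat-util.h" /* unsigned_mult_overflows() */

    /*
     * Rough sketch: refuse to multiply if either product would overflow
     * the unsigned type, and otherwise do the comparison directly.
     */
    static int within_rename_limit(unsigned int num_create,
                                   unsigned int num_src,
                                   unsigned int rename_limit)
    {
        return (num_create <= rename_limit || num_src <= rename_limit) &&
               !unsigned_mult_overflows(num_create, num_src) &&
               !unsigned_mult_overflows(rename_limit, rename_limit) &&
               num_create * num_src <= rename_limit * rename_limit;
    }
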
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204



Re: [PATCH 2/4] Remove silent clamp of renameLimit

2017-11-10 Thread Elijah Newren
Thanks for taking a look!

On Fri, Nov 10, 2017 at 10:26 AM, Stefan Beller  wrote:

>> -   if (rename_limit <= 0 || rename_limit > 32767)
>> -   rename_limit = 32767;
>> if ((num_create <= rename_limit || num_src <= rename_limit) &&
>> -   (num_create * num_src <= rename_limit * rename_limit))
>> +   ((double)num_create * (double)num_src
>> +<= (double)rename_limit * (double)rename_limit))
>> return 0;
>
> From a technical perspective, I would think that if
> (num_create <= rename_limit || num_src <= rename_limit)
> holds true, that the double-cast condition would also be always true?
> Could we just remove that last check?

Not necessarily.  For example, if num_create = rename_limit-1 and
num_src = rename_limit+2, then the first condition will be satisfied
but the second won't.  If it was && rather than ||, then the second
condition would be superfluous.
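
With concrete (made-up) numbers, for example:

    #include <stdio.h>

    int main(void)
    {
        int rename_limit = 1000;
        int num_create = 999;    /* rename_limit - 1 */
        int num_src = 1002;      /* rename_limit + 2 */

        /* 999 <= 1000, so the "||" condition is satisfied ... */
        printf("%d\n", num_create <= rename_limit || num_src <= rename_limit);
        /* ... but 999 * 1002 = 1000998 > 1000000 = 1000 * 1000, so the
         * product check still fails and is therefore not redundant */
        printf("%d\n", num_create * num_src <= rename_limit * rename_limit);
        return 0;
    }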

> Or phrased differently, if we can cast to double and extend the check
> here, do we have to adapt code at other places as well?

Good point, and yes.  Perhaps I should have re-ordered my patch series
because I came back to it later and realized that the progress code
was broken due to overflow/wraparound, and patch 3 fixed that.

Further, the later patch used uint64_t instead of double.  While
double works, perhaps I should change the double here to uint64_t for
consistency?
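
For what it's worth, the reason double only "works" rather than being exact
is its 53-bit mantissa: very large integer products get rounded, whereas
uint64_t keeps the comparison exact.  A standalone illustration (not from
the patch):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* doubles are exact only up to 2**53; beyond that, adjacent
         * integers collapse to the same value, while uint64_t stays exact */
        uint64_t big = ((uint64_t)1 << 53) + 1;
        printf("%d\n", (double)big == (double)(big - 1));  /* prints 1 */
        printf("%d\n", big == big - 1);                     /* prints 0 */
        return 0;
    }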


Re: [PATCH 2/4] Remove silent clamp of renameLimit

2017-11-10 Thread Stefan Beller
On Fri, Nov 10, 2017 at 9:39 AM, Elijah Newren  wrote:
> In commit 0024a5492 (Fix the rename detection limit checking; 2007-09-14),
> the renameLimit was clamped to 32767.  This appears to have been to simply
> avoid integer overflow in the following computation:
>
>num_create * num_src <= rename_limit * rename_limit
>
> although it also could be viewed as a hardcoded bound on the amount of CPU
> time we're willing to allow users to tell git to spend on handling
> renames.  An upper bound may make sense, particularly as the computation
> is O(rename_limit^2), but only if the bound is documented and communicated
> to the user -- neither of which were true.
>
> In fact, the silent clamping of the renameLimit to a smaller value and
> lack of reporting of the needed renameLimit when it was too large made it
> appear to the user as though they had used a high enough value; however,
> git would proceed to mess up the merge or cherry-pick badly based on the
> lack of rename detection.  Some hardy folks, despite the lack of feedback
> on the correct limit to choose, were desperate enough to repeatedly retry
> their cherry-picks with increasingly larger renameLimit values (going
> orders of magnitude beyond the built-in limit of 32767), but were
> consistently met with the same failure.
>
> Although large limits can make things slow, we have users who would be
> ecstatic to have a small five file change be correctly cherry picked even
> if they have to manually specify a large limit and it took git ten minutes
> to compute it.
>
> Signed-off-by: Elijah Newren 
> ---
>  diff.c|  2 +-
>  diffcore-rename.c | 11 ---
>  2 files changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 6fd288420b..c6597e3231 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -5524,7 +5524,7 @@ void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc)
> warning(_(rename_limit_warning));
> else
> return;
> -   if (0 < needed && needed < 32767)
> +   if (0 < needed)
> warning(_(rename_limit_advice), varname, needed);
>  }
>
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 0d8c3d2ee4..7f9a463f5a 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -391,14 +391,10 @@ static int too_many_rename_candidates(int num_create,
>  * growing larger than a "rename_limit" square matrix, ie:
>  *
>  *num_create * num_src > rename_limit * rename_limit
> -*
> -* but handles the potential overflow case specially (and we
> -* assume at least 32-bit integers)
>  */
> -   if (rename_limit <= 0 || rename_limit > 32767)
> -   rename_limit = 32767;
> if ((num_create <= rename_limit || num_src <= rename_limit) &&
> -   (num_create * num_src <= rename_limit * rename_limit))
> +   ((double)num_create * (double)num_src
> +<= (double)rename_limit * (double)rename_limit))
> return 0;

From a technical perspective, I would think that if
(num_create <= rename_limit || num_src <= rename_limit)
holds true, that the double-cast condition would also be always true?
Could we just remove that last check?

Or phrased differently, if we can cast to double and extend the check
here, do we have to adapt code at other places as well?
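
As background on the 32767 being removed above, a standalone illustration
(not part of the patch, and assuming the usual 32-bit int): the clamp keeps
rename_limit * rename_limit comfortably inside a signed int, which anything
above ~46340 would not.

    #include <stdio.h>

    int main(void)
    {
        /* 32767 * 32767 = 1073676289, well below INT_MAX (2147483647) ... */
        printf("%d\n", 32767 * 32767);
        /* ... but 46341 * 46341 = 2147488281 already exceeds INT_MAX, so
         * squaring an unclamped rename_limit in plain int arithmetic can
         * overflow -- the overflow the old code was guarding against */
        printf("%lld\n", (long long)46341 * 46341);
        return 0;
    }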

>
> options->needed_rename_limit =
> @@ -415,7 +411,8 @@ static int too_many_rename_candidates(int num_create,
> num_src++;
> }
> if ((num_create <= rename_limit || num_src <= rename_limit) &&
> -   (num_create * num_src <= rename_limit * rename_limit))
> +   ((double)num_create * (double)num_src
> +<= (double)rename_limit * (double)rename_limit))
> return 2;
> return 1;
>  }
> --
> 2.15.0.5.g9567be9905
>