Re: [PATCHv2 0/7] Fix and generalize version sort reordering

2016-12-20 Thread Jeff King
On Tue, Dec 20, 2016 at 09:50:42AM +0100, SZEDER Gábor wrote:

> > It just seems like the whole thing would conceptually easier if we
> > pre-parsed the versions into a sequence of elements, then the comparison
> > between any two elements would just walk that sequence. The benefit
> > there is that you can implement whatever rules you like for the parsing
> > (like "prefer longer suffixes to shorter"), but you know the comparison
> > will always be consistent.
> 
> I considered parsing tagnames into prefix, version number and suffix,
> and then work from that, but decided against it.
> 
> versioncmp() is taken from glibc, so I assume that it's thoroughly
> tested, even in corner cases (e.g. multiple leading zeros).
> Furthermore, I think it's a good thing that by default (i.e. without
> suffix reordering) our version sort orders the same way as glibc's
> version sort does.  Introducing a different algorithm would risk bugs
> in the more subtle cases.

Fair enough. If it's working, I agree there is risk in changing things.
And if you're willing to deal with the bugs, then I'm happy to stand
back. :)

> Then there are all the weird release suffixes out there, and I didn't
> want to decide on a policy for splitting them sanely; don't know
> whether there exist any universal rules for this splitting at
> all.  E.g. one of the packages here has the following version (let's
> ignore the fact that because of the '~' this is an invalid refname in
> git):

I have a hunch that any policy you'd have to set for splitting is going
to end up becoming a policy you'll have to use when comparing (or you
risk violating the transitivity of your comparison function).

But that's just a hunch, not a proof. Again, I'm happy to defer to you
if you're the one working on it.

-Peff


Re: [PATCHv2 0/7] Fix and generalize version sort reordering

2016-12-20 Thread SZEDER Gábor
On Wed, Dec 14, 2016 at 6:08 PM, Jeff King  wrote:
> On Thu, Dec 08, 2016 at 03:23:54PM +0100, SZEDER Gábor wrote:
>
>> > With my patches it looks like this:
>> >
>> > $ git -c versionsort.prereleasesuffix=-pre \
>> >   -c versionsort.prereleasesuffix=-prerelease \
>> >   tag -l --sort=version:refname
>> > v1.0.0-prerelease1
>> > v1.0.0-pre1
>> > v1.0.0-pre2
>> > v1.0.0
>>
>> And when there happen to be more than one matching suffixes starting
>> at the same earliest position, then we should pick the longest of
>> them.  The new patch 6/7 implements this behavior.
>
> The whole approach taken by the suffix code (before your patches) leaves
> me with the nagging feeling that the comparison is not always going to
> be transitive (i.e., that "a < b && b < c" does not always imply "a <
> c"), which is going to cause nonsensical sorting results.
>
> And that may be part of the issue your 6/7 fixes.
>
> I spent some time playing with this the other day, though, and couldn't
> come up with a specific example that fails the condition above.
>
> It just seems like the whole thing would conceptually easier if we
> pre-parsed the versions into a sequence of elements, then the comparison
> between any two elements would just walk that sequence. The benefit
> there is that you can implement whatever rules you like for the parsing
> (like "prefer longer suffixes to shorter"), but you know the comparison
> will always be consistent.

I considered parsing tagnames into prefix, version number and suffix,
and then work from that, but decided against it.

versioncmp() is taken from glibc, so I assume that it's thoroughly
tested, even in corner cases (e.g. multiple leading zeros).
Furthermore, I think it's a good thing that by default (i.e. without
suffix reordering) our version sort orders the same way as glibc's
version sort does.  Introducing a different algorithm would risk bugs
in the more subtle cases.

Then there are all the weird release suffixes out there, and I didn't
want to decide on a policy for splitting them sanely; don't know
whether there exist any universal rules for this splitting at
all.  E.g. one of the packages here has the following version (let's
ignore the fact that because of the '~' this is an invalid refname in
git):

  1.1.0~rc1-2ubuntu7-1linuxmint1

Now, it's clear that the version number is "1.1.0", and the user
should configure the suffix "~rc" for prerelease reordering.  But what
about the rest?  How should we split it "into a sequence of elements",
is it { "1.1.0", "~rc1", "-2ubuntu7", "-1linuxmint1" } or { "1.1.0",
"~rc1-2", "ubuntu7-1", "linuxmint1" }?
What if there is a hard-working developer who is involved in a lot of
Debian derivatives (and derivatives of derivatives...), and, for
whatever reason, wants to put derivative-specific versions in a
particular order?  With my series, or conceptually even with master if
it weren't buggy, it's possible to specify the order of suffixes of
suffixes, and that dev could do this:

  $ git -c versionsort.suffix=-rc
-c versionsort.suffix=linuxmint
-c versionsort.suffix=YADoaDD
tag -l --sort=version:refname
  '1.1.0*'
  1.1.0-rc1-2ubuntu7-1linuxmint1
  1.1.0-rc1-2ubuntu7-1YADoaDD2
  1.1.0
  1.1.0-2ubuntu7-1linuxmint1
  1.1.0-2ubuntu7-1YADoaDD2

and would get Linux Mint-specific tags before "Yet Another Derivative
of a Debian Derivative"-specific ones.  Not sure whether this is
relevant in practice, but I think it's a nice property nonetheless.

(Btw, just for fun, I also found a package version
2.0.0~beta2+isreally1.8.6-0ubuntu1.  "isreally".  Oh yeah :)

> It would also be more efficient, I think (it seems like the sort is
> O(nr_tags * lg(nr_tags) * nr_suffixes) due to parsing suffixes in the
> comparator). Though that probably doesn't matter much in practice.

I don't think there will be more than only a few configured suffixes
in any repository.  However, if you consider O as "number of
starts_with() invocations", then there is an additional suffix_length
factor.  But then again, these suffixes tend to be short.

> I dunno. I think maybe your 6/7 has converged on an equivalent behavior.
> And I am certainly not volunteering to re-write it, so if what you have
> works, I'm not opposed to it.
>
> -Peff


Re: [PATCHv2 0/7] Fix and generalize version sort reordering

2016-12-14 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Dec 08, 2016 at 03:23:54PM +0100, SZEDER Gábor wrote:
>
>> > With my patches it looks like this:
>> > 
>> > $ git -c versionsort.prereleasesuffix=-pre \
>> >   -c versionsort.prereleasesuffix=-prerelease \
>> >   tag -l --sort=version:refname
>> > v1.0.0-prerelease1
>> > v1.0.0-pre1
>> > v1.0.0-pre2
>> > v1.0.0
>> 
>> And when there happen to be more than one matching suffixes starting
>> at the same earliest position, then we should pick the longest of
>> them.  The new patch 6/7 implements this behavior.
>
> The whole approach taken by the suffix code (before your patches) leaves
> me with the nagging feeling that the comparison is not always going to
> be transitive (i.e., that "a < b && b < c" does not always imply "a <
> c"), which is going to cause nonsensical sorting results.
>
> And that may be part of the issue your 6/7 fixes.
>
> I spent some time playing with this the other day, though, and couldn't
> come up with a specific example that fails the condition above.
>
> It just seems like the whole thing would conceptually easier if we
> pre-parsed the versions into a sequence of elements, then the comparison
> between any two elements would just walk that sequence. The benefit
> there is that you can implement whatever rules you like for the parsing
> (like "prefer longer suffixes to shorter"), but you know the comparison
> will always be consistent.
>
> It would also be more efficient, I think (it seems like the sort is
> O(nr_tags * lg(nr_tags) * nr_suffixes) due to parsing suffixes in the
> comparator). Though that probably doesn't matter much in practice.
>
> I dunno. I think maybe your 6/7 has converged on an equivalent behavior.
> And I am certainly not volunteering to re-write it, so if what you have
> works, I'm not opposed to it.

I also had worries about transitiveness but couldn't break it after
trying for some time.  I find your pre-parsing suggestion a great
one, not from the point of view of performance, but because I would
imagine that the resulting logic would become a lot easier to
understand.




Re: [PATCHv2 0/7] Fix and generalize version sort reordering

2016-12-14 Thread Jeff King
On Thu, Dec 08, 2016 at 03:23:54PM +0100, SZEDER Gábor wrote:

> > With my patches it looks like this:
> > 
> > $ git -c versionsort.prereleasesuffix=-pre \
> >   -c versionsort.prereleasesuffix=-prerelease \
> >   tag -l --sort=version:refname
> > v1.0.0-prerelease1
> > v1.0.0-pre1
> > v1.0.0-pre2
> > v1.0.0
> 
> And when there happen to be more than one matching suffixes starting
> at the same earliest position, then we should pick the longest of
> them.  The new patch 6/7 implements this behavior.

The whole approach taken by the suffix code (before your patches) leaves
me with the nagging feeling that the comparison is not always going to
be transitive (i.e., that "a < b && b < c" does not always imply "a <
c"), which is going to cause nonsensical sorting results.

And that may be part of the issue your 6/7 fixes.

I spent some time playing with this the other day, though, and couldn't
come up with a specific example that fails the condition above.

It just seems like the whole thing would conceptually easier if we
pre-parsed the versions into a sequence of elements, then the comparison
between any two elements would just walk that sequence. The benefit
there is that you can implement whatever rules you like for the parsing
(like "prefer longer suffixes to shorter"), but you know the comparison
will always be consistent.

It would also be more efficient, I think (it seems like the sort is
O(nr_tags * lg(nr_tags) * nr_suffixes) due to parsing suffixes in the
comparator). Though that probably doesn't matter much in practice.

I dunno. I think maybe your 6/7 has converged on an equivalent behavior.
And I am certainly not volunteering to re-write it, so if what you have
works, I'm not opposed to it.

-Peff