Re: [PATCHv2 0/7] Fix and generalize version sort reordering
On Tue, Dec 20, 2016 at 09:50:42AM +0100, SZEDER Gábor wrote: > > It just seems like the whole thing would conceptually easier if we > > pre-parsed the versions into a sequence of elements, then the comparison > > between any two elements would just walk that sequence. The benefit > > there is that you can implement whatever rules you like for the parsing > > (like "prefer longer suffixes to shorter"), but you know the comparison > > will always be consistent. > > I considered parsing tagnames into prefix, version number and suffix, > and then work from that, but decided against it. > > versioncmp() is taken from glibc, so I assume that it's thoroughly > tested, even in corner cases (e.g. multiple leading zeros). > Furthermore, I think it's a good thing that by default (i.e. without > suffix reordering) our version sort orders the same way as glibc's > version sort does. Introducing a different algorithm would risk bugs > in the more subtle cases. Fair enough. If it's working, I agree there is risk in changing things. And if you're willing to deal with the bugs, then I'm happy to stand back. :) > Then there are all the weird release suffixes out there, and I didn't > want to decide on a policy for splitting them sanely; don't know > whether there exist any universal rules for this splitting at > all. E.g. one of the packages here has the following version (let's > ignore the fact that because of the '~' this is an invalid refname in > git): I have a hunch that any policy you'd have to set for splitting is going to end up becoming a policy you'll have to use when comparing (or you risk violating the transitivity of your comparison function). But that's just a hunch, not a proof. Again, I'm happy to defer to you if you're the one working on it. -Peff
Re: [PATCHv2 0/7] Fix and generalize version sort reordering
On Wed, Dec 14, 2016 at 6:08 PM, Jeff Kingwrote: > On Thu, Dec 08, 2016 at 03:23:54PM +0100, SZEDER Gábor wrote: > >> > With my patches it looks like this: >> > >> > $ git -c versionsort.prereleasesuffix=-pre \ >> > -c versionsort.prereleasesuffix=-prerelease \ >> > tag -l --sort=version:refname >> > v1.0.0-prerelease1 >> > v1.0.0-pre1 >> > v1.0.0-pre2 >> > v1.0.0 >> >> And when there happen to be more than one matching suffixes starting >> at the same earliest position, then we should pick the longest of >> them. The new patch 6/7 implements this behavior. > > The whole approach taken by the suffix code (before your patches) leaves > me with the nagging feeling that the comparison is not always going to > be transitive (i.e., that "a < b && b < c" does not always imply "a < > c"), which is going to cause nonsensical sorting results. > > And that may be part of the issue your 6/7 fixes. > > I spent some time playing with this the other day, though, and couldn't > come up with a specific example that fails the condition above. > > It just seems like the whole thing would conceptually easier if we > pre-parsed the versions into a sequence of elements, then the comparison > between any two elements would just walk that sequence. The benefit > there is that you can implement whatever rules you like for the parsing > (like "prefer longer suffixes to shorter"), but you know the comparison > will always be consistent. I considered parsing tagnames into prefix, version number and suffix, and then work from that, but decided against it. versioncmp() is taken from glibc, so I assume that it's thoroughly tested, even in corner cases (e.g. multiple leading zeros). Furthermore, I think it's a good thing that by default (i.e. without suffix reordering) our version sort orders the same way as glibc's version sort does. Introducing a different algorithm would risk bugs in the more subtle cases. Then there are all the weird release suffixes out there, and I didn't want to decide on a policy for splitting them sanely; don't know whether there exist any universal rules for this splitting at all. E.g. one of the packages here has the following version (let's ignore the fact that because of the '~' this is an invalid refname in git): 1.1.0~rc1-2ubuntu7-1linuxmint1 Now, it's clear that the version number is "1.1.0", and the user should configure the suffix "~rc" for prerelease reordering. But what about the rest? How should we split it "into a sequence of elements", is it { "1.1.0", "~rc1", "-2ubuntu7", "-1linuxmint1" } or { "1.1.0", "~rc1-2", "ubuntu7-1", "linuxmint1" }? What if there is a hard-working developer who is involved in a lot of Debian derivatives (and derivatives of derivatives...), and, for whatever reason, wants to put derivative-specific versions in a particular order? With my series, or conceptually even with master if it weren't buggy, it's possible to specify the order of suffixes of suffixes, and that dev could do this: $ git -c versionsort.suffix=-rc -c versionsort.suffix=linuxmint -c versionsort.suffix=YADoaDD tag -l --sort=version:refname '1.1.0*' 1.1.0-rc1-2ubuntu7-1linuxmint1 1.1.0-rc1-2ubuntu7-1YADoaDD2 1.1.0 1.1.0-2ubuntu7-1linuxmint1 1.1.0-2ubuntu7-1YADoaDD2 and would get Linux Mint-specific tags before "Yet Another Derivative of a Debian Derivative"-specific ones. Not sure whether this is relevant in practice, but I think it's a nice property nonetheless. (Btw, just for fun, I also found a package version 2.0.0~beta2+isreally1.8.6-0ubuntu1. "isreally". Oh yeah :) > It would also be more efficient, I think (it seems like the sort is > O(nr_tags * lg(nr_tags) * nr_suffixes) due to parsing suffixes in the > comparator). Though that probably doesn't matter much in practice. I don't think there will be more than only a few configured suffixes in any repository. However, if you consider O as "number of starts_with() invocations", then there is an additional suffix_length factor. But then again, these suffixes tend to be short. > I dunno. I think maybe your 6/7 has converged on an equivalent behavior. > And I am certainly not volunteering to re-write it, so if what you have > works, I'm not opposed to it. > > -Peff
Re: [PATCHv2 0/7] Fix and generalize version sort reordering
Jeff Kingwrites: > On Thu, Dec 08, 2016 at 03:23:54PM +0100, SZEDER Gábor wrote: > >> > With my patches it looks like this: >> > >> > $ git -c versionsort.prereleasesuffix=-pre \ >> > -c versionsort.prereleasesuffix=-prerelease \ >> > tag -l --sort=version:refname >> > v1.0.0-prerelease1 >> > v1.0.0-pre1 >> > v1.0.0-pre2 >> > v1.0.0 >> >> And when there happen to be more than one matching suffixes starting >> at the same earliest position, then we should pick the longest of >> them. The new patch 6/7 implements this behavior. > > The whole approach taken by the suffix code (before your patches) leaves > me with the nagging feeling that the comparison is not always going to > be transitive (i.e., that "a < b && b < c" does not always imply "a < > c"), which is going to cause nonsensical sorting results. > > And that may be part of the issue your 6/7 fixes. > > I spent some time playing with this the other day, though, and couldn't > come up with a specific example that fails the condition above. > > It just seems like the whole thing would conceptually easier if we > pre-parsed the versions into a sequence of elements, then the comparison > between any two elements would just walk that sequence. The benefit > there is that you can implement whatever rules you like for the parsing > (like "prefer longer suffixes to shorter"), but you know the comparison > will always be consistent. > > It would also be more efficient, I think (it seems like the sort is > O(nr_tags * lg(nr_tags) * nr_suffixes) due to parsing suffixes in the > comparator). Though that probably doesn't matter much in practice. > > I dunno. I think maybe your 6/7 has converged on an equivalent behavior. > And I am certainly not volunteering to re-write it, so if what you have > works, I'm not opposed to it. I also had worries about transitiveness but couldn't break it after trying for some time. I find your pre-parsing suggestion a great one, not from the point of view of performance, but because I would imagine that the resulting logic would become a lot easier to understand.
Re: [PATCHv2 0/7] Fix and generalize version sort reordering
On Thu, Dec 08, 2016 at 03:23:54PM +0100, SZEDER Gábor wrote: > > With my patches it looks like this: > > > > $ git -c versionsort.prereleasesuffix=-pre \ > > -c versionsort.prereleasesuffix=-prerelease \ > > tag -l --sort=version:refname > > v1.0.0-prerelease1 > > v1.0.0-pre1 > > v1.0.0-pre2 > > v1.0.0 > > And when there happen to be more than one matching suffixes starting > at the same earliest position, then we should pick the longest of > them. The new patch 6/7 implements this behavior. The whole approach taken by the suffix code (before your patches) leaves me with the nagging feeling that the comparison is not always going to be transitive (i.e., that "a < b && b < c" does not always imply "a < c"), which is going to cause nonsensical sorting results. And that may be part of the issue your 6/7 fixes. I spent some time playing with this the other day, though, and couldn't come up with a specific example that fails the condition above. It just seems like the whole thing would conceptually easier if we pre-parsed the versions into a sequence of elements, then the comparison between any two elements would just walk that sequence. The benefit there is that you can implement whatever rules you like for the parsing (like "prefer longer suffixes to shorter"), but you know the comparison will always be consistent. It would also be more efficient, I think (it seems like the sort is O(nr_tags * lg(nr_tags) * nr_suffixes) due to parsing suffixes in the comparator). Though that probably doesn't matter much in practice. I dunno. I think maybe your 6/7 has converged on an equivalent behavior. And I am certainly not volunteering to re-write it, so if what you have works, I'm not opposed to it. -Peff