Re: Versioned releases
On Thu, 2020-06-04 at 23:29 +0200, Bruno Haible wrote:
> I disagree on this one. It would make people think that the Nth
> commit, or the Monday commit, or whatever, is preferred over the
> other commits. Which it really isn't - there may be a regression fix
> coming in just the next day.

I'm not sure about that. A tag which is just the date is pretty clearly nothing more than that: a tag which is the date. I don't think anyone will somehow believe that it means more than it is. Plenty of systems out there do similar things.

> In summary:
> * The date (first line of ChangeLog) is a good version indicator.
> * If someone doesn't like dates, for whatever reason, they can use
>   'git describe'.

IMO 'git describe' is less useful without some type of tagging regimen. Even tagging once a year would be helpful. This is today:

  $ git describe
  v0.1-3536-gd50852525

??

If we added a tag "2020" at the beginning of the year we'd get:

  $ git describe
  2020-427-gd50852525

A tag like "202006" at the beginning of the month would give:

  $ git describe
  202006-11-gd50852525

However it's done, my main hope is that gnulib provides some kind of module which does this version detection and generation for you, and builds that into its scripting so it's automatic, rather than everyone reinventing it (possibly slightly differently) for themselves.

For example, when I run bootstrap against a Git repo, it would run "git describe" and put the results into some gnulib version string in the files copied into my workspace. And there could be an "extract a static workspace" script that would do the same type of thing for an entire gnulib copy, that distributions (if they really wanted to ship a "gnulib" package) could run.
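To make the date-tag idea concrete, here is a small demonstration in a throwaway repository rather than in gnulib itself; the tag name "2020" mirrors the example above and everything else is illustrative. One detail matters in practice: plain `git describe` only considers annotated tags, so a lightweight date tag like this one needs `--tags` (or the tag has to be created with `git tag -a`).

```sh
#!/bin/sh
# Demo in a throwaway repository (not gnulib): tag an older commit with a
# date-style name and see what 'git describe' reports afterwards.
set -e
repo=$(mktemp -d)
git -C "$repo" init -q

# Empty commits with an inline identity, so no global git config is needed.
commit() {
  git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
    -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

commit "start of the year"
git -C "$repo" tag 2020      # lightweight tag, as in the "2020" example
commit "some fix"
commit "another fix"

# Plain 'git describe' ignores lightweight tags; --tags makes it use them.
desc=$(git -C "$repo" describe --tags)
echo "$desc"                 # 2020-2-g<abbreviated commit hash>
rm -rf "$repo"
```

The count in the middle (2 here) is the number of commits since the tag, which is what makes such strings easy to compare as long as everyone uses the same tags.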
Re: Versioned releases
Bernhard Voelker wrote:
> Well, the projects using gnulib (via git submodule) could at least generate
> the 'git describe' value into their NEWS file or other documentation.

Some packages do this already. The latest GNU Bison release announcement [1], for example:

  "This release was bootstrapped with the following tools:
    Autoconf 2.69
    Automake 1.16.2
    Flex 2.6.4
    Gettext 0.19.8.1
    Gnulib v0.1-3420-gffbb0ced8
  "

> And gnulib could provide helpers for that.

The build-aux/announce-gen script already has support for it:

  --gnulib-version=VERSION  report VERSION as the gnulib version, where
                            VERSION is the result of running git describe
                            in the gnulib source directory.

Bruno

[1] https://lists.gnu.org/archive/html/bug-bison/2020-05/msg00097.html
Re: Versioned releases
On 2020-06-04 20:19, Bruno Haible wrote:
> Indeed e.g. Debian has a gnulib "package":
> https://packages.debian.org/sid/all/gnulib/filelist
>
> But I think it's a red herring, since basically no one is using gnulib
> this way.

I agree, it's not really useful: e.g. I'm using openSUSE:Tumbleweed, a rolling release with almost the latest and greatest. But still, its gnulib version is already >260 commits behind (from 2020-02-16). Well, the 'gnulib-docs' package integrates well, but some content is already outdated - especially the newer, interesting parts. I guess that other, non-rolling distros come with much older and therefore even more useless versions.

> You mean, a distributor wants to determine which of the coreutils,
> findutils, gawk, gettext, etc. packages use the Gnulib before 2018-09-23?
> This is nontrivial, but not because Gnulib does not have a version
> number, but because it's shipped as a source-code library - something that
> we don't want to change.

Well, the projects using gnulib (via git submodule) could at least generate the 'git describe' value into their NEWS file or other documentation. And gnulib could provide helpers for that.

Still, that wouldn't help in case the packager adds a downstream patch for gnulib files. Well, that same patch could add a note for it in the docs as well.

Have a nice day,
Berny
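Following Bernhard's suggestion, a project that pulls gnulib in as a git submodule can already recover the 'git describe' value from `git submodule status`, which prints the checked-out SHA-1, the submodule path and, in parentheses, the describe output. A minimal sketch, assuming the submodule lives at `gnulib`; the helper name `submodule_describe` is made up for illustration.

```sh
#!/bin/sh
# Extract the parenthesised 'git describe' text from one line of
# 'git submodule status' output, which looks like
#   <flag><sha1> <path> (<describe output>)
# Prints nothing if git could not describe the commit (no parentheses).
submodule_describe() {
  printf '%s\n' "$1" | sed -n 's/^.* (\(.*\))$/\1/p'
}

# Hypothetical use in a release script, with gnulib checked out at ./gnulib:
#   line=$(git submodule status gnulib)
#   echo "Gnulib $(submodule_describe "$line")"
```

The resulting "Gnulib v0.1-..." line has the same shape as the one in the Bison announcement mentioned in this thread.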
Re: Versioned releases
Hi Dmitry,

> My claim only covers standalone distribution of gnulib. I don't want
> to dig into the reasons why upstream forces bundling and why
> downstream doesn't follow it anyway, but the sole fact that it's packaged
> standalone in so many distributions speaks for itself: this way of
> distribution is a necessity.

I don't think so. This way of distribution is a misunderstanding. Every developer nowadays is used to doing 'git clone' here and there; there are more and more people who prefer the hassles of building a package from a git checkout to the sailing trip of building a tarball.

> With standalone distribution there's no way to peek into git history
> or some source files, but there's a clear identifier of which specific
> version is packaged.

Yes. As I said, the first line of the ChangeLog is the best identifier.

> > Or are you suggesting that the Gnulib developers pick, say, every 100th
> > Gnulib commit and assign it a version number? And how would that be useful,
> > since the consumers upgrade when they like to?
>
> I would suggest using proper semver.

semver is not a good philosophy for gnulib, because different packages use different gnulib modules. This week we made an incompatible change to the 'read-file' module; but the vast majority of the packages will not be impacted because they don't use this module. Therefore bumping a version number is not really meaningful.

> But dumb tagging every nth
> commit, or weekly or so would definitely be better than nothing

I disagree on this one. It would make people think that the Nth commit, or the Monday commit, or whatever, is preferred over the other commits. Which it really isn't - there may be a regression fix coming in just the next day.

In summary:
* The date (first line of ChangeLog) is a good version indicator.
* If someone doesn't like dates, for whatever reason, they can use
  'git describe'.

Bruno
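The two identifiers in the summary can be combined in one helper: prefer 'git describe' when git metadata is available, otherwise fall back to the date on the first ChangeLog line (the same sed expression Bruno uses in his checkout-date snippet in this thread). This is a sketch; the function name `gnulib_id` is hypothetical. It tests `.git` with `-e` rather than `-d`, because in a submodule checkout `.git` is normally a file rather than a directory.

```sh
#!/bin/sh
# Print an identifier for the gnulib tree in directory $1:
# 'git describe' output when git metadata is present, otherwise the
# YYYY-MM-DD date from the first line of its ChangeLog.
gnulib_id() {
  dir=$1
  # -e, not -d: in a submodule checkout .git is usually a plain file.
  if [ -e "$dir/.git" ]; then
    git -C "$dir" describe --always
  else
    sed -n -e 's/^\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\).*/\1/p' \
        -e 1q "$dir/ChangeLog"
  fi
}
```

For example, `gnulib_id ./gnulib` would print something like v0.1-3536-gd50852525 in a git checkout, or a date such as 2020-06-04 in an unpacked snapshot.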
Re: Versioned releases
On Thu, 2020-06-04 at 23:11 +0300, Dmitry Marakasov wrote: > * Paul Smith (psm...@gnu.org) wrote: > > > Regarding the format of the version: > > > > First, semver is not right for gnulib. The entire concept behind > > semver and similar versioning schemes is to use a version string to > > describe compatibility guarantees between different versions. > > That's (IMO) completely inappropriate for a source-only package > > like gnulib. > > Why, that's precisely what semver is useful and was designed for. > It's MAJOR.MINOR.PATCH - if you break API, bump MAJOR, if you > introduce new feature, bump MINOR, otherwise bump PATCH. I'm not a gnulib developer, so I don't want to speak for them: maybe they would like to make this attempt. But IMO it's not appropriate for gnulib. During the development of gnulib there aren't discrete release points where someone will stop and consider all the changes since the last release, and assign some version to it as a whole. To the extent that such discrete points exist they are invented by distributions that include gnulib as a package... not by the gnulib developers. To follow semver, or a similar versioning scheme, would mean that EVERY SINGLE COMMIT would have to change the version, because EVERY SINGLE COMMIT makes some change, and anyone could do a Git pull of gnulib at any instant and include it in their program, or in their distribution. The only possibly workable option would be to have the first two numbers in a semver be bumped by developers when they pushed changes which they knew to change the API or add a feature, and leave the last number to be automatically generated based on the number of Git commits since the last version bump (since those commits can be assumed to be bugfixes only). However, I doubt this is reasonable either. 
First, even considering only the first two semver values, it would add a lot of overhead and effort to the development process to consider these version bumps and get them right with every push to the repository. Second, remember gnulib is not a monolithic entity: it's a collection of 1,200 or so discrete "utilities" (and counting...), most of which are just one or two files. Do we say that the version of gnulib should change every time ANY ONE of those hundreds of utilities has a new feature or a change to its API? Suppose Bruno pushes a new module (second number bump), then the next day realizes it has a problem that needs the API to change (first number bump). Then an hour later he realizes there's another problem with the API (another first number bump). Etc. The fact that the API version bumped doesn't tell you anything very interesting when it could be any one of >1000 different utilities whose API was changed, for any number of reasons. > So as a consumer I may just require e.g. version >=1.2.3 <2, and > expect it to be API-compatible and have all the features my code > requires. That isn't how gnulib is intended to be used. > > My recommendation would be to automatically add a tag once a month > > (say) to the gnulib Git repo with the date, and then use the "git > > describe" output as the version. This gives an easily-comparable > > version string with all the info needed. > > This complicates the format, as SHAs are never appropriate in the > versions, for they are not monotonic and alphabetic characters are not > compatible with all package managers. Someone may include them, some > may omit them, and we'll end up with incompatible versioning schemes > again. IMO the idea of being able to learn anything from a gnulib version that is more informative than "this one contains more recent commits than that one" is not feasible.
But I think that "this one contains more recent commits than that one" _IS_ a very useful and desirable metric, and speaking as a gnulib user, I hope we can find a relatively painless way to incorporate it.
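Paul's "contains more recent commits" metric is, at bottom, an ancestry question, and git can answer it directly with `git merge-base --is-ancestor A B`, which exits with status 0 when A is an ancestor of B. A small demonstration in a throwaway repository; as Paul notes about the SHA, this only works where the repository is available, not from a bare snapshot.

```sh
#!/bin/sh
# Demo in a throwaway repository (not gnulib): does snapshot NEW contain
# every commit of snapshot OLD?  'git merge-base --is-ancestor A B' exits
# with status 0 exactly when A is an ancestor of (or equal to) B.
set -e
repo=$(mktemp -d)
git -C "$repo" init -q

# Empty commits with an inline identity, so no global git config is needed.
commit() {
  git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
    -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

commit "older snapshot"
old=$(git -C "$repo" rev-parse HEAD)
commit "newer snapshot"
new=$(git -C "$repo" rev-parse HEAD)

# contains OLD NEW: succeeds if NEW includes all commits reachable from OLD.
contains() {
  git -C "$repo" merge-base --is-ancestor "$1" "$2"
}

if contains "$old" "$new"; then older_in_newer=yes; else older_in_newer=no; fi
if contains "$new" "$old"; then newer_in_older=yes; else newer_in_older=no; fi
echo "$older_in_newer $newer_in_older"   # yes no
rm -rf "$repo"
```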
Re: Versioned releases
* Paul Smith (psm...@gnu.org) wrote:
> Regarding the format of the version:
>
> First, semver is not right for gnulib. The entire concept behind
> semver and similar versioning schemes is to use a version string to
> describe compatibility guarantees between different versions. That's
> (IMO) completely inappropriate for a source-only package like gnulib.

Why, that's precisely what semver is useful and was designed for. It's MAJOR.MINOR.PATCH - if you break API, bump MAJOR, if you introduce new feature, bump MINOR, otherwise bump PATCH. So as a consumer I may just require e.g. version >=1.2.3 <2, and expect it to be API-compatible and have all the features my code requires. With that, library code may be safely (even automatically) updated to the latest 1.x version, be it a systemwide package maintained by someone else, or a bundled code/subrepository, and consumer code will not break, while still getting all the latest features/fixes from the library.

> I think the Git SHA is the single most critical element and must be
> included. However, it's not too informative unless the user has the
> Git repo.
>
> My recommendation would be to automatically add a tag once a month
> (say) to the gnulib Git repo with the date, and then use the "git
> describe" output as the version. This gives an easily-comparable
> version string with all the info needed.

This complicates the format, as SHAs are never appropriate in the versions, for they are not monotonic and alphabetic characters are not compatible with all package managers. Someone may include them, some may omit them, and we'll end up with incompatible versioning schemes again.

If you're going to introduce a version, please make sure it's the same whether it's a tag or embedded in the source. You may as well embed the git commit or `git describe` output, but it should be clearly separated from the version.

--
Dmitry Marakasov . 55B5 0596 FF1E 8D84 5F56 9510 D35A 80DD F9D2 F77D
amd...@amdmi3.ru ..: https://github.com/AMDmi3
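Dmitry's requirement that the hash be kept apart from the version can be met mechanically from 'git describe' output. The sketch below turns TAG-COUNT-gHASH into a dotted "TAG.COUNT" version plus a separate commit id; that layout, the stripping of a leading "v", and the function name are assumptions for illustration, not an agreed scheme. The count is only monotonic relative to one tag, so all tags would have to follow a single naming scheme for the results to compare.

```sh
#!/bin/sh
# Split 'git describe' output (TAG-COUNT-gHASH, or just TAG when HEAD is
# exactly at the tag) into two variables:
#   version: TAG.COUNT with a leading "v" removed (no hash inside)
#   commit:  the abbreviated hash, kept separately (empty at a tag)
split_describe() {
  parts=$(printf '%s\n' "$1" |
    sed -n 's/^\(.*\)-\([0-9][0-9]*\)-g\([0-9a-f][0-9a-f]*\)$/\1 \2 \3/p')
  if [ -n "$parts" ]; then
    set -- $parts          # $1 = tag, $2 = commit count, $3 = hash
    version="$1.$2"
    commit=$3
  else
    version="$1.0"
    commit=
  fi
  version=${version#v}
}

split_describe 2020-427-gd50852525
echo "$version ($commit)"   # 2020.427 (d50852525)
```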
Re: Versioned releases
On Thu, 2020-06-04 at 20:19 +0200, Bruno Haible wrote: > Are you suggesting that every gnulib commit can be translated to a > version number? There's 'git describe' which does that. > > Or are you suggesting that the Gnulib developers pick, say, every > 100th Gnulib commit and assign it a version number? And how would > that be useful, since the consumers upgrade when they like to? What would be useful is if there were a "gnulib-version" module or similar that was constructed when bootstrap was run and pulled in a new suite of gnulib content, for example, based on the Git version perhaps. Then applications could call a C function to return the gnulib version as a string and include it in their --version output (if they wanted to) and users could judge the "freshness" of the gnulib content. For the distro packages, that take a snapshot of the Git repo: it would be good if there were some way to have that snapshot contain hardcoded version details from the Git, so that if apps bootstrapped from the distro snapshot of gnulib they would get the correct hardcoded version. I don't pretend to know too much about how all this works, including how distros create gnulib packages, but this seems like something that would be do-able and useful, and wouldn't need to involve any type of "automatic versioning" of gnulib in the Git repo. Regarding the format of the version: First, semver is not right for gnulib. The entire concept behind semver and similar versioning schemes is to use a version string to describe compatibility guarantees between different versions. That's (IMO) completely inappropriate for a source-only package like gnulib. I think the Git SHA is the single most critical element and must be included. However, it's not too informative unless the user has the Git repo. My recommendation would be to automatically add a tag once a month (say) to the gnulib Git repo with the date, and then use the "git describe" output as the version. 
This gives an easily-comparable version string with all the info needed.
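Paul's bootstrap-time idea could be sketched as a script step that writes the version into a generated C header, which an application's --version code could then print. Everything named here is a placeholder invented for this sketch: the header name, the GNULIB_VERSION macro, the gnulib_version() accessor and the GNULIB_SRCDIR variable. A distribution snapshot could run the same step once, so that later bootstraps from the snapshot still see a hardcoded version.

```sh
#!/bin/sh
# Write a small C header recording the gnulib version string.
# Usage: write_version_header VERSION OUTPUT-FILE
write_version_header() {
  version=$1
  out=$2
  # Unquoted EOF: $version is expanded into the header text.
  cat > "$out" <<EOF
/* Generated at bootstrap time; do not edit.  */
#ifndef GNULIB_VERSION_H
#define GNULIB_VERSION_H
#define GNULIB_VERSION "$version"
static inline const char *
gnulib_version (void)
{
  return GNULIB_VERSION;
}
#endif
EOF
}

# Hypothetical bootstrap step; GNULIB_SRCDIR is an assumed variable:
#   write_version_header "$(git -C "$GNULIB_SRCDIR" describe --always)" \
#     lib/gnulib-version.h
```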
Re: Versioned releases
* Bruno Haible (br...@clisp.org) wrote:
> > Despite that gnulib homepage says "Gnulib does not make releases.
> > It is intended to be used at the source level." gnulib is in fact
> > packaged in quite a lot of distributions:
> >
> > https://repology.org/project/gnulib/versions
>
> Indeed e.g. Debian has a gnulib "package":
> https://packages.debian.org/sid/all/gnulib/filelist
>
> But I think it's a red herring, since basically no one is using gnulib
> this way.
>
> > Note that since there are no official versions maintainers have to
> > invent versioning schemes which include "0", multiple date based and
> > commit number based formats.
>
> There is nothing wrong with that. As long as the date is retrieved from
> the checkout, there is no problem:
>
> git_checkout_date=`if test -d .git; then
>                      git log -n 1 --date=iso --format=fuller | sed -n -e 's/^CommitDate: //p';
>                    else
>                      sed -n -e 's/^\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\).*/\1/p' -e 1q ChangeLog;
>                    fi`
> pretty_date=`LC_ALL=C date +"%e %B %Y" --date="$git_checkout_date"`
>
> > There are known vulnerabilities for gnulib which also have to use
> > something version-like to describe which gnulib versions are affected
> > (these use dates in YYYY-MM-DD format):
> >
> > https://nvd.nist.gov/vuln/detail/CVE-2017-7476
> > https://nvd.nist.gov/vuln/detail/CVE-2018-17942
>
> It says e.g. "in Gnulib before 2018-09-23 has a heap-based buffer overflow".
> It is easy for every user of Gnulib to determine whether their version
> is before or after 2018-09-23. Just peek at the ChangeLog or 'gitk'.
>
> It is not harder than when a CVE is about "OpenSSL through 1.0.1i".
>
> > Note that it's impossible to match these against package versions due
> > to inconsistent versioning scheme.
>
> You mean, a distributor wants to determine which of the coreutils,
> findutils, gawk, gettext, etc. packages use the Gnulib before 2018-09-23?
> This is nontrivial, but not because Gnulib does not have a version
> number, but because it's shipped as a source-code library - something that
> we don't want to change.
>
> Such a distributor would
>   - for packages for which they use tarballs, look at the particular file
>     in the tarball (e.g. lib/vasnprintf.c); I admit it is tedious;
>   - for packages for which they use the git checkout, look at the git
>     submodule version (e.g. [1][2]); this is tedious as well.
>
> But I don't see how a versioning scheme would significantly help.

My claim only covers standalone distribution of gnulib. I don't want to dig into the reasons why upstream forces bundling and why downstream doesn't follow it anyway, but the sole fact that it's packaged standalone in so many distributions speaks for itself: this way of distribution is a necessity.

With standalone distribution there's no way to peek into git history or some source files, but there's a clear identifier of which specific version is packaged. And it can be used to estimate how up to date the packaged version is, to reliably check whether it has known vulnerabilities and (when semver is used) whether it's compatible with particular consumers.

> > So as you can see, even though there are no official versioned releases,
> > people have to invent and use these to refer to specific gnulib commit
> > ranges, and not having any consistency in these schemes results in e.g.
> > inability to report vulnerable packages.
>
> I don't see noticeable problems caused by this inconsistency.
>
> > So I suggest to fix this by introducing any kind of upstream versioning.
>
> Are you suggesting that every gnulib commit can be translated to a
> version number? There's 'git describe' which does that.
>
> Or are you suggesting that the Gnulib developers pick, say, every 100th
> Gnulib commit and assign it a version number? And how would that be useful,
> since the consumers upgrade when they like to?
I would suggest using proper semver. But dumb tagging every nth commit, or weekly or so would definitely be better than nothing, as long as the tags use a consistent scheme. There's no need for an exact commit:version mapping to say that "versions below x.y.z" contain a bug or vulnerability; just enough precision to not have to wait months for a fixed version to be released.

> [1] https://git.savannah.gnu.org/cgit/poke.git/log/gnulib
> [2] https://git.savannah.gnu.org/cgit/gettext.git/log/gnulib
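For the "versions below x.y.z" check Dmitry describes, and for a semver-style range such as >=1.2.3 <2, a consumer needs a numeric comparison of dotted versions; plain string comparison gets 1.10 versus 1.2.3 wrong. GNU sort has `-V` for this, but it is not in POSIX, so here is a portable sketch; it assumes purely numeric components, and the function name is made up.

```sh
#!/bin/sh
# Succeed (status 0) if dotted version $1 is strictly lower than $2.
# Components are compared numerically from the left; a missing component
# counts as 0, so 1.2 and 1.2.0 are equal. Numeric components only.
version_lt() {
  a=$1
  b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}
    y=${b%%.*}
    if [ "${x:-0}" -lt "${y:-0}" ]; then return 0; fi
    if [ "${x:-0}" -gt "${y:-0}" ]; then return 1; fi
    # Drop the component just compared, together with its dot.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 1
}

# Dmitry's example range ">=1.2.3 <2":
v=1.4.0
if ! version_lt "$v" 1.2.3 && version_lt "$v" 2; then
  echo "$v is in range"
fi
```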
Re: Versioned releases
Hi Dmitry,

> Despite that gnulib homepage says "Gnulib does not make releases.
> It is intended to be used at the source level." gnulib is in fact
> packaged in quite a lot of distributions:
>
> https://repology.org/project/gnulib/versions

Indeed e.g. Debian has a gnulib "package":
https://packages.debian.org/sid/all/gnulib/filelist

But I think it's a red herring, since basically no one is using gnulib this way.

> Note that since there are no official versions maintainers have to
> invent versioning schemes which include "0", multiple date based and
> commit number based formats.

There is nothing wrong with that. As long as the date is retrieved from the checkout, there is no problem:

  git_checkout_date=`if test -d .git; then
                       git log -n 1 --date=iso --format=fuller | sed -n -e 's/^CommitDate: //p';
                     else
                       sed -n -e 's/^\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\).*/\1/p' -e 1q ChangeLog;
                     fi`
  pretty_date=`LC_ALL=C date +"%e %B %Y" --date="$git_checkout_date"`

> There are known vulnerabilities for gnulib which also have to use
> something version-like to describe which gnulib versions are affected
> (these use dates in YYYY-MM-DD format):
>
> https://nvd.nist.gov/vuln/detail/CVE-2017-7476
> https://nvd.nist.gov/vuln/detail/CVE-2018-17942

It says e.g. "in Gnulib before 2018-09-23 has a heap-based buffer overflow". It is easy for every user of Gnulib to determine whether their version is before or after 2018-09-23. Just peek at the ChangeLog or 'gitk'.

It is not harder than when a CVE is about "OpenSSL through 1.0.1i".

> Note that it's impossible to match these against package versions due
> to inconsistent versioning scheme.

You mean, a distributor wants to determine which of the coreutils, findutils, gawk, gettext, etc. packages use the Gnulib before 2018-09-23? This is nontrivial, but not because Gnulib does not have a version number, but because it's shipped as a source-code library - something that we don't want to change.
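The "before 2018-09-23" check is easy to script once a snapshot date is known (for example from the first ChangeLog line): zero-padded YYYY-MM-DD dates compare correctly as integers after the dashes are removed. A sketch with a made-up function name; a date is only a proxy, since a packager may have applied downstream patches to an older snapshot.

```sh
#!/bin/sh
# Succeed if date $1 is strictly earlier than date $2 (both YYYY-MM-DD).
# Zero-padded ISO 8601 dates order correctly as integers once the
# dashes are removed, e.g. 2018-09-23 -> 20180923.
date_before() {
  d1=$(printf '%s\n' "$1" | sed 's/-//g')
  d2=$(printf '%s\n' "$2" | sed 's/-//g')
  [ "$d1" -lt "$d2" ]
}

# Example: is a snapshot dated 2018-06-01 older than the CVE-2018-17942 cutoff?
if date_before 2018-06-01 2018-09-23; then
  echo "snapshot predates 2018-09-23"
fi
```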
Such a distributor would
  - for packages for which they use tarballs, look at the particular file
    in the tarball (e.g. lib/vasnprintf.c); I admit it is tedious;
  - for packages for which they use the git checkout, look at the git
    submodule version (e.g. [1][2]); this is tedious as well.

But I don't see how a versioning scheme would significantly help.

> So as you can see, even though there are no official versioned releases,
> people have to invent and use these to refer to specific gnulib commit
> ranges, and not having any consistency in these schemes results in e.g.
> inability to report vulnerable packages.

I don't see noticeable problems caused by this inconsistency.

> So I suggest to fix this by introducing any kind of upstream versioning.

Are you suggesting that every gnulib commit can be translated to a version number? There's 'git describe' which does that.

Or are you suggesting that the Gnulib developers pick, say, every 100th Gnulib commit and assign it a version number? And how would that be useful, since the consumers upgrade when they like to?

Bruno

[1] https://git.savannah.gnu.org/cgit/poke.git/log/gnulib
[2] https://git.savannah.gnu.org/cgit/gettext.git/log/gnulib