Re: Verifying dep-5
Hi,

Quoting Jakub Wilk (2016-05-30 13:08:47)
> * Johannes Schauer, 2016-05-28, 10:04:
> >I was investigating this problem last year and as far as my research
> >went, there is no tracing method in existence which reliably traces
> >system calls in general, file system access or read/write operations
> >while keeping track of the acting pid that is 100% reliable. The
> >methods I found either were not transparent (and would thus break test
> >suites) or suffered from race conditions where it was possible to
> >register an operation but miss the pid the operation was carried out by
> >or dropped operations if they occurred with a too-high frequency...
>
> Have you tried systemtap?

Yes, and it will drop events if they arrive too fast. There is no way to completely prevent it from doing so. One can only increase queue and buffer sizes and timeouts, but that will never provide 100% reliability.

cheers, josch
Re: Verifying dep-5
* Johannes Schauer, 2016-05-28, 10:04:
>I was investigating this problem last year and as far as my research went, there is no tracing method in existence which reliably traces system calls in general, file system access or read/write operations while keeping track of the acting pid that is 100% reliable. The methods I found either were not transparent (and would thus break test suites) or suffered from race conditions where it was possible to register an operation but miss the pid the operation was carried out by or dropped operations if they occurred with a too-high frequency...

Have you tried systemtap? Timo Juhani Lindfors wrote a PoC that tracks all execs:
http://lindi.iki.fi/lindi/structured-buildlogs/logs/hello-2.6-1_amd64.build
http://lindi.iki.fi/lindi/git/structured-buildlogs.git/

>Having such a reliable tracing method would give us the ability to reliably infer copyright information

As Paul noticed in another mail, system call tracing won't necessarily help much.

>as well as generating structured build logs (knowing for each line in the build log the process (tree) that created it).

Consider the following pipeline:

  $ (LC_ALL=C date | tail -c2; echo 6) | shuf | head -n1 | tee log
  6

Which process created the log line? The technically correct answer is "tee", but this answer is completely impractical.

-- 
Jakub Wilk
Re: Verifying dep-5
Hi,

Quoting Nikolaus Rath (2016-05-29 22:11:58)
> Did you write down your findings in some more detail somewhere?

No, sorry.

> I'd be curious why e.g. a LD_PRELOAD based wrapper would not work for all
> important cases.

For me "all important cases" were "compilation of all Debian source packages". LD_PRELOAD based methods would not work for source packages which already make use of this mechanism themselves (for example during their tests). A prominent example would be src:fakechroot itself.

> Or are we assuming that the application is actively trying to prevent this
> (and e.g. does system calls directly on its own)?

We are assuming that applications do the things they normally do during package builds. Unfortunately, that includes test cases, which sometimes do really weird things.

Using fakechroot or proot it would definitely be possible to set up a package build tracer that would work for 99% of the archive. By building first without the tracer, then with proot (on Linux), then with fakechroot (should the build fail with proot), and by relying on reproducible builds, we can even make sure that the tracer did not influence the build in any way that produces different binary packages. If test suites cannot be executed because of the tracer, they will probably fail.

I did not follow up on this 99% solution because I'm usually much less motivated if the solution is not 100% proper. There were also some tricky things to solve, like which file format to design so that it can store build logs and operations on files while at the same time maintaining the process tree that led to writing to the build log or to general file descriptor operations. And since this information grows large really quickly (a YAML-based representation I tested easily reached several hundred megabytes), it would be great if it could be written to the output file directly instead of being kept in memory, but this then has to work even with parallel builds.
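The "did the tracer influence the build?" check described above comes down to comparing the binary packages of an untraced build with those of a traced build. A minimal Python sketch, assuming each build's .deb files were collected into their own directory (the directory layout and both function names are hypothetical, not part of any existing tool): with reproducible builds, any differing SHA-256 digest points at the tracer, or at a package that is not yet reproducible.

```python
import hashlib
from pathlib import Path


def deb_digests(build_dir):
    """Map each .deb file name in build_dir to its SHA-256 hex digest."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(build_dir).glob("*.deb"))
    }


def tracer_changed_output(plain_dir, traced_dir):
    """Hypothetical helper: return the sorted names of .deb files that differ
    between a build without the tracer and a build under it.
    A file present in only one of the two builds also counts as a difference."""
    plain, traced = deb_digests(plain_dir), deb_digests(traced_dir)
    names = set(plain) | set(traced)
    return sorted(n for n in names if plain.get(n) != traced.get(n))
```

An empty result means the traced build produced bit-identical binary packages, so the trace can be trusted not to have altered the build.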
There is still a sticky note about all these things on my fridge, but oh, if only I had more time... XD

Thanks!

cheers, josch
Re: Verifying dep-5
On May 28 2016, Johannes Schauer wrote:
> Hi,
>
> Quoting Paul Wise (2016-05-28 06:45:44)
>> I think it would be interesting to automatically track how each file
>> in a binary package was created and which files they were derived
>> from. Then we could automatically generate proper copyright files for
>> binary packages. That is a hard project so...
>
> I was investigating this problem last year and as far as my research went,
> there is no tracing method in existence which reliably traces system calls in
> general, file system access or read/write operations while keeping track of
> the acting pid that is 100% reliable. The methods I found either were not
> transparent (and would thus break test suites) or suffered from race
> conditions where it was possible to register an operation but miss the pid
> the operation was carried out by or dropped operations if they occurred with
> a too-high frequency...

Did you write down your findings in some more detail somewhere? I'd be curious why e.g. an LD_PRELOAD based wrapper would not work for all important cases. Or are we assuming that the application is actively trying to prevent this (and e.g. makes system calls directly on its own)?

Best,
Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
Re: Verifying dep-5
[2016-05-28 13:20] Stefano Zacchiroli
> On Sat, May 28, 2016 at 02:18:51AM +0300, Dmitry Bogatov wrote:
> > But seems we do not have tools to check it. Probably, we need some way
> > to mark licenses of whole binary packages. WDYT?
>
> You're correct that we have no way to document the licenses of binaries.
> The Policy is currently only concerned to document licenses at the
> source (files) level.
>
> Note that having a human-maintained documentation of the license of each
> binary we ship is not enough to properly do the checking you've in mind.
> Tracking licensing information across builds is actually an open
> research question on which various teams around the world are
> working, on various angles: formalizing dependencies across builds,
> dynamically tracking builds using syscall tapping, inspecting built
> binaries ex post, etc. There are prototypes of all these things around,
> but TTBOMK they are all very limited (e.g., restricting to a specific
> build system and/or a programming language) and as such by no mean
> generic enough to scale to the size and diversity we have in Debian.

In my particular case, the issue is solved (the upstream maintainer agreed to remove the GPL file, making the package plain BSD-3-clause). But to get an idea of whether such issues are worth a new field in d/control, it would be interesting to take a look at all DEP-5 d/copyright files. Downloading every source package in the archive is not an option, of course.

-- 
Accept: text/plain, text/x-diff
Accept-Language: eo,en,ru
X-Keep-In-CC: yes
X-Web-Site: sinsekvu.github.io
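Once the d/copyright texts are at hand (how to obtain them without downloading every source package is left open, as in the mail above), recognising DEP-5 files and reading the License of their header paragraph needs little more than splitting each file into deb822 paragraphs. A minimal Python sketch; the parser is a deliberate simplification of deb822 (python-debian's deb822 module is the more complete option), and the function names are hypothetical.

```python
DEP5_FORMAT = "copyright-format/1.0"


def parse_paragraphs(text):
    """Split deb822-style text into a list of {field: value} dicts.

    Deliberately minimal: continuation lines (starting with a space or tab)
    are appended to the previous field; comments and other corner cases of
    the real deb822 syntax are not handled.
    """
    paragraphs, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current paragraph
            if current:
                paragraphs.append(current)
            current, last = {}, None
        elif line[0] in " \t" and last is not None:
            current[last] += "\n" + line.strip()
        else:
            field, _, value = line.partition(":")
            last = field.strip()
            current[last] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def dep5_header_license(text):
    """Return (is_dep5, short name of the header License or None)."""
    paragraphs = parse_paragraphs(text)
    if not paragraphs or DEP5_FORMAT not in paragraphs[0].get("Format", ""):
        return False, None
    license_field = paragraphs[0].get("License")
    # The first line of a License field holds the short license name.
    return True, license_field.split("\n")[0] if license_field else None
```

Counting how many files are DEP-5 at all, and how many of those already declare a header License, would give a first idea of how often a whole-package license is stated today.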
Re: Verifying dep-5
On Sat, May 28, 2016 at 4:04 PM, Johannes Schauer wrote:

> Having such a reliable tracing method would give us the ability to reliably
> infer copyright information as well as generating structured build logs
> (knowing for each line in the build log the process (tree) that created it).
>
> Both of these would also tremendously help debugging problems. For example,
> for fixing reproducible build problems, I was often puzzled which program
> actually created a file that I was interested in for a source package that I
> am not familiar with.

Thanks for these other use-cases, very interesting.

> Unfortunately though, there seems to be no way to reliably trace process
> execution and read/write/open/close system calls without either sometimes
> missing information or breaking builds...

I expect this would need some support from the kernel being run under.

OTOH I don't think a tracing mechanism is what is needed, since the kernel cannot know what the program is doing with each file it reads or writes. These sorts of semantics (input, output and code) are only known by the program that is doing the transformations. Especially when you factor in shell scripts and other things, the semantics get complicated. Kernel support would definitely be useful though.

Perhaps we could have a brainstorm/BoF about this at a DebConf some time.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Re: Verifying dep-5
On Sat, May 28, 2016 at 02:18:51AM +0300, Dmitry Bogatov wrote:
> But seems we do not have tools to check it. Probably, we need some way
> to mark licenses of whole binary packages. WDYT?

You're correct that we have no way to document the licenses of binaries. The Policy is currently only concerned with documenting licenses at the source (files) level.

Note that having human-maintained documentation of the license of each binary we ship is not enough to properly do the checking you have in mind. Tracking licensing information across builds is actually an open research question on which various teams around the world are working, from various angles: formalizing dependencies across builds, dynamically tracking builds using syscall tapping, inspecting built binaries ex post, etc. There are prototypes of all these things around, but TTBOMK they are all very limited (e.g., restricted to a specific build system and/or programming language) and as such by no means generic enough to scale to the size and diversity we have in Debian.

Cheers.

-- 
Stefano Zacchiroli  z...@upsilon.cc
Maître de conférences  http://upsilon.cc/zack
Former Debian Project Leader  @zacchiro
« the first rule of tautology club is the first rule of tautology club »
Re: Verifying dep-5
Hi,

Quoting Paul Wise (2016-05-28 06:45:44)
> I think it would be interesting to automatically track how each file
> in a binary package was created and which files they were derived
> from. Then we could automatically generate proper copyright files for
> binary packages. That is a hard project so...

I was investigating this problem last year and, as far as my research went, there is no tracing method in existence that traces system calls in general, file system access or read/write operations while keeping track of the acting pid, and that is 100% reliable. The methods I found were either not transparent (and would thus break test suites) or suffered from race conditions where it was possible to register an operation but miss the pid it was carried out by, or they dropped operations if these occurred at too high a frequency...

Having such a reliable tracing method would give us the ability to reliably infer copyright information as well as to generate structured build logs (knowing for each line in the build log the process (tree) that created it).

Both of these would also tremendously help with debugging problems. For example, when fixing reproducible build problems, I was often puzzled about which program actually created a file that I was interested in, for a source package that I am not familiar with.

Unfortunately though, there seems to be no way to reliably trace process execution and read/write/open/close system calls without either sometimes missing information or breaking builds...

cheers, josch
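One possible shape for such a structured build log, sketched in Python under the assumption that some tracer (hypothetical here; per the mail above, none is fully reliable) emits one JSON object per line: `exec` events carrying pid, ppid and argv, and `write` events carrying pid and the text written. The record format, field names and `annotate_log` are made up for illustration. Each written line is attributed to the writing process and its traced ancestors.

```python
import json


def annotate_log(lines):
    """For each 'write' event, return (text, chain), where chain is the
    argv[0] of the writing process followed by its traced ancestors.

    Events are processed in order, so a write is attributed using only the
    exec events seen so far; an untraced ancestor ends the chain.
    """
    parent, name, out = {}, {}, []
    for line in lines:
        ev = json.loads(line)
        if ev["type"] == "exec":
            parent[ev["pid"]] = ev["ppid"]
            name[ev["pid"]] = ev["argv"][0]
        elif ev["type"] == "write":
            chain, pid = [], ev["pid"]
            # Walk up the process tree; the length check guards against cycles.
            while pid in name and len(chain) < len(name):
                chain.append(name[pid])
                pid = parent[pid]
            out.append((ev["text"], chain))
    return out
```

Writing one JSON object per line lets a tracer append events to the output file as they occur instead of holding them in memory; keeping concurrent writers from a parallel build from interleaving is not addressed here. For Jakub's pipeline example, this attribution would name `tee`, the technically correct but impractical answer he describes.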
Re: Verifying dep-5
Quoting Dmitry Bogatov (2016-05-28 07:47:31)
>
> [add debian-devel back to cc]
>
>> Regarding _declaring_ appropriate DEP5 hints, with machine-readable
>> DEP5 copyright format you can declare a license in the _header_
>> section to indicate the effective license caused by "infection" of
>> individual parts on the whole of the binary product.
>
> Almost sufficient, but not general enough.

I don't follow, but instead of elaborating further here, see below...

> Just an idea: new field in Package: stanza in d/control:
> `Effective-License', which specify which terms you must comply with if
> you use this library. In my case, I would leave debian/copyright
> alone, and add `Effective-License: GPL-2+' to libghc-missingh-dev.
>
> And add rule, that Effective-License defaults to License in header,
> which defaults to the most strict of licenses of individual files.
> Add tool, that implement this rule. Hmm, it is complicated.
>
> Thoughts?
>
>> Also note that DEP5 format is only optional, so such automated
>> checks, even if/when existing, would not cover Debian as a whole.
>
> Is there no plans to push it into policy?

I guess further progress on the copyright format is driven by bug reports against debian-policy. Therefore I suggest that you file a bug report if you feel there is substance for change.

Since Policy generally reflects the reality of Debian rather than steering changes to it, you might consider "backing" such a bug report by actively using your proposed new field: the copyright format explicitly permits the use of unofficial fields.

 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private
Re: Verifying dep-5
[add debian-devel back to cc]

> Regarding _declaring_ appropriate DEP5 hints, with machine-readable DEP5
> copyright format you can declare a license in the _header_ section to
> indicate the effective license caused by "infection" of individual parts
> on the whole of the binary product.

Almost sufficient, but not general enough.

Just an idea: a new field in the Package: stanza in d/control, `Effective-License', which specifies which terms you must comply with if you use this library. In my case, I would leave debian/copyright alone and add `Effective-License: GPL-2+' to libghc-missingh-dev.

And add a rule that Effective-License defaults to the License in the header, which in turn defaults to the strictest of the licenses of the individual files. Add a tool that implements this rule. Hmm, it is complicated.

Thoughts?

> Also note that DEP5 format is only optional, so such automated
> checks, even if/when existing, would not cover Debian as a whole.

Are there no plans to push it into policy?

-- 
Accept: text/plain, text/x-diff
Accept-Language: eo,en,ru
X-Keep-In-CC: yes
X-Web-Site: sinsekvu.github.io
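The defaulting chain proposed above (binary stanza, then copyright header, then the strictest per-file license) can be sketched in Python. Note the assumptions: `Effective-License` is only a proposal in this thread, not an existing debian/control field, and the STRICTNESS ranking is a hypothetical total order made up for illustration. Real license compatibility does not reduce to such a ranking, which is part of why the rule is complicated.

```python
# Hypothetical strictness ranking (higher = stricter), for illustration only.
# Real license compatibility is not a simple total order.
STRICTNESS = {
    "Expat": 1,
    "BSD-3-clause": 1,
    "LGPL-2.1+": 2,
    "GPL-2+": 3,
}


def effective_license(binary_stanza, copyright_header, file_licenses):
    """Resolve the license a user of a binary package must comply with.

    Precedence, as proposed in the thread:
      1. Effective-License in the binary package's d/control stanza
      2. License in the DEP-5 header paragraph of d/copyright
      3. the "most strict" license among the individual files
    """
    if "Effective-License" in binary_stanza:
        return binary_stanza["Effective-License"]
    if "License" in copyright_header:
        return copyright_header["License"]
    if not file_licenses:
        raise ValueError("no license information available")
    # Unknown license names raise KeyError instead of being ranked silently.
    return max(file_licenses, key=STRICTNESS.__getitem__)
```

For the haskell-missingh case, with neither an Effective-License nor a header License, `effective_license({}, {}, ["BSD-3-clause", "GPL-2+"])` resolves to GPL-2+. A header License, as Jonas suggested, would take precedence over the per-file fallback.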
Re: Verifying dep-5
On Sat, May 28, 2016 at 7:18 AM, Dmitry Bogatov wrote:

> Do we have any tools to check for GPL violation? I mean, is it any
> tool to perform rather crude check whether package that contains
> non-copyleft source file depends on binary package, source package of
> which contains GPL file?

Non-copyleft licenses are generally GPL compatible, but I guess you are thinking of BSD-4-clause and OpenSSL licenses here? There are GPL-incompatible copyleft licenses too (like CDDL).

The adequate tool can perform some checking of license incompatibilities:

https://piuparts.debian.org/sid/incompatible_licenses_inadequate_issue.html
https://packages.debian.org/unstable/adequate

> Currently, I am working about some issue with haskell-missingh. All
> code in this package is BSD-3-clause, but one file is GPL. It would
> be wrong to mark all files as GPL, but package as whole is GPL, which
> should be propagated down the dependency tree. But seems we do not
> have tools to check it. Probably, we need some way to mark licenses
> of whole binary packages. WDYT?

I think it would be interesting to automatically track how each file in a binary package was created and which files they were derived from. Then we could automatically generate proper copyright files for binary packages. That is a hard project so...

The next best thing is to have a manually prepared copyright file for the binary package that is different to the one for the source package (see libicns for an example) but...

Right now we completely ignore what the correct copyright/license situation is for binary packages and assume it is the same as for the source package.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Verifying dep-5
Hello!

Do we have any tools to check for GPL violations? I mean, is there any tool that performs a rather crude check of whether a package containing a non-copyleft source file depends on a binary package whose source package contains a GPL file?

Currently, I am working on an issue with haskell-missingh. All code in this package is BSD-3-clause, but one file is GPL. It would be wrong to mark all files as GPL, but the package as a whole is GPL, and this should be propagated down the dependency tree. But it seems we do not have tools to check it. Probably we need some way to mark the licenses of whole binary packages. WDYT?

-- 
Accept: text/plain, text/x-diff
Accept-Language: eo,en,ru
X-Keep-In-CC: yes
X-Web-Site: sinsekvu.github.io
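The crude check asked for above can be sketched as a walk over the dependency graph. The Python below assumes the archive metadata has already been reduced to three plain dictionaries (binary to source, source to per-file license names, binary to binary dependencies). These are hypothetical inputs, not an existing Debian data format, and any license name starting with "GPL" is treated as GPL. It flags packages whose own source has no GPL file but which transitively depend on a binary built from a source that has one. As Paul notes in his reply, most non-copyleft licenses are GPL-compatible, so a flag marks something to review, not a violation.

```python
from collections import deque


def is_gpl(license_name):
    # Assumption: any short name starting with "GPL" (GPL-2, GPL-2+, GPL-3+, ...)
    # counts as GPL; LGPL, AGPL and other copyleft licenses are ignored here.
    return license_name.startswith("GPL")


def flag_gpl_dependencies(binary_source, source_licenses, depends):
    """Return {binary: sorted GPL binaries it reaches} for every binary whose
    own source has no GPL file but which transitively depends on one that does.

    binary_source:   binary package -> source package
    source_licenses: source package -> set of per-file license names
    depends:         binary package -> list of binary dependencies
    """
    def has_gpl(binary):
        return any(is_gpl(lic) for lic in source_licenses[binary_source[binary]])

    flagged = {}
    for binary in binary_source:
        if has_gpl(binary):
            continue  # already GPL itself; nothing to propagate into it
        seen, hits = {binary}, set()
        queue = deque(depends.get(binary, []))
        while queue:  # breadth-first walk over the dependency closure
            dep = queue.popleft()
            if dep in seen or dep not in binary_source:
                continue  # already visited, or unknown package
            seen.add(dep)
            if has_gpl(dep):
                hits.add(dep)
            queue.extend(depends.get(dep, []))
        if hits:
            flagged[binary] = sorted(hits)
    return flagged
```

Because the walk is transitive, a package two dependency levels away from the GPL file is flagged as well, matching the idea of propagating the license down the dependency tree.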