Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive
On Thu, Feb 08, 2024 at 06:39:18PM +, Bastien Roucariès wrote:
> Are you sure it is not embedded base64 encoded png or minified javascript* ?

Yes, I'm absolutely certain.

> If not, we could try to find out why it chokes.

I already gave a full explanation of this in my first message, which for
some reason people are ignoring:

"""
So it issues a diagnostic for every HTML file with a somewhat long line
(over 512 characters) unless it has an associated .fragment.js somewhere
"""

The HTML files it's issuing a diagnostic on here are perfectly innocuous
and readable. Here's an example of one of the "offending" lines:

In version 0.51 and before, local echo could not be separated from local line editing (where you type a line of text locally, and it is not sent to the server until you press Return, so you have the chance to edit it and correct mistakes before the server sees it). New in version 0.52, local echo and local line editing are separate options, and by default PuTTY will try to determine automatically whether to enable them or not, based on which protocol you have selected and also based on hints from the server. If you have a problem with PuTTY's default choice, you can force each option to be enabled or disabled as you choose. The controls are in the Terminal panel, in the section marked Line discipline options.

I mean, come on. Sure, there are a couple of character entities (which
have nothing to do with the diagnostic here anyway), but otherwise you
can't tell me with a straight face that that's some kind of obscure
compiled format; I would have written it exactly the same way by hand
except for the word-wrapping.

> Another alternative: if we could determine that the file was compiled
> by halibut, we could demote this to a pedantic warning and ask to
> repack in order to be sure to recompile from source.

Or we could fix the ridiculously-oversensitive diagnostic.

On the matter of repacking (which I will not do in this case), please
see my comment in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019980#15.

--
Colin Watson (he/him) [cjwat...@debian.org]
Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive
On Thu, Feb 08, 2024 at 08:27:40PM +, Bastien Roucariès wrote:
> > > > > > source package, though I can't see how Lintian could possibly
> > > > > > expect to know that.
> > >
> > > Are you sure it is not embedded base64 encoded png or minified
> > > javascript* ?
> > >
> > > If not, we could try to find out why it chokes.
> > >
> > > In this particular case, it is the source package that chokes. If
> > > halibut included the name of the source in the HTML, we could
> > > magically remove the source-is-missing warnings.
> > >
> > > Another alternative: if we could determine that the file was
> > > compiled by halibut, we could demote this to a pedantic warning and
> > > ask to repack in order to be sure to recompile from source.
> >
> > There are far too many different HTML generators out there to handle.
>
> We have done this for doxygen and sphinx, so maybe not for more

This is two out of how many? For example, my packages use TtH, GAPDoc,
hevea, pod2html. I do not think it is sustainable.

Cheers,
--
Bill.

Imagine a large red swirl here.
Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive
On Thursday, 8 February 2024, 19:57:22 UTC, Bill Allombert wrote:
> On Thu, Feb 08, 2024 at 06:39:18PM +, Bastien Roucariès wrote:
> > On Thursday, 8 February 2024, 18:31:28 UTC, Santiago Ruano Rincón wrote:
> > > On Sat, 14 Oct 2023 20:23:18 +0200 Bill Allombert wrote:
> > > > On Sun, Sep 18, 2022 at 12:14:07AM +0100, Colin Watson wrote:
> > > > > Package: lintian
> > > > > Version: 2.115.3
> > > > > Severity: normal
> > > > >
> > > > > Lintian issues these errors for putty 0.77-1:
> > > > >
> > > > > E: putty source: source-is-missing [doc/html/AppendixA.html]
> > > > > E: putty source: source-is-missing [doc/html/AppendixB.html]
> > > > > E: putty source: source-is-missing [doc/html/AppendixE.html]
> > > > > E: putty source: source-is-missing [doc/html/Chapter10.html]
> > > > > E: putty source: source-is-missing [doc/html/Chapter2.html]
> > > > > E: putty source: source-is-missing [doc/html/Chapter3.html]
> > > > > E: putty source: source-is-missing [doc/html/Chapter4.html]
> > > > > E: putty source: source-is-missing [doc/html/Chapter5.html]
> > > > > E: putty source: source-is-missing [doc/html/Chapter7.html]
> > > > > E: putty source: source-is-missing [doc/html/Chapter8.html]
> > > > > E: putty source: source-is-missing [doc/html/Chapter9.html]
> > > > > E: putty source: source-is-missing [doc/html/IndexPage.html]
> > > > >
> > > > > This is pretty oversensitive. Firstly, it's HTML, which is still
> > > > > often enough written by hand anyway. As it happens, these
> > > > > particular HTML files are generated from halibut input that's
> > > > > also provided in the source package, though I can't see how
> > > > > Lintian could possibly expect to know that.
> >
> > Are you sure it is not embedded base64 encoded png or minified javascript* ?
> >
> > If not, we could try to find out why it chokes.
> >
> > In this particular case, it is the source package that chokes. If
> > halibut included the name of the source in the HTML, we could
> > magically remove the source-is-missing warnings.
> >
> > Another alternative: if we could determine that the file was compiled
> > by halibut, we could demote this to a pedantic warning and ask to
> > repack in order to be sure to recompile from source.
>
> There are far too many different HTML generators out there to handle.

We have done this for doxygen and sphinx, so maybe not for more

> You would need to define a standard way to indicate the path to the
> source in the generated file.
> But some generator authors might consider this an unacceptable data
> leak, so this would only be done if some environment variable is
> defined.

For doxygen or sphinx we only detect some string in the HTML file and
whitelist it. A "Generated by something" string will work.

Moreover, adding a missing-source override could be done by manually
adding a symlink debian/missing-sources/ fullname pointing to the right
location. We also magically search for known sources using some
heuristics in SourceMissing.pm.

So the basic framework is here; we only need to add more rules.

Bastien

>
> In the short term, I suggest disabling it since there is no policy
> requirement for the source code to be in a particular path, so it is
> not an error.
>
> At the very least, it should not be generated more than once per package.
>
> Cheers,
>
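The string-detection approach described above (look for a "Generated by ..." marker and whitelist the file) could be sketched roughly as follows. This is only an illustration in Python, not Lintian's Perl code: the function name `detect_generator`, the `GENERATOR_MARKERS` table and both patterns are assumptions made for the example, and real generator output may differ.

```python
import re

# Hypothetical marker table: generator name -> pattern expected in its output.
# These patterns are illustrative assumptions, not Lintian's actual rules.
GENERATOR_MARKERS = {
    "doxygen": re.compile(r"generated\s+by[^\n]{0,80}doxygen", re.IGNORECASE),
    "sphinx": re.compile(r"created\s+using[^\n]{0,80}sphinx", re.IGNORECASE),
}


def detect_generator(html: str):
    """Return the name of the first generator whose marker appears, else None."""
    for name, pattern in GENERATOR_MARKERS.items():
        if pattern.search(html):
            return name
    return None


if __name__ == "__main__":
    footer = '<small>Generated by <a href="https://example.invalid/">Doxygen</a></small>'
    print(detect_generator(footer))
    print(detect_generator("<p>Hand-written prose with no marker.</p>"))
```

Adding a rule for another generator would then amount to adding one entry to the table, which is the sense in which "we only need to add more rules"; it also shows why the approach depends on each generator leaving a recognisable marker.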
Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive
On Thu, Feb 08, 2024 at 06:39:18PM +, Bastien Roucariès wrote:
> On Thursday, 8 February 2024, 18:31:28 UTC, Santiago Ruano Rincón wrote:
> > On Sat, 14 Oct 2023 20:23:18 +0200 Bill Allombert wrote:
> > > On Sun, Sep 18, 2022 at 12:14:07AM +0100, Colin Watson wrote:
> > > > Package: lintian
> > > > Version: 2.115.3
> > > > Severity: normal
> > > >
> > > > Lintian issues these errors for putty 0.77-1:
> > > >
> > > > E: putty source: source-is-missing [doc/html/AppendixA.html]
> > > > E: putty source: source-is-missing [doc/html/AppendixB.html]
> > > > E: putty source: source-is-missing [doc/html/AppendixE.html]
> > > > E: putty source: source-is-missing [doc/html/Chapter10.html]
> > > > E: putty source: source-is-missing [doc/html/Chapter2.html]
> > > > E: putty source: source-is-missing [doc/html/Chapter3.html]
> > > > E: putty source: source-is-missing [doc/html/Chapter4.html]
> > > > E: putty source: source-is-missing [doc/html/Chapter5.html]
> > > > E: putty source: source-is-missing [doc/html/Chapter7.html]
> > > > E: putty source: source-is-missing [doc/html/Chapter8.html]
> > > > E: putty source: source-is-missing [doc/html/Chapter9.html]
> > > > E: putty source: source-is-missing [doc/html/IndexPage.html]
> > > >
> > > > This is pretty oversensitive. Firstly, it's HTML, which is still
> > > > often enough written by hand anyway. As it happens, these
> > > > particular HTML files are generated from halibut input that's also
> > > > provided in the source package, though I can't see how Lintian
> > > > could possibly expect to know that.
>
> Are you sure it is not embedded base64 encoded png or minified javascript* ?
>
> If not, we could try to find out why it chokes.
>
> In this particular case, it is the source package that chokes. If
> halibut included the name of the source in the HTML, we could magically
> remove the source-is-missing warnings.
>
> Another alternative: if we could determine that the file was compiled
> by halibut, we could demote this to a pedantic warning and ask to
> repack in order to be sure to recompile from source.

There are far too many different HTML generators out there to handle.
You would need to define a standard way to indicate the path to the
source in the generated file. But some generator authors might consider
this an unacceptable data leak, so this would only be done if some
environment variable is defined.

In the short term, I suggest disabling it since there is no policy
requirement for the source code to be in a particular path, so it is not
an error.

At the very least, it should not be generated more than once per package.

Cheers,
--
Bill.

Imagine a large red swirl here.
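To make the opt-in idea above concrete, here is a minimal Python sketch of what a generator might do: record the source path in its output only when an environment variable asks for it. No such convention exists in the thread; the variable name `HTML_EMIT_SOURCE_PATH`, the `generated-from` comment format, the function name and the example path are all hypothetical.

```python
import os

# Hypothetical variable name; no such convention is defined in the thread.
SOURCE_HINT_ENV = "HTML_EMIT_SOURCE_PATH"


def source_hint_comment(source_path: str, environ=None) -> str:
    """Return an HTML comment naming the source file, or "" when not opted in.

    Assumes source_path does not itself contain the sequence "-->".
    """
    env = os.environ if environ is None else environ
    if env.get(SOURCE_HINT_ENV) != "1":
        # Default: leak nothing about the build tree.
        return ""
    return f"<!-- generated-from: {source_path} -->"


if __name__ == "__main__":
    print(repr(source_hint_comment("doc/example.but", {})))
    print(source_hint_comment("doc/example.but", {SOURCE_HINT_ENV: "1"}))
```

Keeping the default silent addresses the data-leak concern, while a checker could look for the comment when it is present. It would still require every generator to adopt the same convention, which is the scaling problem raised above.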
Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive
On Thursday, 8 February 2024, 18:31:28 UTC, Santiago Ruano Rincón wrote:
> On Sat, 14 Oct 2023 20:23:18 +0200 Bill Allombert wrote:
> > On Sun, Sep 18, 2022 at 12:14:07AM +0100, Colin Watson wrote:
> > > Package: lintian
> > > Version: 2.115.3
> > > Severity: normal
> > >
> > > Lintian issues these errors for putty 0.77-1:
> > >
> > > E: putty source: source-is-missing [doc/html/AppendixA.html]
> > > E: putty source: source-is-missing [doc/html/AppendixB.html]
> > > E: putty source: source-is-missing [doc/html/AppendixE.html]
> > > E: putty source: source-is-missing [doc/html/Chapter10.html]
> > > E: putty source: source-is-missing [doc/html/Chapter2.html]
> > > E: putty source: source-is-missing [doc/html/Chapter3.html]
> > > E: putty source: source-is-missing [doc/html/Chapter4.html]
> > > E: putty source: source-is-missing [doc/html/Chapter5.html]
> > > E: putty source: source-is-missing [doc/html/Chapter7.html]
> > > E: putty source: source-is-missing [doc/html/Chapter8.html]
> > > E: putty source: source-is-missing [doc/html/Chapter9.html]
> > > E: putty source: source-is-missing [doc/html/IndexPage.html]
> > >
> > > This is pretty oversensitive. Firstly, it's HTML, which is still often
> > > enough written by hand anyway. As it happens, these particular HTML
> > > files are generated from halibut input that's also provided in the
> > > source package, though I can't see how Lintian could possibly expect to
> > > know that.

Are you sure it is not embedded base64 encoded png or minified javascript* ?

If not, we could try to find out why it chokes.

In this particular case, it is the source package that chokes. If
halibut included the name of the source in the HTML, we could magically
remove the source-is-missing warnings.

Another alternative: if we could determine that the file was compiled by
halibut, we could demote this to a pedantic warning and ask to repack in
order to be sure to recompile from source.

Thanks

> >
> > Dear Lintian maintainers,
> >
> > This test is causing hundreds of false positives and should be disabled
> > as soon as possible. This is a huge waste of time for everybody.
> >
> > If you need help with that, please tell me, I have worked on lintian in
> > the past.
>
> Dear Lintian maintainers,
>
> I cannot offer the same help as ballombe, but I also find it would help
> to disable these errors. At least, could they be "demoted" to warnings?
> Thanks in advance,
>
> Santiago
>
Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive
On Sat, 14 Oct 2023 20:23:18 +0200 Bill Allombert wrote:
> On Sun, Sep 18, 2022 at 12:14:07AM +0100, Colin Watson wrote:
> > Package: lintian
> > Version: 2.115.3
> > Severity: normal
> >
> > Lintian issues these errors for putty 0.77-1:
> >
> > E: putty source: source-is-missing [doc/html/AppendixA.html]
> > E: putty source: source-is-missing [doc/html/AppendixB.html]
> > E: putty source: source-is-missing [doc/html/AppendixE.html]
> > E: putty source: source-is-missing [doc/html/Chapter10.html]
> > E: putty source: source-is-missing [doc/html/Chapter2.html]
> > E: putty source: source-is-missing [doc/html/Chapter3.html]
> > E: putty source: source-is-missing [doc/html/Chapter4.html]
> > E: putty source: source-is-missing [doc/html/Chapter5.html]
> > E: putty source: source-is-missing [doc/html/Chapter7.html]
> > E: putty source: source-is-missing [doc/html/Chapter8.html]
> > E: putty source: source-is-missing [doc/html/Chapter9.html]
> > E: putty source: source-is-missing [doc/html/IndexPage.html]
> >
> > This is pretty oversensitive. Firstly, it's HTML, which is still often
> > enough written by hand anyway. As it happens, these particular HTML
> > files are generated from halibut input that's also provided in the
> > source package, though I can't see how Lintian could possibly expect to
> > know that.
>
> Dear Lintian maintainers,
>
> This test is causing hundreds of false positives and should be disabled
> as soon as possible. This is a huge waste of time for everybody.
>
> If you need help with that, please tell me, I have worked on lintian in
> the past.

Dear Lintian maintainers,

I cannot offer the same help as ballombe, but I also find it would help
to disable these errors. At least, could they be "demoted" to warnings?

Thanks in advance,

Santiago
Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive
On Sun, Sep 18, 2022 at 12:14:07AM +0100, Colin Watson wrote:
> Package: lintian
> Version: 2.115.3
> Severity: normal
>
> Lintian issues these errors for putty 0.77-1:
>
> E: putty source: source-is-missing [doc/html/AppendixA.html]
> E: putty source: source-is-missing [doc/html/AppendixB.html]
> E: putty source: source-is-missing [doc/html/AppendixE.html]
> E: putty source: source-is-missing [doc/html/Chapter10.html]
> E: putty source: source-is-missing [doc/html/Chapter2.html]
> E: putty source: source-is-missing [doc/html/Chapter3.html]
> E: putty source: source-is-missing [doc/html/Chapter4.html]
> E: putty source: source-is-missing [doc/html/Chapter5.html]
> E: putty source: source-is-missing [doc/html/Chapter7.html]
> E: putty source: source-is-missing [doc/html/Chapter8.html]
> E: putty source: source-is-missing [doc/html/Chapter9.html]
> E: putty source: source-is-missing [doc/html/IndexPage.html]
>
> This is pretty oversensitive. Firstly, it's HTML, which is still often
> enough written by hand anyway. As it happens, these particular HTML
> files are generated from halibut input that's also provided in the
> source package, though I can't see how Lintian could possibly expect to
> know that.

Dear Lintian maintainers,

This test is causing hundreds of false positives and should be disabled
as soon as possible. This is a huge waste of time for everybody.

If you need help with that, please tell me, I have worked on lintian in
the past.

Cheers,
--
Bill.

Imagine a large red swirl here.
Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive
On Sun, Sep 18, 2022 at 08:11:03AM +0800, Paul Wise wrote:
> I think the right thing for putty here is for upstream to remove the
> HTML from their VCS and tarballs, then add the generation process to
> their build system and continuous integration, so that they always know
> when there are problems with generating the HTML.

The HTML files have never been in PuTTY upstream's VCS. They are
generated automatically as part of PuTTY's build system for release
tarballs, as a convenience to people who want to build PuTTY without
Halibut, since it's a somewhat niche documentation tool. Since I agree
with upstream that this is a reasonable convenience, I'm not going to
ask them to stop doing it.

> If they refuse then you could exclude the HTML from Debian's copy of
> the upstream tarball.

We're not talking about opaque object code here. This is perfectly
readable plain HTML that just happens to be generated from another
perfectly readable text format. It's not the preferred form of
modification, sure (I wouldn't edit it directly since I have the Halibut
input files available, but if nobody told me that those existed then I'd
happily edit the HTML without even noticing), but this package isn't
covered by the GPL so that's not very relevant.

I'm not going to waste a second on editing Debian's copy of the upstream
tarball for this complete non-issue. I already take care to ensure that
the package rebuilds the documentation from source, and there's no DFSG
issue with the pre-generated files being present so there's no reason to
remove them from the tarball. The only reason that the presence of
pre-generated files is even coming up is because Lintian's heuristics
are misfiring in a way that seems clearly incorrect and probably
unintentional.

> Until either lintian changes or the putty HTML gets removed, overriding
> the lintian warning in putty seems the correct thing to do.

Done.

> If that is done, I think lintian should add more heuristics to detect
> other generated HTML. The halibut generated HTML doesn't make that easy
> but there are some signals that can be added I think, like this:
>
>halibut-1.3/bk_html.c: html_raw(,
Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive
On Sun, 18 Sep 2022 00:14:07 +0100 Colin Watson wrote:

> This is pretty oversensitive. Firstly, it's HTML, which is still often
> enough written by hand anyway. As it happens, these particular HTML
> files are generated from halibut input that's also provided in the
> source package, though I can't see how Lintian could possibly expect to
> know that.

I am not a lintian maintainer, but:

HTML is very often generated and there are many different ways to
generate it.

I think the right thing for lintian to do here is to know about more of
the source formats and when there is generated HTML in the tarball but
source is also present, then emit a new lower severity generated-files
tag instead of the existing source-is-missing tag.

I think the right thing for putty here is for upstream to remove the
HTML from their VCS and tarballs, then add the generation process to
their build system and continuous integration, so that they always know
when there are problems with generating the HTML.

If they refuse then you could exclude the HTML from Debian's copy of
the upstream tarball.

Until either lintian changes or the putty HTML gets removed, overriding
the lintian warning in putty seems the correct thing to do.

PS: I note that manual pages are similar to HTML in this regard and I
think the same reasoning above applies to the putty manual pages and to
lintian's treatment of manual pages in source packages.

> I suggest restoring something like this code to check for
Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive
Package: lintian
Version: 2.115.3
Severity: normal

Lintian issues these errors for putty 0.77-1:

  E: putty source: source-is-missing [doc/html/AppendixA.html]
  E: putty source: source-is-missing [doc/html/AppendixB.html]
  E: putty source: source-is-missing [doc/html/AppendixE.html]
  E: putty source: source-is-missing [doc/html/Chapter10.html]
  E: putty source: source-is-missing [doc/html/Chapter2.html]
  E: putty source: source-is-missing [doc/html/Chapter3.html]
  E: putty source: source-is-missing [doc/html/Chapter4.html]
  E: putty source: source-is-missing [doc/html/Chapter5.html]
  E: putty source: source-is-missing [doc/html/Chapter7.html]
  E: putty source: source-is-missing [doc/html/Chapter8.html]
  E: putty source: source-is-missing [doc/html/Chapter9.html]
  E: putty source: source-is-missing [doc/html/IndexPage.html]

This is pretty oversensitive. Firstly, it's HTML, which is still often
enough written by hand anyway. As it happens, these particular HTML
files are generated from halibut input that's also provided in the
source package, though I can't see how Lintian could possibly expect to
know that.

I tried to work out whether I should be overriding this or whether it's
a bug in Lintian, and I think it's the latter. The current relevant code
is this in lib/Lintian/Check/Files/SourceMissing.pm:

  sub visit_patched_files {
      my ($self, $item) = @_;

      return
        unless $item->is_file;

      [...]

      return
        if !defined $longest
        || $line_length{$longest} <= $VERY_LONG_LINE_LENGTH;

      [...]

      if ($item->basename =~ /\.(?:x?html?\d?|xht)$/i) {

          # html file
          $self->pointed_hint('source-is-missing', $item->pointer)
            unless $self->find_source($item, {'.fragment.js' => $DOLLAR});
      }

      return;
  }

So it issues a diagnostic for every HTML file with a somewhat long line
(over 512 characters) unless it has an associated .fragment.js somewhere
(I think - the find_source sub is undocumented and a bit obscure to me)?
That doesn't sound right - surely that would catch far too many false
positives.
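As a rough illustration of that reading of the check, here is a small Python model of the HTML branch only. It is not Lintian's code: the function name `would_flag` and the `has_fragment_js` flag (standing in for whatever `find_source` actually does) are assumptions, the elided `[...]` parts of the Perl are ignored, and only the 512-character threshold and the file-name pattern are taken from the report above.

```python
import re

VERY_LONG_LINE_LENGTH = 512  # threshold quoted in the report
HTML_NAME = re.compile(r"\.(?:x?html?\d?|xht)$", re.IGNORECASE)


def would_flag(basename: str, text: str, has_fragment_js: bool = False) -> bool:
    """Approximate the quoted logic: long line + HTML name + no .fragment.js."""
    longest = max((len(line) for line in text.splitlines()), default=0)
    if longest <= VERY_LONG_LINE_LENGTH:
        return False
    if not HTML_NAME.search(basename):
        return False
    # Stand-in for find_source(); its real behaviour is not modelled here.
    return not has_fragment_js


# An unwrapped paragraph of plain prose is enough to exceed the threshold.
paragraph = "<p>" + "Local echo and local line editing are separate options. " * 10 + "</p>"

if __name__ == "__main__":
    print(len(paragraph), would_flag("Chapter4.html", paragraph))
```

Under these assumptions nothing about the content of the line matters, only its length, which is why ordinary documentation prose emitted without word-wrapping is treated the same as minified or encoded data.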
Next, I went looking through git history to try to figure out where this
was introduced. I found this commit:

  https://salsa.debian.org/lintian/lintian/-/commit/4f24ab7fca

The commit message makes it sound as though it was probably just
refactoring, but it wasn't. The corresponding bit of code there was
previously in a warn_prebuilt_javascript sub called from a
warn_long_lines sub, which in turn was called in two places: once for
certain kinds of .js files, and once from this sub:

  # check javascript in html file
  sub check_html_cruft {
      my ($self, $item, $lowercase) = @_;

      my $blockscript = $lowercase;
      my $indexscript;

      while (($indexscript = index($blockscript, '<script')) > $ITEM_NOT_FOUND) {

          $blockscript = substr($blockscript,$indexscript);

          # sourced script ok
          if ($blockscript =~ m{\A<script\s+[^>]*?src="[^"]+?"[^>]*?>}sm) {

              $blockscript = substr($blockscript,$+[0]);
              next;
          }

          # extract script
          if ($blockscript =~ m{<script[^>]*?>(.*?)</script>}sm) {

              $blockscript = substr($blockscript,$+[0]);

              my $lcscript = $1;

              $self->check_js_script($item, $lcscript);

              return 0
                if $self->warn_long_lines($item, $lcscript);

              next;
          }

          # here we know that we have partial script. Do the check nevertheless
          # first check if we have the full