Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive

2024-02-09 Thread Colin Watson
On Thu, Feb 08, 2024 at 06:39:18PM +, Bastien Roucariès wrote:
> Are you sure it is not embedded base64 encoded png or minified javascript* ?

Yes, I'm absolutely certain.

> If not we could try to know why it choke ?  

I already gave a full explanation of this in my first message, which for
some reason people are ignoring:

"""
So it issues a diagnostic for every HTML file with a somewhat long line
(over 512 characters) unless it has an associated .fragment.js somewhere
"""

The HTML files it's issuing a diagnostic on here are perfectly innocuous
and readable.  Here's an example of one of the "offending" lines:

  In version 0.51 and before, local echo could not be separated from local line 
editing (where you type a line of text locally, and it is not sent to the 
server until you press Return, so you have the chance to edit it and correct 
mistakes before the server sees it). New in version 0.52, local echo 
and local line editing are separate options, and by default PuTTY will try to 
determine automatically whether to enable them or not, based on which protocol 
you have selected and also based on hints from the server. If you have a 
problem with PuTTY's default choice, you can force each option to be enabled or 
disabled as you choose. The controls are in the Terminal panel, in the section 
marked Line discipline options.

I mean, come on.  Sure, there are a couple of character entities (which
have nothing to do with the diagnostic here anyway), but otherwise you
can't tell me with a straight face that that's some kind of obscure
compiled format; I would have written it exactly the same way by hand
except for the word-wrapping.
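
For illustration, here is a minimal sketch (in Python, not Lintian's actual Perl) of a length-only test like the one quoted above. The 512-character threshold comes from the quoted explanation; the function names and the sample text are hypothetical. It shows that the same prose is flagged or not depending only on where the line breaks fall.

```python
VERY_LONG_LINE_LENGTH = 512  # threshold quoted in the bug report


def longest_line_length(text: str) -> int:
    """Return the length of the longest line in text (0 if there are none)."""
    return max((len(line) for line in text.splitlines()), default=0)


def flagged_by_length(text: str, threshold: int = VERY_LONG_LINE_LENGTH) -> bool:
    """True when a length-only heuristic would flag the text."""
    return longest_line_length(text) > threshold


# Ordinary prose emitted on a single line, as a generator might do.
sentence = "Local echo and local line editing are separate options. "
unwrapped = "<p>" + sentence * 12 + "</p>\n"

# The same prose with one sentence per line, as a person might write it.
wrapped = "<p>\n" + (sentence.strip() + "\n") * 12 + "</p>\n"

print(flagged_by_length(unwrapped))
print(flagged_by_length(wrapped))
```

The two inputs carry identical content, so a check based only on line length cannot distinguish readable text from a compiled format.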

> Another alternative if we could determine the file was compiled by halibut, 
> we could demote to pedantic warning 
> and ask to repack in order to be sure to recompile from source.

Or we could fix the ridiculously-oversensitive diagnostic.

On the matter of repacking (which I will not do in this case), please
see my comment in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019980#15.

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive

2024-02-08 Thread Bill Allombert
On Thu, Feb 08, 2024 at 08:27:40PM +, Bastien Roucariès wrote:
> > > > > > source package, though I can't see how Lintian could possibly 
> > > > > > expect to
> > > > > > know that.
> > > 
> > > Are you sure it is not embedded base64 encoded png or minified 
> > > javascript* ?
> > > 
> > > If not we could try to know why it choke ?  
> > > 
> > > In this particular case, it is the source package that choke. If halibut 
> > > include the name of the source
> > > in the html we could magically remove the source is missing warnings.
> > > 
> > > Another alternative if we could determine the file was compiled by 
> > > halibut, we could demote to pedantic warning 
> > > and ask to repack in order to be sure to recompile from source.
> > 
> > There are far too many different HTML generators out there to handle.
> 
> We have done this for doxygen and sphinx, so maybe not for more

This is two out of how many?

For example, my packages use TtH, GAPDoc, hevea, pod2html.

I do not think it is sustainable.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 



Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive

2024-02-08 Thread Bastien Roucariès
On Thursday, 8 February 2024, 19:57:22 UTC, Bill Allombert wrote:
> On Thu, Feb 08, 2024 at 06:39:18PM +, Bastien Roucariès wrote:
> > On Thursday, 8 February 2024, 18:31:28 UTC, Santiago Ruano Rincón wrote:
> > > On Sat, 14 Oct 2023 20:23:18 +0200 Bill Allombert  
> > > wrote:
> > > > On Sun, Sep 18, 2022 at 12:14:07AM +0100, Colin Watson wrote:
> > > > > Package: lintian
> > > > > Version: 2.115.3
> > > > > Severity: normal
> > > > > 
> > > > > Lintian issues these errors for putty 0.77-1:
> > > > > 
> > > > >   E: putty source: source-is-missing [doc/html/AppendixA.html]
> > > > >   E: putty source: source-is-missing [doc/html/AppendixB.html]
> > > > >   E: putty source: source-is-missing [doc/html/AppendixE.html]
> > > > >   E: putty source: source-is-missing [doc/html/Chapter10.html]
> > > > >   E: putty source: source-is-missing [doc/html/Chapter2.html]
> > > > >   E: putty source: source-is-missing [doc/html/Chapter3.html]
> > > > >   E: putty source: source-is-missing [doc/html/Chapter4.html]
> > > > >   E: putty source: source-is-missing [doc/html/Chapter5.html]
> > > > >   E: putty source: source-is-missing [doc/html/Chapter7.html]
> > > > >   E: putty source: source-is-missing [doc/html/Chapter8.html]
> > > > >   E: putty source: source-is-missing [doc/html/Chapter9.html]
> > > > >   E: putty source: source-is-missing [doc/html/IndexPage.html]
> > > > > 
> > > > > This is pretty oversensitive.  Firstly, it's HTML, which is still 
> > > > > often
> > > > > enough written by hand anyway.  As it happens, these particular HTML
> > > > > files are generated from halibut input that's also provided in the
> > > > > source package, though I can't see how Lintian could possibly expect 
> > > > > to
> > > > > know that.
> > 
> > Are you sure it is not embedded base64 encoded png or minified javascript* ?
> > 
> > If not we could try to know why it choke ?  
> > 
> > In this particular case, it is the source package that choke. If halibut 
> > include the name of the source
> > in the html we could magically remove the source is missing warnings.
> > 
> > Another alternative if we could determine the file was compiled by halibut, 
> > we could demote to pedantic warning 
> > and ask to repack in order to be sure to recompile from source.
> 
> There are far too many different HTML generators out there to handle.

We have done this for doxygen and sphinx, so maybe not for more.

> You would need to define a standard way to indicate the path to the source in
> the generated file.
> But some generator authors might consider this is an inacceptable data leak, 
> so
> this would only be done if some environment variable is defined.

For doxygen and sphinx, we only detect a particular string in the HTML file
and whitelist it.

"Generated by something" will work.

Moreover, a missing-source override could be added by manually adding a
symlink at debian/missing-sources/ fullname that points to the right
location.

We also magically search for known sources using some heuristics in
SourceMissing.pm.

So the basic framework is here; we only need to add more rules.
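
The string-detection approach described above could be sketched as follows. This is a hedged illustration in Python, not Lintian's Perl: the "generated by" pattern, the list of generator names, and the function name are all assumptions, and the thread does not show which strings Lintian really matches or whether halibut emits such a marker.

```python
import re
from typing import Optional

# Hypothetical allow-list of generators whose output should not be flagged.
KNOWN_GENERATORS = {"doxygen", "sphinx", "halibut"}

# Hypothetical marker: "Generated by <name>" anywhere in the file.
GENERATED_BY = re.compile(r"generated by\s+([a-z0-9_-]+)", re.IGNORECASE)


def detect_generator(html: str) -> Optional[str]:
    """Return the first known generator named in a marker, or None."""
    for match in GENERATED_BY.finditer(html):
        name = match.group(1).lower()
        if name in KNOWN_GENERATORS:
            return name
    return None


print(detect_generator("<!-- Generated by Halibut 1.3 -->"))
print(detect_generator("<p>Written and generated by hand</p>"))
```

Each additional generator needs its own entry in the allow-list, which is the maintenance cost raised elsewhere in this thread.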

Bastien


> 
> In the short term, I suggest to disable it since there is no policy 
> requirement
> for the source code to be in a particular path, so it is not an error.
> 
> At the very least, it should not be generated more than once per package.
> 
> Cheers,
> 





Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive

2024-02-08 Thread Bill Allombert
On Thu, Feb 08, 2024 at 06:39:18PM +, Bastien Roucariès wrote:
> On Thursday, 8 February 2024, 18:31:28 UTC, Santiago Ruano Rincón wrote:
> > On Sat, 14 Oct 2023 20:23:18 +0200 Bill Allombert  
> > wrote:
> > > On Sun, Sep 18, 2022 at 12:14:07AM +0100, Colin Watson wrote:
> > > > Package: lintian
> > > > Version: 2.115.3
> > > > Severity: normal
> > > > 
> > > > Lintian issues these errors for putty 0.77-1:
> > > > 
> > > >   E: putty source: source-is-missing [doc/html/AppendixA.html]
> > > >   E: putty source: source-is-missing [doc/html/AppendixB.html]
> > > >   E: putty source: source-is-missing [doc/html/AppendixE.html]
> > > >   E: putty source: source-is-missing [doc/html/Chapter10.html]
> > > >   E: putty source: source-is-missing [doc/html/Chapter2.html]
> > > >   E: putty source: source-is-missing [doc/html/Chapter3.html]
> > > >   E: putty source: source-is-missing [doc/html/Chapter4.html]
> > > >   E: putty source: source-is-missing [doc/html/Chapter5.html]
> > > >   E: putty source: source-is-missing [doc/html/Chapter7.html]
> > > >   E: putty source: source-is-missing [doc/html/Chapter8.html]
> > > >   E: putty source: source-is-missing [doc/html/Chapter9.html]
> > > >   E: putty source: source-is-missing [doc/html/IndexPage.html]
> > > > 
> > > > This is pretty oversensitive.  Firstly, it's HTML, which is still often
> > > > enough written by hand anyway.  As it happens, these particular HTML
> > > > files are generated from halibut input that's also provided in the
> > > > source package, though I can't see how Lintian could possibly expect to
> > > > know that.
> 
> Are you sure it is not embedded base64 encoded png or minified javascript* ?
> 
> If not we could try to know why it choke ?  
> 
> In this particular case, it is the source package that choke. If halibut 
> include the name of the source
> in the html we could magically remove the source is missing warnings.
> 
> Another alternative if we could determine the file was compiled by halibut, 
> we could demote to pedantic warning 
> and ask to repack in order to be sure to recompile from source.

There are far too many different HTML generators out there to handle.
You would need to define a standard way to indicate the path to the source in
the generated file.
But some generator authors might consider this an unacceptable data leak, so
this would only be done if some environment variable is defined.

In the short term, I suggest disabling it, since there is no policy requirement
for the source code to be in a particular path, so it is not an error.

At the very least, it should not be generated more than once per package.
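
As a rough illustration of reporting once per package, the sketch below (Python, with a hypothetical function name and a made-up summary format; Lintian's reporting code is not shown in this thread) collapses per-file hints into one line per package and tag.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Hint = Tuple[str, str, str]  # (package, tag, path)


def one_hint_per_package(hints: Iterable[Hint]) -> List[str]:
    """Collapse per-file hints into one summary line per (package, tag)."""
    grouped = defaultdict(list)
    for package, tag, path in hints:
        grouped[(package, tag)].append(path)

    lines = []
    for (package, tag), paths in grouped.items():
        # Show the first path and only count the rest.
        extra = f" (and {len(paths) - 1} more)" if len(paths) > 1 else ""
        lines.append(f"E: {package}: {tag} [{paths[0]}]{extra}")
    return lines


hints = [
    ("putty source", "source-is-missing", "doc/html/AppendixA.html"),
    ("putty source", "source-is-missing", "doc/html/AppendixB.html"),
    ("putty source", "source-is-missing", "doc/html/Chapter2.html"),
]
print("\n".join(one_hint_per_package(hints)))
```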

Cheers,
-- 
Bill. 

Imagine a large red swirl here.



Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive

2024-02-08 Thread Bastien Roucariès
On Thursday, 8 February 2024, 18:31:28 UTC, Santiago Ruano Rincón wrote:
> On Sat, 14 Oct 2023 20:23:18 +0200 Bill Allombert  wrote:
> > On Sun, Sep 18, 2022 at 12:14:07AM +0100, Colin Watson wrote:
> > > Package: lintian
> > > Version: 2.115.3
> > > Severity: normal
> > > 
> > > Lintian issues these errors for putty 0.77-1:
> > > 
> > >   E: putty source: source-is-missing [doc/html/AppendixA.html]
> > >   E: putty source: source-is-missing [doc/html/AppendixB.html]
> > >   E: putty source: source-is-missing [doc/html/AppendixE.html]
> > >   E: putty source: source-is-missing [doc/html/Chapter10.html]
> > >   E: putty source: source-is-missing [doc/html/Chapter2.html]
> > >   E: putty source: source-is-missing [doc/html/Chapter3.html]
> > >   E: putty source: source-is-missing [doc/html/Chapter4.html]
> > >   E: putty source: source-is-missing [doc/html/Chapter5.html]
> > >   E: putty source: source-is-missing [doc/html/Chapter7.html]
> > >   E: putty source: source-is-missing [doc/html/Chapter8.html]
> > >   E: putty source: source-is-missing [doc/html/Chapter9.html]
> > >   E: putty source: source-is-missing [doc/html/IndexPage.html]
> > > 
> > > This is pretty oversensitive.  Firstly, it's HTML, which is still often
> > > enough written by hand anyway.  As it happens, these particular HTML
> > > files are generated from halibut input that's also provided in the
> > > source package, though I can't see how Lintian could possibly expect to
> > > know that.

Are you sure it is not an embedded base64-encoded PNG or minified JavaScript?

If not, we could try to find out why it chokes.

In this particular case, it is the source package that chokes. If halibut
included the name of the source in the HTML, we could magically remove the
source-is-missing warnings.

Another alternative: if we could determine that the file was compiled by
halibut, we could demote this to a pedantic warning and ask to repack in order
to be sure to recompile from source.

Thanks
> > 
> > Dear Lintian maintainers,
> > 
> > This test is causing hundreds of false positives and should be disabled as
> > soon as possible. This is a huge waste of time for everybody.
> > 
> > If you need help with that, please tell me, I have worked on lintian in the 
> > past.
> 
> Dear Lintian maintainers,
> 
> I cannot offer the same help as ballombe, but I also find it would help
> to disable these errors. At least, could they be "demoted" to warnings?


> Thanks in advance,
> 
> Santiago
> 





Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive

2024-02-08 Thread Santiago Ruano Rincón
On Sat, 14 Oct 2023 20:23:18 +0200 Bill Allombert  wrote:
> On Sun, Sep 18, 2022 at 12:14:07AM +0100, Colin Watson wrote:
> > Package: lintian
> > Version: 2.115.3
> > Severity: normal
> > 
> > Lintian issues these errors for putty 0.77-1:
> > 
> >   E: putty source: source-is-missing [doc/html/AppendixA.html]
> >   E: putty source: source-is-missing [doc/html/AppendixB.html]
> >   E: putty source: source-is-missing [doc/html/AppendixE.html]
> >   E: putty source: source-is-missing [doc/html/Chapter10.html]
> >   E: putty source: source-is-missing [doc/html/Chapter2.html]
> >   E: putty source: source-is-missing [doc/html/Chapter3.html]
> >   E: putty source: source-is-missing [doc/html/Chapter4.html]
> >   E: putty source: source-is-missing [doc/html/Chapter5.html]
> >   E: putty source: source-is-missing [doc/html/Chapter7.html]
> >   E: putty source: source-is-missing [doc/html/Chapter8.html]
> >   E: putty source: source-is-missing [doc/html/Chapter9.html]
> >   E: putty source: source-is-missing [doc/html/IndexPage.html]
> > 
> > This is pretty oversensitive.  Firstly, it's HTML, which is still often
> > enough written by hand anyway.  As it happens, these particular HTML
> > files are generated from halibut input that's also provided in the
> > source package, though I can't see how Lintian could possibly expect to
> > know that.
> 
> Dear Lintian maintainers,
> 
> This test is causing hundreds of false positives and should be disabled as
> soon as possible. This is a huge waste of time for everybody.
> 
> If you need help with that, please tell me, I have worked on lintian in the 
> past.

Dear Lintian maintainers,

I cannot offer the same help as ballombe, but I also find it would help
to disable these errors. At least, could they be "demoted" to warnings?

Thanks in advance,

Santiago




Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive

2023-10-14 Thread Bill Allombert
On Sun, Sep 18, 2022 at 12:14:07AM +0100, Colin Watson wrote:
> Package: lintian
> Version: 2.115.3
> Severity: normal
> 
> Lintian issues these errors for putty 0.77-1:
> 
>   E: putty source: source-is-missing [doc/html/AppendixA.html]
>   E: putty source: source-is-missing [doc/html/AppendixB.html]
>   E: putty source: source-is-missing [doc/html/AppendixE.html]
>   E: putty source: source-is-missing [doc/html/Chapter10.html]
>   E: putty source: source-is-missing [doc/html/Chapter2.html]
>   E: putty source: source-is-missing [doc/html/Chapter3.html]
>   E: putty source: source-is-missing [doc/html/Chapter4.html]
>   E: putty source: source-is-missing [doc/html/Chapter5.html]
>   E: putty source: source-is-missing [doc/html/Chapter7.html]
>   E: putty source: source-is-missing [doc/html/Chapter8.html]
>   E: putty source: source-is-missing [doc/html/Chapter9.html]
>   E: putty source: source-is-missing [doc/html/IndexPage.html]
> 
> This is pretty oversensitive.  Firstly, it's HTML, which is still often
> enough written by hand anyway.  As it happens, these particular HTML
> files are generated from halibut input that's also provided in the
> source package, though I can't see how Lintian could possibly expect to
> know that.

Dear Lintian maintainers,

This test is causing hundreds of false positives and should be disabled as
soon as possible. This is a huge waste of time for everybody.

If you need help with that, please tell me, I have worked on lintian in the 
past.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 




Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive

2022-09-18 Thread Colin Watson
On Sun, Sep 18, 2022 at 08:11:03AM +0800, Paul Wise wrote:
> I think the right thing for putty here is for upstream to remove the
> HTML from their VCS and tarballs, then add the generation process to
> their build system and continuous integration, so that they always know
> when there are problems with generating the HTML.

The HTML files have never been in PuTTY upstream's VCS.  They are
generated automatically as part of PuTTY's build system for release
tarballs, as a convenience to people who want to build PuTTY without
Halibut, since it's a somewhat niche documentation tool.  Since I agree
with upstream that this is a reasonable convenience, I'm not going to
ask them to stop doing it.

> If they refuse then you could exclude the HTML from Debian's copy of
> the upstream tarball.

We're not talking about opaque object code here.  This is perfectly
readable plain HTML that just happens to be generated from another
perfectly readable text format.  It's not the preferred form of
modification, sure (I wouldn't edit it directly since I have the Halibut
input files available, but if nobody told me that those existed then I'd
happily edit the HTML without even noticing), but this package isn't
covered by the GPL so that's not very relevant.

I'm not going to waste a second on editing Debian's copy of the upstream
tarball for this complete non-issue.  I already take care to ensure that
the package rebuilds the documentation from source, and there's no DFSG
issue with the pre-generated files being present so there's no reason to
remove them from the tarball.  The only reason that the presence of
pre-generated files is even coming up is because Lintian's heuristics
are misfiring in a way that seems clearly incorrect and probably
unintentional.

> Until either lintian changes or the putty HTML gets removed, overriding
> the lintian warning in putty seems the correct thing to do.

Done.

> If that is done, I think lintian should add more heuristics to detect
> other generated HTML. The halibut generated HTML doesn't make that easy
> but there are some signals that can be added I think, like this:
> 
>halibut-1.3/bk_html.c: html_raw(, 

Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive

2022-09-17 Thread Paul Wise
On Sun, 18 Sep 2022 00:14:07 +0100 Colin Watson wrote:

> This is pretty oversensitive.  Firstly, it's HTML, which is still often
> enough written by hand anyway.  As it happens, these particular HTML
> files are generated from halibut input that's also provided in the
> source package, though I can't see how Lintian could possibly expect to
> know that.

I am not a lintian maintainer, but:

HTML is very often generated and there are many different ways to
generate it. I think the right thing for lintian to do here is to know
about more of the source formats and when there is generated HTML in
the tarball but source is also present, then emit a new lower severity
generated-files tag instead of the existing source-is-missing tag.
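
A minimal sketch of that suggestion follows, in Python rather than Lintian's Perl. The tag name generated-files is taken from the paragraph above; the extension table (including .but for Halibut input) and the function name are assumptions made for illustration, and a real check would need a much better notion of "source is present" than a matching file extension.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Hypothetical table: generated suffix -> suffixes that count as its source.
SOURCE_SUFFIXES = {".html": (".but", ".rst", ".md")}


def choose_tag(generated_path: str, package_files: Iterable[str]) -> str:
    """Pick the lower-severity tag when a plausible source file is shipped."""
    suffix = PurePosixPath(generated_path).suffix.lower()
    candidates = SOURCE_SUFFIXES.get(suffix, ())
    has_source = any(
        PurePosixPath(name).suffix.lower() in candidates for name in package_files
    )
    return "generated-files" if has_source else "source-is-missing"


with_source = ["doc/intro.but", "doc/html/Chapter2.html"]
without_source = ["doc/html/Chapter2.html"]
print(choose_tag("doc/html/Chapter2.html", with_source))
print(choose_tag("doc/html/Chapter2.html", without_source))
```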

I think the right thing for putty here is for upstream to remove the
HTML from their VCS and tarballs, then add the generation process to
their build system and continuous integration, so that they always know
when there are problems with generating the HTML. If they refuse then
you could exclude the HTML from Debian's copy of the upstream tarball.

Until either lintian changes or the putty HTML gets removed, overriding
the lintian warning in putty seems the correct thing to do.

PS: I note that manual pages are similar to HTML in this regard and I
think the same reasoning above applies to the putty manual pages and to
lintian's treatment of manual pages in source packages.

> I suggest restoring something like this code to check for 

Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive

2022-09-17 Thread Colin Watson
Package: lintian
Version: 2.115.3
Severity: normal

Lintian issues these errors for putty 0.77-1:

  E: putty source: source-is-missing [doc/html/AppendixA.html]
  E: putty source: source-is-missing [doc/html/AppendixB.html]
  E: putty source: source-is-missing [doc/html/AppendixE.html]
  E: putty source: source-is-missing [doc/html/Chapter10.html]
  E: putty source: source-is-missing [doc/html/Chapter2.html]
  E: putty source: source-is-missing [doc/html/Chapter3.html]
  E: putty source: source-is-missing [doc/html/Chapter4.html]
  E: putty source: source-is-missing [doc/html/Chapter5.html]
  E: putty source: source-is-missing [doc/html/Chapter7.html]
  E: putty source: source-is-missing [doc/html/Chapter8.html]
  E: putty source: source-is-missing [doc/html/Chapter9.html]
  E: putty source: source-is-missing [doc/html/IndexPage.html]

This is pretty oversensitive.  Firstly, it's HTML, which is still often
enough written by hand anyway.  As it happens, these particular HTML
files are generated from halibut input that's also provided in the
source package, though I can't see how Lintian could possibly expect to
know that.

I tried to work out whether I should be overriding this or whether it's
a bug in Lintian, and I think it's the latter.  The current relevant
code is this in lib/Lintian/Check/Files/SourceMissing.pm:

  sub visit_patched_files {
      my ($self, $item) = @_;

      return
        unless $item->is_file;
  [...]
      return
        if !defined $longest || $line_length{$longest} <= $VERY_LONG_LINE_LENGTH;
  [...]

      if ($item->basename =~ /\.(?:x?html?\d?|xht)$/i) {

          # html file
          $self->pointed_hint('source-is-missing', $item->pointer)
            unless $self->find_source($item, {'.fragment.js' => $DOLLAR});
      }

      return;
  }

So it issues a diagnostic for every HTML file with a somewhat long line
(over 512 characters) unless it has an associated .fragment.js somewhere
(I think - the find_source sub is undocumented and a bit obscure to me)?
That doesn't sound right - surely that would catch far too many false
positives.
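
The decision flow as read above can be restated in a short Python sketch. It models only the steps visible in the excerpt (the elided [...] parts are ignored), and has_fragment_js is a hypothetical stand-in for find_source, whose real behaviour is not documented in this report.

```python
import re
from typing import Callable

VERY_LONG_LINE_LENGTH = 512  # value stated in the report

# Same pattern as the basename test in the excerpt, case-insensitive.
HTML_BASENAME = re.compile(r"\.(?:x?html?\d?|xht)$", re.IGNORECASE)


def would_flag(
    basename: str, text: str, has_fragment_js: Callable[[str], bool]
) -> bool:
    """Mirror the visible steps: long line, HTML name, no .fragment.js found."""
    longest = max((len(line) for line in text.splitlines()), default=0)
    if longest <= VERY_LONG_LINE_LENGTH:
        return False
    if not HTML_BASENAME.search(basename):
        return False
    return not has_fragment_js(basename)


long_paragraph = "<p>" + "word " * 150 + "</p>"
print(would_flag("Chapter2.html", long_paragraph, lambda name: False))
```

Under this reading, nothing about the content of the long line is inspected, which is why plain prose is enough to trigger the tag.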

Next, I went looking through git history to try to figure out where this
was introduced.  I found this commit:

  https://salsa.debian.org/lintian/lintian/-/commit/4f24ab7fca

The commit message makes it sound as though it was probably just
refactoring, but it wasn't.  The corresponding bit of code there was
previously in a warn_prebuilt_javascript sub called from a
warn_long_lines sub, which in turn was called in two places: once for
certain kinds of .js files, and once from this sub:

  # check javascript in html file
  sub check_html_cruft {
      my ($self, $item, $lowercase) = @_;

      my $blockscript = $lowercase;
      my $indexscript;

      while (($indexscript = index($blockscript, '<script'))
          > $ITEM_NOT_FOUND) {

          $blockscript = substr($blockscript,$indexscript);

          # sourced script ok
          if ($blockscript =~ m{\A<script\s+[^>]*?src="[^"]+?"[^>]*?>}sm) {

              $blockscript = substr($blockscript,$+[0]);
              next;
          }

          # extract script
          if ($blockscript =~ m{<script[^>]*?>(.*?)</script>}sm) {

              $blockscript = substr($blockscript,$+[0]);

              my $lcscript = $1;
              $self->check_js_script($item, $lcscript);

              return 0
                if $self->warn_long_lines($item, $lcscript);

              next;
          }

          # here we know that we have partial script. Do the check nevertheless
          # first check if we have the full