Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-09 Thread Chris Lamb
Hi Felix,

> The index files seem to be inspired by the 't' option in tar. My sense
> is that we need more encoding rather than less to preserve the meaning
> of whitespace, and especially newlines, in those files.

Sure thing. I guess the alternative would be to move these to a
NUL-terminated format, but that's going to be a little too annoying in
Perl, and non-intuitive to boot.
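For reference, here is a minimal sketch of reading NUL-terminated records in Perl. It assumes a hypothetical index that stores one bare file name per record; Lintian's real index format is not reproduced here. Setting the input record separator `$/` to "\0" makes `readline` split on NUL bytes, and `chomp` then strips the NUL, so a name containing a newline comes through intact.

```perl
use strict;
use warnings;

# Hypothetical helper: read NUL-terminated file names from a filehandle.
# Newlines and tabs inside a name are kept as-is.
sub read_nul_records {
    my ($fh) = @_;
    my @names;
    local $/ = "\0";    # input record separator: NUL instead of "\n"
    while (defined(my $record = <$fh>)) {
        chomp $record;  # chomp removes the current $/ (the NUL byte)
        push @names, $record;
    }
    return @names;
}

# Usage: an in-memory index whose first name contains a newline.
my $index = "usr/share/newline/\n/etc/issue\0usr/share/doc/newline/\0";
open(my $fh, '<', \$index) or die "cannot open in-memory index: $!";
my @names = read_nul_records($fh);
close($fh);
printf "%d names, first is %d bytes long\n", scalar(@names), length($names[0]);
```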

> > adjust all the consumers/producers of that data to reflect that? The
> > current situation as I understand it is that some assume the former,
> > some the latter.
> 
> I believe there is only one of each. Please let me know if you found others.
[..]
> which uses the subroutine 'dequote_name' from here:

Sounds about right. But just to add: when I was looking at this issue
before, I tried using exactly this routine in the producers & consumers,
and IIRC this actually caused further breakage. YMMV. :)


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org 🍥 chris-lamb.co.uk
   `-



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-08 Thread Chris Lamb
Russ Allbery wrote:

> >> The name of the files and directories installed by binary packages
> >> outside the system PATH must be encoded in UTF-8 and should be
> >> restricted to ASCII when it is possible to do so.

I ACK all the other comments here. Just to add: whilst I concede that
Policy §10.10 states the above, that is of little use if Lintian
"blows up"; we are meant to be checking for such Policy violations in
the first place...


Best wishes,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org 🍥 chris-lamb.co.uk
   `-



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-08 Thread Felix Lechner
On Mon, Jul 8, 2019 at 2:35 PM Russ Allbery  wrote:
>
> I see that what you're
> trying to do is get the data deep enough into Lintian so that you can
> issue appropriate tags, and the problem is with the data representation in
> advance of tag processing, for which this isn't super-relevant.

The policy reference was very helpful. Thank you.



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-08 Thread Russ Allbery
Felix Lechner  writes:
> On Mon, Jul 8, 2019 at 12:50 PM Russ Allbery  wrote:

>> The name of the files and directories installed by binary packages
>> outside the system PATH must be encoded in UTF-8 and should be
>> restricted to ASCII when it is possible to do so.

> And that reads like the tag 'file-name-is-not-valid-UTF-8'.

Yeah, apologies, I think I misunderstood your message and what you meant
by not looking at UTF-8 validity in Lintian.  I see that what you're
trying to do is get the data deep enough into Lintian so that you can
issue appropriate tags, and the problem is with the data representation in
advance of tag processing, for which this isn't super-relevant.

-- 
Russ Allbery (r...@debian.org)   



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-08 Thread Felix Lechner
Hi Russ,

On Mon, Jul 8, 2019 at 12:50 PM Russ Allbery  wrote:
>
> Policy 10.10:

Thank you for the pointer to policy.

> The name of the files installed by binary packages in the system PATH
> (namely /bin, /sbin, /usr/bin, /usr/sbin and /usr/games) must be
> encoded in ASCII.

That policy appears to be enforced by checks/files.pm. It reads like
the description for the tag 'file-name-in-PATH-is-not-ASCII'.

> The name of the files and directories installed by binary packages
> outside the system PATH must be encoded in UTF-8 and should be
> restricted to ASCII when it is possible to do so.

And that reads like the tag 'file-name-is-not-valid-UTF-8'.
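The two Policy 10.10 rules quoted above can be sketched as a single classifier. This is a hypothetical helper, not the code in checks/files.pm. It assumes paths arrive as raw byte strings relative to the package root (e.g. "usr/bin/foo"), counts a file as "in the system PATH" only when it sits directly in one of the five listed directories, and uses the Encode module's strict `UTF-8` decoder with `FB_CROAK` to detect invalid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Directories Policy 10.10 lists as "the system PATH", relative to the
# package root.
my @PATH_DIRS = qw(bin sbin usr/bin usr/sbin usr/games);

# Hypothetical classifier: returns the tag name a raw byte-string path
# would trigger, or undef if it satisfies Policy 10.10.
sub policy_10_10_tag {
    my ($path) = @_;

    # Directly inside one of the PATH directories? \z rather than $, so a
    # trailing newline in the name cannot slip past the anchor.
    my $in_path = grep { $path =~ m{\A\Q$_\E/[^/]+\z} } @PATH_DIRS;
    if ($in_path) {
        return $path =~ /[^\x00-\x7F]/ ? 'file-name-in-PATH-is-not-ASCII' : undef;
    }

    # Elsewhere the name must be valid UTF-8. decode() with FB_CROAK dies
    # on malformed input and may consume its argument, so use a copy.
    my $copy  = $path;
    my $valid = eval { decode('UTF-8', $copy, FB_CROAK); 1 };
    return $valid ? undef : 'file-name-is-not-valid-UTF-8';
}

for my $path ("usr/bin/foo", "usr/bin/caf\xc3\xa9", "usr/share/doc/bad\xff") {
    my $tag = policy_10_10_tag($path);
    print defined $tag ? "$tag\n" : "no tag\n";
}
```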

Kind regards
Felix



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-08 Thread Felix Lechner
Hi Chris,

On Mon, Jul 8, 2019 at 12:42 PM Chris Lamb  wrote:
>
> Do we need any of these techniques? Can we decree that the index files
> are escaped or unescaped (I'd be +1 on the latter, mind)

The index files seem to be inspired by the 't' option in tar. My sense
is that we need more encoding rather than less to preserve the meaning
of whitespace, and especially newlines, in those files. I do not think
we can leave the filenames unescaped, if that is what you are
suggesting.
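To illustrate what "more encoding" with a reversible transformation could look like, here is a minimal quote/dequote pair in the spirit of tar's backslash escapes. The function names are hypothetical, and this is neither String::Escape nor Lintian's `dequote_name`. The property that matters is that the backslash itself is escaped and both directions work in a single left-to-right pass, so every name round-trips exactly, including names with newlines, tabs, backslashes or non-UTF-8 bytes.

```perl
use strict;
use warnings;

# Mnemonic escapes; any other byte outside printable ASCII becomes \ooo.
my %ESCAPE   = ("\\" => "\\\\", "\n" => "\\n", "\t" => "\\t");
my %UNESCAPE = ('\\' => '\\', 'n' => "\n", 't' => "\t");

# Hypothetical encoder for index-file names (byte strings only).
sub quote_index_name {
    my ($name) = @_;
    $name =~ s{(\\|[^\x20-\x7E])}{
        exists $ESCAPE{$1} ? $ESCAPE{$1} : sprintf('\\%03o', ord $1)
    }ge;
    return $name;
}

# Hypothetical decoder: one left-to-right pass, so an escaped backslash
# followed by "n" is never mistaken for an escaped newline.
sub dequote_index_name {
    my ($quoted) = @_;
    $quoted =~ s{\\(\\|n|t|[0-7]{3})}{
        exists $UNESCAPE{$1} ? $UNESCAPE{$1} : chr(oct $1)
    }ge;
    return $quoted;
}

# Usage: each name should survive the round trip unchanged.
for my $name ("usr/share/newline/\n/etc/issue", 'a\\nb', "tab\there", "bad\xff") {
    my $quoted = quote_index_name($name);
    my $ok = dequote_index_name($quoted) eq $name ? 'ok' : 'MISMATCH';
    print "$quoted => $ok\n";
}
```

Names are treated as byte strings; a string holding characters above 0xFF would need encoding to bytes first, since `\ooo` covers a single byte.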

> and then
> adjust all the consumers/producers of that data to reflect that? The
> current situation as I understand it is that some assume the former,
> some the latter.

I believe there is only one of each. Please let me know if you found others.

This is the producer:

https://salsa.debian.org/lintian/lintian/blob/master/collection/unpacked#L96-100

and this is the consumer:

https://salsa.debian.org/lintian/lintian/blob/master/lib/Lintian/Collect/Package.pm#L518

which uses the subroutine 'dequote_name' from here:

https://salsa.debian.org/lintian/lintian/blob/master/lib/Lintian/Util.pm#L775-795



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-08 Thread Russ Allbery
Felix Lechner  writes:

> Since filenames are arbitrary byte sequences that serve as valid
> identifiers in the file system, I am not sure it makes sense to impose
> UTF-8 validation in Lintian.

Policy 10.10:

The name of the files installed by binary packages in the system PATH
(namely /bin, /sbin, /usr/bin, /usr/sbin and /usr/games) must be
encoded in ASCII.

The name of the files and directories installed by binary packages
outside the system PATH must be encoded in UTF-8 and should be
restricted to ASCII when it is possible to do so.

-- 
Russ Allbery (r...@debian.org)   



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-08 Thread Chris Lamb
Hi Felix,

> After a cursory review of filename handling in Lintian's index files I
> can think of three possible solutions:
> 
> 1. Use String::Escape in both directions to make sure the
> transformation is reversible. (It currently isn't.)
> 2. Use Base64 to encode file names in the index.
> 3. Use an embedded DB such as SQLite in collections.

Do we need any of these techniques? Can we decree that the index files
are escaped or unescaped (I'd be +1 on the latter, mind) and then
adjust all the consumers/producers of that data to reflect that? The
current situation as I understand it is that some assume the former,
some the latter.

> > Sure thing. (I wonder whether we should also check for (at least) \t
> > and possibly even *invalid* unicode characters; those are a great way
> > to make programs blow up.)
> 
> Since filenames are arbitrary byte sequences that serve as valid
> identifiers in the file system, I am not sure it makes sense to impose
> UTF-8 validation in Lintian.

Exactly: this is what I was trying to say, sorry.


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org 🍥 chris-lamb.co.uk
   `-



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-08 Thread Felix Lechner
On Sun, Jul 7, 2019 at 7:33 AM Chris Lamb  wrote:
>

After a cursory review of filename handling in Lintian's index files I
can think of three possible solutions:

1. Use String::Escape in both directions to make sure the
transformation is reversible. (It currently isn't.)
2. Use Base64 to encode file names in the index.
3. Use an embedded DB such as SQLite in collections.
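Option 2 is the simplest to make reversible. A minimal sketch using the core MIME::Base64 module, with hypothetical helper names: passing an empty string as the second argument to `encode_base64` stops it from inserting line breaks, so each name stays a single whitespace-free token in the index.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helpers for storing file names as Base64 tokens.
# encode_base64 expects bytes; names read from the archive already are.
sub encode_index_name {
    my ($name) = @_;
    return encode_base64($name, '');    # '' = no embedded/trailing newlines
}

sub decode_index_name {
    my ($token) = @_;
    return decode_base64($token);
}

my $name  = "usr/share/newline/\n/etc/issue";
my $token = encode_index_name($name);
print "$token\n";
print decode_index_name($token) eq $name ? "round trip ok\n" : "mismatch\n";
```

The trade-off is that the index is no longer readable by eye, which matters when inspecting a collection by hand.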

I am happy to contribute patches if we can agree on a way forward.

> Sure thing. (I wonder whether we should also check for (at least) \t
> and possibly even *invalid* unicode characters; those are a great way
> to make programs blow up.)

Since filenames are arbitrary byte sequences that serve as valid
identifiers in the file system, I am not sure it makes sense to impose
UTF-8 validation in Lintian.



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-07 Thread Chris Lamb
Hi Felix,

[…]

> > … but this is clearly hacking around the problem and is likely
> > incomplete. Storing the newline literally in the internal structure
> > breaks other things that I can't immediately see/fix.
> 
> I agree. I rewrote the collections using IO::Async (coming once buster
> is out) and can look at the newline issue over the next few days, if
> you are okay with that.

Sure thing. (I wonder whether we should also check for (at least) \t
and possibly even *invalid* unicode characters; those are a great way
to make programs blow up.)


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org 🍥 chris-lamb.co.uk
   `-



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-06 Thread Felix Lechner
Hi Chris,

On Sat, Jul 6, 2019 at 7:21 PM Chris Lamb  wrote:
>
> The following "fixes" it:
>
> diff --git a/collection/md5sums b/collection/md5sums
> index 970eb0656..e8006ab10 100755
> --- a/collection/md5sums
> +++ b/collection/md5sums
> @@ -53,7 +53,8 @@ sub collect {
>
>  foreach my $file ($info->sorted_index) {
>  next unless $file->is_file;
> +$file =~ s,\\n,\n,g;
>  printf {$opts{pipe_in}} "%s\0", $file;
>  }
>
>  close($opts{pipe_in});
> diff --git a/lib/Lintian/Path.pm b/lib/Lintian/Path.pm
> index 108a18ede..fb8048ed5 100644
> --- a/lib/Lintian/Path.pm
> +++ b/lib/Lintian/Path.pm
> @@ -643,6 +643,7 @@ sub open {
>  $layer //= '';
>  my $opener = sub {
>  use autodie qw(open);
> +$_[0] =~ s,\\n,\n,g;
>  open(my $fd, "<${layer}", $_[0]);
>  return $fd;
>  };
>
> … but this is clearly hacking around the problem and is likely
> incomplete. Storing the newline literally in the internal structure
> breaks other things that I can't immediately see/fix.

I agree. I rewrote the collections using IO::Async (coming once buster
is out) and can look at the newline issue over the next few days, if
you are okay with that.

Kind regards,
Felix



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-06 Thread Chris Lamb
Chris Lamb wrote:

> However, I can reproduce with your previously attached .deb:
> 
> $ lintian ~/Downloads/newline_1_all.deb 2>&1 | head -n2
> md5sum: 'usr/share/newline/\n/etc/issue': No such file or directory
> command failed with error code 123 at /home/lamby/git/debian/lintian/lintian/lib/Lintian/Command.pm line 344.

The following "fixes" it:

diff --git a/collection/md5sums b/collection/md5sums
index 970eb0656..e8006ab10 100755
--- a/collection/md5sums
+++ b/collection/md5sums
@@ -53,7 +53,8 @@ sub collect {
 
     foreach my $file ($info->sorted_index) {
         next unless $file->is_file;
+        $file =~ s,\\n,\n,g;
         printf {$opts{pipe_in}} "%s\0", $file;
     }
 
     close($opts{pipe_in});
diff --git a/lib/Lintian/Path.pm b/lib/Lintian/Path.pm
index 108a18ede..fb8048ed5 100644
--- a/lib/Lintian/Path.pm
+++ b/lib/Lintian/Path.pm
@@ -643,6 +643,7 @@ sub open {
     $layer //= '';
     my $opener = sub {
         use autodie qw(open);
+        $_[0] =~ s,\\n,\n,g;
         open(my $fd, "<${layer}", $_[0]);
         return $fd;
     };

… but this is clearly hacking around the problem and is likely
incomplete. Storing the newline literally in the internal structure
breaks other things that I can't immediately see/fix.
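A short illustration of one way the substitution in the patch above is incomplete. It assumes, hypothetically, that the producer writes a literal backslash as `\\` (tar-style). Under that assumption, a name containing a backslash followed by `n` is indexed with a doubled backslash, and the regex then manufactures a newline that was never in the name, while `\\` itself is never collapsed back to a single `\`.

```perl
use strict;
use warnings;

# The substitution from the patch: a literal backslash followed by "n"
# becomes a newline. Nothing else is unescaped.
sub naive_unescape {
    my ($s) = @_;
    $s =~ s,\\n,\n,g;
    return $s;
}

# Hypothetical producer output for a file literally named 'a\nb'
# (a, backslash, n, b: no newline), assuming "\" is written as "\\".
my $indexed_backslash_n = 'a\\\\nb';    # the characters a \ \ n b
# ... and for a file literally named 'a\b', indexed as 'a\\b'.
my $indexed_backslash   = 'a\\\\b';     # the characters a \ \ b

my $got1 = naive_unescape($indexed_backslash_n);
my $got2 = naive_unescape($indexed_backslash);

# The first result now contains a newline; the second still has two
# backslashes. Neither matches the original file name.
printf "contains newline: %s\n", ($got1 =~ /\n/ ? 'yes' : 'no');    # prints "contains newline: yes"
printf "backslashes left: %d\n", ($got2 =~ tr/\\//);                 # prints "backslashes left: 2"
```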


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org 🍥 chris-lamb.co.uk
   `-



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-07-06 Thread Chris Lamb
Hi Jakub,

> Newlines in filenames make Lintian very unhappy:

[..]

Curiously, I can't seem to reproduce this when rebuilding the .deb
from your Git repository with dpkg 1.19.7:

$ dpkg-buildpackage --version | head -n1
Debian dpkg-buildpackage version 1.19.7.

$ dpkg-buildpackage -uc -us
[…]

$ lintian ../newline_1_all.deb
E: newline: no-copyright-file
E: newline: extended-description-is-empty

$ echo $?
1

> The source package for this deb is here:
> https://github.com/jwilk/newline.deb
> You will probably need very old dpkg (<< 1.18.1) to build it; see 
> #929727.

Looks like this was fixed in 1.19.7, i.e. my current version.

However, I can reproduce with your previously attached .deb:

$ lintian ~/Downloads/newline_1_all.deb 2>&1 | head -n2
md5sum: 'usr/share/newline/\n/etc/issue': No such file or directory
command failed with error code 123 at /home/lamby/git/debian/lintian/lintian/lib/Lintian/Command.pm line 344.
 

Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org 🍥 chris-lamb.co.uk
   `-



Bug#929729: lintian: \n in filenames cause "md5sum: ...: No such file or directory"

2019-05-29 Thread Jakub Wilk

Package: lintian
Version: 2.15.0
Severity: minor

Newlines in filenames make Lintian very unhappy:

  $ lintian newline_1_all.deb
  md5sum: 'usr/share/newline/\n/etc/issue': No such file or directory
  command failed with error code 123 at /usr/share/perl5/Lintian/Command.pm line 344.
        Lintian::Command::reap(HASH(0x57efd804)) called at /usr/share/lintian/collection/md5sums line 60
        Lintian::coll::md5sums::collect("newline", "binary", "/tmp/temp-lintian-lab-oLpiOKhXw6/pool/n/newline/newline_1_all"...) called at /usr/share/perl5/Lintian/CollScript.pm line 227
        Lintian::CollScript::collect(Lintian::CollScript=HASH(0x582cec54), "newline", "binary", "/tmp/temp-lintian-lab-oLpiOKhXw6/pool/n/newline/newline_1_all"...) called at /usr/share/perl5/Lintian/Unpacker.pm line 396
        eval {...} called at /usr/share/perl5/Lintian/Unpacker.pm line 396
        Lintian::Unpacker::__ANON__() called at /usr/share/perl5/IO/Async/Loop.pm line 2109
        eval {...} called at /usr/share/perl5/IO/Async/Loop.pm line 2109
        IO::Async::Loop::fork(IO::Async::Loop::Poll=HASH(0x59142200), "code", CODE(0x5915e704), "on_exit", CODE(0x5914c430)) called at /usr/share/perl5/Lintian/Unpacker.pm line 444
        eval {...} called at /usr/share/perl5/Lintian/Unpacker.pm line 369
        Lintian::Unpacker::__ANON__("md5sums-binary:newline/1/all", Lintian::CollScript=HASH(0x582cec54), Lintian::Lab::Entry=HASH(0x57d40bac), Lintian::DepMap::Properties=HASH(0x590f4e18)) called at /usr/share/perl5/Lintian/Unpacker.pm line 436
        Lintian::Unpacker::__ANON__(1750, 0) called at /usr/share/perl5/IO/Async/Loop.pm line 2770
        IO::Async::Loop::_reap_children(HASH(0x59118dc8)) called at /usr/share/perl5/IO/Async/Loop.pm line 2829
        IO::Async::Loop::__ANON__() called at /usr/share/perl5/IO/Async/Loop.pm line 805
        IO::Async::Loop::__ANON__() called at /usr/share/perl5/IO/Async/OS.pm line 577
        IO::Async::OS::_Base::__ANON__(IO::Async::Handle=HASH(0x59119688)) called at /usr/share/perl5/IO/Async/Loop/Poll.pm line 172
        IO::Async::Loop::Poll::post_poll(IO::Async::Loop::Poll=HASH(0x59142200)) called at /usr/share/perl5/IO/Async/Loop/Poll.pm line 285
        IO::Async::Loop::Poll::loop_once(IO::Async::Loop::Poll=HASH(0x59142200), undef) called at /usr/share/perl5/IO/Async/Loop.pm line 524
        IO::Async::Loop::run(IO::Async::Loop::Poll=HASH(0x59142200)) called at /usr/share/perl5/Lintian/Unpacker.pm line 463
        Lintian::Unpacker::process_tasks(Lintian::Unpacker=HASH(0x57f3fe94), HASH(0x57aba954)) called at /usr/share/lintian/commands/lintian.pm line 949
        main::unpack_group("newline/1", Lintian::ProcessableGroup=HASH(0x57d40904)) called at /usr/share/lintian/commands/lintian.pm line 731
        main::__ANON__() called at /usr/share/lintian/commands/lintian.pm line 1645
        main::timed_task(CODE(0x5912bc18)) called at /usr/share/lintian/commands/lintian.pm line 734
        main::__ANON__() called at /usr/share/lintian/commands/lintian.pm line 1645
        main::timed_task(CODE(0x5912b394)) called at /usr/share/lintian/commands/lintian.pm line 767
        main::main() called at /usr/bin/lintian line 46
        eval {...} called at /usr/bin/lintian line 46
        main::__ANON__("/usr/share/lintian/commands/lintian.pm") called at /usr/bin/lintian line 114
        dplint::run_tool("/usr/bin/lintian", "lintian") called at /usr/bin/lintian line 290
        dplint::main() called at /usr/bin/lintian line 359
  warning: collect info md5sums about package newline failed (512)
  warning: skipping check of binary package newline


The source package for this deb is here:
https://github.com/jwilk/newline.deb
You will probably need very old dpkg (<< 1.18.1) to build it; see 
#929727.



-- System Information:
Architecture: i386

Versions of packages lintian depends on:
ii  binutils                       2.31.1-16
ii  bzip2                          1.0.6-9
ii  diffstat                       1.62-1
ii  dpkg                           1.19.6
ii  dpkg-dev                       1.19.6
ii  file                           1:5.35-4
ii  gettext                        0.19.8.1-9
ii  gpg                            2.2.13-2
ii  intltool-debian                0.35.0+20060710.5
ii  libapt-pkg-perl                0.1.34+b1
ii  libarchive-zip-perl            1.64-1
ii  libcapture-tiny-perl           0.48-1
ii  libcgi-pm-perl                 4.40-1
ii  libclass-accessor-perl         0.51-1
ii  libclone-perl                  0.41-1+b1
un  libdigest-sha-perl
ii  libdpkg-perl                   1.19.6
ii  libemail-valid-perl            1.202-1
ii  libfile-basedir-perl           0.08-1
ii  libio-async-perl               0.72-1
ii  libipc-run-perl                20180523.0-1
ii  liblist-moreutils-perl         0.416-1+b4
ii  libparse-debianchangelog-perl  1.2.0-13
ii  libpath-tiny-perl              0.108-1
ii  libtext-levenshtein-perl       0.13-1
ii  libtimedate-perl