Re: MD5 listing format

2021-11-28 Thread David Wright
On Sun 28 Nov 2021 at 23:40:43 (+0100), Thomas Schmitt wrote:
> David Wright wrote:
> > My solution will be to rename or delete the file,
> > after adding code to detect any future occurrences.
> 
> Or you could avoid showing md5sum a file name.
> 
>   path=...as.weird.as.is...
>   md5sum <"$path"
> 
> will yield something like
> 
>   b8ce6ed30aa67e94ad9276c9ac2bbc50  -
> 
> If you need a file name in the result line, then you can change the "-"
> to the file name in a form which you like.

I came across this when I googled the filename:
https://askubuntu.com/questions/144408/what-is-the-file-c-nppdf32log-debuglog-txt
and I would imagine that it's likely related to the origin of my file.
It's the first file in a decade to cause that escaping mechanism,
and only got captured because I dumped the entire /home of a machine
I decommissioned a while back.

Your workaround is probably useful for really pathological filenames,
but doesn't scale well for my use case. (There are 356866 lines in
the MD5SUMS file concerned.) Thanks.

Cheers,
David.



Re: MD5 listing format

2021-11-28 Thread Thomas Schmitt
Hi,

David Wright wrote:
> My solution will be to rename or delete the file,
> after adding code to detect any future occurrences.

Or you could avoid showing md5sum a file name.

  path=...as.weird.as.is...
  md5sum <"$path"

will yield something like

  b8ce6ed30aa67e94ad9276c9ac2bbc50  -

If you need a file name in the result line, then you can change the "-"
to the file name in a form which you like.
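
A minimal sketch of that stdin approach, for illustration only. The helper
name md5_plain is made up here, and it assumes GNU md5sum and a POSIX shell.
It keeps only the digest from the "<digest>  -" line and prints the path
after it, exactly as given.

```shell
# Hypothetical helper: hash a file through stdin so that md5sum never
# sees (and therefore never escapes) the file name.
md5_plain() {
    mp_path=$1
    # md5sum prints "<digest>  -" for stdin; fail if the file cannot be read.
    mp_line=$(md5sum <"$mp_path") || return 1
    # Drop everything from the first space on, leaving only the digest.
    mp_digest=${mp_line%% *}
    # Emit the digest followed by the name in the form we choose (here: as is).
    printf '%s  %s\n' "$mp_digest" "$mp_path"
}

# Usage (not run here):
#   md5_plain 'special/C:\nppdf32Log\debuglog.txt'
```

Lines written this way carry no leading backslash, so a file name that
contains a newline would still break the one-line-per-file layout.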


Have a nice day :)

Thomas



Re: MD5 listing format

2021-11-28 Thread David Wright
On Sun 28 Nov 2021 at 20:21:15 (+0100), Thomas Schmitt wrote:
> David Wright wrote:
> > I was taken by surprise by the following output from md5sum:
> > \adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/C:\\nppdf32Log\\debuglog.txt
> 
> It's a feature, not a bug. (tm)
> 
>   https://www.gnu.org/software/coreutils/manual/html_node/md5sum-invocation.html
> 
>   "Without --zero, if file contains a backslash, newline, or carriage
>    return, the line is started with a backslash, and each problematic
>    character in the file name is escaped with a backslash, making the
>    output unambiguous even in the presence of arbitrary file names."

Ah, I missed that. Ironic considering that I feed md5sum with -print0
and -0 options, but then, md5sum is my final "product". I guess it's
required for md5sum -c to work correctly.

Thanks to Greg too. My solution will be to rename or delete the file,
after adding code to detect any future occurrences.
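
The detection step might look something like the sketch below. The function
name find_escaped and the MD5SUMS path are placeholders. The only fact it
relies on is the documented one: an escaped line starts with a backslash.

```shell
# Hypothetical helper: print, with line numbers, every line of a checksum
# listing that md5sum escaped (that is, every line starting with a backslash).
# Exit status is grep's: 0 if any such line exists, 1 if none.
find_escaped() {
    grep -n '^\\' -- "$1"
}

# Usage (not run here):
#   find_escaped MD5SUMS && echo 'escaped file names present' >&2
```

A single grep pass like this should be cheap even on a listing with a few
hundred thousand lines.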

Cheers,
David.



Re: MD5 listing format

2021-11-28 Thread Thomas Schmitt
Hi,

David Wright wrote:
> I was taken by surprise by the following output from md5sum:
> \adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/C:\\nppdf32Log\\debuglog.txt

It's a feature, not a bug. (tm)

  https://www.gnu.org/software/coreutils/manual/html_node/md5sum-invocation.html

  "Without --zero, if file contains a backslash, newline, or carriage
   return, the line is started with a backslash, and each problematic
   character in the file name is escaped with a backslash, making the
   output unambiguous even in the presence of arbitrary file names."
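
To illustrate the "Without --zero" part, here is a small sketch. It assumes
a GNU md5sum recent enough to have -z/--zero, and the wrapper name md5_zero
is made up. With that option each record ends in NUL instead of newline and
the file name is left unescaped.

```shell
# Hypothetical wrapper: NUL-terminated records, no file name escaping.
md5_zero() {
    md5sum -z -- "$@"
}

# For viewing only, turn the NULs into newlines (not run here):
#   md5_zero special/* | tr '\0' '\n'
```

The NUL-terminated form is meant for programs, not for a human-readable
MD5SUMS file. Whether --check accepts such a listing is not assumed here.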


Have a nice day :)

Thomas



Re: MD5 listing format

2021-11-28 Thread Greg Wooledge
On Sun, Nov 28, 2021 at 12:57:00PM -0600, David Wright wrote:
> I was taken by surprise by the following output from md5sum:
> $ echo special/*
> special/C:\nppdf32Log\debuglog.txt special/same-contents
> $ md5sum special/*
> \adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/C:\\nppdf32Log\\debuglog.txt
> adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/same-contents

Fun.

> I don't understand why it pollutes the first field in its output.

Well, it doesn't bother to *document* why it does this, so we can only
guess (or source-dive).

> I would have thought it sufficient to mangle the filename if it
> feels it has to (echo doesn't bother).

Perhaps it prepends the \ character to the output line to indicate to
whoever's reading this file (which may be md5sum itself, in --check
mode) that a filename mangling *has occurred* and needs to be accounted
for.

Otherwise, how would the reader know whether the filename is actually

C:\\nppdf32Log\\debuglog.txt

or

C:\nppdf32Log\debuglog.txt

... and, upon further investigation, it turns out md5sum is part of GNU
coreutils.  Which means the man page that I've been reading *is not the
documentation*.  Fuckers.

In the blighted *info page*, there's this paragraph:

   For each FILE, ‘md5sum’ outputs by default, the MD5 checksum, a
space, a flag indicating binary or text input mode, and the file name.
Binary mode is indicated with ‘*’, text mode with ‘ ’ (space).  Binary
mode is the default on systems where it’s significant, otherwise text
mode is the default.  Without ‘--zero’, if FILE contains a backslash or
newline, the line is started with a backslash, and each problematic
character in the file name is escaped with a backslash, making the
output unambiguous even in the presence of arbitrary file names.  If
FILE is omitted or specified as ‘-’, standard input is read.
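
The guess above about --check mode can be tried with a round trip. This is
only a sketch: roundtrip_check is a made-up name, and it assumes GNU md5sum.
It writes a listing for the given files and then asks md5sum to verify that
same listing, so the escaped names have to be decoded back correctly for the
check to pass.

```shell
# Hypothetical helper: create a listing for the given files, then verify it.
# Returns 0 only if md5sum -c finds and matches every listed file.
roundtrip_check() {
    rc_list=$(mktemp) || return 1
    # Write the listing (possibly with escaped names), then check it.
    md5sum -- "$@" >"$rc_list" && md5sum -c --quiet -- "$rc_list"
    rc_status=$?
    rm -f -- "$rc_list"
    return "$rc_status"
}

# Usage (not run here):
#   roundtrip_check special/* && echo 'all names survived the round trip'
```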



MD5 listing format

2021-11-28 Thread David Wright
I was taken by surprise by the following output from md5sum:

$ ls -Glg special/
total 8
-rw-r--r-- 1 144 Oct 24  2014 'C:\nppdf32Log\debuglog.txt'
-rw-r--r-- 1 144 Oct 24  2014  same-contents
$ echo special/*
special/C:\nppdf32Log\debuglog.txt special/same-contents
$ md5sum special/*
\adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/C:\\nppdf32Log\\debuglog.txt
adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/same-contents
$ md5sum special/* | hex
0000  5c 61 64 66 63 31 64 32  66 31 62 31 64 36 63 37  |\adfc1d2f1b1d6c7|
0010  66 63 61 61 35 31 65 38  35 37 63 31 61 36 66 36  |fcaa51e857c1a6f6|
0020  38 20 20 73 70 65 63 69  61 6c 2f 43 3a 5c 5c 6e  |8  special/C:\\n|
0030  70 70 64 66 33 32 4c 6f  67 5c 5c 64 65 62 75 67  |ppdf32Log\\debug|
0040  6c 6f 67 2e 74 78 74 0a  61 64 66 63 31 64 32 66  |log.txt.adfc1d2f|
0050  31 62 31 64 36 63 37 66  63 61 61 35 31 65 38 35  |1b1d6c7fcaa51e85|
0060  37 63 31 61 36 66 36 38  20 20 73 70 65 63 69 61  |7c1a6f68  specia|
0070  6c 2f 73 61 6d 65 2d 63  6f 6e 74 65 6e 74 73 0a  |l/same-contents.|
0080
$ 

I don't understand why it pollutes the first field in its output.
I would have thought it sufficient to mangle the filename if it
feels it has to (echo doesn't bother).

Cheers,
David.