Re: MD5 listing format
On Sun 28 Nov 2021 at 23:40:43 (+0100), Thomas Schmitt wrote:
> David Wright wrote:
> > My solution will be to rename or delete the file,
> > after adding code to detect any future occurrences.
>
> Or you could avoid showing md5sum a file name.
>
>   path=...as.weird.as.is...
>   md5sum <"$path"
>
> will yield something like
>
>   b8ce6ed30aa67e94ad9276c9ac2bbc50  -
>
> If you need a file name in the result line, then you can change the "-"
> to the file name in a form which you like.

I came across this when I googled the filename:

https://askubuntu.com/questions/144408/what-is-the-file-c-nppdf32log-debuglog-txt

and I would imagine that it's likely related to the origin of my file.
It's the first file in a decade to trigger that escaping mechanism, and
it only got captured because I dumped the entire /home of a machine I
decommissioned a while back.

Your workaround is probably useful for really pathological filenames,
but doesn't scale well for my use case. (There are 356866 lines in the
MD5SUMS file concerned.) Thanks.

Cheers,
David.
Re: MD5 listing format
Hi,

David Wright wrote:
> My solution will be to rename or delete the file,
> after adding code to detect any future occurrences.

Or you could avoid showing md5sum a file name.

  path=...as.weird.as.is...
  md5sum <"$path"

will yield something like

  b8ce6ed30aa67e94ad9276c9ac2bbc50  -

If you need a file name in the result line, then you can change the "-"
to the file name in a form which you like.

Have a nice day :)

Thomas
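A minimal sketch of the stdin workaround above, wrapped as a shell function.
The name md5_plain is hypothetical (it is not part of coreutils), and the
sketch assumes a POSIX shell with md5sum on the PATH. It prints the name
verbatim, so a name containing a newline would still break a line-oriented
list, and md5sum -c would not unescape anything on such lines.

```shell
# md5_plain (hypothetical helper): hash a file via stdin so that md5sum never
# sees its name, then print the digest followed by the name, unescaped.
md5_plain() {
    # For stdin, md5sum prints "<digest>  -"; a failed read aborts here.
    sum=$(md5sum <"$1") || return 1
    # Keep only the digest (everything before the first space).
    printf '%s  %s\n' "${sum%% *}" "$1"
}
```

Usage would be something like md5_plain "$path" for each path of interest;
this spawns one md5sum process per file.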
Re: MD5 listing format
On Sun 28 Nov 2021 at 20:21:15 (+0100), Thomas Schmitt wrote:
> David Wright wrote:
> > I was taken by surprise by the following output from md5sum:
> > \adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/C:\\nppdf32Log\\debuglog.txt
>
> It's a feature, not a bug. (tm)
>
> https://www.gnu.org/software/coreutils/manual/html_node/md5sum-invocation.html
>
>   "Without --zero, if file contains a backslash, newline, or carriage
>   return, the line is started with a backslash, and each problematic
>   character in the file name is escaped with a backslash, making the
>   output unambiguous even in the presence of arbitrary file names."

Ah, I missed that. Ironic considering that I feed md5sum with
-print0 and -0 options, but then, md5sum is my final "product".
I guess it's required for md5sum -c to work correctly.

Thanks to Greg too. My solution will be to rename or delete the file,
after adding code to detect any future occurrences.

Cheers,
David.
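One way the detection step mentioned above could look, as a sketch only: since
md5sum flags every escaped entry with a leading backslash, a plain grep over
the finished list finds them. The function name find_escaped is hypothetical,
and the list is assumed to be ordinary md5sum output, one entry per line.

```shell
# find_escaped (hypothetical helper): print, with line numbers, every line of
# a checksum list that starts with a backslash, i.e. every entry whose file
# name md5sum had to escape. Exit status is 0 if any were found, 1 if none.
find_escaped() {
    grep -n '^\\' "$1"
}
```

For example, find_escaped MD5SUMS would list the offending entries, and the
exit status can drive a warning in a script.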
Re: MD5 listing format
Hi,

David Wright wrote:
> I was taken by surprise by the following output from md5sum:
> \adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/C:\\nppdf32Log\\debuglog.txt

It's a feature, not a bug. (tm)

https://www.gnu.org/software/coreutils/manual/html_node/md5sum-invocation.html

  "Without --zero, if file contains a backslash, newline, or carriage
  return, the line is started with a backslash, and each problematic
  character in the file name is escaped with a backslash, making the
  output unambiguous even in the presence of arbitrary file names."

Have a nice day :)

Thomas
Re: MD5 listing format
On Sun, Nov 28, 2021 at 12:57:00PM -0600, David Wright wrote:
> I was taken by surprise by the following output from md5sum:

> $ echo special/*
> special/C:\nppdf32Log\debuglog.txt special/same-contents
> $ md5sum special/*
> \adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/C:\\nppdf32Log\\debuglog.txt
> adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/same-contents

Fun.

> I don't understand why it pollutes the first field in its output.

Well, it doesn't bother to *document* why it does this, so we can only
guess (or source-dive).

> I would have thought it sufficient to mangle the filename if it
> feels it has to (echo doesn't bother).

Perhaps it prepends the \ character to the output line to indicate to
whoever's reading this file (which may be md5sum itself, in --check mode)
that a filename mangling *has occurred* and needs to be accounted for.
Otherwise, how would the reader know whether the filename is actually
C:\\nppdf32Log\\debuglog.txt or C:\nppdf32Log\debuglog.txt?

... and, upon further investigation, it turns out md5sum is part of
GNU coreutils. Which means the man page that I've been reading *is not
the documentation*. Fuckers.

In the blighted *info page*, there's this paragraph:

     For each FILE, ‘md5sum’ outputs by default, the MD5 checksum, a
  space, a flag indicating binary or text input mode, and the file name.
  Binary mode is indicated with ‘*’, text mode with ‘ ’ (space).  Binary
  mode is the default on systems where it’s significant, otherwise text
  mode is the default.  Without ‘--zero’, if FILE contains a backslash or
  newline, the line is started with a backslash, and each problematic
  character in the file name is escaped with a backslash, making the
  output unambiguous even in the presence of arbitrary file names.  If
  FILE is omitted or specified as ‘-’, standard input is read.
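To illustrate how a reader of the list can use that leading backslash, here is
a rough sketch that recovers the real file name from each line. The function
name sum_line_name is hypothetical, and the sketch assumes MD5 lines in the
default format (32 hex digits, a space, a one-character mode flag, then the
name). It only mimics what the quoted paragraph describes and is not taken
from the md5sum source; unlike a strict parser, it passes unknown escapes
through rather than rejecting them.

```shell
# sum_line_name (hypothetical helper): read md5sum-style lines on stdin and
# print the real file name for each. Names are unescaped only on lines that
# carry the leading-backslash flag; other lines are taken literally.
sum_line_name() {
    awk '{
        line = $0
        esc = (substr(line, 1, 1) == "\\")
        if (esc) line = substr(line, 2)      # drop the flag character
        name = substr(line, 35)              # skip digest, space, mode flag
        if (!esc) { print name; next }
        out = ""
        n = length(name)
        for (i = 1; i <= n; i++) {
            c = substr(name, i, 1)
            if (c == "\\" && i < n) {
                i++
                d = substr(name, i, 1)
                if (d == "n") out = out "\n"       # escaped newline
                else if (d == "r") out = out "\r"  # escaped carriage return
                else out = out d                   # "\\" becomes one backslash
            } else {
                out = out c
            }
        }
        print out
    }'
}
```

Without the flag, the two spellings in the question above would be
indistinguishable; with it, the unflagged line is taken as-is and the flagged
one is decoded.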
MD5 listing format
I was taken by surprise by the following output from md5sum:

$ ls -Glg special/
total 8
-rw-r--r-- 1 144 Oct 24 2014 'C:\nppdf32Log\debuglog.txt'
-rw-r--r-- 1 144 Oct 24 2014 same-contents
$ echo special/*
special/C:\nppdf32Log\debuglog.txt special/same-contents
$ md5sum special/*
\adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/C:\\nppdf32Log\\debuglog.txt
adfc1d2f1b1d6c7fcaa51e857c1a6f68  special/same-contents
$ md5sum special/* | hex
     5c 61 64 66 63 31 64 32 66 31 62 31 64 36 63 37 |\adfc1d2f1b1d6c7|
0010 66 63 61 61 35 31 65 38 35 37 63 31 61 36 66 36 |fcaa51e857c1a6f6|
0020 38 20 20 73 70 65 63 69 61 6c 2f 43 3a 5c 5c 6e |8  special/C:\\n|
0030 70 70 64 66 33 32 4c 6f 67 5c 5c 64 65 62 75 67 |ppdf32Log\\debug|
0040 6c 6f 67 2e 74 78 74 0a 61 64 66 63 31 64 32 66 |log.txt.adfc1d2f|
0050 31 62 31 64 36 63 37 66 63 61 61 35 31 65 38 35 |1b1d6c7fcaa51e85|
0060 37 63 31 61 36 66 36 38 20 20 73 70 65 63 69 61 |7c1a6f68  specia|
0070 6c 2f 73 61 6d 65 2d 63 6f 6e 74 65 6e 74 73 0a |l/same-contents.|
0080
$

I don't understand why it pollutes the first field in its output.
I would have thought it sufficient to mangle the filename if it
feels it has to (echo doesn't bother).

Cheers,
David.