STINNER Victor added the comment:

Regarding compatibility with existing tools, I recall a discussion from when the 
tarfile module got a CLI. At first I expected a clone of the UNIX tar command, but 
it was decided to design a new, *simpler* CLI.

---------------------------------------------------
$ python3 -m tarfile
usage: tarfile.py [-h] [-v] [-l <tarfile> | -e <tarfile> [<output_dir> ...] |
                  -c <name> [<file> ...] | -t <tarfile>]

A simple command line interface for tarfile module.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose output
  -l <tarfile>, --list <tarfile>
                        Show listing of a tarfile
  -e <tarfile> [<output_dir> ...], --extract <tarfile> [<output_dir> ...]
                        Extract tarfile into target dir
  -c <name> [<file> ...], --create <name> [<file> ...]
                        Create tarfile from sources
  -t <tarfile>, --test <tarfile>
                        Test if a tarfile is valid
---------------------------------------------------


A common trap of the md5sum CLI is that users write "echo string|md5sum", which 
adds a newline to the string. For some unknown reason, my French manual page for 
the md5sum command documents a -s STRING/--string=STRING argument, but my actual 
md5sum program does not support it. Maybe we should consider adding such an 
option to avoid the trap?
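
To illustrate the trap, here is a minimal sketch using hashlib. It only shows 
that hashing "string" and "string\n" gives two different digests; the variable 
names are mine and not part of any proposed CLI.

```python
import hashlib

text = "string"

# What the user meant to hash.
intended = hashlib.md5(text.encode()).hexdigest()

# What "echo string | md5sum" really hashes: echo appends a newline.
piped = hashlib.md5((text + "\n").encode()).hexdigest()

print("intended:", intended)
print("piped:   ", piped)
print("same digest?", intended == piped)
```

A --string option would hash exactly the bytes of its argument, like the first 
call above.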


Do you want to implement a function to compare the computed hash to a file which 
contains the expected hash? This is how md5sum checks file integrity: md5sum -c 
FILE/--check=FILE. Example:
------
$ md5sum test_socket_with.patch > check
$ cat check 
cfc1d69e76c827c32af4f28f50714a5e  test_socket_with.patch

$ md5sum -c check
test_socket_with.patch: OK

$ vim test_socket_with.patch 
<modify something in the file>

$ md5sum -c check
test_socket_with.patch: FAILED
md5sum: WARNING: 1 computed checksum did NOT match
------
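
A rough sketch of what such a check function could look like in Python. The 
function names (md5_of_file, check) are hypothetical, and the parser is 
simplified: it assumes one "<digest>  <name>" entry per line, as in the check 
file above, and does not handle the escaping rules of the real md5sum.

```python
import hashlib


def md5_of_file(path, blocksize=64 * 1024):
    """Return the hex MD5 digest of a file, reading it block by block."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            h.update(block)
    return h.hexdigest()


def check(checkfile):
    """Verify every entry of checkfile; return the number of mismatches."""
    failed = 0
    with open(checkfile, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Entry format: digest, one space, a mode character
            # (" " or "*"), then the file name.
            expected, rest = line.split(" ", 1)
            name = rest[1:]
            ok = md5_of_file(name) == expected.lower()
            print(f"{name}: {'OK' if ok else 'FAILED'}")
            if not ok:
                failed += 1
    if failed:
        print(f"WARNING: {failed} computed checksum did NOT match")
    return failed
```

The return value could be used as the basis for the exit status, so that a 
script can detect a failed check.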


I worked hard to release the GIL while a hash is computed. It would be super 
cool (a killer feature?) to automatically spawn threads to compute the hashes. 
For example, use N threads where N is the number of CPUs (os.cpu_count() or 1). 
The last time I wrote my own md5sum.py, it was much faster than the UNIX md5sum 
tool since it used all my CPU cores. You just have to ensure that the output is 
written in the correct order.
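
A minimal sketch of the idea, not a proposed implementation: the helper names 
(hash_file, hash_files) and the block size are assumptions of mine. It relies on 
ThreadPoolExecutor.map() returning results in the order of its input, which 
keeps the output in the correct order even when files finish out of order.

```python
import hashlib
import os
import sys
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, blocksize=64 * 1024):
    """Return the hex MD5 digest of one file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            # Blocks this large let hashlib release the GIL in update().
            h.update(block)
    return h.hexdigest()


def hash_files(paths):
    """Hash all paths in parallel; return (path, digest) in input order."""
    paths = list(paths)
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, not completion order.
        return list(zip(paths, pool.map(hash_file, paths)))


if __name__ == "__main__":
    for path, digest in hash_files(sys.argv[1:]):
        print(f"{digest}  {path}")
```

Whether this beats a sequential loop depends on the number of files and on disk 
speed; with a single file there is nothing to parallelize.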


Raymond wrote:
> 1) Neither the md5 or shasum command-line tools offer control over the 
> blocksize.  I suggest that option be dropped from the command-line API giving 
> a nice simplification and usability improvement.

I agree. You should compute it per file using os.stat().st_blksize:

   https://docs.python.org/dev/library/os.html#os.stat_result.st_blksize

The io module uses st_blksize if it is greater than 1; otherwise it falls back 
to 8 * 1024 bytes.

(By the way, it looks like shutil.copyfile() doesn't use st_blksize.)
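
A small sketch of a per-file block size computed the same way. The function name 
(choose_blocksize) and the FALLBACK_BLOCKSIZE constant are hypothetical; 
getattr() is used because st_blksize is not available on every platform.

```python
import os

# Fallback used when st_blksize is missing or not usable
# (the 8 * 1024 bytes mentioned above).
FALLBACK_BLOCKSIZE = 8 * 1024


def choose_blocksize(path):
    """Pick a read block size for path from os.stat().st_blksize."""
    blksize = getattr(os.stat(path), "st_blksize", 0)
    return blksize if blksize > 1 else FALLBACK_BLOCKSIZE
```

The result can be passed directly as the size argument of file.read() in the 
hashing loop.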

----------
nosy: +haypo

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26488>
_______________________________________