STINNER Victor added the comment:

About compatibility with existing tools: I recall a discussion when the tarfile module got a CLI. At first I expected a clone of the UNIX tar command, but it was decided to design a new *simpler* CLI.
---------------------------------------------------
$ python3 -m tarfile
usage: tarfile.py [-h] [-v] [-l <tarfile> | -e <tarfile> [<output_dir> ...] |
                  -c <name> [<file> ...] | -t <tarfile>]

A simple command line interface for tarfile module.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose output
  -l <tarfile>, --list <tarfile>
                        Show listing of a tarfile
  -e <tarfile> [<output_dir> ...], --extract <tarfile> [<output_dir> ...]
                        Extract tarfile into target dir
  -c <name> [<file> ...], --create <name> [<file> ...]
                        Create tarfile from sources
  -t <tarfile>, --test <tarfile>
                        Test if a tarfile is valid
---------------------------------------------------

A common trap of the md5sum CLI is that users write "echo string|md5sum", which hashes the string followed by a newline, because echo appends one. For an unknown reason, my French manual page of the md5sum command documents a -s STRING/--string=STRING option, but my installed md5sum program does not support it. Maybe we should consider adding such an option to avoid the trap?

Do you want to implement a function to compare computed hashes against a file containing the expected hashes? That is, a file integrity check like md5sum -c FILE/--check=FILE. Example:

------
$ md5sum test_socket_with.patch > check
$ cat check
cfc1d69e76c827c32af4f28f50714a5e  test_socket_with.patch
$ md5sum -c check
test_socket_with.patch: OK
$ vim test_socket_with.patch
<modify something in the file>
$ md5sum -c check
test_socket_with.patch: FAILED
md5sum: WARNING: 1 computed checksum did NOT match
------

I worked hard to release the GIL while a hash is computed. It would be super cool (a killer feature?) to automatically spawn threads to compute the hashes, for example N threads where N is the number of CPUs (os.cpu_count() or 1). Last time I wrote my own md5sum.py, it was much faster than the UNIX md5sum tool because it used all my CPU cores. You just have to ensure that the output is written in the correct order.
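A minimal sketch of the threaded hashing and of a check mode, assuming the md5sum line format "<hexdigest>  <filename>" (or "<hexdigest> *<filename>" for binary mode). The names file_digest(), hash_files(), parse_checksum_line() and check(), and the 64 KiB chunk size, are hypothetical and not an existing hashlib API. Executor.map() returns results in input order, which keeps the output in the correct order; the threads only run in parallel because hashlib releases the GIL while hashing large buffers.

```python
import hashlib
import os
import sys
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # hypothetical fixed read size for this sketch


def file_digest(path, algorithm="md5"):
    """Return the hex digest of the file at *path*, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            # hashlib releases the GIL while hashing large buffers,
            # so other threads can hash other files meanwhile.
            h.update(chunk)
    return h.hexdigest()


def hash_files(paths, algorithm="md5"):
    """Return [(hexdigest, path), ...] in the same order as *paths*."""
    paths = list(paths)
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() yields results in input order, even if a later
        # file finishes hashing before an earlier one.
        digests = list(pool.map(lambda p: file_digest(p, algorithm), paths))
    return list(zip(digests, paths))


def parse_checksum_line(line):
    """Split '<hexdigest>  <name>' or '<hexdigest> *<name>' into (digest, name)."""
    digest, _, rest = line.rstrip("\n").partition(" ")
    # The second separator character is ' ' (text mode) or '*' (binary mode).
    name = rest[1:] if rest[:1] in (" ", "*") else rest
    return digest.lower(), name


def check(checksum_file, algorithm="md5"):
    """Verify every file listed in *checksum_file*; return the failure count."""
    with open(checksum_file, encoding="utf-8") as f:
        expected = [parse_checksum_line(line) for line in f if line.strip()]
    results = hash_files([name for _, name in expected], algorithm)
    failures = 0
    for (want, _), (got, name) in zip(expected, results):
        if want == got:
            print(f"{name}: OK")
        else:
            print(f"{name}: FAILED")
            failures += 1
    if failures:
        print(f"WARNING: {failures} computed checksum(s) did NOT match",
              file=sys.stderr)
    return failures
```

Unlike md5sum, this sketch raises OSError for a missing or unreadable listed file instead of reporting it as FAILED.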
Raymond wrote:
> 1) Neither the md5 or shasum command-line tools offer control over the
> blocksize. I suggest that option be dropped from the command-line API giving
> a nice simplification and usability improvement.

I agree. You should compute the block size per file using os.stat().st_blksize:
https://docs.python.org/dev/library/os.html#os.stat_result.st_blksize

The io module uses st_blksize if it is greater than 1, and 8 * 1024 bytes otherwise. (By the way, it looks like shutil.copyfile() doesn't use st_blksize.)

----------
nosy: +haypo

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26488>
_______________________________________
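A sketch of the per-file block size suggestion, assuming Python 3.8+ (for the := operator). chunk_size_for(), hash_file() and the sha256 default are hypothetical choices for illustration; the fallback mirrors the io module heuristic described in the comment (st_blksize if it is greater than 1, else 8 * 1024 bytes). It calls os.fstat() on the already-open file rather than os.stat() on the path, so the size comes from the file actually being read.

```python
import hashlib
import os

# Fallback read size when st_blksize is missing or not greater than 1,
# mirroring the io module behaviour described in the comment.
FALLBACK_BLOCK_SIZE = 8 * 1024


def chunk_size_for(fileno):
    """Return st_blksize for an open file descriptor, or the fallback."""
    # st_blksize only exists on some platforms (it is absent on Windows).
    blksize = getattr(os.fstat(fileno), "st_blksize", 0)
    return blksize if blksize > 1 else FALLBACK_BLOCK_SIZE


def hash_file(path, algorithm="sha256"):
    """Hash *path*, reading it in chunks sized from its st_blksize."""
    h = hashlib.new(algorithm)
    # buffering=0: reads are already block-sized, so skip the extra
    # copy through a BufferedReader.
    with open(path, "rb", buffering=0) as f:
        size = chunk_size_for(f.fileno())
        while chunk := f.read(size):
            h.update(chunk)
    return h.hexdigest()
```

With this approach the command line needs no block size option at all, which matches the simplification Raymond proposed.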