It was brought to my attention that there is an open ticket for this.
Moving discussion there:

https://bugs.python.org/issue45150

Aur

On Sun, Mar 13, 2022 at 6:19 PM Aur Saraf <sonofli...@gmail.com> wrote:

> Hi,
>
> TL;DR: you can read just the code snippets and the last paragraph.
>
> First of all, I'm assuming hashlib is used to calculate hashes of large
> files in production very very often, such that even small performance and
> usability improvements would make a huge difference. If you don't share
> this assumption, delete this email :-)
>
> Today, hashlib.update() accepts only a bytes(), and many StackOverflow
> answers include code like:
>
> sha256 = hashlib.sha256()
> with open(path) as f:
>     while True:
>         data = f.read(BUF_SIZE)
>         if not data:
>             break
>         sha256.update(data)
> return sha256.hexdigest()
>
> This is problematic because not everybody knows this pattern, so many
> codebases will include code like:
>
> with open(path) as f:
>     return sha256(f.read()).hexdigest()
>
> and, frankly, who can blame them.
>
> It is also problematic for performance reasons - even if we know to do the
> chunking thing, while hashing a large file, the GIL will be taken and
> released many times, and many buffers will be allocated and deallocated.
>
> As far as I can see, hashlib already has a lock per hash object and safely
> releases the GIL in update() with a long bytes(), so it would be safe to
> add an option for update()/new() to take a file pointer and do the chunked
> reading/updating with one static buffer with the GIL released throughout,
> so that this would be
>
> with open(path) as f:
>     return sha256(f).hexdigest()
>
> We can discuss whether this is the best API or its preferable to have
>
> with open(path) as f:
>     return sha256.from_file(f).hexdigest()
>
> or
>
> with open(path) as f:
>     return sha256().update_from_file(f).hexdigest()
>
> but I submit that today many people try sha256(f).hexdigest() because
> they're used to e.g. json and csv accepting file objects, and that today
> passing a file object raises, so making both new() and update() accept file
> objects would be the most beginner-friendly and won't break anything.
>
> Knowing that there are a million tiny details that need to be... hashed
> out, and given that I'm willing to write the code, would the devs be
> receptive to something like this?
>
> Thanks,
> Aur
>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WZF5POSOP4A6N4YRY4KBHPCTIZBEVE2Z/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to