[issue41002] HTTPResponse.read with amt is slow

2020-06-25 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41002] HTTPResponse.read with amt is slow

2020-06-25 Thread miss-islington


miss-islington  added the comment:


New changeset 152f0b8beea12e6282d284100b600771b968927a by Bruce Merry in branch 
'master':
bpo-41002: Optimize HTTPResponse.read with a given amount (GH-20943)
https://github.com/python/cpython/commit/152f0b8beea12e6282d284100b600771b968927a


--
nosy: +miss-islington




[issue41002] HTTPResponse.read with amt is slow

2020-06-18 Thread Open Close

Open Close  added the comment:

@bmerry yah, sorry, don't bother. I was mistaken.
(I somehow thought the 'MB/s' figure was the plain speed, not the standard deviation.)

I confirmed your tests.
httpclient-read: 2504.5 ± 10.6 MB/s
httpclient-read-length: 871.5 ± 4.9 MB/s
httpclient-read-raw: 2528.3 ± 3.6 MB/s
socket-read: 2520.9 ± 3.6 MB/s

--




[issue41002] HTTPResponse.read with amt is slow

2020-06-18 Thread Bruce Merry

Bruce Merry  added the comment:

> (perhaps 'MB/s's are wrong).

Why, are you getting significantly different results?

Just in case it's confusing, the results are reported as A ± B MB/s, where A is 
the mean and B is the standard deviation of the mean. So it's about 3GB/s when 
no length is passed, or 1GB/s when a length is passed.
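The "standard deviation of the mean" (the standard error) is the sample standard deviation of the per-run rates divided by the square root of the number of runs. Below is a minimal sketch of how an "A ± B MB/s" line can be produced, assuming the benchmark records one throughput figure per repetition. The helpers `summarize` and `format_rate` are hypothetical and not taken from the attached script, which may compute the figures differently.

```python
import math
import statistics


def summarize(rates_mb_s):
    """Return (mean, standard error of the mean) for per-run rates in MB/s."""
    mean = statistics.mean(rates_mb_s)
    # statistics.stdev uses the n - 1 (sample) denominator and needs >= 2 values.
    sem = statistics.stdev(rates_mb_s) / math.sqrt(len(rates_mb_s))
    return mean, sem


def format_rate(rates_mb_s):
    """Format rates as 'A ± B MB/s' (A = mean, B = standard error)."""
    mean, sem = summarize(rates_mb_s)
    return f"{mean:.1f} ± {sem:.1f} MB/s"


if __name__ == "__main__":
    print(format_rate([1.0, 2.0, 3.0]))  # prints "2.0 ± 0.6 MB/s"
```

Because B shrinks as the number of runs grows, it describes how precisely the mean is known, not the spread of individual runs.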

--




[issue41002] HTTPResponse.read with amt is slow

2020-06-18 Thread Open Close

Open Close  added the comment:

@bmerry check the test results again.
(perhaps 'MB/s's are wrong).

httpclient-read: 3019.0 ± 63.8 MB/s
httpclient-read-length: 1050.3 ± 4.8 MB/s
--> httpclient-read-raw: 3150.3 ± 5.3 MB/s
--> socket-read: 3134.4 ± 7.9 MB/s

--
nosy: +op368




[issue41002] HTTPResponse.read with amt is slow

2020-06-17 Thread Inada Naoki


Change by Inada Naoki :


--
nosy: +inada.naoki
versions: +Python 3.10 -Python 3.8




[issue41002] HTTPResponse.read with amt is slow

2020-06-17 Thread Bruce Merry


Change by Bruce Merry :


--
keywords: +patch
pull_requests: +20124
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/20943




[issue41002] HTTPResponse.read with amt is slow

2020-06-17 Thread Bruce Merry


Change by Bruce Merry :


--
type:  -> performance

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41002] HTTPResponse.read with amt is slow

2020-06-17 Thread Bruce Merry

New submission from Bruce Merry :

I've run into this on 3.8, but the code on Git master doesn't look 
significantly different so I assume it still applies. I'm happy to work on a PR 
for this.

When http.client.HTTPResponse.read is called with a specific amount to read, it 
goes down this code path:
```
if amt is not None:
    # Amount is given, implement using readinto
    b = bytearray(amt)
    n = self.readinto(b)
    return memoryview(b)[:n].tobytes()
```
That's pretty inefficient, because
- `bytearray(amt)` will first zero-fill some memory
- `tobytes()` will make an extra copy of this memory
- if amt is big enough, it'll cause the temporary memory to be allocated from 
the kernel, which will *also* zero-fill the pages for security.

A better approach would be to use the read method of the underlying fp.
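Here is a minimal sketch of that suggestion. It assumes a body with a known Content-Length and no chunked transfer encoding, both of which the real HTTPResponse also has to handle. `ToyResponse` is a hypothetical class, not http.client's code or the eventual patch. An in-memory `io.BufferedReader` stands in for the socket file.

```python
import io


class ToyResponse:
    """Hypothetical, simplified HTTP body reader.

    Assumes a known Content-Length and no chunked transfer encoding.
    """

    def __init__(self, fp, length):
        self.fp = fp          # buffered binary file, e.g. from sock.makefile("rb")
        self.length = length  # number of body bytes not yet read

    def read(self, amt=None):
        if amt is None or amt > self.length:
            # Never read past the end of this response's body.
            amt = self.length
        # Delegate to the buffered reader: no zero-filled temporary
        # bytearray and no extra tobytes() copy.
        data = self.fp.read(amt)
        self.length -= len(data)
        return data


if __name__ == "__main__":
    # Body is b"hello world" (11 bytes); the rest belongs to a later response.
    raw = io.BufferedReader(io.BytesIO(b"hello worldNEXT RESPONSE"))
    resp = ToyResponse(raw, length=11)
    print(resp.read(5))    # b'hello'
    print(resp.read(100))  # b' world' (clipped to the remaining length)
    print(resp.read(5))    # b'' (body exhausted)
```

Clipping `amt` to the remaining length keeps a persistent connection usable, because bytes that belong to the next response are left in the stream.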

I have a micro-benchmark (which I'll attach) showing that, for a 1GB body read 
in full, passing the amount explicitly reduces throughput from 3GB/s to 
1GB/s.
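The sketch below compares the two read patterns in isolation. It is not the attached httpbench-simple.py. It drains an in-memory stream instead of a socket, so it skips the network stack, and its absolute numbers should not be compared with the figures reported in this thread. The function names and the 64 MiB payload size are illustrative assumptions.

```python
import io
import time


def read_via_readinto(fp, amt):
    # Pattern from the current code path: zero-filled buffer plus an extra copy.
    b = bytearray(amt)
    n = fp.readinto(b)
    return memoryview(b)[:n].tobytes()


def read_direct(fp, amt):
    # Proposed pattern: let the buffered reader build the bytes object.
    return fp.read(amt)


def throughput_mb_s(reader, payload, amt):
    """Drain payload via reader(fp, amt); return (bytes read, MB/s with 1 MB = 1e6 B)."""
    fp = io.BufferedReader(io.BytesIO(payload))
    total = 0
    start = time.perf_counter()
    while True:
        chunk = reader(fp, amt)
        if not chunk:
            break
        total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return total, total / elapsed / 1e6


if __name__ == "__main__":
    payload = bytes(64 * 1024 * 1024)  # 64 MiB of zeros, held in memory
    for name, reader in [("readinto+tobytes", read_via_readinto),
                         ("fp.read", read_direct)]:
        _, rate = throughput_mb_s(reader, payload, len(payload))
        print(f"{name}: {rate:.1f} MB/s")
```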

For some unknown reason the requests library likes to read the body in 10KB 
chunks even if the user has requested the entire body, so this will help here 
(although the gains probably won't be as big because 10KB is really too small 
to amortise all the accounting overhead).
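To see why small chunks make per-call overhead matter: a 1GB (10^9 byte) body read 10 KiB at a time takes ceil(10^9 / 10240) = 97,657 calls, so any fixed cost per read() is paid that many times. The loop below is a generic streaming sketch that assumes 10 KiB chunks. It is not requests' actual implementation.

```python
import io

CHUNK_SIZE = 10 * 1024  # assumed 10 KiB chunk size


def iter_chunks(fp, chunk_size=CHUNK_SIZE):
    """Yield successive chunks from fp until EOF (generic streaming loop)."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            return
        yield chunk


def count_reads(body_size, chunk_size=CHUNK_SIZE):
    """Number of read() calls that return data for a body of body_size bytes."""
    return -(-body_size // chunk_size)  # ceiling division


if __name__ == "__main__":
    body = io.BytesIO(bytes(25000))
    print([len(c) for c in iter_chunks(body)])  # [10240, 10240, 4520]
    print(count_reads(10**9))                   # 97657
```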

Output from my benchmark, run against a 1GB file on localhost:

httpclient-read: 3019.0 ± 63.8 MB/s
httpclient-read-length: 1050.3 ± 4.8 MB/s
httpclient-read-raw: 3150.3 ± 5.3 MB/s
socket-read: 3134.4 ± 7.9 MB/s

--
components: Library (Lib)
files: httpbench-simple.py
messages: 371732
nosy: bmerry
priority: normal
severity: normal
status: open
title: HTTPResponse.read with amt is slow
versions: Python 3.8
Added file: https://bugs.python.org/file49239/httpbench-simple.py
