> I don't see a way of getting the requests library to handle transfer
encoding but not content encoding without using undocumented API. As far
as I can tell, the raw attribute would give me neither.

Digging into this much deeper, the handling of transfer encoding doesn't
seem to be documented anywhere, but requests uses urllib3 which uses
http.client on Python 3, and http.client's automatic handling of
transfer encoding is passed all the way through. The requests library's
response object's raw attribute is a urllib3 response object, and
testing urllib3 against a deliberately chunked HTTP response does result
in it decoding correctly. For example, the following code worked as
expected:

import urllib3
http = urllib3.PoolManager()
resp = http.request("GET", "http://localhost";, preload_content=False)
print(resp.read())

In that test I arranged a fake chunked response with the following:

while true; do nc -q1 -l -p 80 < result;done

With the file result having the following contents saved with ff=dos in
vim:

--BEGIN--
HTTP/1.1 200
Transfer-Encoding: chunked
Content-Type: text/plain

4
Wiki
7
pedia i
B
n 
chunks.
0

--END--

Similarly, with requests directly, the following also worked fine
against the same fake chunked response:

import requests
with requests.get('http://localhost', stream=True) as resp:
    resp.raw.read(decode_content=False)

In conclusion I don't need to worry about transfer encoding not being
handled correctly when using raw.read() with requests.

Going back to the original issue, it is fixed with a partial revert of
1e20363 to restore the use of raw.read().

The commit message against 1e20363 talks of the changes file ending up
compressed on disk, and indeed I re-introduced this behaviour after my
revert.

To fix that without regressing the fetch of a diff.gz by "accidentally"
decompressing it, I think the correct fix would be to set "Accept-
Encoding: identity". Indeed, on testing with wget (using -d), I see that
wget sets this by default, so I'm confident it's correct. Setting this
using headers={'accept-encoding': 'identity'} fixes that issue, so that
for example "pull-pkg -D ubuntu --upload-queue --pull source r8125
jammy" downloads the changes file and writes it uncompressed again
(r8125 is currently in the unapproved queue but won't work later as an
example).

I think the final fix should therefore look something like:

--- a/ubuntutools/misc.py  2023-05-30 16:47:58.000000000 +0000
+++ b/ubuntutools/misc.py   2023-07-04 18:41:54.919108821 +0000
@@ -348,7 +348,7 @@
     with tempfile.TemporaryDirectory() as tmpdir:
         tmpdst = Path(tmpdir) / "dst"
         try:
-            with requests.get(src, stream=True, timeout=60, auth=auth) as fsrc:
+            with requests.get(src, stream=True, timeout=60, auth=auth, 
headers={'accept-encoding': 'identity'}) as fsrc:
                 with tmpdst.open("wb") as fdst:
                     fsrc.raise_for_status()
                     _download(fsrc, fdst, size, blocksize=blocksize)
@@ -433,7 +433,10 @@
 
     downloaded = 0
     try:
-        for block in fsrc.iter_content(blocksize):
+        while True:
+            block = fsrc.raw.read(blocksize)
+            if not block:
+                break
             fdst.write(block)
             downloaded += len(block)
             progress_bar.update(downloaded, size)

-- 
You received this bug notification because you are a member of MOTU,
which is subscribed to ubuntu-dev-tools in Ubuntu.
https://bugs.launchpad.net/bugs/2025748

Title:
  Launchpad downloads fail with "incorrect size, got X expected Y"

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ubuntu-dev-tools/+bug/2025748/+subscriptions


-- 
universe-bugs mailing list
universe-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/universe-bugs

Reply via email to