[issue44424] Decompress streaming bz2 file

2021-06-15 Thread Carlos Franzreb


Change by Carlos Franzreb :


--
components: Library (Lib)
nosy: carlosfranzreb
priority: normal
severity: normal
status: open
title: Decompress streaming bz2 file
type: behavior
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue44424>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44424] Decompress streaming bz2 file

2021-06-15 Thread Carlos Franzreb


New submission from Carlos Franzreb :

I am trying to lazily load items from a compressed file hosted on Zenodo. My
goal is to yield the items one at a time without storing the file on my
computer. My problem is that an EOFError is raised right after the first
non-empty line is read. How can I overcome this issue?

Here is my code:

import requests as req
import json
from bz2 import BZ2Decompressor


def lazy_load(file_url):
    dec = BZ2Decompressor()
    with req.get(file_url, stream=True) as res:
        for chunk in res.iter_content(chunk_size=1024):
            data = dec.decompress(chunk).decode('utf-8')
            # do something with 'data'
            yield data


if __name__ == "__main__":
    with open('credentials.json') as f:
        creds = json.load(f)
    url = 'https://zenodo.org/api/records/'
    id = '4617285'
    filename = '10.Papers.nt.bz2'
    res = req.get(f'{url}{id}', params={'access_token': creds['zenodo_token']})
    for file in res.json()['files']:
        if file['key'] == filename:
            for item in lazy_load(file['links']['self']):
                # do something with 'item'
                print(item)

The error I get is the following:

Traceback (most recent call last):
  File ".\mag_loader.py", line 51, in <module>
    for item in lazy_load(file['links']['self']):
  File ".\mag_loader.py", line 18, in lazy_load
    data = dec.decompress(chunk)
EOFError: End of stream already reached
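A likely explanation, not confirmed in this thread: the `bz2` module documentation notes that `BZ2Decompressor` does not transparently handle input made of several concatenated compressed streams, unlike `bz2.decompress()` and `BZ2File`, and that a new decompressor must be used for each stream. When the first stream ends, `dec.eof` becomes true, the remaining bytes are left in `dec.unused_data`, and any further `decompress()` call raises exactly this `EOFError`. Large dumps are often written by parallel compressors such as pbzip2, which emit one stream per block, so it is an assumption that `10.Papers.nt.bz2` is such a file. A minimal sketch of a multi-stream-aware loop, using a hypothetical helper name `decompress_multistream`:

```python
import bz2
from typing import Iterable, Iterator


def decompress_multistream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed bytes from compressed chunks that may contain
    several concatenated bz2 streams (hypothetical helper)."""
    dec = bz2.BZ2Decompressor()
    in_stream = False  # True once the current decompressor has received input
    for chunk in chunks:
        data = chunk
        while data:
            in_stream = True
            out = dec.decompress(data)
            if out:
                yield out
            if dec.eof:
                # This stream is finished; bytes it did not consume belong
                # to the next stream, so feed them to a fresh decompressor.
                data = dec.unused_data
                dec = bz2.BZ2Decompressor()
                in_stream = False
            else:
                # Without max_length, decompress() consumed all of `data`.
                data = b""
    if in_stream:
        # The last stream started but never reached its end-of-stream marker.
        raise EOFError("compressed input ended before the end-of-stream marker")
```

In `lazy_load`, the inner loop would then become `for data in decompress_multistream(res.iter_content(chunk_size=1024)):`. Note that calling `.decode('utf-8')` on each piece separately can still fail if a multi-byte character is split between two pieces.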

To run the code you need a Zenodo access token, for which you need an account. 
Once you have logged in, you can create the token here: 
https://zenodo.org/account/settings/applications/tokens/new/
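A simpler route, sketched under the assumption that the response body can be read as a binary file object: `bz2.open()` accepts an existing file object and, like `BZ2File`, continues through files made of several concatenated bz2 streams, which a single `BZ2Decompressor` does not. Opening it in text mode ("rt") also splits the output into lines and decodes UTF-8 incrementally, so characters that straddle chunk boundaries are handled. The helper name `iter_bz2_lines` is hypothetical:

```python
import bz2
import io
from typing import BinaryIO, Iterator


def iter_bz2_lines(raw: BinaryIO, encoding: str = "utf-8") -> Iterator[str]:
    """Yield text lines (without the trailing newline) from bz2-compressed
    data read incrementally from a binary file object (hypothetical helper)."""
    # bz2.open() accepts an existing file object; in "rt" mode it returns a
    # TextIOWrapper around a BZ2File, which reads the source in small pieces,
    # moves on to the next stream when one ends, and decodes text incrementally.
    with bz2.open(raw, "rt", encoding=encoding) as text:
        for line in text:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Offline demo: two concatenated streams, as a parallel compressor might write.
    sample = bz2.compress("first line\n".encode()) + bz2.compress("second line\n".encode())
    for line in iter_bz2_lines(io.BytesIO(sample)):
        print(line)
```

In the original script this could replace the body of `lazy_load`, e.g. `yield from iter_bz2_lines(res.raw)` inside the `with req.get(file_url, stream=True) as res:` block. `res.raw` is the raw urllib3 response that requests exposes; it does not undo any HTTP `Content-Encoding` unless `res.raw.decode_content = True` is set first, which may or may not matter for how Zenodo serves this file.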
