ItamarWMDE added a comment.
I think this happens because PyLD sends a request to fetch the context document for **each and every** context found in the scraped data. We can mitigate this by:

A) Caching the context documents (or the lack thereof), which might help make the normalization process faster
B) Replacing the document loader with our own class that retrieves the documents from the cache (see: https://github.com/digitalbazaar/pyld#document-loader)

'https://www.babelio.com/auteur/wd/9652' generated an exception: ('Could not expand input before flattening.',)
Type: jsonld.FlattenError
Cause: ('Could not perform JSON-LD expansion.',)
Type: jsonld.ExpandError
Cause: ('Dereferencing a URL did not result in a valid JSON-LD context.',)
Type: jsonld.ContextUrlError
Code: loading remote context failed
Details: {'url': 'http://microformats.org/wiki/'}
Cause: ('Could not retrieve a JSON-LD document from the URL.',)
Type: jsonld.LoadDocumentError
Code: loading document failed
Cause: Expecting value: line 1 column 1 (char 0)
  File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/documentloader/requests.py", line 68, in loader
    'document': response.json()
  File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
  File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/jsonld.py", line 4749, in _retrieve_context_urls
    remote_doc = load_document(url)
  File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/documentloader/requests.py", line 92, in loader
    cause=cause)
  File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/jsonld.py", line 814, in expand
    input_, {}, options['documentLoader'], options['base'])
  File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/jsonld.py", line 4756, in _retrieve_context_urls
    code='loading remote context failed', cause=cause)
  File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/jsonld.py", line 867, in flatten
    expanded = self.expand(input_, options)
  File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/jsonld.py", line 818, in expand
    'jsonld.ExpandError', cause=cause)

TASK DETAIL
  https://phabricator.wikimedia.org/T252614