ItamarWMDE added a comment.

  I think this is because PyLD sends a request to fetch the context document
  for **each and every** context found in the scraped data. We can mitigate
  this by:
  
  A) Caching the context documents (or the lack thereof, i.e. URLs that do not
  serve a valid context): this might help make the normalization process faster
  
  B) Replacing the document loader with our own class that retrieves the
  documents from the cache (see: https://github.com/digitalbazaar/pyld#document-loader)
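
  A rough sketch of A and B combined, assuming PyLD's convention that a
  document loader is a callable that takes a URL and returns a dict with
  `contextUrl`, `documentUrl` and `document` keys. `CachingDocumentLoader`
  and its `fetches` counter are hypothetical names. The class wraps whichever
  loader PyLD would otherwise use and remembers both successful lookups and
  failures per URL:

```python
import copy
import threading


class CachingDocumentLoader:
    """Hypothetical PyLD document loader that caches results per URL.

    `inner` is the loader that actually fetches documents, e.g. the one
    returned by pyld.jsonld.requests_document_loader(). Both successful
    lookups and failures are cached, so a context URL is normally fetched
    only once (two threads racing on a new URL may both fetch it).
    """

    def __init__(self, inner):
        self._inner = inner
        self._cache = {}  # url -> ("ok", remote_doc) or ("error", exception)
        self._lock = threading.Lock()
        self.fetches = 0  # how many times the inner loader was called

    def __call__(self, url, options=None):
        with self._lock:
            entry = self._cache.get(url)
        if entry is None:
            entry = self._fetch(url, options)
            with self._lock:
                # If another thread cached this URL meanwhile, keep its result
                entry = self._cache.setdefault(url, entry)
        status, value = entry
        if status == "error":
            # Negative cache hit: fail fast without another HTTP request
            raise value
        # Return a copy so callers cannot modify the cached document
        return copy.deepcopy(value)

    def _fetch(self, url, options):
        with self._lock:
            self.fetches += 1
        try:
            # PyLD 1.x calls loaders with just the URL; newer versions may
            # also pass an options dict, which is forwarded unchanged.
            if options is None:
                return ("ok", self._inner(url))
            return ("ok", self._inner(url, options))
        except Exception as exc:  # assumption: every failure is worth caching
            return ("error", exc)
```

  Wiring it in would presumably look like
  `jsonld.set_document_loader(CachingDocumentLoader(jsonld.requests_document_loader()))`,
  or passing the instance as the `documentLoader` option to `jsonld.flatten`.
  Caching every exception also caches transient errors such as timeouts, so a
  real version may want to cache only `jsonld.JsonLdError` or add an expiry.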
  
    'https://www.babelio.com/auteur/wd/9652' generated an exception: ('Could not expand input before flattening.',)
    Type: jsonld.FlattenError
    Cause: ('Could not perform JSON-LD expansion.',)
    Type: jsonld.ExpandError
    Cause: ('Dereferencing a URL did not result in a valid JSON-LD context.',)
    Type: jsonld.ContextUrlError
    Code: loading remote context failed
    Details: {'url': 'http://microformats.org/wiki/'}
    Cause: ('Could not retrieve a JSON-LD document from the URL.',)
    Type: jsonld.LoadDocumentError
    Code: loading document failed
    Cause: Expecting value: line 1 column 1 (char 0)
      File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/documentloader/requests.py", line 68, in loader
        'document': response.json()
      File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/requests/models.py", line 898, in json
        return complexjson.loads(self.text, **kwargs)
      File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
      File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/jsonld.py", line 4749, in _retrieve_context_urls
        remote_doc = load_document(url)
      File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/documentloader/requests.py", line 92, in loader
        cause=cause)
      File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/jsonld.py", line 814, in expand
        input_, {}, options['documentLoader'], options['base'])
      File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/jsonld.py", line 4756, in _retrieve_context_urls
        code='loading remote context failed', cause=cause)
      File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/jsonld.py", line 867, in flatten
        expanded = self.expand(input_, options)
      File "/home/itgi/Projects/reference-island/venv/lib/python3.7/site-packages/pyld/jsonld.py", line 818, in expand
        'jsonld.ExpandError', cause=cause)
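
  The innermost cause in the trace is `response.json()` failing on whatever
  `http://microformats.org/wiki/` returned. Assuming that URL serves an
  ordinary HTML wiki page (not verified here), the same error can be
  reproduced with the standard library alone; `HTML_BODY` and
  `json_error_for` are hypothetical names:

```python
import json

# Hypothetical stand-in for what http://microformats.org/wiki/ returns:
# an HTML page rather than a JSON-LD context document.
HTML_BODY = "<!DOCTYPE html>\n<html><head><title>Microformats Wiki</title></head></html>"


def json_error_for(body):
    """Return the JSONDecodeError message for body, or None if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return str(err)
    return None


# Mirrors the innermost "Cause:" line of the PyLD trace
print(json_error_for(HTML_BODY))  # Expecting value: line 1 column 1 (char 0)
```

  A URL like this will never yield a valid context, so remembering the
  failure (the "lack thereof" in A) would save one request for every scraped
  page that references it.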

TASK DETAIL
  https://phabricator.wikimedia.org/T252614

To: ItamarWMDE
Cc: Aklapper, Tarrow, ItamarWMDE, Ladsgroup, Ferdinand0101, darthmon_wmde, 
Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331