NealMcB created this task.
NealMcB added subscribers: NealMcB, Halfak.
NealMcB added a project: Wikidata.
NealMcB moved this task to incoming on the Wikidata workboard.
Herald added a subscriber: Aklapper.

TASK DESCRIPTION
  Currently the only download option for Wikidata in JSON format is a single
  gzipped file (see e.g. the files under
  https://dumps.wikimedia.org/wikidatawiki/entities/), which is 5.4 GB
  compressed.
  
  This makes it hard to reliably download all of it, to fetch just a subset,
  to download it in parallel, or to mirror it on infrastructure designed for
  highly parallel downloads (e.g. clusters). In addition, 5.4 GB is too large
  to upload easily to Amazon S3, which has a 5 GB limit for many of the most
  convenient upload methods.
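  As a rough illustration of why a single compressed file is awkward, the
  Python sketch below (hypothetical helper names) splits one large download
  into byte ranges that could be fetched in parallel with HTTP Range requests,
  assuming the server honours them. The pieces are not usable on their own: a
  single gzip stream generally has to be decompressed from the start, so every
  range must be fetched and concatenated before any entity can be read.

```python
import urllib.request
from typing import List, Tuple


def plan_byte_ranges(total_size: int, num_parts: int) -> List[Tuple[int, int]]:
    """Split bytes [0, total_size) into num_parts contiguous, inclusive ranges."""
    if total_size <= 0 or num_parts <= 0:
        raise ValueError("total_size and num_parts must be positive")
    num_parts = min(num_parts, total_size)  # never emit empty ranges
    base, extra = divmod(total_size, num_parts)
    ranges = []
    start = 0
    for i in range(num_parts):
        # The first `extra` parts get one extra byte, so sizes differ by at most 1.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch one inclusive byte range (assumes the server supports Range requests)."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server actually honoured the Range header.
        if resp.status != 206:
            raise RuntimeError(f"server ignored Range header (HTTP {resp.status})")
        return resp.read()
```

  For example, plan_byte_ranges(10, 3) returns [(0, 3), (4, 6), (7, 9)]; each
  range could be passed to fetch_range from a separate worker, and the results
  joined in order before decompressing.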
  
  Note that the enwiki dumps, for example, are split into up to 128 pieces,
  which makes them much easier to process.
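  A client-side workaround is to re-split the dump locally. The Python sketch
  below (hypothetical function and file names) assumes the JSON dump is a
  single JSON array with one entity per line, and streams it into numbered
  gzipped pieces holding a fixed number of entities each. It still requires
  downloading the full 5.4 GB file first, which is exactly the limitation this
  task is about.

```python
import gzip
import os
from typing import IO, Iterable, Iterator, List, Optional


def entity_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON entity per line, dropping the array brackets and trailing commas.

    Assumes the dump layout is "[", then one entity per line, then "]".
    """
    for raw in lines:
        line = raw.strip()
        if line in ("", "[", "]"):
            continue
        yield line[:-1] if line.endswith(",") else line


def split_dump(dump_path: str, out_dir: str, entities_per_part: int = 100_000) -> List[str]:
    """Stream a gzipped dump into gzipped JSON Lines pieces; returns the piece paths.

    The part size and the "wikidata-part-NNNNN.json.gz" naming are arbitrary choices.
    """
    if entities_per_part <= 0:
        raise ValueError("entities_per_part must be positive")
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    dst: Optional[IO[str]] = None
    count = 0
    try:
        with gzip.open(dump_path, "rt", encoding="utf-8") as src:
            for line in entity_lines(src):
                # Roll over to a new output file when the current one is full.
                if dst is None or count == entities_per_part:
                    if dst is not None:
                        dst.close()
                    path = os.path.join(out_dir, f"wikidata-part-{len(paths):05d}.json.gz")
                    dst = gzip.open(path, "wt", encoding="utf-8")
                    paths.append(path)
                    count = 0
                dst.write(line + "\n")
                count += 1
    finally:
        if dst is not None:
            dst.close()
    return paths
```

  Each piece holds one JSON entity per line, so pieces can be processed,
  mirrored, or uploaded independently, and no single file has to approach the
  5 GB upload limit.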

TASK DETAIL
  https://phabricator.wikimedia.org/T115223

WORKBOARD
  https://phabricator.wikimedia.org/project/board/71/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: NealMcB
Cc: Halfak, NealMcB, Aklapper, Wikidata-bugs, aude

_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs