NealMcB created this task. NealMcB added subscribers: NealMcB, Halfak. NealMcB added a project: Wikidata. NealMcB moved this task to incoming on the Wikidata workboard. Herald added a subscriber: Aklapper.
TASK DESCRIPTION
Currently the only download option for Wikidata in JSON format is a single gzipped file (see e.g. the files under https://dumps.wikimedia.org/wikidatawiki/entities/), which is 5.4 GB compressed. This makes it hard to download the whole file reliably, to fetch only a subset, to download it in parallel, or to mirror it on infrastructure designed for highly parallel downloads (e.g. clusters). In addition, 5.4 GB is too large to get into Amazon S3 easily: a single PUT upload is limited to 5 GB, which rules out many of the most convenient upload methods.

Note that the enwiki dumps, for example, are split into up to 128 pieces, which makes them much easier to process.

TASK DETAIL
https://phabricator.wikimedia.org/T115223
WORKBOARD
https://phabricator.wikimedia.org/project/board/71/
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: NealMcB
Cc: Halfak, NealMcB, Aklapper, Wikidata-bugs, aude
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
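Until split dumps exist, a consumer who already has the single file could re-split it locally, for example before pushing the parts to S3. The sketch below is a hypothetical client-side workaround and not an official dump split. It assumes the JSON dump's layout is one JSON array with "[" and "]" on their own lines and one entity per line, each entity line ending with a comma except the last. The function name `split_dump`, the `part-NNNNN.json.gz` naming and the JSON-lines output format are illustrative choices.

```python
import gzip
import os


def split_dump(src_path, out_dir, entities_per_chunk):
    """Hypothetical helper: split a gzipped Wikidata JSON dump into
    gzipped JSON-lines parts of at most `entities_per_chunk` entities.

    Assumes one entity per line inside a top-level JSON array, with a
    trailing comma on every entity line except the last.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    count = 0
    try:
        with gzip.open(src_path, "rt", encoding="utf-8") as src:
            for raw in src:
                line = raw.strip()
                if line in ("", "[", "]"):
                    continue  # skip blank lines and the array brackets
                if line.endswith(","):
                    line = line[:-1]  # drop the separator comma
                # Start a new part when none is open or the current one is full.
                if out is None or count == entities_per_chunk:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"part-{len(paths):05d}.json.gz")
                    out = gzip.open(path, "wt", encoding="utf-8")
                    paths.append(path)
                    count = 0
                out.write(line + "\n")
                count += 1
    finally:
        if out is not None:
            out.close()
    return paths
```

Choose `entities_per_chunk` so that each compressed part stays comfortably below the 5 GB single-PUT limit. Part sizes depend on the data, so check the resulting file sizes rather than relying on a fixed count. Each part is plain JSON lines, so the parts can be processed independently and in parallel.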
