Hydriz created this task. Hydriz added subscribers: Hydriz, Aklapper. Hydriz added projects: Datasets-Archiving, Labs, Labs-Infrastructure, Wikidata.
TASK DESCRIPTION

The Wikidata JSON dumps are currently rsynced daily, via cron, from the datasets server to `/public/dumps/public/wikidatawiki/entities`. However, this conflicts with the rsync job that pushes new database dumps to the parent directory. That job maintains a mirror, so it deletes any file that is not part of the main database dumps.

As a result, the JSON dumps are copied into that directory but are deleted whenever new database dumps for wikidatawiki become available. The daily job then copies the JSON dumps again, only for them to be deleted about a month later by the next run of the main database dumps rsync job. This cycle consumes a lot of bandwidth for no benefit, and it blocks T101639.

Judging from the script that runs the rsync job for the main database dumps, it is unlikely to be modified to accommodate JSON dumps in the same directory. I propose pushing the JSON dumps to a separate directory (such as `/public/dumps/wikibase`), as is already done for our other miscellaneous files.

TASK DETAIL

https://phabricator.wikimedia.org/T107226

To: Hydriz
Cc: Aklapper, Hydriz, Wikidata-bugs, aude, Gryllida, Nemo_bis, scfc, Malyacko, P.Copp

_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
