Hydriz created this task.
Hydriz added subscribers: Hydriz, Aklapper.
Hydriz added projects: Datasets-Archiving, Labs, Labs-Infrastructure, Wikidata.

TASK DESCRIPTION
  The Wikidata JSON dumps are currently rsynced from the datasets server 
to `/public/dumps/public/wikidatawiki/entities` daily via cron. 
However, this conflicts with the rsync job that pushes new dumps to the parent 
directory: that job mirrors its source and therefore deletes any files that are 
not part of the main database dumps.
  
  In other words, the Wikidata JSON dumps are copied to the directory 
mentioned above, but are deleted whenever new database dumps for wikidatawiki 
are made available. The JSON dumps are then copied over again, only to be 
deleted about a month later by the main database dumps rsync job. This 
consumes a lot of bandwidth for no gain and blocks T101639.
  
  Looking at the script that runs the rsync job for the main database dumps, it 
is unlikely that it would be modified to accommodate the JSON dumps living in 
the same directory. I propose pushing the JSON dumps to a different directory 
(such as `/public/dumps/wikibase`), just like the other miscellaneous files we 
have.
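
  To illustrate the conflict and the proposed fix, here is a minimal sketch in 
Python. It is not the actual dumps script: `mirror()` only imitates a job that 
makes the destination identical to its source (roughly what rsync does with 
`--delete`), and the `upstream` tree, the `20150720` directory and the file 
names `pages.xml` and `all.json` are hypothetical placeholders. Everything runs 
inside a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def mirror(src: Path, dst: Path) -> None:
    """Make dst identical to src, deleting anything in dst that src lacks.

    Imitates a mirroring job (similar in effect to rsync with --delete).
    This is an illustration only, not the real dumps rsync script.
    """
    dst.mkdir(parents=True, exist_ok=True)
    src_names = {entry.name for entry in src.iterdir()}

    # Delete everything in the destination that the source does not have.
    for entry in list(dst.iterdir()):
        if entry.name not in src_names:
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Copy the source contents over, recursing into subdirectories.
    for entry in src.iterdir():
        target = dst / entry.name
        if entry.is_dir():
            mirror(entry, target)
        else:
            shutil.copy2(entry, target)


def demo() -> dict:
    """Report whether the JSON dump survives one run of the mirror job."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical source of the main database dumps.
        upstream = root / "upstream" / "wikidatawiki"
        # Stand-ins for the two candidate locations of the JSON dumps.
        mirrored = root / "public" / "dumps" / "public" / "wikidatawiki"
        separate = root / "public" / "dumps" / "wikibase"

        (upstream / "20150720").mkdir(parents=True)
        (upstream / "20150720" / "pages.xml").write_text("main dump")

        # The daily cron job drops a JSON dump into each location.
        for directory in (mirrored / "entities", separate):
            directory.mkdir(parents=True)
            (directory / "all.json").write_text("{}")

        # The main dumps job then mirrors into the wikidatawiki directory.
        mirror(upstream, mirrored)

        return {
            "inside_mirror": (mirrored / "entities" / "all.json").exists(),
            "separate_dir": (separate / "all.json").exists(),
        }


if __name__ == "__main__":
    print(demo())
```

  Under these assumptions `demo()` returns `inside_mirror` as False and 
`separate_dir` as True: the copy placed under the mirrored directory is removed 
by the mirror run, while the copy in a sibling directory outside the mirrored 
tree is left alone.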

TASK DETAIL
  https://phabricator.wikimedia.org/T107226

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Hydriz
Cc: Aklapper, Hydriz, Wikidata-bugs, aude, Gryllida, Nemo_bis, scfc, Malyacko, 
P.Copp



_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
