Frédéric Schütz wrote:
> I had started reorganizing the files earlier this year but did not
> finish; apart from a few directories leftover, there should be no duplicate
> files. In the meantime, I have restarted a few wget processes in order
> to get the files.
As I write this, there are still some duplicates for 2011-11 and 2011-12, as well as for some projectcounts files, but that's not important. What I do want to ask is whether you are planning to run at least a daily wget for the files. I think that is necessary for any reliable tool that makes use of them, such as the one I'm planning to write (the downloaded files currently stop at Feb 14, 21h).

> A MMP is a good idea, and I can look into it. However, the most pressing
> problem, as I see it, is space, that will become tight very quickly.

Currently there are 828 GB free; hopefully the administration team will have enough time to deal with this before it runs out. As for the MMP, I volunteer again :).

> And the second thing to do would be to produce daily/monthly/whatever
> summary files from these raw files, as Erik Zachte does -- there should
> be no reason to use the raw files.
[>> Why not just pull his copies?]

If such summary files already exist (I don't know whether they do), I agree the easiest option would be to download them as well. If not, I think someone here could, or should, take on the task, using a format as plain as possible so that any language or script can parse it. Again, that could be me if necessary.

** José Emilio Mori Recio - http://es.wikipedia.org/wiki/User:-jem- **
* IT Administrator of the Archdiocese of Valladolid *
** Librarian (administrator) on the Spanish Wikipedia - Promoter of Wikimedia España **
------------
This e-mail may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

_______________________________________________
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette