Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread emijrp
Can you share your script with us? 2011/6/27 Platonides platoni...@gmail.com emijrp wrote: Hi SJ; You know that that is an old item in our TODO list ;) I heard that Platonides developed a script for that task a long time ago. Platonides, are you there? Regards, emijrp Yes, I am. :)

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread emijrp
Hi; @Derrick: I don't trust Amazon. Really, I don't trust the Wikimedia Foundation either. They can't and/or don't want to provide image dumps (which is worse?). The community donates images to Commons, the community donates money every year, and now the community needs to develop software to extract all

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread Milos Rancic
On 06/28/2011 07:21 PM, emijrp wrote: @Milos: Instead of splitting the image dump by the first letter of filenames, I thought about splitting by upload date (YYYY-MM-DD). So the first chunks (2005-01-01) would be tiny, and recent ones several GB (a single day). That would be better,
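
(A rough sketch of what date-based chunking could look like. list=allimages with aisort=timestamp is a real MediaWiki API module; the helper name and the one-chunk-per-day policy are illustrative assumptions, not WikiTeam's actual code:)

  # Sketch: enumerate Commons files uploaded on one day via the MediaWiki
  # API (list=allimages, aisort=timestamp). The helper name and the
  # one-chunk-per-day policy are assumptions for illustration.
  import requests

  API = "https://commons.wikimedia.org/w/api.php"

  def files_for_day(day):
      # day is an upload date like "2005-01-01"
      params = {"action": "query", "list": "allimages",
                "aisort": "timestamp",
                "aistart": day + "T00:00:00Z",
                "aiend": day + "T23:59:59Z",
                "ailimit": "500", "format": "json"}
      while True:
          data = requests.get(API, params=params).json()
          for img in data["query"]["allimages"]:
              yield img["name"]
          if "continue" not in data:
              break
          params.update(data["continue"])  # follow API continuation

  # Early chunks (2005-01-01) are tiny; recent days run to several GB.
  for name in files_for_day("2005-01-01"):
      print(name)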

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread Platonides
emijrp wrote: Hi; @Derrick: I don't trust Amazon. I disagree. Note that we only need them to keep a redundant copy of a file. If they tried to tamper with the file, we could detect it with the hashes (which should be properly secured; that's no problem). I'd like to have the hashes for the xml
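
(For the record, checking a mirrored file against a published hash takes a few lines. A minimal sketch with Python's standard hashlib; the filename and expected digest below are placeholders, not real values:)

  # Sketch: detect a tampered mirror copy by comparing SHA-1 checksums.
  # The filename and expected digest are placeholders.
  import hashlib

  def sha1_of(path, blocksize=1 << 20):
      h = hashlib.sha1()
      with open(path, "rb") as f:
          for block in iter(lambda: f.read(blocksize), b""):
              h.update(block)  # hash the file in 1 MB pieces
      return h.hexdigest()

  expected = "da39a3ee5e6b4b0d3255bfef95601890afd80709"  # from a trusted list
  if sha1_of("commons-chunk-2005-01-01.tar") != expected:
      print("Mirror copy was modified or corrupted")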

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread emijrp
2011/6/28 Platonides platoni...@gmail.com emijrp wrote: Hi; @Derrick: I don't trust Amazon. I disagree. Note that we only need them to keep a redundant copy of a file. If they tried to tamper with the file, we could detect it with the hashes (which should be properly secured, that's no

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread Platonides
emijrp wrote: I didn't mean security problems. I just meant files deleted under weird terms of service. Commons hosts a lot of images which can be problematic, like nudes or materials copyrighted in some jurisdictions. They can delete what they want and close any account they want, and we

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-27 Thread Samuel Klein
Thank you, Emijrp! What about the dump of Commons images? [for those with 10TB to spare] SJ On Sun, Jun 26, 2011 at 8:53 AM, emijrp emi...@gmail.com wrote: Hi all; Can you imagine a day when Wikipedia is added to this list?[1] WikiTeam have developed a script[2] to download all the
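
(The downloader side reduces to a resumable HTTP fetch. A minimal sketch with the requests library, assuming a server that honors standard Range requests; the URL is just an example, and the real WikiTeam script does much more:)

  # Sketch: resume-capable download of one large dump file over HTTP.
  # The URL is an example; resuming relies on standard Range support.
  import os
  import requests

  url = ("https://dumps.wikimedia.org/enwiki/latest/"
         "enwiki-latest-pages-articles.xml.bz2")
  out = url.rsplit("/", 1)[1]

  done = os.path.getsize(out) if os.path.exists(out) else 0
  headers = {"Range": "bytes=%d-" % done}  # skip bytes we already have

  with requests.get(url, headers=headers, stream=True) as r:
      mode = "ab" if r.status_code == 206 else "wb"  # 206 = resume honored
      with open(out, mode) as f:
          for chunk in r.iter_content(chunk_size=1 << 20):
              f.write(chunk)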

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-27 Thread Platonides
emijrp wrote: Hi SJ; You know that that is an old item in our TODO list ;) I heard that Platonides developed a script for that task a long time ago. Platonides, are you there? Regards, emijrp Yes, I am. :)