[Foundation-l] Dumps mirroring (was: Request: WMF commitment as a long term cultural archive?)
2011/9/21 emijrp <emi...@gmail.com>:
> Hi all;
>
> Just like the scripts to preserve wikis[1], I'm working on a new script
> to download all Wikimedia Commons images, packed by day. But I have
> limited spare time. It is sad that volunteers have to do this without
> any help from the Wikimedia Foundation.
>
> I have also started a (low-activity) effort on Meta to mirror the XML
> dumps.[2] If you know of universities or research groups that work with
> Wiki[pm]edia XML dumps, they would be promising targets for hosting
> mirrors.
>
> If you want to download the texts onto your PC, you only need 100 GB of
> free space and this Python script.[3]
>
> I have heard that the Internet Archive saves the XML dumps quarterly or
> so, but there has been no official announcement. I have also heard about
> the Library of Congress wanting to mirror the dumps, but there has been
> no news in a long time.
>
> L'Encyclopédie has an uptime[4] of 260 years[5] and growing. Will
> Wiki[pm]edia projects reach that?
>
> Regards,
> emijrp
>
> [1] http://code.google.com/p/wikiteam/
> [2] http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
> [3] http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
> [4] http://en.wikipedia.org/wiki/Uptime
> [5] http://en.wikipedia.org/wiki/Encyclop%C3%A9die

Hi emijrp,

I can understand why you would prefer to have full mirrors of the dumps,
but let's face it: 10 TB is not (yet) something that most companies and
universities can easily spare. Also, most people work on only 1-5
language versions of Wikipedia; the rest is just overhead to them.

My suggestion would be to accept mirrors of a single language and have a
smart interface at dumps.wikimedia.org that redirects requests to the
location that is the best match for the user. This system is used by
some Linux distributions (see download.opensuse.org, for instance) with
great success.

Regards,
Strainu
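On the 100 GB text download emijrp mentions: the gist of a dump grabber
fits in a few lines. Below is a minimal sketch, not emijrp's actual
wikipediadownloader.py; it assumes the usual dumps.wikimedia.org layout of
<wiki>/latest/<wiki>-latest-pages-articles.xml.bz2, so check the dump
index before trusting the pattern.

# Minimal sketch: fetch the latest pages-articles dump for a few wikis.
# Assumes the layout <base>/<wiki>/latest/<wiki>-latest-pages-articles.xml.bz2;
# verify against the actual index on dumps.wikimedia.org before relying on it.
import os
import urllib.request

DUMP_BASE = "https://dumps.wikimedia.org"  # official dump host
WIKIS = ["enwiki", "dewiki", "frwiki"]     # database names, one per language edition

for wiki in WIKIS:
    filename = "%s-latest-pages-articles.xml.bz2" % wiki
    url = "%s/%s/latest/%s" % (DUMP_BASE, wiki, filename)
    if os.path.exists(filename):           # crude resume: skip files already fetched
        continue
    print("Downloading", url)
    urllib.request.urlretrieve(url, filename)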
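The download.opensuse.org-style redirector Strainu describes is
essentially a table from wiki name to the mirrors that carry it, plus an
HTTP 302. A minimal sketch follows; the mirror URLs and the per-wiki
registration scheme are illustrative assumptions, since no such service
exists at dumps.wikimedia.org.

# Minimal sketch of a per-language mirror redirector: the master site
# answers with a 302 pointing at a mirror that carries the requested wiki.
# The mirror list below is hypothetical.
from http.server import BaseHTTPRequestHandler, HTTPServer
import random

MIRRORS = {
    # wiki -> mirrors that agreed to carry that language's dumps
    "enwiki": ["http://mirror-us.example.org/dumps",
               "http://mirror-eu.example.org/dumps"],
    "rowiki": ["http://mirror-eu.example.org/dumps"],
}

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        wiki = self.path.strip("/").split("/")[0]   # e.g. /enwiki/latest/...
        mirrors = MIRRORS.get(wiki)
        if mirrors:
            self.send_response(302)                 # redirect to a matching mirror
            self.send_header("Location", random.choice(mirrors) + self.path)
            self.end_headers()
        else:
            self.send_response(404)                 # no mirror carries this wiki
            self.end_headers()

HTTPServer(("", 8080), Redirector).serve_forever()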
Re: [Foundation-l] Dumps mirroring (was: Request: WMF commitment as a long term cultural archive?)
On Wed, Sep 21, 2011 at 3:45 AM, Strainu <strain...@gmail.com> wrote:
> My suggestion would be to accept mirrors of a single language and have a
> smart interface at dumps.wikimedia.org that redirects requests to the
> location that is the best match for the user. This system is used by
> some Linux distributions (see download.opensuse.org, for instance) with
> great success.

Perhaps a torrent setup would be successful in this case.

--
Brian Mingus
Graduate student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder
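A torrent per dump file would be cheap to produce, for example by
shelling out to the stock mktorrent tool. In the sketch below the tracker
URL is a placeholder, since there is no official Wikimedia tracker.

# Sketch: create a .torrent for a dump file by shelling out to mktorrent.
# The announce URL is a placeholder -- no official Wikimedia tracker exists.
import subprocess

DUMP = "enwiki-latest-pages-articles.xml.bz2"
TRACKER = "http://tracker.example.org:6969/announce"  # hypothetical tracker

subprocess.check_call([
    "mktorrent",
    "-a", TRACKER,            # announce (tracker) URL
    "-o", DUMP + ".torrent",  # output .torrent file
    DUMP,
])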
Re: [Foundation-l] Dumps mirroring (was: Request: WMF commitment as a long term cultural archive?)
I would be happy to mirror. I was looking and poking around for this a
year ago, and the biggest problem for me is that it's not clear how
Wikimedia would like to be mirrored. We are currently a CentOS and Ubuntu
mirror on the machine. We have the space; that's not the problem.

Best,

Huib Laurens
WickedWay.nl

2011/9/21 Brian J Mingus <brian.min...@colorado.edu>:
> Perhaps a torrent setup would be successful in this case.

--
Kind regards,

Huib Laurens
WickedWay.nl

Webhosting the wicked way.
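For comparison, CentOS and Ubuntu mirrors are usually fed by a periodic
rsync pull from an upstream module, and dump mirroring guidelines would
likely take the same shape. A sketch of such a pull follows, where the
rsync module name is an assumption rather than a published endpoint.

# Sketch of a periodic mirror pull, the way CentOS/Ubuntu mirrors are fed.
# The endpoint "dumps.wikimedia.org::dumps" is an assumption; use whatever
# rsync module the Foundation actually publishes.
import subprocess

SOURCE = "dumps.wikimedia.org::dumps"   # hypothetical rsync module
DEST = "/srv/mirror/wikimedia-dumps/"

subprocess.check_call([
    "rsync", "-av",
    "--delete",        # drop files the upstream has removed
    "--partial",       # keep partial transfers so reruns can resume
    SOURCE, DEST,
])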