[Foundation-l] Dumps mirroring (was: Request: WMF commitment as a long term cultural archive?)

2011-09-21 Thread Strainu
2011/9/21 emijrp emi...@gmail.com:
 Hi all;

 Just like the scripts to preserve wikis[1], I'm working on a new script to
 download all Wikimedia Commons images, packed by day. But I have limited
 spare time. It is sad that volunteers have to do this without any help from
 the Wikimedia Foundation.

 I have also started an effort on Meta (with low activity so far) to mirror the
 XML dumps.[2] If you know of universities or research groups that work with
 Wiki[pm]edia XML dumps, they would be promising candidates to mirror them.

 If you want to download the texts to your PC, you only need 100GB of free
 space and this Python script.[3]

 I heard that the Internet Archive saves the XML dumps quarterly or so, but
 there has been no official announcement. I also heard that the Library of
 Congress wants to mirror the dumps, but there has been no news for a long
 time.

 L'Encyclopédie has an uptime[4] of 260 years[5] and growing. Will
 Wiki[pm]edia projects reach that?

 Regards,
 emijrp

 [1] http://code.google.com/p/wikiteam/
 [2] http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
 [3]
 http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
 [4] http://en.wikipedia.org/wiki/Uptime
 [5] http://en.wikipedia.org/wiki/Encyclop%C3%A9die
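
As a rough illustration of what a bulk text downloader along the lines of [3] might look like, here is a minimal sketch; the wiki list and the URL layout under dumps.wikimedia.org are assumptions here, not a description of the actual wikiteam script.

# Rough sketch, not the actual wikipediadownloader.py: grab the latest
# pages-articles dump for a handful of wikis over HTTP.
import os
import urllib.request

WIKIS = ["rowiki", "nlwiki", "eswiki"]   # example subset, not the full list
BASE = "https://dumps.wikimedia.org/{w}/latest/{w}-latest-pages-articles.xml.bz2"

for wiki in WIKIS:
    url = BASE.format(w=wiki)
    target = os.path.basename(url)
    if os.path.exists(target):
        print("skipping", target, "(already present)")
        continue
    print("fetching", url)
    urllib.request.urlretrieve(url, target)   # writes the .bz2 dump to the current directory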



Hi emijrp,

I can understand why you would prefer to have full mirrors of the
dumps, but let's face it, 10TB is not (yet) something that most
companies or universities can easily spare. Also, most people only work
with 1-5 language versions of Wikipedia; the rest is just overhead to them.

My suggestion would be to accept mirrors of a single language and have
a smart interface at dumps.wikimedia.org that redirects requests to
the location that is the best match for the user. This system is used
by some Linux distributions (see download.opensuse.org for instance)
with great success.
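
A minimal sketch of that kind of redirector, using only the Python standard library; the mirror table and URLs below are placeholders, not real mirrors.

# Hypothetical redirector: send each request for a given wiki's dumps to
# the mirror that carries that language, falling back to the master copy.
from http.server import BaseHTTPRequestHandler, HTTPServer

MIRRORS = {
    "rowiki": "http://mirror.example.ro/dumps",
    "nlwiki": "http://mirror.example.nl/dumps",
}
DEFAULT = "http://dumps.wikimedia.org"

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        wiki = self.path.strip("/").split("/")[0]     # e.g. /rowiki/latest/...
        base = MIRRORS.get(wiki, DEFAULT)
        self.send_response(302)                       # temporary redirect
        self.send_header("Location", base + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8000), Redirector).serve_forever()

download.opensuse.org does this with MirrorBrain, which also takes the client's location into account when choosing a mirror.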

Regards,
   Strainu



Re: [Foundation-l] Dumps mirroring (was: Request: WMF commitment as a long term cultural archive?)

2011-09-21 Thread Brian J Mingus
On Wed, Sep 21, 2011 at 3:45 AM, Strainu strain...@gmail.com wrote:

 [...]

 My suggestion would be to accept mirrors of a single language and have
 a smart interface at dumps.wikimedia.org that redirects requests to
 the location that is the best match for the user. This system is used
 by some Linux distributions (see download.opensuse.org for instance)
 with great success.


Perhaps a torrent setup would be successful in this case.
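
A .torrent could even be generated straight from the existing dump files, with dumps.wikimedia.org itself listed as a web seed so the swarm still works when few peers are online. A minimal sketch; the tracker URL is a placeholder and the piece size is an arbitrary choice.

# Sketch: build a .torrent for one dump file, with an HTTP web seed (BEP 19).
# Bencoding is done by hand to stay dependency-free.
import hashlib
import os

PIECE = 2 ** 20   # 1 MiB pieces

def bencode(obj):
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, str):
        return bencode(obj.encode("utf-8"))
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):   # keys must be byte strings, sorted
        items = sorted((k.encode("utf-8"), v) for k, v in obj.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(obj)

def make_torrent(path, webseed, tracker="http://tracker.example.org/announce"):
    pieces = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(PIECE)
            if not chunk:
                break
            pieces += hashlib.sha1(chunk).digest()   # 20-byte hash per piece
    meta = {
        "announce": tracker,
        "url-list": [webseed],            # HTTP fallback source (web seed)
        "info": {
            "name": os.path.basename(path),
            "length": os.path.getsize(path),
            "piece length": PIECE,
            "pieces": pieces,
        },
    }
    with open(path + ".torrent", "wb") as out:
        out.write(bencode(meta))

Calling make_torrent() on a dump file, with its dumps.wikimedia.org URL as the web seed, writes a .torrent next to the file.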


-- 
Brian Mingus
Graduate student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder


Re: [Foundation-l] Dumps mirroring (was: Request: WMF commitment as a long term cultural archive?)

2011-09-21 Thread Huib Laurens
I would be happy to mirror. I was looking and poking around for that a year
ago, and the biggest problem for me is that it's not clear how Wikimedia would
like to be mirrored.

We currently run a CentOS and Ubuntu mirror on the same machine. We have the
space; that's not the problem.
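
In the absence of a documented procedure, one interim way to keep a mirror in sync is to poll the "latest" dumps over HTTP and re-download a file only when the size reported by the server changes; the wiki list, URL layout and mirror path below are assumptions.

# Sketch of a naive HTTP-based mirror sync (assumed layout, not an official procedure).
import os
import urllib.request

WIKIS = ["nlwiki", "rowiki"]
URL = "https://dumps.wikimedia.org/{w}/latest/{w}-latest-pages-articles.xml.bz2"
MIRROR_DIR = "/srv/mirror/wikimedia"    # wherever the mirror lives

def remote_size(url):
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])

os.makedirs(MIRROR_DIR, exist_ok=True)
for wiki in WIKIS:
    url = URL.format(w=wiki)
    local = os.path.join(MIRROR_DIR, os.path.basename(url))
    if os.path.exists(local) and os.path.getsize(local) == remote_size(url):
        continue                        # unchanged since the last run
    urllib.request.urlretrieve(url, local)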


Best,

Huib Laurens
WickedWay.nl

2011/9/21 Brian J Mingus brian.min...@colorado.edu

 [...]
 Perhaps a torrent setup would be successful in this case.






-- 
Kind regards,

Huib Laurens
WickedWay.nl

Webhosting the wicked way.