Re: [Foundation-l] [Wiki-research-l] How to improve quality of Wikipedia?

2010-10-14 Thread Felipe Ortega
--- On Sun, 10/10/10, Federico Leva (Nemo) nemow...@gmail.com wrote:

 From: Federico Leva (Nemo) nemow...@gmail.com
 Subject: Re: [Wiki-research-l] [Foundation-l] How to improve quality of 
 Wikipedia?
 To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
 CC: Research into Wikimedia content and communities 
 wiki-researc...@lists.wikimedia.org
 Date: Sunday, 10 October 2010 19:45
 Przykuta, 10/10/2010 19:17:
   Old talk pages with solved problems are deleted.
 
 
 This is extremely strange. Talk pages are part of the
 article history.
 Ortega's thesis should be updated, perhaps.
 «The combination of a very active cohort of bots,
 together
 with the very low ratio of talk pages, indicates that the
 Polish 
 language version is not following the
 same organizational pattern found in other language
 editions. Such a low 
 ratio of talk pages points out
 the little effort undertaken on coordination actions and
 discussion 
 about article contents in the Polish
 version.»
 (http://libresoft.es/Members/jfelipe/phd-thesis , p.
 91)
 

Thanks for pointing this out, Nemo. I might have missed the thread on 
Foundation-l otherwise :).

Well, at least this gives a partial explanation for the very low ratio of 
available talk pages, though I personally think it is not enough to explain 
such an extremely low figure.

In fact, I agree that this is very strange. As far as I understand, talk pages 
also serve as a log of past discussions for new users approaching an article 
for the first time. If this is true, then on the Polish Wikipedia a new editor 
of an article runs the risk of raising again an issue or a contribution that 
editors working on that article already discussed a year ago.

Best,
Felipe.

  Talk pages of dynamic IPs are deleted too (we wait ~6
 months and delete them by bot). I don't know - is it
 standard behavior in other Wikipedias or specific to pl?
 
 This isn't very relevant. On it.wiki they used to be
 deleted by 
 (unapproved) bots (run under sysop accounts); since some
 years they're 
 just replaced with a welcome IP template every month if
 they're more 
 than a month old.
 
  * Finally, Polish Wikipedia has fewer active users
 than any of the
  next three smaller Wikipedias - Italian,
 Japanese and Spanish -
  which might be significant here. Fewer users talk
 less, so there's
  fewer natural discussion pages.
 
  
  True - we have only ~300 very active users. We rather
 use main. One of the most often used slogan is we work
 here, not talk. Many times we spend in flagged revisions
 - so, we are sure, that 90% are free of vandalism.
 
 This is very important. The real question is: how can
 pl.wiki be so big 
 (and useful, looking at pageviews) with such a little
 editor base? Seems 
 a good result.
 
 Nemo
 
 ___
 Wiki-research-l mailing list
 wiki-researc...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 


  

___
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


Re: [Foundation-l] Wikipedia is not bureaucracy, said bureaucrat and deleted article

2009-11-26 Thread Felipe Ortega


--- On Thu, 26/11/09, Milos Rancic mill...@gmail.com wrote:

 From: Milos Rancic mill...@gmail.com
 Subject: [Foundation-l] Wikipedia is not bureaucracy, said bureaucrat and 
 deleted article
 To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
 Date: Thursday, 26 November 2009 11:36
 Read 
 http://news.slashdot.org/story/09/11/25/160236/Contributors-Leaving-Wikipedia-In-Record-Numbers
 
 Article is based on Felipe Ortega's research. There are two
 claims
 from this article:
 

Hello, Milos, all.

 1. English-language version of Wikipedia suffered a net
 loss of 49,000
 contributors, compared with a loss of about 4,900 during
 the same
 period in 2008

Please read the following blog post, which I reviewed and agreed on with Erik 
Moller. It explains the difference between retention of editors (the numbers 
shown in the original WSJ article) and the monthly number of active editors:

http://blog.wikimedia.org/2009/11/26/wikipedias-volunteer-story/  

 2. There is an increase of bureaucracy and rules.
 

'... which is becoming increasingly difficult,' says Andrew Dalby, author of 
The World and Wikipedia: How We are Editing Reality and a regular editor of the 
site. 'There is an increase of bureaucracy and rules. Wikipedia grew because of 
the lack of rules. That has been forgotten. The rules are regarded as 
irritating and useless by many contributors.'

This is Andrew Dalby's quote, not mine.

 I would like to hear from Felipe clarification of the claim
 that
 49,000 contributors left Wikipedia. If it is so, then en.wp
 has around
 ten times more fluctuation of contributors. (According to
 statistics
 [1], there are no significant changes between the first
 months of 2008
 and 2009.) If it is so, we should try to understand why is
 it so.
 
 The second claim produced a lot of *relevant* testimonies
 from
 Wikipedian work. Please, read them. For the first time I
 see highly
 relevant discussion on Slashdot about Wikipedia structure.
 All of them
 are talking about current problems of Wikipedia.
 
 Problems are now visible at such level, that main stream
 media are
 talking about them [2]. I would say that we need some
 radical moves to
 stop current negative trends inside of the projects. Which?
 I don't
 know. We should think about them. (Actually, I have a
 couple of
 possible changes in my mind, which are not radical.
 However, their
 implementation would need radical changes. Because of
 bureaucracy.)
 
 [1] - http://stats.wikimedia.org/EN/ChartsWikipediaEN.htm
 [2] - 
 http://technology.timesonline.co.uk/tol/news/tech_and_web/the_web/article6930546.ece
 


Re: [Foundation-l] WSJ on Wikipedia

2009-11-24 Thread Felipe Ortega


--- On Tue, 24/11/09, Nikola Smolenski smole...@eunet.rs wrote:

 From: Nikola Smolenski smole...@eunet.rs
 Subject: Re: [Foundation-l] WSJ on Wikipedia
 To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
 Date: Tuesday, 24 November 2009 08:50
 Felipe Ortega wrote:
  Wikipedia just entered a new phase. Our responsibility
 (as long-time Wikipedia researchers) is to find out the
 causes (not necessarily negative, please read a PDF
 summarizing a recent electronic interview for the Strategy
 plan, at http://strategy.wikimedia.org/wiki/Interviews) and
 prevent any possible problems as much in advance as
 possible.
 
 Why not also conduct interviews with Wikipedia editors,
 either a random 
 sample or targeted people (for example, people who had
 significant 
 contribution and then stopped).
 

Yes, this is another interesting approach. 

The problem is that, in my experience, it is difficult to contact former 
editors/admins once they have left the project for good. Other strategies, such 
as spamming talk pages, are too aggressive and should always be avoided.

We had an interesting discussion about this issue in an Open Space session at 
WikiSym 2009. It has resulted in a new project to try and improve these 
communication mechanisms:

http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research

Regards,
F.



Re: [Foundation-l] WSJ on Wikipedia

2009-11-23 Thread Felipe Ortega
--- On Mon, 23/11/09, Steven Walling steven.wall...@gmail.com wrote:

 From: Steven Walling steven.wall...@gmail.com
 
 I didn't see any of the graphs from the piece or any
 conclusions in the
 thesis which are equivalent to the statements made in the
 Journal, so this
 must be new research.
 

Hi, Steven.

I'm Felipe Ortega, the author of the numbers and graphs you're mentioning.

Yes, these are recently updated results from our long-running line of research 
on the Wikipedia community. They were presented at WikiSym 2009 and, before 
that, at a conference in the Web Science Lecture Series at Georgia Tech (both 
last October).

As always, I want to state that, even though the numbers don't look very good 
for the long-term sustainability of the project, I struggle daily against 
fatalist claims and headlines speculating about the end of the project.

Wikipedia just entered a new phase. Our responsibility (as long-time Wikipedia 
researchers) is to find out the causes (not necessarily negative, please read a 
PDF summarizing a recent electronic interview for the Strategy plan, at 
http://strategy.wikimedia.org/wiki/Interviews) and prevent any possible 
problems as much in advance as possible.

As usual, I'm at your disposal for any comments/clarifications.

Best,
Felipe.

 Steven



  



Re: [Foundation-l] WSJ on Wikipedia

2009-11-23 Thread Felipe Ortega


--- On Mon, 23/11/09, Andrew Lih andrew@gmail.com wrote:

 From: Andrew Lih andrew@gmail.com
 Subject: Re: [Foundation-l] WSJ on Wikipedia
 To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
 Date: Monday, 23 November 2009 22:49
 On Mon, Nov 23, 2009 at 12:28 PM,
 David Moran fordmadoxfr...@gmail.com
 wrote:
 
  I think a lot of attention is paid to the way the
 technical interface is
  hostile to newbies, and making that more user-friendly
 and democratic is
  certainly a concern that needs to be addressed.  But
 I think the tendency of
  older users, or certain editorially minded users, to
 squat on the project
  and bludgeon newer users with policy pages rolled up
 into sticks is just as
  much if not more responsible for driving away the new
 users we need to
  replenish our ranks.
 
 At Wikimania 2009 it was noted there were declines across
 different
 language editions, which started happening at the same
 time. This
 suggest that it's not simply the completeness of a
 particular
 edition at play here, as the development cycle of each
 different
 language edition should be fairly distinct. Rather, the
 sharp declines
 across languages indicates it could be a platform feature
 (ie.
 software, policy, et al) or that there is an
 interdependency across
 the language groups or some other outlying variable.
 
 The session at Wikimania about this:
 http://wikimania2009.wikimedia.org/wiki/Proceedings:221
 
 I'll be doing a talk at SXSW 2010 about this next year, and
 I welcome
 any/all theories and what areas of research to pursue.
 http://bit.ly/8Hh52
 

Thank you very much for your comments, Andrew. 

I'm afraid I won't be able to attend SXSW 2010. But I will certainly attend 
Wikimania 2010 next year, and I hope we'll have some time to reflect on these 
issues.

Best,
Felipe.

 
 -Andrew Lih
 


Re: [Foundation-l] Analysis of statistics

2009-07-25 Thread Felipe Ortega



--- On Sat, 25/7/09, John at Darkstar vac...@jeb.no wrote:

 From: John at Darkstar vac...@jeb.no
 Subject: Re: [Foundation-l] Analysis of statistics
 To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
 Date: Saturday, 25 July 2009 3:47
 I asked a source if they may grant us
 access to some statistics on users
 behaviour within social media. The time series starts well
 before Nupedia.
 

That would be great, John. 

Though the peculiarities of Wikipedia should be taken into account, long time 
series would allow interesting comparisons, in particular about the trends we 
may expect in the future, based on patterns already observed in other scenarios 
with a wider timespan.

Best,
Felipe.

 John
 
 Felipe Ortega wrote:
  --- On Fri, 24/7/09, Milos Rancic mill...@gmail.com wrote:
  
  From: Milos Rancic mill...@gmail.com
  Subject: Re: [Foundation-l] Analysis of statistics
  To: Wikimedia Foundation Mailing List 
  foundation-l@lists.wikimedia.org
  Date: Friday, 24 July 2009 5:25
  
  Whatever means in the official statistics. It
 would be good
  to have numbers about newcomers and those who made
 10 or 100 edits,
  so we may compare how do we attract attention
 through the time.
  However, I think that those numbers are relatively
 stable in the past    couple of years
 (let's say, from 2005 or so).
 
  
  You can check more precise figures and graphs in my
 thesis about general statistics for survivability for all
 logged editors and core editors (the top 10% most active
 editors in each month), from the beginning until Dec. 2007,
 in the top-ten language versions (at that time).
  
  http://libresoft.es/Members/jfelipe/phd-thesis (page)
  http://libresoft.es/Members/jfelipe/thesis-wkp-quantanalysis
 (doc)
  
  As for the percentages of users by age, education
 level, etc. my impression is that opinions from experienced
 community members are often well oriented. But they're only
 opinions. Until we get the results of the general survey, we
 won't have a clear picture of the current recruitment
 targets for all versions.
  
  Nevertheless, according to our updates, it seems that
 the situation is not getting better from Jan 2008 onwards.
  
  Best,
  Felipe.
  
  
  
  


Re: [Foundation-l] Analysis of statistics

2009-07-25 Thread Felipe Ortega

This is a good point, Milos. Quantity and quality are more closely related to 
each other than we might initially have thought.

For instance: 

* The largest share of Featured Articles in all top-ten language versions 
needed more than 1,000 days (almost 3 years) to reach that level.

* Most editors contributing to FAs were highly experienced, meaning more than 
2.5 or 3 years participating in Wikipedia. These editors also tend to be very 
active (though they do not necessarily get 'sysop' or other special 
privileges). Let me remind you that more than 50% of editors left after 
approx. half a year, in all versions we studied.

Therefore, the highly experienced editors are taking care of top-quality 
content, probably because they know the guidelines, procedures and daily 
workflows of the community better than many other editors. Of course, their 
knowledge of the topics they contribute to also matters. But I believe that the 
first condition is also critical, and you can get to that point with time, by 
interacting with Wikipedia and the community.

As a result, any attempt to improve the experience of newcomers as they start 
to contribute is invaluable. I've read your comments about chats with sysops or 
an article's main editors. I've also read about training environments 
(customized, friendlier sandboxes, etc.).

So, all this makes *a lot of sense* in the current situation. Not because of 
quantity, but to improve *quality*.

Best,
Felipe.


--- On Sat, 25/7/09, Milos Rancic mill...@gmail.com wrote:

 From: Milos Rancic mill...@gmail.com
 Subject: Re: [Foundation-l] Analysis of statistics

 So, to give the answer about quantity vs. quality: We need
 quantity to
 have sustainable community development or even just a
 sustainable
 stagnation. We shouldn't be shy of saying that quantity is
 very
 important to us because we are able to build quality. And,
 yes, it is
 possible that quality brings quantity. This thread is about
 that: we
 have to think how to do that. If we don't think
 (thinking=quality) how
 to bring quantity and our quantity is lowering: we are at
 the dead
 end.
 


Re: [Foundation-l] Analysis of statistics

2009-07-24 Thread Felipe Ortega

--- On Fri, 24/7/09, Milos Rancic mill...@gmail.com wrote:

 From: Milos Rancic mill...@gmail.com
 Subject: Re: [Foundation-l] Analysis of statistics
 To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
 Date: Friday, 24 July 2009 5:25

 Whatever means in the official statistics. It would be good
 to have numbers about newcomers and those who made 10 or 100 edits,
 so we may compare how do we attract attention through the time.
 However, I think that those numbers are relatively stable in the past
 couple of years (let's say, from 2005 or so).
 

You can check more precise figures and graphs in my thesis about general 
statistics for survivability for all logged editors and core editors (the top 
10% most active editors in each month), from the beginning until Dec. 2007, in 
the top-ten language versions (at that time).

http://libresoft.es/Members/jfelipe/phd-thesis (page)
http://libresoft.es/Members/jfelipe/thesis-wkp-quantanalysis (doc)

As for the percentages of users by age, education level, etc. my impression is 
that opinions from experienced community members are often well oriented. But 
they're only opinions. Until we get the results of the general survey, we won't 
have a clear picture of the current recruitment targets for all versions.

Nevertheless, according to our updates, it seems that the situation is not 
getting better from Jan 2008 onwards.

Best,
Felipe.






Re: [Foundation-l] Wikipedia: A quantity analysis

2009-07-10 Thread Felipe Ortega

Hello. 
I'm Felipe Ortega, the URJC researcher who wrote this thesis.

Thank you for pointing this out, though I believe I already CC'd the 
Foundation mailing list some months ago, immediately after the manuscript was 
published on our website.

I would like to take this opportunity to comment on the very significant 
coverage that the results of the thesis have received from the Spanish national 
mass media this week. Last Wed., URJC published an official press release about 
it:

http://www.urjc.es/z_files/ai_noti/ai05/noticia_completa.php?ID=1034 

Just a few hours later, the main Spanish national newspapers picked up
the news:

ABC

http://www.abc.es/20090709/medios-redes-web/wikipedia-estanca-caida-numero-200907091136.html
 

El Mundo (featured in main page http://elmundo.es, as of Thu. July 9)

http://www.elmundo.es/elmundo/2009/07/09/navegante/1247132861.html

El País (also in main page, technology section, as of Thu. July 9)

http://www.elpais.com/articulo/tecnologia/Wikipedia/pierde/editores/elpeputec/20090709elpeputec_1/Tes

It has also been published by Europa Press, EFE and other major national news 
agencies. According to Google, right now we're being reported on more than 30 
different news sites :).

Yesterday, Fri. 10, it also received coverage from some important radio 
stations in the Madrid region (Onda Madrid, Punto Radio), and on the night of 
Thu. 9 from the national radio network Cadena Ser (Hora 25, prime-time news 
program). Some of these included live interviews to highlight the main results.

Finally, I would really like to remark that, despite the obvious tendency of 
some journalists to seek a sensationalist headline like "could it be the end of 
Wikipedia?", I have tried as much as I could to explicitly avoid this, and just 
talked about "Wikipedia reaching a new stabilized stage in both the number of 
active editors and number of revisions per month [...] the main cause could be 
that the project is losing, each month, more authors than the number of new 
contributors that arrive to help for the first time".

In fact, we would like to keep on working, with the help of other academic 
institutions and researchers and with support from the community, to better 
understand the content creation process in Wikipedia, especially with regard to 
improving quality, and to find better strategies to retain editors for longer 
periods of time.

All the best,

Felipe.

--- On Thu, 9/7/09, Crazy Lover always_yours.fore...@yahoo.com wrote:

 From: Crazy Lover always_yours.fore...@yahoo.com
 Subject: [Foundation-l] Wikipedia: A quantity analysis
 To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
 Date: Thursday, 9 July 2009 8:39
 Below it is a Doctoral Thesis that
 analize quantitatively the evolution of top ten Wikipedias,
 an their problems (in English): 
 
 http://libresoft.es/Members/jfelipe/thesis-wkp-quantanalysis
 
 C.m.l.
 
 
 
       


[Foundation-l] Public repositories for research dumps

2009-06-23 Thread Felipe Ortega

Hello.

A new public repository was created just a few hours ago to host WikiXRay 
database dumps, containing info extracted from the public Wikipedia dbdumps. 
The repository is hosted by RedIRIS (in short, the Spanish equivalent of 
Kennisnet in the Netherlands).

http://sunsite.rediris.es/mirror/WKP_research

ftp://ftp.rediris.es/mirror/WKP_research

These new dumps are intended to save other researchers time and effort, since 
they won't need to parse the complete XML dumps to extract all relevant 
activity metadata. We used mysqldump to create the dumps from our databases.

As of today, only some of the biggest Wikipedias are available. However, in 
the following days the full set of available languages will be ready for 
download. The files will be updated regularly.

The procedure is as follows:

1. Find the research dump you are interested in. Download and decompress it on 
your local system.

2. Create a local DB to import the information.

3. Load the dump file, using a MySQL user with insert privileges (-p prompts 
for the password):

$ mysql -u user -p myDB < dumpfile.sql

And you're done.
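
For those who prefer to script it, the three steps above could be sketched in Python roughly as follows. This is only an illustration and not part of WikiXRay: the function names are hypothetical, gzip compression of the downloaded file is an assumption (check the actual file extension in the repository), and the standard mysql client is assumed to be on the PATH.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def decompress(archive: Path) -> Path:
    """Step 1: decompress a downloaded dump (gzip is an assumption here)."""
    target = archive.with_suffix("")  # e.g. dump.sql.gz -> dump.sql
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def create_db_command(user: str, database: str) -> list[str]:
    """Step 2: mysql client call that creates the local database."""
    return ["mysql", "-u", user, "-p", "-e",
            f"CREATE DATABASE IF NOT EXISTS `{database}`"]


def import_command(user: str, database: str) -> list[str]:
    """Step 3: mysql client call that reads the dump from standard input."""
    return ["mysql", "-u", user, "-p", database]


def load_dump(archive: Path, user: str, database: str) -> None:
    """Run the three steps in order; mysql prompts for the password twice."""
    sql_file = decompress(archive)
    subprocess.run(create_db_command(user, database), check=True)
    with open(sql_file, "rb") as dump:
        subprocess.run(import_command(user, database), stdin=dump, check=True)
```

A call such as load_dump(Path("dump.sql.gz"), "user", "myDB") would then replace the manual steps; the file name here is a placeholder, not an actual file in the repository.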

Final warning: three fields in the revision table are not reliable yet:

rev_num_inlinks
rev_num_outlinks
rev_num_trans

All remaining fields/values are reliable (in particular rev_len, 
rev_num_words, and so forth).
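
One way to avoid picking up the three unreliable fields by accident is to filter them out when building a query. The sketch below is only an illustration: the helper names are hypothetical, and the column list passed in must come from the actual table definition in the dump you loaded.

```python
# Fields flagged as not reliable yet in the revision table.
UNRELIABLE_FIELDS = {"rev_num_inlinks", "rev_num_outlinks", "rev_num_trans"}


def reliable_columns(columns):
    """Return the given column names minus the unreliable ones, in order."""
    return [name for name in columns if name not in UNRELIABLE_FIELDS]


def build_select(columns, table="revision"):
    """Build a SELECT statement that only reads the reliable columns."""
    kept = reliable_columns(columns)
    if not kept:
        raise ValueError("no reliable columns requested")
    return "SELECT " + ", ".join(kept) + " FROM " + table


query = build_select(["rev_len", "rev_num_inlinks", "rev_num_words"])
print(query)  # SELECT rev_len, rev_num_words FROM revision
```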

Regards,

Felipe.






  




Re: [Foundation-l] dumps

2009-02-25 Thread Felipe Ortega



--- On Wed, 25/2/09, Anthony wikim...@inbox.org wrote:

 From: Anthony wikim...@inbox.org
 Subject: Re: [Foundation-l] dumps
 To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
 Date: Wednesday, 25 February 2009 5:26
 On Tue, Feb 24, 2009 at 11:26 PM, Brian
 brian.min...@colorado.edu wrote:
 
 Which uncompressed dump?  The full history English
 Wikipedia dump doesn't
 exist, and there doesn't seem to be any demand for this
 anyway.

Mmmm, sorry, but then I'm afraid you missed some messages over the past year 
and a half on Wikitech-l, eagerly asking for the full version of the English 
dump.

Just to give a straightforward application: people analyzing Wikipedia from a 
quantitative point of view need the whole dump file, no matter what they want 
to examine. And believe it or not, the number of scholars (in different 
disciplines) focusing on this topic is growing steadily (actually, there could 
be many more of us if we had a stable dump process, updated with reasonable 
frequency ;) ).

It's also really difficult for people like me to advocate for this line of 
research when we have such problems, though so far we have found a way to live 
with these limitations (something is better than nothing).

Best,

F.


  



Re: [Foundation-l] dumps

2009-02-25 Thread Felipe Ortega



--- On Thu, 26/2/09, Brian brian.min...@colorado.edu wrote:

 From: Brian brian.min...@colorado.edu
 Subject: Re: [Foundation-l] dumps
 To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
 Date: Thursday, 26 February 2009 12:33
 Ahh ok. Anyone who wants to do processing on the full
 history (and there are
 a lot of these people who exist!) by definition *has* to be
 willing to throw
 some money at it. It simply doesn't fit on commercial
 drives. 

Not necessarily. For instance, WikiXRay can parse the dump file on the fly, so 
you don't need to uncompress the whole file if you don't want to, and the 
result typically fits in a 6-8 GB DB (depending on the amount of data you 
recover), which fits perfectly on commodity hardware.
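
To illustrate the general idea of parsing on the fly (this is not WikiXRay's actual code, just a minimal sketch using the Python standard library), a compressed dump can be streamed and processed element by element. The sketch assumes a bzip2-compressed XML dump with page and revision elements; the function names are hypothetical, and it only counts pages and revisions instead of filling a database.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{uri}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_revisions(stream):
    """Count pages and revisions from a file-like object holding dump XML."""
    pages = 0
    revisions = 0
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "revision":
            revisions += 1
            elem.clear()  # drop the revision content once it has been seen
        elif name == "page":
            pages += 1
            root.clear()  # detach finished pages so memory use stays bounded
    return pages, revisions


def count_in_dump(path):
    """Decompress a .bz2 dump while reading, without writing it to disk."""
    with bz2.open(path, "rb") as stream:
        return count_revisions(stream)
```

In a real extraction, the branch that handles a finished revision is where the metadata of interest would be read and stored before the element is cleared.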

On the other hand, I completely agree with you that working with the huge XML 
file requires specific hardware (we bought a couple of servers for that).

 People *just want
 the data*.  Many people would be willing to pay a fee.
 

Probably, but in any case I would like to avoid paying a fee to access what 
should be publicly available (and was, at least until the dump process broke).

Some universities (including ours) have offered storage capacity and some 
bandwidth to distribute mirrors and improve dump availability, at no cost at 
all :).

 I have a rare copy of the last available full text dump.
 Perhaps I should
 initiate the process myself.
 

Nothing prevents you from doing that (I think), and it could be a stimulus for 
thinking about subsequent solutions.

Best,

F.

 
 On Wed, Feb 25, 2009 at 2:20 PM, Thomas Dalton
 thomas.dal...@gmail.comwrote:
 
  2009/2/25 Brian brian.min...@colorado.edu:
   What has led you to believe there is no demand
 for a full dump of the
   english wikipedia?
 
  He didn't say there was no demand, he said there
 was no demand for
  having it on Amazon.
 