Thanks, emijrp! Great news! Quoting from [2] for context before you click through:
"The data set contains triples, each consisting of (i) *text*, a short, raw natural language string; (ii) *url*, a related concept, represented by an English Wikipedia article's canonical location <http://en.wikipedia.org/wiki/Help:URL#URLs_of_Wikipedia_pages>; and (iii) *count*, an integer indicating the number of times *text* has been observed connected with the concept's *url*. Our database thus includes weights that measure degrees of association."

"The database that we are providing was designed for recall. It is large and noisy, incorporating 297,073,139 distinct string-concept pairs, aggregated over 3,152,091,432 individual links."

Published in LREC 2012: "A Cross-Lingual Dictionary for English Wikipedia Concepts", Valentin I. Spitkovsky <http://research.google.com/pubs/author3196.html>, Angel X. Chang <http://research.google.com/pubs/author39061.html>, *Eighth International Conference on Language Resources and Evaluation (LREC 2012)*.
http://research.google.com/pubs/archive/38098.pdf

On Sat, May 19, 2012 at 7:08 PM, emijrp <[email protected]> wrote:
> Hi all;
>
> Just a quick notice about a new Google dataset related to
> Wikipedia.[1][2][3]
>
> Regards,
> emijrp
>
> [1] http://googleresearch.blogspot.com.es/2012/05/from-words-to-concepts-and-back.html
> [2] http://ebiquity.umbc.edu/blogger/2012/05/19/google-releases-database-linking-strings-and-concepts/
> [3] http://www-nlp.stanford.edu/pubs/crosswikis-data.tar.bz2/
>
> --
> Emilio J. Rodríguez-Posada.
> E-mail: emijrp AT gmail DOT com
> Pre-doctoral student at the University of Cádiz (Spain)
> Projects: AVBOT <http://code.google.com/p/avbot/> | StatMediaWiki <http://statmediawiki.forja.rediris.es> | WikiEvidens <http://code.google.com/p/wikievidens/> | WikiPapers <http://wikipapers.referata.com> | WikiTeam <http://code.google.com/p/wikiteam/>
> Personal website: https://sites.google.com/site/emijrp/
>
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
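To make the *count* field from the paper quoted above a bit more concrete: one common way (not necessarily the authors' own method) to turn such raw counts into association weights is to normalize them per string, which gives an estimate of P(url | text). Below is a minimal Python sketch. It assumes a hypothetical, simplified tab-separated layout of one "text, url, count" triple per line (the files in the actual release may be laid out differently), and the "jaguar" counts are made-up toy numbers, not figures from the dataset.

```python
from collections import defaultdict


def association_weights(lines):
    """Normalize (text, url, count) triples into P(url | text).

    Assumes a hypothetical layout "text<TAB>url<TAB>count" per line;
    this is an illustration, not the released file format.
    """
    # Sum counts per (text, url), in case a pair appears more than once.
    counts = defaultdict(dict)
    for line in lines:
        text, url, count = line.rstrip("\n").split("\t")
        counts[text][url] = counts[text].get(url, 0) + int(count)

    # Divide each pair's count by the total count for that string.
    weights = {}
    for text, by_url in counts.items():
        total = sum(by_url.values())
        weights[text] = {url: c / total for url, c in by_url.items()}
    return weights


# Toy, invented counts purely for illustration.
sample = [
    "jaguar\thttp://en.wikipedia.org/wiki/Jaguar\t30",
    "jaguar\thttp://en.wikipedia.org/wiki/Jaguar_Cars\t10",
]
print(association_weights(sample)["jaguar"])
# -> {'http://en.wikipedia.org/wiki/Jaguar': 0.75, 'http://en.wikipedia.org/wiki/Jaguar_Cars': 0.25}
```

Normalizing per string keeps the weights comparable across strings of very different frequency. Given that the authors describe the data as "large and noisy", in practice one might also drop pairs with very low counts before normalizing.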
