Thanks, emijrp! Great news! Quoting from [2] for context before you
click through:

"The data set contains triples, each consisting of (i) *text*, a short, raw
natural language string; (ii) *url*, a related concept, represented by
an English Wikipedia article's canonical location
<http://en.wikipedia.org/wiki/Help:URL#URLs_of_Wikipedia_pages>;
and (iii) *count*, an integer indicating the number of times *text* has been
observed connected with the concept's *url*. Our database thus includes
weights that measure degrees of association."
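
To make the triple structure concrete, here is a rough Python sketch of how
the counts could be turned into per-string association weights. The
tab-separated layout, the names Triple, parse_line and association_weights,
and the sample rows (with made-up counts and shortened article names in place
of full URLs) are all my own assumptions for illustration; I have not checked
them against the actual file format in the release.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Triple(NamedTuple):
    text: str   # short, raw natural language string
    url: str    # Wikipedia article the string was linked to
    count: int  # number of times text was observed with url


def parse_line(line: str) -> Triple:
    # Assumed layout: text<TAB>url<TAB>count (hypothetical, not the release format)
    text, url, count = line.rstrip("\n").split("\t")
    return Triple(text, url, int(count))


def association_weights(triples: Iterable[Triple]) -> Dict[str, Dict[str, float]]:
    """For each text, convert raw counts into relative frequencies over its urls."""
    counts = defaultdict(lambda: defaultdict(int))
    for t in triples:
        counts[t.text][t.url] += t.count
    weights = {}
    for text, by_url in counts.items():
        total = sum(by_url.values())  # assumes counts are positive
        weights[text] = {url: c / total for url, c in by_url.items()}
    return weights


# Made-up sample rows, only to show the shape of the data
sample = [
    "jaguar\tJaguar\t6",
    "jaguar\tJaguar_Cars\t3",
    "jaguar\tJacksonville_Jaguars\t1",
]
weights = association_weights(parse_line(line) for line in sample)
```

Dividing each count by the total for its string is just one simple way to
read the counts as "degrees of association"; the paper may well define its
scores differently.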

"The database that we are providing was designed for recall. It is large
and noisy, incorporating 297,073,139 distinct string-concept pairs,
aggregated over 3,152,091,432 individual links."

Published in LREC 2012:

“A Cross-Lingual Dictionary for English Wikipedia Concepts”, Valentin I.
Spitkovsky <http://research.google.com/pubs/author3196.html>, Angel X.
Chang <http://research.google.com/pubs/author39061.html>,
*Eighth International Conference on Language Resources and Evaluation
(LREC 2012)*. http://research.google.com/pubs/archive/38098.pdf

On Sat, May 19, 2012 at 7:08 PM, emijrp <[email protected]> wrote:

> Hi all;
>
> Just a quick notice about a new Google dataset related to
> Wikipedia.[1][2][3]
>
> Regards,
> emijrp
>
> [1]
> http://googleresearch.blogspot.com.es/2012/05/from-words-to-concepts-and-back.html
> [2]
> http://ebiquity.umbc.edu/blogger/2012/05/19/google-releases-database-linking-strings-and-concepts/
> [3] http://www-nlp.stanford.edu/pubs/crosswikis-data.tar.bz2/
>
> --
> Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
> Pre-doctoral student at the University of Cádiz (Spain)
> Projects: AVBOT <http://code.google.com/p/avbot/> |
> StatMediaWiki <http://statmediawiki.forja.rediris.es> |
> WikiEvidens <http://code.google.com/p/wikievidens/> |
> WikiPapers <http://wikipapers.referata.com> |
> WikiTeam <http://code.google.com/p/wikiteam/>
> Personal website: https://sites.google.com/site/emijrp/
>
>
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>