Forwarding.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


---------- Forwarded message ---------
From: Gabriel Altay <[email protected]>
Date: Mon, Feb 3, 2020 at 6:57 PM
Subject: [Wikidata] New Wikimedia dataset for NLP research
To: <[email protected]>


Hello Wikidata folks,

I would like to bring your attention to an open source dataset I've
been developing called the Kensho Derived Wikimedia Dataset (KDWD).
It's a cleaned English subset of Wikipedia/Wikidata with 2.3B tokens,
5.3M pages, 51M nodes, and 120M edges.  More details are available
here:
https://blog.kensho.com/announcing-the-kensho-derived-wikimedia-dataset-5d1197d72bcf

best,
-Gabriel
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata

_______________________________________________
Wiki-research-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
