Forwarding. Pine ( https://meta.wikimedia.org/wiki/User:Pine )
---------- Forwarded message ---------
From: Gabriel Altay <[email protected]>
Date: Mon, Feb 3, 2020 at 6:57 PM
Subject: [Wikidata] New Wikimedia dataset for NLP research
To: <[email protected]>

Hello Wikidata folks,

I would like to bring your attention to an open source dataset I've been developing called the Kensho Derived Wikimedia Dataset (KDWD). It's a cleaned English subset of Wikipedia/Wikidata with 2.3B tokens, 5.3M pages, 51M nodes, and 120M edges.

More details are available here:
https://blog.kensho.com/announcing-the-kensho-derived-wikimedia-dataset-5d1197d72bcf

best,
-Gabriel

_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
_______________________________________________
Wiki-research-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
