The GraphX team has been using Wikipedia dumps from
http://dumps.wikimedia.org/enwiki/. Unfortunately, these are in a less
convenient format than the Freebase dumps. In particular, an article may
span multiple lines, so more involved input parsing is required.
Dan Crankshaw (cc'd) wrote a driver
In particular, we are using this dataset:
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
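
To make the multi-line issue concrete, here is a minimal sketch in Python of one way to regroup the dump's lines into whole articles before extracting links. It is not the driver mentioned above. The names `iter_pages` and `parse_page` are hypothetical, and it assumes each `<page>` and `</page>` tag sits on its own line and that links use the `[[Target|label]]` wiki syntax.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Captures the target of a [[wiki link]], stopping at '|', '#', or ']'.
LINK_RE = re.compile(r"\[\[([^\]|#]+)")
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def iter_pages(lines: Iterable[str]) -> Iterator[str]:
    """Group input lines into one string per <page>...</page> element."""
    buf: List[str] = []
    in_page = False
    for line in lines:
        if "<page>" in line:
            # Start of a new article: reset the buffer.
            in_page = True
            buf = []
        if in_page:
            buf.append(line.rstrip("\n"))
        if in_page and "</page>" in line:
            # End of the article: emit all buffered lines as one record.
            in_page = False
            yield "\n".join(buf)


def parse_page(page: str) -> Tuple[str, List[str]]:
    """Return (title, link targets) for a single page record."""
    match = TITLE_RE.search(page)
    title = match.group(1) if match else ""
    links = [target.strip() for target in LINK_RE.findall(page)]
    return title, links


# Tiny hand-written sample (not real dump content).
sample = [
    "<mediawiki>\n",
    "  <page>\n",
    "    <title>A</title>\n",
    "    <text>See [[B]] and\n",
    "[[C|see C]].</text>\n",
    "  </page>\n",
    "  <page>\n",
    "    <title>B</title>\n",
    "    <text>[[A#Top]]</text>\n",
    "  </page>\n",
    "</mediawiki>\n",
]
pages = list(iter_pages(sample))
edges = [parse_page(p) for p in pages]
```

For the real file, the lines could be streamed with `bz2.open(path, "rt", encoding="utf-8")` from the standard library instead of the in-memory `sample` list. A proper XML parser would be more robust than this line-based grouping.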
Ankur http://www.ankurdave.com/
On Sun, Mar 30, 2014 at 12:45 AM, Ankur Dave <ankurd...@gmail.com> wrote:
The GraphX team has been using Wikipedia dumps from
Hello,
I would like to run the
WikipediaPageRank example (https://github.com/amplab/graphx/blob/f8544981a6d05687fa950639cb1eb3c31e9b6bf5/examples/src/main/scala/org/apache/spark/examples/bagel/WikipediaPageRank.scala),
but the Wikipedia dump XML files are no longer available on
Freebase. Does anyone