Yes, it also says that you need to either translate this file into the canonical "user,item" CSV format or swap in the Mapper provided in the book in place of the normal first Mapper to read this data. On its own, this file doesn't work with the project because it is not in the right format.
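
If you go the translation route, something like the following might do as a rough sketch. It assumes each line of the links file looks like "source: target1 target2 ..." (my assumption about the layout, so check it against your copy), and the function name links_to_csv is made up for illustration; it is not part of the project or the book.

```python
import sys


def links_to_csv(lines):
    """Yield 'user,item' rows from lines assumed to look like 'src: dst1 dst2 ...'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        source, sep, targets = line.partition(":")
        if not sep:
            continue  # no colon: not in the assumed layout, so skip it
        for target in targets.split():
            # one output row per (source, target) pair
            yield f"{source.strip()},{target}"


if __name__ == "__main__":
    # Reads the links file on stdin, writes CSV rows to stdout.
    for row in links_to_csv(sys.stdin):
        print(row)
```

You would run it over the unzipped file, redirect the output to a new file, and put that file to HDFS in place of the original. The book's Mapper avoids this extra pass over the data, so it is probably the better choice if you don't need the CSV for anything else.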

On Tue, Jun 19, 2012 at 2:48 PM, Jonathan Hodges <[email protected]> wrote:
> The input is the Wikipedia article data set as recommended by the example.
> It was downloaded unchanged from
> http://users.on.net/~henry/pagerank/links-simple-sorted.zip. I just
> unzipped this and then put to HDFS at the path input/input.txt before
> running the command line I mentioned previously. The following are the
> first few lines of the Wikipedia data set file.
