Do you mean the Mahout javadoc? Are you asking why this location was chosen, or what this path contains?

All Hadoop job output is written as a set of files in one directory, one file per task, named "part-m-00000", "part-m-00001" and so on. That is where the "/part-m-00000" comes from. The rest of the path is simply where the job puts its output by default.
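
If it helps, here is a rough sketch (mine, not from the book or from Mahout) of finding the part files instead of hard-coding "part-m-00000". It assumes the job ran in local mode, so the output directory is on the local disk and plain java.nio can list it; on HDFS you would list the directory through Hadoop's FileSystem class instead. The class and method names (PartFiles, partFileName, listPartFiles) are made up for illustration. The "m" in the name marks map output; reducer output uses "r".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only; not part of Hadoop or Mahout.
public class PartFiles {

    // Builds a name such as "part-m-00000": 'm' for map output, 'r' for
    // reduce output, then the task number zero-padded to five digits.
    public static String partFileName(char taskType, int taskIndex) {
        return String.format(Locale.ROOT, "part-%c-%05d", taskType, taskIndex);
    }

    // Returns the sorted names of all part files in a job output directory.
    // Assumption: the directory is on the local file system (local mode).
    // Marker files such as _SUCCESS and hidden .crc files do not match the glob.
    public static List<String> listPartFiles(Path outputDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PartFiles <job-output-directory>");
            return;
        }
        // Print every part file found, one per line.
        for (String name : listPartFiles(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

You would then open each listed file with its own SequenceFile.Reader, exactly as the example does for the single file it names.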

On Wed, Oct 19, 2011 at 1:49 AM, robpd <[email protected]> wrote:
> Hi
>
> I'm a Hadoop / Mahout learner - slowly getting there but still at the
> questioning stage!
>
> I was looking at the SimpleKMeansCluster.java example (in the Mahout in
> Action book) and was wondering about the code line....
>
> SequenceFile.Reader reader = new SequenceFile.Reader(fs,
>     new Path("output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"), conf);
>
> The Hadoop java docs don't give a description of the input parameters to
> this method so it's not very clear exactly what the path refers to. I guess
> that the method reads from the Hadoop FS at the location...
>
> "output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"
>
> to get the output clusters to report. Correct?
>
> How do you know that the cluster output would be at this path-location in
> the file system though? None of the preceding code lines give a clue as to
> this. There's nothing in the example that makes it clear as to why the
> outputs are placed in this location as opposed to anywhere else
> (particularly the "/part-m-00000" is confusing). I'm hoping that I've missed
> something obvious here. Having to know exactly where to look for the output
> in the general case would make things very difficult to use without delving
> into the Mahout / Hadoop source code itself.
>
> Sorry to be dim! Any help would be appreciated.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Don-t-Get-the-SequenceFile-Reader-Path-in-SimpleKMeansCluster-tp3432980p3432980.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
