Don't Get the SequenceFile.Reader Path in SimpleKMeansCluster

robpd Wed, 19 Oct 2011 00:05:06 -0700

Hi

I'm a Hadoop / Mahout learner - slowly getting there but still at the
questioning stage!


I was looking at the SimpleKMeansCluster.java example (in the Mahout in
Action book) and was wondering about this line of code:

SequenceFile.Reader reader = new SequenceFile.Reader(fs,
new Path("output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"), conf);

The Hadoop Javadocs don't describe the parameters of this constructor, so
it's not very clear exactly what the path refers to. I guess that the reader
reads from the Hadoop FS at the location...

"output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"

to get the output clusters to report. Correct?
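
To spell out what I think that expression evaluates to, here is a small
sketch. It doesn't use Hadoop or Mahout at all. The value "clusteredPoints"
for the constant and the zero-padded "part-m-" file name pattern are my
assumptions, not something the example states, and the class and method
names are made up for illustration.

```java
public class ClusteredPointsPath {
    // Assumption: stands in for Cluster.CLUSTERED_POINTS_DIR. The real value
    // should be checked against the Mahout source.
    static final String CLUSTERED_POINTS_DIR = "clusteredPoints";

    // Builds the relative path string that the example passes to new Path(...).
    static String partFile(String outputDir, int partition) {
        // Assumption: "part-m-" followed by a five-digit, zero-padded number.
        String fileName = String.format("part-m-%05d", partition);
        return outputDir + "/" + CLUSTERED_POINTS_DIR + "/" + fileName;
    }

    public static void main(String[] args) {
        // Prints the path string for the first part file under "output".
        System.out.println(partFile("output", 0));
    }
}
```

If those assumptions hold, the reader is opened on a single part file inside
a fixed subdirectory of whatever output directory the job was given.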

How do you know that the cluster output would be at this path-location in
the file system though? None of the preceding code lines give a clue as to
this.  There's nothing in the example that makes it clear as to why the
outputs are placed in this location as opposed to anywhere else
(particularly the "/part-m-00000" is confusing). I'm hoping that I've missed
something obvious here.  Having to know exactly where to look for the output
in the general case would make things very difficult to use without delving
into the Mahout / Hadoop source code itself.

Sorry to be dim! Any help would be appreciated. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Don-t-Get-the-SequenceFile-Reader-Path-in-SimpleKMeansCluster-tp3432980p3432980.html
Sent from the Mahout User List mailing list archive at Nabble.com.
