Re: Don't Get the SequenceFile.Reader Path in SimpleKMeansCluster

Sean Owen Wed, 19 Oct 2011 00:28:19 -0700

You mean the Mahout javadoc?

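Are you asking why this location was chosen, or what this path contains?
All Hadoop output comes in the form of many files in a directory, one file
per map task (or per reduce task, if the job has reducers). They are named
like "part-m-00000" and so on: "m" means the file was written by a mapper
("r" would mean a reducer), and the number is the task's index. That's
where that comes from. The rest is just where the job has chosen to put the
output by default.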
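A minimal sketch of the practical consequence: rather than hard-coding "part-m-00000", a client can list every part file in the output directory and read them all. This uses java.nio on the local filesystem purely for illustration, not HDFS; with Hadoop's own API the analogous call would be FileSystem.globStatus(new Path(dir, "part-*")). The class and method names (PartFiles, isPartFile, listPartFiles) are hypothetical, and the name pattern assumes uncompressed output.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class PartFiles {
  // New-API jobs write part-m-NNNNN (map output) or part-r-NNNNN (reduce output);
  // old-API jobs write part-NNNNN. Assumes no compression suffix such as ".gz".
  private static final Pattern PART_NAME = Pattern.compile("part-(?:[mr]-)?\\d{5}");

  /** True if the file name follows Hadoop's part-file naming convention. */
  static boolean isPartFile(String name) {
    return PART_NAME.matcher(name).matches();
  }

  /** Lists every part file in an output directory, sorted by name; skips _SUCCESS etc. */
  static List<Path> listPartFiles(Path outputDir) throws IOException {
    List<Path> parts = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
      for (Path p : stream) {
        if (isPartFile(p.getFileName().toString())) {
          parts.add(p);
        }
      }
    }
    Collections.sort(parts);
    return parts;
  }

  public static void main(String[] args) throws IOException {
    // Fake output directory, named only to mimic a clustered-points directory.
    Path dir = Files.createTempDirectory("clusteredPoints");
    dir.toFile().deleteOnExit(); // registered first, so deleted after its files
    for (String name : new String[] {"part-m-00001", "part-m-00000", "_SUCCESS"}) {
      Files.createFile(dir.resolve(name)).toFile().deleteOnExit();
    }
    for (Path p : listPartFiles(dir)) {
      System.out.println(p.getFileName());
    }
  }
}
```

Running main prints part-m-00000 and then part-m-00001; the _SUCCESS marker is skipped. Each listed file would then be opened with its own SequenceFile.Reader.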


On Wed, Oct 19, 2011 at 1:49 AM, robpd wrote:
> Hi
>
> I'm a Hadoop / Mahout learner - slowly getting there but still at the
> questioning stage!
>
> I was looking at the SimpleKMeansCluster.java example (in the Mahout in
> Action book) and was wondering about this line of code:
>
> SequenceFile.Reader reader = new SequenceFile.Reader(fs,
> new Path("output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"), conf);
>
> The Hadoop Javadocs don't describe the parameters of this constructor, so
> it's not very clear exactly what the path refers to. I guess that the
> reader reads from the Hadoop FS at the location...
>
> "output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"
>
> to get the output clusters to report. Correct?
>
> How do you know that the cluster output will be at this location in the
> file system, though? None of the preceding lines of code gives a clue.
> There's nothing in the example that makes it clear why the outputs are
> placed in this location as opposed to anywhere else (the "/part-m-00000"
> part is particularly confusing). I'm hoping that I've missed something
> obvious here. Having to know exactly where to look for the output in the
> general case would make things very difficult to use without delving into
> the Mahout / Hadoop source code itself.
>
> Sorry to be dim! Any help would be appreciated.
>
