Thanks, Jeff. I have one more question: what is the structure of the contents
of the part-m-* files in clusteredPoints? My interpretation is that each
record is a key-value pair where the "key" is the ID of the cluster to which
the vector belongs and the "value" is the point vector.

I want to write a different version of the ClusterDumper code in which a new
file is created for each cluster, containing the points that belong to that
cluster, because the existing ClusterDumper code is unable to handle large
datasets. Is my interpretation of the part-m-* files correct?
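
To make the idea concrete, here is a rough sketch of the per-cluster dumper I
have in mind. It is plain Java with no Hadoop or Mahout dependency: it assumes
the records have already been read from the part-m-* files as
(clusterID, point text) pairs, which is only my interpretation of the format
and not something I have confirmed. The class name PerClusterDumper, the dump
method, and the cluster-<id>.txt file naming are all hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: routes (clusterID, point) records into one file per cluster.
// Reading the records out of the part-m-* files is deliberately left out.
class PerClusterDumper {

    // Streams the records once and appends each point to cluster-<id>.txt in outDir.
    // Returns the number of points written per cluster ID.
    static Map<Integer, Integer> dump(Iterable<Map.Entry<Integer, String>> records,
                                      Path outDir) throws IOException {
        Files.createDirectories(outDir);
        Map<Integer, BufferedWriter> writers = new HashMap<>();
        Map<Integer, Integer> counts = new HashMap<>();
        try {
            for (Map.Entry<Integer, String> rec : records) {
                int clusterId = rec.getKey();
                BufferedWriter w = writers.get(clusterId);
                if (w == null) {
                    // First point seen for this cluster: open its output file.
                    Path file = outDir.resolve("cluster-" + clusterId + ".txt");
                    w = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
                    writers.put(clusterId, w);
                }
                w.write(rec.getValue());
                w.newLine();
                counts.merge(clusterId, 1, Integer::sum);
            }
        } finally {
            // Close every writer so buffered points are flushed to disk.
            for (BufferedWriter w : writers.values()) {
                w.close();
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample records standing in for the clusteredPoints contents.
        List<Map.Entry<Integer, String>> sample = new ArrayList<>();
        sample.add(new SimpleEntry<>(0, "[1.0, 1.0]"));
        sample.add(new SimpleEntry<>(1, "[8.0, 9.0]"));
        sample.add(new SimpleEntry<>(0, "[1.5, 0.5]"));

        Path out = Files.createTempDirectory("clusters");
        System.out.println("Wrote " + dump(sample, out) + " to " + out);
    }
}
```

Records are handled one at a time, so memory use does not grow with the number
of points. The sketch keeps one open writer per cluster, though, so a very
large number of clusters could run into the open-file limit.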

On Wed, Nov 9, 2011 at 11:27 PM, Jeff Eastman <[email protected]> wrote:

> See inline,
> Jeff
>
> -----Original Message-----
> From: gaurav redkar [mailto:[email protected]]
> Sent: Wednesday, November 09, 2011 4:09 AM
> To: [email protected]
> Subject: meanshift clustering
>
> Hi. I am unable to identify where the clusterPoints() function in the
> MeanShiftCanopyClusterer.java file is called during the execution of the
> MeanShift job.
>
>
> [jeff] That method is not called except by a unit test
> TestMeanShift.testClustererReferenceImplementation.
>
> What I need to know is where the files in the clusteredPoints and clusters-*
> directories are written when we run the job on Hadoop.
>
>
> [jeff] Those directories will be created within the --output directory
> which you specify for your job.
>
> buildclustersMR() creates the clusters-* directory for each iteration, but I
> am unable to locate the code which actually writes to the part-r-* files.
>
>
> [jeff] The code which writes the part-r-* files is Hadoop code which is
> called within MeanShiftCanopyReducer.reduce (line 55)
>
>
> Any suggestions?
>
>
> Thanks
>
