Re: Facing problem while fetching the document id from cluser

syed kather Fri, 25 Nov 2011 10:26:12 -0800

Thanks a lot, it works.
Using this code:
SequenceFile.Reader reader = new SequenceFile.Reader(fs,path1, conf);
IntWritable key = new IntWritable();
WeightedVectorWritable value = new WeightedVectorWritable();
while (reader.next(key, value))
{
 System.out.println(value.toString() + " belongs to cluster "+
key.toString());
}
reader.close();


I got which documents belong to each cluster.

Can you help me get the files from the document IDs?
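
One possible way to go from a document ID back to its file is sketched below. It uses only the JDK and does not call Mahout. It assumes the document ID is the key that seqdirectory wrote for each file, that this key is the file's path relative to the -i input directory with a leading "/", and that the vectors were created as named vectors so the ID can be read from the vector name. None of these assumptions is confirmed in this thread. DocumentFileResolver, resolve, filesByCluster and the sample IDs are hypothetical names made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Mahout): maps document IDs back to source files.
final class DocumentFileResolver {

    // Assumption: documentId is the file path relative to the seqdirectory
    // input directory, usually with a leading "/" (e.g. "/reports/a.txt").
    static Path resolve(Path inputDir, String documentId) {
        String relative = documentId.replaceFirst("^/+", ""); // drop leading slashes
        return inputDir.resolve(relative).normalize();
    }

    // Groups (clusterId, documentId) pairs by cluster ID and resolves each
    // document ID to a file path under inputDir.
    static Map<Integer, List<Path>> filesByCluster(
            Path inputDir, List<Map.Entry<Integer, String>> assignments) {
        Map<Integer, List<Path>> result = new TreeMap<>();
        for (Map.Entry<Integer, String> assignment : assignments) {
            result.computeIfAbsent(assignment.getKey(), k -> new ArrayList<>())
                  .add(resolve(inputDir, assignment.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Input directory taken from the seqdirectory command quoted in this thread.
        Path inputDir = Paths.get("/media/Work/mahout/examples/bin/sample/fileList");

        // Made-up sample pairs; in practice they would be collected in the
        // reader loop (cluster ID from the key, document ID from the vector name).
        List<Map.Entry<Integer, String>> assignments = new ArrayList<>();
        assignments.add(new AbstractMap.SimpleEntry<>(3, "/doc1.txt"));
        assignments.add(new AbstractMap.SimpleEntry<>(7, "/doc2.txt"));
        assignments.add(new AbstractMap.SimpleEntry<>(3, "/doc3.txt"));

        filesByCluster(inputDir, assignments).forEach(
            (cluster, files) -> System.out.println("cluster " + cluster + " -> " + files));
    }
}
```

If the IDs turn out to have a different shape (for example, a prefix added during seqdirectory), only resolve would need to change.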

            Thanks and Regards,
        S SYED ABDUL KATHER
                9731841519


On Fri, Nov 25, 2011 at 9:50 PM, Grant Ingersoll <[email protected]> wrote:

> I think you need to add the --clustering (-cl) option to your KMeans step.
>  By default, we only calculate the centroids.  FWIW, the ClusterDumper can
> also dump out the points associated w/ a cluster once you have run the
> --clustering step.
>
> Also I don't think the clusterData/part-randomSeed is the directory you
> want.
>
>
> On Nov 25, 2011, at 4:03 AM, syed kather wrote:
>
> > Team,
> > While reading the sequence file, it returns null.
> > These are the commands I executed.
> > To convert the input files to a chunked sequence file:
> >
> > raghu@Syed:/media/Work/mahout$ bin/mahout seqdirectory -i
> > /media/Work/mahout/examples/bin/sample/fileList -o
> > /media/Work/mahout/examples/bin/sample/seq-vector -c UTF-8 -chunk 5
> >
> > To convert the sequence file to sparse vectors:
> >
> > raghu@Syed:/media/Work/mahout$ bin/mahout seq2sparse -i
> > /media/Work/mahout/examples/bin/sample/seq-vector/ -o
> > /media/Work/mahout/examples/bin/sample/sparse
> >
> > To cluster the sparse vectors:
> >
> > raghu@Syed:/media/Work/mahout$ bin/mahout kmeans -i
> > /media/Work/mahout/examples/bin/sample/sparse/tfidf-vectors/ -c
> > /media/Work/mahout/examples/bin/sample/clusterData/ -o
> > /media/Work/mahout/examples/bin/sample/clusers -x 10 -k 20 -ow
> >
> > To dump the clusters with clusterdump:
> >
> > raghu@Syed:/media/Work/mahout$ bin/mahout clusterdump -s
> > /media/Work/mahout/examples/bin/sample/clusers/clusters-10 -d
> > /media/Work/mahout/examples/bin/sample/sparse/dictionary.file-0 -dt
> > sequencefile -b 100 -n 20
> > To get the documents that belong to each cluster, I wrote this code:
> > Configuration conf = new Configuration();
> > Path path1 = new Path("/media/Work/mahout/examples/bin/sample/clusterData/part-randomSeed");
> > FileSystem fs = FileSystem.get(path1.toUri(), conf);
> > SequenceFile.Reader reader = new SequenceFile.Reader(fs, path1, conf);
> > IntWritable key = new IntWritable();
> > WeightedVectorWritable value = new WeightedVectorWritable();
> > while (reader.next(key, value))
> > {
> >   System.out.println(value.toString() + " belongs to cluster " + key.toString());
> > }
> > reader.close();
> > But it returns null.
> > Please help me move forward.
> >
> > Thanks and Regards,
> > S SYED ABDUL KATHER
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
>
>
>
