Re: How to determine which cluster an item belongs to

praneet mhatre Sat, 07 Jan 2012 12:58:13 -0800

This seems to work perfectly. Thank you Sean!
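
For anyone finding this thread in the archive: both stack traces quoted below fail for the same reason, namely that a directory was opened as if it were a single sequence file. Here is a minimal sketch of the idea behind Sean's suggestion, using only the JDK and the local file system. It is not Mahout's SequenceFileDirIterable and it does not read sequence files. It only shows how a path can be expanded to the part-* files underneath it before any reader is opened. The class name PartFiles and the method resolve are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Mahout or Hadoop.
class PartFiles {

    // Returns the data files behind a path: the path itself when it is a
    // regular file, or its "part-*" children when it is a directory.
    // Returns an empty list when the path does not exist.
    static List<Path> resolve(Path path) throws IOException {
        List<Path> files = new ArrayList<>();
        if (Files.isDirectory(path)) {
            // The glob skips side files such as _SUCCESS or hidden .crc checksums.
            try (DirectoryStream<Path> children = Files.newDirectoryStream(path, "part-*")) {
                for (Path child : children) {
                    if (Files.isRegularFile(child)) {
                        files.add(child);
                    }
                }
            }
            Collections.sort(files); // deterministic order
        } else if (Files.isRegularFile(path)) {
            files.add(path);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory that mimics a clusteredPoints output folder.
        Path dir = Files.createTempDirectory("clusteredPoints");
        Files.createFile(dir.resolve("part-m-0"));
        Files.createFile(dir.resolve("_SUCCESS"));
        for (Path p : resolve(dir)) {
            System.out.println(p.getFileName()); // prints part-m-0
        }
    }
}
```

Each path returned this way is a regular file, so it could be handed to a reader one at a time. For real sequence files on a Hadoop file system, the class Sean names below is the more appropriate tool.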

On Sat, Jan 7, 2012 at 12:36 PM, praneet mhatre <[email protected]> wrote:


> Hi Sean,
>
> I tried passing the file instead of the directory, but doing so gives me the following error:
>
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-jcl/1.6.1/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 12/01/07 12:31:57 INFO dirichlet.DirichletDriver: Iteration 1
> 12/01/07 12:31:57 INFO dirichlet.DirichletDriver: Iteration 2
> 12/01/07 12:31:57 INFO dirichlet.DirichletDriver: Iteration 3
> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 4
> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 5
> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 6
> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 7
> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 8
> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 9
> 12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 10
> java.lang.IllegalStateException: file:/home/praneet/Eclipse-Output/output/clusters-10-final/clusters-10
>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator$1.apply(SequenceFileDirValueIterator.java:82)
>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator$1.apply(SequenceFileDirValueIterator.java:1)
>     at com.google.common.collect.Iterators$8.next(Iterators.java:667)
>     at com.google.common.collect.Iterators$5.hasNext(Iterators.java:475)
>     at com.google.common.collect.ForwardingIterator.hasNext(ForwardingIterator.java:39)
>     at org.apache.mahout.clustering.dirichlet.DirichletClusterMapper.loadClusters(DirichletClusterMapper.java:68)
>     at org.apache.mahout.clustering.dirichlet.DirichletDriver.clusterDataSeq(DirichletDriver.java:487)
>     at org.apache.mahout.clustering.dirichlet.DirichletDriver.clusterData(DirichletDriver.java:474)
>     at org.apache.mahout.clustering.dirichlet.DirichletDriver.run(DirichletDriver.java:172)
>     at org.apache.mahout.clustering.TestClusterDumper.testDirichlet2(TestClusterDumper.java:297)
>     at org.apache.mahout.clustering.Test.main(Test.java:40)
> Caused by: java.io.FileNotFoundException: /home/praneet/Eclipse-Output/output/clusters-10-final/clusters-10 (Is a directory)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:137)
>     at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:70)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:106)
>     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileValueIterator.<init>(SequenceFileValueIterator.java:51)
>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator$1.apply(SequenceFileDirValueIterator.java:78)
>     ... 10 more
>
> This is what I get when I try
>
> Path path = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints/part-m-0");
>
> instead of
>
> Path path = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints");
>
> Since the directory contains only one file, part-m-0, I should not need to
> read the whole directory. I'll still try the approach you suggested and see
> how things work out.
>
>
>
>
> On Fri, Jan 6, 2012 at 9:09 PM, Sean Owen <[email protected]> wrote:
>
>> The error is right there:
>>
>> Exception in thread "main" java.io.FileNotFoundException:
>> /home/praneet/Eclipse-Output/output/clusteredPoints (Is a directory)
>>
>> You are passing a directory, not a file.
>> Look at the class SequenceFileDirIterable for an easy way to iterate
>> over all files in a directory as key-value pairs.
>>
>> On Sat, Jan 7, 2012 at 3:01 AM, praneet mhatre <[email protected]> wrote:
>> > Hi Abin and Petar,
>> >
>> > I tried the above approach with Dirichlet clustering. I am using the
>> > following code snippet after clustering is completed.
>> >
>> >        Configuration conf = new Configuration();
>> >        FileSystem fs = FileSystem.get(conf);
>> >        Path path = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints");
>> >
>> >        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
>> >        IntWritable key = new IntWritable();
>> >        WeightedVectorWritable value = new WeightedVectorWritable();
>> >        while (reader.next(key, value))
>> >        {
>> >         System.out.print(value.toString() + " is in cluster " + key.toString());
>> >        }
>> >        System.out.println();
>> >
>> > But I am getting the following error:
>> >
>> > SLF4J: Class path contains multiple SLF4J bindings.
>> > SLF4J: Found binding in [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > SLF4J: Found binding in [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-jcl/1.6.1/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> > 12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 1
>> > 12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 2
>> > 12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 3
>> > 12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 4
>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 5
>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 6
>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 7
>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 8
>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 9
>> > 12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 10
>> > 12/01/06 18:47:47 INFO clustering.ClusterDumper: Wrote 10 clusters
>> > Exception in thread "main" java.io.FileNotFoundException: /home/praneet/Eclipse-Output/output/clusteredPoints (Is a directory)
>> >    at java.io.FileInputStream.open(Native Method)
>> >    at java.io.FileInputStream.<init>(FileInputStream.java:137)
>> >    at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:70)
>> >    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:106)
>> >    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
>> >    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
>> >    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
>> >    at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>> >    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>> >    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>> >    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>> >    at org.apache.mahout.clustering.Test.main(Test.java:46)
>> >
>> > Any suggestions?
>> >
>> > On Wed, Dec 28, 2011 at 12:25 AM, petar.mitrovic <[email protected]> wrote:
>> >
>> >> Hi Abin,
>> >>
>> >> Thank you very much! Your suggestion helped me a lot.
>> >>
>> >> First, I passed the named-vector option (-nv) to Mahout's vector
>> >> generation step (seq2sparse) so that it writes more descriptive vectors.
>> >>
>> >> Later, I could use something like this:
>> >>
>> >> IntWritable key = new IntWritable();
>> >> WeightedVectorWritable vector = new WeightedVectorWritable();
>> >> while (reader.next(key, vector)) {
>> >>        NamedVector nv = (NamedVector) vector.getVector();
>> >>        System.out.println(nv.getName() + " belongs to cluster " + key.toString());
>> >> }
>> >>
>> >> Hope this can be useful for someone else, too.
>> >>
>> >> Regards,
>> >> Petar
>> >>
>> >> --
>> >> View this message in context:
>> >> http://lucene.472066.n3.nabble.com/How-to-determine-which-cluster-an-item-belongs-to-tp3613013p3615979.html
>> >> Sent from the Mahout User List mailing list archive at Nabble.com.
>> >>
>> >
>> >
>> >
>> > --
>> > Praneet Mhatre
>> > Graduate Student
>> > Donald Bren School of ICS
>> > University of California, Irvine
>>
>
>
>
> --
> Praneet Mhatre
> Graduate Student
> Donald Bren School of ICS
> University of California, Irvine
>
>


-- 
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine