Re: ClusteringUtils for Kmeans output

Bikash Gupta Sun, 09 Mar 2014 08:05:27 -0700

For everyone's information:

I have successfully forced Mahout to build with Guava 11.0.2. The errors and
fixes are listed below.


1. Class: org.apache.mahout.math.stats.GroupTree
- Changed line 171 to: stack = new ArrayDeque<GroupTree>();
- Added the import: java.util.ArrayDeque
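
For anyone hitting the same NoSuchMethodError, the change amounts to building the deque with the JDK constructor instead of the Guava factory method. A minimal sketch of the idea, not the actual GroupTree code; the class DequeStackSketch and the method reverseWithStack are made-up names for illustration only:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for illustration; this is not Mahout's GroupTree.
class DequeStackSketch {

  // Uses a plain JDK ArrayDeque as a stack, so nothing here depends on
  // com.google.common.collect.Queues being available at runtime.
  static <T> List<T> reverseWithStack(List<T> items) {
    Deque<T> stack = new ArrayDeque<T>(); // instead of Queues.newArrayDeque()
    for (T item : items) {
      stack.push(item);
    }
    List<T> reversed = new ArrayList<T>();
    while (!stack.isEmpty()) {
      reversed.add(stack.pop());
    }
    return reversed;
  }

  public static void main(String[] args) {
    System.out.println(reverseWithStack(Arrays.asList(1, 2, 3)));
  }
}
```

Since java.util.ArrayDeque ships with the JDK, this form should behave the same whichever Guava version ends up on the classpath.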

2. Class: org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
- Guava 11.0.2 doesn't have Closer in com.google.common.io, so I used
try-with-resources instead
- Changed the Java version to 1.7
- Code changed as shown below

    try (ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
         DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream)) {
      PolymorphicWritable.write(dataOutputStream, lr);
      output = byteArrayOutputStream.toByteArray();
    }

    OnlineLogisticRegression read;

    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(output);
         DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
      read = PolymorphicWritable.read(dataInputStream, OnlineLogisticRegression.class);
    }
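
The same write-then-read pattern can be tried without Mahout on the classpath. A rough sketch, assuming a plain string payload: writeUTF/readUTF stand in for PolymorphicWritable.write/read, and the class RoundTripSketch and method roundTrip are hypothetical names. It needs Java 7 or later for try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper for illustration; writeUTF/readUTF replace
// PolymorphicWritable so the example has no Mahout dependency.
class RoundTripSketch {

  static String roundTrip(String value) {
    byte[] output;
    // Both streams are closed automatically, in reverse order of declaration.
    try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(value);
      out.flush();
      output = bytes.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException("in-memory write failed", e);
    }

    try (ByteArrayInputStream bytes = new ByteArrayInputStream(output);
         DataInputStream in = new DataInputStream(bytes)) {
      return in.readUTF();
    } catch (IOException e) {
      throw new IllegalStateException("in-memory read failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("mahout"));
  }
}
```

Compared with Closer, this removes the Guava dependency from the test entirely, at the cost of requiring a Java 7 source level.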

3. Class: org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
- Iterators.advance is not present in Guava 11.0.2, so I wrote the
equivalent loop by hand. Sample shown below

    int numberToAdvance = 1;
    int iterateNumberToAdvance;
    for (iterateNumberToAdvance = 0;
         iterateNumberToAdvance < numberToAdvance && iterator.hasNext();
         iterateNumberToAdvance++) {
      iterator.next();
    }
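
If the loop is needed in more than one test, it could be pulled into a small helper. A sketch only, assuming the helper should mirror what Iterators.advance does (skip up to N elements and report how many were actually skipped); the class IteratorAdvanceSketch is a made-up name, not part of Mahout or Guava:

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper for illustration; not part of Mahout or Guava.
class IteratorAdvanceSketch {

  // Calls next() up to numberToAdvance times, stopping early if the
  // iterator runs out, and returns how many elements were skipped.
  static int advance(Iterator<?> iterator, int numberToAdvance) {
    if (numberToAdvance < 0) {
      throw new IllegalArgumentException("numberToAdvance must not be negative");
    }
    int advanced = 0;
    while (advanced < numberToAdvance && iterator.hasNext()) {
      iterator.next();
      advanced++;
    }
    return advanced;
  }

  public static void main(String[] args) {
    Iterator<Integer> iterator = Arrays.asList(10, 20, 30).iterator();
    int skipped = advance(iterator, 2);
    System.out.println(skipped + " skipped, next is " + iterator.next());
  }
}
```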

If anyone has a better suggestion, please let me know.

@Suneel,

Going back to my original question: I was able to call ClusteringUtils for
Kmeans. However, I cannot use ClusterQualitySummarizer because it doesn't
support WeightedPropertyVectorWritable.



On Sun, Mar 9, 2014 at 6:28 PM, Bikash Gupta <[email protected]>wrote:

> Just FYI... downgrading guava to 11.0.2 has fixed the build error in
> mahout-math as suggested by Ted however it is causing some other build
> error in mahout-core
>
> [INFO] -------------------------------------------------------------
> [ERROR]
> /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[24,28]
> cannot find symbol
>   symbol:   class Closer
>   location: package com.google.common.io
> [ERROR]
> /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[289,5]
> cannot find symbol
>   symbol:   class Closer
>   location: class
> org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
> [ERROR]
> /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[289,21]
> cannot find symbol
>   symbol:   variable Closer
>   location: class
> org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
>
>
> On Sun, Mar 9, 2014 at 3:45 PM, Suneel Marthi <[email protected]>wrote:
>
>> Darn. U r the second guy to report that this week.  Change that line to
>> what ted suggested.  The issue is with guava incompatibility with Hadoop's
>> antiquated guava version.
>>
>> Sent from my iPhone
>>
>> On Mar 9, 2014, at 6:10 AM, Bikash Gupta <[email protected]>
>> wrote:
>>
>> I am successfully able to run ClusteringUtils on Kmeans (I still need to
>> check the scenario which you have mentioned). However, I am getting an
>> error from the TDigest class
>>
>> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque;
>>     at org.apache.mahout.math.stats.GroupTree$1.<init>(GroupTree.java:171)
>>     at org.apache.mahout.math.stats.GroupTree.iterator(GroupTree.java:169)
>>     at org.apache.mahout.math.stats.GroupTree.access$300(GroupTree.java:14)
>>     at org.apache.mahout.math.stats.GroupTree$2.iterator(GroupTree.java:317)
>>     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:105)
>>     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:88)
>>     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:76)
>>     at org.apache.mahout.math.stats.OnlineSummarizer.add(OnlineSummarizer.java:57)
>>     at org.apache.mahout.clustering.ClusteringUtils.summarizeClusterDistances(ClusteringUtils.java:65)
>>
>> A few days ago I saw a post where a user hit a similar issue in the TDigest
>> class. Ted suggested replacing the line with the code below
>>
>> stack = new ArrayDeque<GroupTree>();
>>
>> Let me know if I am correct.
>>
>>
>> On Sun, Mar 9, 2014 at 3:18 PM, Suneel Marthi <[email protected]>wrote:
>>
>>> U could call ClusterQualitySummarizer which then calls ClusteringUtils
>>> to spew out the different metrics u had specified.
>>> For an example, see the Streaming Kmeans section in
>>> examples/bin/cluster-reuters.sh.
>>>
>>> It calls 'qualcluster' with options -i <tf-idf vectors generated from
>>> seq2sparse> -c <output of Kmeans> -o <output file generated with the
>>> metrics>
>>>
>>>
>>> I have not tried this on KMeans and since the output format of KMeans is
>>> different from Streaming KMeans, this might just fall flat.
>>> Also it may fail to read some of the clusters if the clusters have only
>>> a single clustered point; this is due to the new TDigest summarizer, which
>>> expects at least 2 points in order to calculate max, quartiles, mean.
>>>
>>>
>>> On Sunday, March 9, 2014 4:19 AM, Bikash Gupta <[email protected]>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I want to use ClusteringUtils on Kmeans clusteredPoints to get
>>> summarizeClusterDistances , daviesBouldinIndex & dunnIndex
>>>
>>> Is there any sample or example how to use these features?
>>> --
>>> Thanks & Regards
>>> Bikash Kumar Gupta
>>>
>>
>>
>>
>> --
>> Thanks & Regards
>> Bikash Kumar Gupta
>>
>>
>
>
> --
> Thanks & Regards
> Bikash Kumar Gupta
>



-- 
Thanks & Regards
Bikash Kumar Gupta
