Re: ClusteringUtils for Kmeans output

Ted Dunning Sun, 09 Mar 2014 16:25:16 -0700

Exclusion sadly doesn't work because the resulting program will be running
with the class path of Hadoop unless you build a jar with dependencies.
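Because the program ends up running with Hadoop's classpath, it can help to check at runtime which jar a Guava class was actually loaded from. Below is a minimal sketch using only JDK reflection. The class name `WhichJar` is hypothetical and is not part of Mahout or Hadoop.

```java
import java.net.URL;
import java.security.CodeSource;

final class WhichJar {

  /**
   * Returns the location (jar or directory) the named class was loaded from, or null if
   * the class is not on the classpath or has no code source (e.g. JDK bootstrap classes).
   */
  static String locationOf(String className) {
    try {
      CodeSource source = Class.forName(className).getProtectionDomain().getCodeSource();
      URL location = (source == null) ? null : source.getLocation();
      return (location == null) ? null : location.toString();
    } catch (ClassNotFoundException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // Run this inside the job (e.g. via `hadoop jar`) to see which Guava actually wins.
    System.out.println("Guava Queues loaded from: "
        + locationOf("com.google.common.collect.Queues"));
  }
}
```

Running it as part of the job, not from the IDE, is what matters here, because the IDE classpath will usually differ from the one Hadoop assembles.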

On Sun, Mar 9, 2014 at 3:37 PM, Suneel Marthi <[email protected]> wrote:

> Thinking out loud here. If this is indeed a build error that you are seeing, a
> better fix would be to exclude Hadoop's Guava 11 transitive dependency in
> the pom rather than downgrading Mahout code to be Guava 11 compatible.
>
> If that does fix the issue, we may have missed excluding Hadoop's Guava 11
> jar during the recent patch for Hadoop 2 (this needs to be done for both the
> Hadoop 1 and Hadoop 2 profiles).
>
> On Sunday, March 9, 2014 2:14 PM, Bikash Gupta <[email protected]>
> wrote:
>
> MAHOUT-1442 has been created. Will submit the patch too.
>
>
> On Sun, Mar 9, 2014 at 9:03 PM, Ted Dunning <[email protected]> wrote:
>
> > Can you file a JIRA and attach your patch?
> >
> >
> > On Sun, Mar 9, 2014 at 8:03 AM, Bikash Gupta <[email protected]> wrote:
> >
> > > Info for everyone
> > >
> > > I have successfully forced Mahout to build with Guava 11.0.2. The errors
> > > and fixes are listed below.
> > >
> > > 1. Class: org.apache.mahout.math.stats.GroupTree
> > >    - Change line 171 to: stack = new ArrayDeque<GroupTree>();
> > >    - Import java.util.ArrayDeque;
> > >
> > > 2. Class: org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
> > >    - Guava 11.0.2 doesn't have Closer in com.google.common.io, so I used
> > >      try-with-resources instead
> > >    - Changed the Java source level to 1.7
> > >    - The code change is shown below:
> > >
> > >     // Serialize the model; both streams are closed automatically.
> > >     byte[] output;
> > >     try (ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
> > >          DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream)) {
> > >       PolymorphicWritable.write(dataOutputStream, lr);
> > >       dataOutputStream.flush();
> > >       output = byteArrayOutputStream.toByteArray();
> > >     }
> > >
> > >     // Deserialize it back from the byte array.
> > >     OnlineLogisticRegression read;
> > >     try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(output);
> > >          DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
> > >       read = PolymorphicWritable.read(dataInputStream, OnlineLogisticRegression.class);
> > >     }
> > >
> > > 3. Class: org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> > >    - Iterators.advance is not present in 11.0.2, so I inlined the
> > >      equivalent loop. Sample below:
> > >
> > >     int numberToAdvance = 1;
> > >     int iterateNumberToAdvance;
> > >     for (iterateNumberToAdvance = 0;
> > >          iterateNumberToAdvance < numberToAdvance && iterator.hasNext();
> > >          iterateNumberToAdvance++) {
> > >       iterator.next();
> > >     }
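If the loop is needed in more than one test, it could be pulled into a small helper. A sketch, assuming the goal is to mirror what Guava's `Iterators.advance` does (skip up to N elements and report how many were actually skipped); `IteratorAdvance` is a hypothetical name, not a Guava or Mahout class.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class IteratorAdvance {

  /**
   * Calls next() up to numberToAdvance times, stopping early if the iterator runs out.
   * Returns how many elements were actually skipped.
   */
  static int advance(Iterator<?> iterator, int numberToAdvance) {
    if (numberToAdvance < 0) {
      throw new IllegalArgumentException("numberToAdvance must be non-negative: " + numberToAdvance);
    }
    int advanced = 0;
    while (advanced < numberToAdvance && iterator.hasNext()) {
      iterator.next();
      advanced++;
    }
    return advanced;
  }

  public static void main(String[] args) {
    List<String> docs = Arrays.asList("a", "b", "c");
    Iterator<String> it = docs.iterator();
    int skipped = advance(it, 1);
    System.out.println(skipped + " skipped, next = " + it.next()); // 1 skipped, next = b
  }
}
```

Returning the count lets a test assert that the iterator really had enough elements, which the inline loop above does not check.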
> > >
> > > If anyone has a better suggestion, please flag it.
> > >
> > > @Suneel,
> > >
> > > Going back to my original question: I was able to call ClusteringUtils
> > > for KMeans, but I cannot use ClusterQualitySummarizer because it doesn't
> > > support WeightedPropertyVectorWritable.
> > >
> > > On Sun, Mar 9, 2014 at 6:28 PM, Bikash Gupta <[email protected]> wrote:
> > >
> > > > Just FYI: downgrading Guava to 11.0.2 fixed the build error in
> > > > mahout-math as suggested by Ted, but it causes other build errors in
> > > > mahout-core:
> > > >
> > > > [INFO] -------------------------------------------------------------
> > > > [ERROR] /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[24,28] cannot find symbol
> > > >   symbol:   class Closer
> > > >   location: package com.google.common.io
> > > > [ERROR] /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[289,5] cannot find symbol
> > > >   symbol:   class Closer
> > > >   location: class org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
> > > > [ERROR] /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[289,21] cannot find symbol
> > > >   symbol:   variable Closer
> > > >   location: class org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
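Since Guava 11 has no `Closer`, the JDK 7 try-with-resources statement is the natural replacement. A self-contained sketch of the same write/read round trip: `Weights` is a hypothetical stand-in for the model, its `write`/`read` methods are invented for illustration, and Mahout's `PolymorphicWritable` is not used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class TryWithResourcesRoundTrip {

  // Hypothetical stand-in for a Writable model.
  static final class Weights {
    final double[] values;

    Weights(double[] values) {
      this.values = values;
    }

    void write(DataOutputStream out) throws IOException {
      out.writeInt(values.length);
      for (double v : values) {
        out.writeDouble(v);
      }
    }

    static Weights read(DataInputStream in) throws IOException {
      double[] values = new double[in.readInt()];
      for (int i = 0; i < values.length; i++) {
        values[i] = in.readDouble();
      }
      return new Weights(values);
    }
  }

  static byte[] serialize(Weights w) throws IOException {
    // Both streams are closed automatically, in reverse order, even if write() throws.
    try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         DataOutputStream out = new DataOutputStream(bytes)) {
      w.write(out);
      out.flush();
      return bytes.toByteArray();
    }
  }

  static Weights deserialize(byte[] data) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return Weights.read(in);
    }
  }

  public static void main(String[] args) throws IOException {
    Weights copy = deserialize(serialize(new Weights(new double[] {0.5, -1.25})));
    System.out.println(copy.values[0] + " " + copy.values[1]); // 0.5 -1.25
  }
}
```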
> > > >
> > > >
> > > > On Sun, Mar 9, 2014 at 3:45 PM, Suneel Marthi <[email protected]> wrote:
> > > >
> > > >> Darn. You're the second person to report that this week. Change that
> > > >> line to what Ted suggested. The issue is incompatibility between newer
> > > >> Guava and Hadoop's antiquated Guava version.
> > > >>
> > > >>
> > > >> On Mar 9, 2014, at 6:10 AM, Bikash Gupta <[email protected]> wrote:
> > > >>
> > > >> I am able to run ClusteringUtils on KMeans output (I still need to check
> > > >> the scenario you mentioned). However, I am getting an error from the
> > > >> TDigest class:
> > > >>
> > > >> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque;
> > > >>     at org.apache.mahout.math.stats.GroupTree$1.<init>(GroupTree.java:171)
> > > >>     at org.apache.mahout.math.stats.GroupTree.iterator(GroupTree.java:169)
> > > >>     at org.apache.mahout.math.stats.GroupTree.access$300(GroupTree.java:14)
> > > >>     at org.apache.mahout.math.stats.GroupTree$2.iterator(GroupTree.java:317)
> > > >>     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:105)
> > > >>     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:88)
> > > >>     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:76)
> > > >>     at org.apache.mahout.math.stats.OnlineSummarizer.add(OnlineSummarizer.java:57)
> > > >>     at org.apache.mahout.clustering.ClusteringUtils.summarizeClusterDistances(ClusteringUtils.java:65)
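A `NoSuchMethodError` at runtime (rather than a compile error) usually means the code was compiled against one Guava version and is running against another. A hedged sketch of a fail-fast check using plain reflection; `GuavaFeatureCheck` is a hypothetical name and is not part of Mahout.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class GuavaFeatureCheck {

  /** True if className is loadable and has a public static no-arg method named methodName. */
  static boolean hasStaticNoArgMethod(String className, String methodName) {
    try {
      Method m = Class.forName(className).getMethod(methodName);
      return Modifier.isStatic(m.getModifiers());
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    if (!hasStaticNoArgMethod("com.google.common.collect.Queues", "newArrayDeque")) {
      throw new IllegalStateException("Guava on the classpath lacks Queues.newArrayDeque(); "
          + "an older Guava (e.g. Hadoop's Guava 11) is probably shadowing the expected one.");
    }
    System.out.println("Queues.newArrayDeque() is available");
  }
}
```

Calling such a check at job start-up turns a failure deep inside TDigest into an immediate, descriptive error.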
> > > >>
> > > >> A few days ago I saw a post where a user got a similar issue with the
> > > >> TDigest class. Ted suggested replacing the line with the code below:
> > > >>
> > > >> stack = new ArrayDeque<GroupTree>();
> > > >>
> > > >> Let me know if I am correct.
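The suggested replacement swaps Guava's `Queues.newArrayDeque()` for the plain JDK constructor; the Guava method just returns a new `ArrayDeque`, so behaviour should be unchanged. The field name `stack` suggests the iterator walks the tree with an explicit stack. The sketch below shows that pattern with a hypothetical `Node` class. It is not GroupTree's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class InOrderIteratorSketch {

  // Hypothetical minimal binary tree node; not Mahout's GroupTree.
  static final class Node {
    final int value;
    final Node left;
    final Node right;

    Node(int value, Node left, Node right) {
      this.value = value;
      this.left = left;
      this.right = right;
    }
  }

  /** In-order iterator that keeps pending ancestors on an explicit stack. */
  static Iterator<Integer> inOrder(final Node root) {
    return new Iterator<Integer>() {
      // Plain JDK constructor instead of Guava's Queues.newArrayDeque().
      private final Deque<Node> stack = new ArrayDeque<Node>();

      {
        pushLeftSpine(root);
      }

      private void pushLeftSpine(Node n) {
        while (n != null) {
          stack.push(n);
          n = n.left;
        }
      }

      @Override
      public boolean hasNext() {
        return !stack.isEmpty();
      }

      @Override
      public Integer next() {
        if (stack.isEmpty()) {
          throw new NoSuchElementException();
        }
        Node n = stack.pop();
        pushLeftSpine(n.right);
        return n.value;
      }

      @Override
      public void remove() {
        throw new UnsupportedOperationException();
      }
    };
  }

  public static void main(String[] args) {
    Node tree = new Node(2, new Node(1, null, null), new Node(3, null, null));
    Iterator<Integer> it = inOrder(tree);
    while (it.hasNext()) {
      System.out.print(it.next() + " "); // prints: 1 2 3
    }
  }
}
```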
> > > >>
> > > >>
> > > >> On Sun, Mar 9, 2014 at 3:18 PM, Suneel Marthi <[email protected]> wrote:
> > > >>
> > > >>> You could call ClusterQualitySummarizer, which in turn calls
> > > >>> ClusteringUtils to output the different metrics you specified.
> > > >>> For an example, see the Streaming KMeans section in
> > > >>> examples/bin/cluster-reuters.sh.
> > > >>>
> > > >>> It calls 'qualcluster' with the options:
> > > >>>   -i <tf-idf vectors generated from seq2sparse>
> > > >>>   -c <output of KMeans>
> > > >>>   -o <output file for the generated metrics>
> > > >>>
> > > >>> I have not tried this on KMeans, and since the output format of KMeans
> > > >>> is different from that of Streaming KMeans, this might just fall flat.
> > > >>> It may also fail to read some of the clusters if they contain only a
> > > >>> single clustered point, because the new TDigest summarizer expects at
> > > >>> least 2 points in order to calculate max, quartiles and mean.
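One possible workaround for the single-point case is to filter such clusters out before summarizing. A sketch with hypothetical names: it assumes the point-to-centroid distances have already been grouped by cluster ID, uses the 2-point minimum mentioned above, and does not call Mahout's summarizer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SingletonClusterGuard {

  // Per the thread, the TDigest-based summarizer expects at least 2 points per cluster.
  static final int MIN_POINTS = 2;

  /**
   * Returns the clusters with enough distances to summarize; IDs of the others are added
   * to skippedIds. Keys are (hypothetical) cluster IDs, values are point-to-centroid distances.
   */
  static Map<Integer, double[]> summarizable(Map<Integer, double[]> distancesByCluster,
                                             List<Integer> skippedIds) {
    Map<Integer, double[]> kept = new LinkedHashMap<Integer, double[]>();
    for (Map.Entry<Integer, double[]> entry : distancesByCluster.entrySet()) {
      if (entry.getValue().length >= MIN_POINTS) {
        kept.put(entry.getKey(), entry.getValue());
      } else {
        skippedIds.add(entry.getKey());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<Integer, double[]> input = new LinkedHashMap<Integer, double[]>();
    input.put(0, new double[] {0.1, 0.4, 0.2});
    input.put(1, new double[] {0.3}); // singleton cluster
    List<Integer> skipped = new ArrayList<Integer>();
    Map<Integer, double[]> kept = summarizable(input, skipped);
    System.out.println("summarize " + kept.keySet() + ", skipped " + skipped); // summarize [0], skipped [1]
  }
}
```

Reporting the skipped IDs, rather than silently dropping them, keeps it visible how many clusters were left out of the quality metrics.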
> > > >>>
> > > >>> On Sunday, March 9, 2014 4:19 AM, Bikash Gupta <[email protected]> wrote:
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> I want to use ClusteringUtils on KMeans clusteredPoints to get
> > > >>> summarizeClusterDistances, daviesBouldinIndex and dunnIndex.
> > > >>>
> > > >>> Is there a sample or example showing how to use these features?
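For reference, the two indices can be computed directly on small in-memory clusters. This sketch uses one common textbook definition of each with Euclidean distance; Mahout's `ClusteringUtils` may differ in details (distance measure, how scatter or diameter is defined), so treat it as an illustration rather than its implementation. `ClusterIndices` is a hypothetical name.

```java
final class ClusterIndices {

  static double dist(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  static double[] centroid(double[][] points) {
    double[] c = new double[points[0].length];
    for (double[] p : points) {
      for (int i = 0; i < c.length; i++) {
        c[i] += p[i] / points.length;
      }
    }
    return c;
  }

  /** Davies-Bouldin: mean over clusters i of max_{j != i} (s_i + s_j) / d(c_i, c_j). Lower is better. */
  static double daviesBouldin(double[][][] clusters) {
    int k = clusters.length;
    double[][] centroids = new double[k][];
    double[] scatter = new double[k]; // s_i: mean distance of cluster i's points to its centroid
    for (int i = 0; i < k; i++) {
      centroids[i] = centroid(clusters[i]);
      for (double[] p : clusters[i]) {
        scatter[i] += dist(p, centroids[i]) / clusters[i].length;
      }
    }
    double total = 0;
    for (int i = 0; i < k; i++) {
      double worst = 0;
      for (int j = 0; j < k; j++) {
        if (j != i) {
          worst = Math.max(worst, (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j]));
        }
      }
      total += worst;
    }
    return total / k;
  }

  /** Dunn: smallest distance between points of different clusters / largest cluster diameter. Higher is better. */
  static double dunn(double[][][] clusters) {
    double minBetween = Double.POSITIVE_INFINITY;
    double maxDiameter = 0;
    for (int i = 0; i < clusters.length; i++) {
      for (int j = 0; j < clusters.length; j++) {
        for (double[] p : clusters[i]) {
          for (double[] q : clusters[j]) {
            double d = dist(p, q);
            if (i == j) {
              maxDiameter = Math.max(maxDiameter, d);
            } else {
              minBetween = Math.min(minBetween, d);
            }
          }
        }
      }
    }
    return minBetween / maxDiameter;
  }

  public static void main(String[] args) {
    double[][][] clusters = {
        {{0, 0}, {0, 2}},
        {{10, 0}, {10, 2}}
    };
    System.out.println("Davies-Bouldin: " + daviesBouldin(clusters)); // 0.2
    System.out.println("Dunn: " + dunn(clusters)); // 5.0
  }
}
```

In the example, each cluster has scatter 1 and diameter 2, the centroids are 10 apart, and the closest cross-cluster points are also 10 apart, giving 2/10 = 0.2 and 10/2 = 5.0.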
> > > >>> --
> > > >>> Thanks & Regards
> > > >>> Bikash Kumar Gupta
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Thanks & Regards
> > > >> Bikash Kumar Gupta
> > > >>
> > > >>
> > > >
> > > >
> > > > --
> > > > Thanks & Regards
> > > > Bikash Kumar Gupta
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks & Regards
> > > Bikash Kumar Gupta
> > >
> >
>
>
>
> --
> Thanks & Regards
> Bikash Kumar Gupta
