Sadly, exclusion doesn't work. Unless you build a jar with dependencies, the resulting program runs with Hadoop's class path, so Hadoop's Guava 11 is what gets loaded at run time no matter what the pom excludes.
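
For anyone who wants to confirm which jar actually wins at run time, here is a minimal sketch using only the JDK. The class name ClassOriginSketch and the method originOf are made up for illustration and are not part of Mahout or Hadoop; the only assumption is that you launch it the same way you launch the failing job, so that it sees the same class path.

```java
import java.security.CodeSource;

// Hypothetical diagnostic, not Mahout code: reports where a class was loaded from.
class ClassOriginSketch {

  // Returns "<class name> <- <origin>", where origin is the jar or directory URL,
  // or a short note when the class is missing or has no code source.
  static String originOf(String className) {
    try {
      // Load without initializing, so static initializers cannot interfere.
      Class<?> cls = Class.forName(className, false, ClassOriginSketch.class.getClassLoader());
      CodeSource source = cls.getProtectionDomain().getCodeSource();
      if (source == null || source.getLocation() == null) {
        return className + " <- no code source (typically a JDK class)";
      }
      return className + " <- " + source.getLocation();
    } catch (ClassNotFoundException e) {
      return className + " <- not on the classpath";
    }
  }

  public static void main(String[] args) {
    // The Guava class from the NoSuchMethodError reported in this thread.
    System.out.println(originOf("com.google.common.collect.Queues"));
    // A JDK class, for comparison.
    System.out.println(originOf("java.util.ArrayDeque"));
  }
}
```

If the first line printed points at a Guava 11 jar from the Hadoop installation, the pom exclusion had no effect on what the job sees at run time.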

On Sun, Mar 9, 2014 at 3:37 PM, Suneel Marthi <[email protected]> wrote:

> Thinking loud here. If this is indeed a build error that u r seeing, a
> better fix would be to exclude hadoop's guava 11 transitive dependency in
> the pom as opposed to having to downgrade Mahout code to be guava 11
> compatible.
>
> We might have missed excluding Hadoop's Guava 11 jar during the recent
> patch for Hadoop 2 (this needs to be done for both hadoop 1 & 2 profiles)
> if that indeed fixes the issue.
>
> On Sunday, March 9, 2014 2:14 PM, Bikash Gupta <[email protected]>
> wrote:
>
> MAHOUT-1442 has been created. Will submit the patch too.
>
> On Sun, Mar 9, 2014 at 9:03 PM, Ted Dunning <[email protected]> wrote:
>
> > Can you file a JIRA and attach your patch?
> >
> > On Sun, Mar 9, 2014 at 8:03 AM, Bikash Gupta <[email protected]> wrote:
> >
> > > Info for everyone
> > >
> > > I have successfully forced Mahout to build with Guava 11.0.2. Errors
> > > and fixes are listed below.
> > >
> > > 1. Class: org.apache.mahout.math.stats.GroupTree
> > >    - Change line 171 to: stack = new ArrayDeque<GroupTree>();
> > >    - Import java.util.ArrayDeque;
> > >
> > > 2. Class: org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
> > >    - 11.0.2 doesn't have Closer in IO, hence I have used
> > >      try-with-resources
> > >    - changed java to 1.7
> > >    - code changed as shown below
> > >
> > >      try (ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
> > >           DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream)) {
> > >        PolymorphicWritable.write(dataOutputStream, lr);
> > >        output = byteArrayOutputStream.toByteArray();
> > >      }
> > >
> > >      OnlineLogisticRegression read;
> > >
> > >      try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(output);
> > >           DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
> > >        read = PolymorphicWritable.read(dataInputStream, OnlineLogisticRegression.class);
> > >      }
> > >
> > > 3. Class: org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> > >    - Iterators.advance was not present in 11.0.2, hence I just added
> > >      the equivalent code. Sample shown below
> > >
> > >      int numberToAdvance = 1;
> > >      int iterateNumberToAdvance;
> > >      for (iterateNumberToAdvance = 0;
> > >           iterateNumberToAdvance < numberToAdvance && iterator.hasNext();
> > >           iterateNumberToAdvance++) {
> > >        iterator.next();
> > >      }
> > >
> > > If anyone has a good suggestion then please flag.
> > >
> > > @Suneel,
> > >
> > > Going back to my original question. I was able to call ClusteringUtils
> > > for Kmeans, however I cannot use ClusterQualitySummarizer because it
> > > doesn't support WeightedPropertyVectorWritable.
> > >
> > > On Sun, Mar 9, 2014 at 6:28 PM, Bikash Gupta <[email protected]> wrote:
> > >
> > > > Just FYI... downgrading guava to 11.0.2 has fixed the build error in
> > > > mahout-math as suggested by Ted, however it is causing some other
> > > > build errors in mahout-core
> > > >
> > > > [INFO] -------------------------------------------------------------
> > > > [ERROR] /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[24,28] cannot find symbol
> > > >   symbol: class Closer
> > > >   location: package com.google.common.io
> > > > [ERROR] /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[289,5] cannot find symbol
> > > >   symbol: class Closer
> > > >   location: class org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
> > > > [ERROR] /mahout-trunk/core/src/test/java/org/apache/mahout/classifier/sgd/OnlineLogisticRegressionTest.java:[289,21] cannot find symbol
> > > >   symbol: variable Closer
> > > >   location: class org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
> > > >
> > > > On Sun, Mar 9, 2014 at 3:45 PM, Suneel Marthi <[email protected]> wrote:
> > > >
> > > > > Darn. U r the second guy to report that this week. Change that
> > > > > line to what Ted suggested. The issue is with guava
> > > > > incompatibility with Hadoop's antiquated guava version.
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > On Mar 9, 2014, at 6:10 AM, Bikash Gupta <[email protected]> wrote:
> > > > >
> > > > > I am successfully able to run ClusteringUtils on Kmeans (need to
> > > > > check the scenario which you have mentioned). However I am getting
> > > > > an error from the TDigest class
> > > > >
> > > > > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque;
> > > > >     at org.apache.mahout.math.stats.GroupTree$1.<init>(GroupTree.java:171)
> > > > >     at org.apache.mahout.math.stats.GroupTree.iterator(GroupTree.java:169)
> > > > >     at org.apache.mahout.math.stats.GroupTree.access$300(GroupTree.java:14)
> > > > >     at org.apache.mahout.math.stats.GroupTree$2.iterator(GroupTree.java:317)
> > > > >     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:105)
> > > > >     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:88)
> > > > >     at org.apache.mahout.math.stats.TDigest.add(TDigest.java:76)
> > > > >     at org.apache.mahout.math.stats.OnlineSummarizer.add(OnlineSummarizer.java:57)
> > > > >     at org.apache.mahout.clustering.ClusteringUtils.summarizeClusterDistances(ClusteringUtils.java:65)
> > > > >
> > > > > A few days ago I saw a post where a user got a similar issue with
> > > > > the TDigest class. Ted suggested replacing the line with the code
> > > > > below
> > > > >
> > > > > stack = new ArrayDeque<GroupTree>();
> > > > >
> > > > > Let me know if I am correct.
> > > > >
> > > > > On Sun, Mar 9, 2014 at 3:18 PM, Suneel Marthi <[email protected]> wrote:
> > > > >
> > > > > > U could call ClusterQualitySummarizer which then calls
> > > > > > ClusteringUtils to spew out the different metrics u had
> > > > > > specified. For an example, see the Streaming Kmeans section in
> > > > > > examples/bin/cluster-reuters.sh.
> > > > > >
> > > > > > It calls 'qualcluster' with options
> > > > > >   -i <tf-idf vectors generated from seq2sparse>
> > > > > >   -c <output of Kmeans>
> > > > > >   -o <output file generated with the metrics>
> > > > > >
> > > > > > I have not tried this on KMeans and since the output format of
> > > > > > KMeans is different from Streaming KMeans, this might just fall
> > > > > > flat. Also it may fail to read some of the clusters if the
> > > > > > clusters have only a single clustered point; this is due to the
> > > > > > new TDigest summarizer, which expects at least 2 points in order
> > > > > > to calculate max, quartiles, mean.
> > > > > >
> > > > > > On Sunday, March 9, 2014 4:19 AM, Bikash Gupta <[email protected]> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I want to use ClusteringUtils on Kmeans clusteredPoints to get
> > > > > > summarizeClusterDistances, daviesBouldinIndex & dunnIndex
> > > > > >
> > > > > > Is there any sample or example of how to use these features?
> > > > > > --
> > > > > > Thanks & Regards
> > > > > > Bikash Kumar Gupta
> > > > >
> > > > > --
> > > > > Thanks & Regards
> > > > > Bikash Kumar Gupta
> > > >
> > > > --
> > > > Thanks & Regards
> > > > Bikash Kumar Gupta
> > >
> > > --
> > > Thanks & Regards
> > > Bikash Kumar Gupta
>
> --
> Thanks & Regards
> Bikash Kumar Gupta
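
The three workarounds described in the quoted thread all come down to replacing a newer Guava call with plain JDK code. A minimal, self-contained sketch of that idea follows. The class GuavaFreeSketch and its method names are hypothetical, and writeUTF/readUTF stand in for PolymorphicWritable.write/read so that the example runs without Mahout on the class path; it is not the patch attached to MAHOUT-1442.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical helper class; names are illustrative, not Mahout code.
class GuavaFreeSketch {

  // JDK-only stand-in for Iterators.advance: skips up to numberToAdvance
  // elements and returns how many were actually skipped.
  static int advance(Iterator<?> iterator, int numberToAdvance) {
    if (numberToAdvance < 0) {
      throw new IllegalArgumentException("numberToAdvance must be >= 0");
    }
    int skipped = 0;
    while (skipped < numberToAdvance && iterator.hasNext()) {
      iterator.next();
      skipped++;
    }
    return skipped;
  }

  // JDK-only stand-in for Queues.newArrayDeque().
  static <T> Deque<T> newStack() {
    return new ArrayDeque<T>();
  }

  // Write/read round trip with try-with-resources (Java 7+) instead of Closer.
  // writeUTF/readUTF are placeholders for the real serialization calls.
  static String roundTrip(String value) {
    try {
      byte[] output;
      try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
           DataOutputStream out = new DataOutputStream(bytes)) {
        out.writeUTF(value);
        out.flush();
        output = bytes.toByteArray();
      }
      try (ByteArrayInputStream bytes = new ByteArrayInputStream(output);
           DataInputStream in = new DataInputStream(bytes)) {
        return in.readUTF();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Iterator<String> it = Arrays.asList("a", "b", "c").iterator();
    System.out.println(advance(it, 1));        // number of elements skipped
    System.out.println(it.next());             // next element after the skip
    Deque<String> stack = newStack();
    stack.push("node");
    System.out.println(stack.peek());
    System.out.println(roundTrip("model bytes"));
  }
}
```

Keeping the replacements in small static helpers means the call sites change by one line each, which should make them easy to revert once the job no longer runs against Guava 11.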
