Cool! Also, DisplaySpectralClustering does not work. It has some problems with the data directory names, and I did not manage to track them down in Eclipse.
https://issues.apache.org/jira/browse/MAHOUT-524

On Wed, Sep 7, 2011 at 5:45 AM, Dan Brickley <[email protected]> wrote:
> Trying to run https://cwiki.apache.org/MAHOUT/spectral-clustering.html
> ... seems perhaps some code rot?
>
> Can anyone else report success with Spectral clustering against recent
> trunk?
>
> Trying bin/mahout spectralkmeans -k 2 -i speccy -o specout --maxIter
> 10 --dimensions 37
>
> ...with the small example affinity file we discussed yesterday, I hit
> a series of problems.
>
> data: http://danbri.org/2011/mahout/afftest.txt
>
> 1. As I mentioned in comments in
> http://spectrallyclustered.wordpress.com/2010/07/14/sprint-3-quick-update/
> (both for local pseudo-cluster, and a real one) I had to patch in
> calls to job.setJarByClass before job.waitForCompletion. This problem
> occurred for others elsewhere in Mahout, e.g. MAHOUT-428 and
> MAHOUT-197, but I presume it can't be hitting everyone. From grepping
> around, this might not be the only component missing setJarByClass
> calls. Or is this just me, somehow?
>
> 2. Newlines in the input data made it fail, but the associated warning
> from AffinityMatrixInputMapper was very vague. I'd suggest allowing
> those and #-comments, but maybe not a good idea to make per-component
> syntax designs? Suggest also it's worth printing the problem line (see
> patch below) when complaining.
>
> 3. Failing to load the affinity matrix (surely a requirement for
> further progress?) does not seem to halt the job, I see exceptions
> mixed in with ongoing processing (until a later problem hits us).
> Transcript: https://gist.github.com/1200455 ... actually it wasn't
> clear if the newline problem was more of a warning, and other rows
> from the input data were accepted. In which case, reporting them as
> java.io.IOException seems a bit draconian. So maybe bits of the input
> file were in fact loaded. It would be great to clarify what expected
> behaviour is.
>
>
> 4. After all that, the job still fails. Full transcript here:
> https://gist.github.com/1200428
>
> Excerpt: (I've added a bit more reporting output in a few places)
>
> 11/09/07 14:25:06 INFO common.VectorCache: Loading vector from:
> specout/calculations/diagonal/part-r-00000
> Exception in thread "main" java.util.NoSuchElementException
>        at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
>        at org.apache.mahout.clustering.spectral.common.VectorCache.load(VectorCache.java:121)
>
> However that file does exist in hdfs, and seqdumper seems to accept
> it; it just seems empty:
>
> Input Path: specout/calculations/diagonal/part-r-00000
> Key class: class org.apache.hadoop.io.NullWritable Value Class: class
> org.apache.mahout.math.VectorWritable
> Count: 0
>
> I've posted an informal composite patch at
>
> https://raw.github.com/gist/1200439/4ad433b51e9d963cff5d500d974fa5cb6904b9c3/gistfile1.txt
> ... if you can confirm the above issues and a breakdown into JIRAs,
> I'll attach cleaner patches where appropriate.
>
> Looking forward to getting this running,
>
> cheers,
>
> Dan
>

--
Lance Norskog
[email protected]
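
On point 2 in Dan's message (blank lines, #-comments, and naming the offending line), here is a rough, standalone sketch of what tolerant parsing could look like. It is not the AffinityMatrixInputMapper code and not the patch in the gist. The class name AffinityLineParser, the Entry type, and the assumption that each data line is a comma-separated i,j,value triplet are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Mahout: parses "i,j,value" affinity lines. */
public class AffinityLineParser {

    /** One parsed affinity entry: row index, column index, affinity value. */
    public static final class Entry {
        public final int row;
        public final int col;
        public final double value;

        Entry(int row, int col, double value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    /**
     * Returns null for blank lines and '#' comment lines.
     * Throws IllegalArgumentException quoting the offending line otherwise
     * malformed, so the user can see which input caused the complaint.
     */
    public static Entry parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null; // skip silently: not data, not an error
        }
        String[] parts = trimmed.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected i,j,value but got: '" + line + "'");
        }
        try {
            return new Entry(
                Integer.parseInt(parts[0].trim()),
                Integer.parseInt(parts[1].trim()),
                Double.parseDouble(parts[2].trim()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Unparseable number in line: '" + line + "'", e);
        }
    }

    /** Parses all lines, keeping only the real data entries. */
    public static List<Entry> parseAll(Iterable<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            Entry entry = parseLine(line);
            if (entry != null) {
                entries.add(entry);
            }
        }
        return entries;
    }
}
```

Whether a malformed line should abort the whole job or just be skipped with a warning is exactly the open question in point 3; this sketch throws, but the same check could log and continue instead.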
