Done. I've committed a change that checks the debug level before evaluating expensive debug statements, here and throughout the code.
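
For anyone following along, here is a minimal sketch of the guard pattern. It is not the committed change itself. To stay self-contained it uses java.util.logging, where the check is isLoggable(Level.FINE); the check suggested in the quoted message is log.isDebugEnabled(). The class name, the formatCalls counter, and this formatVector are hypothetical stand-ins for illustration and are not Mahout code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch only: names here are hypothetical, not Mahout's.
final class GuardedDebugLogging {
    static final Logger LOG = Logger.getLogger(GuardedDebugLogging.class.getName());

    // Counts how often the expensive formatter runs, so the effect is visible.
    static int formatCalls = 0;

    // Stand-in for an expensive formatter (assumption: the real
    // AbstractCluster.formatVector does considerably more work than this).
    static String formatVector(double[] v) {
        formatCalls++;
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < v.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(v[i]);
        }
        return sb.append(']').toString();
    }

    // The guard: the message string is only built when debug-level
    // (FINE) logging is actually enabled for this logger.
    static void addPoint(double[] point) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Added point: " + formatVector(point));
        }
    }

    public static void main(String[] args) {
        double[] point = {1.0, 2.0, 3.0};

        LOG.setLevel(Level.INFO);   // debug disabled
        addPoint(point);
        System.out.println("format calls with INFO: " + formatCalls); // 0

        LOG.setLevel(Level.FINE);   // debug enabled
        addPoint(point);
        System.out.println("format calls with FINE: " + formatCalls); // 1
    }
}
```

Without the guard, the string concatenation (and therefore formatVector) runs on every call, even when the resulting message is discarded. Parameterized logging does not help here either, because the argument expression is still evaluated before the logging call is made.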

2011/11/7 WangRamon <[email protected]>:
> Hi All,
>
> I'm using CanopyClusterer. The input is vectors of type
> RandomAccessSparseVector, and each vector may have 1~99 attributes. When I
> run CanopyClusterer on Hadoop it is very slow, so I took a stack trace of
> the map tasks and found the following output:
>
>   at org.apache.mahout.clustering.AbstractCluster.formatVector(AbstractCluster.java:301)
>   at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:161)
>
> Line 161 of CanopyClusterer is just a log output statement. It should be
> wrapped in something like "if(log.isDebugEnabled())" so that it does not run
> when the log level is not debug. But this is not the root cause; the root
> cause in my case is that AbstractCluster.formatVector is so slow to complete.
> After I commented out "AbstractCluster.formatVector", everything went well.
> Can anybody have a look at this? Thank you very much.
>
> Cheers
> Ramon
