Before looking at your results, I would like to say that Mahout is about scalable data mining. Performance on small data sets is explicitly not a goal.
For small data sets like this, you would do much, much better to do your work in a conventional system like R where everything can fit in memory and clustering can take less than a second.

On Wed, Dec 29, 2010 at 2:39 PM, Samir Raiyani <[email protected]> wrote:

> We have been testing Mahout in a few different configurations and it seems
> to take a significant amount of time (several minutes to over an hour) for
> small document sets (3,000 documents and 7,000 documents). Is this type of
> performance normal?
>
