Actually, there is a whole set of "generic" classes that do caching by using FastMap classes (you can check out the Mahout source from the Apache repository). These implementations give you the same effect as EhCache, by holding all of the data in memory.
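Here is a minimal sketch of the "hold everything in memory" behaviour. It is not Mahout code. The class `EagerPreferenceStore` and its `Preference` record are hypothetical, and `java.util.HashMap` stands in for Mahout's FastMap. The point is that every record is copied onto the heap inside the constructor, so later lookups never go back to the original data source. Records require Java 16 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration (not Mahout's API): every preference is copied
 * into an on-heap map when the object is constructed, so all later lookups
 * are pure in-memory reads. java.util.HashMap stands in for Mahout's FastMap.
 */
public class EagerPreferenceStore {

    /** One (user, item, value) triple, as it might be read from a file or database. */
    public record Preference(long userId, long itemId, float value) {}

    // userId -> (itemId -> preference value); fully populated in the constructor
    private final Map<Long, Map<Long, Float>> prefsByUser = new HashMap<>();

    public EagerPreferenceStore(List<Preference> allPreferences) {
        // The whole data set is loaded up front; nothing is fetched lazily later.
        for (Preference p : allPreferences) {
            prefsByUser.computeIfAbsent(p.userId(), id -> new HashMap<>())
                       .put(p.itemId(), p.value());
        }
    }

    /** Returns the stored value, or null if the user never rated the item. */
    public Float getPreference(long userId, long itemId) {
        Map<Long, Float> items = prefsByUser.get(userId);
        return items == null ? null : items.get(itemId);
    }

    public int numUsers() {
        return prefsByUser.size();
    }

    public static void main(String[] args) {
        EagerPreferenceStore store = new EagerPreferenceStore(List.of(
                new Preference(1L, 10L, 4.0f),
                new Preference(1L, 11L, 2.5f),
                new Preference(2L, 10L, 5.0f)));
        System.out.println(store.numUsers());               // 2
        System.out.println(store.getPreference(1L, 11L));   // 2.5
        System.out.println(store.getPreference(2L, 11L));   // null
    }
}
```

Because the constructor walks the entire input, memory use grows with the full data set, which is exactly why this approach only works when everything fits in the JVM heap.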
The drawback of relying only on Mahout's on-heap caching is that it happens while these objects are constructed: all of the data is loaded into memory up front, not incrementally as could be implemented with EhCache. If you are not going to do distributed calculations with MapReduce algorithms, you'll need caching to speed things up. If your data isn't too big and fits comfortably into the JVM heap, you can use Mahout without EhCache. If you can't load all the data at once, you should implement your own caching (EhCache itself can be used for this) and manage memory yourself so that you don't run out of it.

On 11 September 2011 07:32, Ted Dunning wrote:

> Caching in-process like this is likely to have much more satisfactory
> results than an external caching process. Also, caching structures with
> repetitive access patterns is obviously better than caching single access
> data. Thus caching small side data works well. Map inputs do not.
>
> On Sat, Sep 10, 2011 at 6:28 PM, Robin Anil wrote:
>
> > I once wrote a simple cache for HBaseDatastore in naive Bayes classifier
> > package and yes the speedup was really awesome, weights of high freq words
> > got cached and incremental lookup for rest of the words in a document was
> > really low. I had posted numbers on the old JIRA ticket
> > On Sep 11, 2011 12:36 AM, "Dhruv Kumar" wrote:
> > > Has anyone over here used EHcache with Mahout (or pure Hadoop jobs)?
> > >
> > > http://ehcache.org/
> > >
> > > For iterative MapReduce applications running on a NoSQL data store, it
> > > should provide a good performance boost by providing an in-memory object
> > > cache (I think). Any comments?

--
Marko Ćirić
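The suggestion above to implement your own caching when the data does not fit in the heap, and Robin's cache of high-frequency word weights, can be sketched as a size-bounded, incrementally filled LRU cache. This is a swapped-in technique using plain `java.util.LinkedHashMap` rather than EhCache; the class `BoundedLruCache` is hypothetical, and the `loader` function is a placeholder for whatever slow lookup you have (for example an HBase read). Values are fetched only on first access, and the least recently used entry is evicted once `maxEntries` is exceeded, so heap usage stays bounded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: a size-bounded, lazily filled LRU cache on top of
 * java.util.LinkedHashMap. Not thread-safe. The loader is assumed never to
 * return null (a null result would simply be reloaded on every call).
 */
public class BoundedLruCache<K, V> {

    private final Function<K, V> loader;   // placeholder for a slow external lookup
    private final Map<K, V> entries;
    private long misses = 0;

    public BoundedLruCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true: the eldest entry is the least recently accessed one
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict once the bound is exceeded, keeping memory use capped
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, loading and caching it on a miss. */
    public V get(K key) {
        V value = entries.get(key);   // a hit also refreshes the entry's recency
        if (value == null) {
            misses++;
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    public long misses() { return misses; }

    public int size() { return entries.size(); }

    public static void main(String[] args) {
        // Stand-in loader: pretend each lookup is an expensive remote read.
        BoundedLruCache<String, Integer> weights =
                new BoundedLruCache<>(2, word -> word.length());
        weights.get("the");      // miss
        weights.get("the");      // hit
        weights.get("cache");    // miss
        weights.get("mahout");   // miss, evicts "the"
        weights.get("the");      // miss again, evicts "cache"
        System.out.println(weights.misses());  // 4
        System.out.println(weights.size());    // 2
    }
}
```

Frequently accessed keys stay resident while rarely used ones are evicted, which matches Ted's point that caching data with repetitive access patterns pays off and caching single-access data does not.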
