I think HBase might be a little slow for querying large amounts of data. A random read request normally takes 10-30 ms. Even in a parallel/MapReduce setting, each query still has to go from the region server to the data node, which takes time. I suspect HBase would become an I/O bottleneck for a large-scale machine learning algorithm.
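
To put rough numbers on that, here is a back-of-envelope sketch. Only the 10-30 ms latency range comes from what I said above; the 10 million reads, the reader counts, and the function name are made-up figures for illustration. It assumes every read is an independent random read and ignores caching and sequential scans, which can behave quite differently.

```python
# Back-of-envelope estimate of wall-clock time spent on random reads.
# Everything except the 10-30 ms latency range is an illustrative assumption.

def total_read_hours(num_reads, latency_ms, parallel_readers=1):
    """Hours needed for num_reads random reads at latency_ms each,
    split evenly across parallel_readers independent readers."""
    total_ms = num_reads * latency_ms / parallel_readers
    return total_ms / 1000.0 / 3600.0


if __name__ == "__main__":
    reads = 10_000_000  # hypothetical number of messages to fetch
    for latency in (10, 30):        # ms per random read
        for readers in (1, 100):    # hypothetical parallel readers
            hours = total_read_hours(reads, latency, readers)
            print(f"{latency} ms/read, {readers} reader(s): {hours:.2f} h")
```

If the algorithm makes several passes over the data, multiply the result by the number of passes.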

Best wishes,
Stanley Xu

On Mon, Jul 25, 2011 at 8:53 PM, NightWolf <[email protected]> wrote:

> Hi all,
>
> I'm working on a large text classification project and we have our text
> data (simple messages) stored in HBase.
>
> We have two problems. First, we would like to use HBase as the source for
> Mahout classifiers, namely Bayes and Random Forests.
>
> Second, we would like to be able to store the generated model in HBase
> instead of using the in-memory approach (InMemoryBayesDatastore). As our
> sets grow we are running into problems with memory utilization, and we
> would like to test out HBase as a viable alternative.
>
> There seems to be little material floating around on using HBase with
> Mahout, or on whether it's possible to use it as a datasource. I'm using
> the Mahout 0.6 core API in Java, which has the InMemory datastore.
>
> Doing a bit of digging, I believe that there was an HBase Bayes datastore
> component:
> org.apache.mahout.classifier.bayes.datastore.HBaseBayesDatastore
> See the older JavaDoc here:
>
> http://www.jarvana.com/jarvana/view/org/apache/mahout/mahout-core/0.3/mahout-core-0.3-javadoc.jar!/org/apache/mahout/classifier/bayes/datastore/HBaseBayesDatastore.html
>
> However, looking at the latest documentation, it looks like this feature
> has disappeared..? https://builds.apache.org/job/Mahout-Quality/javadoc/
>
> I wanted to know if it is still possible to use HBase as a datasource for
> Bayes and Random Forests, and whether there are any previous use cases
> for this.
>
> Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/HBase-Mahout-Using-HBase-as-a-Datastore-source-for-Mahout-Classification-tp3197368p3197368.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
