I think HBase might be a little slow for large data queries. A random read
request normally takes 10-30 ms.
Even in a parallel/MapReduce setting, it still takes some time to
query from the region server to the data node. I strongly suspect HBase
would become an I/O bottleneck for a large-scale machine learning algorithm.
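To put the 10-30 ms figure in perspective, here is a rough back-of-envelope calculation. It is only a sketch under simplifying assumptions: every read costs the same fixed latency, reads are spread perfectly evenly over `p` parallel readers, and region-server contention, caching and batching are ignored. The one-million-read workload is a hypothetical number, not a measurement.

```java
// Back-of-envelope estimate (assumption: fixed per-read latency, perfect parallelism,
// no region-server contention): how long do n random reads take with p parallel readers?
class ReadLatencyEstimate {

    /** Wall-clock seconds for n random reads at latencyMs each, split evenly over p readers. */
    static double estimateSeconds(long n, double latencyMs, int p) {
        // Each reader performs ceil(n / p) reads one after another.
        long perReader = (n + p - 1) / p;
        return perReader * latencyMs / 1000.0;
    }

    public static void main(String[] args) {
        long reads = 1_000_000L; // hypothetical workload: one million random gets
        for (double latency : new double[] {10.0, 30.0}) {
            System.out.printf("%.0f ms/read: 1 reader = %.0f s, 100 readers = %.0f s%n",
                    latency,
                    estimateSeconds(reads, latency, 1),
                    estimateSeconds(reads, latency, 100));
        }
    }
}
```

Under these assumptions, one million random reads take roughly 10,000-30,000 s from a single reader and 100-300 s across 100 readers; a real cluster will do worse than the ideal parallel figure, because the readers contend for the same region servers.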

Best wishes,
Stanley Xu



On Mon, Jul 25, 2011 at 8:53 PM, NightWolf wrote:

> Hi all,
>
> I'm working on a large text classification project and we have our text data
> (simple messages) stored in HBase.
>
> We have two problems. First, we would like to use HBase as the source for
> the Mahout classifiers, namely Bayes and Random Forests.
>
> Second, we would like to store the generated model in HBase instead of
> using the in-memory approach (InMemoryBayesDatastore). As our data sets
> grow, we are running into problems with memory utilization, so we would
> like to test HBase as a viable alternative.
>
> There seems to be little material around on using HBase with Mahout, or on
> whether it can be used as a data source. I'm using the Mahout 0.6 core API
> in Java, which has the InMemory datastore.
>
> After a bit of digging, I believe there (was) an HBase Bayes Datastore
> component:
> org.apache.mahout.classifier.bayes.datastore.HBaseBayesDatastore
> See older JavaDoc here:
>
> http://www.jarvana.com/jarvana/view/org/apache/mahout/mahout-core/0.3/mahout-core-0.3-javadoc.jar!/org/apache/mahout/classifier/bayes/datastore/HBaseBayesDatastore.html
>
> However, judging by the latest documentation, this feature seems to have
> disappeared? https://builds.apache.org/job/Mahout-Quality/javadoc/
>
> I wanted to know whether it is still possible to use HBase as a data source
> for Bayes and Random Forests, and whether there are any previous use cases
> for this.
>
> Thanks!
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/HBase-Mahout-Using-HBase-as-a-Datastore-source-for-Mahout-Classification-tp3197368p3197368.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
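For the first problem (HBase as the source for the classifiers), one low-tech option is to export the messages from HBase into the text files the Mahout trainers already read. This is a hypothetical sketch: it assumes a one-document-per-line `label<TAB>tokens` training format (verify this against the input format of the Mahout version in use), it simulates the scanned rows with a `TreeMap`, and the row keys, the label/text pair per row and the `MessageExport` class are all illustrative, not part of any Mahout or HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: turn message rows (as they might come back from an HBase scan)
// into "label<TAB>token token ..." lines, one document per line.
class MessageExport {

    /** Lower-cases the text and splits it on runs of characters that are not letters or digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // a leading separator yields one empty string; skip it
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Formats one message as "label<TAB>token token ...". */
    static String toTrainingLine(String label, String text) {
        return label + "\t" + String.join(" ", tokenize(text));
    }

    public static void main(String[] args) {
        // Stand-in for rows scanned from an HBase table: row key -> {label, text}.
        Map<String, String[]> rows = new TreeMap<>();
        rows.put("msg-0001", new String[] {"spam", "WIN a FREE prize, now!"});
        rows.put("msg-0002", new String[] {"ham", "Lunch at 12:30?"});
        for (Map.Entry<String, String[]> e : rows.entrySet()) {
            System.out.println(toTrainingLine(e.getValue()[0], e.getValue()[1]));
        }
    }
}
```

In a real export, the `TreeMap` would be replaced by an HBase `Scan` over the message table, or by a MapReduce job that reads the table through HBase's `TableInputFormat`, with the resulting lines written to HDFS for the trainer.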
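For the second problem (keeping the model out of memory), the 10-30 ms random-read latency mentioned in the reply suggests that an HBase-backed model store would probably need a cache in front of it, so that frequently used feature weights are not fetched from the table on every lookup. This is a hypothetical sketch: `WeightStore`, `BackingStore` and `CachedStore` are illustrative names, not Mahout's datastore API. `BackingStore` simulates an HBase table laid out as row = feature, column = label, value = weight, using an in-memory map that counts round trips. The cache is a plain LRU built on `LinkedHashMap` in access order.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: not Mahout's datastore API.
interface WeightStore {
    /** Returns the stored weight for this feature/label pair, or 0.0 if absent. */
    double getWeight(String feature, String label);
}

/** Stand-in for an HBase table (row = feature, column = label); counts simulated round trips. */
class BackingStore implements WeightStore {
    private final Map<String, Double> cells = new HashMap<>();
    int reads = 0;

    void put(String feature, String label, double weight) {
        cells.put(feature + "\t" + label, weight);
    }

    @Override
    public double getWeight(String feature, String label) {
        reads++; // each call models one slow random read
        return cells.getOrDefault(feature + "\t" + label, 0.0);
    }
}

/** Keeps up to `capacity` recently used weights in memory and evicts the least recently used. */
class CachedStore implements WeightStore {
    private final WeightStore backing;
    private final Map<String, Double> cache;

    CachedStore(WeightStore backing, int capacity) {
        this.backing = backing;
        // accessOrder = true: iteration order runs from least to most recently accessed.
        this.cache = new LinkedHashMap<String, Double>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Double> eldest) {
                return size() > capacity;
            }
        };
    }

    @Override
    public double getWeight(String feature, String label) {
        String key = feature + "\t" + label;
        Double w = cache.get(key);
        if (w == null) { // cache miss: go to the backing store once, then remember the value
            w = backing.getWeight(feature, label);
            cache.put(key, w);
        }
        return w;
    }
}

class ModelStoreDemo {
    public static void main(String[] args) {
        BackingStore table = new BackingStore();
        table.put("free", "spam", 2.5);
        WeightStore store = new CachedStore(table, 1000);
        store.getWeight("free", "spam");
        store.getWeight("free", "spam");
        System.out.println(table.reads); // prints 1: the second lookup is served from the cache
    }
}
```

Keeping the store behind a small interface lets an in-memory, HBase-backed or cached implementation be swapped without touching the classifier code; the cache capacity trades memory for the number of slow round trips.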
