You might want to sync back to an earlier version of Mahout that still had
this class and try running the trainer with --dataSource hbase.

This will train over data from HDFS and store the model in HBase. Similarly,
you can run the classifier with --dataSource hbase and use the model to
classify new instances. Note that we don't have code to read from HBase to
train a model.

Can you tell me a bit about your use case? Why do you need to store the model
in HBase? Do you expect the model to be very large?


Robin


On Mon, Jul 25, 2011 at 7:00 PM, Nightie Wolfi <[email protected]> wrote:

> Thanks Robin for your quick response, that's great news.
>
> As I understand it, this will allow me to store the generated classifier
> model in HBase. Are there any examples of its usage? Does anyone know where
> I can find some test cases (such as the ones in
> MAHOUT-124 <https://issues.apache.org/jira/browse/MAHOUT-124>)?
>
> The other question I have is how feasible it is to use HBase as a data
> source instead of the file system (HDFS).
>
> Are there any ideas or even best practices on reading and writing documents,
> and the vectors generated from these documents, directly from and to HBase
> rather than using HDFS?
>
> Thanks,
> NW
>
> On Mon, Jul 25, 2011 at 11:03 PM, Robin Anil <[email protected]> wrote:
>
> > We dropped it after pruning the dependencies in Mahout. You can simply
> > bring back the class (from the repository) and use it to connect to HBase
> > in your client code.
> >
> > Robin
> >
> > On Mon, Jul 25, 2011 at 6:23 PM, NightWolf <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I'm working on a large text classification project and we have our
> > > text data (simple messages) stored in HBase.
> > >
> > > We have two problems. First, we would like to use HBase as the data
> > > source for the Mahout classifiers, namely Bayes and Random Forests.
> > >
> > > Second, we would like to be able to store the generated model in HBase
> > > instead of using the in-memory approach (InMemoryBayesDatastore). As our
> > > data sets grow, we are running into problems with memory utilization and
> > > would like to test out HBase as a viable alternative.
> > >
> > > There seems to be little material floating around on using HBase with
> > > Mahout, or on whether it's possible to use it as a datasource. I'm using
> > > the Mahout 0.6 core API in Java, which has the in-memory datastore.
> > >
> > > Doing a bit of digging, I believe that there was an HBase Bayes
> > > datastore component:
> > > org.apache.mahout.classifier.bayes.datastore.HBaseBayesDatastore
> > > See the older Javadoc here:
> > >
> > > http://www.jarvana.com/jarvana/view/org/apache/mahout/mahout-core/0.3/mahout-core-0.3-javadoc.jar!/org/apache/mahout/classifier/bayes/datastore/HBaseBayesDatastore.html
> > >
> > > However, looking at the latest documentation, it looks like this
> > > feature has disappeared:
> > > https://builds.apache.org/job/Mahout-Quality/javadoc/
> > >
> > > I wanted to know whether it is still possible to use HBase as a
> > > datasource for Bayes and Random Forests, and whether there are any
> > > previous use cases for this.
> > >
> > > Thanks!
> > >
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/HBase-Mahout-Using-HBase-as-a-Datastore-source-for-Mahout-Classification-tp3197368p3197368.html
> > > Sent from the Mahout User List mailing list archive at Nabble.com.
> > >
> >
>