You might want to sync back to an earlier version of Mahout which had this, and try running the trainer with --dataSource hbase.
This will train over data from HDFS and store the model in HBase. Similarly, you can run the classifier with --dataSource hbase and use the model to classify new instances. Note that we don't have code to read from HBase to train a model.

Can you tell me a bit about your use case? Why do you need to store the model in HBase? Do you expect the model to be very large?

Robin

On Mon, Jul 25, 2011 at 7:00 PM, Nightie Wolfi <[email protected]> wrote:

> Thanks Robin for your quick response, that's great news.
>
> As I understand it, this will allow me to store the generated classifier
> model in HBase. Are there any examples of its usage? Does anyone know where
> I can find some test cases (such as the ones in
> MAHOUT-124 <https://issues.apache.org/jira/browse/MAHOUT-124>)?
>
> The other question I have is how feasible it is to use HBase as a data
> source instead of the file system (HDFS).
>
> Are there any ideas or best practices on reading and writing documents,
> and the vectors generated from these documents, directly from and to HBase
> rather than using HDFS?
>
> Thanks,
> NW
>
> On Mon, Jul 25, 2011 at 11:03 PM, Robin Anil <[email protected]> wrote:
>
> > We dropped it after pruning the dependencies in Mahout. You can simply
> > bring back the class (from the repository) and use it to connect to
> > HBase in your client code.
> >
> > Robin
> >
> > On Mon, Jul 25, 2011 at 6:23 PM, NightWolf <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I'm working on a large text classification project and we have our
> > > text data (simple messages) stored in HBase.
> > >
> > > We have two problems. First, we would like to use HBase as the source
> > > for Mahout classifiers, namely Bayes and Random Forests.
> > >
> > > Second, we would like to store the generated model in HBase instead of
> > > using the in-memory approach (InMemoryBayesDatastore). As our data
> > > sets grow we are running into problems with memory utilization, and we
> > > would like to test HBase as a viable alternative.
> > >
> > > There seems to be little material around on using HBase with Mahout,
> > > or on whether it can be used as a data source. I'm using the Mahout
> > > 0.6 core API in Java, which has the InMemory datastore.
> > >
> > > Doing a bit of digging, I believe there was an HBase Bayes datastore
> > > component:
> > > org.apache.mahout.classifier.bayes.datastore.HBaseBayesDatastore
> > > See the older JavaDoc here:
> > >
> > > http://www.jarvana.com/jarvana/view/org/apache/mahout/mahout-core/0.3/mahout-core-0.3-javadoc.jar!/org/apache/mahout/classifier/bayes/datastore/HBaseBayesDatastore.html
> > >
> > > However, looking at the latest documentation, it looks like this
> > > feature has disappeared: https://builds.apache.org/job/Mahout-Quality/javadoc/
> > >
> > > I wanted to know whether it is still possible to use HBase as a
> > > data source for Bayes and Random Forests, and whether there are any
> > > previous use cases for this.
> > >
> > > Thanks!
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/HBase-Mahout-Using-HBase-as-a-Datastore-source-for-Mahout-Classification-tp3197368p3197368.html
> > > Sent from the Mahout User List mailing list archive at Nabble.com.
