Hello, I am an R user and am now using Mahout to run ML algorithms on big datasets that are out of reach of R. R has the HadoopStreaming package, and I was wondering whether an interface between Mahout and R has been developed.
My question arises from the fact that the Lucene vectors/sparse matrices created by Mahout are unintelligible to me unless there is a way to access them in R.

I have just tested Apache Mahout by building a latent Dirichlet allocation model on a corpus of 30 documents. I did not have Hadoop installed on the system, so Mahout ran locally and produced the resulting model. I would like to access the model parameters, that is, the estimated \alpha, \beta, \Phi, \Theta. How can I access these?

The command I ran was:

    <Mahout bin location>/mahout lda -i <tf-vectors location>/tf-vectors -o <lda-out-dir> -k 4 -v 27

I can see that <lda-out-dir> has a folder <state-i> for each iteration (I presume) of the learning algorithm. Each <state-i> contains a single file, part-r-0000, which I do not know how to access. Do I need to use HBase to be able to access the data generated by Mahout?

If my naive questions annoy you, I apologize; I am new to Mahout.

Regards,
Shivani
