Sparse Vectors from Directory of Documents

2012-04-26 Thread Dirk Weissenborn
Hey, is it already possible to create sparse vectors from documents using a previously generated dictionary? Should I just point the output directory to the existing output directory of an earlier conversion? Cheers, Dirk
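A minimal, Mahout-independent sketch of the idea. It assumes the earlier dictionary is available as a plain term-to-index map; loading Mahout's own dictionary files is not shown, and the tokenizer below is a simplified stand-in for whatever analyzer the original conversion used. Each new document is mapped into the same index space, and terms missing from the dictionary are dropped so the vectors stay compatible with the earlier output.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lower-case and split on runs of non-alphanumeric characters (a deliberately simple tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def to_sparse_vector(text: str, dictionary: Dict[str, int]) -> Dict[int, int]:
    """Map a document to {term index: term frequency} using a fixed, earlier dictionary.

    Terms that are not in the dictionary are skipped, so every vector lives in
    the same index space as the original conversion.
    """
    counts = Counter(tok for tok in tokenize(text) if tok in dictionary)
    return {dictionary[tok]: n for tok, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical dictionary standing in for the output of an earlier run.
    dictionary = {"sparse": 0, "vector": 1, "mahout": 2}
    print(to_sparse_vector("Mahout sparse vector, sparse!", dictionary))
```

Dropping unknown terms is what keeps the dimensionality fixed; the trade-off is that vocabulary first seen in the new documents is lost.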

Re: online lda learning algorithm

2012-03-27 Thread Dirk Weissenborn
for training, and it has been shown that a composition of lda and a classifier is much faster at the classification task, with similar results to tf-idf based classifiers. 2012/3/27 Jake Mannix jake.man...@gmail.com On Mon, Mar 26, 2012 at 5:13 PM, Dirk Weissenborn dirk.weissenb

Re: online lda learning algorithm

2012-03-27 Thread Dirk Weissenborn
I found this paper on supervised LDA: http://www.cs.princeton.edu/~blei/papers/BleiMcAuliffe2007.pdf In your opinion, could supervised LDA be implemented easily on top of the existing implementation? 2012/3/27 Dirk Weissenborn dirk.weissenb...@googlemail.com The fact

online lda learning algorithm

2012-03-26 Thread Dirk Weissenborn
Hello, I wanted to ask whether there is already an implementation of an online learning algorithm for LDA. http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf Cheers, Dirk

Re: online lda learning algorithm

2012-03-26 Thread Dirk Weissenborn
to start! References: 1) http://eprints.pascal-network.org/archive/6729/01/AsuWelSmy2009a.pdf 2) http://www.csee.ogi.edu/~zak/cs506-pslc/dist_lda.pdf On Mon, Mar 26, 2012 at 11:54 AM, Dirk Weissenborn dirk.weissenb...@googlemail.com wrote: Hello, I wanted to ask whether

Re: online lda learning algorithm

2012-03-26 Thread Dirk Weissenborn
and just adjust that one slightly? 2012/3/27 Dirk Weissenborn dirk.weissenb...@googlemail.com No problem, I'll post it. 2012/3/27 Jake Mannix jake.man...@gmail.com Hey Dirk, Do you mind continuing this discussion on the mailing list? Lots of our users may ask this kind of question

Re: GSOC 2012

2012-03-19 Thread Dirk Weissenborn
What could be possible projects for this year? Or should the student submit a proposal for an improvement? I would probably be interested in working on a project for this year's GSOC. I am also currently in contact with the DBpedia Spotlight developers about GSOC. They are interested in a topical

GSOC 2012

2012-03-04 Thread Dirk Weissenborn
Hey, I just wanted to check whether Mahout is applying for GSOC again this year, because I would be interested in implementing another algorithm for Mahout or improving an existing one. Regards, Dirk

Re: [jira] [Commented] (MAHOUT-968) Classifier based on restricted boltzmann machines

2012-02-05 Thread Dirk Weissenborn
on it! I am running my own tests on the MNIST test set right now, and I think I could upload the patch in the next few days, but I still have a little testing to do. I'll let you know when it is ready! 2012/2/5 Viktor Gal (Commented) (JIRA) j...@apache.org [

Re: [jira] [Created] (MAHOUT-968) Classifier based on restricted boltzmann machines

2012-02-01 Thread Dirk Weissenborn
Hello Ted, I would first have to study the paper you've given me a little. What I could do at the moment is give a short, simple overview of the model and algorithm I am implementing... Deep Boltzmann Machines, which I am using for classification, are artificial neural networks based on stacked
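The building block of such a stacked model is the restricted Boltzmann machine. As a minimal sketch (binary units, plain Python lists, names chosen for illustration), the conditional activation probabilities of a single RBM layer with weights W, visible biases b and hidden biases c are p(h_j = 1 | v) = σ(c_j + Σ_i v_i W_ij) and p(v_i = 1 | h) = σ(b_i + Σ_j W_ij h_j). Training (for example with contrastive divergence) and the joint inference over all layers used in a deep Boltzmann machine are not shown.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid math.exp overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v: Sequence[float], W: Sequence[Sequence[float]], c: Sequence[float]) -> List[float]:
    """p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j]) for every hidden unit j."""
    return [
        sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(c))
    ]


def visible_probs(h: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """p(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] * h_j) for every visible unit i."""
    return [
        sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
        for i in range(len(b))
    ]
```

Because there are no connections within a layer, all units of one layer are conditionally independent given the other layer, which is why each probability can be computed separately as above.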

Re: [jira] [Created] (MAHOUT-968) Classifier based on restricted boltzmann machines

2012-02-01 Thread Dirk Weissenborn
exist? On Wed, Feb 1, 2012 at 1:57 PM, Dirk Weissenborn dirk.weissenb...@googlemail.com wrote: Hello Ted, I would first have to study the paper you've given me a little. What I could do at the moment is give a short, simple overview of the model and algorithm I am implementing

converting idx files to mahout vector files

2012-01-13 Thread Dirk Weissenborn
Hello, I'd like to know whether Mahout offers a way to convert a byte file, such as the IDX files of the MNIST corpus (http://yann.lecun.com/exdb/mnist/), into files containing Mahout vectors, which I'd like to use for classification with the RBMs I am currently writing. Another thing I'd like
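A minimal, Mahout-independent sketch of the reading side, assuming the files have already been decompressed. An IDX file starts with a magic number (two zero bytes, a type code such as 0x08 for unsigned bytes, and the number of dimensions), followed by one big-endian 32-bit size per dimension and then the big-endian data in row-major order. The hypothetical `parse_idx` below returns one flat vector per item along the first dimension; writing those vectors out in Mahout's own file format is not shown.

```python
import struct
from typing import Dict, List, Tuple

# IDX data-type codes (third byte of the magic number) -> struct format characters.
_IDX_TYPES = {0x08: "B", 0x09: "b", 0x0B: "h", 0x0C: "i", 0x0D: "f", 0x0E: "d"}


def parse_idx(data: bytes) -> Tuple[List[int], List[List[float]]]:
    """Parse an IDX buffer into (dims, rows), one flat row per item along the first dimension."""
    if len(data) < 4 or data[0] != 0 or data[1] != 0:
        raise ValueError("not an IDX buffer: bad magic number")
    type_code, ndim = data[2], data[3]
    if type_code not in _IDX_TYPES or ndim == 0:
        raise ValueError("unsupported IDX header")
    # One unsigned big-endian 32-bit size per dimension follows the magic number.
    dims = list(struct.unpack_from(f">{ndim}I", data, 4))
    count = 1
    for d in dims:
        count *= d
    # The payload is big-endian as well and starts right after the size fields.
    values = struct.unpack_from(f">{count}{_IDX_TYPES[type_code]}", data, 4 + 4 * ndim)
    row_len = count // dims[0] if dims[0] else 0
    rows = [
        [float(x) for x in values[k * row_len:(k + 1) * row_len]]
        for k in range(dims[0])
    ]
    return dims, rows


def to_sparse(row: List[float]) -> Dict[int, float]:
    """Keep only the non-zero entries as {index: value}."""
    return {i: x for i, x in enumerate(row) if x != 0.0}
```

For the MNIST training images (the decompressed `train-images-idx3-ubyte` file, read in binary mode) this would yield dims [60000, 28, 28] and 60000 rows of 784 pixel values, while the label file yields one single-element row per label. Since most pixels in a digit image are zero, `to_sparse` keeps the vectors compact.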