Re: gsoc , EM or SVM?
Hi,

I decided to go with the mixture model for EM. I have modified my proposal and submitted it to both the GSoC website and the Apache wiki.

Best Regards
Yifan

2009/4/1 Yifan Wang:
> I will choose a mixture model for the EM implementation.
>
> Yifan
>
> 2009/4/1 Ted Dunning:
>> Yifan,
>>
>> EM is a highly non-specific term and covers a huge range of very different
>> algorithms. For example, pLSI, HMM's, and mixture models can all be
>> estimated using EM.
>>
>> What exactly did you mean to address with an EM implementation?
>>
>> On Wed, Apr 1, 2009 at 1:05 PM, Grant Ingersoll wrote:
>>
>>> Hi Yifan,
>>>
>>> I think both are good candidates, although AIUI, SVM is a bit harder to
>>> parallelize, so maybe it would make sense to focus on EM. Of course, we
>>> don't have to be distributed, so you could propose a non-distributed SVM
>>> implementation as a first cut and then work on the distributed part as the
>>> project develops.
>>>
>>> ...
>>>> EM is a generalization of the k-means algorithm, and we already have
>>>> k-means in the Mahout library.
Re: gsoc , EM or SVM?
I will choose a mixture model for the EM implementation.

Yifan

2009/4/1 Ted Dunning:
> Yifan,
>
> EM is a highly non-specific term and covers a huge range of very different
> algorithms. For example, pLSI, HMM's, and mixture models can all be
> estimated using EM.
>
> What exactly did you mean to address with an EM implementation?
>
> On Wed, Apr 1, 2009 at 1:05 PM, Grant Ingersoll wrote:
>
>> Hi Yifan,
>>
>> I think both are good candidates, although AIUI, SVM is a bit harder to
>> parallelize, so maybe it would make sense to focus on EM. Of course, we
>> don't have to be distributed, so you could propose a non-distributed SVM
>> implementation as a first cut and then work on the distributed part as the
>> project develops.
>>
>> ...
>>> EM is a generalization of the k-means algorithm, and we already have
>>> k-means in the Mahout library.
Re: gsoc , EM or SVM?
Yifan,

EM is a highly non-specific term and covers a huge range of very different algorithms. For example, pLSI, HMM's, and mixture models can all be estimated using EM.

What exactly did you mean to address with an EM implementation?

On Wed, Apr 1, 2009 at 1:05 PM, Grant Ingersoll wrote:

> Hi Yifan,
>
> I think both are good candidates, although AIUI, SVM is a bit harder to
> parallelize, so maybe it would make sense to focus on EM. Of course, we
> don't have to be distributed, so you could propose a non-distributed SVM
> implementation as a first cut and then work on the distributed part as the
> project develops.
>
> ...
>> EM is a generalization of the k-means algorithm, and we already have
>> k-means in the Mahout library.
Re: gsoc , EM or SVM?
Hi Yifan,

I think both are good candidates, although AIUI, SVM is a bit harder to parallelize, so maybe it would make sense to focus on EM. Of course, we don't have to be distributed, so you could propose a non-distributed SVM implementation as a first cut and then work on the distributed part as the project develops.

-Grant

On Mar 31, 2009, at 2:48 AM, Yifan Wang wrote:

> Hi,
>
> My name is Yifan. I submitted a proposal for GSoC this year. I am
> interested in classification and clustering algorithms, because I need one
> for an experimental text classification and clustering project that I
> started myself.
>
> In my proposal, I planned to implement two machine learning algorithms: EM
> and SVM. But implementing two algorithms in one GSoC seems a bit much, so
> now I need to choose between them.
>
> EM is a generalization of the k-means algorithm, and we already have
> k-means in the Mahout library. SVM is a very important algorithm for
> classification, but it can be hard to implement.
>
> So, any suggestions on which one would benefit the Mahout library most and
> would make a good GSoC candidate?
>
> Best Regards
> Yifan

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
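To illustrate what a "non-distributed SVM as a first cut" might look like, here is a minimal single-machine sketch of a linear SVM. It uses a swapped-in technique, Pegasos-style subgradient descent on the L2-regularized hinge loss with step size 1/(lam*t), and visits the examples in order rather than sampling them at random as Pegasos does. The names `train_linear_svm` and `predict`, the parameters `lam` and `epochs`, and the constant bias feature are all hypothetical, not Mahout code; kernels are not covered.

```python
def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Hypothetical helper (not a Mahout API): linear SVM via subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>) with the
    Pegasos step size 1 / (lam * t), cycling over the data in order.
    A constant 1.0 feature is appended, so the last weight is a (regularized) bias.
    Labels y must be +1 or -1.
    """
    d = len(X[0]) + 1
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            x = list(xi) + [1.0]
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step for the regularizer: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # The hinge term contributes only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, xi):
    """Return +1 or -1 from the sign of the linear score (bias feature appended)."""
    x = list(xi) + [1.0]
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1
```

Each update touches one example at a time, which is why this is easy on one machine; distributing it would need some way to combine weight vectors computed on different partitions, which the sketch does not attempt.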
gsoc , EM or SVM?
Hi,

My name is Yifan. I submitted a proposal for GSoC this year. I am interested in classification and clustering algorithms, because I need one for an experimental text classification and clustering project that I started myself.

In my proposal, I planned to implement two machine learning algorithms: EM and SVM. But implementing two algorithms in one GSoC seems a bit much, so now I need to choose between them.

EM is a generalization of the k-means algorithm, and we already have k-means in the Mahout library. SVM is a very important algorithm for classification, but it can be hard to implement.

So, any suggestions on which one would benefit the Mahout library most and would make a good GSoC candidate?

Best Regards
Yifan
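The k-means connection, and the mixture-model variant the thread settled on, can be made concrete with a small sketch. Below is a minimal, non-distributed EM loop for a one-dimensional Gaussian mixture, written in Python for brevity rather than in Mahout's Java. The function name `em_gmm_1d`, the variance initialization, the fixed iteration count, and the variance floor are illustrative assumptions, not Mahout's API. The E-step computes soft responsibilities; the M-step re-estimates weights, means, and variances from them.

```python
import math


def em_gmm_1d(xs, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM (hypothetical helper, not a Mahout API).

    xs: observations; init_means: one starting mean per component.
    Returns (weights, means, variances).
    """
    n, k = len(xs), len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    # Start every component with the overall sample variance (1.0 if it is zero).
    mu = sum(xs) / n
    var0 = sum((x - mu) ** 2 for x in xs) / n or 1.0
    variances = [var0] * k

    for _ in range(n_iter):
        # E-step: soft responsibilities r[i][j] = P(component j | x_i),
        # computed in log space so far-away points do not underflow to 0/0.
        resp = []
        for x in xs:
            logs = [
                math.log(weights[j])
                - 0.5 * math.log(2 * math.pi * variances[j])
                - (x - means[j]) ** 2 / (2 * variances[j])
                for j in range(k)
            ]
            top = max(logs)
            exps = [math.exp(lp - top) for lp in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])

        # M-step: re-estimate each component from its soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # empty component: keep old parameters (simplification)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids collapse onto one point
    return weights, means, variances
```

Replacing each row of `resp` with a one-hot vector for the nearest mean, and dropping the weight and variance updates, turns this loop into the standard k-means (Lloyd) iteration, which is the sense in which EM for mixture models generalizes k-means.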