Re: gsoc , EM or SVM?

2009-04-02 Thread Yifan Wang
Hi

I decided to go with the mixture model for EM.
I have modified my proposal and submitted it on both the GSoC website and the Apache wiki.

Best Regards
Yifan

2009/4/1 Yifan Wang :
> I will choose Mixture Model for the EM implementation.
>
> Yifan
>
> 2009/4/1 Ted Dunning :
>> Yifan,
>>
>> EM is a highly non-specific term and covers a huge range of very different
>> algorithms.  For example, pLSI, HMM's, and mixture models can all be
>> estimated using EM.
>>
>> What exactly did you mean to address with an EM implementation?
>>
>> On Wed, Apr 1, 2009 at 1:05 PM, Grant Ingersoll  wrote:
>>
>>> Hi Yifan,
>>>
>>> I think both are good candidates, although AIUI, SVM is a bit harder to
>>> parallelize, so maybe it would make sense to focus on EM.  Of course, we
>>> don't have to be distributed, so you could propose a non-distributed SVM
>>> implementation as a first cut and then work on the distributed part as the
>>> project develops.
>>>
>>> ...
>>>>
>>>> For EM, it is a generalization of the k-means algorithm, and we already
>>>> have k-means in the Mahout library.
>>>>
>>
>


Re: gsoc , EM or SVM?

2009-04-01 Thread Yifan Wang
I will choose Mixture Model for the EM implementation.

Yifan

2009/4/1 Ted Dunning :
> Yifan,
>
> EM is a highly non-specific term and covers a huge range of very different
> algorithms.  For example, pLSI, HMM's, and mixture models can all be
> estimated using EM.
>
> What exactly did you mean to address with an EM implementation?
>
> On Wed, Apr 1, 2009 at 1:05 PM, Grant Ingersoll  wrote:
>
>> Hi Yifan,
>>
>> I think both are good candidates, although AIUI, SVM is a bit harder to
>> parallelize, so maybe it would make sense to focus on EM.  Of course, we
>> don't have to be distributed, so you could propose a non-distributed SVM
>> implementation as a first cut and then work on the distributed part as the
>> project develops.
>>
>> ...
>>>
>>>
>>> For EM, it is a generalization of the k-means algorithm, and we already
>>> have k-means in the Mahout library.
>>>
>>>
>


Re: gsoc , EM or SVM?

2009-04-01 Thread Ted Dunning
Yifan,

EM is a highly non-specific term and covers a huge range of very different
algorithms.  For example, pLSI, HMM's, and mixture models can all be
estimated using EM.

What exactly did you mean to address with an EM implementation?

On Wed, Apr 1, 2009 at 1:05 PM, Grant Ingersoll  wrote:

> Hi Yifan,
>
> I think both are good candidates, although AIUI, SVM is a bit harder to
> parallelize, so maybe it would make sense to focus on EM.  Of course, we
> don't have to be distributed, so you could propose a non-distributed SVM
> implementation as a first cut and then work on the distributed part as the
> project develops.
>
> ...
>>
>>
>> For EM, it is a generalization of the k-means algorithm, and we already
>> have k-means in the Mahout library.
>>
>>


Re: gsoc , EM or SVM?

2009-04-01 Thread Grant Ingersoll

Hi Yifan,

I think both are good candidates, although AIUI, SVM is a bit harder to
parallelize, so maybe it would make sense to focus on EM.  Of course, we
don't have to be distributed, so you could propose a non-distributed SVM
implementation as a first cut and then work on the distributed part as the
project develops.



-Grant

On Mar 31, 2009, at 2:48 AM, Yifan Wang wrote:

> Hi, My Name is Yifan. I submitted a proposal for the gsoc this year. I am
> interested in the classification and clustering algorithms.
>
> Because I need one such algorithm for the experimental project that I
> started myself for text classification and clustering.
>
> In my proposal, I planned to implement two of the machine learning
> algorithms: EM and SVM.
>
> But it seems a bit much to implement two algorithms in gsoc, so now I need
> to choose one between the two algorithms.
>
> For EM, it is a generalization of the k-means algorithm, and we already have
> k-means in the Mahout library.
>
> For SVM, It is a quite important algorithm for classification while
> implementation of it can be hard.
>
> So any suggestions of which one has the most benefit to the Mahout library
> and may be a good candidate for the gsoc?
>
> Best Regards
>
> Yifan




gsoc , EM or SVM?

2009-03-30 Thread Yifan Wang
Hi, my name is Yifan. I submitted a proposal for GSoC this year. I am
interested in classification and clustering algorithms, because I need one
such algorithm for an experimental text classification and clustering
project that I started myself.

In my proposal, I planned to implement two machine learning algorithms: EM
and SVM.

But implementing both seems like too much for one GSoC project, so I now
need to choose one of the two.

EM is a generalization of the k-means algorithm, and we already have
k-means in the Mahout library.
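
To make the k-means connection concrete, here is a minimal Python sketch. It
is not Mahout code, and the function names and toy data are made up for
illustration. It assumes a 1-D Gaussian mixture with equal weights and a
shared, fixed variance; under that assumption, shrinking the variance makes
EM's soft assignments collapse to the hard nearest-centre assignments of
k-means.

```python
import math


def em_gmm_1d(xs, means, variance=1.0, iterations=50):
    """EM for a 1-D Gaussian mixture (assumed: equal weights, fixed shared variance)."""
    means = list(means)
    for _ in range(iterations):
        # E-step: soft responsibility of each component for each point.
        resp = []
        for x in xs:
            logits = [-(x - m) ** 2 / (2.0 * variance) for m in means]
            top = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(l - top) for l in logits]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: each mean becomes the responsibility-weighted average.
        for k in range(len(means)):
            mass = sum(r[k] for r in resp)
            if mass > 0:
                means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / mass
    return means


def kmeans_1d(xs, means, iterations=50):
    """Plain k-means: hard assignment to the nearest centre, then re-average."""
    means = list(means)
    for _ in range(iterations):
        clusters = [[] for _ in means]
        for x in xs:
            nearest = min(range(len(means)), key=lambda k: (x - means[k]) ** 2)
            clusters[nearest].append(x)
        for k, members in enumerate(clusters):
            if members:
                means[k] = sum(members) / len(members)
    return means


data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
soft = em_gmm_1d(data, [0.0, 6.0], variance=1e-3)  # tiny variance: near-hard assignments
hard = kmeans_1d(data, [0.0, 6.0])
```

With a larger variance (say 1.0), each point is shared between the components
and the estimated means can drift slightly away from the k-means centroids,
which is the sense in which the mixture model is the more general of the two.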

SVM is an important algorithm for classification, but it can be hard to
implement.

So, any suggestions on which one would benefit the Mahout library most and
would be a good candidate for GSoC?

 

Best Regards

Yifan