Re: Making Mahout Leaner

2012-05-09 Thread Grant Ingersoll
On May 8, 2012, at 12:43 PM, Jake Mannix wrote: On Tue, May 8, 2012 at 9:31 AM, Ted Dunning ted.dunn...@gmail.com wrote: This is frustrating to consider losing Bayes, but I would consider keeping it if only to decrease the number of questions on the list about why the examples from the

Re: Making Mahout Leaner

2012-05-09 Thread Robin Anil
I believe most of this new NB discussion has been over chat. So here is the state of the NB universe from my view: 1) Original NB and CNB code was as follows: - Tokenize and find all possible collocations - Compute Tf and Idf for each ngram - Compute Global and per class sums for tf,

[jira] [Commented] (MAHOUT-803) Complete minsize constraints for similarity measures used in RowSimilarityJob

2012-05-09 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271494#comment-13271494 ] Sebastian Schelter commented on MAHOUT-803: I'd like to clarify this issue a

[jira] [Assigned] (MAHOUT-803) Complete minsize constraints for similarity measures used in RowSimilarityJob

2012-05-09 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned MAHOUT-803: Assignee: Suneel Marthi (was: Sebastian Schelter) Complete minsize constraints for

Build failed in Jenkins: Mahout-Examples-Cluster-Reuters #127

2012-05-09 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/127/changes Changes: [ssc] MAHOUT-979 cleanup: removing unused imports [ssc] MAHOUT-979 RowSimilarityJob should be able to infer the number of columns from the input matrix if not specified

[jira] [Resolved] (MAHOUT-933) Implement mapreduce version of ClusterIterator

2012-05-09 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman resolved MAHOUT-933. Resolution: Fixed. Closing this as the last subtask has been completed

[jira] [Resolved] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

2012-05-09 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman resolved MAHOUT-990. Resolution: Fixed. Committed revision 1336424 that was based upon the above patches. Some changes

[jira] [Updated] (MAHOUT-929) Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning

2012-05-09 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman updated MAHOUT-929: Resolution: Fixed Status: Resolved (was: Patch Available) Resolving as all subtasks have

Jenkins build is still unstable: Mahout-Quality #1469

2012-05-09 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/changes

[jira] [Commented] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

2012-05-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271975#comment-13271975 ] Hudson commented on MAHOUT-990: Integrated in Mahout-Quality #1469 (See

Jenkins build is still unstable: Mahout-Quality #1470

2012-05-09 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/changes