Jenkins build is unstable: Mahout-Quality #1405

2012-03-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/1405/changes



[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

2012-03-21 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234184#comment-13234184 ]

Hudson commented on MAHOUT-981:
---

Integrated in Mahout-Quality #1405 (See 
[https://builds.apache.org/job/Mahout-Quality/1405/])
Mahout-981, Fixing test cases which are keeping clusters-*-final in the 
same directory for canopy and kmeans. (Revision 1303282)

 Result = SUCCESS

 Refactor KMeans Clustering into a separate post process with outlier pruning
 

 Key: MAHOUT-981
 URL: https://issues.apache.org/jira/browse/MAHOUT-981
 Project: Mahout
  Issue Type: Sub-task
  Components: Classification, Clustering
Affects Versions: 0.6
Reporter: Paritosh Ranjan
Assignee: Paritosh Ranjan
  Labels: classification, clustering
 Fix For: 0.7

 Attachments: MAHOUT-981.txt


 Use ClusterClassificationDriver to refactor clustering out of KMeansDriver 
 with outlier pruning support.
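For readers unfamiliar with the outlier pruning mentioned here, below is a minimal sketch. It assumes that pruning means a point is assigned to its most likely cluster only when that cluster's probability reaches a classification threshold, and is otherwise dropped as an outlier. The class and method names are hypothetical and are not Mahout's ClusterClassificationDriver API.

```java
import java.util.OptionalInt;

// Hypothetical sketch of threshold-based outlier pruning; NOT Mahout's API.
// Assumption: each point already has one probability per cluster.
public class OutlierPruningSketch {

  /**
   * Returns the index of the most likely cluster, or empty if even the best
   * probability is below the threshold (the point is treated as an outlier).
   */
  static OptionalInt classify(double[] clusterProbabilities, double threshold) {
    int best = -1;
    double bestP = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < clusterProbabilities.length; i++) {
      if (clusterProbabilities[i] > bestP) {
        bestP = clusterProbabilities[i];
        best = i;
      }
    }
    // No clusters, or best cluster too weak: prune the point.
    return (best >= 0 && bestP >= threshold) ? OptionalInt.of(best) : OptionalInt.empty();
  }

  public static void main(String[] args) {
    System.out.println(classify(new double[] {0.1, 0.8, 0.1}, 0.5)); // OptionalInt[1]
    System.out.println(classify(new double[] {0.4, 0.3, 0.3}, 0.5)); // OptionalInt.empty
  }
}
```

Raising the threshold toward 1 makes clustering stricter and prunes more points; a threshold of 0 keeps every point.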

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAHOUT-994) mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all major Hadoop branches

2012-03-21 Thread Roman Shaposhnik (Created) (JIRA)
mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all 
major Hadoop branches
--

 Key: MAHOUT-994
 URL: https://issues.apache.org/jira/browse/MAHOUT-994
 Project: Mahout
  Issue Type: Bug
  Components: Integration
Affects Versions: 0.6
Reporter: Roman Shaposhnik


Mahout should follow the Pig and Hive example and not rely explicitly on 
HADOOP_HOME and HADOOP_CONF_DIR
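One way to avoid depending on HADOOP_HOME is to discover Hadoop through the `hadoop` command on the PATH. The helper below is a hypothetical Java sketch of that lookup only. It is not part of the mahout script, and it is an assumption that this matches what the Pig and Hive launchers do.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical helper, not part of the mahout script: find an executable
// named "hadoop" on the PATH instead of requiring HADOOP_HOME to be set.
public class HadoopLocator {

  static Optional<File> findOnPath(String executable, String pathEnv) {
    if (pathEnv == null || pathEnv.isEmpty()) {
      return Optional.empty();
    }
    for (String dir : pathEnv.split(File.pathSeparator)) {
      if (dir.isEmpty()) {
        continue; // skip empty PATH entries such as "a::b"
      }
      File candidate = new File(dir, executable);
      if (candidate.isFile() && candidate.canExecute()) {
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Optional<File> hadoop = findOnPath("hadoop", System.getenv("PATH"));
    System.out.println(hadoop.map(File::getAbsolutePath)
        .orElse("no 'hadoop' executable on PATH"));
  }
}
```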





[jira] [Commented] (MAHOUT-994) mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all major Hadoop branches

2012-03-21 Thread Dmitriy Lyubimov (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234521#comment-13234521 ]

Dmitriy Lyubimov commented on MAHOUT-994:
-

What should it rely on in the new Hadoop branches to find the Hadoop client
libraries and settings?








[jira] [Issue Comment Edited] (MAHOUT-994) mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all major Hadoop branches

2012-03-21 Thread Dmitriy Lyubimov (Issue Comment Edited) (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234521#comment-13234521 ]

Dmitriy Lyubimov edited comment on MAHOUT-994 at 3/21/12 5:13 PM:
--

What should it rely on in the new Hadoop branches to find the Hadoop client
libraries and settings?

Could you please describe the solution?


  was (Author: dlyubimov):
What should it rely on in the new Hadoop branches to find the Hadoop
client libraries and settings?


  





MatrixVectorView Considered Harmful

2012-03-21 Thread Jake Mannix
Can anyone tell me what the following implementation choice will cause:

public class MatrixVectorView extends AbstractVector {

  // ...

  public Iterator<Element> iterateNonZero() {
    return iterator();
  }

  // ...

}

Note that MatrixVectorView is returned from every call to viewRow(), and
getRow() was removed in the last release.

-- 

  -jake


Re: MatrixVectorView Considered Harmful

2012-03-21 Thread Ted Dunning
This causes implementations that don't override that method to lose the
benefits of sparsity when iterating through rows.

I deduce from the existence of your email that important sparse matrix
implementations suffer from this defect.
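To make the cost concrete, here is a minimal, self-contained sketch. The classes are hypothetical toys, not Mahout's Matrix or Vector API: a sparse matrix hands out row views, and a view whose non-zero iterator simply delegates to the dense iterator() visits every column, while one that walks the backing row's stored cells visits only those.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy classes for illustration only; they mirror the shape of the
// discussion (a sparse matrix handing out row views) but are NOT Mahout's API.
public class RowViewSketch {

  /** An (index, value) pair, loosely modeled on Vector.Element. */
  static final class Element {
    final int index;
    final double value;
    Element(int index, double value) { this.index = index; this.value = value; }
  }

  /** A toy sparse matrix: each row stores only the cells that were set. */
  static final class SparseMatrix {
    final int columns;
    final Map<Integer, TreeMap<Integer, Double>> rows = new TreeMap<>();

    SparseMatrix(int columns) { this.columns = columns; }

    void set(int row, int col, double value) {
      rows.computeIfAbsent(row, r -> new TreeMap<>()).put(col, value);
    }

    double get(int row, int col) {
      TreeMap<Integer, Double> cells = rows.get(row);
      return cells == null ? 0.0 : cells.getOrDefault(col, 0.0);
    }

    RowView viewRow(int row) { return new RowView(this, row); }
  }

  /** A row view that reads through to the backing matrix. */
  static final class RowView {
    private final SparseMatrix matrix;
    private final int row;

    RowView(SparseMatrix matrix, int row) { this.matrix = matrix; this.row = row; }

    /** Dense iteration: one element per column, zeros included. */
    Iterator<Element> iterator() {
      List<Element> out = new ArrayList<>(matrix.columns);
      for (int c = 0; c < matrix.columns; c++) {
        out.add(new Element(c, matrix.get(row, c)));
      }
      return out.iterator();
    }

    /** The pattern in question: "non-zero" iteration that just delegates to iterator(). */
    Iterator<Element> iterateNonZeroDense() { return iterator(); }

    /** Sparsity-preserving alternative: visit only the cells stored for this row. */
    Iterator<Element> iterateNonZeroSparse() {
      List<Element> out = new ArrayList<>();
      TreeMap<Integer, Double> cells = matrix.rows.get(row);
      if (cells != null) {
        for (Map.Entry<Integer, Double> e : cells.entrySet()) {
          out.add(new Element(e.getKey(), e.getValue()));
        }
      }
      return out.iterator();
    }
  }

  /** Counts how many elements an iterator yields. */
  static int count(Iterator<Element> it) {
    int n = 0;
    while (it.hasNext()) {
      it.next();
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    SparseMatrix m = new SparseMatrix(100_000);
    m.set(0, 3, 1.5);
    m.set(0, 99_999, -2.0);
    RowView row = m.viewRow(0);
    System.out.println("dense visits:  " + count(row.iterateNonZeroDense()));  // 100000
    System.out.println("sparse visits: " + count(row.iterateNonZeroSparse())); // 2
  }
}
```

With 100,000 columns and two stored cells, the delegating version yields 100,000 elements against 2 for the sparse one. The general remedy is for the view to delegate to whatever sparse iteration its backing storage offers.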




Jenkins build is still unstable: Mahout-Quality #1406

2012-03-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/changes



[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

2012-03-21 Thread Saikat Kanjilal (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235364#comment-13235364 ]

Saikat Kanjilal commented on MAHOUT-984:


Paritosh,
I'm running into a strange issue. I've refactored FuzzyKMeansDriver along the
lines of KMeansDriver so that it uses FuzzyKMeansClusteringPolicy, with the rest
of the logic pretty much the same. The unit test for FuzzyKMeansDriver passes
when run individually, but it fails when I run all the unit tests together. I am
attaching the clusterData function here; any ideas on this?

Regards


  public static void clusterData(Path input,
                                 Path clustersIn,
                                 Path output,
                                 DistanceMeasure measure,
                                 double convergenceDelta,
                                 float m,
                                 boolean emitMostLikely,
                                 double threshold,
                                 boolean runSequential)
    throws IOException, ClassNotFoundException, InterruptedException {
    if (log.isInfoEnabled()) {
      log.info("Running Clustering");
      log.info("Input: {} Clusters In: {} Out: {} Distance: {}",
               new Object[] {input, clustersIn, output, measure});
    }
    ClusterClassifier.writePolicy(
        new FuzzyKMeansClusteringPolicy((double) m, convergenceDelta), clustersIn);
    ClusterClassificationDriver.run(input, output, new Path(output, CLUSTERED_POINTS_DIRECTORY),
                                    threshold, true, runSequential);
  }


 Refactor Fuzzy K Means Clustering into a separate post process with outlier 
 pruning
 ---

 Key: MAHOUT-984
 URL: https://issues.apache.org/jira/browse/MAHOUT-984
 Project: Mahout
  Issue Type: Sub-task
  Components: Clustering
Affects Versions: 0.6
Reporter: Paritosh Ranjan
Assignee: Paritosh Ranjan
  Labels: clustering
 Fix For: 0.7


 Use ClusterClassificationDriver to refactor clustering out of 
 FuzzyKMeansDriver with outlier pruning support.
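The `m` argument passed to FuzzyKMeansClusteringPolicy in the clusterData code above is the fuzziness exponent. As a hedged illustration, this sketch computes the standard fuzzy c-means membership of one point, u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)), from its distances to each center. The class is hypothetical and is not claimed to match Mahout's implementation line for line.

```java
import java.util.Arrays;

// Sketch of the standard fuzzy c-means membership formula; a hypothetical
// helper, NOT Mahout's FuzzyKMeansClusteringPolicy.
public class FuzzyMembershipSketch {

  /** Memberships of one point given its distances to each cluster center; requires m > 1. */
  static double[] memberships(double[] distances, double m) {
    if (m <= 1.0) {
      throw new IllegalArgumentException("m must be > 1");
    }
    int k = distances.length;
    double[] u = new double[k];
    // A point sitting exactly on one or more centers belongs to them, split evenly.
    int zeros = 0;
    for (double d : distances) {
      if (d == 0.0) {
        zeros++;
      }
    }
    if (zeros > 0) {
      for (int i = 0; i < k; i++) {
        u[i] = distances[i] == 0.0 ? 1.0 / zeros : 0.0;
      }
      return u;
    }
    double exponent = 2.0 / (m - 1.0);
    for (int i = 0; i < k; i++) {
      double denom = 0.0;
      for (int j = 0; j < k; j++) {
        denom += Math.pow(distances[i] / distances[j], exponent);
      }
      u[i] = 1.0 / denom;
    }
    return u;
  }

  public static void main(String[] args) {
    // Closer centers get larger memberships; memberships sum to 1.
    System.out.println(Arrays.toString(memberships(new double[] {1.0, 3.0}, 2.0)));
  }
}
```

As m approaches 1 the memberships approach a hard k-means assignment; larger m spreads each point more evenly across clusters.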
