[jira] [Updated] (MAHOUT-1638) H2O bindings fail at drmParallelizeWithRowLabels(...)

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1638: --- Labels: DSL scala (was: ) H2O bindings fail at drmParallelizeWithRowLabels(...)

[jira] [Updated] (MAHOUT-1618) Cooccurrence Recommender example and documentation

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1618: --- Labels: DSL scala spark (was: ) Cooccurrence Recommender example and documentation

[jira] [Created] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Andrew Palumbo (JIRA)
Andrew Palumbo created MAHOUT-1643: -- Summary: CLI arguments are not being processed in spark-shell Key: MAHOUT-1643 URL: https://issues.apache.org/jira/browse/MAHOUT-1643 Project: Mahout

[jira] [Updated] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1643: --- Description: The CLI arguments are not being processed in spark-shell. Most importantly the

[jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349640#comment-14349640 ] Andrew Palumbo commented on MAHOUT-1643: I don't think you can change things like

[jira] [Updated] (MAHOUT-1559) Add documentation for and clean up the wikipedia classifier example

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1559: --- Labels: legacy (was: ) Add documentation for and clean up the wikipedia classifier example

[jira] [Updated] (MAHOUT-1559) Add documentation for and clean up the wikipedia classifier example

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1559: --- Labels: DSL legacy scala (was: legacy) Add documentation for and clean up the wikipedia

[jira] [Updated] (MAHOUT-1600) Algorithms for computing correlation and covariance

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1600: --- Labels: DSL scala (was: ) Algorithms for computing correlation and covariance

[jira] [Updated] (MAHOUT-1605) Make VisualizerTest locale independent

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1605: --- Labels: legacy (was: ) Make VisualizerTest locale independent

Re: [jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Dmitriy Lyubimov
nope afaik. MAHOUT_OPTS is the place to set that (if we are talking about shell). On Thu, Mar 5, 2015 at 3:50 PM, Andrew Palumbo (JIRA) j...@apache.org wrote: [

[jira] [Updated] (MAHOUT-1564) Naive Bayes Classifier for New Text Documents

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1564: --- Labels: DSL legacy scala spark (was: ) Naive Bayes Classifier for New Text Documents

[jira] [Updated] (MAHOUT-1617) 404 error on link in cluster-dumper tutorial page

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1617: --- Labels: legacy (was: ) 404 error on link in cluster-dumper tutorial page

[jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349633#comment-14349633 ] Pat Ferrel commented on MAHOUT-1643: ooops spark-shell doesn't use option parser,

[jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349655#comment-14349655 ] Andrew Palumbo commented on MAHOUT-1643: Yeah- talking about the shell. Do we

[jira] [Updated] (MAHOUT-1623) MAHOUT.CMD contains duplicated code

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1623: --- Labels: build easyfix (was: build easyfix legacy) MAHOUT.CMD contains duplicated code

[jira] [Updated] (MAHOUT-1623) MAHOUT.CMD contains duplicated code

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1623: --- Labels: build easyfix legacy (was: build easyfix) MAHOUT.CMD contains duplicated code

[jira] [Updated] (MAHOUT-1585) Temporarily Remove or Fix links to missing Javadocs

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1585: --- Labels: DSL legacy scala spark (was: ) Temporarily Remove or Fix links to missing Javadocs

[jira] [Updated] (MAHOUT-1601) Add javadoc for the classes - as there is no clue what the class is for .

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1601: --- Labels: documentation legacy (was: documentation) Add javadoc for the classes - as there

[jira] [Commented] (MAHOUT-1559) Add documentation for and clean up the wikipedia classifier example

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349671#comment-14349671 ] Andrew Palumbo commented on MAHOUT-1559: I have a scala script that I'd like to

[jira] [Commented] (MAHOUT-1618) Cooccurrence Recommender example and documentation

2015-03-05 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349670#comment-14349670 ] Pat Ferrel commented on MAHOUT-1618: If someone wants to pick this up. The best idea

[jira] [Updated] (MAHOUT-1579) Implement a datamodel which can load data from hadoop filesystem directly

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1579: --- Labels: legacy (was: ) Implement a datamodel which can load data from hadoop filesystem

[jira] [Updated] (MAHOUT-1516) run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all does not exists in hdfs.

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1516: --- Labels: legacy patch (was: patch) run classify-20newsgroups.sh failed cause by

[jira] [Updated] (MAHOUT-1562) Publish Scaladocs

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1562: --- Labels: DSL scala scaladocs spark spark-shell (was: scaladocs) Publish Scaladocs

[jira] [Updated] (MAHOUT-1277) Lose dependency on custom commons-cli

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1277: --- Labels: legacy scala (was: ) Lose dependency on custom commons-cli

[jira] [Updated] (MAHOUT-1578) Optimizations in matrix serialization

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1578: --- Labels: legacy (was: ) Optimizations in matrix serialization

[jira] [Updated] (MAHOUT-1602) Euclidean Distance Similarity Math

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1602: --- Labels: legacy (was: ) Euclidean Distance Similarity Math

[jira] [Updated] (MAHOUT-1603) Tweaks for Spark 1.0.x

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1603: --- Labels: DSL scala spark (was: ) Tweaks for Spark 1.0.x ---

[jira] [Resolved] (MAHOUT-1603) Tweaks for Spark 1.0.x

2015-03-05 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov resolved MAHOUT-1603. -- Resolution: Fixed Tweaks for Spark 1.0.x ---

[jira] [Updated] (MAHOUT-1641) Add conversion from a RDD[(String, String)] to a Drm[Int]

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1641: --- Labels: DSL scala spark (was: ) Add conversion from a RDD[(String, String)] to a Drm[Int]

[jira] [Updated] (MAHOUT-1640) Better collections would significantly improve vector-operation speed

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1640: --- Labels: legacy math scala (was: ) Better collections would significantly improve

[jira] [Updated] (MAHOUT-1462) Cleaning up Random Forests documentation on Mahout website

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1462: --- Labels: legacy (was: ) Cleaning up Random Forests documentation on Mahout website

[jira] [Updated] (MAHOUT-1443) Update How to release page

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1443: --- Labels: legacy scala (was: ) Update How to release page

[jira] [Updated] (MAHOUT-1552) Avoid new Configuration() instantiation

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1552: --- Labels: legacy (was: ) Avoid new Configuration() instantiation

[jira] [Updated] (MAHOUT-1551) Add document to describe how to use mlp with command line

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1551: --- Labels: documentation legacy (was: documentation) Add document to describe how to use mlp

[jira] [Updated] (MAHOUT-1443) Update How to release page

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1443: --- Labels: legacy release scala (was: legacy scala) Update How to release page

[jira] [Updated] (MAHOUT-1425) SGD classifier example with bank marketing dataset

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1425: --- Labels: legacy (was: ) SGD classifier example with bank marketing dataset

[jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349626#comment-14349626 ] Pat Ferrel commented on MAHOUT-1643: Should be working in the master branch but it's

[jira] [Updated] (MAHOUT-1584) Create a detailed example of how to index an arbitrary dataset and run LDA on it

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1584: --- Labels: documentation legacy (was: documentation) Create a detailed example of how to

[jira] [Updated] (MAHOUT-1588) Multiple input path support in recommendation job

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1588: --- Labels: legacy (was: ) Multiple input path support in recommendation job

[jira] [Updated] (MAHOUT-1557) Add support for sparse training vectors in MLP

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1557: --- Labels: legacy mlp (was: mlp) Add support for sparse training vectors in MLP

[jira] [Updated] (MAHOUT-1470) Topic dump

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1470: --- Labels: legacy (was: ) Topic dump -- Key: MAHOUT-1470

[jira] [Updated] (MAHOUT-1278) Improve inheritance of Apache parent pom

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1278: --- Labels: legacy scala (was: ) Improve inheritance of Apache parent pom

Re: [jira] [Comment Edited] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Dmitriy Lyubimov
note that with MAHOUT_OPTS you have a choice. You can either set up env or you can use inline syntax like MAHOUT_OPTS='-Dk=n' bin/mahout spark-shell On Thu, Mar 5, 2015 at 4:50 PM, Andrew Palumbo (JIRA) j...@apache.org wrote: [
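
The two choices mentioned in this message (environment setup versus inline syntax) differ only in how long the setting lasts. The following is a minimal sketch of that difference, not Mahout's own launcher logic: the `report` command is a hypothetical stand-in for `bin/mahout spark-shell` so that the example runs without a Mahout checkout, and `-Dk=n` is the placeholder option from the message above.

```shell
# Stand-in for `bin/mahout spark-shell` (an assumption, so the sketch runs anywhere):
# a child process that only reports the MAHOUT_OPTS value it inherited.
report='echo "MAHOUT_OPTS=${MAHOUT_OPTS:-unset}"'

# Start from a clean state.
unset MAHOUT_OPTS

# Choice 1: inline syntax. The value is passed to this one command only.
inline_out=$(MAHOUT_OPTS='-Dk=n' sh -c "$report")

# The calling shell is left unchanged, so the next command sees no value.
after_inline=$(sh -c "$report")

# Choice 2: set up the environment. Every later command inherits the value.
export MAHOUT_OPTS='-Dk=n'
exported_out=$(sh -c "$report")

echo "$inline_out"
echo "$after_inline"
echo "$exported_out"
```

With a real checkout, the inline form would be the command quoted in the message, `MAHOUT_OPTS='-Dk=n' bin/mahout spark-shell`; the exported form suits long option lists that would be tedious to retype on each invocation.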

[jira] [Updated] (MAHOUT-1512) Hadoop 2 compatibility

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1512: --- Labels: legacy scala (was: ) Hadoop 2 compatibility --

[jira] [Updated] (MAHOUT-1522) Handle logging levels via log4j.xml

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1522: --- Labels: legacy scala (was: ) Handle logging levels via log4j.xml

[jira] [Updated] (MAHOUT-1495) Create a website describing the distributed item-based recommender

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1495: --- Labels: legacy (was: ) Create a website describing the distributed item-based recommender

[jira] [Updated] (MAHOUT-1477) Clean up website on Logistic Regression

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1477: --- Labels: legacy (was: ) Clean up website on Logistic Regression

[jira] [Updated] (MAHOUT-1630) Incorrect SparseColumnMatrix.numSlices() causes IndexException in toString()

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1630: --- Labels: legacy math scala (was: ) Incorrect SparseColumnMatrix.numSlices() causes

[jira] [Commented] (MAHOUT-1628) Propagation of Updates in DF

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349826#comment-14349826 ] Andrew Palumbo commented on MAHOUT-1628: Data Frames being DRMs I am assuming.

JIRA- legacy scala labels

2015-03-05 Thread Andrew Palumbo
I went through all of the unresolved JIRA issues and marked each with at least a legacy or scala label (scala for lack of a better name for all that is not legacy). Hopefully I got them all. Some are labelled with both (math, build, documentation related to both or neither, etc.) legacy issues:

[jira] [Updated] (MAHOUT-1594) Example factorize-movielens-1M.sh does not use HDFS

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1594: --- Labels: legacy newbie patch (was: newbie patch) Example factorize-movielens-1M.sh does not

Re: [jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Dmitriy Lyubimov
the hack i have only takes MAHOUT_OPTS. it normally actually makes more sense to set it there since spark options are too numerous and too long to enter on command line. so i'd say we need to support MAHOUT_OPTS at minimum; or both. On Thu, Mar 5, 2015 at 4:04 PM, Andrew Palumbo (JIRA)

[jira] [Assigned] (MAHOUT-1618) Cooccurrence Recommender example and documentation

2015-03-05 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel reassigned MAHOUT-1618: -- Assignee: Pat Ferrel Cooccurrence Recommender example and documentation

[jira] [Updated] (MAHOUT-1624) Compilation errors when changing Lucene version to 4.10.1

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1624: --- Labels: legacy lucene (was: ) Compilation errors when changing Lucene version to 4.10.1

[jira] [Updated] (MAHOUT-1639) streamingkmeans doesn't properly validate estimatedNumMapClusters -km

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1639: --- Labels: legacy (was: ) streamingkmeans doesn't properly validate estimatedNumMapClusters

[jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349695#comment-14349695 ] Pat Ferrel commented on MAHOUT-1643: I used the spark shell methods to change the

[jira] [Commented] (MAHOUT-1578) Optimizations in matrix serialization

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349748#comment-14349748 ] Andrew Palumbo commented on MAHOUT-1578: this can be closed right?

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349754#comment-14349754 ] Andrew Palumbo commented on MAHOUT-1603: this can be closed, right? Tweaks for

[jira] [Updated] (MAHOUT-1604) Create a RowSimilarity for Spark

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1604: --- Labels: DSL scala spark (was: ) Create a RowSimilarity for Spark

[jira] [Updated] (MAHOUT-1621) k-fold cross-validation in MapReduce Random Forest example?

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1621: --- Labels: legacy (was: ) k-fold cross-validation in MapReduce Random Forest example?

[jira] [Updated] (MAHOUT-1620) how to use mahout command matrixmult ?

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1620: --- Labels: legacy (was: ) how to use mahout command matrixmult ?

[jira] [Updated] (MAHOUT-1598) extend seq2sparse to handle multiple text blocks of same document

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1598: --- Labels: legacy (was: ) extend seq2sparse to handle multiple text blocks of same document

[jira] [Updated] (MAHOUT-1485) Clean up Recommender Overview page

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1485: --- Labels: legacy (was: ) Clean up Recommender Overview page

[jira] [Updated] (MAHOUT-1490) Data frame R-like bindings

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1490: --- Labels: DSL scala (was: ) Data frame R-like bindings --

[jira] [Updated] (MAHOUT-1607) spark-shell:scheduler.DAGScheduler: Failed to run fold at CheckpointedDrmSpark.scala:192

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1607: --- Labels: DSL scala spark test (was: test) spark-shell:scheduler.DAGScheduler: Failed to run

[jira] [Updated] (MAHOUT-1612) NullPointerException happens during JSON output format for clusterdumper

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1612: --- Labels: legacy (was: ) NullPointerException happens during JSON output format for

[jira] [Updated] (MAHOUT-1524) Script to auto-generate and view the Mahout website on a local machine

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1524: --- Labels: legacy scala (was: ) Script to auto-generate and view the Mahout website on a

[jira] [Updated] (MAHOUT-1539) Implement affinity matrix computation in Mahout DSL

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1539: --- Labels: DSL scala spark (was: ) Implement affinity matrix computation in Mahout DSL

[jira] [Updated] (MAHOUT-1507) Support input and output using user defined ID wherever possible

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1507: --- Labels: DSL scala spark (was: spark) Support input and output using user defined ID

[jira] [Updated] (MAHOUT-1626) Support for required quasi-algebraic operations and starting with aggregating rows/blocks

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1626: --- Labels: DSL scala spark (was: ) Support for required quasi-algebraic operations and

[jira] [Updated] (MAHOUT-1625) lucene2seq: failure to convert a document that does not contain a field (the field is not required)

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1625: --- Labels: LuceneIndexHelper easyfix legacy lucene lucene2seq mahout (was: LuceneIndexHelper

[jira] [Commented] (MAHOUT-1569) Create CLI driver that supports Spark jobs

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349804#comment-14349804 ] Andrew Palumbo commented on MAHOUT-1569: [~pferrel] this can be closed, right?

[jira] [Updated] (MAHOUT-1540) Reuters example for spectral clustering

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1540: --- Labels: DSL scala spark (was: ) Reuters example for spectral clustering

[jira] [Updated] (MAHOUT-1627) Problem with ALS Factorizer MapReduce version when working with oozie because of files in distributed cache. Error: Unable to read sequence file from cache.

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1627: --- Labels: legacy (was: ) Problem with ALS Factorizer MapReduce version when working with

[jira] [Updated] (MAHOUT-1592) bin/maout's seqdirectory doesn't work when MAHOUT_LOCAL non-empty

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1592: --- Labels: legacy (was: ) bin/maout's seqdirectory doesn't work when MAHOUT_LOCAL non-empty

[jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349710#comment-14349710 ] Andrew Palumbo commented on MAHOUT-1643: Ok, I'll leave this open. It seems that

[jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349739#comment-14349739 ] Andrew Palumbo commented on MAHOUT-1643: for future reference: {quote} note that

[jira] [Updated] (MAHOUT-1589) mahout.cmd has duplicated content

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1589: --- Labels: legacy scala (was: ) mahout.cmd has duplicated content

[jira] [Updated] (MAHOUT-1633) Failure to execute query when solr index contains documents with different fields

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1633: --- Labels: LuceneSegmentRecordReader easyfix legacy lucene lucene2SeqConfiguration lucene2seq

[jira] [Updated] (MAHOUT-1632) Please help me im stuck on using 20 newsgroups example on Windows

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1632: --- Labels: legacy (was: ) Please help me im stuck on using 20 newsgroups example on Windows

[jira] [Updated] (MAHOUT-1582) Create simpler row and column aggregation API at local level

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1582: --- Labels: legacy math scala (was: ) Create simpler row and column aggregation API at local

[jira] [Updated] (MAHOUT-1543) JSON output format for classifying with random forests

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1543: --- Labels: legacy patch (was: patch) JSON output format for classifying with random forests

[jira] [Updated] (MAHOUT-1586) Downloads must have hashes

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1586: --- Labels: legacy scala (was: ) Downloads must have hashes --

[jira] [Updated] (MAHOUT-1464) Cooccurrence Analysis on Spark

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1464: --- Labels: DSL scala spark (was: ) Cooccurrence Analysis on Spark

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349780#comment-14349780 ] Andrew Palumbo commented on MAHOUT-1464: We can close this, right? Cooccurrence

[jira] [Commented] (MAHOUT-1607) spark-shell:scheduler.DAGScheduler: Failed to run fold at CheckpointedDrmSpark.scala:192

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349793#comment-14349793 ] Andrew Palumbo commented on MAHOUT-1607: this looks like it was probably due to a

[jira] [Updated] (MAHOUT-1541) Create CLI Driver for Spark Cooccurrence Analysis

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1541: --- Labels: DSL scala spark (was: ) Create CLI Driver for Spark Cooccurrence Analysis

[jira] [Updated] (MAHOUT-1538) Port spectral clustering to Mahout DSL

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1538: --- Labels: DSL Spark scala (was: ) Port spectral clustering to Mahout DSL

[jira] [Updated] (MAHOUT-1567) Add online sparse dictionary learning (dimensionality reduction)

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1567: --- Labels: DSL scala (was: ) Add online sparse dictionary learning (dimensionality reduction)

[jira] [Updated] (MAHOUT-1637) RecommenderJob of ALS fails in the mapper because it uses the instance of other class

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1637: --- Labels: als collaborative-filtering legacy (was: als collaborative-filtering)

[jira] [Updated] (MAHOUT-1629) Mahout cvb on AWS EMR: p(topic|docId) doesn't make sense when using s3 folder as --input

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1629: --- Labels: legacy (was: ) Mahout cvb on AWS EMR: p(topic|docId) doesn't make sense when using

[jira] [Updated] (MAHOUT-1628) Propagation of Updates in DF

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1628: --- Labels: legacy (was: ) Propagation of Updates in DF

[jira] [Updated] (MAHOUT-1628) Propagation of Updates in DF

2015-03-05 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1628: --- Labels: DSL scala (was: legacy) Propagation of Updates in DF

Re: PMML

2015-03-05 Thread Pat Ferrel
PMML doesn’t make a lot of sense when the model is a potentially massive matrix. One reason is that it will be pretty hard (impossible?) to parallelize read/write with the engines we use. JSON has the same problem and the only way SchemaRDD can read JSON is by bending the rules. Seems like a

Re: Spark 1.1.1 and 1.2.1

2015-03-05 Thread Pat Ferrel
I can’t build Mahout for 1.2.0. It seems like the artifacts are no longer available? Odd, because I built Spark 1.2.0. If someone is using 1.2, please feel free to try it; just change the parent pom, a one-liner. Then run the Mahout shell and I’ll send instructions for itemsimilarity. On Mar 4,

Re: Next release

2015-03-05 Thread Pat Ferrel
Seems like we need the top list to be responded to also. Agree about similarity but a completely different method is needed for cosine and the other actual distance measures. The way the old Hadoop code did it is more appropriate. I’ll put it on my list. On Mar 5, 2015, at 9:46 AM, Andrew

Re: PMML

2015-03-05 Thread Suneel Marthi
Yes, it makes sense having one for Naive Bayes and KMeans (when we have that !!). On Thu, Mar 5, 2015 at 11:49 AM, Pat Ferrel p...@occamsmachete.com wrote: PMML doesn’t make a lot of sense when the model is a potentially massive matrix. One reason is that it will be pretty hard (impossible?)

Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #1118

2015-03-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/1118/ -- [...truncated 1701 lines...] A integration/src/test/java/org/apache/mahout/utils/vectors/arff/ARFFTypeTest.java A

Re: Next release

2015-03-05 Thread Pat Ferrel
Yes to splitting builds into legacy and scala (mostly). D can speak to his stuff better, but it sounds like the Java Math module will be required but nothing from mrlegacy afaik. So a legacy and ??? build would overlap in the one module. We talked about using sbt but not sure that’s required for

Jenkins build is back to normal : Mahout-Quality #2982

2015-03-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/2982/
