[jira] [Updated] (SPARK-13672) Add python examples of BisectingKMeans in ML and MLLIB

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13672: --- Shepherd: Nick Pentreath Assignee: zhengruifeng > Add python examples of BisectingKMeans

[jira] [Updated] (SPARK-13629) Add binary toggle Param to CountVectorizer

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13629: --- Shepherd: Nick Pentreath > Add binary toggle Param to CountVectorizer >

[jira] [Updated] (SPARK-13629) Add binary toggle Param to CountVectorizer

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13629: --- Assignee: yuhao yang > Add binary toggle Param to CountVectorizer >

[jira] [Resolved] (SPARK-13706) Python Example for Train Validation Split Missing

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13706. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11547

[jira] [Updated] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13430: --- Assignee: Bryan Cutler > Expose ml summary function in PySpark for classification and

[jira] [Commented] (SPARK-12626) MLlib 2.0 Roadmap

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188774#comment-15188774 ] Nick Pentreath commented on SPARK-12626: [~dbtsai] ok thanks - would like to take a look when

[jira] [Commented] (SPARK-12626) MLlib 2.0 Roadmap

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186985#comment-15186985 ] Nick Pentreath commented on SPARK-12626: [~mengxr] [~josephkb] I see this mentioned as a major

[jira] [Updated] (SPARK-13706) Python Example for Train Validation Split Missing

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13706: --- Assignee: Jeremy > Python Example for Train Validation Split Missing >

[jira] [Updated] (SPARK-13706) Python Example for Train Validation Split Missing

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13706: --- Issue Type: Improvement (was: Bug) > Python Example for Train Validation Split Missing >

[jira] [Commented] (SPARK-13600) Use approxQuantile from DataFrame stats in QuantileDiscretizer

2016-03-08 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186648#comment-15186648 ] Nick Pentreath commented on SPARK-13600: Thanks, that's fine > Use approxQuantile from DataFrame

[jira] [Commented] (SPARK-13600) Incorrect number of buckets in QuantileDiscretizer

2016-03-08 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184717#comment-15184717 ] Nick Pentreath commented on SPARK-13600: [~ocp] Could you update this ticket with something about

[jira] [Commented] (SPARK-10785) Scale QuantileDiscretizer using distributed binning

2016-03-08 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184711#comment-15184711 ] Nick Pentreath commented on SPARK-10785: Pending SPARK-13600, this would no longer be necessary,

[jira] [Commented] (SPARK-13629) Add binary toggle Param to CountVectorizer

2016-03-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182081#comment-15182081 ] Nick Pentreath commented on SPARK-13629: [~josephkb] what do you think about adding this param to

[jira] [Commented] (SPARK-13629) Add binary toggle Param to CountVectorizer

2016-03-04 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179571#comment-15179571 ] Nick Pentreath commented on SPARK-13629: Only the word count would be set to 1 (for non-zero
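The comment above describes a binary toggle in which a term's count is replaced by 1 whenever it is non-zero. The following plain-Python sketch shows that idea on its own. It is not Spark's CountVectorizer API. The function `count_vector` and its `binary` flag are hypothetical names chosen for illustration, and the sketch assumes a fixed, already-fitted vocabulary.

```python
from collections import Counter

def count_vector(tokens, vocabulary, binary=False):
    """Hypothetical helper: map tokens to a dense term-count vector.

    With binary=True, every non-zero count becomes 1.0 (term presence only).
    """
    index = {term: i for i, term in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)
    for term, count in Counter(tokens).items():
        if term in index:  # out-of-vocabulary terms are ignored
            vec[index[term]] = 1.0 if binary else float(count)
    return vec

vocab = ["a", "b", "c"]
doc = ["a", "b", "a", "a"]
print(count_vector(doc, vocab))               # [3.0, 1.0, 0.0]
print(count_vector(doc, vocab, binary=True))  # [1.0, 1.0, 0.0]
```

Zero entries are unaffected by the toggle. Only the magnitude of terms that are present changes, which is why it is cheap to apply as a post-processing step on the counts.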

[jira] [Updated] (SPARK-12326) Move GBT implementation from spark.mllib to spark.ml

2016-03-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12326: --- Assignee: Seth Hendrickson > Move GBT implementation from spark.mllib to spark.ml >

[jira] [Commented] (SPARK-13639) Statistics.colStats(rdd).mean and variance should handle NaN in the input vectors

2016-03-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179447#comment-15179447 ] Nick Pentreath commented on SPARK-13639: For SPARK-13568, we can take one of two approaches: 1.

[jira] [Commented] (SPARK-13568) Create feature transformer to impute missing values

2016-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177368#comment-15177368 ] Nick Pentreath commented on SPARK-13568: Ok - the Imputer will need to compute column stats
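The comment notes that an Imputer must first compute column statistics. A minimal plain-Python sketch of that two-step shape is given below. It assumes mean imputation, with NaN as the missing-value marker. This is not Spark's Imputer or `Statistics.colStats`, and both `column_means` and `impute_mean` are hypothetical helpers. The column statistics must skip NaN entries. Otherwise a single missing value would turn the whole column mean into NaN, which is the concern raised under SPARK-13639.

```python
import math

def column_means(rows):
    """Hypothetical helper: per-column mean over non-NaN entries only."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if not math.isnan(r[j])]
        # An all-missing column has no defined mean; leave it as NaN.
        means.append(sum(present) / len(present) if present else float("nan"))
    return means

def impute_mean(rows):
    """Hypothetical helper: replace each NaN with its column's mean."""
    means = column_means(rows)  # "fit": compute column stats
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows]      # "transform": fill missing values

data = [[1.0, float("nan")], [3.0, 4.0], [float("nan"), 8.0]]
print(impute_mean(data))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```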

[jira] [Commented] (SPARK-13600) Incorrect number of buckets in QuantileDiscretizer

2016-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176117#comment-15176117 ] Nick Pentreath commented on SPARK-13600: [~ocp] do you plan to submit a PR? Since you worked on

[jira] [Commented] (SPARK-13568) Create feature transformer to impute missing values

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172328#comment-15172328 ] Nick Pentreath commented on SPARK-13568: Sure, go ahead. However, taking a quick look at your

[jira] [Resolved] (SPARK-12348) PySpark _inferSchema crashes with incorrect exception on an empty RDD

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-12348. Resolution: Not A Bug > PySpark _inferSchema crashes with incorrect exception on an empty

[jira] [Comment Edited] (SPARK-12348) PySpark _inferSchema crashes with incorrect exception on an empty RDD

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171964#comment-15171964 ] Nick Pentreath edited comment on SPARK-12348 at 2/29/16 3:10 PM: - I'm not

[jira] [Commented] (SPARK-12348) PySpark _inferSchema crashes with incorrect exception on an empty RDD

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171964#comment-15171964 ] Nick Pentreath commented on SPARK-12348: I'm not sure this is a bug or even a big deal. The cause

[jira] [Updated] (SPARK-12806) Support SQL expressions extracting values from VectorUDT

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12806: --- Description: Use cases exist where a specific index within a {{VectorUDT}} column of a

[jira] [Commented] (SPARK-12684) Matrix.toString should take a format for how each cell should be printed

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171957#comment-15171957 ] Nick Pentreath commented on SPARK-12684: [~srowen] should this be resolved as *Won't Fix*? >

[jira] [Updated] (SPARK-13568) Create feature transformer to impute missing values

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13568: --- Priority: Minor (was: Major) > Create feature transformer to impute missing values >

[jira] [Commented] (SPARK-13517) Expose regression summary classes in Pyspark

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171905#comment-15171905 ] Nick Pentreath commented on SPARK-13517: Is this not a duplicate of SPARK-13430? > Expose

[jira] [Updated] (SPARK-13568) Create feature transformer to impute missing values

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13568: --- Description: It is quite common to encounter missing values in data sets. It would be useful

[jira] [Created] (SPARK-13568) Create feature transformer to impute missing values

2016-02-29 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13568: -- Summary: Create feature transformer to impute missing values Key: SPARK-13568 URL: https://issues.apache.org/jira/browse/SPARK-13568 Project: Spark

[jira] [Resolved] (SPARK-12633) Make Parameter Descriptions Consistent for PySpark MLlib Regression

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-12633. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11404

[jira] [Commented] (SPARK-13289) Word2Vec generate infinite distances when numIterations>5

2016-02-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168619#comment-15168619 ] Nick Pentreath commented on SPARK-13289: Master branch should be building now. Can you try again?

[jira] [Commented] (SPARK-13505) Python API for MaxAbsScaler

2016-02-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168529#comment-15168529 ] Nick Pentreath commented on SPARK-13505: [~holdenk] [~bryanc] [~sethah] any interest in adding

[jira] [Commented] (SPARK-13489) GSoC 2016 project ideas for MLlib

2016-02-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167120#comment-15167120 ] Nick Pentreath commented on SPARK-13489: Do we want to focus on work within core, or also

[jira] [Updated] (SPARK-13340) [ML] PolynomialExpansion and Normalizer should validate input type

2016-02-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13340: --- Assignee: Grzegorz Chilkiewicz > [ML] PolynomialExpansion and Normalizer should validate

[jira] [Commented] (SPARK-13289) Word2Vec generate infinite distances when numIterations>5

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158446#comment-15158446 ] Nick Pentreath commented on SPARK-13289: Yes the master build is currently failing as detailed in

[jira] [Updated] (SPARK-12379) Copy GBT implementation to spark.ml

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12379: --- Assignee: Seth Hendrickson > Copy GBT implementation to spark.ml >

[jira] [Commented] (SPARK-13026) Umbrella: Allow user to specify initial model when training

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156803#comment-15156803 ] Nick Pentreath commented on SPARK-13026: [~holdenk] is this JIRA necessary, as it duplicates

[jira] [Commented] (SPARK-13289) Word2Vec generate infinite distances when numIterations>5

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156785#comment-15156785 ] Nick Pentreath commented on SPARK-13289: [~daiqi5477] could you try your experiments again

[jira] [Resolved] (SPARK-13334) ML KMeansModel/BisectingKMeansModel should be set parent

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13334. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11214

[jira] [Updated] (SPARK-13334) ML KMeansModel/BisectingKMeansModel should be set parent

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13334: --- Assignee: Yanbo Liang > ML KMeansModel/BisectingKMeansModel should be set parent >

[jira] [Resolved] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-12632. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11186

[jira] [Updated] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-02-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12632: --- Assignee: Bryan Cutler (was: somil deshmukh) > Make Parameter Descriptions Consistent for

[jira] [Updated] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general

2016-01-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12247: --- Assignee: Benjamin Fradet > Documentation for spark.ml's ALS and collaborative filtering in

[jira] [Updated] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general

2016-01-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12247: --- Affects Version/s: (was: 1.5.2) 2.0.0 > Documentation for

[jira] [Resolved] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel

2015-12-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-12296. Resolution: Fixed Fix Version/s: 2.0.0 > Feature parity for pyspark.mllib

[jira] [Commented] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel

2015-12-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067725#comment-15067725 ] Nick Pentreath commented on SPARK-12296: Issue resolved by pull request 10298

[jira] [Updated] (SPARK-11922) Python API for ml.feature.QuantileDiscretizer

2015-12-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-11922: --- Assignee: holdenk > Python API for ml.feature.QuantileDiscretizer >

[jira] [Updated] (SPARK-12182) Distributed binning for trees in spark.ml

2015-12-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12182: --- Assignee: Seth Hendrickson > Distributed binning for trees in spark.ml >

[jira] [Updated] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel

2015-12-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12296: --- Assignee: holdenk > Feature parity for pyspark.mllib StandardScalerModel >

[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)

2015-10-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970870#comment-14970870 ] Nick Pentreath commented on SPARK-7008: --- Is this now going in 1.6 (as per SPARK-10324)? If so is
