[jira] [Commented] (SPARK-7674) R-like stats for ML models

2017-04-07 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960530#comment-15960530 ] Nick Pentreath commented on SPARK-7674: Is this JIRA still open? Can it be resolved? Or are there

[jira] [Commented] (SPARK-12210) Small example that shows how to integrate spark.mllib with spark.ml

2017-04-07 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960523#comment-15960523 ] Nick Pentreath commented on SPARK-12210: Is this required any more? I guess we are close enough

[jira] [Assigned] (SPARK-20076) Python interface for ml.stats.Correlation

2017-04-07 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-20076: Assignee: Liang-Chi Hsieh > Python interface for ml.stats.Correlation

[jira] [Resolved] (SPARK-20076) Python interface for ml.stats.Correlation

2017-04-07 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-20076. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17494

[jira] [Commented] (SPARK-19979) [MLLIB] Multiple Estimators/Pipelines In CrossValidator

2017-04-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958492#comment-15958492 ] Nick Pentreath commented on SPARK-19979: I think we could add a note to the user guide. However I

[jira] [Assigned] (SPARK-19953) RandomForest Models should use the UID of Estimator when fit

2017-04-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-19953: Assignee: Bryan Cutler > RandomForest Models should use the UID of Estimator when fit

[jira] [Resolved] (SPARK-19953) RandomForest Models should use the UID of Estimator when fit

2017-04-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-19953. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17296

[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-04 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954904#comment-15954904 ] Nick Pentreath commented on SPARK-20203: I see there is a comment in the code that says: {{//

[jira] [Commented] (SPARK-20047) Constrained Logistic Regression

2017-04-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953551#comment-15953551 ] Nick Pentreath commented on SPARK-20047: Is this really targeted for 2.2.0? > Constrained

[jira] [Assigned] (SPARK-19969) Doc and examples for Imputer

2017-04-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-19969: Assignee: yuhao yang > Doc and examples for Imputer

[jira] [Resolved] (SPARK-19969) Doc and examples for Imputer

2017-04-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-19969. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17324

[jira] [Assigned] (SPARK-19985) Some ML Models error when copy or do not set parent

2017-04-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-19985: Assignee: Bryan Cutler > Some ML Models error when copy or do not set parent

[jira] [Resolved] (SPARK-19985) Some ML Models error when copy or do not set parent

2017-04-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-19985. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17326

[jira] [Commented] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-03-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947114#comment-15947114 ] Nick Pentreath commented on SPARK-14174: The actual fix in the PR is pretty small - essentially

[jira] [Assigned] (SPARK-15040) PySpark impl for ml.feature.Imputer

2017-03-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-15040: Assignee: Nick Pentreath > PySpark impl for ml.feature.Imputer

[jira] [Resolved] (SPARK-15040) PySpark impl for ml.feature.Imputer

2017-03-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15040. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17316

[jira] [Updated] (SPARK-20043) CrossValidatorModel loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted

2017-03-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-20043: Labels: starter (was: ) > CrossValidatorModel loader does not recognize impurity "Gini" and

[jira] [Commented] (SPARK-20043) CrossValidatorModel loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted

2017-03-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934905#comment-15934905 ] Nick Pentreath commented on SPARK-20043: I just noticed the error message you put above says

[jira] [Updated] (SPARK-20043) CrossValidatorModel loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted

2017-03-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-20043: Docs Text: (was: I saved a CrossValidatorModel with a decision tree and a random forest. I

[jira] [Updated] (SPARK-20043) CrossValidatorModel loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted

2017-03-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-20043: Description: I saved a CrossValidatorModel with a decision tree and a random forest. I use

[jira] [Commented] (SPARK-19969) Doc and examples for Imputer

2017-03-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928854#comment-15928854 ] Nick Pentreath commented on SPARK-19969: Ok - I can help on it but probably only some time next

[jira] [Commented] (SPARK-19979) [MLLIB] Multiple Estimators/Pipelines In CrossValidator

2017-03-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928545#comment-15928545 ] Nick Pentreath commented on SPARK-19979: I wonder if this fits in as a sort of sub-task of

[jira] [Commented] (SPARK-19969) Doc and examples for Imputer

2017-03-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928537#comment-15928537 ] Nick Pentreath commented on SPARK-19969: No haven't done the doc or examples - I seem to recall

[jira] [Commented] (SPARK-15040) PySpark impl for ml.feature.Imputer

2017-03-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928180#comment-15928180 ] Nick Pentreath commented on SPARK-15040: Sorry, I did not see your comment - I opened a

[jira] [Commented] (SPARK-19899) FPGrowth input column naming

2017-03-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928193#comment-15928193 ] Nick Pentreath commented on SPARK-19899: +1 on {{itemsCol}} - feel free to send a PR :)

[jira] [Assigned] (SPARK-13568) Create feature transformer to impute missing values

2017-03-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-13568: Assignee: yuhao yang > Create feature transformer to impute missing values

[jira] [Resolved] (SPARK-13568) Create feature transformer to impute missing values

2017-03-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13568. Resolution: Fixed Fix Version/s: 2.2.0 > Create feature transformer to impute

[jira] [Created] (SPARK-19969) Doc and examples for Imputer

2017-03-16 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-19969: Summary: Doc and examples for Imputer Key: SPARK-19969 URL: https://issues.apache.org/jira/browse/SPARK-19969 Project: Spark Issue Type: Documentation

[jira] [Commented] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927601#comment-15927601 ] Nick Pentreath commented on SPARK-19962: You may also want to take a look at

[jira] [Commented] (SPARK-19957) Inconsist KMeans initialization mode behavior between ML and MLlib

2017-03-15 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925723#comment-15925723 ] Nick Pentreath commented on SPARK-19957: See https://issues.apache.org/jira/browse/SPARK-16832

[jira] [Commented] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902649#comment-15902649 ] Nick Pentreath commented on SPARK-14409: [~josephkb] in reference to your [PR

[jira] [Comment Edited] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902639#comment-15902639 ] Nick Pentreath edited comment on SPARK-14409 at 3/9/17 8:05 AM: I

[jira] [Commented] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902639#comment-15902639 ] Nick Pentreath commented on SPARK-14409: I commented on the [PR for

[jira] [Commented] (SPARK-13969) Extend input format that feature hashing can handle

2017-03-07 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900825#comment-15900825 ] Nick Pentreath commented on SPARK-13969: I think {{HashingTF}} and {{FeatureHasher}} are

[jira] [Commented] (SPARK-19848) Regex Support in StopWordsRemover

2017-03-07 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899250#comment-15899250 ] Nick Pentreath commented on SPARK-19848: Perhaps the ML pipeline components mentioned

[jira] [Comment Edited] (SPARK-19848) Regex Support in StopWordsRemover

2017-03-07 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899250#comment-15899250 ] Nick Pentreath edited comment on SPARK-19848 at 3/7/17 11:06 AM: This

[jira] [Commented] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-03-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898855#comment-15898855 ] Nick Pentreath commented on SPARK-14409: [~josephkb] the proposed input schema above encompasses

[jira] [Comment Edited] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-03-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896933#comment-15896933 ] Nick Pentreath edited comment on SPARK-14409 at 3/6/17 9:07 AM: I've

[jira] [Comment Edited] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-03-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896933#comment-15896933 ] Nick Pentreath edited comment on SPARK-14409 at 3/6/17 9:06 AM: I've

[jira] [Commented] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-03-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896933#comment-15896933 ] Nick Pentreath commented on SPARK-14409: I've thought about this a lot over the past few days,

[jira] [Commented] (SPARK-7146) Should ML sharedParams be a public API?

2017-03-04 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895629#comment-15895629 ] Nick Pentreath commented on SPARK-7146: Personally I support developer API - these are going to be

[jira] [Commented] (SPARK-19339) StatFunctions.multipleApproxQuantiles can give NoSuchElementException: next on empty iterator

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893858#comment-15893858 ] Nick Pentreath commented on SPARK-19339: This should be addressed by SPARK-19573 - empty (or all

[jira] [Commented] (SPARK-19714) Bucketizer Bug Regarding Handling Unbucketed Inputs

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893821#comment-15893821 ] Nick Pentreath commented on SPARK-19714: If you feel that handling values outside the bucket

[jira] [Commented] (SPARK-19747) Consolidate code in ML aggregators

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893811#comment-15893811 ] Nick Pentreath commented on SPARK-19747: Also agree we should be able to extract out the penalty

[jira] [Commented] (SPARK-19747) Consolidate code in ML aggregators

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893810#comment-15893810 ] Nick Pentreath commented on SPARK-19747: [~yuhaoyan] for {{SGDClassifier}} it would be

[jira] [Resolved] (SPARK-19345) Add doc for "coldStartStrategy" usage in ALS

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-19345. Resolution: Fixed Fix Version/s: 2.2.0 > Add doc for "coldStartStrategy" usage in

[jira] [Updated] (SPARK-19345) Add doc for "coldStartStrategy" usage in ALS

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-19345: Priority: Minor (was: Major) > Add doc for "coldStartStrategy" usage in ALS

[jira] [Updated] (SPARK-19704) AFTSurvivalRegression should support numeric censorCol

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-19704: Fix Version/s: 2.2.0 > AFTSurvivalRegression should support numeric censorCol

[jira] [Assigned] (SPARK-19704) AFTSurvivalRegression should support numeric censorCol

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-19704: Assignee: zhengruifeng > AFTSurvivalRegression should support numeric censorCol

[jira] [Resolved] (SPARK-19704) AFTSurvivalRegression should support numeric censorCol

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-19704. Resolution: Fixed > AFTSurvivalRegression should support numeric censorCol

[jira] [Assigned] (SPARK-19733) ALS performs unnecessary casting on item and user ids

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-19733: Assignee: Vasilis Vryniotis > ALS performs unnecessary casting on item and user ids

[jira] [Resolved] (SPARK-19733) ALS performs unnecessary casting on item and user ids

2017-03-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-19733. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17059

[jira] [Resolved] (SPARK-19787) Different default regParam values in ALS

2017-03-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-19787. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17121

[jira] [Assigned] (SPARK-19345) Add doc for "coldStartStrategy" usage in ALS

2017-02-28 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-19345: Assignee: Nick Pentreath > Add doc for "coldStartStrategy" usage in ALS

[jira] [Resolved] (SPARK-14489) RegressionEvaluator returns NaN for ALS in Spark ml

2017-02-28 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-14489. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 12896

[jira] [Commented] (SPARK-11968) ALS recommend all methods spend most of time in GC

2017-02-27 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885636#comment-15885636 ] Nick Pentreath commented on SPARK-11968: While working on performance testing for ALS parity I've

[jira] [Reopened] (SPARK-11968) ALS recommend all methods spend most of time in GC

2017-02-27 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reopened SPARK-11968: Assignee: Nick Pentreath > ALS recommend all methods spend most of time in GC

[jira] [Commented] (SPARK-19141) VectorAssembler metadata causing memory issues

2017-02-27 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885625#comment-15885625 ] Nick Pentreath commented on SPARK-19141: Hi there - I've also run into issues with larger-scale

[jira] [Commented] (SPARK-19714) Bucketizer Bug Regarding Handling Unbucketed Inputs

2017-02-27 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885315#comment-15885315 ] Nick Pentreath commented on SPARK-19714: I also agree that the naming of {{splits}} could be

[jira] [Commented] (SPARK-19747) Consolidate code in ML aggregators

2017-02-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885298#comment-15885298 ] Nick Pentreath commented on SPARK-19747: Big +1 for this! I agree we really should be able to

[jira] [Commented] (SPARK-19714) Bucketizer Bug Regarding Handling Unbucketed Inputs

2017-02-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882224#comment-15882224 ] Nick Pentreath commented on SPARK-19714: Another alternative is that we do expand the "invalid"

[jira] [Comment Edited] (SPARK-19714) Bucketizer Bug Regarding Handling Unbucketed Inputs

2017-02-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882216#comment-15882216 ] Nick Pentreath edited comment on SPARK-19714 at 2/24/17 8:35 AM: I agree

[jira] [Commented] (SPARK-19714) Bucketizer Bug Regarding Handling Unbucketed Inputs

2017-02-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882216#comment-15882216 ] Nick Pentreath commented on SPARK-19714: I agree that the parameter naming is perhaps misleading.

[jira] [Commented] (SPARK-18813) MLlib 2.2 Roadmap

2017-02-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882206#comment-15882206 ] Nick Pentreath commented on SPARK-18813: FYI I've started going through a few of the top Watched

[jira] [Closed] (SPARK-10041) Proposal of Parameter Server Interface for Spark

2017-02-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-10041. Resolution: Won't Fix > Proposal of Parameter Server Interface for Spark

[jira] [Reopened] (SPARK-10041) Proposal of Parameter Server Interface for Spark

2017-02-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reopened SPARK-10041: > Proposal of Parameter Server Interface for Spark

[jira] [Commented] (SPARK-10041) Proposal of Parameter Server Interface for Spark

2017-02-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882198#comment-15882198 ] Nick Pentreath commented on SPARK-10041: I think it is safe to say this is not going to be part

[jira] [Commented] (SPARK-2336) Approximate k-NN Models for MLLib

2017-02-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882187#comment-15882187 ] Nick Pentreath commented on SPARK-2336: I think it's safe to say that this now lives in a Spark

[jira] [Commented] (SPARK-6567) Large linear model parallelism via a join and reduceByKey

2017-02-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882182#comment-15882182 ] Nick Pentreath commented on SPARK-6567: This JIRA has been around for a while without any movement.

[jira] [Commented] (SPARK-3434) Distributed block matrix

2017-02-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882179#comment-15882179 ] Nick Pentreath commented on SPARK-3434: This JIRA only has SPARK-3976 open. There was an old PR for

[jira] [Commented] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882174#comment-15882174 ] Nick Pentreath commented on SPARK-14409: The other option is to work with [~danilo.ascione] PR

[jira] [Commented] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882163#comment-15882163 ] Nick Pentreath commented on SPARK-14409: [~roberto.mirizzi] the {{goodThreshold}} param seems

[jira] [Resolved] (SPARK-14084) Parallel training jobs in model selection

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-14084. Resolution: Duplicate Target Version/s: (was: ) > Parallel training jobs in

[jira] [Commented] (SPARK-14084) Parallel training jobs in model selection

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882123#comment-15882123 ] Nick Pentreath commented on SPARK-14084: I guess we could have put SPARK-19071 into this ticket

[jira] [Comment Edited] (SPARK-3246) Support weighted SVMWithSGD for classification of unbalanced dataset

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882113#comment-15882113 ] Nick Pentreath edited comment on SPARK-3246 at 2/24/17 7:15 AM: Since

[jira] [Comment Edited] (SPARK-3246) Support weighted SVMWithSGD for classification of unbalanced dataset

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882113#comment-15882113 ] Nick Pentreath edited comment on SPARK-3246 at 2/24/17 7:16 AM: Since

[jira] [Closed] (SPARK-3246) Support weighted SVMWithSGD for classification of unbalanced dataset

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-3246. Resolution: Won't Fix > Support weighted SVMWithSGD for classification of unbalanced dataset

[jira] [Commented] (SPARK-3246) Support weighted SVMWithSGD for classification of unbalanced dataset

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882113#comment-15882113 ] Nick Pentreath commented on SPARK-3246: Since {{mllib}} is in maintenance mode and {{LinearSVC}}

[jira] [Commented] (SPARK-19634) Feature parity for descriptive statistics in MLlib

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880628#comment-15880628 ] Nick Pentreath commented on SPARK-19634: Ah I see it was discussed in the design doc - will go

[jira] [Commented] (SPARK-19634) Feature parity for descriptive statistics in MLlib

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880623#comment-15880623 ] Nick Pentreath commented on SPARK-19634: Thanks [~timhunter]. In terms of performance, we expect

[jira] [Commented] (SPARK-18813) MLlib 2.2 Roadmap

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880387#comment-15880387 ] Nick Pentreath commented on SPARK-18813: Thanks for this Joseph and everyone for the comments &

[jira] [Commented] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880324#comment-15880324 ] Nick Pentreath commented on SPARK-14409: [~roberto.mirizzi] If using the current

[jira] [Commented] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880312#comment-15880312 ] Nick Pentreath commented on SPARK-14409: [~danilo.ascione] Yes, your solution is generic assuming

[jira] [Commented] (SPARK-19668) Multiple NGram sizes

2017-02-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880131#comment-15880131 ] Nick Pentreath commented on SPARK-19668: The simplest will be to keep the existing param and make

[jira] [Resolved] (SPARK-19679) Destroy broadcasted object without blocking

2017-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-19679. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17016

[jira] [Assigned] (SPARK-19679) Destroy broadcasted object without blocking

2017-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-19679: Assignee: zhengruifeng > Destroy broadcasted object without blocking

[jira] [Assigned] (SPARK-19694) Add missing 'setTopicDistributionCol' for LDAModel

2017-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-19694: Assignee: zhengruifeng > Add missing 'setTopicDistributionCol' for LDAModel

[jira] [Resolved] (SPARK-19694) Add missing 'setTopicDistributionCol' for LDAModel

2017-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-19694. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17021

[jira] [Comment Edited] (SPARK-18454) Changes to improve Nearest Neighbor Search for LSH

2017-02-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875552#comment-15875552 ] Nick Pentreath edited comment on SPARK-18454 at 2/21/17 8:00 AM: - Can you

[jira] [Commented] (SPARK-18454) Changes to improve Nearest Neighbor Search for LSH

2017-02-20 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875552#comment-15875552 ] Nick Pentreath commented on SPARK-18454: Can you also comment on

[jira] [Commented] (SPARK-18608) Spark ML algorithms that check RDD cache level for internal caching double-cache data

2017-02-20 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875465#comment-15875465 ] Nick Pentreath commented on SPARK-18608: [~podongfeng] [~yuhaoyan] I'm not aware of anyone

[jira] [Commented] (SPARK-19668) Multiple NGram sizes

2017-02-20 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875437#comment-15875437 ] Nick Pentreath commented on SPARK-19668: I'd say a range is feasible. The current API doesn't

[jira] [Commented] (SPARK-19573) Make NaN/null handling consistent in approxQuantile

2017-02-20 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874321#comment-15874321 ] Nick Pentreath commented on SPARK-19573: cc [~timhunter] - can you take a look at the discussion

[jira] [Commented] (SPARK-19208) MultivariateOnlineSummarizer performance optimization

2017-02-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866755#comment-15866755 ] Nick Pentreath commented on SPARK-19208: Ah right I see - yes rewrite rules would be a good

[jira] [Comment Edited] (SPARK-19208) MultivariateOnlineSummarizer performance optimization

2017-02-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866755#comment-15866755 ] Nick Pentreath edited comment on SPARK-19208 at 2/14/17 9:42 PM: - Ah

[jira] [Commented] (SPARK-19208) MultivariateOnlineSummarizer performance optimization

2017-02-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866675#comment-15866675 ] Nick Pentreath commented on SPARK-19208: When I said "estimator-like", I didn't mean it should

[jira] [Commented] (SPARK-14503) spark.ml Scala API for FPGrowth

2017-02-13 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864872#comment-15864872 ] Nick Pentreath commented on SPARK-14503: Seems {{PrefixSpan}} even takes different input:

[jira] [Commented] (SPARK-19422) Cache input data in algorithms

2017-02-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848401#comment-15848401 ] Nick Pentreath commented on SPARK-19422: Please see SPARK-18608 - the fix you propose in the PR

[jira] [Comment Edited] (SPARK-19208) MultivariateOnlineSummarizer performance optimization

2017-02-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848108#comment-15848108 ] Nick Pentreath edited comment on SPARK-19208 at 2/1/17 8:09 AM: Another
