[jira] [Created] (SPARK-16814) Fix deprecated use of ParquetWriter in Parquet test suites

2016-07-30 Thread holdenk (JIRA)
holdenk created SPARK-16814: Summary: Fix deprecated use of ParquetWriter in Parquet test suites Key: SPARK-16814 URL: https://issues.apache.org/jira/browse/SPARK-16814 Project: Spark

[jira] [Commented] (SPARK-16779) Fix unnecessary use of postfix operations

2016-07-29 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400092#comment-15400092 ] holdenk commented on SPARK-16779: I've gone ahead and done a more full-scope version…

[jira] [Commented] (SPARK-16776) Fix Kafka deprecation warnings

2016-07-29 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399918#comment-15399918 ] holdenk commented on SPARK-16776: Is this one you wanted to take on as well?

[jira] [Commented] (SPARK-16777) Parquet schema converter depends on deprecated APIs

2016-07-29 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398869#comment-15398869 ] holdenk commented on SPARK-16777: Go for it :) Please CC me on the PR so I can do a code…

[jira] [Commented] (SPARK-16777) Parquet schema converter depends on deprecated APIs

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398827#comment-15398827 ] holdenk commented on SPARK-16777: That's a good point, thanks for the comment/note :) …

[jira] [Commented] (SPARK-16788) Investigate JSR-310 & scala-time alternatives to our own datetime utils

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398632#comment-15398632 ] holdenk commented on SPARK-16788: cc [~davies] [~ckadner] :)

[jira] [Created] (SPARK-16788) Investigate JSR-310 & scala-time alternatives to our own datetime utils

2016-07-28 Thread holdenk (JIRA)
holdenk created SPARK-16788: Summary: Investigate JSR-310 & scala-time alternatives to our own datetime utils Key: SPARK-16788 URL: https://issues.apache.org/jira/browse/SPARK-16788 Project: Spark

[jira] [Comment Edited] (SPARK-16774) Fix use of deprecated TimeStamp constructor (also providing incorrect results)

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398424#comment-15398424 ] holdenk edited comment on SPARK-16774 at 7/28/16 11:38 PM: …

[jira] [Updated] (SPARK-16774) Fix use of deprecated TimeStamp constructor (also providing incorrect results)

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-16774: Description: The TimeStamp constructor we use inside of DateTime utils has been deprecated since JDK 1.1…

[jira] [Updated] (SPARK-16774) Fix use of deprecated TimeStamp constructor

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-16774: Description: The TimeStamp constructor we use inside of DateTime utils has been deprecated since JDK 1.1…

[jira] [Updated] (SPARK-16774) Fix use of deprecated TimeStamp constructor (also providing incorrect results)

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-16774: Summary: Fix use of deprecated TimeStamp constructor (also providing incorrect results) (was: Fix use of …)

[jira] [Commented] (SPARK-16774) Fix use of deprecated TimeStamp constructor

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398424#comment-15398424 ] holdenk commented on SPARK-16774: While diving into this (relatedly I hate timezones)…

[jira] [Comment Edited] (SPARK-16779) Fix unnecessary use of postfix operations

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397984#comment-15397984 ] holdenk edited comment on SPARK-16779 at 7/28/16 6:36 PM: I'm …

[jira] [Commented] (SPARK-16779) Fix unnecessary use of postfix operations

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397984#comment-15397984 ] holdenk commented on SPARK-16779: I'm sort of on the fence with fixing as well - but we…

[jira] [Created] (SPARK-16779) Fix unnecessary use of postfix operations

2016-07-28 Thread holdenk (JIRA)
holdenk created SPARK-16779: Summary: Fix unnecessary use of postfix operations Key: SPARK-16779 URL: https://issues.apache.org/jira/browse/SPARK-16779 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-16773) Post Spark 2.0 deprecation & warnings cleanup

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-16773: Summary: Post Spark 2.0 deprecation & warnings cleanup (was: Post Spark 2.0 deprecation cleanup)

[jira] [Created] (SPARK-16778) Fix use of deprecated SQLContext constructor

2016-07-28 Thread holdenk (JIRA)
holdenk created SPARK-16778: Summary: Fix use of deprecated SQLContext constructor Key: SPARK-16778 URL: https://issues.apache.org/jira/browse/SPARK-16778 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-16775) Reduce internal warnings from deprecated accumulator API

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-16775: Component/s: SQL

[jira] [Created] (SPARK-16777) Parquet schema converter depends on deprecated APIs

2016-07-28 Thread holdenk (JIRA)
holdenk created SPARK-16777: Summary: Parquet schema converter depends on deprecated APIs Key: SPARK-16777 URL: https://issues.apache.org/jira/browse/SPARK-16777 Project: Spark

[jira] [Updated] (SPARK-16775) Reduce internal warnings from deprecated accumulator API

2016-07-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-16775: Component/s: (was: ML) (was: SQL) (was: MLlib)

[jira] [Created] (SPARK-16776) Fix Kafka deprecation warnings

2016-07-28 Thread holdenk (JIRA)
holdenk created SPARK-16776: Summary: Fix Kafka deprecation warnings Key: SPARK-16776 URL: https://issues.apache.org/jira/browse/SPARK-16776 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-16775) Reduce internal warnings from deprecated accumulator API

2016-07-28 Thread holdenk (JIRA)
holdenk created SPARK-16775: Summary: Reduce internal warnings from deprecated accumulator API Key: SPARK-16775 URL: https://issues.apache.org/jira/browse/SPARK-16775 Project: Spark

[jira] [Created] (SPARK-16774) Fix use of deprecated TimeStamp constructor

2016-07-28 Thread holdenk (JIRA)
holdenk created SPARK-16774: Summary: Fix use of deprecated TimeStamp constructor Key: SPARK-16774 URL: https://issues.apache.org/jira/browse/SPARK-16774 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-16773) Post Spark 2.0 deprecation cleanup

2016-07-28 Thread holdenk (JIRA)
holdenk created SPARK-16773: Summary: Post Spark 2.0 deprecation cleanup Key: SPARK-16773 URL: https://issues.apache.org/jira/browse/SPARK-16773 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-15130) PySpark shared params should include default values to match Scala

2016-07-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392716#comment-15392716 ] holdenk commented on SPARK-15130: Now that 2.0 is ready to go out, maybe we can decide…

[jira] [Updated] (SPARK-16720) Loading CSV file with 2k+ columns fails during attribute resolution on action

2016-07-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-16720: Summary: Loading CSV file with 2k+ columns fails during attribute resolution on action (was: Loading CSV …)

[jira] [Created] (SPARK-16720) Loading CSV file with 2k+ columns and writing result with one selected column fails during attribute resolution

2016-07-25 Thread holdenk (JIRA)
holdenk created SPARK-16720: Summary: Loading CSV file with 2k+ columns and writing result with one selected column fails during attribute resolution Key: SPARK-16720 URL: https://issues.apache.org/jira/browse/SPARK-16

[jira] [Commented] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-07-22 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390299#comment-15390299 ] holdenk commented on SPARK-16589: Yah I think we should explore what's going on a bit…

[jira] [Updated] (SPARK-15581) MLlib 2.1 Roadmap

2016-07-18 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15581: Description: This is a master list for MLlib improvements we are working on for the next release. Please …

[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage

2016-07-11 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371624#comment-15371624 ] holdenk commented on SPARK-14813: Yup, auditing is done and once 2.0 is out we will go…

[jira] [Updated] (SPARK-16424) Add support for Structured Streaming to the ML Pipeline API

2016-07-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-16424: Description: For Spark 2.1 we should consider adding support for machine learning on top of the structured…

[jira] [Created] (SPARK-16454) Consider adding a per-batch transform for structured streaming

2016-07-08 Thread holdenk (JIRA)
holdenk created SPARK-16454: Summary: Consider adding a per-batch transform for structured streaming Key: SPARK-16454 URL: https://issues.apache.org/jira/browse/SPARK-16454 Project: Spark

[jira] [Commented] (SPARK-15581) MLlib 2.1 Roadmap

2016-07-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368320#comment-15368320 ] holdenk commented on SPARK-15581: Yah - the more I look at it the more rough it seems…

[jira] [Created] (SPARK-16424) Add support for Structured Streaming to the ML Pipeline API

2016-07-07 Thread holdenk (JIRA)
holdenk created SPARK-16424: Summary: Add support for Structured Streaming to the ML Pipeline API Key: SPARK-16424 URL: https://issues.apache.org/jira/browse/SPARK-16424 Project: Spark

[jira] [Commented] (SPARK-15581) MLlib 2.1 Roadmap

2016-07-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366520#comment-15366520 ] holdenk commented on SPARK-15581: What do we think of Streaming ML Pipelines being on…

[jira] [Created] (SPARK-16407) Allow users to supply custom StreamSinkProviders

2016-07-06 Thread holdenk (JIRA)
holdenk created SPARK-16407: Summary: Allow users to supply custom StreamSinkProviders Key: SPARK-16407 URL: https://issues.apache.org/jira/browse/SPARK-16407 Project: Spark

[jira] [Commented] (SPARK-13233) Python Dataset

2016-06-30 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356683#comment-15356683 ] holdenk commented on SPARK-13233: The ability to intermix functional transformations…

[jira] [Commented] (SPARK-13233) Python Dataset

2016-06-30 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356670#comment-15356670 ] holdenk commented on SPARK-13233: [~maver1ck] not really sure what the API plan is here…

[jira] [Commented] (SPARK-16020) Fix complete mode aggregation with console sink

2016-06-28 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353579#comment-15353579 ] holdenk commented on SPARK-16020: Do we know why this bug happened?

[jira] [Commented] (SPARK-15954) TestHive has issues being used in PySpark

2016-06-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330821#comment-15330821 ] holdenk commented on SPARK-15954: See related PR https://github.com/apache/spark/pull/12

[jira] [Created] (SPARK-15954) TestHive has issues being used in PySpark

2016-06-14 Thread holdenk (JIRA)
holdenk created SPARK-15954: Summary: TestHive has issues being used in PySpark Key: SPARK-15954 URL: https://issues.apache.org/jira/browse/SPARK-15954 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-15902) Add a deprecation warning for Python 2.6

2016-06-12 Thread holdenk (JIRA)
holdenk created SPARK-15902: Summary: Add a deprecation warning for Python 2.6 Key: SPARK-15902 URL: https://issues.apache.org/jira/browse/SPARK-15902 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-12661) Drop Python 2.6 support in PySpark

2016-06-12 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326393#comment-15326393 ] holdenk commented on SPARK-12661: Even using pip on Python 2.6 prints a deprecation…

[jira] [Updated] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

2016-06-10 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15369: Description: Transferring data from the JVM to the Python executor can be a substantial bottleneck. While …

[jira] [Commented] (SPARK-12661) Drop Python 2.6 support in PySpark

2016-06-10 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325678#comment-15325678 ] holdenk commented on SPARK-12661: What are we missing to drop 2.6 support? We could keep…

[jira] [Commented] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

2016-06-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319966#comment-15319966 ] holdenk commented on SPARK-15369: WIP design document https://docs.google.com/document/

[jira] [Updated] (SPARK-15623) 2.0 python coverage ml.feature

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15623: Component/s: PySpark ML

[jira] [Commented] (SPARK-15625) 2.0 python coverage ml.classification module

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304734#comment-15304734 ] holdenk commented on SPARK-15625: This audit is complete (outstanding PRs and issue in…

[jira] [Created] (SPARK-15630) 2.0 python coverage ml root module

2016-05-27 Thread holdenk (JIRA)
holdenk created SPARK-15630: Summary: 2.0 python coverage ml root module Key: SPARK-15630 URL: https://issues.apache.org/jira/browse/SPARK-15630 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304728#comment-15304728 ] holdenk commented on SPARK-14813: I'm thinking we should skip read/write missing in…

[jira] [Commented] (SPARK-15627) 2.0 python coverage ml.tuning module

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304723#comment-15304723 ] holdenk commented on SPARK-15627: ml.tuning audit complete

[jira] [Commented] (SPARK-15623) 2.0 python coverage ml.feature

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304722#comment-15304722 ] holdenk commented on SPARK-15623: cc [~bryanc] can you just double check/confirm that…

[jira] [Updated] (SPARK-15628) pyspark.ml.evaluation module

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15628: Component/s: PySpark ML

[jira] [Commented] (SPARK-15628) pyspark.ml.evaluation module

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304720#comment-15304720 ] holdenk commented on SPARK-15628: API Audit of this component complete

[jira] [Created] (SPARK-15628) pyspark.ml.evaluation module

2016-05-27 Thread holdenk (JIRA)
holdenk created SPARK-15628: Summary: pyspark.ml.evaluation module Key: SPARK-15628 URL: https://issues.apache.org/jira/browse/SPARK-15628 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-15629) 2.0 python coverage pyspark.ml.linalg

2016-05-27 Thread holdenk (JIRA)
holdenk created SPARK-15629: Summary: 2.0 python coverage pyspark.ml.linalg Key: SPARK-15629 URL: https://issues.apache.org/jira/browse/SPARK-15629 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-15589) Analyze simple PySpark closures and generate SQL expressions

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304692#comment-15304692 ] holdenk commented on SPARK-15589: Of course needs to wait for the Python Dataset API to…

[jira] [Created] (SPARK-15627) 2.0 python coverage ml.tuning module

2016-05-27 Thread holdenk (JIRA)
holdenk created SPARK-15627: Summary: 2.0 python coverage ml.tuning module Key: SPARK-15627 URL: https://issues.apache.org/jira/browse/SPARK-15627 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-15626) 2.0 python coverage ml.regression module

2016-05-27 Thread holdenk (JIRA)
holdenk created SPARK-15626: Summary: 2.0 python coverage ml.regression module Key: SPARK-15626 URL: https://issues.apache.org/jira/browse/SPARK-15626 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-15624) 2.0 python coverage ml.recommendation module

2016-05-27 Thread holdenk (JIRA)
holdenk created SPARK-15624: Summary: 2.0 python coverage ml.recommendation module Key: SPARK-15624 URL: https://issues.apache.org/jira/browse/SPARK-15624 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-15625) 2.0 python coverage ml.classification module

2016-05-27 Thread holdenk (JIRA)
holdenk created SPARK-15625: Summary: 2.0 python coverage ml.classification module Key: SPARK-15625 URL: https://issues.apache.org/jira/browse/SPARK-15625 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-15623) 2.0 python coverage ml.feature

2016-05-27 Thread holdenk (JIRA)
holdenk created SPARK-15623: Summary: 2.0 python coverage ml.feature Key: SPARK-15623 URL: https://issues.apache.org/jira/browse/SPARK-15623 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304681#comment-15304681 ] holdenk commented on SPARK-14813: No worries, I'll break it up then.

[jira] [Commented] (SPARK-13233) Python Dataset

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304478#comment-15304478 ] holdenk commented on SPARK-13233: So curious - is this targeted for 2.0 or are we…

[jira] [Commented] (SPARK-12776) Implement Python API for Datasets

2016-05-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304477#comment-15304477 ] holdenk commented on SPARK-12776: I think this might be duplicated by SPARK-13233, …

[jira] [Created] (SPARK-15589) Analyze simple PySpark closures and generate SQL expressions

2016-05-26 Thread holdenk (JIRA)
holdenk created SPARK-15589: Summary: Analyze simple PySpark closures and generate SQL expressions Key: SPARK-15589 URL: https://issues.apache.org/jira/browse/SPARK-15589 Project: Spark

[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage

2016-05-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303169#comment-15303169 ] holdenk commented on SPARK-14813: I'd really like to split it up - but haven't heard…

[jira] [Created] (SPARK-15577) Java can't import DataFrame type alias

2016-05-26 Thread holdenk (JIRA)
holdenk created SPARK-15577: Summary: Java can't import DataFrame type alias Key: SPARK-15577 URL: https://issues.apache.org/jira/browse/SPARK-15577 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-15551) Scaladoc for KeyValueGroupedDataset points to old method

2016-05-25 Thread holdenk (JIRA)
holdenk created SPARK-15551: Summary: Scaladoc for KeyValueGroupedDataset points to old method Key: SPARK-15551 URL: https://issues.apache.org/jira/browse/SPARK-15551 Project: Spark

[jira] [Created] (SPARK-15412) Improve linear & isotonic regression methods PyDocs

2016-05-19 Thread holdenk (JIRA)
holdenk created SPARK-15412: Summary: Improve linear & isotonic regression methods PyDocs Key: SPARK-15412 URL: https://issues.apache.org/jira/browse/SPARK-15412 Project: Spark

[jira] [Created] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

2016-05-17 Thread holdenk (JIRA)
holdenk created SPARK-15369: Summary: Investigate selectively using Jython for parts of PySpark Key: SPARK-15369 URL: https://issues.apache.org/jira/browse/SPARK-15369 Project: Spark

[jira] [Updated] (SPARK-15316) PySpark GeneralizedLinearRegression missing linkPredictionCol param

2016-05-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15316: Description: PySpark's GeneralizedLinearRegression is missing the linkPredictionCol param. (was: PySpark's …)

[jira] [Created] (SPARK-15316) PySpark GeneralizedLinearRegression missing linkPredictionCol param

2016-05-13 Thread holdenk (JIRA)
holdenk created SPARK-15316: Summary: PySpark GeneralizedLinearRegression missing linkPredictionCol param Key: SPARK-15316 URL: https://issues.apache.org/jira/browse/SPARK-15316 Project: Spark

[jira] [Created] (SPARK-15281) PySpark ML GBTRegressor lacks impurity param

2016-05-11 Thread holdenk (JIRA)
holdenk created SPARK-15281: Summary: PySpark ML GBTRegressor lacks impurity param Key: SPARK-15281 URL: https://issues.apache.org/jira/browse/SPARK-15281 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-15061) Upgrade Py4J to 0.10.1

2016-05-11 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280589#comment-15280589 ] holdenk commented on SPARK-15061: Exciting "Py4J 0.10.1 has just been released on PyPI,…

[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage

2016-05-10 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279380#comment-15279380 ] holdenk commented on SPARK-14813: redid the links

[jira] [Created] (SPARK-15254) Improve ML pipeline Cross Validation Scaladoc & PyDoc

2016-05-10 Thread holdenk (JIRA)
holdenk created SPARK-15254: Summary: Improve ML pipeline Cross Validation Scaladoc & PyDoc Key: SPARK-15254 URL: https://issues.apache.org/jira/browse/SPARK-15254 Project: Spark

[jira] [Updated] (SPARK-15254) Improve ML pipeline Cross Validation Scaladoc & PyDoc

2016-05-10 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15254: Component/s: ML

[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-06 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275077#comment-15275077 ] holdenk commented on SPARK-15194: So this is the ml api, not the mllib api; ml's `Multiva…

[jira] [Updated] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-06 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15194: Description: We have a PySpark API for the MLLib version but not the ML version. This would allow Python's…

[jira] [Created] (SPARK-15195) Improve PyDoc for ml.tuning

2016-05-06 Thread holdenk (JIRA)
holdenk created SPARK-15195: Summary: Improve PyDoc for ml.tuning Key: SPARK-15195 URL: https://issues.apache.org/jira/browse/SPARK-15195 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-06 Thread holdenk (JIRA)
holdenk created SPARK-15194: Summary: Add Python ML API for MultivariateGaussian Key: SPARK-15194 URL: https://issues.apache.org/jira/browse/SPARK-15194 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage

2016-05-06 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274715#comment-15274715 ] holdenk commented on SPARK-14813: [~yanboliang] Just following up since I've done a first…

[jira] [Updated] (SPARK-15189) ml.Evaluation pydoc issues

2016-05-06 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15189: Component/s: Documentation

[jira] [Created] (SPARK-15189) ml.Evaluation pydoc issues

2016-05-06 Thread holdenk (JIRA)
holdenk created SPARK-15189: Summary: ml.Evaluation pydoc issues Key: SPARK-15189 URL: https://issues.apache.org/jira/browse/SPARK-15189 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-15188) NaiveBayes is missing Thresholds param

2016-05-06 Thread holdenk (JIRA)
holdenk created SPARK-15188: Summary: NaiveBayes is missing Thresholds param Key: SPARK-15188 URL: https://issues.apache.org/jira/browse/SPARK-15188 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-15136) Linkify ML PyDoc

2016-05-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273709#comment-15273709 ] holdenk commented on SPARK-15136: Seems reasonable, I've closed the two sub-tasks and …

[jira] [Closed] (SPARK-15138) Linkify ML PyDoc regression

2016-05-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-15138. Resolution: Duplicate

[jira] [Closed] (SPARK-15137) Linkify ML PyDoc classification

2016-05-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-15137. Resolution: Duplicate

[jira] [Commented] (SPARK-15163) Mark experimental algorithms experimental in PySpark

2016-05-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273705#comment-15273705 ] holdenk commented on SPARK-15163: I think you were talking about https://issues.apache.o

[jira] [Created] (SPARK-15169) Consider improving HasSolver to allow generalization

2016-05-05 Thread holdenk (JIRA)
holdenk created SPARK-15169: Summary: Consider improving HasSolver to allow generalization Key: SPARK-15169 URL: https://issues.apache.org/jira/browse/SPARK-15169 Project: Spark

[jira] [Updated] (SPARK-15168) Add missing params to Python's MultilayerPerceptronClassifier

2016-05-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15168: Description: MultilayerPerceptronClassifier is missing step size, solver, and weights. Add these params.

[jira] [Created] (SPARK-15168) Add missing params to Python's MultilayerPerceptronClassifier

2016-05-05 Thread holdenk (JIRA)
holdenk created SPARK-15168: Summary: Add missing params to Python's MultilayerPerceptronClassifier Key: SPARK-15168 URL: https://issues.apache.org/jira/browse/SPARK-15168 Project: Spark

[jira] [Updated] (SPARK-15163) Mark experimental algorithms experimental in PySpark

2016-05-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15163: Component/s: PySpark

[jira] [Created] (SPARK-15164) Mark classification algorithms as experimental where marked so in scala

2016-05-05 Thread holdenk (JIRA)
holdenk created SPARK-15164: Summary: Mark classification algorithms as experimental where marked so in scala Key: SPARK-15164 URL: https://issues.apache.org/jira/browse/SPARK-15164 Project: Spark

[jira] [Created] (SPARK-15163) Mark experimental algorithms experimental in PySpark

2016-05-05 Thread holdenk (JIRA)
holdenk created SPARK-15163: Summary: Mark experimental algorithms experimental in PySpark Key: SPARK-15163 URL: https://issues.apache.org/jira/browse/SPARK-15163 Project: Spark

[jira] [Created] (SPARK-15162) Update PySpark LogisticRegression threshold PyDoc to be as complete as Scaladoc

2016-05-05 Thread holdenk (JIRA)
holdenk created SPARK-15162: Summary: Update PySpark LogisticRegression threshold PyDoc to be as complete as Scaladoc Key: SPARK-15162 URL: https://issues.apache.org/jira/browse/SPARK-15162 Project: Spark

[jira] [Commented] (SPARK-15138) Linkify ML PyDoc regression

2016-05-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272884#comment-15272884 ] holdenk commented on SPARK-15138: cc [~yanboliang]

[jira] [Created] (SPARK-15161) Consider moving featureImportances into TreeEnsemble models base class

2016-05-05 Thread holdenk (JIRA)
holdenk created SPARK-15161: Summary: Consider moving featureImportances into TreeEnsemble models base class Key: SPARK-15161 URL: https://issues.apache.org/jira/browse/SPARK-15161 Project: Spark

[jira] [Updated] (SPARK-15161) Consider moving featureImportances into TreeEnsemble models base class

2016-05-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-15161: Component/s: ML
