[jira] [Resolved] (SPARK-19064) Fix pip install issue with ml sub components

2017-03-06 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19064. - Resolution: Fixed Assignee: holdenk Fix Version/s: 2.2.0 2.1.1 > Fix

[jira] [Commented] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics

2017-03-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896751#comment-15896751 ] holdenk commented on SPARK-19578: - [~nchammas] That sounds like a pretty good summary from my point of

[jira] [Commented] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics

2017-03-01 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890846#comment-15890846 ] holdenk commented on SPARK-19578: - [~nchammas] It's an interesting idea but I don't think it would work

[jira] [Assigned] (SPARK-13330) PYTHONHASHSEED is not propgated to python worker

2017-02-24 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-13330: --- Assignee: Jeff Zhang > PYTHONHASHSEED is not propgated to python worker >

[jira] [Resolved] (SPARK-13330) PYTHONHASHSEED is not propgated to python worker

2017-02-24 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-13330. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 11211

[jira] [Commented] (SPARK-19161) Improving UDF Docstrings

2017-02-24 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883013#comment-15883013 ] holdenk commented on SPARK-19161: - Thanks for working on this [~zero323], having better docs for UDFs

[jira] [Resolved] (SPARK-19161) Improving UDF Docstrings

2017-02-24 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19161. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 16534

[jira] [Assigned] (SPARK-19161) Improving UDF Docstrings

2017-02-24 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-19161: --- Assignee: Maciej Szymkiewicz > Improving UDF Docstrings > > >

[jira] [Resolved] (SPARK-19160) Decorator for UDF creation.

2017-02-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19160. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 16533

[jira] [Assigned] (SPARK-19160) Decorator for UDF creation.

2017-02-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-19160: --- Assignee: Maciej Szymkiewicz > Decorator for UDF creation. > --- > >

[jira] [Assigned] (SPARK-19590) Update the document for QuantileDiscretizer in pyspark

2017-02-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-19590: --- Assignee: Vincent (was: holdenk) > Update the document for QuantileDiscretizer in pyspark >

[jira] [Assigned] (SPARK-19590) Update the document for QuantileDiscretizer in pyspark

2017-02-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-19590: --- Assignee: holdenk > Update the document for QuantileDiscretizer in pyspark >

[jira] [Resolved] (SPARK-19590) Update the document for QuantileDiscretizer in pyspark

2017-02-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19590. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 16922

[jira] [Assigned] (SPARK-18541) Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API

2017-02-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-18541: --- Assignee: Shea Parkes > Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata

[jira] [Resolved] (SPARK-18541) Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API

2017-02-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-18541. - Resolution: Fixed Fix Version/s: 2.2.0 > Add pyspark.sql.Column.aliasWithMetadata to allow

[jira] [Resolved] (SPARK-19162) UserDefinedFunction constructor should verify that func is callable

2017-02-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19162. - Resolution: Fixed Fix Version/s: 2.2.0 > UserDefinedFunction constructor should verify that func

[jira] [Assigned] (SPARK-19162) UserDefinedFunction constructor should verify that func is callable

2017-02-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-19162: --- Assignee: Maciej Szymkiewicz > UserDefinedFunction constructor should verify that func is callable

[jira] [Assigned] (SPARK-19453) Correct DataFrame.replace docs

2017-02-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-19453: --- Assignee: Maciej Szymkiewicz > Correct DataFrame.replace docs > -- > >

[jira] [Resolved] (SPARK-19453) Correct DataFrame.replace docs

2017-02-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19453. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 16792

[jira] [Commented] (SPARK-12661) Drop Python 2.6 support in PySpark

2017-02-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864824#comment-15864824 ] holdenk commented on SPARK-12661: - Coming back to this since Sean's thread reminded me - who should we

[jira] [Resolved] (SPARK-19429) Column.__getitem__ should support slice arguments

2017-02-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19429. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 16771

[jira] [Assigned] (SPARK-19429) Column.__getitem__ should support slice arguments

2017-02-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-19429: --- Assignee: Maciej Szymkiewicz > Column.__getitem__ should support slice arguments >

[jira] [Commented] (SPARK-6883) Fork pyspark's cloudpickle as a separate dependency

2017-02-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864309#comment-15864309 ] holdenk commented on SPARK-6883: Let's consider re-opening this for discussion - do we maybe want to just

[jira] [Reopened] (SPARK-6883) Fork pyspark's cloudpickle as a separate dependency

2017-02-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reopened SPARK-6883: > Fork pyspark's cloudpickle as a separate dependency > --- > >

[jira] [Resolved] (SPARK-19427) UserDefinedFunction should support data types strings

2017-02-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19427. - Resolution: Fixed Assignee: Maciej Szymkiewicz Fix Version/s: 2.2.0 Thanks for doing all

[jira] [Resolved] (SPARK-19506) Missing warnings import in pyspark.ml.util

2017-02-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19506. - Resolution: Fixed Fix Version/s: 2.2.0 2.1.1 Thanks for reporting and fixing

[jira] [Assigned] (SPARK-19506) Missing warnings import in pyspark.ml.util

2017-02-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-19506: --- Assignee: Maciej Szymkiewicz > Missing warnings import in pyspark.ml.util >

[jira] [Resolved] (SPARK-19421) Remove numClasses and numFeatures methods in LinearSVC

2017-02-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19421. - Resolution: Fixed Fix Version/s: 2.2.0 Thanks for fixing this, merged in 317fa750 :) > Remove

[jira] [Assigned] (SPARK-19421) Remove numClasses and numFeatures methods in LinearSVC

2017-02-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-19421: --- Assignee: zhengruifeng > Remove numClasses and numFeatures methods in LinearSVC >

[jira] [Assigned] (SPARK-17161) Add PySpark-ML JavaWrapper convenience function to create py4j JavaArrays

2017-02-03 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-17161: --- Assignee: Bryan Cutler Affects Version/s: 2.2.0 > Add PySpark-ML JavaWrapper

[jira] [Resolved] (SPARK-17161) Add PySpark-ML JavaWrapper convenience function to create py4j JavaArrays

2017-02-03 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-17161. - Resolution: Fixed Fix Version/s: 2.2.0 > Add PySpark-ML JavaWrapper convenience function to

[jira] [Commented] (SPARK-14352) approxQuantile should support multi columns

2017-02-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849971#comment-15849971 ] holdenk commented on SPARK-14352: - [~hyukjin.kwon] - I'm still waiting on getting my JIRA account set up

[jira] [Commented] (SPARK-2868) Support named accumulators in Python

2017-02-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849723#comment-15849723 ] holdenk commented on SPARK-2868: This might be a difficult issue to start off with [~heathkh] - the

[jira] [Commented] (SPARK-18692) Test Java 8 unidoc build on Jenkins master builder

2017-02-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849670#comment-15849670 ] holdenk commented on SPARK-18692: - Does the current java8 doc build take too long for this to be part of the

[jira] (SPARK-16454) Consider adding a per-batch transform for structured streaming

2017-01-30 Thread holdenk (JIRA)
holdenk commented on SPARK-16454

[jira] [Commented] (SPARK-17602) PySpark - Performance Optimization Large Size of Broadcast Variable

2017-01-18 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828709#comment-15828709 ] holdenk commented on SPARK-17602: - Ah yes, sorry I've been pretty busy. I just had an interesting chat

[jira] [Commented] (SPARK-19094) Plumb through logging/error messages from the JVM to Jupyter PySpark

2017-01-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803086#comment-15803086 ] holdenk commented on SPARK-19094: - I've got something basic working for this, but thinking it might make

[jira] [Created] (SPARK-19094) Plumb through logging/error messages from the JVM to Jupyter PySpark

2017-01-05 Thread holdenk (JIRA)
holdenk created SPARK-19094: --- Summary: Plumb through logging/error messages from the JVM to Jupyter PySpark Key: SPARK-19094 URL: https://issues.apache.org/jira/browse/SPARK-19094 Project: Spark

[jira] [Created] (SPARK-19064) Fix pip install issue with ml sub components

2017-01-03 Thread holdenk (JIRA)
holdenk created SPARK-19064: --- Summary: Fix pip install issue with ml sub components Key: SPARK-19064 URL: https://issues.apache.org/jira/browse/SPARK-19064 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-18281) toLocalIterator yields time out error on pyspark2

2016-12-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749830#comment-15749830 ] holdenk commented on SPARK-18281: - For what it's worth I can repro on top of the PR with

[jira] [Created] (SPARK-18777) Return UDF objects when registering from Python

2016-12-07 Thread holdenk (JIRA)
holdenk created SPARK-18777: --- Summary: Return UDF objects when registering from Python Key: SPARK-18777 URL: https://issues.apache.org/jira/browse/SPARK-18777 Project: Spark Issue Type:

[jira] [Commented] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

2016-11-29 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707435#comment-15707435 ] holdenk commented on SPARK-15369: - So I'm probably going to be busy until after the 2.1 release (also

[jira] [Commented] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

2016-11-29 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707421#comment-15707421 ] holdenk commented on SPARK-15369: - That looks like a great start :) Probably the packaging is going to be

[jira] [Created] (SPARK-18630) PySpark ML memory leak

2016-11-29 Thread holdenk (JIRA)
holdenk created SPARK-18630: --- Summary: PySpark ML memory leak Key: SPARK-18630 URL: https://issues.apache.org/jira/browse/SPARK-18630 Project: Spark Issue Type: Bug Components: ML,

[jira] [Created] (SPARK-18628) Update handle invalid documentation string

2016-11-29 Thread holdenk (JIRA)
holdenk created SPARK-18628: --- Summary: Update handle invalid documentation string Key: SPARK-18628 URL: https://issues.apache.org/jira/browse/SPARK-18628 Project: Spark Issue Type: Improvement

[jira] [Reopened] (SPARK-17788) RangePartitioner results in few very large tasks and many small to empty tasks

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reopened SPARK-17788: - This is somewhat distinct from the join case, but certainly related. > RangePartitioner results in few very

[jira] [Commented] (SPARK-17788) RangePartitioner results in few very large tasks and many small to empty tasks

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696188#comment-15696188 ] holdenk commented on SPARK-17788: - I don't think this is a duplicate - its related but a join doesn't

[jira] [Commented] (SPARK-6522) Standardize Random Number Generation

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696024#comment-15696024 ] holdenk commented on SPARK-6522: We have a standardized RDD generator in MLlib (see the RandomRDDs

[jira] [Closed] (SPARK-6522) Standardize Random Number Generation

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-6522. -- Resolution: Fixed Fix Version/s: 1.1.0 > Standardize Random Number Generation >

[jira] [Commented] (SPARK-5997) Increase partition count without performing a shuffle

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696016#comment-15696016 ] holdenk commented on SPARK-5997: That could work, although we'd probably want a different API and we'd

[jira] [Resolved] (SPARK-3348) Support user-defined SparkListeners properly

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-3348. Resolution: Duplicate > Support user-defined SparkListeners properly >

[jira] [Commented] (SPARK-5190) Allow spark listeners to be added before spark context gets initialized.

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696003#comment-15696003 ] holdenk commented on SPARK-5190: This seems to be fixed, but we forgot to close (cc [~joshrosen]) > Allow

[jira] [Commented] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695967#comment-15695967 ] holdenk commented on SPARK-636: --- If you have a logging system you want to initialize wouldn't using an object

[jira] [Commented] (SPARK-17788) RangePartitioner results in few very large tasks and many small to empty tasks

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695956#comment-15695956 ] holdenk commented on SPARK-17788: - This is semi-expected behaviour of the range partitioner (and really

[jira] [Updated] (SPARK-17788) RangePartitioner results in few very large tasks and many small to empty tasks

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-17788: Target Version/s: (was: 2.1.0) > RangePartitioner results in few very large tasks and many small to

[jira] [Updated] (SPARK-18108) Partition discovery fails with explicitly written long partitions

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-18108: Component/s: (was: Spark Core) SQL > Partition discovery fails with explicitly

[jira] [Commented] (SPARK-18128) Add support for publishing to PyPI

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695941#comment-15695941 ] holdenk commented on SPARK-18128: - Thanks! :) I'll start working on this issue once we start work on 2.2

[jira] [Commented] (SPARK-18128) Add support for publishing to PyPI

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695939#comment-15695939 ] holdenk commented on SPARK-18128: - Thanks! :) I'll start working on this issue once we start work on 2.2

[jira] [Commented] (SPARK-18405) Add yarn-cluster mode support to Spark Thrift Server

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695936#comment-15695936 ] holdenk commented on SPARK-18405: - Even in cluster mode you could overwhelm the node running the

[jira] [Updated] (SPARK-18502) Spark does not handle columns that contain backquote (`)

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-18502: Component/s: (was: Spark Core) SQL > Spark does not handle columns that contain

[jira] [Updated] (SPARK-18532) Code generation memory issue

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-18532: Component/s: (was: Spark Core) SQL > Code generation memory issue >

[jira] [Commented] (SPARK-18541) Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API

2016-11-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695922#comment-15695922 ] holdenk commented on SPARK-18541: - Making it easier for PySpark SQL users to specify metadata sounds

[jira] [Created] (SPARK-18576) Expose basic TaskContext info in PySpark

2016-11-24 Thread holdenk (JIRA)
holdenk created SPARK-18576: --- Summary: Expose basic TaskContext info in PySpark Key: SPARK-18576 URL: https://issues.apache.org/jira/browse/SPARK-18576 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-12469) Data Property Accumulators for Spark (formerly Consistent Accumulators)

2016-11-22 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687763#comment-15687763 ] holdenk commented on SPARK-12469: - Cool - I'll bug y'all after the 2.1 release is out so hopefully we can

[jira] [Commented] (SPARK-12469) Data Property Accumulators for Spark (formerly Consistent Accumulators)

2016-11-22 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687604#comment-15687604 ] holdenk commented on SPARK-12469: - In some ways I agree, on the other hand its slipped 2.0 already (as a

[jira] [Commented] (SPARK-12469) Data Property Accumulators for Spark (formerly Consistent Accumulators)

2016-11-22 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687563#comment-15687563 ] holdenk commented on SPARK-12469: - Hi [~rxin]/[~squito] if we want to try and get this in for 2.1 we

[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce

2016-11-17 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674308#comment-15674308 ] holdenk commented on SPARK-2620: I don't think its been resolved, does your code need to be in the repl or

[jira] [Created] (SPARK-18418) Make release script hadoop profiles aren't correctly specified.

2016-11-11 Thread holdenk (JIRA)
holdenk created SPARK-18418: --- Summary: Make release script hadoop profiles aren't correctly specified. Key: SPARK-18418 URL: https://issues.apache.org/jira/browse/SPARK-18418 Project: Spark Issue

[jira] [Updated] (SPARK-18128) Add support for publishing to PyPI

2016-11-04 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-18128: Issue Type: Sub-task (was: Improvement) Parent: SPARK-18267 > Add support for publishing to PyPI

[jira] [Commented] (SPARK-18128) Add support for publishing to PyPI

2016-11-04 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637110#comment-15637110 ] holdenk commented on SPARK-18128: - When I e-mailed [~prabinb] earlier this week I got an out of office

[jira] [Commented] (SPARK-18128) Add support for publishing to PyPI

2016-11-04 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637111#comment-15637111 ] holdenk commented on SPARK-18128: - Sure > Add support for publishing to PyPI >

[jira] [Commented] (SPARK-18128) Add support for publishing to PyPI

2016-11-04 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637104#comment-15637104 ] holdenk commented on SPARK-18128: - Good call - so publishing to PyPI test has worked fine but there might

[jira] [Commented] (SPARK-15581) MLlib 2.1 Roadmap

2016-11-03 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634568#comment-15634568 ] holdenk commented on SPARK-15581: - This sounds like really good suggestions - I think some of the biggest

[jira] [Commented] (SPARK-18128) Add support for publishing to PyPI

2016-11-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631463#comment-15631463 ] holdenk commented on SPARK-18128: - Extracted from the discussion around SPARK-1267: People who are

[jira] [Commented] (SPARK-7146) Should ML sharedParams be a public API?

2016-11-01 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625685#comment-15625685 ] holdenk commented on SPARK-7146: I think it might be reasonable to just expose it as Scala traits and mark

[jira] [Closed] (SPARK-7638) Python API for pmml.export

2016-11-01 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-7638. -- Resolution: Won't Fix We are moving away from the MLlib APIs, so any new functionality should be done against

[jira] [Closed] (SPARK-3981) Consider a better approach to initialize SerDe on executors

2016-11-01 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-3981. -- Resolution: Won't Fix I'm closing this as a "Won't Fix" for now since we are moving over to the ML APIs. If

[jira] [Commented] (SPARK-2868) Support named accumulators in Python

2016-11-01 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625621#comment-15625621 ] holdenk commented on SPARK-2868: or maybe [~rxin] or [~squito] who have been doing some other accumulator

[jira] [Updated] (SPARK-18128) Add support for publishing to PyPI

2016-10-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-18128: Description: After SPARK-1267 is done we should add support for publishing to PyPI similar to how we

[jira] [Commented] (SPARK-17602) PySpark - Performance Optimization Large Size of Broadcast Variable

2016-10-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611611#comment-15611611 ] holdenk commented on SPARK-17602: - This certainly looks interesting, do you maybe have some code you

[jira] [Created] (SPARK-18136) Make PySpark pip install works on windows

2016-10-27 Thread holdenk (JIRA)
holdenk created SPARK-18136: --- Summary: Make PySpark pip install works on windows Key: SPARK-18136 URL: https://issues.apache.org/jira/browse/SPARK-18136 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-18129) Sign pip artifacts

2016-10-26 Thread holdenk (JIRA)
holdenk created SPARK-18129: --- Summary: Sign pip artifacts Key: SPARK-18129 URL: https://issues.apache.org/jira/browse/SPARK-18129 Project: Spark Issue Type: Improvement Components:

[jira] [Created] (SPARK-18128) Add support for publishing to PyPI

2016-10-26 Thread holdenk (JIRA)
holdenk created SPARK-18128: --- Summary: Add support for publishing to PyPI Key: SPARK-18128 URL: https://issues.apache.org/jira/browse/SPARK-18128 Project: Spark Issue Type: Improvement

[jira] [Reopened] (SPARK-1267) Add a pip installer for PySpark

2016-10-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reopened SPARK-1267: re-opening after discussion on mailing list and PR thread. > Add a pip installer for PySpark >

[jira] [Commented] (SPARK-18073) Migrate wiki to spark.apache.org web site

2016-10-24 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602152#comment-15602152 ] holdenk commented on SPARK-18073: - I like the idea of migrating everything off of the wiki - the fact its

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579949#comment-15579949 ] holdenk commented on SPARK-14141: - Ah sorry for the delay, so doing the cache + count together is done

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579946#comment-15579946 ] holdenk commented on SPARK-13534: - And now they have a release :) I'm not certain its at the stage where

[jira] [Commented] (SPARK-12753) Import error during unit test while calling a function from reduceByKey()

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579942#comment-15579942 ] holdenk commented on SPARK-12753: - (oh as a follow up it appears the user answered their own question on

[jira] [Closed] (SPARK-12753) Import error during unit test while calling a function from reduceByKey()

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-12753. --- Resolution: Not A Problem I don't believe this is a PySpark issue but rather it seems like a Python

[jira] [Resolved] (SPARK-11223) PySpark CrossValidatorModel does not output metrics for every param in paramGrid

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-11223. - Resolution: Fixed Fixed in SPARK-12810 by [~vectorijk] :) > PySpark CrossValidatorModel does not output

[jira] [Commented] (SPARK-11223) PySpark CrossValidatorModel does not output metrics for every param in paramGrid

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579923#comment-15579923 ] holdenk commented on SPARK-11223: - Oh wait it looks like we've already done this and I was looking at the

[jira] [Commented] (SPARK-11223) PySpark CrossValidatorModel does not output metrics for every param in paramGrid

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579921#comment-15579921 ] holdenk commented on SPARK-11223: - This could be a good starter issue for someone interested in ML or

[jira] [Updated] (SPARK-11223) PySpark CrossValidatorModel does not output metrics for every param in paramGrid

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-11223: Labels: starter (was: ) > PySpark CrossValidatorModel does not output metrics for every param in >

[jira] [Commented] (SPARK-10635) pyspark - running on a different host

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579914#comment-15579914 ] holdenk commented on SPARK-10635: - it would be a bit difficult, although as Py4J is speeding up the

[jira] [Commented] (SPARK-10628) Add support for arbitrary RandomRDD generation to PySparkAPI

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579910#comment-15579910 ] holdenk commented on SPARK-10628: - For someone who is interested in doing this, we might be able to do

[jira] [Closed] (SPARK-10525) Add Python example for VectorSlicer to user guide

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-10525. --- Resolution: Fixed Fix Version/s: 2.0.0 Fixed in SPARK-14514 by [~podongfeng] > Add Python example

[jira] [Commented] (SPARK-10525) Add Python example for VectorSlicer to user guide

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579906#comment-15579906 ] holdenk commented on SPARK-10525: - It looks like it does, I'm going to go ahead and resolve this. Thanks

[jira] [Commented] (SPARK-10319) ALS training using PySpark throws a StackOverflowError

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579901#comment-15579901 ] holdenk commented on SPARK-10319: - Is this issue still occurring for you? > ALS training using PySpark

[jira] [Closed] (SPARK-10223) Add takeOrderedByKey function to extract top N records within each group

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-10223. --- Resolution: Won't Fix I don't see this feature being particularly popular, especially since its relatively

[jira] [Closed] (SPARK-9965) Scala, Python SQLContext input methods' deprecation statuses do not match

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-9965. -- Resolution: Resolved Fix Version/s: 2.0.0 These methods were removed in
