[jira] [Commented] (SPARK-28360) The serviceAccountName configuration item does not take effect in client mode.

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936207#comment-16936207 ] holdenk commented on SPARK-28360: - Don't we need a service account name to create the executor pods? >

[jira] [Comment Edited] (SPARK-28362) Error communicating with MapOutputTracker when many tasks are launched concurrently

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936206#comment-16936206 ] holdenk edited comment on SPARK-28362 at 9/23/19 9:34 PM: -- Why is your default

[jira] [Commented] (SPARK-28362) Error communicating with MapOutputTracker when many tasks are launched concurrently

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936206#comment-16936206 ] holdenk commented on SPARK-28362: - Why is your default parallelism configured to `49 * 13 (cores) * 20 =

[jira] [Updated] (SPARK-28403) Executor Allocation Manager can add an extra executor when speculative tasks

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-28403: Shepherd: holdenk > Executor Allocation Manager can add an extra executor when speculative tasks >

[jira] [Commented] (SPARK-28517) pyspark with --conf spark.jars.packages causes duplicate jars to be uploaded

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936201#comment-16936201 ] holdenk commented on SPARK-28517: - cc [~bryanc] / [~ifilonenko] > pyspark with --conf

[jira] [Commented] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936200#comment-16936200 ] holdenk commented on SPARK-28558: - What storage system are y'all using [~nladuguie] & [~spearson] ? >

[jira] [Commented] (SPARK-28592) Mark new Shuffle apis as @Experimental (instead of @Private)

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936199#comment-16936199 ] holdenk commented on SPARK-28592: - Should we set this to blocker so we don't forget? > Mark new Shuffle

[jira] [Commented] (SPARK-28653) Create table using DDL statement should not auto create the destination folder

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936198#comment-16936198 ] holdenk commented on SPARK-28653: - [~thanida.t] can you confirm if you're still experiencing this issue

[jira] [Commented] (SPARK-28727) Request for partial least square (PLS) regression model

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936196#comment-16936196 ] holdenk commented on SPARK-28727: - I don't believe we'll be adding new algorithms to Spark ML in the

[jira] [Updated] (SPARK-28781) Unneccesary persist in PeriodicCheckpointer.update()

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-28781: Issue Type: Improvement (was: Bug) > Unneccesary persist in PeriodicCheckpointer.update() >

[jira] [Updated] (SPARK-28978) PySpark: Can't pass more than 256 arguments to a UDF

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-28978: Target Version/s: 3.0.0 > PySpark: Can't pass more than 256 arguments to a UDF >

[jira] [Assigned] (SPARK-29083) Speed up toLocalIterator with prefetching when enabled

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-29083: --- Assignee: holdenk > Speed up toLocalIterator with prefetching when enabled >

[jira] [Commented] (SPARK-29217) How to read streaming output path by ignoring metadata log files

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936186#comment-16936186 ] holdenk commented on SPARK-29217: - Can you clarify what you mean by "Moving some files in the output

[jira] [Commented] (SPARK-29163) Provide a mixin to simplify HadoopConf access patterns in DataSource V2

2019-09-23 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936053#comment-16936053 ] holdenk commented on SPARK-29163: - I'm going to try and do some work on this before the end of the

[jira] [Resolved] (SPARK-27659) Allow PySpark toLocalIterator to prefetch data

2019-09-20 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-27659. - Fix Version/s: 3.0.0 Assignee: holdenk Resolution: Fixed > Allow PySpark

[jira] [Resolved] (SPARK-28936) Simplify Spark K8s tests by replacing race condition during command execution

2019-09-20 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-28936. - Resolution: Fixed > Simplify Spark K8s tests by replacing race condition during command execution >

[jira] [Updated] (SPARK-28936) Simplify Spark K8s tests by replacing race condition during command execution

2019-09-20 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-28936: Fix Version/s: 3.0.0 > Simplify Spark K8s tests by replacing race condition during command execution >

[jira] [Resolved] (SPARK-28937) Improve error reporting in Spark Secrets Test Suite

2019-09-20 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-28937. - Fix Version/s: 3.0.0 Resolution: Fixed > Improve error reporting in Spark Secrets Test Suite >

[jira] [Commented] (SPARK-29193) Update fabric8 version to 4.3 continue docker 4 desktop support

2019-09-20 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934798#comment-16934798 ] holdenk commented on SPARK-29193: - My bad, looks like we fixed this in SPARK-28921 > Update fabric8

[jira] [Resolved] (SPARK-29193) Update fabric8 version to 4.3 continue docker 4 desktop support

2019-09-20 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-29193. - Fix Version/s: 3.0.0 Resolution: Duplicate > Update fabric8 version to 4.3 continue docker 4

[jira] [Updated] (SPARK-29193) Update fabric8 version to 4.3 continue docker 4 desktop support

2019-09-20 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-29193: Description: The current version of the kubernetes client we are using has some issues with not setting

[jira] [Commented] (SPARK-29193) Update fabric8 version to 4.3 continue docker 4 desktop support

2019-09-20 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934795#comment-16934795 ] holdenk commented on SPARK-29193: - While I've only observed the issue on docker 4 desktop, it's possible

[jira] [Created] (SPARK-29193) Update fabric8 version to continue docker 4 desktop support

2019-09-20 Thread holdenk (Jira)
holdenk created SPARK-29193: --- Summary: Update fabric8 version to continue docker 4 desktop support Key: SPARK-29193 URL: https://issues.apache.org/jira/browse/SPARK-29193 Project: Spark Issue

[jira] [Created] (SPARK-29163) Provide a mixin to simplify HadoopConf access patterns in DataSource V2

2019-09-18 Thread holdenk (Jira)
holdenk created SPARK-29163: --- Summary: Provide a mixin to simplify HadoopConf access patterns in DataSource V2 Key: SPARK-29163 URL: https://issues.apache.org/jira/browse/SPARK-29163 Project: Spark

[jira] [Created] (SPARK-29158) Expose SerializableConfiguration for DSv2

2019-09-18 Thread holdenk (Jira)
holdenk created SPARK-29158: --- Summary: Expose SerializableConfiguration for DSv2 Key: SPARK-29158 URL: https://issues.apache.org/jira/browse/SPARK-29158 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-22390) Aggregate push down

2019-09-18 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-22390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932739#comment-16932739 ] holdenk commented on SPARK-22390: - Love to follow where this is going, especially if it gets broken into

[jira] [Created] (SPARK-29083) Speed up toLocalIterator with prefetching when enabled

2019-09-13 Thread holdenk (Jira)
holdenk created SPARK-29083: --- Summary: Speed up toLocalIterator with prefetching when enabled Key: SPARK-29083 URL: https://issues.apache.org/jira/browse/SPARK-29083 Project: Spark Issue Type:

[jira] [Created] (SPARK-29076) Generalize the PVTestSuite to no longer need the minikube tag

2019-09-13 Thread holdenk (Jira)
holdenk created SPARK-29076: --- Summary: Generalize the PVTestSuite to no longer need the minikube tag Key: SPARK-29076 URL: https://issues.apache.org/jira/browse/SPARK-29076 Project: Spark Issue

[jira] [Commented] (SPARK-28937) Improve error reporting in Spark Secrets Test Suite

2019-08-30 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919991#comment-16919991 ] holdenk commented on SPARK-28937: - I'm working on this > Improve error reporting in Spark Secrets Test

[jira] [Created] (SPARK-28937) Improve error reporting in Spark Secrets Test Suite

2019-08-30 Thread holdenk (Jira)
holdenk created SPARK-28937: --- Summary: Improve error reporting in Spark Secrets Test Suite Key: SPARK-28937 URL: https://issues.apache.org/jira/browse/SPARK-28937 Project: Spark Issue Type:

[jira] [Commented] (SPARK-28936) Simplify Spark K8s tests by replacing race condition during command execution

2019-08-30 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919990#comment-16919990 ] holdenk commented on SPARK-28936: - I'm working on this. > Simplify Spark K8s tests by replacing race

[jira] [Created] (SPARK-28936) Simplify Spark K8s tests by replacing race condition during command execution

2019-08-30 Thread holdenk (Jira)
holdenk created SPARK-28936: --- Summary: Simplify Spark K8s tests by replacing race condition during command execution Key: SPARK-28936 URL: https://issues.apache.org/jira/browse/SPARK-28936 Project: Spark

[jira] [Commented] (SPARK-28904) Spark PV tests don't create required mount

2019-08-28 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918142#comment-16918142 ] holdenk commented on SPARK-28904: - Related PV FSGroup https://issues.apache.org/jira/browse/SPARK-28905 

[jira] [Created] (SPARK-28905) PVs mounted into Spark may not be writable by Spark

2019-08-28 Thread holdenk (Jira)
holdenk created SPARK-28905: --- Summary: PVs mounted into Spark may not be writable by Spark Key: SPARK-28905 URL: https://issues.apache.org/jira/browse/SPARK-28905 Project: Spark Issue Type:

[jira] [Created] (SPARK-28904) Spark PV tests don't create required mount

2019-08-28 Thread holdenk (Jira)
holdenk created SPARK-28904: --- Summary: Spark PV tests don't create required mount Key: SPARK-28904 URL: https://issues.apache.org/jira/browse/SPARK-28904 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-28886) Kubernetes DepsTestsSuite fails on OSX with minikube 1.3.1 due to formatting

2019-08-27 Thread holdenk (Jira)
holdenk created SPARK-28886: --- Summary: Kubernetes DepsTestsSuite fails on OSX with minikube 1.3.1 due to formatting Key: SPARK-28886 URL: https://issues.apache.org/jira/browse/SPARK-28886 Project: Spark

[jira] [Updated] (SPARK-28842) Cleanup the formatting/trailing spaces in resource-managers/kubernetes/integration-tests/README.md

2019-08-21 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-28842: Labels: starter (was: ) > Cleanup the formatting/trailing spaces in >

[jira] [Created] (SPARK-28842) Cleanup the formatting/trailing spaces in resource-managers/kubernetes/integration-tests/README.md

2019-08-21 Thread holdenk (Jira)
holdenk created SPARK-28842: --- Summary: Cleanup the formatting/trailing spaces in resource-managers/kubernetes/integration-tests/README.md Key: SPARK-28842 URL: https://issues.apache.org/jira/browse/SPARK-28842

[jira] [Assigned] (SPARK-28784) StreamExecution and StreamingQueryManager should utilize CheckpointFileManager to interact with checkpoint directories

2019-08-20 Thread holdenk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-28784: --- Assignee: Shruti Gumma > StreamExecution and StreamingQueryManager should utilize >

[jira] [Commented] (SPARK-27659) Allow PySpark toLocalIterator to prefetch data

2019-08-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909509#comment-16909509 ] holdenk commented on SPARK-27659: - I'm working on this. > Allow PySpark toLocalIterator to prefetch

[jira] [Commented] (SPARK-27683) Remove usage of TraversableOnce

2019-08-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908410#comment-16908410 ] holdenk commented on SPARK-27683: - Interesting related discussion over in 

[jira] [Commented] (SPARK-24666) Word2Vec generate infinity vectors when numIterations are large

2019-08-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908267#comment-16908267 ] holdenk commented on SPARK-24666: - [~zhongyu09] specific code & data which leads to repro can help. >

[jira] [Created] (SPARK-28740) Add support for building with bloop

2019-08-14 Thread holdenk (JIRA)
holdenk created SPARK-28740: --- Summary: Add support for building with bloop Key: SPARK-28740 URL: https://issues.apache.org/jira/browse/SPARK-28740 Project: Spark Issue Type: Improvement

[jira] [Resolved] (SPARK-9792) PySpark DenseMatrix, SparseMatrix should override __eq__

2019-04-01 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-9792. Resolution: Fixed Fix Version/s: 3.0.0 > PySpark DenseMatrix, SparseMatrix should override __eq__ >

[jira] [Created] (SPARK-27095) We depend on silently accepting failures in setup-integration-test-env.sh

2019-03-07 Thread holdenk (JIRA)
holdenk created SPARK-27095: --- Summary: We depend on silently accepting failures in setup-integration-test-env.sh Key: SPARK-27095 URL: https://issues.apache.org/jira/browse/SPARK-27095 Project: Spark

[jira] [Assigned] (SPARK-21094) Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway

2019-02-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-21094: --- Assignee: Peter Parente > Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway >

[jira] [Resolved] (SPARK-21094) Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway

2019-02-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-21094. - Resolution: Fixed Fix Version/s: 3.0.0 > Allow stdout/stderr pipes in

[jira] [Created] (SPARK-26898) Scalastyle should run during k8s integration tests

2019-02-15 Thread holdenk (JIRA)
holdenk created SPARK-26898: --- Summary: Scalastyle should run during k8s integration tests Key: SPARK-26898 URL: https://issues.apache.org/jira/browse/SPARK-26898 Project: Spark Issue Type:

[jira] [Created] (SPARK-26882) lint-scala script does not check all components

2019-02-14 Thread holdenk (JIRA)
holdenk created SPARK-26882: --- Summary: lint-scala script does not check all components Key: SPARK-26882 URL: https://issues.apache.org/jira/browse/SPARK-26882 Project: Spark Issue Type:

[jira] [Assigned] (SPARK-26185) add weightCol in python MulticlassClassificationEvaluator

2019-02-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-26185: --- Assignee: Huaxin Gao > add weightCol in python MulticlassClassificationEvaluator >

[jira] [Resolved] (SPARK-24489) No check for invalid input type of weight data in ml.PowerIterationClustering

2019-01-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-24489. - Resolution: Fixed Fix Version/s: 3.0.0 Thanks for working on this, I've merged the fix into

[jira] [Assigned] (SPARK-24489) No check for invalid input type of weight data in ml.PowerIterationClustering

2019-01-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-24489: --- Assignee: shahid > No check for invalid input type of weight data in ml.PowerIterationClustering >

[jira] [Created] (SPARK-26497) Show users where the pre-packaged SparkR and PySpark Dockerfiles are in the image build script.

2018-12-28 Thread holdenk (JIRA)
holdenk created SPARK-26497: --- Summary: Show users where the pre-packaged SparkR and PySpark Dockerfiles are in the image build script. Key: SPARK-26497 URL: https://issues.apache.org/jira/browse/SPARK-26497

[jira] [Created] (SPARK-26343) Running the kubernetes

2018-12-11 Thread holdenk (JIRA)
holdenk created SPARK-26343: --- Summary: Running the kubernetes Key: SPARK-26343 URL: https://issues.apache.org/jira/browse/SPARK-26343 Project: Spark Issue Type: Improvement Components:

[jira] [Updated] (SPARK-26343) Speed up running the kubernetes integration tests locally

2018-12-11 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-26343: Summary: Speed up running the kubernetes integration tests locally (was: Running the kubernetes ) >

[jira] [Resolved] (SPARK-25255) Add getActiveSession to SparkSession in PySpark

2018-10-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-25255. - Resolution: Fixed Thanks for the PR and fixing this issue :) > Add getActiveSession to SparkSession in

[jira] [Updated] (SPARK-25255) Add getActiveSession to SparkSession in PySpark

2018-10-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-25255: Fix Version/s: 3.0.0 > Add getActiveSession to SparkSession in PySpark >

[jira] [Assigned] (SPARK-25255) Add getActiveSession to SparkSession in PySpark

2018-10-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-25255: --- Assignee: Huaxin Gao > Add getActiveSession to SparkSession in PySpark >

[jira] [Commented] (SPARK-20598) Iterative checkpoints do not get removed from HDFS

2018-09-19 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621396#comment-16621396 ] holdenk commented on SPARK-20598: - Huh, that's interesting. I suspect that could be we're keeping the

[jira] [Commented] (SPARK-25467) Python date/datetime objects in dataframes increment by 1 day when converted to JSON

2018-09-19 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621391#comment-16621391 ] holdenk commented on SPARK-25467: - cc [~bryanc] > Python date/datetime objects in dataframes increment

[jira] [Assigned] (SPARK-14352) approxQuantile should support multi columns

2018-09-19 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-14352: --- Assignee: zhengruifeng > approxQuantile should support multi columns >

[jira] [Commented] (SPARK-17602) PySpark - Performance Optimization Large Size of Broadcast Variable

2018-09-19 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621389#comment-16621389 ] holdenk commented on SPARK-17602: - Did we end up going anywhere with this? > PySpark - Performance

[jira] [Resolved] (SPARK-14352) approxQuantile should support multi columns

2018-09-19 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-14352. - Resolution: Fixed Target Version/s: 2.2.0 > approxQuantile should support multi columns >

[jira] [Updated] (SPARK-25021) Add spark.executor.pyspark.memory support to Kubernetes

2018-09-19 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-25021: Fix Version/s: 2.4.0 > Add spark.executor.pyspark.memory support to Kubernetes >

[jira] [Created] (SPARK-25432) Consider if using standard getOrCreate from PySpark into JVM SparkSession would simplify code

2018-09-14 Thread holdenk (JIRA)
holdenk created SPARK-25432: --- Summary: Consider if using standard getOrCreate from PySpark into JVM SparkSession would simplify code Key: SPARK-25432 URL: https://issues.apache.org/jira/browse/SPARK-25432

[jira] [Resolved] (SPARK-25021) Add spark.executor.pyspark.memory support to Kubernetes

2018-09-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-25021. - Resolution: Fixed Fix Version/s: 3.0.0 Merged for 3 - open to the discussion around backporting.

[jira] [Assigned] (SPARK-25021) Add spark.executor.pyspark.memory support to Kubernetes

2018-09-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-25021: --- Assignee: Ilan Filonenko > Add spark.executor.pyspark.memory support to Kubernetes >

[jira] [Created] (SPARK-25373) Support mixed language pipelines on Spark on K8s

2018-09-07 Thread holdenk (JIRA)
holdenk created SPARK-25373: --- Summary: Support mixed language pipelines on Spark on K8s Key: SPARK-25373 URL: https://issues.apache.org/jira/browse/SPARK-25373 Project: Spark Issue Type:

[jira] [Assigned] (SPARK-25270) lint-python: Add flake8 to find syntax errors and undefined names

2018-09-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-25270: --- Assignee: cclauss > lint-python: Add flake8 to find syntax errors and undefined names >

[jira] [Resolved] (SPARK-25370) Undefined name _exception_message in java_gateway

2018-09-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-25370. - Resolution: Duplicate Issue was already fixed later. > Undefined name _exception_message in

[jira] [Created] (SPARK-25370) Undefined name _exception_message in java_gateway

2018-09-07 Thread holdenk (JIRA)
holdenk created SPARK-25370: --- Summary: Undefined name _exception_message in java_gateway Key: SPARK-25370 URL: https://issues.apache.org/jira/browse/SPARK-25370 Project: Spark Issue Type: Bug

[jira] [Assigned] (SPARK-25370) Undefined name _exception_message in java_gateway

2018-09-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-25370: --- Assignee: holdenk > Undefined name _exception_message in java_gateway >

[jira] [Created] (SPARK-25360) Parallelized RDDs of Ranges could have known partitioner

2018-09-06 Thread holdenk (JIRA)
holdenk created SPARK-25360: --- Summary: Parallelized RDDs of Ranges could have known partitioner Key: SPARK-25360 URL: https://issues.apache.org/jira/browse/SPARK-25360 Project: Spark Issue Type:

[jira] [Updated] (SPARK-25255) Add getActiveSession to SparkSession in PySpark

2018-08-27 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-25255: Labels: starter (was: ) > Add getActiveSession to SparkSession in PySpark >

[jira] [Created] (SPARK-25255) Add getActiveSession to SparkSession in PySpark

2018-08-27 Thread holdenk (JIRA)
holdenk created SPARK-25255: --- Summary: Add getActiveSession to SparkSession in PySpark Key: SPARK-25255 URL: https://issues.apache.org/jira/browse/SPARK-25255 Project: Spark Issue Type:

[jira] [Commented] (SPARK-25236) Investigate using a logging library inside of PySpark on the workers instead of print

2018-08-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593098#comment-16593098 ] holdenk commented on SPARK-25236: - Probably. The only thing would be probably wanting to pass log level

[jira] [Created] (SPARK-25236) Investigate using a logging library inside of PySpark on the workers instead of print

2018-08-24 Thread holdenk (JIRA)
holdenk created SPARK-25236: --- Summary: Investigate using a logging library inside of PySpark on the workers instead of print Key: SPARK-25236 URL: https://issues.apache.org/jira/browse/SPARK-25236 Project:

[jira] [Updated] (SPARK-9636) Treat $SPARK_HOME as write-only

2018-08-24 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-9636: --- Labels: (was: easyfix) > Treat $SPARK_HOME as write-only > --- > >

[jira] [Resolved] (SPARK-19094) Plumb through logging/error messages from the JVM to Jupyter PySpark

2018-08-24 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19094. - Resolution: Won't Fix No longer as important given other changes. > Plumb through logging/error

[jira] [Updated] (SPARK-25153) Improve error messages for columns with dots/periods

2018-08-18 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-25153: Labels: starter (was: ) > Improve error messages for columns with dots/periods >

[jira] [Created] (SPARK-25153) Improve error messages for columns with dots/periods

2018-08-18 Thread holdenk (JIRA)
holdenk created SPARK-25153: --- Summary: Improve error messages for columns with dots/periods Key: SPARK-25153 URL: https://issues.apache.org/jira/browse/SPARK-25153 Project: Spark Issue Type:

[jira] [Commented] (SPARK-24735) Improve exception when mixing up pandas_udf types

2018-08-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578719#comment-16578719 ] holdenk commented on SPARK-24735: - So [~bryanc] what do you think if we add an AggregatePythonUDF and

[jira] [Commented] (SPARK-24735) Improve exception when mixing up pandas_udf types

2018-08-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578710#comment-16578710 ] holdenk commented on SPARK-24735: - I think we could do better than just improving the exception, if we

[jira] [Created] (SPARK-25105) Importing all of pyspark.sql.functions should bring PandasUDFType in as well

2018-08-13 Thread holdenk (JIRA)
holdenk created SPARK-25105: --- Summary: Importing all of pyspark.sql.functions should bring PandasUDFType in as well Key: SPARK-25105 URL: https://issues.apache.org/jira/browse/SPARK-25105 Project: Spark

[jira] [Updated] (SPARK-24735) Improve exception when mixing up pandas_udf types

2018-08-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-24735: Summary: Improve exception when mixing up pandas_udf types (was: Improve exception when mixing

[jira] [Commented] (SPARK-24736) --py-files not functional for non local URLs. It appears to pass non-local URL's into PYTHONPATH directly.

2018-08-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578624#comment-16578624 ] holdenk commented on SPARK-24736: - cc [~ifilonenko] > --py-files not functional for non local URLs. It

[jira] [Created] (SPARK-25053) Allow additional port forwarding on Spark on K8S as needed

2018-08-07 Thread holdenk (JIRA)
holdenk created SPARK-25053: --- Summary: Allow additional port forwarding on Spark on K8S as needed Key: SPARK-25053 URL: https://issues.apache.org/jira/browse/SPARK-25053 Project: Spark Issue Type:

[jira] [Comment Edited] (SPARK-21436) Take advantage of known partioner for distinct on RDDs

2018-08-06 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570517#comment-16570517 ] holdenk edited comment on SPARK-21436 at 8/6/18 5:19 PM: - @[~podongfeng] So

[jira] [Commented] (SPARK-21436) Take advantage of known partioner for distinct on RDDs

2018-08-06 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570517#comment-16570517 ] holdenk commented on SPARK-21436: - @[~podongfeng] So distinct triggers a `map` first (e.g. it is

[jira] [Commented] (SPARK-24579) SPIP: Standardize Optimized Data Exchange between Spark and DL/AI frameworks

2018-07-30 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562638#comment-16562638 ] holdenk commented on SPARK-24579: - [~mengxr] How about you just open comments up in general and then turn

[jira] [Resolved] (SPARK-23451) Deprecate KMeans computeCost

2018-07-20 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-23451. - Resolution: Fixed Assignee: Marco Gaido Fix Version/s: 2.4.0 > Deprecate KMeans

[jira] [Assigned] (SPARK-23528) Add numIter to ClusteringSummary

2018-07-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-23528: --- Assignee: Marco Gaido > Add numIter to ClusteringSummary > > >

[jira] [Resolved] (SPARK-23528) Add numIter to ClusteringSummary

2018-07-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-23528. - Resolution: Fixed Fix Version/s: 2.4.0 Thanks! > Add numIter to ClusteringSummary >

[jira] [Updated] (SPARK-23528) Add numIter to ClusteringSummary

2018-07-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-23528: Description: Spark ML should expose vital statistics of the GMM model: * *Number of iterations* (actual,

[jira] [Updated] (SPARK-23528) Add numIter to ClusteringSummary

2018-07-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-23528: Summary: Add numIter to ClusteringSummary (was: Expose vital statistics of GaussianMixtureModel) > Add

[jira] [Updated] (SPARK-24780) DataFrame.column_name should resolve to a distinct ref

2018-07-10 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-24780: Summary: DataFrame.column_name should resolve to a distinct ref (was: DataFrame.column_name should take

[jira] [Updated] (SPARK-24780) DataFrame.column_name should take into account DataFrame alias for future joins

2018-07-10 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-24780: Description: If we join a dataframe with another dataframe which has the same column name of the

[jira] [Created] (SPARK-24780) DataFrame.column_name should take into account DataFrame alias for future joins

2018-07-10 Thread holdenk (JIRA)
holdenk created SPARK-24780: --- Summary: DataFrame.column_name should take into account DataFrame alias for future joins Key: SPARK-24780 URL: https://issues.apache.org/jira/browse/SPARK-24780 Project: Spark

[jira] [Commented] (SPARK-24668) PySpark crashes when getting the webui url if the webui is disabled

2018-07-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530284#comment-16530284 ] holdenk commented on SPARK-24668: - So there is also the case where Spark is run without the history

[jira] [Updated] (SPARK-24668) PySpark crashes when getting the webui url if the webui is disabled

2018-07-02 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-24668: Shepherd: holdenk Affects Version/s: 2.4.0 > PySpark crashes when getting the webui url if
