[jira] [Resolved] (SPARK-24244) Parse only required columns of CSV file

2018-05-24 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24244. - Resolution: Fixed > Parse only required columns of CSV file > --- >

[jira] [Updated] (SPARK-24244) Parse only required columns of CSV file

2018-05-24 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24244: Priority: Major (was: Minor) > Parse only required columns of CSV file >

[jira] [Resolved] (SPARK-24368) Flaky tests: org.apache.spark.sql.execution.datasources.csv.UnivocityParserSuite

2018-05-24 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24368. - Resolution: Fixed Assignee: Maxim Gekk Fix Version/s: 2.4.0 > Flaky tests: >

[jira] [Assigned] (SPARK-24367) Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-24367: Assignee: Gengliang Wang > Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag

[jira] [Resolved] (SPARK-24367) Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-24367. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21411

[jira] [Resolved] (SPARK-23929) pandas_udf schema mapped by position and not by name

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23929. -- Resolution: Duplicate > pandas_udf schema mapped by position and not by name >

[jira] [Assigned] (SPARK-24235) create the top-of-task RDD sending rows to the remote buffer

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24235: Assignee: (was: Apache Spark) > create the top-of-task RDD sending rows to the remote

[jira] [Commented] (SPARK-24235) create the top-of-task RDD sending rows to the remote buffer

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490047#comment-16490047 ] Apache Spark commented on SPARK-24235: -- User 'jose-torres' has created a pull request for this

[jira] [Assigned] (SPARK-24235) create the top-of-task RDD sending rows to the remote buffer

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24235: Assignee: Apache Spark > create the top-of-task RDD sending rows to the remote buffer >

[jira] [Commented] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-24 Thread Cristian Consonni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490022#comment-16490022 ] Cristian Consonni commented on SPARK-24324: --- [~bryanc] said: > As a workaround, you could write

[jira] [Created] (SPARK-24386) implement continuous processing coalesce(1)

2018-05-24 Thread Jose Torres (JIRA)
Jose Torres created SPARK-24386: --- Summary: implement continuous processing coalesce(1) Key: SPARK-24386 URL: https://issues.apache.org/jira/browse/SPARK-24386 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23929) pandas_udf schema mapped by position and not by name

2018-05-24 Thread Cristian Consonni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489972#comment-16489972 ] Cristian Consonni commented on SPARK-23929: --- This bug was referenced in [issue

[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-24 Thread Lenin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489959#comment-16489959 ] Lenin commented on SPARK-24383: --- Yes it does have it. ``` apiVersion: v1 kind: Service metadata:

[jira] [Commented] (SPARK-24359) SPIP: ML Pipelines in R

2018-05-24 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489944#comment-16489944 ] Hossein Falaki commented on SPARK-24359: Thank you guys for the feedback. I updated the SPIP and the

[jira] [Updated] (SPARK-24359) SPIP: ML Pipelines in R

2018-05-24 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-24359: --- Description: h1. Background and motivation SparkR supports calling MLlib functionality with

[jira] [Updated] (SPARK-24359) SPIP: ML Pipelines in R

2018-05-24 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-24359: --- Attachment: SparkML_ ML Pipelines in R-v2.pdf > SPIP: ML Pipelines in R >

[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-24 Thread Yinan Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489942#comment-16489942 ] Yinan Li commented on SPARK-24383: -- You can use {{kubectl get service -o=yaml}} to get a

[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-24 Thread Lenin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489939#comment-16489939 ] Lenin commented on SPARK-24383: --- I tried to check for that, but couldn't find where to look. I can see in

[jira] [Updated] (SPARK-23754) StopIterator exception in Python UDF results in partial result

2018-05-24 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-23754: --- Target Version/s: 2.3.1 Adding target version so it actually shows up in searches... >

[jira] [Assigned] (SPARK-14220) Build and test Spark against Scala 2.12

2018-05-24 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-14220: -- Assignee: Marcelo Vanzin > Build and test Spark against Scala 2.12 >

[jira] [Assigned] (SPARK-14220) Build and test Spark against Scala 2.12

2018-05-24 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-14220: -- Assignee: (was: Marcelo Vanzin) > Build and test Spark against Scala 2.12 >

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Wenbo Zhao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489820#comment-16489820 ] Wenbo Zhao commented on SPARK-24373: I guess we should use `planWithBarrier` in the

[jira] [Resolved] (SPARK-24350) ClassCastException in "array_position" function

2018-05-24 Thread Alex Vayda (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Vayda resolved SPARK-24350. Resolution: Fixed > ClassCastException in "array_position" function >

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489060#comment-16489060 ] Li Jin edited comment on SPARK-24373 at 5/24/18 9:00 PM: - This is a reproduce:

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489060#comment-16489060 ] Li Jin edited comment on SPARK-24373 at 5/24/18 8:51 PM: - This is a reproduce:

[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-24 Thread Yinan Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489776#comment-16489776 ] Yinan Li commented on SPARK-24383: -- Can you double check if the services have an {{OwnerReference}}

[jira] [Commented] (SPARK-24375) Design sketch: support barrier scheduling in Apache Spark

2018-05-24 Thread Jiang Xingbo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489764#comment-16489764 ] Jiang Xingbo commented on SPARK-24375: -- We propose to add new RDDBarrier and BarrierTaskContext to

[jira] [Commented] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489759#comment-16489759 ] Li Jin commented on SPARK-24324: This is a dup of https://issues.apache.org/jira/browse/SPARK-23929, I am

[jira] [Comment Edited] (SPARK-24036) Stateful operators in continuous processing

2018-05-24 Thread Jose Torres (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489648#comment-16489648 ] Jose Torres edited comment on SPARK-24036 at 5/24/18 8:23 PM: -- I've been

[jira] [Comment Edited] (SPARK-24036) Stateful operators in continuous processing

2018-05-24 Thread Jose Torres (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489648#comment-16489648 ] Jose Torres edited comment on SPARK-24036 at 5/24/18 8:22 PM: -- I've been

[jira] [Commented] (SPARK-24036) Stateful operators in continuous processing

2018-05-24 Thread Jose Torres (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489727#comment-16489727 ] Jose Torres commented on SPARK-24036: - That's out of scope - the shuffle reader and writer work in

[jira] [Commented] (SPARK-24036) Stateful operators in continuous processing

2018-05-24 Thread Arun Mahadevan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489700#comment-16489700 ] Arun Mahadevan commented on SPARK-24036: If I understand correctly, continuous job would have a

[jira] [Resolved] (SPARK-24332) Fix places reading 'spark.network.timeout' as milliseconds

2018-05-24 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-24332. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21382

[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-24 Thread Lenin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489665#comment-16489665 ] Lenin commented on SPARK-24383: --- It's not something I observed. I had a lot of dangling services. > spark

[jira] [Commented] (SPARK-23416) Flaky test: KafkaSourceStressForDontFailOnDataLossSuite.stress test for failOnDataLoss=false

2018-05-24 Thread Jose Torres (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489661#comment-16489661 ] Jose Torres commented on SPARK-23416: - Do you know how to drive that? I'm not sure what the process

[jira] [Assigned] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24324: Assignee: Apache Spark > Pandas Grouped Map UserDefinedFunction mixes column labels >

[jira] [Commented] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489650#comment-16489650 ] Apache Spark commented on SPARK-24324: -- User 'BryanCutler' has created a pull request for this

[jira] [Assigned] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24324: Assignee: (was: Apache Spark) > Pandas Grouped Map UserDefinedFunction mixes column

[jira] [Commented] (SPARK-24036) Stateful operators in continuous processing

2018-05-24 Thread Jose Torres (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489648#comment-16489648 ] Jose Torres commented on SPARK-24036: - I've been notified of

[jira] [Created] (SPARK-24385) Trivially-true EqualNullSafe should be handled like EqualTo in Dataset.join

2018-05-24 Thread Daniel Shields (JIRA)
Daniel Shields created SPARK-24385: -- Summary: Trivially-true EqualNullSafe should be handled like EqualTo in Dataset.join Key: SPARK-24385 URL: https://issues.apache.org/jira/browse/SPARK-24385

[jira] [Updated] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-24324: - Summary: Pandas Grouped Map UserDefinedFunction mixes column labels (was: UserDefinedFunction

[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-24 Thread Yinan Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489601#comment-16489601 ] Yinan Li commented on SPARK-24383: -- The Kubernetes specific submission client adds an {{OwnerReference}} 

[jira] [Updated] (SPARK-24356) Duplicate strings in File.path managed by FileSegmentManagedBuffer

2018-05-24 Thread Misha Dmitriev (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev updated SPARK-24356: --- Attachment: SPARK-24356.01.patch > Duplicate strings in File.path managed by

[jira] [Comment Edited] (SPARK-24358) createDataFrame in Python 3 should be able to infer bytes type as Binary type

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489546#comment-16489546 ] Hyukjin Kwon edited comment on SPARK-24358 at 5/24/18 6:31 PM: --- Yea, I know

[jira] [Commented] (SPARK-24358) createDataFrame in Python 3 should be able to infer bytes type as Binary type

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489546#comment-16489546 ] Hyukjin Kwon commented on SPARK-24358: -- Yea, I know the differences and I know the rationale here.

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Wenbo Zhao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489534#comment-16489534 ] Wenbo Zhao commented on SPARK-24373: It is not apparent to me that they are the same issue though

[jira] [Commented] (SPARK-24324) UserDefinedFunction mixes column labels

2018-05-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489533#comment-16489533 ] Bryan Cutler commented on SPARK-24324: -- I was able to reproduce, the problem is that when pyspark

[jira] [Commented] (SPARK-24384) spark-submit --py-files with .py files doesn't work in client mode before context initialization

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489524#comment-16489524 ] Apache Spark commented on SPARK-24384: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-24384) spark-submit --py-files with .py files doesn't work in client mode before context initialization

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24384: Assignee: (was: Apache Spark) > spark-submit --py-files with .py files doesn't work

[jira] [Assigned] (SPARK-24384) spark-submit --py-files with .py files doesn't work in client mode before context initialization

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24384: Assignee: Apache Spark > spark-submit --py-files with .py files doesn't work in client

[jira] [Commented] (SPARK-24358) createDataFrame in Python 3 should be able to infer bytes type as Binary type

2018-05-24 Thread Joel Croteau (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489507#comment-16489507 ] Joel Croteau commented on SPARK-24358: -- This does mean that the current implementation has some

[jira] [Updated] (SPARK-24384) spark-submit --py-files with .py files doesn't work in client mode before context initialization

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-24384: - Component/s: Spark Submit > spark-submit --py-files with .py files doesn't work in client mode

[jira] [Created] (SPARK-24384) spark-submit --py-files with .py files doesn't work in client mode before context initialization

2018-05-24 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-24384: Summary: spark-submit --py-files with .py files doesn't work in client mode before context initialization Key: SPARK-24384 URL: https://issues.apache.org/jira/browse/SPARK-24384

[jira] [Updated] (SPARK-23754) StopIterator exception in Python UDF results in partial result

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-23754: - Priority: Blocker (was: Major) > StopIterator exception in Python UDF results in partial result

[jira] [Commented] (SPARK-24356) Duplicate strings in File.path managed by FileSegmentManagedBuffer

2018-05-24 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489417#comment-16489417 ] Imran Rashid commented on SPARK-24356: -- cc [~jinxing6...@126.com] [~elu] [~felixcheung] -- this

[jira] [Commented] (SPARK-21945) pyspark --py-files doesn't work in yarn client mode

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489397#comment-16489397 ] Hyukjin Kwon commented on SPARK-21945: -- For another clarification, the launch execution codepath is

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489389#comment-16489389 ] Marcelo Vanzin commented on SPARK-24373: This could be the same as SPARK-23309. > "df.cache()

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-05-24 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489391#comment-16489391 ] Marcelo Vanzin commented on SPARK-23309: [~kiszk] SPARK-24373 has some code. > Spark 2.3 cached

[jira] [Commented] (SPARK-24372) Create script for preparing RCs

2018-05-24 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489378#comment-16489378 ] Marcelo Vanzin commented on SPARK-24372: I'm keeping the current version of the scripts here

[jira] [Comment Edited] (SPARK-21945) pyspark --py-files doesn't work in yarn client mode

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489367#comment-16489367 ] Hyukjin Kwon edited comment on SPARK-21945 at 5/24/18 4:50 PM: --- To be more

[jira] [Commented] (SPARK-21945) pyspark --py-files doesn't work in yarn client mode

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489369#comment-16489369 ] Hyukjin Kwon commented on SPARK-21945: -- Just for clarification, zip file works fine because the

[jira] [Commented] (SPARK-21945) pyspark --py-files doesn't work in yarn client mode

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489367#comment-16489367 ] Hyukjin Kwon commented on SPARK-21945: -- To be more correct, the paths are added as are given my

[jira] [Commented] (SPARK-21945) pyspark --py-files doesn't work in yarn client mode

2018-05-24 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489338#comment-16489338 ] Marcelo Vanzin commented on SPARK-21945: It happens because the import happens before the context

[jira] [Commented] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-05-24 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489333#comment-16489333 ] Xiangrui Meng commented on SPARK-24374: --- [~galv] Thanks for your feedback! * SPARK-20327 allows

[jira] [Updated] (SPARK-20712) [SPARK 2.1 REGRESSION][SQL] Spark can't read Hive table when column type has length greater than 4000 bytes

2018-05-24 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-20712: Target Version/s: 2.4.0 > [SPARK 2.1 REGRESSION][SQL] Spark can't read Hive table when column type has >

[jira] [Updated] (SPARK-24378) Incorrect examples for date_trunc function in spark 2.3.0

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-24378: - Flags: (was: Important) > Incorrect examples for date_trunc function in spark 2.3.0 >

[jira] [Updated] (SPARK-24378) Incorrect examples for date_trunc function in spark 2.3.0

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-24378: - Priority: Trivial (was: Major) > Incorrect examples for date_trunc function in spark 2.3.0 >

[jira] [Updated] (SPARK-24378) Incorrect examples for date_trunc function in spark 2.3.0

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-24378: - Issue Type: Documentation (was: Bug) > Incorrect examples for date_trunc function in spark

[jira] [Resolved] (SPARK-24378) Incorrect examples for date_trunc function in spark 2.3.0

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-24378. -- Resolution: Fixed Fix Version/s: 2.4.0 2.3.1 Fixed in

[jira] [Assigned] (SPARK-24378) Incorrect examples for date_trunc function in spark 2.3.0

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-24378: Assignee: Yuming Wang > Incorrect examples for date_trunc function in spark 2.3.0 >

[jira] [Created] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-24 Thread Lenin (JIRA)
Lenin created SPARK-24383: - Summary: spark on k8s: "driver-svc" are not getting deleted Key: SPARK-24383 URL: https://issues.apache.org/jira/browse/SPARK-24383 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-24381) Improve Unit Test Coverage of NOT IN subqueries

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489242#comment-16489242 ] Apache Spark commented on SPARK-24381: -- User 'mgyucht' has created a pull request for this issue:

[jira] [Assigned] (SPARK-24381) Improve Unit Test Coverage of NOT IN subqueries

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24381: Assignee: (was: Apache Spark) > Improve Unit Test Coverage of NOT IN subqueries >

[jira] [Assigned] (SPARK-24381) Improve Unit Test Coverage of NOT IN subqueries

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24381: Assignee: Apache Spark > Improve Unit Test Coverage of NOT IN subqueries >

[jira] [Comment Edited] (SPARK-24091) Internally used ConfigMap prevents use of user-specified ConfigMaps carrying Spark configs files

2018-05-24 Thread Trevor McKay (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489186#comment-16489186 ] Trevor McKay edited comment on SPARK-24091 at 5/24/18 3:39 PM: --- I had a

[jira] [Comment Edited] (SPARK-24091) Internally used ConfigMap prevents use of user-specified ConfigMaps carrying Spark configs files

2018-05-24 Thread Trevor McKay (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489186#comment-16489186 ] Trevor McKay edited comment on SPARK-24091 at 5/24/18 3:38 PM: --- I had a

[jira] [Updated] (SPARK-24378) Incorrect examples for date_trunc function in spark 2.3.0

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-24378: - Target Version/s: (was: 2.3.0) > Incorrect examples for date_trunc function in spark 2.3.0 >

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Wenbo Zhao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489233#comment-16489233 ] Wenbo Zhao edited comment on SPARK-24373 at 5/24/18 3:37 PM: - I turned on the

[jira] [Updated] (SPARK-24378) Incorrect examples for date_trunc function in spark 2.3.0

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-24378: - Fix Version/s: (was: 2.3.0) > Incorrect examples for date_trunc function in spark 2.3.0 >

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Wenbo Zhao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489233#comment-16489233 ] Wenbo Zhao commented on SPARK-24373: I turned on the log trace of RuleExecutor and found that in my

[jira] [Created] (SPARK-24382) Spark Structured Streaming aggregation on old timestamp data

2018-05-24 Thread Karthik (JIRA)
Karthik created SPARK-24382: --- Summary: Spark Structured Streaming aggregation on old timestamp data Key: SPARK-24382 URL: https://issues.apache.org/jira/browse/SPARK-24382 Project: Spark Issue

[jira] [Commented] (SPARK-24091) Internally used ConfigMap prevents use of user-specified ConfigMaps carrying Spark configs files

2018-05-24 Thread Trevor McKay (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489186#comment-16489186 ] Trevor McKay commented on SPARK-24091: -- I had a similar situation in a project. One way to handle

[jira] [Comment Edited] (SPARK-17592) SQL: CAST string as INT inconsistent with Hive

2018-05-24 Thread Jorge Machado (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489185#comment-16489185 ] Jorge Machado edited comment on SPARK-17592 at 5/24/18 3:11 PM: I'm

[jira] [Updated] (SPARK-17592) SQL: CAST string as INT inconsistent with Hive

2018-05-24 Thread Jorge Machado (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Machado updated SPARK-17592: -- Attachment: image-2018-05-24-17-10-24-515.png > SQL: CAST string as INT inconsistent with Hive

[jira] [Commented] (SPARK-17592) SQL: CAST string as INT inconsistent with Hive

2018-05-24 Thread Jorge Machado (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489185#comment-16489185 ] Jorge Machado commented on SPARK-17592: --- I'm hitting the same issue I'm afraid but in slightly

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Andreas Weise (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489129#comment-16489129 ] Andreas Weise commented on SPARK-24373: --- We are also facing increased runtime duration for our SQL

[jira] [Created] (SPARK-24381) Improve Unit Test Coverage of NOT IN subqueries

2018-05-24 Thread Miles Yucht (JIRA)
Miles Yucht created SPARK-24381: --- Summary: Improve Unit Test Coverage of NOT IN subqueries Key: SPARK-24381 URL: https://issues.apache.org/jira/browse/SPARK-24381 Project: Spark Issue Type:

[jira] [Commented] (SPARK-24329) Remove comments filtering before parsing of CSV files

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489100#comment-16489100 ] Hyukjin Kwon commented on SPARK-24329: -- Fixed by explicitly adding a test where the code is valid.

[jira] [Resolved] (SPARK-24329) Remove comments filtering before parsing of CSV files

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-24329. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21394

[jira] [Assigned] (SPARK-24329) Remove comments filtering before parsing of CSV files

2018-05-24 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-24329: Assignee: Maxim Gekk > Remove comments filtering before parsing of CSV files >

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489060#comment-16489060 ] Li Jin commented on SPARK-24373: This is a reproduce in unit test: {code:java} test("cache and count") {

[jira] [Created] (SPARK-24380) argument quoting/escaping broken

2018-05-24 Thread paul mackles (JIRA)
paul mackles created SPARK-24380: Summary: argument quoting/escaping broken Key: SPARK-24380 URL: https://issues.apache.org/jira/browse/SPARK-24380 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-24380) argument quoting/escaping broken in mesos cluster scheduler

2018-05-24 Thread paul mackles (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-24380: - Summary: argument quoting/escaping broken in mesos cluster scheduler (was: argument

[jira] [Resolved] (SPARK-24230) With Parquet 1.10 upgrade has errors in the vectorized reader

2018-05-24 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-24230. - Resolution: Fixed Assignee: Ryan Blue Fix Version/s: 2.4.0

[jira] [Assigned] (SPARK-24379) BroadcastExchangeExec should catch SparkOutOfMemory and re-throw SparkFatalException, which wraps SparkOutOfMemory inside.

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24379: Assignee: (was: Apache Spark) > BroadcastExchangeExec should catch SparkOutOfMemory

[jira] [Commented] (SPARK-24379) BroadcastExchangeExec should catch SparkOutOfMemory and re-throw SparkFatalException, which wraps SparkOutOfMemory inside.

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488860#comment-16488860 ] Apache Spark commented on SPARK-24379: -- User 'jinxing64' has created a pull request for this issue:

[jira] [Assigned] (SPARK-24379) BroadcastExchangeExec should catch SparkOutOfMemory and re-throw SparkFatalException, which wraps SparkOutOfMemory inside.

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24379: Assignee: Apache Spark > BroadcastExchangeExec should catch SparkOutOfMemory and re-throw

[jira] [Commented] (SPARK-24378) Incorrect examples for date_trunc function in spark 2.3.0

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488850#comment-16488850 ] Apache Spark commented on SPARK-24378: -- User 'wangyum' has created a pull request for this issue:

[jira] [Assigned] (SPARK-24378) Incorrect examples for date_trunc function in spark 2.3.0

2018-05-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24378: Assignee: Apache Spark > Incorrect examples for date_trunc function in spark 2.3.0 >
