[jira] [Assigned] (SPARK-24397) Add TaskContext.getLocalProperties in Python

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24397: Assignee: Tathagata Das (was: Apache Spark) > Add TaskContext.getLocalProperties in

[jira] [Commented] (SPARK-24397) Add TaskContext.getLocalProperties in Python

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491429#comment-16491429 ] Apache Spark commented on SPARK-24397: -- User 'tdas' has created a pull request for this issue:

[jira] [Assigned] (SPARK-24397) Add TaskContext.getLocalProperties in Python

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24397: Assignee: Apache Spark (was: Tathagata Das) > Add TaskContext.getLocalProperties in

[jira] [Commented] (SPARK-24250) support accessing SQLConf inside tasks

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491419#comment-16491419 ] Apache Spark commented on SPARK-24250: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Commented] (SPARK-24396) Add Structured Streaming ForeachWriter for python

2018-05-25 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491415#comment-16491415 ] Tathagata Das commented on SPARK-24396: --- TaskContext.getLocalProperty in Python is needed for

[jira] [Comment Edited] (SPARK-24396) Add Structured Streaming ForeachWriter for python

2018-05-25 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491415#comment-16491415 ] Tathagata Das edited comment on SPARK-24396 at 5/26/18 12:07 AM: -

[jira] [Updated] (SPARK-24397) Add TaskContext.getLocalProperties in Python

2018-05-25 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-24397: -- Issue Type: New Feature (was: Sub-task) Parent: (was: SPARK-24396) > Add

[jira] [Created] (SPARK-24397) Add TaskContext.getLocalProperties in Python

2018-05-25 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-24397: - Summary: Add TaskContext.getLocalProperties in Python Key: SPARK-24397 URL: https://issues.apache.org/jira/browse/SPARK-24397 Project: Spark Issue Type:

[jira] [Created] (SPARK-24396) Add Structured Streaming ForeachWriter for python

2018-05-25 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-24396: - Summary: Add Structured Streaming ForeachWriter for python Key: SPARK-24396 URL: https://issues.apache.org/jira/browse/SPARK-24396 Project: Spark Issue

[jira] [Commented] (SPARK-24359) SPIP: ML Pipelines in R

2018-05-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491412#comment-16491412 ] Joseph K. Bradley commented on SPARK-24359: --- Regarding separating repos: What's the conclusion?

[jira] [Updated] (SPARK-24359) SPIP: ML Pipelines in R

2018-05-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-24359: -- Description: h1. Background and motivation SparkR supports calling MLlib

[jira] [Commented] (SPARK-24300) generateLDAData in ml.cluster.LDASuite didn't set seed correctly

2018-05-25 Thread Lu Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491359#comment-16491359 ] Lu Wang commented on SPARK-24300: - I will fix this issue. > generateLDAData in ml.cluster.LDASuite

[jira] [Resolved] (SPARK-24366) Improve error message for Catalyst type converters

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24366. - Resolution: Fixed Assignee: Maxim Gekk Fix Version/s: 2.4.0 > Improve error message for

[jira] [Commented] (SPARK-23455) Default Params in ML should be saved separately

2018-05-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491354#comment-16491354 ] Joseph K. Bradley commented on SPARK-23455: --- Yep, thanks [~viirya] for answering! It will

[jira] [Commented] (SPARK-24369) A bug when having multiple distinct aggregations

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491342#comment-16491342 ] Xiao Li commented on SPARK-24369: - Thanks! > A bug when having multiple distinct aggregations >

[jira] [Commented] (SPARK-24122) Allow automatic driver restarts on K8s

2018-05-25 Thread Yinan Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491338#comment-16491338 ] Yinan Li commented on SPARK-24122: -- The operator does cover automatic restart of an application with a

[jira] [Updated] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL

2018-05-25 Thread Miles Yucht (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miles Yucht updated SPARK-24395: Description: Spark does not return the correct answer when evaluating NOT IN in some cases. For

[jira] [Updated] (SPARK-24300) generateLDAData in ml.cluster.LDASuite didn't set seed correctly

2018-05-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-24300: -- Shepherd: Joseph K. Bradley > generateLDAData in ml.cluster.LDASuite didn't set seed

[jira] [Assigned] (SPARK-6235) Address various 2G limits

2018-05-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-6235: - Assignee: Marcelo Vanzin > Address various 2G limits > - > >

[jira] [Assigned] (SPARK-6235) Address various 2G limits

2018-05-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-6235: - Assignee: (was: Marcelo Vanzin) > Address various 2G limits >

[jira] [Created] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL

2018-05-25 Thread Miles Yucht (JIRA)
Miles Yucht created SPARK-24395: --- Summary: Fix Behavior of NOT IN with Literals Containing NULL Key: SPARK-24395 URL: https://issues.apache.org/jira/browse/SPARK-24395 Project: Spark Issue

[jira] [Updated] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-24392: --- Target Version/s: 2.3.1 > Mark pandas_udf as Experimental > ---

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491309#comment-16491309 ] Li Jin commented on SPARK-24373: [~smilegator] do you mean that add AnalysisBarrier to 

[jira] [Comment Edited] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491305#comment-16491305 ] Bryan Cutler edited comment on SPARK-24392 at 5/25/18 9:53 PM: --- Targeting

[jira] [Commented] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491308#comment-16491308 ] Marcelo Vanzin commented on SPARK-24392: (There's a target version field for that, btw. Updating

[jira] [Commented] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491305#comment-16491305 ] Bryan Cutler commented on SPARK-24392: -- Targeting 2.3.1 > Mark pandas_udf as Experimental >

[jira] [Updated] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-24392: - Fix Version/s: 2.3.1 > Mark pandas_udf as Experimental > --- > >

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491304#comment-16491304 ] Xiao Li commented on SPARK-24373: - In the above example, each time when we re-analyze the plan that is

[jira] [Created] (SPARK-24394) Nodes in decision tree sometimes have negative impurity values

2018-05-25 Thread Barry Becker (JIRA)
Barry Becker created SPARK-24394: Summary: Nodes in decision tree sometimes have negative impurity values Key: SPARK-24394 URL: https://issues.apache.org/jira/browse/SPARK-24394 Project: Spark

[jira] [Comment Edited] (SPARK-23576) SparkSQL - Decimal data missing decimal point

2018-05-25 Thread Hafthor Stefansson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491267#comment-16491267 ] Hafthor Stefansson edited comment on SPARK-23576 at 5/25/18 9:27 PM: -

[jira] [Comment Edited] (SPARK-23576) SparkSQL - Decimal data missing decimal point

2018-05-25 Thread Hafthor Stefansson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491267#comment-16491267 ] Hafthor Stefansson edited comment on SPARK-23576 at 5/25/18 9:26 PM: -

[jira] [Commented] (SPARK-24093) Make some fields of KafkaStreamWriter/InternalRowMicroBatchWriter visible to outside of the classes

2018-05-25 Thread Mingjie Tang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491285#comment-16491285 ] Mingjie Tang commented on SPARK-24093: -- i can add a PR for this. > Make some fields of

[jira] [Commented] (SPARK-23887) update query progress

2018-05-25 Thread Arun Mahadevan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491281#comment-16491281 ] Arun Mahadevan commented on SPARK-23887: We could probably invoke

[jira] [Commented] (SPARK-24091) Internally used ConfigMap prevents use of user-specified ConfigMaps carrying Spark configs files

2018-05-25 Thread Yinan Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491279#comment-16491279 ] Yinan Li commented on SPARK-24091: -- Thanks [~tmckay]! I think the first approach is a good way of

[jira] [Comment Edited] (SPARK-23576) SparkSQL - Decimal data missing decimal point

2018-05-25 Thread Hafthor Stefansson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491267#comment-16491267 ] Hafthor Stefansson edited comment on SPARK-23576 at 5/25/18 9:14 PM: -

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans

2018-05-25 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491271#comment-16491271 ] Marco Gaido commented on SPARK-24373: - [~smilegator] yes, you're right, the impact would be

[jira] [Comment Edited] (SPARK-23576) SparkSQL - Decimal data missing decimal point

2018-05-25 Thread Hafthor Stefansson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491267#comment-16491267 ] Hafthor Stefansson edited comment on SPARK-23576 at 5/25/18 9:06 PM: -

[jira] [Commented] (SPARK-23576) SparkSQL - Decimal data missing decimal point

2018-05-25 Thread Hafthor Stefansson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491267#comment-16491267 ] Hafthor Stefansson commented on SPARK-23576: Here's an equivalent problem: spark.sql("select

[jira] [Updated] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24373: Summary: "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491233#comment-16491233 ] Xiao Li commented on SPARK-23309: - [~vanzin] https://issues.apache.org/jira/browse/SPARK-24373 is not

[jira] [Updated] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after rerunning the analyzer

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24373: Summary: "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491224#comment-16491224 ] Xiao Li edited comment on SPARK-24373 at 5/25/18 8:24 PM: -- {code} def count():

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491224#comment-16491224 ] Xiao Li edited comment on SPARK-24373 at 5/25/18 8:23 PM: -- {code} def count():

[jira] [Updated] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24373: Target Version/s: 2.3.1 > "df.cache() df.count()" no longer eagerly caches data >

[jira] [Updated] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24373: Priority: Blocker (was: Major) > "df.cache() df.count()" no longer eagerly caches data >

[jira] [Resolved] (SPARK-24004) Tests of from_json for MapType

2018-05-25 Thread Maxim Gekk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk resolved SPARK-24004. Resolution: Won't Fix > Tests of from_json for MapType > -- > >

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491224#comment-16491224 ] Xiao Li commented on SPARK-24373: - {code} def count(): Long = withAction("count",

[jira] [Resolved] (SPARK-15125) CSV data source recognizes empty quoted strings in the input as null.

2018-05-25 Thread Maxim Gekk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk resolved SPARK-15125. Resolution: Fixed Fix Version/s: 2.4.0 The issue has been fixed by

[jira] [Created] (SPARK-24393) SQL builtin: isinf

2018-05-25 Thread Henry Robinson (JIRA)
Henry Robinson created SPARK-24393: -- Summary: SQL builtin: isinf Key: SPARK-24393 URL: https://issues.apache.org/jira/browse/SPARK-24393 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Tomasz Gawęda (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491153#comment-16491153 ] Tomasz Gawęda commented on SPARK-24373: --- [~LI,Xiao] That is a good idea :) Eager caching is

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491124#comment-16491124 ] Marco Gaido commented on SPARK-24373: - [~smilegator] I think an eager API is not related to the

[jira] [Commented] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491125#comment-16491125 ] Li Jin commented on SPARK-24324: Moved under Spark-22216 for better ticket organization. > Pandas

[jira] [Updated] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24324: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-22216 > Pandas Grouped Map UserDefinedFunction mixes

[jira] [Updated] (SPARK-22809) pyspark is sensitive to imports with dots

2018-05-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-22809: --- Target Version/s: 2.4.0 (was: 2.3.1, 2.4.0) > pyspark is sensitive to imports with dots >

[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-25 Thread Yinan Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491106#comment-16491106 ] Yinan Li commented on SPARK-24383: -- OK, then garbage collection should kick in and delete the service

[jira] [Commented] (SPARK-22809) pyspark is sensitive to imports with dots

2018-05-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491107#comment-16491107 ] Marcelo Vanzin commented on SPARK-22809: I'm removing 2.3.1 since it doesn't seem there's any

[jira] [Commented] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491099#comment-16491099 ] Marcelo Vanzin commented on SPARK-24392: What release is this supposed to block? > Mark

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491088#comment-16491088 ] Xiao Li commented on SPARK-24373: - BTW, I plan to continue my work of

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491081#comment-16491081 ] Xiao Li edited comment on SPARK-24373 at 5/25/18 5:57 PM: -- [~icexelloss]

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491086#comment-16491086 ] Li Jin commented on SPARK-24373: We use groupby() and pivot() > "df.cache() df.count()" no longer

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491081#comment-16491081 ] Xiao Li commented on SPARK-24373: - [~icexelloss] [~aweise] Are you also using the Dataset APIs groupBy(),

[jira] [Assigned] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24392: Assignee: Apache Spark > Mark pandas_udf as Experimental >

[jira] [Commented] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491042#comment-16491042 ] Apache Spark commented on SPARK-24392: -- User 'BryanCutler' has created a pull request for this

[jira] [Assigned] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24392: Assignee: (was: Apache Spark) > Mark pandas_udf as Experimental >

[jira] [Commented] (SPARK-24331) Add arrays_overlap / array_repeat / map_entries

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490996#comment-16490996 ] Apache Spark commented on SPARK-24331: -- User 'mn-mikke' has created a pull request for this issue:

[jira] [Assigned] (SPARK-24331) Add arrays_overlap / array_repeat / map_entries

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24331: Assignee: Apache Spark > Add arrays_overlap / array_repeat / map_entries >

[jira] [Assigned] (SPARK-24331) Add arrays_overlap / array_repeat / map_entries

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24331: Assignee: (was: Apache Spark) > Add arrays_overlap / array_repeat / map_entries >

[jira] [Updated] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-24392: - Priority: Blocker (was: Critical) > Mark pandas_udf as Experimental >

[jira] [Created] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-24392: Summary: Mark pandas_udf as Experimental Key: SPARK-24392 URL: https://issues.apache.org/jira/browse/SPARK-24392 Project: Spark Issue Type: Task

[jira] [Created] (SPARK-24391) to_json/from_json should support arrays of primitives, and more generally all JSON

2018-05-25 Thread Sam Kitajima-Kimbrel (JIRA)
Sam Kitajima-Kimbrel created SPARK-24391: Summary: to_json/from_json should support arrays of primitives, and more generally all JSON Key: SPARK-24391 URL:

[jira] [Assigned] (SPARK-23820) Allow the long form of call sites to be recorded in the log

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23820: Assignee: Apache Spark > Allow the long form of call sites to be recorded in the log >

[jira] [Assigned] (SPARK-23820) Allow the long form of call sites to be recorded in the log

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23820: Assignee: (was: Apache Spark) > Allow the long form of call sites to be recorded in

[jira] [Commented] (SPARK-23820) Allow the long form of call sites to be recorded in the log

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490905#comment-16490905 ] Apache Spark commented on SPARK-23820: -- User 'michaelmior' has created a pull request for this

[jira] [Updated] (SPARK-24331) Add arrays_overlap / array_repeat / map_entries

2018-05-25 Thread Marek Novotny (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24331: -- Description: Add SparkR equivalent to: * arrays_overlap - SPARK-23922 * array_repeat - 

[jira] [Closed] (SPARK-24380) argument quoting/escaping broken in mesos cluster scheduler

2018-05-25 Thread paul mackles (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles closed SPARK-24380. > argument quoting/escaping broken in mesos cluster scheduler >

[jira] [Resolved] (SPARK-24380) argument quoting/escaping broken in mesos cluster scheduler

2018-05-25 Thread paul mackles (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles resolved SPARK-24380. -- Resolution: Duplicate Dupe of SPARK-23941, just a different config > argument

[jira] [Updated] (SPARK-24331) Add arrays_overlap / array_repeat / map_entries

2018-05-25 Thread Marek Novotny (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24331: -- Summary: Add arrays_overlap / array_repeat / map_entries (was: Add cardinality /

[jira] [Created] (SPARK-24390) confusion of columns in projection after WITH ROLLUP

2018-05-25 Thread Ryan Foss (JIRA)
Ryan Foss created SPARK-24390: - Summary: confusion of columns in projection after WITH ROLLUP Key: SPARK-24390 URL: https://issues.apache.org/jira/browse/SPARK-24390 Project: Spark Issue Type:

[jira] [Commented] (SPARK-24389) describe() can't work on column that name contain dots

2018-05-25 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490797#comment-16490797 ] Marco Gaido commented on SPARK-24389: - I cannot reproduce on current master. Probably it has been

[jira] [Assigned] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24373: Assignee: Apache Spark > "df.cache() df.count()" no longer eagerly caches data >

[jira] [Assigned] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24373: Assignee: (was: Apache Spark) > "df.cache() df.count()" no longer eagerly caches data

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490772#comment-16490772 ] Apache Spark commented on SPARK-24373: -- User 'mgaido91' has created a pull request for this issue:

[jira] [Commented] (SPARK-19112) add codec for ZStandard

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490749#comment-16490749 ] Apache Spark commented on SPARK-19112: -- User 'wangyum' has created a pull request for this issue:

[jira] [Commented] (SPARK-23991) data loss when allocateBlocksToBatch

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490729#comment-16490729 ] Apache Spark commented on SPARK-23991: -- User 'gaborgsomogyi' has created a pull request for this

[jira] [Assigned] (SPARK-23991) data loss when allocateBlocksToBatch

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23991: Assignee: (was: Apache Spark) > data loss when allocateBlocksToBatch >

[jira] [Assigned] (SPARK-23991) data loss when allocateBlocksToBatch

2018-05-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23991: Assignee: Apache Spark > data loss when allocateBlocksToBatch >

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490551#comment-16490551 ] Marco Gaido commented on SPARK-24373: - [~wbzhao] yes, I do agree with you. That is the problem. >

[jira] [Created] (SPARK-24389) describe() can't work on column that name contain dots

2018-05-25 Thread zhanggengxin (JIRA)
zhanggengxin created SPARK-24389: Summary: describe() can't work on column that name contain dots Key: SPARK-24389 URL: https://issues.apache.org/jira/browse/SPARK-24389 Project: Spark Issue

[jira] [Commented] (SPARK-24271) sc.hadoopConfigurations can not be overwritten in the same spark context

2018-05-25 Thread Jami Malikzade (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490499#comment-16490499 ] Jami Malikzade commented on SPARK-24271: [~ste...@apache.org] Thank you >

[jira] [Commented] (SPARK-24271) sc.hadoopConfigurations can not be overwritten in the same spark context

2018-05-25 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490474#comment-16490474 ] Steve Loughran commented on SPARK-24271: Disabling the s3 cache can be pretty inefficient, as

[jira] [Issue Comment Deleted] (SPARK-17592) SQL: CAST string as INT inconsistent with Hive

2018-05-25 Thread Jorge Machado (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Machado updated SPARK-17592: -- Comment: was deleted (was: I'm hitting the same issue I'm afraid but in slightly another way.

[jira] [Commented] (SPARK-24388) EventLoop's run method don't handle fatal error, causes driver hang forever

2018-05-25 Thread Xianjin YE (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490373#comment-16490373 ] Xianjin YE commented on SPARK-24388: I am working on this and will send a pr soon. > EventLoop's run

[jira] [Created] (SPARK-24388) EventLoop's run method don't handle fatal error, causes driver hang forever

2018-05-25 Thread Xianjin YE (JIRA)
Xianjin YE created SPARK-24388: -- Summary: EventLoop's run method don't handle fatal error, causes driver hang forever Key: SPARK-24388 URL: https://issues.apache.org/jira/browse/SPARK-24388 Project:

[jira] [Commented] (SPARK-24387) Heartbeat-timeout executor is added back and used again

2018-05-25 Thread Rui Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490309#comment-16490309 ] Rui Li commented on SPARK-24387: When HeartbeatReceiver finds the executor's heartbeat is timeout, it

[jira] [Commented] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-05-25 Thread Wei Yan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490307#comment-16490307 ] Wei Yan commented on SPARK-24374: - Thanks [~mengxr] for the initiative and the doc. cc [~leftnoteasy]

[jira] [Commented] (SPARK-24387) Heartbeat-timeout executor is added back and used again

2018-05-25 Thread Rui Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490293#comment-16490293 ] Rui Li commented on SPARK-24387: A snippet of the log w/ some fields masked: {noformat} [Stage

[jira] [Created] (SPARK-24387) Heartbeat-timeout executor is added back and used again

2018-05-25 Thread Rui Li (JIRA)
Rui Li created SPARK-24387: -- Summary: Heartbeat-timeout executor is added back and used again Key: SPARK-24387 URL: https://issues.apache.org/jira/browse/SPARK-24387 Project: Spark Issue Type: Bug