[jira] [Commented] (SPARK-31437) Try assigning tasks to existing executors by which required resources in ResourceProfile are satisfied

2020-04-15 Thread Hongze Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084552#comment-17084552 ] Hongze Zhang commented on SPARK-31437: -- Thanks [~tgraves]. I got your point of making them tied.

[jira] [Updated] (SPARK-31457) spark jdbc read hive created the wrong PreparedStatement

2020-04-15 Thread daile (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-31457: -- Attachment: hivejdbc3.png > spark jdbc read hive created the wrong PreparedStatement >

[jira] [Updated] (SPARK-31457) spark jdbc read hive created the wrong PreparedStatement

2020-04-15 Thread daile (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-31457: -- Attachment: hivejdbc2.png > spark jdbc read hive created the wrong PreparedStatement >

[jira] [Updated] (SPARK-31457) spark jdbc read hive created the wrong PreparedStatement

2020-04-15 Thread daile (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] daile updated SPARK-31457: -- Attachment: sparkhivejdbc.png > spark jdbc read hive created the wrong PreparedStatement >

[jira] [Created] (SPARK-31457) spark jdbc read hive created the wrong PreparedStatement

2020-04-15 Thread daile (Jira)
daile created SPARK-31457: - Summary: spark jdbc read hive created the wrong PreparedStatement Key: SPARK-31457 URL: https://issues.apache.org/jira/browse/SPARK-31457 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-31456) If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook

2020-04-15 Thread Xiaolei Liu (Jira)
Xiaolei Liu created SPARK-31456: --- Summary: If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook Key: SPARK-31456

[jira] [Commented] (SPARK-31455) Fix rebasing of not-existed dates/timestamps

2020-04-15 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084501#comment-17084501 ] Wenchen Fan commented on SPARK-31455: - also https://github.com/apache/spark/pull/28225 > Fix

[jira] [Resolved] (SPARK-31455) Fix rebasing of not-existed timestamps

2020-04-15 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31455. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28227

[jira] [Updated] (SPARK-31455) Fix rebasing of not-existed dates/timestamps

2020-04-15 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-31455: Summary: Fix rebasing of not-existed dates/timestamps (was: Fix rebasing of not-existed

[jira] [Created] (SPARK-31455) Fix rebasing of not-existed timestamps

2020-04-15 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-31455: --- Summary: Fix rebasing of not-existed timestamps Key: SPARK-31455 URL: https://issues.apache.org/jira/browse/SPARK-31455 Project: Spark Issue Type: Sub-task

[jira] [Assigned] (SPARK-15616) CatalogRelation should fallback to HDFS size of partitions that are involved in Query if statistics are not available.

2020-04-15 Thread Xiao Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-15616: --- Assignee: Hu Fuwang > CatalogRelation should fallback to HDFS size of partitions that are involved

[jira] [Created] (SPARK-31454) An optimized K-Means based on DenseMatrix and GEMM

2020-04-15 Thread Xiaochang Wu (Jira)
Xiaochang Wu created SPARK-31454: Summary: An optimized K-Means based on DenseMatrix and GEMM Key: SPARK-31454 URL: https://issues.apache.org/jira/browse/SPARK-31454 Project: Spark Issue

[jira] [Resolved] (SPARK-31428) Document Common Table Expression in SQL Reference

2020-04-15 Thread Takeshi Yamamuro (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-31428. -- Fix Version/s: 3.0.0 Assignee: Huaxin Gao Resolution: Fixed Resolved

[jira] [Commented] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-15 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084431#comment-17084431 ] Baohe Zhang commented on SPARK-31380: - Did you run your application with Spark 3? I tested it in my

[jira] [Updated] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-15 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-31380: Attachment: image-2020-04-15-18-16-18-254.png > Peak Execution Memory Quantile is not displayed

[jira] [Comment Edited] (SPARK-31236) Spark error while consuming data from Kinesis direct end point

2020-04-15 Thread Thukarama Prabhu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084419#comment-17084419 ] Thukarama Prabhu edited comment on SPARK-31236 at 4/15/20, 10:45 PM: -

[jira] [Commented] (SPARK-31236) Spark error while consuming data from Kinesis direct end point

2020-04-15 Thread Thukarama Prabhu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084419#comment-17084419 ] Thukarama Prabhu commented on SPARK-31236: -- Looks like this requires major code changes. Spark

[jira] [Assigned] (SPARK-31399) Closure cleaner broken in Scala 2.12

2020-04-15 Thread Xiao Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-31399: --- Assignee: Kris Mok > Closure cleaner broken in Scala 2.12 > >

[jira] [Commented] (SPARK-31399) Closure cleaner broken in Scala 2.12

2020-04-15 Thread Xiao Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084356#comment-17084356 ] Xiao Li commented on SPARK-31399: - [~rednaxelafx] will help this ticket and do more investigation.  >

[jira] [Commented] (SPARK-31423) DATES and TIMESTAMPS for a certain range are off by 10 days when stored in ORC

2020-04-15 Thread Maxim Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084308#comment-17084308 ] Maxim Gekk commented on SPARK-31423: [~bersprockets] I think we should take the next valid date for

[jira] [Commented] (SPARK-31423) DATES and TIMESTAMPS for a certain range are off by 10 days when stored in ORC

2020-04-15 Thread Bruce Robbins (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084288#comment-17084288 ] Bruce Robbins commented on SPARK-31423: --- Thanks. It seems we can either - close this as "not a

[jira] [Commented] (SPARK-31423) DATES and TIMESTAMPS for a certain range are off by 10 days when stored in ORC

2020-04-15 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084234#comment-17084234 ] Wenchen Fan commented on SPARK-31423: - I hope the ORC community can figure this out and switch

[jira] [Commented] (SPARK-31371) FileStreamSource: Decide seen files on the checksum, instead of filename.

2020-04-15 Thread Gabor Somogyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084223#comment-17084223 ] Gabor Somogyi commented on SPARK-31371: --- There was a similar feature request before and the

[jira] [Created] (SPARK-31453) Error while converting JavaRDD to Dataframe

2020-04-15 Thread Sachit Sharma (Jira)
Sachit Sharma created SPARK-31453: - Summary: Error while converting JavaRDD to Dataframe Key: SPARK-31453 URL: https://issues.apache.org/jira/browse/SPARK-31453 Project: Spark Issue Type:

[jira] [Commented] (SPARK-31423) DATES and TIMESTAMPS for a certain range are off by 10 days when stored in ORC

2020-04-15 Thread Bruce Robbins (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084218#comment-17084218 ] Bruce Robbins commented on SPARK-31423: --- OK, so this is a case of a limitation of the ORC library,

[jira] [Assigned] (SPARK-25440) Dump query execution info to a file

2020-04-15 Thread Xiao Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-25440: --- Assignee: Maxim Gekk > Dump query execution info to a file > --- >

[jira] [Created] (SPARK-31452) Do not create partition spec for 0-size partitions

2020-04-15 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-31452: --- Summary: Do not create partition spec for 0-size partitions Key: SPARK-31452 URL: https://issues.apache.org/jira/browse/SPARK-31452 Project: Spark Issue Type:

[jira] [Created] (SPARK-31451) Kafka connector does not retry in case of RetriableException

2020-04-15 Thread Chaoran Yu (Jira)
Chaoran Yu created SPARK-31451: -- Summary: Kafka connector does not retry in case of RetriableException Key: SPARK-31451 URL: https://issues.apache.org/jira/browse/SPARK-31451 Project: Spark

[jira] [Commented] (SPARK-31437) Try assigning tasks to existing executors by which required resources in ResourceProfile are satisfied

2020-04-15 Thread Thomas Graves (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084086#comment-17084086 ] Thomas Graves commented on SPARK-31437: --- so there are multiple reasons they are tied together for

[jira] [Commented] (SPARK-30364) The spark-streaming-kafka-0-10_2.11 test cases are failing on ppc64le

2020-04-15 Thread Nick Hryhoriev (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084033#comment-17084033 ] Nick Hryhoriev commented on SPARK-30364: I have the same issue with the spark app on Mac OS with

[jira] [Resolved] (SPARK-31394) Support for Kubernetes NFS volume mounts

2020-04-15 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31394. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 27364

[jira] [Assigned] (SPARK-31394) Support for Kubernetes NFS volume mounts

2020-04-15 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31394: - Assignee: Seongjin Cho > Support for Kubernetes NFS volume mounts >

[jira] [Commented] (SPARK-31437) Try assigning tasks to existing executors by which required resources in ResourceProfile are satisfied

2020-04-15 Thread Hongze Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083959#comment-17083959 ] Hongze Zhang commented on SPARK-31437: -- Thanks [~tgraves] and I too found some related discussion

[jira] [Updated] (SPARK-31429) Add additional fields in ExpressionDescription for more granular category in documentation

2020-04-15 Thread Takeshi Yamamuro (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-31429: - Issue Type: Improvement (was: Bug) > Add additional fields in ExpressionDescription

[jira] [Commented] (SPARK-26385) YARN - Spark Stateful Structured streaming HDFS_DELEGATION_TOKEN not found in cache

2020-04-15 Thread Zhou Jiashuai (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083937#comment-17083937 ] Zhou Jiashuai commented on SPARK-26385: --- [~gsomogyi] There is no unusual log detected in

[jira] [Updated] (SPARK-31394) Support for Kubernetes NFS volume mounts

2020-04-15 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31394: -- Affects Version/s: (was: 3) 3.1.0 > Support for Kubernetes NFS

[jira] [Updated] (SPARK-31394) Support for Kubernetes NFS volume mounts

2020-04-15 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31394: -- Shepherd: (was: Dongjoon Hyun) > Support for Kubernetes NFS volume mounts >

[jira] [Updated] (SPARK-31394) Support for Kubernetes NFS volume mounts

2020-04-15 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31394: -- Priority: Major (was: Minor) > Support for Kubernetes NFS volume mounts >

[jira] [Created] (SPARK-31450) Make ExpressionEncoder thread safe

2020-04-15 Thread Herman van Hövell (Jira)
Herman van Hövell created SPARK-31450: - Summary: Make ExpressionEncoder thread safe Key: SPARK-31450 URL: https://issues.apache.org/jira/browse/SPARK-31450 Project: Spark Issue Type:

[jira] [Commented] (SPARK-27296) Efficient User Defined Aggregators

2020-04-15 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083901#comment-17083901 ] Wenchen Fan commented on SPARK-27296: - This feature is to speed up UDAF by using Aggregator, but not

[jira] [Issue Comment Deleted] (SPARK-31226) SizeBasedCoalesce logic error

2020-04-15 Thread Sathyaprakash Govindasamy (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathyaprakash Govindasamy updated SPARK-31226: -- Comment: was deleted (was: [~angerszhuuu] Could you please share more

[jira] [Commented] (SPARK-31226) SizeBasedCoalesce logic error

2020-04-15 Thread Sathyaprakash Govindasamy (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083890#comment-17083890 ] Sathyaprakash Govindasamy commented on SPARK-31226: --- [~angerszhuuu] Could you please

[jira] [Updated] (SPARK-31447) DATE_PART functions produces incorrect result

2020-04-15 Thread Sathyaprakash Govindasamy (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathyaprakash Govindasamy updated SPARK-31447: -- Priority: Minor (was: Major) > DATE_PART functions produces

[jira] [Commented] (SPARK-26385) YARN - Spark Stateful Structured streaming HDFS_DELEGATION_TOKEN not found in cache

2020-04-15 Thread Gabor Somogyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083888#comment-17083888 ] Gabor Somogyi commented on SPARK-26385: --- [~zjiash] Please don't forget to attach full driver AND

[jira] [Commented] (SPARK-26385) YARN - Spark Stateful Structured streaming HDFS_DELEGATION_TOKEN not found in cache

2020-04-15 Thread Zhou Jiashuai (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083871#comment-17083871 ] Zhou Jiashuai commented on SPARK-26385: --- We use yarn-cluster mode and I will set

[jira] [Commented] (SPARK-26646) Flaky test: pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction

2020-04-15 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083856#comment-17083856 ] Jungtaek Lim commented on SPARK-26646: -- Looks like still happening on master branch

[jira] [Commented] (SPARK-29222) Flaky test: pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests.test_parameter_convergence

2020-04-15 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083857#comment-17083857 ] Jungtaek Lim commented on SPARK-29222: -- Still happening on master (3.1.0-SNAPSHOT)

[jira] [Comment Edited] (SPARK-29137) Flaky test: pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests.test_train_prediction

2020-04-15 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083855#comment-17083855 ] Jungtaek Lim edited comment on SPARK-29137 at 4/15/20, 6:58 AM: Still

[jira] [Commented] (SPARK-29137) Flaky test: pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests.test_train_prediction

2020-04-15 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083855#comment-17083855 ] Jungtaek Lim commented on SPARK-29137: -- Still valid on latest master (3.1.0-SNAPSHOT).

[jira] [Created] (SPARK-31449) Is there a difference between JDK and Spark's time zone offset calculation

2020-04-15 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-31449: -- Summary: Is there a difference between JDK and Spark's time zone offset calculation Key: SPARK-31449 URL: https://issues.apache.org/jira/browse/SPARK-31449 Project:

[jira] [Commented] (SPARK-26385) YARN - Spark Stateful Structured streaming HDFS_DELEGATION_TOKEN not found in cache

2020-04-15 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083849#comment-17083849 ] Jungtaek Lim commented on SPARK-26385: -- Probably you may need to share the entire log messages

[jira] [Created] (SPARK-31448) Difference in Storage Levels used in cache() and persist() for pyspark dataframes

2020-04-15 Thread Abhishek Dixit (Jira)
Abhishek Dixit created SPARK-31448: -- Summary: Difference in Storage Levels used in cache() and persist() for pyspark dataframes Key: SPARK-31448 URL: https://issues.apache.org/jira/browse/SPARK-31448

[jira] [Resolved] (SPARK-31443) Perf regression of toJavaDate

2020-04-15 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31443. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28212

[jira] [Assigned] (SPARK-31443) Perf regression of toJavaDate

2020-04-15 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31443: --- Assignee: Maxim Gekk > Perf regression of toJavaDate > - > >

[jira] [Commented] (SPARK-31423) DATES and TIMESTAMPS for a certain range are off by 10 days when stored in ORC

2020-04-15 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083828#comment-17083828 ] Wenchen Fan commented on SPARK-31423: - We probably have a bug about picking the next valid date, but

[jira] [Commented] (SPARK-31423) DATES and TIMESTAMPS for a certain range are off by 10 days when stored in ORC

2020-04-15 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083822#comment-17083822 ] Wenchen Fan commented on SPARK-31423: - The ORC file format spec doesn't specify the calendar, but