[jira] [Updated] (SPARK-37016) Publicise UpperCaseCharStream
[ https://issues.apache.org/jira/browse/SPARK-37016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dohongdayi updated SPARK-37016:
-------------------------------
    Fix Version/s: 3.3.0, 3.2.1, 3.0.4, 3.1.3, 2.4.9

> Publicise UpperCaseCharStream
> -----------------------------
>
> Key: SPARK-37016
> URL: https://issues.apache.org/jira/browse/SPARK-37016
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.1, 3.1.2, 3.2.0
> Reporter: dohongdayi
> Priority: Major
> Fix For: 2.4.9, 3.1.3, 3.0.4, 3.2.1, 3.3.0
>
> Many Spark extension projects are copying `UpperCaseCharStream` because it is private beneath the `parser` package, for example:
> [Hudi|https://github.com/apache/hudi/blob/3f8ca1a3552bb866163d3b1648f68d9c4824e21d/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieCommonSqlParser.scala#L112]
> [Iceberg|https://github.com/apache/iceberg/blob/c3ac4c6ca74a0013b4705d5bd5d17fade8e6f499/spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala#L175]
> [Delta Lake|https://github.com/delta-io/delta/blob/625de3b305f109441ad04b20dba91dd6c4e1d78e/core/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala#L290]
> [Submarine|https://github.com/apache/submarine/blob/2faebb8efd69833853f62d89b4f1fea1b1718148/submarine-security/spark-security/src/main/scala/org/apache/submarine/spark/security/parser/UpperCaseCharStream.scala#L31]
> [Kyuubi|https://github.com/apache/incubator-kyuubi/blob/8a5134e3223844714fc58833a6859d4df5b68d57/dev/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderSparkSqlExtensionsParserBase.scala#L108]
> [Spark-ACID|https://github.com/qubole/spark-acid/blob/19bd6db757677c40f448e85c74d9995ba97d5942/src/main/scala/com/qubole/spark/datasources/hiveacid/sql/catalyst/parser/ParseDriver.scala#L13]
> We can publicise `UpperCaseCharStream` to eliminate this code duplication.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
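The `UpperCaseCharStream` these projects copy is a thin wrapper that upper-cases only the parser's lookahead, so an ANTLR-generated lexer can match SQL keywords case-insensitively while the original text (and its casing) is preserved for `getText()`. Below is a minimal, self-contained sketch of that idea; `CharSource` is a hypothetical stand-in for ANTLR's `CharStream` so the sketch runs without the ANTLR dependency, and this is not the actual Spark class:

```java
// Sketch of the uppercase-lookahead idea behind UpperCaseCharStream.
// The lexer's lookahead sees upper-cased characters (case-insensitive
// keyword matching), while text() still returns the original input.
// CharSource is a hypothetical, simplified stand-in for ANTLR's CharStream.
public class UpperCaseSketch {
    public interface CharSource {
        int la(int offset);               // 1-based lookahead; -1 past end
        String text(int start, int stop); // original text, inclusive indices
    }

    static CharSource wrap(String input) {
        return new CharSource() {
            public int la(int offset) {
                int idx = offset - 1;
                if (idx < 0 || idx >= input.length()) return -1;
                // Only the lookahead is upper-cased, not the stored text.
                return Character.toUpperCase(input.charAt(idx));
            }
            public String text(int start, int stop) {
                return input.substring(start, stop + 1);
            }
        };
    }

    public static void main(String[] args) {
        CharSource s = wrap("select");
        System.out.println((char) s.la(1)); // prints S (upper-cased lookahead)
        System.out.println(s.text(0, 5));   // prints select (original casing)
    }
}
```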
[jira] [Resolved] (SPARK-36980) Insert support query with CTE
[ https://issues.apache.org/jira/browse/SPARK-36980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-36980.
---------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34252
https://github.com/apache/spark/pull/34252

> Insert support query with CTE
> -----------------------------
>
> Key: SPARK-36980
> URL: https://issues.apache.org/jira/browse/SPARK-36980
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.1.2, 3.2.0
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Fix For: 3.3.0
>
> INSERT INTO t_delta (WITH v1(c1) as (values (1)) select 1, 2, 3 from v1);  -- OK
> INSERT INTO t_delta WITH v1(c1) as (values (1)) select 1, 2, 3 from v1;    -- FAIL
[jira] [Assigned] (SPARK-36980) Insert support query with CTE
[ https://issues.apache.org/jira/browse/SPARK-36980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-36980:
-----------------------------------
    Assignee: angerszhu

> Insert support query with CTE
> -----------------------------
>
> Key: SPARK-36980
> URL: https://issues.apache.org/jira/browse/SPARK-36980
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.1.2, 3.2.0
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
>
> INSERT INTO t_delta (WITH v1(c1) as (values (1)) select 1, 2, 3 from v1);  -- OK
> INSERT INTO t_delta WITH v1(c1) as (values (1)) select 1, 2, 3 from v1;    -- FAIL
[jira] [Assigned] (SPARK-37016) Publicise UpperCaseCharStream
[ https://issues.apache.org/jira/browse/SPARK-37016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37016:
------------------------------------
    Assignee: Apache Spark

> Publicise UpperCaseCharStream
> -----------------------------
>
> Key: SPARK-37016
> URL: https://issues.apache.org/jira/browse/SPARK-37016
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.1, 3.1.2, 3.2.0
> Reporter: dohongdayi
> Assignee: Apache Spark
> Priority: Major
[jira] [Assigned] (SPARK-37016) Publicise UpperCaseCharStream
[ https://issues.apache.org/jira/browse/SPARK-37016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37016:
------------------------------------
    Assignee: (was: Apache Spark)

> Publicise UpperCaseCharStream
> -----------------------------
>
> Key: SPARK-37016
> URL: https://issues.apache.org/jira/browse/SPARK-37016
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.1, 3.1.2, 3.2.0
> Reporter: dohongdayi
> Priority: Major
[jira] [Commented] (SPARK-37016) Publicise UpperCaseCharStream
[ https://issues.apache.org/jira/browse/SPARK-37016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429117#comment-17429117 ]

Apache Spark commented on SPARK-37016:
--------------------------------------

User 'dohongdayi' has created a pull request for this issue:
https://github.com/apache/spark/pull/34290

> Publicise UpperCaseCharStream
> -----------------------------
>
> Key: SPARK-37016
> URL: https://issues.apache.org/jira/browse/SPARK-37016
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.1, 3.1.2, 3.2.0
> Reporter: dohongdayi
> Priority: Major
[jira] [Commented] (SPARK-37016) Publicise UpperCaseCharStream
[ https://issues.apache.org/jira/browse/SPARK-37016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429114#comment-17429114 ]

dohongdayi commented on SPARK-37016:
------------------------------------

I have submitted a PR: https://github.com/apache/spark/pull/34290

> Publicise UpperCaseCharStream
> -----------------------------
>
> Key: SPARK-37016
> URL: https://issues.apache.org/jira/browse/SPARK-37016
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.1, 3.1.2, 3.2.0
> Reporter: dohongdayi
> Priority: Major
[jira] [Created] (SPARK-37016) Publicise UpperCaseCharStream
dohongdayi created SPARK-37016:
-------------------------------

             Summary: Publicise UpperCaseCharStream
                 Key: SPARK-37016
                 URL: https://issues.apache.org/jira/browse/SPARK-37016
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.0.3, 2.4.8, 2.3.4, 2.2.3
            Reporter: dohongdayi

Many Spark extension projects are copying `UpperCaseCharStream` because it is private beneath the `parser` package, for example:

[Hudi|https://github.com/apache/hudi/blob/3f8ca1a3552bb866163d3b1648f68d9c4824e21d/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieCommonSqlParser.scala#L112]
[Iceberg|https://github.com/apache/iceberg/blob/c3ac4c6ca74a0013b4705d5bd5d17fade8e6f499/spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala#L175]
[Delta Lake|https://github.com/delta-io/delta/blob/625de3b305f109441ad04b20dba91dd6c4e1d78e/core/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala#L290]
[Submarine|https://github.com/apache/submarine/blob/2faebb8efd69833853f62d89b4f1fea1b1718148/submarine-security/spark-security/src/main/scala/org/apache/submarine/spark/security/parser/UpperCaseCharStream.scala#L31]
[Kyuubi|https://github.com/apache/incubator-kyuubi/blob/8a5134e3223844714fc58833a6859d4df5b68d57/dev/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderSparkSqlExtensionsParserBase.scala#L108]
[Spark-ACID|https://github.com/qubole/spark-acid/blob/19bd6db757677c40f448e85c74d9995ba97d5942/src/main/scala/com/qubole/spark/datasources/hiveacid/sql/catalyst/parser/ParseDriver.scala#L13]

We can publicise `UpperCaseCharStream` to eliminate this code duplication.
[jira] [Commented] (SPARK-37014) Inline type hints for python/pyspark/streaming/context.py
[ https://issues.apache.org/jira/browse/SPARK-37014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429112#comment-17429112 ]

dch nguyen commented on SPARK-37014:
------------------------------------

working on this

> Inline type hints for python/pyspark/streaming/context.py
> ----------------------------------------------------------
>
> Key: SPARK-37014
> URL: https://issues.apache.org/jira/browse/SPARK-37014
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dch nguyen
> Priority: Major
[jira] [Commented] (SPARK-37015) Inline type hints for python/pyspark/streaming/dstream.py
[ https://issues.apache.org/jira/browse/SPARK-37015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429111#comment-17429111 ]

dch nguyen commented on SPARK-37015:
------------------------------------

working on this

> Inline type hints for python/pyspark/streaming/dstream.py
> ----------------------------------------------------------
>
> Key: SPARK-37015
> URL: https://issues.apache.org/jira/browse/SPARK-37015
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dch nguyen
> Priority: Major
[jira] [Created] (SPARK-37015) Inline type hints for python/pyspark/streaming/dstream.py
dch nguyen created SPARK-37015:
-------------------------------

             Summary: Inline type hints for python/pyspark/streaming/dstream.py
                 Key: SPARK-37015
                 URL: https://issues.apache.org/jira/browse/SPARK-37015
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dch nguyen
[jira] [Created] (SPARK-37014) Inline type hints for python/pyspark/streaming/context.py
dch nguyen created SPARK-37014:
-------------------------------

             Summary: Inline type hints for python/pyspark/streaming/context.py
                 Key: SPARK-37014
                 URL: https://issues.apache.org/jira/browse/SPARK-37014
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dch nguyen
[jira] [Created] (SPARK-37013) `select format_string('%0$s', 'Hello')` has different behavior when using Java 8 and Java 17
Yang Jie created SPARK-37013:
-----------------------------

             Summary: `select format_string('%0$s', 'Hello')` has different behavior when using Java 8 and Java 17
                 Key: SPARK-37013
                 URL: https://issues.apache.org/jira/browse/SPARK-37013
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Yang Jie

{code:java}
-- PostgreSQL throws ERROR: format specifies argument 0, but arguments are numbered from 1
select format_string('%0$s', 'Hello');
{code}

Executed with Java 8:

{code:java}
-- !query
select format_string('%0$s', 'Hello')
-- !query schema
struct
-- !query output
Hello
{code}

Executed with Java 11:

{code:java}
-- !query
select format_string('%0$s', 'Hello')
-- !query schema
struct<>
-- !query output
java.util.IllegalFormatArgumentIndexException
Illegal format argument index = 0
{code}
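The divergence comes down to how `java.util.Formatter` treats the explicit argument index `0` in `%0$s`: Java 8 accepted it, while newer JDKs reject it with an `IllegalFormatException` subtype. The small probe below handles both outcomes, since the result depends on the JVM it runs on (`FormatIndexProbe` is a name chosen for this sketch):

```java
// Probe how java.util.Formatter treats the argument index 0 in "%0$s".
// On Java 8 this historically returned "Hello"; newer JDKs throw an
// IllegalFormatException subtype. Both outcomes are handled because the
// observed behavior depends on the JVM running this sketch.
public class FormatIndexProbe {
    static String probe() {
        try {
            return "ok: " + String.format("%0$s", "Hello");
        } catch (java.util.IllegalFormatException e) {
            return "error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

Running this under different JDKs reproduces the two `-- !query output` results quoted above.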
[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests
[ https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaoyajun02 updated SPARK-36964:
-------------------------------
    Description:

Similar to SPARK-13704: in some cases, adding container requests with locality preference in YarnAllocator can be expensive, because it may invoke the topology script for rack awareness. When submitting a very large job to a very large YARN cluster, the topology script may take significant time to run. This blocks the handling of YarnSchedulerBackend's RequestExecutors RPC calls, which come from the Spark dynamic executor allocation thread; that in turn may block the ExecutorAllocationListener and result in a backlog of the executorManagement queue.

Some logs:

{code:java}
21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.
21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839)
    at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411)
    at org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361)
    at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316)
    at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    ... 12 more
21/09/29 12:04:35 WARN spark-dynamic-executor-allocation ExecutorAllocationManager: Unable to reach the cluster manager to request 1922 total executors!
{code}
[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests
[ https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaoyajun02 updated SPARK-36964:
-------------------------------
    Description:

Similar to SPARK-13704: in some cases, adding or removing container requests in YarnAllocator can be expensive, because it may invoke the topology script for rack awareness. When submitting a very large job to a very large YARN cluster, the topology script may take significant time to run. This blocks the handling of YarnSchedulerBackend's RequestExecutors RPC calls, which come from the Spark dynamic executor allocation thread and may block the ExecutorAllocationListener:

{code}
12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.
21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839)
    at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411)
    at org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361)
    at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316)
    at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    ... 12 more
21/09/29 12:04:35 WARN spark-dynamic-executor-allocation ExecutorAllocationManager: Unable to reach the cluster manager to request 1922 total executors!
{code}

This then results in a backlog of the executorManagement queue, e.g.:

{code}
21/09/29 12:02:49 ERROR dag-scheduler-event-loop AsyncEventQueue: Dropping event from queue executorManagement. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
21/09/29 12:02:49 WARN dag-scheduler-event-loop AsyncEventQueue: Dropped 1 events from executorManagement since the application started.
21/09/29 12:02:55 INFO spark-listener-group-eventLog AsyncEventQueue: Process of event SparkListenerExecutorAdded(1632888172920,543,org.apache.spark.scheduler.cluster.ExecutorData@8cfab8f5,None) by listener EventLoggingListener took 3.037686034s.
21/09/29 12:03:03 INFO spark-listener-group-eventLog AsyncEventQueue: Process of event SparkListenerBlockManagerAdded(1632888181779,BlockManagerId(1359, --, 57233, None),2704696934,Some(2704696934),Some(0)) by listener EventLoggingListener took 1.462598355s.
21/09/29 12:03:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 74388 events from executorManagement since Wed Sep 29 12:02:49 CST 2021.
21/09/29 12:04:35 INFO spark-listener-group-executorManagement AsyncEventQueue: Process of event SparkListenerStageSubmitted(org.apache.spark.scheduler.StageInfo@52f810ad,{...}) by listener ExecutorAllocationListener took 116.526408932s.
21/09/29 12:04:49 WARN heartbeat-receiver-event-loop-thread AsyncEventQueue: Dropped 18892 events from executorManagement since Wed Sep 29 12:03:49 CST 2021.
21/09/29 12:05:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 19397 events from executorManagement since Wed Sep 29 12:04:49 CST 2021.
{code}

was: Similar to SPARK-13704, In some cases, YarnAllocator add or r
[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests
[ https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36964: --- Description: Similar to SPARK-13704, In some cases, YarnAllocator add or remove container requests can be expensive, it may call the topology script for rack awareness. When submit a very large job in a very large Yarn cluster, the topology script may take signifiant time to run. And this blocks receiving YarnSchedulerBackend's RequestExecutors rpc calls, This request comes from spark dynamic executor allocation thread, which may blocks the ExecutorAllocationListener, {code:text} 21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411) at org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316) at 
org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 more21/09/29 12:04:35 WARN spark-dynamic-executor-allocation ExecutorAllocationManager: Unable to reach the cluster manager to request 1922 total executors!{code} and then result in executorManagement queue backlog. e.g. some log: {code:text} 21/09/29 12:02:49 ERROR dag-scheduler-event-loop AsyncEventQueue: Dropping event from queue executorManagement. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler. 21/09/29 12:02:49 WARN dag-scheduler-event-loop AsyncEventQueue: Dropped 1 events from executorManagement since the application started. 21/09/29 12:02:55 INFO spark-listener-group-eventLog AsyncEventQueue: Process of event SparkListenerExecutorAdded(1632888172920,543,org.apache.spark.scheduler.cluster.ExecutorData@8cfab8f5,None) by listener EventLoggingListener took 3.037686034s. 
21/09/29 12:03:03 INFO spark-listener-group-eventLog AsyncEventQueue: Process of event SparkListenerBlockManagerAdded(1632888181779,BlockManagerId(1359, --, 57233, None),2704696934,Some(2704696934),Some(0)) by listener EventLoggingListener took 1.462598355s. 21/09/29 12:03:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 74388 events from executorManagement since Wed Sep 29 12:02:49 CST 2021. 21/09/29 12:04:35 INFO spark-listener-group-executorManagement AsyncEventQueue: Process of event SparkListenerStageSubmitted(org.apache.spark.scheduler.StageInfo@52f810ad,{...}) by listener ExecutorAllocationListener took 116.526408932s. 21/09/29 12:04:49 WARN heartbeat-receiver-event-loop-thread AsyncEventQueue: Dropped 18892 events from executorManagement since Wed Sep 29 12:03:49 CST 2021. 21/09/29 12:05:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 19397 events from executorManagement since Wed Sep 29 12:04:49 CST 2021. {code} was: Similar to SPARK-13704, In some cases,
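The failure mode in the logs above, a slow listener starving a bounded event queue, can be illustrated outside Spark. Below is a minimal Python sketch, not Spark's actual AsyncEventQueue, assuming a fixed-capacity queue whose consumer never drains:

```python
import queue

# Hypothetical sketch (NOT Spark's AsyncEventQueue): a bounded queue
# that drops events when the consumer cannot keep up.
class DroppingEventQueue:
    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def post(self, event):
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            # Mirrors "Dropping event from queue executorManagement"
            self.dropped += 1
            return False

q = DroppingEventQueue(capacity=2)
results = [q.post(i) for i in range(5)]  # consumer never drains the queue
```

In Spark the queue is drained by a dedicated listener thread; events are only dropped while the queue is full, which is exactly what the periodic "Dropped N events from executorManagement" warnings report.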
[jira] [Assigned] (SPARK-36945) Inline type hints for python/pyspark/sql/udf.py
[ https://issues.apache.org/jira/browse/SPARK-36945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36945: Assignee: Apache Spark > Inline type hints for python/pyspark/sql/udf.py > --- > > Key: SPARK-36945 > URL: https://issues.apache.org/jira/browse/SPARK-36945 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36945) Inline type hints for python/pyspark/sql/udf.py
[ https://issues.apache.org/jira/browse/SPARK-36945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36945: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/sql/udf.py > --- > > Key: SPARK-36945 > URL: https://issues.apache.org/jira/browse/SPARK-36945 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36945) Inline type hints for python/pyspark/sql/udf.py
[ https://issues.apache.org/jira/browse/SPARK-36945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429100#comment-17429100 ] Apache Spark commented on SPARK-36945: -- User 'dchvn' has created a pull request for this issue: https://github.com/apache/spark/pull/34289 > Inline type hints for python/pyspark/sql/udf.py > --- > > Key: SPARK-36945 > URL: https://issues.apache.org/jira/browse/SPARK-36945 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36337) decimal('Nan') is unsupported in net.razorvine.pickle
[ https://issues.apache.org/jira/browse/SPARK-36337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36337: Assignee: Yikun Jiang > decimal('Nan') is unsupported in net.razorvine.pickle > -- > > Key: SPARK-36337 > URL: https://issues.apache.org/jira/browse/SPARK-36337 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > > Decimal('NaN') is currently not supported by net.razorvine.pickle. > In Python > {code:java} > >>> pickled = cloudpickle.dumps(decimal.Decimal('NaN')) > b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.' > >>> pickle.loads(pickled) > Decimal('NaN') > {code} > In Scala > {code:java} > scala> import net.razorvine.pickle.\{Pickler, Unpickler, PickleUtils} > scala> val unpickle = new Unpickler > scala> > unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094.")) > net.razorvine.pickle.PickleException: problem construction object: > java.lang.reflect.InvocationTargetException > at > net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29) > at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773) > at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213) > at net.razorvine.pickle.Unpickler.load(Unpickler.java:123) > at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136) > ... 48 elided > {code} > I submitted an issue upstream in pickle: > [https://github.com/irmen/pickle/issues/7]. > We should bump pickle to the latest version once it is fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
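The Python half of the round trip above is easy to reproduce. This sketch uses the standard pickle module rather than cloudpickle (which the report uses), on the assumption that both serialize Decimal the same way via its reduce protocol:

```python
import pickle
from decimal import Decimal

# Round-trip Decimal('NaN') with Python's own pickle. The Jira issue is
# that the JVM-side unpickler (net.razorvine.pickle) could not construct
# this value, while Python itself handles it fine.
payload = pickle.dumps(Decimal("NaN"))
restored = pickle.loads(payload)
# NaN never compares equal to itself, so check with is_nan(), not ==.
```

Since `Decimal("NaN") == Decimal("NaN")` is False by design, `is_nan()` is the right check after the round trip; the failure in the report is on the JVM side, not in Python.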
[jira] [Resolved] (SPARK-36337) decimal('Nan') is unsupported in net.razorvine.pickle
[ https://issues.apache.org/jira/browse/SPARK-36337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36337. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34285 [https://github.com/apache/spark/pull/34285] > decimal('Nan') is unsupported in net.razorvine.pickle > -- > > Key: SPARK-36337 > URL: https://issues.apache.org/jira/browse/SPARK-36337 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0 > > > Decimal('NaN') is currently not supported by net.razorvine.pickle. > In Python > {code:java} > >>> pickled = cloudpickle.dumps(decimal.Decimal('NaN')) > b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.' > >>> pickle.loads(pickled) > Decimal('NaN') > {code} > In Scala > {code:java} > scala> import net.razorvine.pickle.\{Pickler, Unpickler, PickleUtils} > scala> val unpickle = new Unpickler > scala> > unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094.")) > net.razorvine.pickle.PickleException: problem construction object: > java.lang.reflect.InvocationTargetException > at > net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29) > at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773) > at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213) > at net.razorvine.pickle.Unpickler.load(Unpickler.java:123) > at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136) > ... 48 elided > {code} > I submitted an issue upstream in pickle: > [https://github.com/irmen/pickle/issues/7]. > We should bump pickle to the latest version once it is fixed. 
[jira] [Assigned] (SPARK-37012) Disable pinned thread mode by default
[ https://issues.apache.org/jira/browse/SPARK-37012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37012: Assignee: Apache Spark > Disable pinned thread mode by default > - > > Key: SPARK-37012 > URL: https://issues.apache.org/jira/browse/SPARK-37012 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Blocker > > Pinned thread mode was enabled by default in Spark 3.2 (SPARK-35303). > However, it causes some breaking changes such as SPARK-37004. Maybe we should > disable it by default for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
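For context, pinned thread mode gives each Python thread its own dedicated JVM thread, so per-thread state (such as a job group) stops leaking across threads. The following is a conceptual Python sketch only, using threading.local to model that per-thread isolation; it does not touch Spark, and the helper names are invented for illustration:

```python
import threading

# threading.local gives each thread its own view of `state`, which is
# the isolation property pinned thread mode provides on the JVM side.
state = threading.local()

def set_job_group(group):       # illustrative helper, not PySpark API
    state.group = group

def get_job_group():            # illustrative helper, not PySpark API
    return getattr(state, "group", None)

results = {}

def worker(name):
    set_job_group(name)         # visible only to this thread
    results[name] = get_job_group()

threads = [threading.Thread(target=worker, args=(f"g{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In PySpark the mode is toggled by an environment variable (PYSPARK_PIN_THREAD, to the best of my recollection); disabling it restores the pre-3.2 behavior where such state was effectively shared across Python threads.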
[jira] [Commented] (SPARK-37012) Disable pinned thread mode by default
[ https://issues.apache.org/jira/browse/SPARK-37012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429086#comment-17429086 ] Apache Spark commented on SPARK-37012: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/34288 > Disable pinned thread mode by default > - > > Key: SPARK-37012 > URL: https://issues.apache.org/jira/browse/SPARK-37012 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Priority: Blocker > > Pinned thread mode was enabled by default in Spark 3.2 (SPARK-35303). > However, it causes some breaking changes such as SPARK-37004. Maybe we should > disable it by default for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37012) Disable pinned thread mode by default
[ https://issues.apache.org/jira/browse/SPARK-37012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37012: Assignee: (was: Apache Spark) > Disable pinned thread mode by default > - > > Key: SPARK-37012 > URL: https://issues.apache.org/jira/browse/SPARK-37012 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Priority: Blocker > > Pinned thread mode was enabled by default in Spark 3.2 (SPARK-35303). > However, it causes some breaking changes such as SPARK-37004. Maybe we should > disable it by default for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-37010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37010. -- Fix Version/s: 3.3.0 Assignee: Takuya Ueshin Resolution: Fixed Fixed in https://github.com/apache/spark/pull/34287 > Remove unnecessary "noqa: F401" comments in pandas-on-Spark > --- > > Key: SPARK-37010 > URL: https://issues.apache.org/jira/browse/SPARK-37010 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.3.0 > > > After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} > comments. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37005) pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path
[ https://issues.apache.org/jira/browse/SPARK-37005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WeiNan Zhao updated SPARK-37005: Description: When I submit a Spark job with spark-submit and set the option --files file1, then in Python code I use {code:java} // code placeholder path = str(os.environ["SPARK_YARN_STAGING_DIR"]) {code} the returned path is None, but the same succeeds in Java code {code:java} // code placeholder spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md") {code} which causes this problem. was: hi, all i commit a spark job , use spark-submit and set option --files file1, then in code,i use {code:java} // code placeholder path = str(os.environ["SPARK_YARN_STAGING_DIR"]) {code} but path is None, this can success in java code {code:java} // code placeholder spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md") {code} which cause this problem. > pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path > > > Key: SPARK-37005 > URL: https://issues.apache.org/jira/browse/SPARK-37005 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 > Environment: python2.7 > spark2.4 >Reporter: WeiNan Zhao >Priority: Major > Labels: python, spark-core > > When I submit a Spark job with spark-submit and set the option --files file1, > then in Python code I use > {code:java} > // code placeholder > path = str(os.environ["SPARK_YARN_STAGING_DIR"]) > {code} > the returned path is None, but the same succeeds in Java code > {code:java} > // code placeholder > spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md") > {code} > which causes this problem. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
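A defensive way to read the variable in Python is os.environ.get, which returns None instead of raising KeyError when the variable is missing. A small sketch; the setdefault call only simulates the environment for illustration, and the hdfs path is a made-up placeholder:

```python
import os

# Simulate the reported situation for demonstration only: in a real job,
# whether SPARK_YARN_STAGING_DIR is visible to the Python process depends
# on how the job is submitted (the report says the JVM sees it, PySpark
# does not). The value below is a placeholder.
os.environ.setdefault("SPARK_YARN_STAGING_DIR", "hdfs://nn/tmp/staging")

staging_dir = os.environ.get("SPARK_YARN_STAGING_DIR")  # None if unset, never raises
path = f"{staging_dir}/README.md" if staging_dir else None
```

Note that `os.environ["SPARK_YARN_STAGING_DIR"]` (as in the report's snippet) would raise KeyError when the variable is absent rather than returning None, so guarding with `.get` makes the failure mode explicit.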
[jira] [Commented] (SPARK-37006) MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading
[ https://issues.apache.org/jira/browse/SPARK-37006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429079#comment-17429079 ] jinhai commented on SPARK-37006: Alternatively, we could generate localDirs based on appId and execId, just like DiskBlockManager.getFile does, so that we would not need to store localDirs in MapStatus, only add appId. > MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs > when shuffle reading > - > > Key: SPARK-37006 > URL: https://issues.apache.org/jira/browse/SPARK-37006 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.2 >Reporter: jinhai >Priority: Major > > In shuffle reading, in order to get the hostLocalDirs value when executing > fetchHostLocalBlocks, we need ExternalBlockStoreClient or > NettyBlockTransferService to make an rpc request. > And when externalShuffleServiceEnabled, there is no need to registerExecutor > and so on in the ExternalShuffleBlockResolver class. > Throughout the Spark shuffle module, a lot of code logic is written to deal > with localDirs. > We can directly add localDirs to the BlockManagerId class of MapStatus to > locate the data file and index file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
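The comment's idea, deriving an executor's shuffle directories deterministically instead of shipping them over rpc, can be sketched as follows. This is a hypothetical illustration, not Spark's actual layout: DiskBlockManager.getFile hashes a block name into a sub-directory in a similar spirit, but every name below (local_dirs_for, file_for, the root paths) is invented:

```python
import hashlib

def local_dirs_for(root_dirs, app_id, exec_id):
    # Hypothetical scheme: <root>/<appId>/<execId> under each root dir,
    # reconstructible from (appId, execId) alone, no rpc needed.
    return [f"{root}/{app_id}/{exec_id}" for root in root_dirs]

def file_for(local_dirs, sub_dirs_per_dir, filename):
    # Stable non-negative hash of the file name picks the dir and sub-dir,
    # loosely mirroring how DiskBlockManager.getFile spreads files out.
    h = int(hashlib.md5(filename.encode()).hexdigest(), 16)
    dir_id = h % len(local_dirs)
    sub_dir_id = (h // len(local_dirs)) % sub_dirs_per_dir
    return f"{local_dirs[dir_id]}/{sub_dir_id:02x}/{filename}"

dirs = local_dirs_for(["/data1", "/data2"], "app-123", "7")
path = file_for(dirs, 64, "shuffle_0_0_0.data")
```

The point of the sketch is determinism: any node that knows (appId, execId) and the configured root dirs can compute the same path, so MapStatus would not need to carry localDirs at all.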
[jira] [Commented] (SPARK-36525) DS V2 Index Support
[ https://issues.apache.org/jira/browse/SPARK-36525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429077#comment-17429077 ] dch nguyen commented on SPARK-36525: [~huaxingao] yes, I'd like to > DS V2 Index Support > --- > > Key: SPARK-36525 > URL: https://issues.apache.org/jira/browse/SPARK-36525 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Major > > Many data sources support indexes to improve query performance. In order to > take advantage of index support in a data source, the following APIs will > be added for working with indexes: > {code:java} > /** >* Creates an index. >* >* @param indexName the name of the index to be created >* @param indexType the IndexType of the index to be created >* @param table the table on which index to be created >* @param columns the columns on which index to be created >* @param properties the properties of the index to be created >* @throws IndexAlreadyExistsException If the index already exists > (optional) >* @throws UnsupportedOperationException If create index is not a supported > operation >*/ > void createIndex(String indexName, > String indexType, > Identifier table, > FieldReference[] columns, > Map<String, String> properties) > throws IndexAlreadyExistsException, UnsupportedOperationException; > /** >* Soft deletes the index with the given name. >* Deleted index can be restored by calling restoreIndex. >* >* @param indexName the name of the index to be deleted >* @return true if the index is deleted >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException If delete index is not a supported > operation >*/ > default boolean deleteIndex(String indexName) > throws NoSuchIndexException, UnsupportedOperationException > /** >* Checks whether an index exists. 
>* >* @param indexName the name of the index >* @return true if the index exists, false otherwise >*/ > boolean indexExists(String indexName); > /** >* Lists all the indexes in a table. >* >* @param table the table to be checked on for indexes >* @throws NoSuchTableException >*/ > Index[] listIndexes(Identifier table) throws NoSuchTableException; > /** >* Hard deletes the index with the given name. >* The Index can't be restored once dropped. >* >* @param indexName the name of the index to be dropped. >* @return true if the index is dropped >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException If drop index is not a supported > operation >*/ > boolean dropIndex(String indexName) throws NoSuchIndexException, > UnsupportedOperationException; > /** >* Restores the index with the given name. >* Deleted index can be restored by calling restoreIndex, but dropped index > can't be restored. >* >* @param indexName the name of the index to be restored >* @return true if the index is restored >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException >*/ > default boolean restoreIndex(String indexName) > throws NoSuchIndexException, UnsupportedOperationException > /** >* Refreshes index using the latest data. This causes the index to be > rebuilt. >* >* @param indexName the name of the index to be rebuilt >* @return true if the index is rebuilt >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException >*/ > default boolean refreshIndex(String indexName) > throws NoSuchIndexException, UnsupportedOperationException > /** >* Alter Index using the new property. This causes the index to be rebuilt. 
>* >* @param indexName the name of the index to be altered >* @return true if the index is altered >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException >*/ > default boolean alterIndex(String indexName, Properties properties) > throws NoSuchIndexException, UnsupportedOperationException > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
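To make the lifecycle of the proposed API concrete (create → soft delete → restore → hard drop), here is a toy in-memory analogue. The real proposal is a Java interface; the Python class below is purely illustrative, uses none of Spark's actual types, and only mirrors the method names of the proposal:

```python
# Illustrative in-memory analogue of the proposed index catalog API
# (createIndex / indexExists / deleteIndex / restoreIndex / dropIndex).
class InMemoryIndexCatalog:
    def __init__(self):
        self._indexes = {}   # name -> properties (live indexes)
        self._deleted = {}   # soft-deleted indexes, still restorable

    def create_index(self, name, properties=None):
        if name in self._indexes:
            raise ValueError(f"index {name} already exists")
        self._indexes[name] = dict(properties or {})

    def index_exists(self, name):
        return name in self._indexes

    def delete_index(self, name):
        # Soft delete: the index can come back via restore_index.
        self._deleted[name] = self._indexes.pop(name)
        return True

    def restore_index(self, name):
        self._indexes[name] = self._deleted.pop(name)
        return True

    def drop_index(self, name):
        # Hard delete: gone for good, matching the dropIndex contract.
        del self._indexes[name]
        return True

cat = InMemoryIndexCatalog()
cat.create_index("idx_users_email")
cat.delete_index("idx_users_email")
cat.restore_index("idx_users_email")  # soft delete is reversible
```

The soft-delete/restore pair is the distinctive part of the proposal: deleteIndex is reversible, dropIndex is not, and the two are deliberately separate operations.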
[jira] [Updated] (SPARK-37012) Disable pinned thread mode by default
[ https://issues.apache.org/jira/browse/SPARK-37012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37012: - Priority: Blocker (was: Major) > Disable pinned thread mode by default > - > > Key: SPARK-37012 > URL: https://issues.apache.org/jira/browse/SPARK-37012 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Priority: Blocker > > Pinned thread mode was enabled by default in Spark 3.2 (SPARK-35303). > However, it causes some breaking changes such as SPARK-37004. Maybe we should > disable it by default for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37012) Disable pinned thread mode by default
Hyukjin Kwon created SPARK-37012: Summary: Disable pinned thread mode by default Key: SPARK-37012 URL: https://issues.apache.org/jira/browse/SPARK-37012 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.3.0 Reporter: Hyukjin Kwon Pinned thread mode was enabled by default in Spark 3.2 (SPARK-35303). However, it causes some breaking changes such as SPARK-37004. Maybe we should disable it by default for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-37011: -- Description: In flake8 < 3.9.0, F401 error occurs for imports when the imported identities are used in a {{bound}} argument in {{TypeVar(..., bound="XXX")}}. For example: {code:python} if TYPE_CHECKING: from pyspark.pandas.base import IndexOpsMixin IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin") {code} Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 installed in Jenkins to 3.9.0 or above. And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 3.8.0 to 3.9.0. was: In flake8 < 3.9.0, F401 error occurs for imports when the imported identities are used in a {{bound}} argument in {{TypeVar(..., bound="XXX")}}. For example: {code:python} if TYPE_CHECKING: from pyspark.pandas.base import IndexOpsMixin IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin") {code} Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 installed in Jenkins to 3.9.0 or above. And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 3.8.0 to 3.9.0. > Upgrade flake8 to 3.9.0 or above in Jenkins > --- > > Key: SPARK-37011 > URL: https://issues.apache.org/jira/browse/SPARK-37011 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > In flake8 < 3.9.0, F401 error occurs for imports when the imported identities > are used in a {{bound}} argument in {{TypeVar(..., bound="XXX")}}. > For example: > {code:python} > if TYPE_CHECKING: > from pyspark.pandas.base import IndexOpsMixin > IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin") > {code} > Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 > installed in Jenkins to 3.9.0 or above. 
> And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from > 3.8.0 to 3.9.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-37011: -- Description: In flake8 < 3.9.0, F401 error occurs for imports when the imported identities are used in a {{bound}} argument in {{TypeVar(..., bound="XXX")}}. For example: {code:python} if TYPE_CHECKING: from pyspark.pandas.base import IndexOpsMixin IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin") {code} Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 installed in Jenkins to 3.9.0 or above. And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 3.8.0 to 3.9.0. was: In flake8 < 3.9.0, F401 error occurs for imports when the impo Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for several lines in pandas-on-PySpark that uses TYPE_CHECKING. And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 3.8.0 to 3.9.0. > Upgrade flake8 to 3.9.0 or above in Jenkins > --- > > Key: SPARK-37011 > URL: https://issues.apache.org/jira/browse/SPARK-37011 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > In flake8 < 3.9.0, F401 error occurs for imports when the imported identities > are used in a {{bound}} argument in {{TypeVar(..., bound="XXX")}}. > For example: > {code:python} > if TYPE_CHECKING: > from pyspark.pandas.base import IndexOpsMixin > IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin") > {code} > Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 > installed in Jenkins to 3.9.0 or above. > And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from > 3.8.0 to 3.9.0. 
[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-37011: -- Description: In flake8 < 3.9.0, F401 error occurs for imports when the impo Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for several lines in pandas-on-PySpark that uses TYPE_CHECKING. And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 3.8.0 to 3.9.0. was: In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so there is no need to treat it as an error in static analysis. Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for several lines in pandas-on-PySpark that uses TYPE_CHECKING. And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 3.8.0 to 3.9.0. > Upgrade flake8 to 3.9.0 or above in Jenkins > --- > > Key: SPARK-37011 > URL: https://issues.apache.org/jira/browse/SPARK-37011 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > In flake8 < 3.9.0, F401 error occurs for imports when the impo > Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 > installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for > several lines in pandas-on-PySpark that uses TYPE_CHECKING. > And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from > 3.8.0 to 3.9.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-37011: -- Fix Version/s: (was: 3.3.0) > Upgrade flake8 to 3.9.0 or above in Jenkins > --- > > Key: SPARK-37011 > URL: https://issues.apache.org/jira/browse/SPARK-37011 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when > TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so > there is no need to treat it as an error in static analysis. > Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 > installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for > several lines in pandas-on-PySpark that uses TYPE_CHECKING. > And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from > 3.8.0 to 3.9.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-37011: -- Affects Version/s: (was: 3.2.0) 3.3.0 > Upgrade flake8 to 3.9.0 or above in Jenkins > --- > > Key: SPARK-37011 > URL: https://issues.apache.org/jira/browse/SPARK-37011 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > Fix For: 3.3.0 > > > In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when > TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so > there is no need to treat it as an error in static analysis. > Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 > installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for > several lines in pandas-on-PySpark that uses TYPE_CHECKING. > And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from > 3.8.0 to 3.9.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-37011: -- Description: In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so there is no need to treat it as an error in static analysis. Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for several lines in pandas-on-PySpark that uses TYPE_CHECKING. And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 3.8.0 to 3.9.0. was: In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so there is no need to treat it as an error in static analysis. Since this behavior is fixed In flake8 >= 3.8.0, we should upgrade the flake8 installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for several lines in pandas-on-PySpark that uses TYPE_CHECKING. And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 3.5.0 to 3.8.0. > Upgrade flake8 to 3.9.0 or above in Jenkins > --- > > Key: SPARK-37011 > URL: https://issues.apache.org/jira/browse/SPARK-37011 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > Fix For: 3.3.0 > > > In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when > TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so > there is no need to treat it as an error in static analysis. > Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 > installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for > several lines in pandas-on-PySpark that uses TYPE_CHECKING. 
> And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from > 3.8.0 to 3.9.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-37011: -- Reporter: Takuya Ueshin (was: Haejoon Lee) > Upgrade flake8 to 3.9.0 or above in Jenkins > --- > > Key: SPARK-37011 > URL: https://issues.apache.org/jira/browse/SPARK-37011 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > Fix For: 3.3.0 > > > In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when > TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so > there is no need to treat it as an error in static analysis. > Since this behavior is fixed In flake8 >= 3.8.0, we should upgrade the flake8 > installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for > several lines in pandas-on-PySpark that uses TYPE_CHECKING. > And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from > 3.5.0 to 3.8.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-37011: - Assignee: (was: Shane Knapp) > Upgrade flake8 to 3.9.0 or above in Jenkins > --- > > Key: SPARK-37011 > URL: https://issues.apache.org/jira/browse/SPARK-37011 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > Fix For: 3.3.0 > > > In flake8 < 3.8.0, an F401 error occurs for imports in *if* statements when > TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so > there is no need to treat it as an error in static analysis. > Since this behavior is fixed in flake8 >= 3.8.0, we should upgrade the flake8 > installed in Jenkins to 3.8.0 or above. Otherwise, F401 errors occur for > several lines in pandas-on-Spark that use TYPE_CHECKING. > We might also update the {{MINIMUM_FLAKE8}} in the {{lint-python}} script from > 3.5.0 to 3.8.0.
[jira] [Created] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins
Takuya Ueshin created SPARK-37011: - Summary: Upgrade flake8 to 3.9.0 or above in Jenkins Key: SPARK-37011 URL: https://issues.apache.org/jira/browse/SPARK-37011 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee Assignee: Shane Knapp Fix For: 3.3.0 In flake8 < 3.8.0, an F401 error occurs for imports in *if* statements when TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so there is no need to treat it as an error in static analysis. Since this behavior is fixed in flake8 >= 3.8.0, we should upgrade the flake8 installed in Jenkins to 3.8.0 or above. Otherwise, F401 errors occur for several lines in pandas-on-Spark that use TYPE_CHECKING. We might also update the {{MINIMUM_FLAKE8}} in the {{lint-python}} script from 3.5.0 to 3.8.0.
[jira] [Resolved] (SPARK-36942) Inline type hints for python/pyspark/sql/readwriter.py
[ https://issues.apache.org/jira/browse/SPARK-36942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-36942. --- Fix Version/s: 3.3.0 Assignee: Xinrong Meng Resolution: Fixed Issue resolved by pull request 34216 https://github.com/apache/spark/pull/34216 > Inline type hints for python/pyspark/sql/readwriter.py > -- > > Key: SPARK-36942 > URL: https://issues.apache.org/jira/browse/SPARK-36942 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.3.0 > > > Inline type hints for python/pyspark/sql/readwriter.py.
[jira] [Commented] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-37010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429041#comment-17429041 ] Apache Spark commented on SPARK-37010: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/34287 > Remove unnecessary "noqa: F401" comments in pandas-on-Spark > --- > > Key: SPARK-37010 > URL: https://issues.apache.org/jira/browse/SPARK-37010 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} > comments.
[jira] [Commented] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-37010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429040#comment-17429040 ] Apache Spark commented on SPARK-37010: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/34287 > Remove unnecessary "noqa: F401" comments in pandas-on-Spark > --- > > Key: SPARK-37010 > URL: https://issues.apache.org/jira/browse/SPARK-37010 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} > comments.
[jira] [Assigned] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-37010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37010: Assignee: (was: Apache Spark) > Remove unnecessary "noqa: F401" comments in pandas-on-Spark > --- > > Key: SPARK-37010 > URL: https://issues.apache.org/jira/browse/SPARK-37010 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} > comments.
[jira] [Assigned] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark
[ https://issues.apache.org/jira/browse/SPARK-37010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37010: Assignee: Apache Spark > Remove unnecessary "noqa: F401" comments in pandas-on-Spark > --- > > Key: SPARK-37010 > URL: https://issues.apache.org/jira/browse/SPARK-37010 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} > comments.
[jira] [Created] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark
Takuya Ueshin created SPARK-37010: - Summary: Remove unnecessary "noqa: F401" comments in pandas-on-Spark Key: SPARK-37010 URL: https://issues.apache.org/jira/browse/SPARK-37010 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.3.0 Reporter: Takuya Ueshin After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} comments.
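A sketch of the cleanup this issue describes (the module and import are illustrative, not the actual pandas-on-Spark files): once the linter understands TYPE_CHECKING-guarded imports, suppression comments on those imports no longer silence anything and can simply be deleted.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Before the flake8 upgrade this import needed a suppression:
    #     from decimal import Decimal  # noqa: F401
    # With flake8 >= 3.9.0 the "noqa" comment is dead weight, so it is dropped:
    from decimal import Decimal

def render(amount: "Decimal") -> str:
    # Runtime behavior is unchanged either way; only the lint comment differs.
    return f"{amount} units"
```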
[jira] [Resolved] (SPARK-23626) DAGScheduler blocked due to JobSubmitted event
[ https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-23626. Fix Version/s: 3.3.0 3.2.1 3.0.4 3.1.3 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/34265 > DAGScheduler blocked due to JobSubmitted event > --- > > Key: SPARK-23626 > URL: https://issues.apache.org/jira/browse/SPARK-23626 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 2.2.1, 2.3.3, 2.4.3, 3.0.0 >Reporter: Ajith S >Assignee: Josh Rosen >Priority: Major > Fix For: 3.1.3, 3.0.4, 3.2.1, 3.3.0 > > > DAGScheduler becomes a bottleneck in a cluster when multiple JobSubmitted > events have to be processed, because DAGSchedulerEventProcessLoop is single-threaded > and blocks other events in the queue, such as TaskCompletion. > Handling a JobSubmitted event can be time-consuming depending on the nature of the job > (for example, calculating parent stage dependencies, shuffle dependencies, and > partitions), so it blocks all subsequent events from being processed. > > I see multiple JIRAs referring to this behavior: > https://issues.apache.org/jira/browse/SPARK-2647 > https://issues.apache.org/jira/browse/SPARK-4961 > > Similarly, in my cluster the partition calculation for some jobs is time-consuming > (similar to the stack trace in SPARK-2647), which slows down the Spark > DAGSchedulerEventProcessLoop and in turn slows down user jobs, even when their > tasks finish within seconds, because TaskCompletion events are processed > at a slower rate due to the blockage.
[jira] [Assigned] (SPARK-23626) DAGScheduler blocked due to JobSubmitted event
[ https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-23626: -- Assignee: Josh Rosen > DAGScheduler blocked due to JobSubmitted event > --- > > Key: SPARK-23626 > URL: https://issues.apache.org/jira/browse/SPARK-23626 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 2.2.1, 2.3.3, 2.4.3, 3.0.0 >Reporter: Ajith S >Assignee: Josh Rosen >Priority: Major > > > DAGScheduler becomes a bottleneck in a cluster when multiple JobSubmitted > events have to be processed, because DAGSchedulerEventProcessLoop is single-threaded > and blocks other events in the queue, such as TaskCompletion. > Handling a JobSubmitted event can be time-consuming depending on the nature of the job > (for example, calculating parent stage dependencies, shuffle dependencies, and > partitions), so it blocks all subsequent events from being processed. > > I see multiple JIRAs referring to this behavior: > https://issues.apache.org/jira/browse/SPARK-2647 > https://issues.apache.org/jira/browse/SPARK-4961 > > Similarly, in my cluster the partition calculation for some jobs is time-consuming > (similar to the stack trace in SPARK-2647), which slows down the Spark > DAGSchedulerEventProcessLoop and in turn slows down user jobs, even when their > tasks finish within seconds, because TaskCompletion events are processed > at a slower rate due to the blockage.
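The head-of-line blocking described above can be sketched with a toy single-threaded event loop (plain Python, not Spark's actual DAGSchedulerEventProcessLoop; the event names and sleep are illustrative): one slow JobSubmitted handler delays every TaskCompletion event queued behind it.

```python
import queue
import threading
import time

def run_loop(events):
    """Process events one at a time on a single thread, like
    DAGSchedulerEventProcessLoop; returns events in completion order."""
    q = queue.Queue()
    for e in events:
        q.put(e)
    q.put(None)  # sentinel to stop the loop
    finished = []

    def loop():
        while (e := q.get()) is not None:
            if e == "JobSubmitted":
                time.sleep(0.05)  # stand-in for slow partition calculation
            finished.append(e)

    t = threading.Thread(target=loop)
    t.start()
    t.join()
    return finished

# The TaskCompletion events are handled only after the slow JobSubmitted one,
# even though they were queued almost immediately behind it.
order = run_loop(["JobSubmitted", "TaskCompletion", "TaskCompletion"])
```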
[jira] [Created] (SPARK-37009) Add checks to DebugFilesystem to ensure that FS operations are not performed in the DAGScheduler event loop
Josh Rosen created SPARK-37009: -- Summary: Add checks to DebugFilesystem to ensure that FS operations are not performed in the DAGScheduler event loop Key: SPARK-37009 URL: https://issues.apache.org/jira/browse/SPARK-37009 Project: Spark Issue Type: Improvement Components: Scheduler, Tests Affects Versions: 3.0.0 Reporter: Josh Rosen As [~yuchen.huo] suggested at [https://github.com/apache/spark/pull/34265#discussion_r728805893], we could explore modifying {{DebugFilesystem}} to throw exceptions when filesystem operations are performed from inside the DAGScheduler's event-processing thread. This could help prevent future issues similar to SPARK-23626.
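One way the proposed check could look (a minimal sketch; the thread name, class, and method here are hypothetical, not Spark's actual DebugFilesystem API, which wraps a Hadoop FileSystem): fail fast whenever a filesystem call happens on the event-loop thread.

```python
import threading

# Hypothetical name of the scheduler's event-loop thread, used only
# to identify the thread the check should reject.
EVENT_LOOP_THREAD_NAME = "dag-scheduler-event-loop"

class DebugFS:
    def open(self, path: str) -> str:
        # Throw if called from the event-processing thread, so tests catch
        # blocking FS work inside the DAGScheduler (cf. SPARK-23626).
        if threading.current_thread().name == EVENT_LOOP_THREAD_NAME:
            raise RuntimeError(
                f"FS operation on {path!r} from the event-loop thread")
        return f"handle:{path}"
```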
[jira] [Resolved] (SPARK-37000) Add type hints to python/pyspark/sql/util.py
[ https://issues.apache.org/jira/browse/SPARK-37000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-37000. --- Fix Version/s: 3.3.0 Assignee: Takuya Ueshin (was: Apache Spark) Resolution: Fixed Issue resolved by pull request 34278 https://github.com/apache/spark/pull/34278 > Add type hints to python/pyspark/sql/util.py > > > Key: SPARK-37000 > URL: https://issues.apache.org/jira/browse/SPARK-37000 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.3.0 > > > Add type hints for python/pyspark/sql/utils.py.
[jira] [Resolved] (SPARK-36938) Inline type hints for group.py in python/pyspark/sql
[ https://issues.apache.org/jira/browse/SPARK-36938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-36938. --- Fix Version/s: 3.3.0 Assignee: dch nguyen Resolution: Fixed Issue resolved by pull request 34197 https://github.com/apache/spark/pull/34197 > Inline type hints for group.py in python/pyspark/sql > - > > Key: SPARK-36938 > URL: https://issues.apache.org/jira/browse/SPARK-36938 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > Fix For: 3.3.0 > >
[jira] [Updated] (SPARK-36905) Reading Hive view without explicit column names fails in Spark
[ https://issues.apache.org/jira/browse/SPARK-36905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-36905: Affects Version/s: (was: 3.2.0) 3.3.0 > Reading Hive view without explicit column names fails in Spark > --- > > Key: SPARK-36905 > URL: https://issues.apache.org/jira/browse/SPARK-36905 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Shardul Mahadik >Assignee: Linhong Liu >Priority: Major > Fix For: 3.2.1, 3.3.0 > > > Consider a Hive view in which some columns are not explicitly named > {code:sql} > CREATE VIEW test_view AS > SELECT 1 > FROM some_table > {code} > Reading this view in Spark leads to an {{AnalysisException}} > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`_c0`' given input > columns: [1] > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:188) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:185) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:340) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:340) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:185) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:182) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91) >
[jira] [Updated] (SPARK-36905) Reading Hive view without explicit column names fails in Spark
[ https://issues.apache.org/jira/browse/SPARK-36905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-36905: Affects Version/s: (was: 3.3.0) 3.2.0 > Reading Hive view without explicit column names fails in Spark > --- > > Key: SPARK-36905 > URL: https://issues.apache.org/jira/browse/SPARK-36905 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Assignee: Linhong Liu >Priority: Major > Fix For: 3.2.1, 3.3.0 > > > Consider a Hive view in which some columns are not explicitly named > {code:sql} > CREATE VIEW test_view AS > SELECT 1 > FROM some_table > {code} > Reading this view in Spark leads to an {{AnalysisException}} > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`_c0`' given input > columns: [1] > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:188) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:185) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:340) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:340) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:185) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:182) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91) >
[jira] [Assigned] (SPARK-36905) Reading Hive view without explicit column names fails in Spark
[ https://issues.apache.org/jira/browse/SPARK-36905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36905: --- Assignee: Linhong Liu > Reading Hive view without explicit column names fails in Spark > --- > > Key: SPARK-36905 > URL: https://issues.apache.org/jira/browse/SPARK-36905 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Assignee: Linhong Liu >Priority: Major > > Consider a Hive view in which some columns are not explicitly named > {code:sql} > CREATE VIEW test_view AS > SELECT 1 > FROM some_table > {code} > Reading this view in Spark leads to an {{AnalysisException}} > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`_c0`' given input > columns: [1] > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:188) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:185) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:340) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:340) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:185) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:182) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Ana
[jira] [Resolved] (SPARK-36905) Reading Hive view without explicit column names fails in Spark
[ https://issues.apache.org/jira/browse/SPARK-36905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36905. - Fix Version/s: 3.3.0 3.2.1 Resolution: Fixed > Reading Hive view without explicit column names fails in Spark > --- > > Key: SPARK-36905 > URL: https://issues.apache.org/jira/browse/SPARK-36905 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Assignee: Linhong Liu >Priority: Major > Fix For: 3.2.1, 3.3.0 > > > Consider a Hive view in which some columns are not explicitly named > {code:sql} > CREATE VIEW test_view AS > SELECT 1 > FROM some_table > {code} > Reading this view in Spark leads to an {{AnalysisException}} > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`_c0`' given input > columns: [1] > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:188) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:185) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:340) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:340) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:185) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:182) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.sca
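A hedged illustration of the mechanism behind the failure above (using SQLite instead of Hive, purely to show the behavior): an unaliased expression in a view gets an engine-generated column name (`_c0` in Hive, `1` in SQLite), which downstream readers may fail to resolve, whereas an explicit alias yields a stable name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (x INTEGER)")
conn.execute("INSERT INTO some_table VALUES (42)")
# Unaliased expression: the engine invents a column name.
conn.execute("CREATE VIEW bad_view AS SELECT 1 FROM some_table")
# Explicit alias: the column name is stable for downstream readers.
conn.execute("CREATE VIEW good_view AS SELECT 1 AS one FROM some_table")

bad_cols = [d[0] for d in conn.execute("SELECT * FROM bad_view").description]
good_cols = [d[0] for d in conn.execute("SELECT * FROM good_view").description]
```

Aliasing every expression in the view definition is the usual way to sidestep this class of resolution failure.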
[jira] [Resolved] (SPARK-37003) Merge INSERT related docs
[ https://issues.apache.org/jira/browse/SPARK-37003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37003. - Resolution: Fixed Issue resolved by pull request 34282 [https://github.com/apache/spark/pull/34282] > Merge INSERT related docs > - > > Key: SPARK-37003 > URL: https://issues.apache.org/jira/browse/SPARK-37003 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > The current INSERT docs contain a lot of duplicated content; merge the INSERT INTO and INSERT OVERWRITE docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37003) Merge INSERT related docs
[ https://issues.apache.org/jira/browse/SPARK-37003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37003: --- Assignee: angerszhu > Merge INSERT related docs > - > > Key: SPARK-37003 > URL: https://issues.apache.org/jira/browse/SPARK-37003 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > The current INSERT docs contain a lot of duplicated content; merge the INSERT INTO and INSERT OVERWRITE docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36525) DS V2 Index Support
[ https://issues.apache.org/jira/browse/SPARK-36525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428834#comment-17428834 ] Huaxin Gao commented on SPARK-36525: Yes, it would be great if you can please help [~dchvn] > DS V2 Index Support > --- > > Key: SPARK-36525 > URL: https://issues.apache.org/jira/browse/SPARK-36525 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Major > > Many data sources support indexes to improve query performance. In order to > take advantage of the index support in a data source, the following APIs will > be added for working with indexes: > {code:java} > /** >* Creates an index. >* >* @param indexName the name of the index to be created >* @param indexType the IndexType of the index to be created >* @param table the table on which index to be created >* @param columns the columns on which index to be created >* @param properties the properties of the index to be created >* @throws IndexAlreadyExistsException If the index already exists > (optional) >* @throws UnsupportedOperationException If create index is not a supported > operation >*/ > void createIndex(String indexName, > String indexType, > Identifier table, > FieldReference[] columns, > Map properties) > throws IndexAlreadyExistsException, UnsupportedOperationException; > /** >* Soft deletes the index with the given name. >* Deleted index can be restored by calling restoreIndex. >* >* @param indexName the name of the index to be deleted >* @return true if the index is deleted >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException If delete index is not a supported > operation >*/ > default boolean deleteIndex(String indexName) > throws NoSuchIndexException, UnsupportedOperationException > /** >* Checks whether an index exists. 
>* >* @param indexName the name of the index >* @return true if the index exists, false otherwise >*/ > boolean indexExists(String indexName); > /** >* Lists all the indexes in a table. >* >* @param table the table to be checked on for indexes >* @throws NoSuchTableException >*/ > Index[] listIndexes(Identifier table) throws NoSuchTableException; > /** >* Hard deletes the index with the given name. >* The Index can't be restored once dropped. >* >* @param indexName the name of the index to be dropped. >* @return true if the index is dropped >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException If drop index is not a supported > operation >*/ > boolean dropIndex(String indexName) throws NoSuchIndexException, > UnsupportedOperationException; > /** >* Restores the index with the given name. >* Deleted index can be restored by calling restoreIndex, but dropped index > can't be restored. >* >* @param indexName the name of the index to be restored >* @return true if the index is restored >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException >*/ > default boolean restoreIndex(String indexName) > throws NoSuchIndexException, UnsupportedOperationException > /** >* Refreshes index using the latest data. This causes the index to be > rebuilt. >* >* @param indexName the name of the index to be rebuilt >* @return true if the index is rebuilt >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException >*/ > default boolean refreshIndex(String indexName) > throws NoSuchIndexException, UnsupportedOperationException > /** >* Alter Index using the new property. This causes the index to be rebuilt. 
>* >* @param indexName the name of the index to be altered >* @return true if the index is altered >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException >*/ > default boolean alterIndex(String indexName, Properties properties) > throws NoSuchIndexException, UnsupportedOperationException > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
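The key behavioral contract in the proposed API above is that deleteIndex is a reversible soft delete (undone by restoreIndex), while dropIndex is a permanent hard delete. The toy Python model below is only an illustration of that lifecycle; it is not Spark code, and the class and method names are hypothetical:

```python
class IndexCatalog:
    """Toy model of the proposed DS V2 index lifecycle (illustrative only)."""

    def __init__(self):
        self._indexes = {}        # live indexes: name -> properties
        self._soft_deleted = {}   # soft-deleted indexes, still restorable

    def create_index(self, name, properties=None):
        if name in self._indexes:
            raise ValueError(f"index {name} already exists")
        self._indexes[name] = dict(properties or {})

    def index_exists(self, name):
        return name in self._indexes

    def delete_index(self, name):
        # Soft delete: the index leaves the live set but stays restorable.
        self._soft_deleted[name] = self._indexes.pop(name)
        return True

    def restore_index(self, name):
        # Only soft-deleted indexes can come back; dropped ones cannot.
        self._indexes[name] = self._soft_deleted.pop(name)
        return True

    def drop_index(self, name):
        # Hard delete: remove from both maps, so restore is impossible.
        self._indexes.pop(name, None)
        self._soft_deleted.pop(name, None)
        return True
```

A delete followed by restore brings the index back; a drop removes it for good, which is exactly the distinction the deleteIndex/restoreIndex/dropIndex javadoc spells out.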
[jira] [Commented] (SPARK-37007) ExecutorAllocationManager schedule() does not use spark.dynamicAllocation.executorIdleTimeout
[ https://issues.apache.org/jira/browse/SPARK-37007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428806#comment-17428806 ] Nandini commented on SPARK-37007: - Hello, I would like to work on this jira. > ExecutorAllocationManager schedule() does not use > spark.dynamicAllocation.executorIdleTimeout > - > > Key: SPARK-37007 > URL: https://issues.apache.org/jira/browse/SPARK-37007 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.1, 3.2.0 >Reporter: Nandini >Priority: Minor > > The ExecutorAllocationManager removes idle executors after the configured > spark.dynamicAllocation.executorIdleTimeout but in the schedule() it does not > use the same configuration. > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L249 > The value for intervalMillis is 100 and timeunit is ms. Hence you see this > log approximately 10 times in a second. > | executor.scheduleWithFixedDelay(scheduleTask, 0, intervalMillis, > TimeUnit.MILLISECONDS) > In older versions it was at info level > [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L454] > In the latest versions (and master) it has been changed to debug > [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L540] > The change request for the above - > [https://github.com/apache/spark/commit/3584d849438ad48ff54af3c982c124a8443dc590] > However, this check for executors to be removed should be using > spark.dynamicAllocation.executorIdleTimeout instead of > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L153 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
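As a quick sanity check on the numbers in the report above (a sketch, not Spark code): with the hard-coded fixed delay of 100 ms, a scheduleWithFixedDelay-style loop runs the task roughly ten times per second, which matches the observed log frequency:

```python
# Simulate scheduleWithFixedDelay(task, 0, intervalMillis, MILLISECONDS)
# over a window of virtual time, counting how often the task runs.
# (Real fixed-delay scheduling also waits for each run to finish; the
# schedule() task here is cheap, so that is ignored in this sketch.)
def count_invocations(interval_millis, window_millis=1000):
    invocations = 0
    t = 0
    while t < window_millis:
        invocations += 1       # each run may emit one log line
        t += interval_millis   # fixed delay before the next run
    return invocations

# intervalMillis = 100 -> ~10 runs (and, pre-debug-level, ~10 log lines) per second
```

If the loop instead used a delay derived from spark.dynamicAllocation.executorIdleTimeout (60 s by default), it would run about once per minute rather than ten times per second.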
[jira] [Issue Comment Deleted] (SPARK-37007) ExecutorAllocationManager schedule() does not use spark.dynamicAllocation.executorIdleTimeout
[ https://issues.apache.org/jira/browse/SPARK-37007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nandini updated SPARK-37007: Comment: was deleted (was: Hello Team, I would like to work on this jira.) > ExecutorAllocationManager schedule() does not use > spark.dynamicAllocation.executorIdleTimeout > - > > Key: SPARK-37007 > URL: https://issues.apache.org/jira/browse/SPARK-37007 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.1, 3.2.0 >Reporter: Nandini >Priority: Minor > > The ExecutorAllocationManager removes idle executors after the configured > spark.dynamicAllocation.executorIdleTimeout but in the schedule() it does not > use the same configuration. > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L249 > The value for intervalMillis is 100 and timeunit is ms. Hence you see this > log approximately 10 times in a second. > | executor.scheduleWithFixedDelay(scheduleTask, 0, intervalMillis, > TimeUnit.MILLISECONDS) > In older versions it was at info level > [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L454] > In the latest versions (and master) it has been changed to debug > [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L540] > The change request for the above - > [https://github.com/apache/spark/commit/3584d849438ad48ff54af3c982c124a8443dc590] > However, this check for executors to be removed should be using > spark.dynamicAllocation.executorIdleTimeout instead of > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L153 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37007) ExecutorAllocationManager schedule() does not use spark.dynamicAllocation.executorIdleTimeout
[ https://issues.apache.org/jira/browse/SPARK-37007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428805#comment-17428805 ] Nandini commented on SPARK-37007: - Hello Team, I would like to work on this jira. > ExecutorAllocationManager schedule() does not use > spark.dynamicAllocation.executorIdleTimeout > - > > Key: SPARK-37007 > URL: https://issues.apache.org/jira/browse/SPARK-37007 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.1, 3.2.0 >Reporter: Nandini >Priority: Minor > > The ExecutorAllocationManager removes idle executors after the configured > spark.dynamicAllocation.executorIdleTimeout but in the schedule() it does not > use the same configuration. > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L249 > The value for intervalMillis is 100 and timeunit is ms. Hence you see this > log approximately 10 times in a second. > | executor.scheduleWithFixedDelay(scheduleTask, 0, intervalMillis, > TimeUnit.MILLISECONDS) > In older versions it was at info level > [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L454] > In the latest versions (and master) it has been changed to debug > [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L540] > The change request for the above - > [https://github.com/apache/spark/commit/3584d849438ad48ff54af3c982c124a8443dc590] > However, this check for executors to be removed should be using > spark.dynamicAllocation.executorIdleTimeout instead of > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L153 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michelle m Hovington updated SPARK-21187: - Attachment: 0--1172099527-254246775-1412485878 > Complete support for remaining Spark data types in Arrow Converters > --- > > Key: SPARK-21187 > URL: https://issues.apache.org/jira/browse/SPARK-21187 > Project: Spark > Issue Type: Umbrella > Components: PySpark, SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Major > Fix For: 3.1.0 > > Attachments: 0--1172099527-254246775-1412485878 > > > This is to track adding the remaining type support in Arrow Converters. > Currently, only primitive data types are supported. > Remaining types: > * -*Date*- > * -*Timestamp*- > * *Complex*: -Struct-, -Array-, -Map- > * -*Decimal*- > * -*Binary*- > * -*Categorical*- when converting from Pandas > Some things to do before closing this out: > * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write > values as BigDecimal)- > * -Need to add some user docs- > * -Make sure Python tests are thorough- > * Check into complex type support mentioned in comments by [~leif], should > we support multi-indexing? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37008: Assignee: (was: Apache Spark) > WholeStageCodegenSparkSubmitSuite Failed with Java 17 > -- > > Key: SPARK-37008 > URL: https://issues.apache.org/jira/browse/SPARK-37008 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > > WholeStageCodegenSparkSubmitSuite test failed when use Java 17 > {code:java} > 2021-10-14 04:32:38.038 - stderr> Exception in thread "main" > org.scalatest.exceptions.TestFailedException: 16 was not greater than 16 > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2021-10-14 04:32:38.038 - stderr> at > java.base/java.lang.reflect.Method.invoke(Method.java:568) > 2021-10-14 04:32:38.038 - stderr> at > 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37008: Assignee: Apache Spark > WholeStageCodegenSparkSubmitSuite Failed with Java 17 > -- > > Key: SPARK-37008 > URL: https://issues.apache.org/jira/browse/SPARK-37008 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > WholeStageCodegenSparkSubmitSuite test failed when use Java 17 > {code:java} > 2021-10-14 04:32:38.038 - stderr> Exception in thread "main" > org.scalatest.exceptions.TestFailedException: 16 was not greater than 16 > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2021-10-14 04:32:38.038 - stderr> at > java.base/java.lang.reflect.Method.invoke(Method.java:568) > 2021-10-14 04:32:38.038 - 
stderr> at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428783#comment-17428783 ] Apache Spark commented on SPARK-37008: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34286 > WholeStageCodegenSparkSubmitSuite Failed with Java 17 > -- > > Key: SPARK-37008 > URL: https://issues.apache.org/jira/browse/SPARK-37008 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > > WholeStageCodegenSparkSubmitSuite test failed when use Java 17 > {code:java} > 2021-10-14 04:32:38.038 - stderr> Exception in thread "main" > org.scalatest.exceptions.TestFailedException: 16 was not greater than 16 > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2021-10-14 
04:32:38.038 - stderr> at > java.base/java.lang.reflect.Method.invoke(Method.java:568) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37008: Assignee: (was: Apache Spark) > WholeStageCodegenSparkSubmitSuite Failed with Java 17 > -- > > Key: SPARK-37008 > URL: https://issues.apache.org/jira/browse/SPARK-37008 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > > WholeStageCodegenSparkSubmitSuite test failed when use Java 17 > {code:java} > 2021-10-14 04:32:38.038 - stderr> Exception in thread "main" > org.scalatest.exceptions.TestFailedException: 16 was not greater than 16 > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2021-10-14 04:32:38.038 - stderr> at > java.base/java.lang.reflect.Method.invoke(Method.java:568) > 2021-10-14 04:32:38.038 - stderr> at > 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36337) decimal('Nan') is unsupported in net.razorvine.pickle
[ https://issues.apache.org/jira/browse/SPARK-36337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36337: Assignee: (was: Apache Spark) > decimal('Nan') is unsupported in net.razorvine.pickle > -- > > Key: SPARK-36337 > URL: https://issues.apache.org/jira/browse/SPARK-36337 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Priority: Major > > Decimal('NaN') is currently not supported by net.razorvine.pickle. > In Python > {code:java} > >>> pickled = cloudpickle.dumps(decimal.Decimal('NaN')) > b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.' > >>> pickle.loads(pickled) > Decimal('NaN') > {code} > In Scala > {code:java} > scala> import net.razorvine.pickle.\{Pickler, Unpickler, PickleUtils} > scala> val unpickle = new Unpickler > scala> > unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u\u\u\u\u\u\u\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094.")) > net.razorvine.pickle.PickleException: problem construction object: > java.lang.reflect.InvocationTargetException > at > net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29) > at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773) > at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213) > at net.razorvine.pickle.Unpickler.load(Unpickler.java:123) > at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136) > ... 48 elided > {code} > I submitted an issue to the pickle upstream project: > [https://github.com/irmen/pickle/issues/7]. > We should bump pickle to the latest version once it is fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
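For reference, the Python half of the round-trip described in this issue can be reproduced with the standard library alone (a minimal check using stdlib pickle rather than cloudpickle; the failure reported above is purely on the JVM side, in net.razorvine.pickle's unpickler). Note that a Decimal NaN must be tested with is_nan(), since NaN never compares equal to itself:

```python
import pickle
from decimal import Decimal

# Serialize and deserialize Decimal('NaN') entirely within Python.
payload = pickle.dumps(Decimal("NaN"))
value = pickle.loads(payload)

# value == Decimal("NaN") would be False (NaN semantics), so use is_nan().
```

The same payload bytes, fed to net.razorvine.pickle's Unpickler on the JVM, produce the PickleException shown in the Scala snippet above.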
[jira] [Assigned] (SPARK-36337) decimal('Nan') is unsupported in net.razorvine.pickle
[ https://issues.apache.org/jira/browse/SPARK-36337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36337: Assignee: Apache Spark > decimal('Nan') is unsupported in net.razorvine.pickle > -- > > Key: SPARK-36337 > URL: https://issues.apache.org/jira/browse/SPARK-36337 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Major > > Decimal('NaN') is not supported by net.razorvine.pickle now. > In Python > {code:java} > >>> pickled = cloudpickle.dumps(decimal.Decimal('NaN')) > b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.' > >>> pickle.loads(pickled) > Decimal('NaN') > {code} > In Scala > {code:java} > scala> import net.razorvine.pickle.\{Pickler, Unpickler, PickleUtils} > scala> val unpickle = new Unpickler > scala> > unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u\u\u\u\u\u\u\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094.")) > net.razorvine.pickle.PickleException: problem construction object: > java.lang.reflect.InvocationTargetException > at > net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29) > at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773) > at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213) > at net.razorvine.pickle.Unpickler.load(Unpickler.java:123) > at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136) > ... 48 elided > {code} > I submit an issue in pickle upstream > [https://github.com/irmen/pickle/issues/7] . > we should bump pickle latest version after it fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36337) decimal('Nan') is unsupported in net.razorvine.pickle
[ https://issues.apache.org/jira/browse/SPARK-36337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428773#comment-17428773 ] Apache Spark commented on SPARK-36337: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/34285 > decimal('Nan') is unsupported in net.razorvine.pickle > -- > > Key: SPARK-36337 > URL: https://issues.apache.org/jira/browse/SPARK-36337 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Priority: Major > > Decimal('NaN') is not supported by net.razorvine.pickle now. > In Python > {code:java} > >>> pickled = cloudpickle.dumps(decimal.Decimal('NaN')) > b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.' > >>> pickle.loads(pickled) > Decimal('NaN') > {code} > In Scala > {code:java} > scala> import net.razorvine.pickle.\{Pickler, Unpickler, PickleUtils} > scala> val unpickle = new Unpickler > scala> > unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u\u\u\u\u\u\u\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094.")) > net.razorvine.pickle.PickleException: problem construction object: > java.lang.reflect.InvocationTargetException > at > net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29) > at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773) > at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213) > at net.razorvine.pickle.Unpickler.load(Unpickler.java:123) > at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136) > ... 48 elided > {code} > I submit an issue in pickle upstream > [https://github.com/irmen/pickle/issues/7] . > we should bump pickle latest version after it fixed. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
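The Python side of the round-trip described in SPARK-36337 can be reproduced with the standard library alone (no cloudpickle required); this is a minimal illustrative sketch, not Spark code. It shows that Python pickles and restores Decimal('NaN') without trouble, while the Java-side net.razorvine.pickle unpickler rejects the same payload.

```python
import decimal
import pickle

# Pickling Decimal('NaN') works fine in pure Python; it is the Java-side
# net.razorvine.pickle unpickler that fails on the equivalent byte stream.
payload = pickle.dumps(decimal.Decimal("NaN"))
restored = pickle.loads(payload)

print(restored.is_nan())  # True
```

Note that `restored == decimal.Decimal("NaN")` would be False (NaN never compares equal), which is why the check uses `is_nan()`.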
[jira] [Updated] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-37008: - Summary: WholeStageCodegenSparkSubmitSuite Failed with Java 17 (was: Use UseCompressedClassPointers instead of UseCompressedOops to pass WholeStageCodegenSparkSubmitSuite with Java 17 ) > WholeStageCodegenSparkSubmitSuite Failed with Java 17 > -- > > Key: SPARK-37008 > URL: https://issues.apache.org/jira/browse/SPARK-37008 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > > The WholeStageCodegenSparkSubmitSuite test fails when run with Java 17: > {code:java} > 2021-10-14 04:32:38.038 - stderr> Exception in thread "main" > org.scalatest.exceptions.TestFailedException: 16 was not greater than 16 > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > 2021-10-14 04:32:38.038 - stderr> at > 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2021-10-14 04:32:38.038 - stderr> at > java.base/java.lang.reflect.Method.invoke(Method.java:568) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37008) Use UseCompressedClassPointers instead of UseCompressedOops to pass WholeStageCodegenSparkSubmitSuite with Java 17
Yang Jie created SPARK-37008: Summary: Use UseCompressedClassPointers instead of UseCompressedOops to pass WholeStageCodegenSparkSubmitSuite with Java 17 Key: SPARK-37008 URL: https://issues.apache.org/jira/browse/SPARK-37008 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.3.0 Reporter: Yang Jie The WholeStageCodegenSparkSubmitSuite test fails when run with Java 17: {code:java} 2021-10-14 04:32:38.038 - stderr> Exception in thread "main" org.scalatest.exceptions.TestFailedException: 16 was not greater than 16 2021-10-14 04:32:38.038 - stderr> at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) 2021-10-14 04:32:38.038 - stderr> at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) 2021-10-14 04:32:38.038 - stderr> at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) 2021-10-14 04:32:38.038 - stderr> at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) 2021-10-14 04:32:38.038 - stderr> at org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82) 2021-10-14 04:32:38.038 - stderr> at org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala) 2021-10-14 04:32:38.038 - stderr> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2021-10-14 04:32:38.038 - stderr> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 2021-10-14 04:32:38.038 - stderr> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2021-10-14 04:32:38.038 - stderr> at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2021-10-14 04:32:38.038 - stderr> at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) 2021-10-14 04:32:38.038 - stderr> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) 2021-10-14 04:32:38.038 - stderr> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) 2021-10-14 04:32:38.038 - stderr> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) 2021-10-14 04:32:38.038 - stderr> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) 2021-10-14 04:32:38.038 - stderr> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) 2021-10-14 04:32:38.038 - stderr> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37007) ExecutorAllocationManager schedule() does not use spark.dynamicAllocation.executorIdleTimeout
Nandini created SPARK-37007: --- Summary: ExecutorAllocationManager schedule() does not use spark.dynamicAllocation.executorIdleTimeout Key: SPARK-37007 URL: https://issues.apache.org/jira/browse/SPARK-37007 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.0, 2.4.1 Reporter: Nandini The ExecutorAllocationManager removes idle executors after the configured spark.dynamicAllocation.executorIdleTimeout, but its schedule() method does not use that configuration: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L249 The value of intervalMillis is 100 and the time unit is milliseconds, so the log appears approximately ten times per second. | executor.scheduleWithFixedDelay(scheduleTask, 0, intervalMillis, TimeUnit.MILLISECONDS) In older versions the log was at info level [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L454] In the latest versions (and master) it has been changed to debug [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L540] The change was made in [https://github.com/apache/spark/commit/3584d849438ad48ff54af3c982c124a8443dc590] However, the check for executors to be removed should use spark.dynamicAllocation.executorIdleTimeout instead of the hard-coded interval at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L153 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
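The cadence described in SPARK-37007 is simple arithmetic, sketched below in plain Python as an illustration (this is not Spark's scheduler; the 60-second value for spark.dynamicAllocation.executorIdleTimeout is an assumed default):

```python
# Illustration only: a hard-coded 100 ms polling interval in schedule()
# produces ~10 scheduling ticks (and log lines) per second, regardless of
# what the idle-timeout configuration is set to.
INTERVAL_MILLIS = 100          # hard-coded polling interval in schedule()
IDLE_TIMEOUT_SECONDS = 60      # spark.dynamicAllocation.executorIdleTimeout (assumed default)

polls_per_second = 1000 // INTERVAL_MILLIS
polls_before_removal = IDLE_TIMEOUT_SECONDS * polls_per_second

print(polls_per_second)      # 10
print(polls_before_removal)  # 600
```

In other words, an executor sits through hundreds of polls before the idle timeout actually triggers, which is why the per-poll log message is so noisy at info level.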
[jira] [Assigned] (SPARK-37001) Disable two level of map for final hash aggregation by default
[ https://issues.apache.org/jira/browse/SPARK-37001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37001: --- Assignee: Cheng Su > Disable two level of map for final hash aggregation by default > -- > > Key: SPARK-37001 > URL: https://issues.apache.org/jira/browse/SPARK-37001 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > > This JIRA disables the two-level map for final hash aggregation by default. > The feature was introduced in [#32242|https://github.com/apache/spark/pull/32242], > and we found it can lead to a query performance regression when the final > aggregation receives rows with many distinct keys: the 1st-level hash map fills > up, so many rows waste a 1st-level lookup before being inserted into the > 2nd-level map. The feature still benefits queries with few distinct keys, so a > config is introduced to let a query enable the feature when it sees a benefit. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37001) Disable two level of map for final hash aggregation by default
[ https://issues.apache.org/jira/browse/SPARK-37001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37001. - Fix Version/s: 3.2.1 3.3.0 Resolution: Fixed Issue resolved by pull request 34270 [https://github.com/apache/spark/pull/34270] > Disable two level of map for final hash aggregation by default > -- > > Key: SPARK-37001 > URL: https://issues.apache.org/jira/browse/SPARK-37001 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > Fix For: 3.3.0, 3.2.1 > > > This JIRA disables the two-level map for final hash aggregation by default. > The feature was introduced in [#32242|https://github.com/apache/spark/pull/32242], > and we found it can lead to a query performance regression when the final > aggregation receives rows with many distinct keys: the 1st-level hash map fills > up, so many rows waste a 1st-level lookup before being inserted into the > 2nd-level map. The feature still benefits queries with few distinct keys, so a > config is introduced to let a query enable the feature when it sees a benefit. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
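The regression mechanism behind SPARK-37001 is easy to model. The sketch below is a toy two-level map (the capacity and structure are invented for illustration; this is not Spark's codegen'd implementation): once the bounded first level fills up, every new distinct key pays a wasted first-level probe before landing in the fallback map.

```python
class TwoLevelMap:
    """Toy model of a bounded first-level hash map with a fallback second level."""

    def __init__(self, l1_capacity: int = 16):
        self.l1 = {}
        self.l2 = {}
        self.l1_capacity = l1_capacity
        self.wasted_probes = 0  # first-level lookups that could not succeed

    def upsert(self, key, value):
        if key in self.l1:
            self.l1[key] = value
        elif len(self.l1) < self.l1_capacity:
            self.l1[key] = value
        else:
            # First level is full: the probe above was wasted work.
            self.wasted_probes += 1
            self.l2[key] = value


m = TwoLevelMap(l1_capacity=16)
for k in range(10_000):   # many distinct keys, as in the regressing queries
    m.upsert(k, k)
print(m.wasted_probes)    # 9984: every key after the first 16 paid the extra probe
```

With few distinct keys everything stays in the first level and the extra level costs nothing, which is why the feature still helps low-cardinality aggregations.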
[jira] [Assigned] (SPARK-12567) Add aes_encrypt and aes_decrypt UDFs
[ https://issues.apache.org/jira/browse/SPARK-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-12567: --- Assignee: Kousuke Saruta > Add aes_encrypt and aes_decrypt UDFs > > > Key: SPARK-12567 > URL: https://issues.apache.org/jira/browse/SPARK-12567 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kai Jiang >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.3.0 > > > AES (Advanced Encryption Standard) algorithm. > Add aes_encrypt and aes_decrypt UDFs. > Ref: > [Hive|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Misc.Functions] > [MySQL|https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_aes-decrypt] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12567) Add aes_encrypt and aes_decrypt UDFs
[ https://issues.apache.org/jira/browse/SPARK-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-12567. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 32801 [https://github.com/apache/spark/pull/32801] > Add aes_encrypt and aes_decrypt UDFs > > > Key: SPARK-12567 > URL: https://issues.apache.org/jira/browse/SPARK-12567 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kai Jiang >Priority: Major > Fix For: 3.3.0 > > > AES (Advanced Encryption Standard) algorithm. > Add aes_encrypt and aes_decrypt UDFs. > Ref: > [Hive|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Misc.Functions] > [MySQL|https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_aes-decrypt] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37006) MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading
[ https://issues.apache.org/jira/browse/SPARK-37006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428714#comment-17428714 ] jinhai commented on SPARK-37006: Hi [~cloud_fan], can you review this issue for me? > MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs > when shuffle reading > - > > Key: SPARK-37006 > URL: https://issues.apache.org/jira/browse/SPARK-37006 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.2 >Reporter: jinhai >Priority: Major > > In shuffle reading, in order to get the hostLocalDirs value when executing > fetchHostLocalBlocks, we need ExternalBlockStoreClient or > NettyBlockTransferService to make an RPC request. > And when externalShuffleServiceEnabled is set, there is no need for registerExecutor > and similar logic in the ExternalShuffleBlockResolver class. > Throughout the Spark shuffle module, a lot of code is written to deal > with localDirs. > We can directly add localDirs to the BlockManagerId class of MapStatus to > locate the data file and index file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37006) MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading
jinhai created SPARK-37006: -- Summary: MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading Key: SPARK-37006 URL: https://issues.apache.org/jira/browse/SPARK-37006 Project: Spark Issue Type: Improvement Components: Shuffle Affects Versions: 3.1.2 Reporter: jinhai In shuffle reading, in order to get the hostLocalDirs value when executing fetchHostLocalBlocks, we need ExternalBlockStoreClient or NettyBlockTransferService to make an RPC request. And when externalShuffleServiceEnabled is set, there is no need for registerExecutor and similar logic in the ExternalShuffleBlockResolver class. Throughout the Spark shuffle module, a lot of code is written to deal with localDirs. We can directly add localDirs to the BlockManagerId class of MapStatus to locate the data file and index file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
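The SPARK-37006 proposal can be sketched as a toy model (names and shapes below are illustrative Python, not Spark's actual Scala classes): if the block-manager location serialized inside each MapStatus already carries the executor's local directories, the reducer can resolve host-local data and index files without a getHostLocalDirs RPC.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class BlockManagerId:
    """Toy stand-in for Spark's BlockManagerId, extended with local_dirs."""
    executor_id: str
    host: str
    port: int
    local_dirs: Tuple[str, ...] = ()   # the proposed addition

def host_local_dirs(map_locations) -> Dict[str, Tuple[str, ...]]:
    # With dirs embedded in each MapStatus's location, no RPC round trip
    # to the shuffle service is needed to discover them.
    return {loc.executor_id: loc.local_dirs for loc in map_locations}

locations = [
    BlockManagerId("1", "host-a", 7337, ("/tmp/blockmgr-1",)),
    BlockManagerId("2", "host-a", 7337, ("/tmp/blockmgr-2",)),
]
print(host_local_dirs(locations)["1"])   # ('/tmp/blockmgr-1',)
```

The trade-off, which the issue does not discuss, would be the extra bytes carried by every serialized MapStatus.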
[jira] [Resolved] (SPARK-36632) DivideYMInterval and DivideDTInterval should throw the same exception when divide by zero.
[ https://issues.apache.org/jira/browse/SPARK-36632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36632. - Fix Version/s: 3.2.1 Resolution: Fixed Issue resolved by pull request 33889 [https://github.com/apache/spark/pull/33889] > DivideYMInterval and DivideDTInterval should throw the same exception when > divide by zero. > -- > > Key: SPARK-36632 > URL: https://issues.apache.org/jira/browse/SPARK-36632 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.2.1 > > > DivideYMInterval does not consider ANSI mode; we should support it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36632) DivideYMInterval and DivideDTInterval should throw the same exception when divide by zero.
[ https://issues.apache.org/jira/browse/SPARK-36632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36632: --- Assignee: jiaan.geng > DivideYMInterval and DivideDTInterval should throw the same exception when > divide by zero. > -- > > Key: SPARK-36632 > URL: https://issues.apache.org/jira/browse/SPARK-36632 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > DivideYMInterval does not consider ANSI mode; we should support it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
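One way to make the two interval-divide operators consistent, sketched as a toy (this is illustrative Python, not Spark's Scala implementation, and the error message is invented): route both through a single ANSI-aware divisor check so they raise an identical error.

```python
def check_divisor(num: int, ansi_enabled: bool = True) -> None:
    # Shared check: both interval-divide operators fail the same way.
    if num == 0 and ansi_enabled:
        raise ZeroDivisionError("Division by zero")

def divide_ym_interval(months: int, num: int) -> int:
    check_divisor(num)
    return months // num

def divide_dt_interval(micros: int, num: int) -> int:
    check_divisor(num)
    return micros // num

errors = []
for op in (divide_ym_interval, divide_dt_interval):
    try:
        op(12, 0)
    except ZeroDivisionError as e:
        errors.append(str(e))
print(errors)  # ['Division by zero', 'Division by zero']
```

Because the check lives in one place, the exception type and message cannot drift apart between the year-month and day-time variants.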
[jira] [Assigned] (SPARK-36571) Optimized FileOutputCommitter with StagingDir
[ https://issues.apache.org/jira/browse/SPARK-36571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36571: Assignee: (was: Apache Spark) > Optimized FileOutputCommitter with StagingDir > - > > Key: SPARK-36571 > URL: https://issues.apache.org/jira/browse/SPARK-36571 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36571) Optimized FileOutputCommitter with StagingDir
[ https://issues.apache.org/jira/browse/SPARK-36571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428693#comment-17428693 ] Apache Spark commented on SPARK-36571: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33820 > Optimized FileOutputCommitter with StagingDir > - > > Key: SPARK-36571 > URL: https://issues.apache.org/jira/browse/SPARK-36571 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36571) Optimized FileOutputCommitter with StagingDir
[ https://issues.apache.org/jira/browse/SPARK-36571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36571: Assignee: Apache Spark > Optimized FileOutputCommitter with StagingDir > - > > Key: SPARK-36571 > URL: https://issues.apache.org/jira/browse/SPARK-36571 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data
[ https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428689#comment-17428689 ] Apache Spark commented on SPARK-36464: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34284 > Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream > for Writing Over 2GB Data > -- > > Key: SPARK-36464 > URL: https://issues.apache.org/jira/browse/SPARK-36464 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Major > Fix For: 3.2.0, 3.1.3, 3.0.4 > > > The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; > however, the underlying `_size` variable is initialized as an `Int`. > That causes an overflow and returns a negative size when more than 2 GB of data is > written into `ChunkedByteBufferOutputStream`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data
[ https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428688#comment-17428688 ] Apache Spark commented on SPARK-36464: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34284 > Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream > for Writing Over 2GB Data > -- > > Key: SPARK-36464 > URL: https://issues.apache.org/jira/browse/SPARK-36464 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Major > Fix For: 3.2.0, 3.1.3, 3.0.4 > > > The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; > however, the underlying `_size` variable is initialized as an `Int`. > That causes an overflow and returns a negative size when more than 2 GB of data is > written into `ChunkedByteBufferOutputStream`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data
[ https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428686#comment-17428686 ] Apache Spark commented on SPARK-36464: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34284 > Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream > for Writing Over 2GB Data > -- > > Key: SPARK-36464 > URL: https://issues.apache.org/jira/browse/SPARK-36464 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Major > Fix For: 3.2.0, 3.1.3, 3.0.4 > > > The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; > however, the underlying `_size` variable is initialized as an `Int`. > That causes an overflow and returns a negative size when more than 2 GB of data is > written into `ChunkedByteBufferOutputStream`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
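The overflow behind SPARK-36464 is easy to demonstrate outside the JVM; the helper below emulates a signed 32-bit `Int` (illustration only, not Spark code):

```python
def as_jvm_int(n: int) -> int:
    """Wrap an arbitrary integer to a signed 32-bit value, as a JVM Int does."""
    n &= 0xFFFFFFFF
    return n - 0x1_0000_0000 if n >= 0x8000_0000 else n

written = 3 * 1024 ** 3        # 3 GB pushed through the stream
print(written > 2 ** 31 - 1)   # True: exceeds Int.MaxValue
print(as_jvm_int(written))     # -1073741824: an Int-typed _size goes negative
```

The fix is simply to declare the accumulator as a 64-bit `Long`, matching the `Long` return type of `size`.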
[jira] [Commented] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17
[ https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428684#comment-17428684 ] Apache Spark commented on SPARK-36900: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34284 > "SPARK-36464: size returns correct positive number even with over 2GB data" > will oom with JDK17 > > > Key: SPARK-36900 > URL: https://issues.apache.org/jira/browse/SPARK-36900 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > Execute > > {code:java} > build/mvn clean install -pl core -am -Dtest=none > -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite > {code} > with JDK 17, > {code:java} > ChunkedByteBufferOutputStreamSuite: > - empty output > - write a single byte > - write a single near boundary > - write a single at boundary > - single chunk output > - single chunk output at boundary size > - multiple chunk output > - multiple chunk output at boundary size > *** RUN ABORTED *** > java.lang.OutOfMemoryError: Java heap space > at java.base/java.lang.Integer.valueOf(Integer.java:1081) > at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75) > at java.base/java.io.OutputStream.write(OutputStream.java:127) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown > Source) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at 
org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17
[ https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428685#comment-17428685 ] Apache Spark commented on SPARK-36900: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34284 > "SPARK-36464: size returns correct positive number even with over 2GB data" > will oom with JDK17 > > > Key: SPARK-36900 > URL: https://issues.apache.org/jira/browse/SPARK-36900 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > Execute > > {code:java} > build/mvn clean install -pl core -am -Dtest=none > -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite > {code} > with JDK 17, > {code:java} > ChunkedByteBufferOutputStreamSuite: > - empty output > - write a single byte > - write a single near boundary > - write a single at boundary > - single chunk output > - single chunk output at boundary size > - multiple chunk output > - multiple chunk output at boundary size > *** RUN ABORTED *** > java.lang.OutOfMemoryError: Java heap space > at java.base/java.lang.Integer.valueOf(Integer.java:1081) > at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75) > at java.base/java.io.OutputStream.write(OutputStream.java:127) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown > Source) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at 
org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36525) DS V2 Index Support
[ https://issues.apache.org/jira/browse/SPARK-36525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428676#comment-17428676 ] dch nguyen commented on SPARK-36525: [~huaxingao], Should we implement these functions for supportsIndex in JDBC for the other dialects like Oracle, Postgres, etc.? > DS V2 Index Support > --- > > Key: SPARK-36525 > URL: https://issues.apache.org/jira/browse/SPARK-36525 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Major > > Many data sources support indexes to improve query performance. In order to > take advantage of the index support in data sources, the following APIs will > be added for working with indexes: > {code:java} > /** >* Creates an index. >* >* @param indexName the name of the index to be created >* @param indexType the IndexType of the index to be created >* @param table the table on which index to be created >* @param columns the columns on which index to be created >* @param properties the properties of the index to be created >* @throws IndexAlreadyExistsException If the index already exists > (optional) >* @throws UnsupportedOperationException If create index is not a supported > operation >*/ > void createIndex(String indexName, > String indexType, > Identifier table, > FieldReference[] columns, > Map properties) > throws IndexAlreadyExistsException, UnsupportedOperationException; > /** >* Soft deletes the index with the given name. >* Deleted index can be restored by calling restoreIndex. >* >* @param indexName the name of the index to be deleted >* @return true if the index is deleted >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException If delete index is not a supported > operation >*/ > default boolean deleteIndex(String indexName) > throws NoSuchIndexException, UnsupportedOperationException > /** >* Checks whether an index exists. 
>* >* @param indexName the name of the index >* @return true if the index exists, false otherwise >*/ > boolean indexExists(String indexName); > /** >* Lists all the indexes in a table. >* >* @param table the table to be checked on for indexes >* @throws NoSuchTableException >*/ > Index[] listIndexes(Identifier table) throws NoSuchTableException; > /** >* Hard deletes the index with the given name. >* The Index can't be restored once dropped. >* >* @param indexName the name of the index to be dropped. >* @return true if the index is dropped >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException If drop index is not a supported > operation >*/ > boolean dropIndex(String indexName) throws NoSuchIndexException, > UnsupportedOperationException; > /** >* Restores the index with the given name. >* Deleted index can be restored by calling restoreIndex, but dropped index > can't be restored. >* >* @param indexName the name of the index to be restored >* @return true if the index is restored >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException >*/ > default boolean restoreIndex(String indexName) > throws NoSuchIndexException, UnsupportedOperationException > /** >* Refreshes index using the latest data. This causes the index to be > rebuilt. >* >* @param indexName the name of the index to be rebuilt >* @return true if the index is rebuilt >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException >*/ > default boolean refreshIndex(String indexName) > throws NoSuchIndexException, UnsupportedOperationException > /** >* Alter Index using the new property. This causes the index to be rebuilt. 
>* >* @param indexName the name of the index to be altered >* @return true if the index is altered >* @throws NoSuchIndexException If the index does not exist (optional) >* @throws UnsupportedOperationException >*/ > default boolean alterIndex(String indexName, Properties properties) > throws NoSuchIndexException, UnsupportedOperationException > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
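On the question of supporting createIndex for other JDBC dialects: each dialect would need to emit its own SQL, since index DDL differs between databases. A minimal illustrative sketch (not Spark's actual JdbcDialect API; the function name and dialect strings are hypothetical) of dialect-specific CREATE INDEX generation:

```python
def create_index_sql(dialect: str, index_name: str, table: str, columns: list) -> str:
    """Build a dialect-specific CREATE INDEX statement (illustrative only)."""
    cols = ", ".join(columns)
    if dialect == "postgresql":
        # Postgres supports IF NOT EXISTS on CREATE INDEX (9.5+).
        return f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} ({cols})"
    elif dialect == "mysql":
        # MySQL has no IF NOT EXISTS for CREATE INDEX; existence must be
        # checked separately (e.g. via information_schema.statistics).
        return f"CREATE INDEX {index_name} ON {table} ({cols})"
    else:
        raise NotImplementedError(f"index support not implemented for {dialect}")
```

A real dialect implementation would also need to map indexExists, dropIndex, and listIndexes onto the target database's catalog tables.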
[jira] [Assigned] (SPARK-36946) Support time for ps.to_datetime
[ https://issues.apache.org/jira/browse/SPARK-36946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36946:

Assignee: dgd_contributor

> Support time for ps.to_datetime
> ---
>
> Key: SPARK-36946
> URL: https://issues.apache.org/jira/browse/SPARK-36946
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dgd_contributor
> Assignee: dgd_contributor
> Priority: Major
[jira] [Resolved] (SPARK-36946) Support time for ps.to_datetime
[ https://issues.apache.org/jira/browse/SPARK-36946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36946.

Fix Version/s: 3.3.0
Resolution: Fixed

Issue resolved by pull request 34211
[https://github.com/apache/spark/pull/34211]

> Support time for ps.to_datetime
> ---
>
> Key: SPARK-36946
> URL: https://issues.apache.org/jira/browse/SPARK-36946
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dgd_contributor
> Assignee: dgd_contributor
> Priority: Major
> Fix For: 3.3.0
[jira] [Commented] (SPARK-37004) Job cancellation causes py4j errors on Jupyter due to pinned thread mode
[ https://issues.apache.org/jira/browse/SPARK-37004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428663#comment-17428663 ] Hyukjin Kwon commented on SPARK-37004:

It turned out to be a Py4J issue. I made a fix (https://github.com/bartdag/py4j/pull/440).

> Job cancellation causes py4j errors on Jupyter due to pinned thread mode
> ---
>
> Key: SPARK-37004
> URL: https://issues.apache.org/jira/browse/SPARK-37004
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Xiangrui Meng
> Priority: Blocker
> Attachments: pinned.ipynb
>
> Spark 3.2.0 turned on py4j pinned thread mode by default (SPARK-35303).
> However, in a Jupyter notebook, after I cancel (interrupt) a long-running
> Spark job, the next Spark command fails with py4j errors. See the attached
> notebook for a repro.
> I cannot reproduce the issue after turning off pinned thread mode.
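For anyone hitting this before the Py4J fix lands: pinned thread mode is controlled by the PYSPARK_PIN_THREAD environment variable in Spark 3.2, which must be set before the JVM is launched. A minimal sketch of the workaround:

```python
import os

# Disable py4j pinned thread mode (Spark 3.2 default is "true").
# This must be set before SparkSession/SparkContext is created,
# since the setting is read when the JVM gateway is launched.
os.environ["PYSPARK_PIN_THREAD"] = "false"

# The session is then created as usual, e.g.:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
```

Note that turning this off restores the pre-3.2 threading behavior, where job groups set from Python threads are not isolated per thread.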
[jira] [Updated] (SPARK-37005) pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path
[ https://issues.apache.org/jira/browse/SPARK-37005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WeiNan Zhao updated SPARK-37005:

Labels: python spark-core (was: )

> pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path
> ---
>
> Key: SPARK-37005
> URL: https://issues.apache.org/jira/browse/SPARK-37005
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.4.4
> Environment: python2.7
> spark2.4
> Reporter: WeiNan Zhao
> Priority: Major
> Labels: python, spark-core
>
> Hi all,
> I submit a Spark job with spark-submit and pass the option --files file1.
> Then in the Python code I use:
> {code:python}
> # code placeholder
> path = str(os.environ["SPARK_YARN_STAGING_DIR"])
> {code}
> but path is None. The same lookup succeeds in Java code:
> {code:java}
> // code placeholder
> spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md")
> {code}
> which causes this problem.
[jira] [Created] (SPARK-37005) pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path
WeiNan Zhao created SPARK-37005:

Summary: pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path
Key: SPARK-37005
URL: https://issues.apache.org/jira/browse/SPARK-37005
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.4.4
Environment: python2.7
spark2.4
Reporter: WeiNan Zhao

Hi all,
I submit a Spark job with spark-submit and pass the option --files file1. Then in the Python code I use:
{code:python}
# code placeholder
path = str(os.environ["SPARK_YARN_STAGING_DIR"])
{code}
but path is None. The same lookup succeeds in Java code:
{code:java}
// code placeholder
spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md")
{code}
which causes this problem.
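The likely cause is that SPARK_YARN_STAGING_DIR is set in the JVM driver's environment but not exported to the Python worker process, so the Java lookup succeeds while the Python one does not. A hedged sketch of a fallback that asks the JVM for the variable when it is missing from Python's environment (the py4j wiring in the comment is an assumption, untested here):

```python
import os

def staging_dir(jvm_getenv, key="SPARK_YARN_STAGING_DIR"):
    """Return an env var from the Python environment, falling back to a
    JVM-side lookup (e.g. java.lang.System.getenv via py4j)."""
    value = os.environ.get(key)
    if value is None:
        value = jvm_getenv(key)
    return value

# With a live session this might be wired roughly as:
# staging_dir(spark.sparkContext._jvm.java.lang.System.getenv)
```

Passing the lookup in as a function keeps the helper testable without a running SparkSession.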
[jira] [Commented] (SPARK-36871) Migrate CreateViewStatement to v2 command
[ https://issues.apache.org/jira/browse/SPARK-36871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428651#comment-17428651 ] Apache Spark commented on SPARK-36871:

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34283

> Migrate CreateViewStatement to v2 command
> ---
>
> Key: SPARK-36871
> URL: https://issues.apache.org/jira/browse/SPARK-36871
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Huaxin Gao
> Assignee: Huaxin Gao
> Priority: Major
> Fix For: 3.3.0
[jira] [Updated] (SPARK-36993) Fix json_tuple throw NPE if fields exist no foldable null value
[ https://issues.apache.org/jira/browse/SPARK-36993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-36993:

Fix Version/s: 3.0.4

> Fix json_tuple throw NPE if fields exist no foldable null value
> ---
>
> Key: SPARK-36993
> URL: https://issues.apache.org/jira/browse/SPARK-36993
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
> Reporter: XiDuo You
> Assignee: XiDuo You
> Priority: Major
> Fix For: 3.1.3, 3.0.4, 3.2.1, 3.3.0
>
> If json_tuple has a non-foldable field expression that evaluates to null,
> Spark throws an NPE while calling field.toString during evaluation.
> For example, this query fails:
> {code:java}
> SELECT json_tuple('{"a":"1"}', if(c1 < 1, null, 'a')) FROM ( SELECT rand() AS c1 );
> {code}
> {code:java}
> Caused by: java.lang.NullPointerException
>   at org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$parseRow$2(jsonExpressions.scala:435)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at org.apache.spark.sql.catalyst.expressions.JsonTuple.parseRow(jsonExpressions.scala:435)
>   at org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$eval$6(jsonExpressions.scala:413)
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
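The failure mode, and the null-safe behavior the fix introduces, can be illustrated with a small pure-Python analog of json_tuple (illustrative only, not Spark's implementation): the pre-fix code effectively called toString on each field unconditionally, so a null field name crashed instead of yielding null.

```python
import json

def json_tuple(json_str, *fields):
    """Illustrative analog of json_tuple: extract top-level fields from a
    JSON string, treating a null (None) field name as yielding null
    rather than dereferencing it."""
    obj = json.loads(json_str)
    # None field name -> None result; this is the null check the fix adds.
    return tuple(None if f is None else obj.get(f) for f in fields)
```

Usage mirrors the failing query: a field expression that evaluates to null simply produces a null column value.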