[jira] [Updated] (SPARK-37016) Publicise UpperCaseCharStream

2021-10-14 Thread dohongdayi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dohongdayi updated SPARK-37016:
---
Fix Version/s: 3.3.0
   3.2.1
   3.0.4
   3.1.3
   2.4.9

> Publicise UpperCaseCharStream
> -
>
> Key: SPARK-37016
> URL: https://issues.apache.org/jira/browse/SPARK-37016
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.1, 3.1.2, 3.2.0
>Reporter: dohongdayi
>Priority: Major
> Fix For: 2.4.9, 3.1.3, 3.0.4, 3.2.1, 3.3.0
>
>
> Many Spark extension projects are copying `UpperCaseCharStream` because it is private to the `parser` package, such as:
> [Hudi|https://github.com/apache/hudi/blob/3f8ca1a3552bb866163d3b1648f68d9c4824e21d/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieCommonSqlParser.scala#L112]
> [Iceberg|https://github.com/apache/iceberg/blob/c3ac4c6ca74a0013b4705d5bd5d17fade8e6f499/spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala#L175]
> [Delta 
> Lake|https://github.com/delta-io/delta/blob/625de3b305f109441ad04b20dba91dd6c4e1d78e/core/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala#L290]
> [Submarine|https://github.com/apache/submarine/blob/2faebb8efd69833853f62d89b4f1fea1b1718148/submarine-security/spark-security/src/main/scala/org/apache/submarine/spark/security/parser/UpperCaseCharStream.scala#L31]
> [Kyuubi|https://github.com/apache/incubator-kyuubi/blob/8a5134e3223844714fc58833a6859d4df5b68d57/dev/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderSparkSqlExtensionsParserBase.scala#L108]
> [Spark-ACID|https://github.com/qubole/spark-acid/blob/19bd6db757677c40f448e85c74d9995ba97d5942/src/main/scala/com/qubole/spark/datasources/hiveacid/sql/catalyst/parser/ParseDriver.scala#L13]
> We can publicise `UpperCaseCharStream` to eliminate code duplication.






[jira] [Resolved] (SPARK-36980) Insert support query with CTE

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36980.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34252
[https://github.com/apache/spark/pull/34252]

> Insert support query with CTE
> -
>
> Key: SPARK-36980
> URL: https://issues.apache.org/jira/browse/SPARK-36980
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> INSERT INTO t_delta (WITH v1(c1) as (values (1)) select 1, 2,3 from v1);  OK
> INSERT INTO t_delta WITH v1(c1) as (values (1)) select 1, 2,3 from v1; FAIL






[jira] [Assigned] (SPARK-36980) Insert support query with CTE

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36980:
---

Assignee: angerszhu

> Insert support query with CTE
> -
>
> Key: SPARK-36980
> URL: https://issues.apache.org/jira/browse/SPARK-36980
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> INSERT INTO t_delta (WITH v1(c1) as (values (1)) select 1, 2,3 from v1);  OK
> INSERT INTO t_delta WITH v1(c1) as (values (1)) select 1, 2,3 from v1; FAIL






[jira] [Assigned] (SPARK-37016) Publicise UpperCaseCharStream

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37016:


Assignee: Apache Spark

> Publicise UpperCaseCharStream
> -
>
> Key: SPARK-37016
> URL: https://issues.apache.org/jira/browse/SPARK-37016
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.1, 3.1.2, 3.2.0
>Reporter: dohongdayi
>Assignee: Apache Spark
>Priority: Major
>
> Many Spark extension projects are copying `UpperCaseCharStream` because it is private to the `parser` package, such as:
> [Hudi|https://github.com/apache/hudi/blob/3f8ca1a3552bb866163d3b1648f68d9c4824e21d/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieCommonSqlParser.scala#L112]
> [Iceberg|https://github.com/apache/iceberg/blob/c3ac4c6ca74a0013b4705d5bd5d17fade8e6f499/spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala#L175]
> [Delta 
> Lake|https://github.com/delta-io/delta/blob/625de3b305f109441ad04b20dba91dd6c4e1d78e/core/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala#L290]
> [Submarine|https://github.com/apache/submarine/blob/2faebb8efd69833853f62d89b4f1fea1b1718148/submarine-security/spark-security/src/main/scala/org/apache/submarine/spark/security/parser/UpperCaseCharStream.scala#L31]
> [Kyuubi|https://github.com/apache/incubator-kyuubi/blob/8a5134e3223844714fc58833a6859d4df5b68d57/dev/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderSparkSqlExtensionsParserBase.scala#L108]
> [Spark-ACID|https://github.com/qubole/spark-acid/blob/19bd6db757677c40f448e85c74d9995ba97d5942/src/main/scala/com/qubole/spark/datasources/hiveacid/sql/catalyst/parser/ParseDriver.scala#L13]
> We can publicise `UpperCaseCharStream` to eliminate code duplication.






[jira] [Assigned] (SPARK-37016) Publicise UpperCaseCharStream

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37016:


Assignee: (was: Apache Spark)

> Publicise UpperCaseCharStream
> -
>
> Key: SPARK-37016
> URL: https://issues.apache.org/jira/browse/SPARK-37016
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.1, 3.1.2, 3.2.0
>Reporter: dohongdayi
>Priority: Major
>
> Many Spark extension projects are copying `UpperCaseCharStream` because it is private to the `parser` package, such as:
> [Hudi|https://github.com/apache/hudi/blob/3f8ca1a3552bb866163d3b1648f68d9c4824e21d/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieCommonSqlParser.scala#L112]
> [Iceberg|https://github.com/apache/iceberg/blob/c3ac4c6ca74a0013b4705d5bd5d17fade8e6f499/spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala#L175]
> [Delta 
> Lake|https://github.com/delta-io/delta/blob/625de3b305f109441ad04b20dba91dd6c4e1d78e/core/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala#L290]
> [Submarine|https://github.com/apache/submarine/blob/2faebb8efd69833853f62d89b4f1fea1b1718148/submarine-security/spark-security/src/main/scala/org/apache/submarine/spark/security/parser/UpperCaseCharStream.scala#L31]
> [Kyuubi|https://github.com/apache/incubator-kyuubi/blob/8a5134e3223844714fc58833a6859d4df5b68d57/dev/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderSparkSqlExtensionsParserBase.scala#L108]
> [Spark-ACID|https://github.com/qubole/spark-acid/blob/19bd6db757677c40f448e85c74d9995ba97d5942/src/main/scala/com/qubole/spark/datasources/hiveacid/sql/catalyst/parser/ParseDriver.scala#L13]
> We can publicise `UpperCaseCharStream` to eliminate code duplication.






[jira] [Commented] (SPARK-37016) Publicise UpperCaseCharStream

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429117#comment-17429117
 ] 

Apache Spark commented on SPARK-37016:
--

User 'dohongdayi' has created a pull request for this issue:
https://github.com/apache/spark/pull/34290

> Publicise UpperCaseCharStream
> -
>
> Key: SPARK-37016
> URL: https://issues.apache.org/jira/browse/SPARK-37016
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.1, 3.1.2, 3.2.0
>Reporter: dohongdayi
>Priority: Major
>
> Many Spark extension projects are copying `UpperCaseCharStream` because it is private to the `parser` package, such as:
> [Hudi|https://github.com/apache/hudi/blob/3f8ca1a3552bb866163d3b1648f68d9c4824e21d/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieCommonSqlParser.scala#L112]
> [Iceberg|https://github.com/apache/iceberg/blob/c3ac4c6ca74a0013b4705d5bd5d17fade8e6f499/spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala#L175]
> [Delta 
> Lake|https://github.com/delta-io/delta/blob/625de3b305f109441ad04b20dba91dd6c4e1d78e/core/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala#L290]
> [Submarine|https://github.com/apache/submarine/blob/2faebb8efd69833853f62d89b4f1fea1b1718148/submarine-security/spark-security/src/main/scala/org/apache/submarine/spark/security/parser/UpperCaseCharStream.scala#L31]
> [Kyuubi|https://github.com/apache/incubator-kyuubi/blob/8a5134e3223844714fc58833a6859d4df5b68d57/dev/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderSparkSqlExtensionsParserBase.scala#L108]
> [Spark-ACID|https://github.com/qubole/spark-acid/blob/19bd6db757677c40f448e85c74d9995ba97d5942/src/main/scala/com/qubole/spark/datasources/hiveacid/sql/catalyst/parser/ParseDriver.scala#L13]
> We can publicise `UpperCaseCharStream` to eliminate code duplication.






[jira] [Commented] (SPARK-37016) Publicise UpperCaseCharStream

2021-10-14 Thread dohongdayi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429114#comment-17429114
 ] 

dohongdayi commented on SPARK-37016:


I have submitted a PR

[https://github.com/apache/spark/pull/34290|https://github.com/apache/spark/pull/34290]

> Publicise UpperCaseCharStream
> -
>
> Key: SPARK-37016
> URL: https://issues.apache.org/jira/browse/SPARK-37016
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.1, 3.1.2, 3.2.0
>Reporter: dohongdayi
>Priority: Major
>
> Many Spark extension projects are copying `UpperCaseCharStream` because it is private to the `parser` package, such as:
> [Hudi|https://github.com/apache/hudi/blob/3f8ca1a3552bb866163d3b1648f68d9c4824e21d/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieCommonSqlParser.scala#L112]
> [Iceberg|https://github.com/apache/iceberg/blob/c3ac4c6ca74a0013b4705d5bd5d17fade8e6f499/spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala#L175]
> [Delta 
> Lake|https://github.com/delta-io/delta/blob/625de3b305f109441ad04b20dba91dd6c4e1d78e/core/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala#L290]
> [Submarine|https://github.com/apache/submarine/blob/2faebb8efd69833853f62d89b4f1fea1b1718148/submarine-security/spark-security/src/main/scala/org/apache/submarine/spark/security/parser/UpperCaseCharStream.scala#L31]
> [Kyuubi|https://github.com/apache/incubator-kyuubi/blob/8a5134e3223844714fc58833a6859d4df5b68d57/dev/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderSparkSqlExtensionsParserBase.scala#L108]
> [Spark-ACID|https://github.com/qubole/spark-acid/blob/19bd6db757677c40f448e85c74d9995ba97d5942/src/main/scala/com/qubole/spark/datasources/hiveacid/sql/catalyst/parser/ParseDriver.scala#L13]
> We can publicise `UpperCaseCharStream` to eliminate code duplication.






[jira] [Created] (SPARK-37016) Publicise UpperCaseCharStream

2021-10-14 Thread dohongdayi (Jira)
dohongdayi created SPARK-37016:
--

 Summary: Publicise UpperCaseCharStream
 Key: SPARK-37016
 URL: https://issues.apache.org/jira/browse/SPARK-37016
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.0.3, 2.4.8, 2.3.4, 2.2.3
Reporter: dohongdayi


Many Spark extension projects are copying `UpperCaseCharStream` because it is private to the `parser` package, such as:

[Hudi|https://github.com/apache/hudi/blob/3f8ca1a3552bb866163d3b1648f68d9c4824e21d/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieCommonSqlParser.scala#L112]

[Iceberg|https://github.com/apache/iceberg/blob/c3ac4c6ca74a0013b4705d5bd5d17fade8e6f499/spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala#L175]

[Delta 
Lake|https://github.com/delta-io/delta/blob/625de3b305f109441ad04b20dba91dd6c4e1d78e/core/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala#L290]

[Submarine|https://github.com/apache/submarine/blob/2faebb8efd69833853f62d89b4f1fea1b1718148/submarine-security/spark-security/src/main/scala/org/apache/submarine/spark/security/parser/UpperCaseCharStream.scala#L31]

[Kyuubi|https://github.com/apache/incubator-kyuubi/blob/8a5134e3223844714fc58833a6859d4df5b68d57/dev/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderSparkSqlExtensionsParserBase.scala#L108]

[Spark-ACID|https://github.com/qubole/spark-acid/blob/19bd6db757677c40f448e85c74d9995ba97d5942/src/main/scala/com/qubole/spark/datasources/hiveacid/sql/catalyst/parser/ParseDriver.scala#L13]

We can publicise `UpperCaseCharStream` to eliminate code duplication.
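
For reference, the class being duplicated is a thin ANTLR `CharStream` wrapper that upper-cases look-ahead characters so the lexer matches keywords case-insensitively while `getText` still returns the original input. A minimal sketch of the kind of copy the projects above carry (ANTLR 4 runtime API; a sketch, not necessarily byte-for-byte identical to Spark's internal class):
{code:scala}
import org.antlr.v4.runtime.{CharStream, CodePointCharStream, IntStream}
import org.antlr.v4.runtime.misc.Interval

// Delegates everything to the wrapped stream, but reports upper-cased
// characters to the lexer so SQL keywords match case-insensitively while
// getText() still returns the original, unmodified input.
class UpperCaseCharStream(wrapped: CodePointCharStream) extends CharStream {
  override def consume(): Unit = wrapped.consume()
  override def getSourceName(): String = wrapped.getSourceName
  override def index(): Int = wrapped.index()
  override def mark(): Int = wrapped.mark()
  override def release(marker: Int): Unit = wrapped.release(marker)
  override def seek(where: Int): Unit = wrapped.seek(where)
  override def size(): Int = wrapped.size()
  override def getText(interval: Interval): String = wrapped.getText(interval)

  // Only the look-ahead is upper-cased; EOF and NUL are passed through as-is.
  override def LA(i: Int): Int = {
    val la = wrapped.LA(i)
    if (la == 0 || la == IntStream.EOF) la else Character.toUpperCase(la)
  }
}
{code}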






[jira] [Commented] (SPARK-37014) Inline type hints for python/pyspark/streaming/context.py

2021-10-14 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429112#comment-17429112
 ] 

dch nguyen commented on SPARK-37014:


working on this

> Inline type hints for python/pyspark/streaming/context.py
> -
>
> Key: SPARK-37014
> URL: https://issues.apache.org/jira/browse/SPARK-37014
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Commented] (SPARK-37015) Inline type hints for python/pyspark/streaming/dstream.py

2021-10-14 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429111#comment-17429111
 ] 

dch nguyen commented on SPARK-37015:


working on this

> Inline type hints for python/pyspark/streaming/dstream.py
> -
>
> Key: SPARK-37015
> URL: https://issues.apache.org/jira/browse/SPARK-37015
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Created] (SPARK-37015) Inline type hints for python/pyspark/streaming/dstream.py

2021-10-14 Thread dch nguyen (Jira)
dch nguyen created SPARK-37015:
--

 Summary: Inline type hints for python/pyspark/streaming/dstream.py
 Key: SPARK-37015
 URL: https://issues.apache.org/jira/browse/SPARK-37015
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dch nguyen









[jira] [Created] (SPARK-37014) Inline type hints for python/pyspark/streaming/context.py

2021-10-14 Thread dch nguyen (Jira)
dch nguyen created SPARK-37014:
--

 Summary: Inline type hints for python/pyspark/streaming/context.py
 Key: SPARK-37014
 URL: https://issues.apache.org/jira/browse/SPARK-37014
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dch nguyen









[jira] [Created] (SPARK-37013) `select format_string('%0$s', 'Hello')` has different behavior when using java 8 and Java 17

2021-10-14 Thread Yang Jie (Jira)
Yang Jie created SPARK-37013:


 Summary: `select format_string('%0$s', 'Hello')` has different 
behavior when using java 8 and Java 17
 Key: SPARK-37013
 URL: https://issues.apache.org/jira/browse/SPARK-37013
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Yang Jie


{code:java}
-- PostgreSQL throws ERROR: format specifies argument 0, but arguments are numbered from 1
select format_string('%0$s', 'Hello');
{code}
Execute with Java 8
{code:java}
-- !query
select format_string('%0$s', 'Hello')
-- !query schema
struct
-- !query output
Hello
{code}
Execute with Java 11
{code:java}
-- !query
select format_string('%0$s', 'Hello')
-- !query schema
struct<>
-- !query output
java.util.IllegalFormatArgumentIndexException
Illegal format argument index = 0
{code}
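
Spark's `format_string` is backed by `java.util.Formatter`, so the difference can be reproduced with a plain JDK call; a minimal sketch (the per-JDK results in the comments are the outputs quoted above):
{code:scala}
// "%0$s" uses the explicit argument index 0, but valid indexes start at 1.
// Per the outputs above: JDK 8 returns "Hello", while newer JDKs throw
// java.util.IllegalFormatArgumentIndexException: Illegal format argument index = 0
val formatted = java.lang.String.format("%0$s", "Hello")
println(formatted)

// The same expression through Spark SQL, as in the report:
// spark.sql("select format_string('%0$s', 'Hello')").show()
{code}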
 






[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests

2021-10-14 Thread gaoyajun02 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoyajun02 updated SPARK-36964:
---
Description: 
Similar to SPARK-13704, in some cases adding container requests with locality preferences in YarnAllocator can be expensive, because it may call the topology script for rack awareness.

When submitting a very large job in a very large YARN cluster, the topology script may take significant time to run. This blocks the handling of YarnSchedulerBackend's RequestExecutors RPC calls, which come from the Spark dynamic executor allocation thread; that in turn may block the ExecutorAllocationListener and result in a backlog on the executorManagement queue.

 

Some logs:
{code:java}
21/09/29 12:04:35 INFO spark-dynamic-executor-allocation 
ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 
INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error 
reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures 
timed out after [120 seconds]. This timeout is controlled by 
spark.rpc.askTimeout at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
 at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) 
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839)
 at 
org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411)
 at 
org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361)
 at 
org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316)
 at 
org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745)Caused by: 
java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at 
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at 
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294) at 
org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 
more21/09/29 12:04:35 WARN spark-dynamic-executor-allocation 
ExecutorAllocationManager: Unable to reach the cluster manager to request 1922 
total executors!

21/09/29 12:04:35 INFO spark-dynamic-executor-allocation 
ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 
INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error 
reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures 
timed out after [120 seconds]. This timeout is controlled by 
spark.rpc.askTimeout at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
 at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) 
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839)
 at 
org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411)
 at 
org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361)
 at 
org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316)
 at 
org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThrea

[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests

2021-10-14 Thread gaoyajun02 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoyajun02 updated SPARK-36964:
---
Description: 
Similar to SPARK-13704, in some cases adding or removing container requests in YarnAllocator can be expensive, because it may call the topology script for rack awareness.

When submitting a very large job in a very large YARN cluster, the topology script may take significant time to run. This blocks the handling of YarnSchedulerBackend's RequestExecutors RPC calls, which come from the Spark dynamic executor allocation thread and may block the ExecutorAllocationListener:
{code}
12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: 
Error reaching cluster manager.21/09/29 12:04:35 INFO 
spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching 
cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures timed out 
after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
 at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) 
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839)
 at 
org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411)
 at 
org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361)
 at 
org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316)
 at 
org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745)Caused by: 
java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at 
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at 
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294) at 
org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 
more21/09/29 12:04:35 WARN spark-dynamic-executor-allocation 
ExecutorAllocationManager: Unable to reach the cluster manager to request 1922 
total executors!{code}
which then results in a backlog on the executorManagement queue, e.g.:
{code}
21/09/29 12:02:49 ERROR dag-scheduler-event-loop AsyncEventQueue: Dropping 
event from queue executorManagement. This likely means one of the listeners is 
too slow and cannot keep up with the rate at which tasks are being started by 
the scheduler.
21/09/29 12:02:49 WARN dag-scheduler-event-loop AsyncEventQueue: Dropped 1 
events from executorManagement since the application started.
21/09/29 12:02:55 INFO spark-listener-group-eventLog AsyncEventQueue: Process 
of event 
SparkListenerExecutorAdded(1632888172920,543,org.apache.spark.scheduler.cluster.ExecutorData@8cfab8f5,None)
 by listener EventLoggingListener took 3.037686034s.
21/09/29 12:03:03 INFO spark-listener-group-eventLog AsyncEventQueue: Process 
of event SparkListenerBlockManagerAdded(1632888181779,BlockManagerId(1359, --, 
57233, None),2704696934,Some(2704696934),Some(0)) by listener 
EventLoggingListener took 1.462598355s.
21/09/29 12:03:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 
74388 events from executorManagement since Wed Sep 29 12:02:49 CST 2021.
21/09/29 12:04:35 INFO spark-listener-group-executorManagement AsyncEventQueue: 
Process of event 
SparkListenerStageSubmitted(org.apache.spark.scheduler.StageInfo@52f810ad,{...})
 by listener ExecutorAllocationListener took 116.526408932s.
21/09/29 12:04:49 WARN heartbeat-receiver-event-loop-thread AsyncEventQueue: 
Dropped 18892 events from executorManagement since Wed Sep 29 12:03:49 CST 2021.
21/09/29 12:05:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 
19397 events from executorManagement since Wed Sep 29 12:04:49 CST 2021.
{code}

  was:
Similar to SPARK-13704​, In some cases, YarnAllocator add or r

[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests

2021-10-14 Thread gaoyajun02 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoyajun02 updated SPARK-36964:
---
Description: 
Similar to SPARK-13704, in some cases adding or removing container requests in YarnAllocator can be expensive, because it may call the topology script for rack awareness.

When submitting a very large job in a very large YARN cluster, the topology script may take significant time to run. This blocks the handling of YarnSchedulerBackend's RequestExecutors RPC calls, which come from the Spark dynamic executor allocation thread and may block the ExecutorAllocationListener:
{code:text}
21/09/29 12:04:35 INFO spark-dynamic-executor-allocation 
ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 
INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error 
reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures 
timed out after [120 seconds]. This timeout is controlled by 
spark.rpc.askTimeout at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
 at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) 
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839)
 at 
org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411)
 at 
org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361)
 at 
org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316)
 at 
org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745)Caused by: 
java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at 
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at 
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294) at 
org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 
more21/09/29 12:04:35 WARN spark-dynamic-executor-allocation 
ExecutorAllocationManager: Unable to reach the cluster manager to request 1922 
total executors!{code}
which then results in a backlog on the executorManagement queue, e.g.:
{code:text}
21/09/29 12:02:49 ERROR dag-scheduler-event-loop AsyncEventQueue: Dropping 
event from queue executorManagement. This likely means one of the listeners is 
too slow and cannot keep up with the rate at which tasks are being started by 
the scheduler.
21/09/29 12:02:49 WARN dag-scheduler-event-loop AsyncEventQueue: Dropped 1 
events from executorManagement since the application started.
21/09/29 12:02:55 INFO spark-listener-group-eventLog AsyncEventQueue: Process 
of event 
SparkListenerExecutorAdded(1632888172920,543,org.apache.spark.scheduler.cluster.ExecutorData@8cfab8f5,None)
 by listener EventLoggingListener took 3.037686034s.
21/09/29 12:03:03 INFO spark-listener-group-eventLog AsyncEventQueue: Process 
of event SparkListenerBlockManagerAdded(1632888181779,BlockManagerId(1359, --, 
57233, None),2704696934,Some(2704696934),Some(0)) by listener 
EventLoggingListener took 1.462598355s.
21/09/29 12:03:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 
74388 events from executorManagement since Wed Sep 29 12:02:49 CST 2021.
21/09/29 12:04:35 INFO spark-listener-group-executorManagement AsyncEventQueue: 
Process of event 
SparkListenerStageSubmitted(org.apache.spark.scheduler.StageInfo@52f810ad,{...})
 by listener ExecutorAllocationListener took 116.526408932s.
21/09/29 12:04:49 WARN heartbeat-receiver-event-loop-thread AsyncEventQueue: 
Dropped 18892 events from executorManagement since Wed Sep 29 12:03:49 CST 2021.
21/09/29 12:05:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 
19397 events from executorManagement since Wed Sep 29 12:04:49 CST 2021.
{code}
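
For illustration, the caching the title refers to already exists in Hadoop's net package; a minimal sketch of the reuse idea (stock org.apache.hadoop.net API, an assumption-laden illustration rather than Spark's actual patch):
{code:scala}
import java.util.{List => JList}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.net.ScriptBasedMapping

// ScriptBasedMapping extends CachedDNSToSwitchMapping, so keeping one long-lived
// instance memoises host -> rack lookups; the topology script configured via
// net.topology.script.file.name only runs for hosts that were not seen before.
object RackCache {
  private val mapping = new ScriptBasedMapping()
  mapping.setConf(new Configuration())

  def racksFor(hosts: JList[String]): JList[String] = mapping.resolve(hosts)
}

// First call pays the script cost; repeated calls for the same hosts are cached.
// RackCache.racksFor(java.util.Arrays.asList("host-1", "host-2"))
{code}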



  was:
Similar to SPARK-13704​, In some cases, 

[jira] [Assigned] (SPARK-36945) Inline type hints for python/pyspark/sql/udf.py

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36945:


Assignee: Apache Spark

> Inline type hints for python/pyspark/sql/udf.py
> ---
>
> Key: SPARK-36945
> URL: https://issues.apache.org/jira/browse/SPARK-36945
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-36945) Inline type hints for python/pyspark/sql/udf.py

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36945:


Assignee: (was: Apache Spark)

> Inline type hints for python/pyspark/sql/udf.py
> ---
>
> Key: SPARK-36945
> URL: https://issues.apache.org/jira/browse/SPARK-36945
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Commented] (SPARK-36945) Inline type hints for python/pyspark/sql/udf.py

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429100#comment-17429100
 ] 

Apache Spark commented on SPARK-36945:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34289

> Inline type hints for python/pyspark/sql/udf.py
> ---
>
> Key: SPARK-36945
> URL: https://issues.apache.org/jira/browse/SPARK-36945
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Assigned] (SPARK-36337) decimal('Nan') is unsupported in net.razorvine.pickle

2021-10-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36337:


Assignee: Yikun Jiang

> decimal('Nan') is unsupported in net.razorvine.pickle 
> --
>
> Key: SPARK-36337
> URL: https://issues.apache.org/jira/browse/SPARK-36337
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>
> Decimal('NaN') is not supported by net.razorvine.pickle now.
> In Python
> {code:java}
> >>> pickled = cloudpickle.dumps(decimal.Decimal('NaN'))
> b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.'
> >>> pickle.loads(pickled)
> Decimal('NaN')
> {code}
> In Scala
> {code:java}
> scala> import net.razorvine.pickle.\{Pickler, Unpickler, PickleUtils}
> scala> val unpickle = new Unpickler
> scala> 
> unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094."))
> net.razorvine.pickle.PickleException: problem construction object: 
> java.lang.reflect.InvocationTargetException
>  at 
> net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29)
>  at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
>  at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213)
>  at net.razorvine.pickle.Unpickler.load(Unpickler.java:123)
>  at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136)
>  ... 48 elided
> {code}
> I submitted an issue upstream in pickle: 
> [https://github.com/irmen/pickle/issues/7].
> We should bump pickle to the latest version after it is fixed.






[jira] [Resolved] (SPARK-36337) decimal('Nan') is unsupported in net.razorvine.pickle

2021-10-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36337.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34285
[https://github.com/apache/spark/pull/34285]

> decimal('Nan') is unsupported in net.razorvine.pickle 
> --
>
> Key: SPARK-36337
> URL: https://issues.apache.org/jira/browse/SPARK-36337
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.3.0
>
>
> Decimal('NaN') is not supported by net.razorvine.pickle now.
> In Python
> {code:java}
> >>> pickled = cloudpickle.dumps(decimal.Decimal('NaN'))
> b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.'
> >>> pickle.loads(pickled)
> Decimal('NaN')
> {code}
> In Scala
> {code:java}
> scala> import net.razorvine.pickle.\{Pickler, Unpickler, PickleUtils}
> scala> val unpickle = new Unpickler
> scala> 
> unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094."))
> net.razorvine.pickle.PickleException: problem construction object: 
> java.lang.reflect.InvocationTargetException
>  at 
> net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29)
>  at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
>  at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213)
>  at net.razorvine.pickle.Unpickler.load(Unpickler.java:123)
>  at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136)
>  ... 48 elided
> {code}
> I submitted an issue upstream in pickle: 
> [https://github.com/irmen/pickle/issues/7].
> We should bump pickle to the latest version after it is fixed.






[jira] [Assigned] (SPARK-37012) Disable pinned thread mode by default

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37012:


Assignee: Apache Spark

> Disable pinned thread mode by default
> -
>
> Key: SPARK-37012
> URL: https://issues.apache.org/jira/browse/SPARK-37012
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Blocker
>
> Pinned thread mode was enabled by default in Spark 3.2 (SPARK-35303). 
> However, it causes some breaking changes such as SPARK-37004. Maybe we should 
> disable it by default for now.






[jira] [Commented] (SPARK-37012) Disable pinned thread mode by default

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429086#comment-17429086
 ] 

Apache Spark commented on SPARK-37012:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34288

> Disable pinned thread mode by default
> -
>
> Key: SPARK-37012
> URL: https://issues.apache.org/jira/browse/SPARK-37012
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> Pinned thread mode was enabled by default in Spark 3.2 (SPARK-35303). 
> However, it causes some breaking changes such as SPARK-37004. Maybe we should 
> disable it by default for now.






[jira] [Assigned] (SPARK-37012) Disable pinned thread mode by default

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37012:


Assignee: (was: Apache Spark)

> Disable pinned thread mode by default
> -
>
> Key: SPARK-37012
> URL: https://issues.apache.org/jira/browse/SPARK-37012
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> Pinned thread mode was enabled by default in Spark 3.2 (SPARK-35303). 
> However, it causes some breaking changes such as SPARK-37004. Maybe we should 
> disable it by default for now.






[jira] [Resolved] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark

2021-10-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37010.
--
Fix Version/s: 3.3.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/34287

> Remove unnecessary "noqa: F401" comments in pandas-on-Spark
> ---
>
> Key: SPARK-37010
> URL: https://issues.apache.org/jira/browse/SPARK-37010
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.3.0
>
>
> After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} 
> comments.






[jira] [Updated] (SPARK-37005) pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path

2021-10-14 Thread WeiNan Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WeiNan Zhao updated SPARK-37005:

Description: 
When I submit a Spark job with spark-submit and the option --files file1, 

and then in Python code I use 
{code:java}
// code placeholder
path = str(os.environ["SPARK_YARN_STAGING_DIR"])
{code}
the returned path is None, but reading the same environment variable succeeds in Java code:
{code:java}
// code placeholder
spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md")
{code}
which causes this problem.

 

  was:
hi, all

 i commit a spark job , use spark-submit and set option --files file1,

then in code,i use 
{code:java}
//代码占位符
path = str(os.environ["SPARK_YARN_STAGING_DIR"])
{code}
but path is None, this can success in java code
{code:java}
//代码占位符
spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md")
{code}
which cause this problem.

 


> pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path
> 
>
> Key: SPARK-37005
> URL: https://issues.apache.org/jira/browse/SPARK-37005
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
> Environment: python2.7
> spark2.4
>Reporter: WeiNan Zhao
>Priority: Major
>  Labels: python, spark-core
>
> When I submit a Spark job with spark-submit and the option --files file1, 
> and then in Python code I use 
> {code:java}
> // code placeholder
> path = str(os.environ["SPARK_YARN_STAGING_DIR"])
> {code}
> the returned path is None, but reading the same environment variable succeeds in Java code:
> {code:java}
> // code placeholder
> spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md")
> {code}
> which causes this problem.
>  






[jira] [Commented] (SPARK-37006) MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading

2021-10-14 Thread jinhai (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429079#comment-17429079
 ] 

jinhai commented on SPARK-37006:


Alternatively, we could generate localDirs based on appId and execId, just like 
DiskBlockManager.getFile, so that we don't need to save localDirs in MapStatus 
and only need to add appId.

> MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs 
> when shuffle reading
> -
>
> Key: SPARK-37006
> URL: https://issues.apache.org/jira/browse/SPARK-37006
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.1.2
>Reporter: jinhai
>Priority: Major
>
> In shuffle reading, in order to get the hostLocalDirs value when executing 
> fetchHostLocalBlocks, we need ExternalBlockStoreClient or 
> NettyBlockTransferService to make an RPC request.
> And when externalShuffleServiceEnabled is set, there is no need to registerExecutor 
> and so on in the ExternalShuffleBlockResolver class.
> Throughout the Spark shuffle module, a lot of code logic is written to deal 
> with localDirs.
> We can directly add localDirs to the BlockManagerId class of MapStatus to get 
> the data file and index file.
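
A hedged sketch of the shape of that proposal (the class and field names below are illustrative only; the real BlockManagerId and MapStatus classes differ):
{code:scala}
// Illustrative only: carry the executor's local shuffle directories inside the
// location reported by each map task, so a reader of host-local blocks can build
// the data/index file paths without the extra getHostLocalDirs RPC.
case class BlockManagerIdWithDirs(
    executorId: String,
    host: String,
    port: Int,
    localDirs: Array[String]) // the proposed extra field

case class MapStatusSketch(location: BlockManagerIdWithDirs, mapId: Long)
{code}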






[jira] [Commented] (SPARK-36525) DS V2 Index Support

2021-10-14 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429077#comment-17429077
 ] 

dch nguyen commented on SPARK-36525:


[~huaxingao] yes, i'd like to

> DS V2 Index Support
> ---
>
> Key: SPARK-36525
> URL: https://issues.apache.org/jira/browse/SPARK-36525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>
> Many data sources support indexes to improve query performance. In order to 
> take advantage of the index support in data sources, the following APIs will 
> be added for working with indexes:
> {code:java}
>   /**
>* Creates an index.
>*
>* @param indexName the name of the index to be created
>* @param indexType the IndexType of the index to be created
>* @param table the table on which index to be created
>* @param columns the columns on which index to be created
>* @param properties the properties of the index to be created
>* @throws IndexAlreadyExistsException If the index already exists 
> (optional)
>* @throws UnsupportedOperationException If create index is not a supported 
> operation
>*/
>   void createIndex(String indexName,
>   String indexType,
>   Identifier table,
>   FieldReference[] columns,
>   Map properties)
>   throws IndexAlreadyExistsException, UnsupportedOperationException;
>   /**
>* Soft deletes the index with the given name.
>* Deleted index can be restored by calling restoreIndex.
>*
>* @param indexName the name of the index to be deleted
>* @return true if the index is deleted
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException If delete index is not a supported 
> operation
>*/
>   default boolean deleteIndex(String indexName)
>   throws NoSuchIndexException, UnsupportedOperationException
>   /**
>* Checks whether an index exists.
>*
>* @param indexName the name of the index
>* @return true if the index exists, false otherwise
>*/
>   boolean indexExists(String indexName);
>   /**
>* Lists all the indexes in a table.
>*
>* @param table the table to be checked on for indexes
>* @throws NoSuchTableException
>*/
>   Index[] listIndexes(Identifier table) throws NoSuchTableException;
>   /**
>* Hard deletes the index with the given name.
>* The Index can't be restored once dropped.
>*
>* @param indexName the name of the index to be dropped.
>* @return true if the index is dropped
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException If drop index is not a supported 
> operation
>*/
>   boolean dropIndex(String indexName) throws NoSuchIndexException, 
> UnsupportedOperationException;
>   /**
>* Restores the index with the given name.
>* Deleted index can be restored by calling restoreIndex, but dropped index 
> can't be restored.
>*
>* @param indexName the name of the index to be restored
>* @return true if the index is restored
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException
>*/
>   default boolean restoreIndex(String indexName)
>   throws NoSuchIndexException, UnsupportedOperationException
>   /**
>* Refreshes index using the latest data. This causes the index to be 
> rebuilt.
>*
>* @param indexName the name of the index to be rebuilt
>* @return true if the index is rebuilt
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException
>*/
>   default boolean refreshIndex(String indexName)
>   throws NoSuchIndexException, UnsupportedOperationException
>   /**
>* Alter Index using the new property. This causes the index to be rebuilt.
>*
>* @param indexName the name of the index to be altered
>* @return true if the index is altered
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException
>*/
>   default boolean alterIndex(String indexName, Properties properties)
>   throws NoSuchIndexException, UnsupportedOperationException
> {code}






[jira] [Updated] (SPARK-37012) Disable pinned thread mode by default

2021-10-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37012:
-
Priority: Blocker  (was: Major)

> Disable pinned thread mode by default
> -
>
> Key: SPARK-37012
> URL: https://issues.apache.org/jira/browse/SPARK-37012
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> Pinned thread mode was enabled by default in Spark 3.2 (SPARK-35303). 
> However, it causes some breaking changes such as SPARK-37004. Maybe we should 
> disable it by default for now.






[jira] [Created] (SPARK-37012) Disable pinned thread mode by default

2021-10-14 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-37012:


 Summary: Disable pinned thread mode by default
 Key: SPARK-37012
 URL: https://issues.apache.org/jira/browse/SPARK-37012
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon


Pinned thread mode was enabled by default in Spark 3.2 (SPARK-35303). However, 
it causes some breaking changes such as SPARK-37004. Maybe we should disable it 
by default for now.






[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-37011:
--
Description: 
In flake8 < 3.9.0, an F401 error occurs for imports when the imported identifiers 
are only used in a {{bound}} argument of {{TypeVar(..., bound="XXX")}}.

For example:

{code:python}
if TYPE_CHECKING:
from pyspark.pandas.base import IndexOpsMixin

IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin")
{code}


Since this behavior is fixed in flake8 >= 3.9.0, we should upgrade the flake8 
installed in Jenkins to 3.9.0 or above.
We might also update {{MINIMUM_FLAKE8}} in the {{lint-python}} script from 3.8.0 to 3.9.0.

  was:
In flake8 < 3.9.0, F401 error occurs for imports when the imported identities 
are used in a {{bound}} argument in {{TypeVar(..., bound="XXX")}}.

For example:

{code:python}

if TYPE_CHECKING:
from pyspark.pandas.base import IndexOpsMixin

IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin")
{code}


Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
installed in Jenkins to 3.9.0 or above.
And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
3.8.0 to 3.9.0.


> Upgrade flake8 to 3.9.0 or above in Jenkins
> ---
>
> Key: SPARK-37011
> URL: https://issues.apache.org/jira/browse/SPARK-37011
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> In flake8 < 3.9.0, an F401 error occurs for imports when the imported identifiers 
> are only used in a {{bound}} argument of {{TypeVar(..., bound="XXX")}}.
> For example:
> {code:python}
> if TYPE_CHECKING:
> from pyspark.pandas.base import IndexOpsMixin
> IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin")
> {code}
> Since this behavior is fixed in flake8 >= 3.9.0, we should upgrade the flake8 
> installed in Jenkins to 3.9.0 or above.
> We might also update {{MINIMUM_FLAKE8}} in the {{lint-python}} script from 3.8.0 to 3.9.0.






[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-37011:
--
Description: 
In flake8 < 3.9.0, F401 error occurs for imports when the imported identities 
are used in a {{bound}} argument in {{TypeVar(..., bound="XXX")}}.

For example:

{code:python}

if TYPE_CHECKING:
from pyspark.pandas.base import IndexOpsMixin

IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin")
{code}


Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
installed in Jenkins to 3.9.0 or above.
And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
3.8.0 to 3.9.0.

  was:
In flake8 < 3.9.0, F401 error occurs for imports when the impo

Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
several lines in pandas-on-PySpark that uses TYPE_CHECKING.

And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
3.8.0 to 3.9.0.


> Upgrade flake8 to 3.9.0 or above in Jenkins
> ---
>
> Key: SPARK-37011
> URL: https://issues.apache.org/jira/browse/SPARK-37011
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> In flake8 < 3.9.0, F401 error occurs for imports when the imported identities 
> are used in a {{bound}} argument in {{TypeVar(..., bound="XXX")}}.
> For example:
> {code:python}
> if TYPE_CHECKING:
> from pyspark.pandas.base import IndexOpsMixin
> IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin")
> {code}
> Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
> installed in Jenkins to 3.9.0 or above.
> And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
> 3.8.0 to 3.9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-37011:
--
Description: 
In flake8 < 3.9.0, F401 error occurs for imports when the impo

Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
several lines in pandas-on-PySpark that uses TYPE_CHECKING.

And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
3.8.0 to 3.9.0.

  was:
In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when 
TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so 
there is no need to treat it as an error in static analysis.

Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
several lines in pandas-on-PySpark that uses TYPE_CHECKING.

And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
3.8.0 to 3.9.0.


> Upgrade flake8 to 3.9.0 or above in Jenkins
> ---
>
> Key: SPARK-37011
> URL: https://issues.apache.org/jira/browse/SPARK-37011
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> In flake8 < 3.9.0, F401 error occurs for imports when the impo
> Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
> installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
> several lines in pandas-on-PySpark that uses TYPE_CHECKING.
> And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
> 3.8.0 to 3.9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-37011:
--
Fix Version/s: (was: 3.3.0)

> Upgrade flake8 to 3.9.0 or above in Jenkins
> ---
>
> Key: SPARK-37011
> URL: https://issues.apache.org/jira/browse/SPARK-37011
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when 
> TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so 
> there is no need to treat it as an error in static analysis.
> Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
> installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
> several lines in pandas-on-PySpark that uses TYPE_CHECKING.
> And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
> 3.8.0 to 3.9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-37011:
--
Affects Version/s: (was: 3.2.0)
   3.3.0

> Upgrade flake8 to 3.9.0 or above in Jenkins
> ---
>
> Key: SPARK-37011
> URL: https://issues.apache.org/jira/browse/SPARK-37011
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
> Fix For: 3.3.0
>
>
> In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when 
> TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so 
> there is no need to treat it as an error in static analysis.
> Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
> installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
> several lines in pandas-on-PySpark that uses TYPE_CHECKING.
> And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
> 3.8.0 to 3.9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-37011:
--
Description: 
In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when 
TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so 
there is no need to treat it as an error in static analysis.

Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
several lines in pandas-on-PySpark that uses TYPE_CHECKING.

And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
3.8.0 to 3.9.0.

  was:
In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when 
TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so 
there is no need to treat it as an error in static analysis.

Since this behavior is fixed In flake8 >= 3.8.0, we should upgrade the flake8 
installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
several lines in pandas-on-PySpark that uses TYPE_CHECKING.

And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
3.5.0 to 3.8.0.


> Upgrade flake8 to 3.9.0 or above in Jenkins
> ---
>
> Key: SPARK-37011
> URL: https://issues.apache.org/jira/browse/SPARK-37011
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
> Fix For: 3.3.0
>
>
> In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when 
> TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so 
> there is no need to treat it as an error in static analysis.
> Since this behavior is fixed In flake8 >= 3.9.0, we should upgrade the flake8 
> installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
> several lines in pandas-on-PySpark that uses TYPE_CHECKING.
> And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
> 3.8.0 to 3.9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-37011:
--
Reporter: Takuya Ueshin  (was: Haejoon Lee)

> Upgrade flake8 to 3.9.0 or above in Jenkins
> ---
>
> Key: SPARK-37011
> URL: https://issues.apache.org/jira/browse/SPARK-37011
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
> Fix For: 3.3.0
>
>
> In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when 
> TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so 
> there is no need to treat it as an error in static analysis.
> Since this behavior is fixed In flake8 >= 3.8.0, we should upgrade the flake8 
> installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
> several lines in pandas-on-PySpark that uses TYPE_CHECKING.
> And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
> 3.5.0 to 3.8.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin reassigned SPARK-37011:
-

Assignee: (was: Shane Knapp)

> Upgrade flake8 to 3.9.0 or above in Jenkins
> ---
>
> Key: SPARK-37011
> URL: https://issues.apache.org/jira/browse/SPARK-37011
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
> Fix For: 3.3.0
>
>
> In flake8 < 3.8.0, F401 error occurs for imports in *if* statements when 
> TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so 
> there is no need to treat it as an error in static analysis.
> Since this behavior is fixed In flake8 >= 3.8.0, we should upgrade the flake8 
> installed in Jenkins to 3.8.0 or above. Otherwise, it occurs F401 error for 
> several lines in pandas-on-PySpark that uses TYPE_CHECKING.
> And also we might update the {{MINIMUM_FLAKE8}} in the {{lint-python}} from 
> 3.5.0 to 3.8.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37011) Upgrade flake8 to 3.9.0 or above in Jenkins

2021-10-14 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37011:
-

 Summary: Upgrade flake8 to 3.9.0 or above in Jenkins
 Key: SPARK-37011
 URL: https://issues.apache.org/jira/browse/SPARK-37011
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Haejoon Lee
Assignee: Shane Knapp
 Fix For: 3.3.0


In flake8 < 3.8.0, an F401 error occurs for imports in *if* statements when 
TYPE_CHECKING is True. However, TYPE_CHECKING is always False at runtime, so 
there is no need to treat it as an error in static analysis.

Since this behavior is fixed in flake8 >= 3.8.0, we should upgrade the flake8 
installed in Jenkins to 3.8.0 or above. Otherwise, F401 errors occur for 
several lines in pandas-on-Spark that use TYPE_CHECKING.

We might also update {{MINIMUM_FLAKE8}} in {{lint-python}} from 3.5.0 to 3.8.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36942) Inline type hints for python/pyspark/sql/readwriter.py

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-36942.
---
Fix Version/s: 3.3.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Issue resolved by pull request 34216
https://github.com/apache/spark/pull/34216

> Inline type hints for python/pyspark/sql/readwriter.py
> --
>
> Key: SPARK-36942
> URL: https://issues.apache.org/jira/browse/SPARK-36942
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.3.0
>
>
> Inline type hints for python/pyspark/sql/readwriter.py.
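
As an illustration of what "inlining" means here (a hedged sketch, not the actual 
{{readwriter.py}} code): the annotations move from a separate {{.pyi}} stub into 
the {{.py}} source itself, for example:

{code:python}
# Simplified stand-in for a reader-style class; the real signatures in
# pyspark/sql/readwriter.py are richer than this sketch.
from typing import Optional


class DataFrameReaderSketch:
    def __init__(self) -> None:
        self._options: dict = {}

    # Inline hints replace the ones previously kept in a .pyi stub.
    def option(self, key: str, value: Optional[str]) -> "DataFrameReaderSketch":
        self._options[key] = value
        return self


reader = DataFrameReaderSketch().option("header", "true")
print(reader._options)
{code}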



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429041#comment-17429041
 ] 

Apache Spark commented on SPARK-37010:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/34287

> Remove unnecessary "noqa: F401" comments in pandas-on-Spark
> ---
>
> Key: SPARK-37010
> URL: https://issues.apache.org/jira/browse/SPARK-37010
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} 
> comments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429040#comment-17429040
 ] 

Apache Spark commented on SPARK-37010:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/34287

> Remove unnecessary "noqa: F401" comments in pandas-on-Spark
> ---
>
> Key: SPARK-37010
> URL: https://issues.apache.org/jira/browse/SPARK-37010
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} 
> comments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37010:


Assignee: (was: Apache Spark)

> Remove unnecessary "noqa: F401" comments in pandas-on-Spark
> ---
>
> Key: SPARK-37010
> URL: https://issues.apache.org/jira/browse/SPARK-37010
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} 
> comments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37010:


Assignee: Apache Spark

> Remove unnecessary "noqa: F401" comments in pandas-on-Spark
> ---
>
> Key: SPARK-37010
> URL: https://issues.apache.org/jira/browse/SPARK-37010
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} 
> comments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37010) Remove unnecessary "noqa: F401" comments in pandas-on-Spark

2021-10-14 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37010:
-

 Summary: Remove unnecessary "noqa: F401" comments in 
pandas-on-Spark
 Key: SPARK-37010
 URL: https://issues.apache.org/jira/browse/SPARK-37010
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


After upgrading flake8 in Jenkins, there are still unnecessary {{noqa: F401}} 
comments.
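
A small, illustrative example of the kind of suppression comment that becomes 
unnecessary once the upgraded flake8 understands string references in 
{{TypeVar(..., bound=...)}}:

{code:python}
from typing import TYPE_CHECKING, TypeVar

if TYPE_CHECKING:
    # Before the upgrade this import needed a trailing "# noqa: F401";
    # with flake8 >= 3.9.0 the comment can simply be deleted.
    from pyspark.pandas.base import IndexOpsMixin

IndexOpsLike = TypeVar("IndexOpsLike", bound="IndexOpsMixin")
{code}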



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23626) DAGScheduler blocked due to JobSubmitted event

2021-10-14 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-23626.

Fix Version/s: 3.3.0
   3.2.1
   3.0.4
   3.1.3
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/34265

>  DAGScheduler blocked due to JobSubmitted event
> ---
>
> Key: SPARK-23626
> URL: https://issues.apache.org/jira/browse/SPARK-23626
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1, 2.3.3, 2.4.3, 3.0.0
>Reporter: Ajith S
>Assignee: Josh Rosen
>Priority: Major
> Fix For: 3.1.3, 3.0.4, 3.2.1, 3.3.0
>
>
> DAGScheduler becomes a bottleneck in the cluster when multiple JobSubmitted 
> events have to be processed, because DAGSchedulerEventProcessLoop is 
> single-threaded and blocks other events in the queue, such as TaskCompletion.
> Handling a JobSubmitted event is time-consuming depending on the nature of the 
> job (for example, calculating parent stage dependencies, shuffle dependencies, 
> and partitions), and thus it blocks all subsequent events from being processed.
>  
> I see multiple JIRAs referring to this behavior:
> https://issues.apache.org/jira/browse/SPARK-2647
> https://issues.apache.org/jira/browse/SPARK-4961
>  
> Similarly, in my cluster the partition calculation of some jobs is 
> time-consuming (similar to the stack trace in SPARK-2647), which slows down the 
> DAGSchedulerEventProcessLoop and in turn slows down user jobs, even if their 
> tasks finish within seconds, because TaskCompletion events are processed at a 
> slower rate due to the blockage.
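
A minimal sketch in plain Python (not Spark code) of the failure mode described 
above: on a single-threaded event loop, one slow JobSubmitted-style handler 
delays every queued TaskCompletion-style event behind it.

{code:python}
import queue
import time

events = queue.Queue()
events.put(("JobSubmitted", 2.0))        # slow: e.g. partition/dependency computation
for _ in range(3):
    events.put(("TaskCompletion", 0.0))  # cheap, but stuck behind the slow event

start = time.time()
while not events.empty():
    name, cost = events.get()
    time.sleep(cost)                     # every handler runs on the single loop thread
    print(f"{name} handled after {time.time() - start:.1f}s")
{code}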



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23626) DAGScheduler blocked due to JobSubmitted event

2021-10-14 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reassigned SPARK-23626:
--

Assignee: Josh Rosen

>  DAGScheduler blocked due to JobSubmitted event
> ---
>
> Key: SPARK-23626
> URL: https://issues.apache.org/jira/browse/SPARK-23626
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1, 2.3.3, 2.4.3, 3.0.0
>Reporter: Ajith S
>Assignee: Josh Rosen
>Priority: Major
>
> DAGScheduler becomes a bottleneck in the cluster when multiple JobSubmitted 
> events have to be processed, because DAGSchedulerEventProcessLoop is 
> single-threaded and blocks other events in the queue, such as TaskCompletion.
> Handling a JobSubmitted event is time-consuming depending on the nature of the 
> job (for example, calculating parent stage dependencies, shuffle dependencies, 
> and partitions), and thus it blocks all subsequent events from being processed.
>  
> I see multiple JIRAs referring to this behavior:
> https://issues.apache.org/jira/browse/SPARK-2647
> https://issues.apache.org/jira/browse/SPARK-4961
>  
> Similarly, in my cluster the partition calculation of some jobs is 
> time-consuming (similar to the stack trace in SPARK-2647), which slows down the 
> DAGSchedulerEventProcessLoop and in turn slows down user jobs, even if their 
> tasks finish within seconds, because TaskCompletion events are processed at a 
> slower rate due to the blockage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37009) Add checks to DebugFilesystem to ensure that FS operations are not performed in the DAGScheduler event loop

2021-10-14 Thread Josh Rosen (Jira)
Josh Rosen created SPARK-37009:
--

 Summary: Add checks to DebugFilesystem to ensure that FS 
operations are not performed in the DAGScheduler event loop
 Key: SPARK-37009
 URL: https://issues.apache.org/jira/browse/SPARK-37009
 Project: Spark
  Issue Type: Improvement
  Components: Scheduler, Tests
Affects Versions: 3.0.0
Reporter: Josh Rosen


As [~yuchen.huo] suggested at 
[https://github.com/apache/spark/pull/34265#discussion_r728805893], we should 
explore modifying {{DebugFilesystem}} to throw exceptions when filesystem 
operations are performed from inside the DAGScheduler's event processing thread. 
This could help prevent future issues similar to SPARK-23626. 
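
A rough sketch in Python (the real {{DebugFilesystem}} is a Scala test utility, so 
the names below are illustrative only) of the kind of thread-based guard being 
proposed:

{code:python}
import threading

SCHEDULER_THREAD_NAME = "dag-scheduler-event-loop"  # illustrative thread name


def assert_not_on_scheduler_thread(operation: str) -> None:
    # Raise if a (simulated) filesystem operation runs on the scheduler thread.
    if threading.current_thread().name == SCHEDULER_THREAD_NAME:
        raise RuntimeError(f"{operation} must not run on the scheduler event loop")


def open_file(path: str) -> None:
    assert_not_on_scheduler_thread(f"open({path})")
    print(f"opened {path}")


open_file("/tmp/ok.txt")  # fine on the main thread

t = threading.Thread(target=open_file, args=("/tmp/bad.txt",),
                     name=SCHEDULER_THREAD_NAME)
t.start()
t.join()  # the guard raises inside the thread, printing a traceback
{code}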



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37000) Add type hints to python/pyspark/sql/util.py

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-37000.
---
Fix Version/s: 3.3.0
 Assignee: Takuya Ueshin  (was: Apache Spark)
   Resolution: Fixed

Issue resolved by pull request 34278
https://github.com/apache/spark/pull/34278

> Add type hints to python/pyspark/sql/util.py
> 
>
> Key: SPARK-37000
> URL: https://issues.apache.org/jira/browse/SPARK-37000
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.3.0
>
>
> Add type hints for python/pyspark/sql/utils.py.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36938) Inline type hints for group.py in python/pyspark/sql

2021-10-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-36938.
---
Fix Version/s: 3.3.0
 Assignee: dch nguyen
   Resolution: Fixed

Issue resolved by pull request 34197
https://github.com/apache/spark/pull/34197

> Inline type hints for group.py in python/pyspark/sql  
> -
>
> Key: SPARK-36938
> URL: https://issues.apache.org/jira/browse/SPARK-36938
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36905) Reading Hive view without explicit column names fails in Spark

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-36905:

Affects Version/s: (was: 3.2.0)
   3.3.0

> Reading Hive view without explicit column names fails in Spark 
> ---
>
> Key: SPARK-36905
> URL: https://issues.apache.org/jira/browse/SPARK-36905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Shardul Mahadik
>Assignee: Linhong Liu
>Priority: Major
> Fix For: 3.2.1, 3.3.0
>
>
> Consider a Hive view in which some columns are not explicitly named
> {code:sql}
> CREATE VIEW test_view AS
> SELECT 1
> FROM some_table
> {code}
> Reading this view in Spark leads to an {{AnalysisException}}
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`_c0`' given input 
> columns: [1]
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:185)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:340)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:340)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:185)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:182)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91)
>

[jira] [Updated] (SPARK-36905) Reading Hive view without explicit column names fails in Spark

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-36905:

Affects Version/s: (was: 3.3.0)
   3.2.0

> Reading Hive view without explicit column names fails in Spark 
> ---
>
> Key: SPARK-36905
> URL: https://issues.apache.org/jira/browse/SPARK-36905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Shardul Mahadik
>Assignee: Linhong Liu
>Priority: Major
> Fix For: 3.2.1, 3.3.0
>
>
> Consider a Hive view in which some columns are not explicitly named
> {code:sql}
> CREATE VIEW test_view AS
> SELECT 1
> FROM some_table
> {code}
> Reading this view in Spark leads to an {{AnalysisException}}
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`_c0`' given input 
> columns: [1]
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:185)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:340)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:340)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:185)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:182)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91)
>

[jira] [Assigned] (SPARK-36905) Reading Hive view without explicit column names fails in Spark

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36905:
---

Assignee: Linhong Liu

> Reading Hive view without explicit column names fails in Spark 
> ---
>
> Key: SPARK-36905
> URL: https://issues.apache.org/jira/browse/SPARK-36905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Shardul Mahadik
>Assignee: Linhong Liu
>Priority: Major
>
> Consider a Hive view in which some columns are not explicitly named
> {code:sql}
> CREATE VIEW test_view AS
> SELECT 1
> FROM some_table
> {code}
> Reading this view in Spark leads to an {{AnalysisException}}
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`_c0`' given input 
> columns: [1]
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:185)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:340)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:340)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:185)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:182)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Ana

[jira] [Resolved] (SPARK-36905) Reading Hive view without explicit column names fails in Spark

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36905.
-
Fix Version/s: 3.3.0
   3.2.1
   Resolution: Fixed

> Reading Hive view without explicit column names fails in Spark 
> ---
>
> Key: SPARK-36905
> URL: https://issues.apache.org/jira/browse/SPARK-36905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Shardul Mahadik
>Assignee: Linhong Liu
>Priority: Major
> Fix For: 3.2.1, 3.3.0
>
>
> Consider a Hive view in which some columns are not explicitly named
> {code:sql}
> CREATE VIEW test_view AS
> SELECT 1
> FROM some_table
> {code}
> Reading this view in Spark leads to an {{AnalysisException}}
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`_c0`' given input 
> columns: [1]
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:185)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:340)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:340)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:185)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:182)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.sca

[jira] [Resolved] (SPARK-37003) Merge INSERT related docs

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37003.
-
Resolution: Fixed

Issue resolved by pull request 34282
[https://github.com/apache/spark/pull/34282]

> Merge INSERT related docs
> -
>
> Key: SPARK-37003
> URL: https://issues.apache.org/jira/browse/SPARK-37003
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> The current INSERT docs have too much duplicated content; merge the INSERT INTO 
> and INSERT OVERWRITE pages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37003) Merge INSERT related docs

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37003:
---

Assignee: angerszhu

> Merge INSERT related docs
> -
>
> Key: SPARK-37003
> URL: https://issues.apache.org/jira/browse/SPARK-37003
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> The current INSERT docs have too much duplicated content; merge the INSERT INTO 
> and INSERT OVERWRITE pages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36525) DS V2 Index Support

2021-10-14 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428834#comment-17428834
 ] 

Huaxin Gao commented on SPARK-36525:


Yes, it would be great if you could help, [~dchvn].

> DS V2 Index Support
> ---
>
> Key: SPARK-36525
> URL: https://issues.apache.org/jira/browse/SPARK-36525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>
> Many data sources support indexes to improve query performance. In order to 
> take advantage of the index support in data sources, the following APIs will 
> be added for working with indexes:
> {code:java}
>   /**
>* Creates an index.
>*
>* @param indexName the name of the index to be created
>* @param indexType the IndexType of the index to be created
>* @param table the table on which index to be created
>* @param columns the columns on which index to be created
>* @param properties the properties of the index to be created
>* @throws IndexAlreadyExistsException If the index already exists 
> (optional)
>* @throws UnsupportedOperationException If create index is not a supported 
> operation
>*/
>   void createIndex(String indexName,
>   String indexType,
>   Identifier table,
>   FieldReference[] columns,
>   Map properties)
>   throws IndexAlreadyExistsException, UnsupportedOperationException;
>   /**
>* Soft deletes the index with the given name.
>* Deleted index can be restored by calling restoreIndex.
>*
>* @param indexName the name of the index to be deleted
>* @return true if the index is deleted
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException If delete index is not a supported 
> operation
>*/
>   default boolean deleteIndex(String indexName)
>   throws NoSuchIndexException, UnsupportedOperationException
>   /**
>* Checks whether an index exists.
>*
>* @param indexName the name of the index
>* @return true if the index exists, false otherwise
>*/
>   boolean indexExists(String indexName);
>   /**
>* Lists all the indexes in a table.
>*
>* @param table the table to be checked on for indexes
>* @throws NoSuchTableException
>*/
>   Index[] listIndexes(Identifier table) throws NoSuchTableException;
>   /**
>* Hard deletes the index with the given name.
>* The Index can't be restored once dropped.
>*
>* @param indexName the name of the index to be dropped.
>* @return true if the index is dropped
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException If drop index is not a supported 
> operation
>*/
>   boolean dropIndex(String indexName) throws NoSuchIndexException, 
> UnsupportedOperationException;
>   /**
>* Restores the index with the given name.
>* Deleted index can be restored by calling restoreIndex, but dropped index 
> can't be restored.
>*
>* @param indexName the name of the index to be restored
>* @return true if the index is restored
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException
>*/
>   default boolean restoreIndex(String indexName)
>   throws NoSuchIndexException, UnsupportedOperationException
>   /**
>* Refreshes index using the latest data. This causes the index to be 
> rebuilt.
>*
>* @param indexName the name of the index to be rebuilt
>* @return true if the index is rebuilt
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException
>*/
>   default boolean refreshIndex(String indexName)
>   throws NoSuchIndexException, UnsupportedOperationException
>   /**
>* Alter Index using the new property. This causes the index to be rebuilt.
>*
>* @param indexName the name of the index to be altered
>* @return true if the index is altered
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException
>*/
>   default boolean alterIndex(String indexName, Properties properties)
>   throws NoSuchIndexException, UnsupportedOperationException
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37007) ExecutorAllocationManager schedule() does not use spark.dynamicAllocation.executorIdleTimeout

2021-10-14 Thread Nandini (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428806#comment-17428806
 ] 

Nandini commented on SPARK-37007:
-

Hello,
I would like to work on this jira. 

> ExecutorAllocationManager schedule() does not use 
> spark.dynamicAllocation.executorIdleTimeout
> -
>
> Key: SPARK-37007
> URL: https://issues.apache.org/jira/browse/SPARK-37007
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.1, 3.2.0
>Reporter: Nandini
>Priority: Minor
>
> The ExecutorAllocationManager removes idle executors after the configured 
> spark.dynamicAllocation.executorIdleTimeout, but schedule() does not use the 
> same configuration:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L249
> The value of intervalMillis is 100 and the time unit is milliseconds, so this 
> log appears approximately 10 times per second: 
>  | executor.scheduleWithFixedDelay(scheduleTask, 0, intervalMillis, 
> TimeUnit.MILLISECONDS)
> In older versions it was logged at INFO level:
> [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L454]
> In the latest versions (and on master) it has been changed to DEBUG:
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L540]
> The change request for the above: 
> [https://github.com/apache/spark/commit/3584d849438ad48ff54af3c982c124a8443dc590]
> However, this check for removing executors should use 
> spark.dynamicAllocation.executorIdleTimeout instead of 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L153
>  
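
For reference, a hedged PySpark sketch of the configuration this issue is about; 
the values are illustrative, and only spark.dynamicAllocation.executorIdleTimeout 
is the setting referenced above (the ~100 ms schedule() interval is internal to 
ExecutorAllocationManager and is not set here):

{code:python}
from pyspark.sql import SparkSession

# Illustrative values only; executorIdleTimeout is the setting referenced above.
spark = (
    SparkSession.builder
    .appName("dynamic-allocation-demo")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
    .getOrCreate()
)
print(spark.conf.get("spark.dynamicAllocation.executorIdleTimeout"))
spark.stop()
{code}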



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-37007) ExecutorAllocationManager schedule() does not use spark.dynamicAllocation.executorIdleTimeout

2021-10-14 Thread Nandini (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandini updated SPARK-37007:

Comment: was deleted

(was: Hello Team, 
I would like to work on this jira.)

> ExecutorAllocationManager schedule() does not use 
> spark.dynamicAllocation.executorIdleTimeout
> -
>
> Key: SPARK-37007
> URL: https://issues.apache.org/jira/browse/SPARK-37007
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.1, 3.2.0
>Reporter: Nandini
>Priority: Minor
>
> The ExecutorAllocationManager removes idle executors after the configured 
> spark.dynamicAllocation.executorIdleTimeout, but schedule() does not use the 
> same configuration:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L249
> The value of intervalMillis is 100 and the time unit is milliseconds, so this 
> log appears approximately 10 times per second: 
>  | executor.scheduleWithFixedDelay(scheduleTask, 0, intervalMillis, 
> TimeUnit.MILLISECONDS)
> In older versions it was logged at INFO level:
> [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L454]
> In the latest versions (and on master) it has been changed to DEBUG:
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L540]
> The change request for the above: 
> [https://github.com/apache/spark/commit/3584d849438ad48ff54af3c982c124a8443dc590]
> However, this check for removing executors should use 
> spark.dynamicAllocation.executorIdleTimeout instead of 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L153
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37007) ExecutorAllocationManager schedule() does not use spark.dynamicAllocation.executorIdleTimeout

2021-10-14 Thread Nandini (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428805#comment-17428805
 ] 

Nandini commented on SPARK-37007:
-

Hello Team, 
I would like to work on this jira.

> ExecutorAllocationManager schedule() does not use 
> spark.dynamicAllocation.executorIdleTimeout
> -
>
> Key: SPARK-37007
> URL: https://issues.apache.org/jira/browse/SPARK-37007
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.1, 3.2.0
>Reporter: Nandini
>Priority: Minor
>
> The ExecutorAllocationManager removes idle executors after the configured 
> spark.dynamicAllocation.executorIdleTimeout but in the schedule() it does not 
> use the same configuration.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L249
> The value for intervalMillis is 100 and timeunit is ms. Hence you see this 
> log approximately 10 times in a second. 
>  | executor.scheduleWithFixedDelay(scheduleTask, 0, intervalMillis, 
> TimeUnit.MILLISECONDS)
> In older versions it was at info level
> [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L454]
> In the latest versions (and master) it has been changed to debug
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L540]
> The change request for the above - 
> [https://github.com/apache/spark/commit/3584d849438ad48ff54af3c982c124a8443dc590]
> However, this check for executors to be removed should use 
> spark.dynamicAllocation.executorIdleTimeout rather than the value at 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L153
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2021-10-14 Thread Michelle m Hovington (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michelle m Hovington updated SPARK-21187:
-
Attachment: 0--1172099527-254246775-1412485878

> Complete support for remaining Spark data types in Arrow Converters
> ---
>
> Key: SPARK-21187
> URL: https://issues.apache.org/jira/browse/SPARK-21187
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: 0--1172099527-254246775-1412485878
>
>
> This is to track adding the remaining type support in Arrow Converters. 
> Currently, only primitive data types are supported.
> Remaining types:
>  * -*Date*-
>  * -*Timestamp*-
>  * *Complex*: -Struct-, -Array-, -Map-
>  * -*Decimal*-
>  * -*Binary*-
>  * -*Categorical*- when converting from Pandas
> Some things to do before closing this out:
>  * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write 
> values as BigDecimal)-
>  * -Need to add some user docs-
>  * -Make sure Python tests are thorough-
>  * Check into complex type support mentioned in comments by [~leif], should 
> we support multi-indexing?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37008:


Assignee: (was: Apache Spark)

> WholeStageCodegenSparkSubmitSuite Failed with Java 17 
> --
>
> Key: SPARK-37008
> URL: https://issues.apache.org/jira/browse/SPARK-37008
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Major
>
> WholeStageCodegenSparkSubmitSuite test failed when using Java 17
> {code:java}
>  2021-10-14 04:32:38.038 - stderr> Exception in thread "main" 
> org.scalatest.exceptions.TestFailedException: 16 was not greater than 16
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37008:


Assignee: Apache Spark

> WholeStageCodegenSparkSubmitSuite Failed with Java 17 
> --
>
> Key: SPARK-37008
> URL: https://issues.apache.org/jira/browse/SPARK-37008
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> WholeStageCodegenSparkSubmitSuite test failed when using Java 17
> {code:java}
>  2021-10-14 04:32:38.038 - stderr> Exception in thread "main" 
> org.scalatest.exceptions.TestFailedException: 16 was not greater than 16
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428783#comment-17428783
 ] 

Apache Spark commented on SPARK-37008:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34286

> WholeStageCodegenSparkSubmitSuite Failed with Java 17 
> --
>
> Key: SPARK-37008
> URL: https://issues.apache.org/jira/browse/SPARK-37008
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Major
>
> WholeStageCodegenSparkSubmitSuite test failed when using Java 17
> {code:java}
>  2021-10-14 04:32:38.038 - stderr> Exception in thread "main" 
> org.scalatest.exceptions.TestFailedException: 16 was not greater than 16
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37008:


Assignee: (was: Apache Spark)

> WholeStageCodegenSparkSubmitSuite Failed with Java 17 
> --
>
> Key: SPARK-37008
> URL: https://issues.apache.org/jira/browse/SPARK-37008
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Major
>
> WholeStageCodegenSparkSubmitSuite test failed when using Java 17
> {code:java}
>  2021-10-14 04:32:38.038 - stderr> Exception in thread "main" 
> org.scalatest.exceptions.TestFailedException: 16 was not greater than 16
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36337) decimal('Nan') is unsupported in net.razorvine.pickle

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36337:


Assignee: (was: Apache Spark)

> decimal('Nan') is unsupported in net.razorvine.pickle 
> --
>
> Key: SPARK-36337
> URL: https://issues.apache.org/jira/browse/SPARK-36337
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Yikun Jiang
>Priority: Major
>
> Decimal('NaN') is not currently supported by net.razorvine.pickle.
> In Python
> {code:java}
> >>> pickled = cloudpickle.dumps(decimal.Decimal('NaN'))
> b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.'
> >>> pickle.loads(pickled)
> Decimal('NaN')
> {code}
> In Scala
> {code:java}
> scala> import net.razorvine.pickle.\{Pickler, Unpickler, PickleUtils}
> scala> val unpickle = new Unpickler
> scala> 
> unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094."))
> net.razorvine.pickle.PickleException: problem construction object: 
> java.lang.reflect.InvocationTargetException
>  at 
> net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29)
>  at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
>  at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213)
>  at net.razorvine.pickle.Unpickler.load(Unpickler.java:123)
>  at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136)
>  ... 48 elided
> {code}
> I submitted an issue in the pickle upstream project: 
> [https://github.com/irmen/pickle/issues/7].
> We should bump pickle to the latest version after it is fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36337) decimal('Nan') is unsupported in net.razorvine.pickle

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36337:


Assignee: Apache Spark

> decimal('Nan') is unsupported in net.razorvine.pickle 
> --
>
> Key: SPARK-36337
> URL: https://issues.apache.org/jira/browse/SPARK-36337
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>
> Decimal('NaN') is not currently supported by net.razorvine.pickle.
> In Python
> {code:java}
> >>> pickled = cloudpickle.dumps(decimal.Decimal('NaN'))
> b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.'
> >>> pickle.loads(pickled)
> Decimal('NaN')
> {code}
> In Scala
> {code:java}
> scala> import net.razorvine.pickle.\{Pickler, Unpickler, PickleUtils}
> scala> val unpickle = new Unpickler
> scala> 
> unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094."))
> net.razorvine.pickle.PickleException: problem construction object: 
> java.lang.reflect.InvocationTargetException
>  at 
> net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29)
>  at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
>  at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213)
>  at net.razorvine.pickle.Unpickler.load(Unpickler.java:123)
>  at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136)
>  ... 48 elided
> {code}
> I submitted an issue in the pickle upstream project: 
> [https://github.com/irmen/pickle/issues/7].
> We should bump pickle to the latest version after it is fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36337) decimal('Nan') is unsupported in net.razorvine.pickle

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428773#comment-17428773
 ] 

Apache Spark commented on SPARK-36337:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/34285

> decimal('Nan') is unsupported in net.razorvine.pickle 
> --
>
> Key: SPARK-36337
> URL: https://issues.apache.org/jira/browse/SPARK-36337
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Yikun Jiang
>Priority: Major
>
> Decimal('NaN') is not currently supported by net.razorvine.pickle.
> In Python
> {code:java}
> >>> pickled = cloudpickle.dumps(decimal.Decimal('NaN'))
> b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.'
> >>> pickle.loads(pickled)
> Decimal('NaN')
> {code}
> In Scala
> {code:java}
> scala> import net.razorvine.pickle.\{Pickler, Unpickler, PickleUtils}
> scala> val unpickle = new Unpickler
> scala> 
> unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094."))
> net.razorvine.pickle.PickleException: problem construction object: 
> java.lang.reflect.InvocationTargetException
>  at 
> net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29)
>  at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
>  at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213)
>  at net.razorvine.pickle.Unpickler.load(Unpickler.java:123)
>  at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136)
>  ... 48 elided
> {code}
> I submitted an issue in the pickle upstream project: 
> [https://github.com/irmen/pickle/issues/7].
> We should bump pickle to the latest version after it is fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17

2021-10-14 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-37008:
-
Summary: WholeStageCodegenSparkSubmitSuite Failed with Java 17   (was: Use 
UseCompressedClassPointers instead of UseCompressedOops to pass 
WholeStageCodegenSparkSubmitSuite with Java 17 )

> WholeStageCodegenSparkSubmitSuite Failed with Java 17 
> --
>
> Key: SPARK-37008
> URL: https://issues.apache.org/jira/browse/SPARK-37008
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Major
>
> WholeStageCodegenSparkSubmitSuite test failed when using Java 17
> {code:java}
>  2021-10-14 04:32:38.038 - stderr> Exception in thread "main" 
> org.scalatest.exceptions.TestFailedException: 16 was not greater than 16
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   2021-10-14 04:32:38.038 - stderr>   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>   2021-10-14 04:32:38.038 - stderr>   at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37008) Use UseCompressedClassPointers instead of UseCompressedOops to pass WholeStageCodegenSparkSubmitSuite with Java 17

2021-10-14 Thread Yang Jie (Jira)
Yang Jie created SPARK-37008:


 Summary: Use UseCompressedClassPointers instead of 
UseCompressedOops to pass WholeStageCodegenSparkSubmitSuite with Java 17 
 Key: SPARK-37008
 URL: https://issues.apache.org/jira/browse/SPARK-37008
 Project: Spark
  Issue Type: Sub-task
  Components: SQL, Tests
Affects Versions: 3.3.0
Reporter: Yang Jie


WholeStageCodegenSparkSubmitSuite test failed when using Java 17
{code:java}
 2021-10-14 04:32:38.038 - stderr> Exception in thread "main" 
org.scalatest.exceptions.TestFailedException: 16 was not greater than 16
  2021-10-14 04:32:38.038 - stderr> at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
  2021-10-14 04:32:38.038 - stderr> at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
  2021-10-14 04:32:38.038 - stderr> at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
  2021-10-14 04:32:38.038 - stderr> at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
  2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82)
  2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala)
  2021-10-14 04:32:38.038 - stderr> at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  2021-10-14 04:32:38.038 - stderr> at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  2021-10-14 04:32:38.038 - stderr> at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  2021-10-14 04:32:38.038 - stderr> at 
java.base/java.lang.reflect.Method.invoke(Method.java:568)
  2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
  2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
  2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
  2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
  2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
  2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
  2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
  2021-10-14 04:32:38.038 - stderr> at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
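The summary this ticket was created with proposed switching from UseCompressedOops 
to UseCompressedClassPointers for Java 17. As a quick sanity check of what a given 
HotSpot JVM is actually running with, a small standalone Scala sketch (not part of 
the suite; it only assumes a HotSpot JVM) can print both flag values via the 
HotSpotDiagnosticMXBean:
{code:java}
import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean

object VmFlagCheck {
  def main(args: Array[String]): Unit = {
    // Query the running JVM for the two flags contrasted in the original summary.
    val diag = ManagementFactory.getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])
    Seq("UseCompressedOops", "UseCompressedClassPointers").foreach { flag =>
      println(s"$flag = ${diag.getVMOption(flag).getValue}")
    }
  }
}
{code}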



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37007) ExecutorAllocationManager schedule() does not use spark.dynamicAllocation.executorIdleTimeout

2021-10-14 Thread Nandini (Jira)
Nandini created SPARK-37007:
---

 Summary: ExecutorAllocationManager schedule() does not use 
spark.dynamicAllocation.executorIdleTimeout
 Key: SPARK-37007
 URL: https://issues.apache.org/jira/browse/SPARK-37007
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.0, 2.4.1
Reporter: Nandini


The ExecutorAllocationManager removes idle executors after the configured 
spark.dynamicAllocation.executorIdleTimeout but in the schedule() it does not 
use the same configuration.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L249

The value of intervalMillis is 100 and the time unit is milliseconds, so this log 
appears approximately 10 times per second. 
 | executor.scheduleWithFixedDelay(scheduleTask, 0, intervalMillis, 
TimeUnit.MILLISECONDS)

In older versions it was at info level
[https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L454]
In the latest versions (and master) it has been changed to debug
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L540]
The change request for the above - 
[https://github.com/apache/spark/commit/3584d849438ad48ff54af3c982c124a8443dc590]

However, this check for executors to be removed should use 
spark.dynamicAllocation.executorIdleTimeout rather than the value at 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L153
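To make the 100 ms cadence concrete, here is a minimal standalone Scala sketch 
(illustrative only, not Spark code): a task passed to scheduleWithFixedDelay with a 
100 ms delay fires roughly 10 times per second, independent of any idle-timeout 
setting.
{code:java}
import java.util.concurrent.{Executors, TimeUnit}

object FixedDelayDemo {
  def main(args: Array[String]): Unit = {
    val executor = Executors.newSingleThreadScheduledExecutor()
    // Stand-in for the scheduleTask whose log line is quoted above.
    val task = new Runnable {
      override def run(): Unit =
        println(s"schedule() tick at ${System.currentTimeMillis()}")
    }
    // Same scheduling pattern: 0 ms initial delay, 100 ms between runs.
    executor.scheduleWithFixedDelay(task, 0, 100, TimeUnit.MILLISECONDS)
    Thread.sleep(1000) // expect roughly 10 ticks during this window
    executor.shutdownNow()
  }
}
{code}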
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37001) Disable two level of map for final hash aggregation by default

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37001:
---

Assignee: Cheng Su

> Disable two level of map for final hash aggregation by default
> --
>
> Key: SPARK-37001
> URL: https://issues.apache.org/jira/browse/SPARK-37001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
>
> This JIRA is to disable the two-level map for final hash aggregation by 
> default. The feature was introduced in 
> [#32242|https://github.com/apache/spark/pull/32242] and we found it can lead 
> to a query performance regression when the final aggregation gets rows with a 
> lot of distinct keys: the 1st-level hash map fills up, so many rows waste the 
> 1st-level lookup and are then inserted into the 2nd-level map. The feature 
> still benefits queries with fewer distinct keys, so a config is introduced to 
> allow such queries to enable the feature when it helps.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37001) Disable two level of map for final hash aggregation by default

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37001.
-
Fix Version/s: 3.2.1
   3.3.0
   Resolution: Fixed

Issue resolved by pull request 34270
[https://github.com/apache/spark/pull/34270]

> Disable two level of map for final hash aggregation by default
> --
>
> Key: SPARK-37001
> URL: https://issues.apache.org/jira/browse/SPARK-37001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Fix For: 3.3.0, 3.2.1
>
>
> This JIRA is to disable the two-level map for final hash aggregation by 
> default. The feature was introduced in 
> [#32242|https://github.com/apache/spark/pull/32242] and we found it can lead 
> to a query performance regression when the final aggregation gets rows with a 
> lot of distinct keys: the 1st-level hash map fills up, so many rows waste the 
> 1st-level lookup and are then inserted into the 2nd-level map. The feature 
> still benefits queries with fewer distinct keys, so a config is introduced to 
> allow such queries to enable the feature when it helps.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12567) Add aes_encrypt and aes_decrypt UDFs

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-12567:
---

Assignee: Kousuke Saruta

> Add aes_encrypt and aes_decrypt UDFs
> 
>
> Key: SPARK-12567
> URL: https://issues.apache.org/jira/browse/SPARK-12567
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kai Jiang
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.3.0
>
>
> AES (Advanced Encryption Standard) algorithm.
> Add aes_encrypt and aes_decrypt UDFs.
> Ref:
> [Hive|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Misc.Functions]
> [MySQL|https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_aes-decrypt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12567) Add aes_encrypt and aes_decrypt UDFs

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-12567.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 32801
[https://github.com/apache/spark/pull/32801]

> Add aes_encrypt and aes_decrypt UDFs
> 
>
> Key: SPARK-12567
> URL: https://issues.apache.org/jira/browse/SPARK-12567
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kai Jiang
>Priority: Major
> Fix For: 3.3.0
>
>
> AES (Advanced Encryption Standard) algorithm.
> Add aes_encrypt and aes_decrypt UDFs.
> Ref:
> [Hive|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Misc.Functions]
> [MySQL|https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_aes-decrypt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37006) MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading

2021-10-14 Thread jinhai (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428714#comment-17428714
 ] 

jinhai commented on SPARK-37006:


hi [~cloud_fan], can you review this issue for me?

> MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs 
> when shuffle reading
> -
>
> Key: SPARK-37006
> URL: https://issues.apache.org/jira/browse/SPARK-37006
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.1.2
>Reporter: jinhai
>Priority: Major
>
> In shuffle reading, in order to get the hostLocalDirs value when executing 
> fetchHostLocalBlocks, we need ExternalBlockStoreClient or 
> NettyBlockTransferService to make an RPC request.
> And when externalShuffleServiceEnabled is set, there is no need to call 
> registerExecutor and so on in the ExternalShuffleBlockResolver class.
> Throughout the Spark shuffle module, a lot of code logic is written to deal 
> with localDirs.
> We can directly add localDirs to the BlockManagerId class of MapStatus to get 
> the data file and index file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37006) MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading

2021-10-14 Thread jinhai (Jira)
jinhai created SPARK-37006:
--

 Summary: MapStatus adds localDirs to avoid the rpc request by 
method getHostLocalDirs when shuffle reading
 Key: SPARK-37006
 URL: https://issues.apache.org/jira/browse/SPARK-37006
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 3.1.2
Reporter: jinhai


In shuffle reading, in order to get the hostLocalDirs value when executing 
fetchHostLocalBlocks, we need ExternalBlockStoreClient or 
NettyBlockTransferService to make an RPC request.

And when externalShuffleServiceEnabled is set, there is no need to call 
registerExecutor and so on in the ExternalShuffleBlockResolver class.

Throughout the Spark shuffle module, a lot of code logic is written to deal 
with localDirs.

We can directly add localDirs to the BlockManagerId class of MapStatus to get 
the data file and index file.
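A rough sketch of the proposal using hypothetical types (Spark's actual MapStatus 
and BlockManagerId are not reproduced here): if the map output location already 
carried the executor's localDirs, a reader fetching host-local blocks could build 
the data and index file paths directly instead of issuing the extra RPC.
{code:java}
// Hypothetical types for illustration only; not Spark's real MapStatus/BlockManagerId.
case class ExecutorLocation(host: String, port: Int, executorId: String,
                            localDirs: Seq[String])

object HostLocalPaths {
  // Derive shuffle data/index file names from the carried localDirs, following the
  // "shuffle_<shuffleId>_<mapId>_0" naming scheme. The directory choice here is a
  // simplified stand-in; the real layout also hashes into sub-directories.
  def shuffleFiles(loc: ExecutorLocation, shuffleId: Int, mapId: Long): (String, String) = {
    val fileName = s"shuffle_${shuffleId}_${mapId}_0"
    val dir = loc.localDirs(math.abs(fileName.hashCode % loc.localDirs.size))
    (s"$dir/$fileName.data", s"$dir/$fileName.index")
  }
}
{code}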



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36632) DivideYMInterval and DivideDTInterval should throw the same exception when divide by zero.

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36632.
-
Fix Version/s: 3.2.1
   Resolution: Fixed

Issue resolved by pull request 33889
[https://github.com/apache/spark/pull/33889]

> DivideYMInterval and DivideDTInterval should throw the same exception when 
> divide by zero.
> --
>
> Key: SPARK-36632
> URL: https://issues.apache.org/jira/browse/SPARK-36632
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.2.1
>
>
> DivideYMInterval does not consider ANSI mode; we should support it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36632) DivideYMInterval and DivideDTInterval should throw the same exception when divide by zero.

2021-10-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36632:
---

Assignee: jiaan.geng

> DivideYMInterval and DivideDTInterval should throw the same exception when 
> divide by zero.
> --
>
> Key: SPARK-36632
> URL: https://issues.apache.org/jira/browse/SPARK-36632
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>
> DivideYMInterval does not consider ANSI mode; we should support it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36571) Optimized FileOutputCommitter with StagingDir

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36571:


Assignee: (was: Apache Spark)

> Optimized FileOutputCommitter with StagingDir
> -
>
> Key: SPARK-36571
> URL: https://issues.apache.org/jira/browse/SPARK-36571
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36571) Optimized FileOutputCommitter with StagingDir

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428693#comment-17428693
 ] 

Apache Spark commented on SPARK-36571:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33820

> Optimized FileOutputCommitter with StagingDir
> -
>
> Key: SPARK-36571
> URL: https://issues.apache.org/jira/browse/SPARK-36571
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36571) Optimized FileOutputCommitter with StagingDir

2021-10-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36571:


Assignee: Apache Spark

> Optimized FileOutputCommitter with StagingDir
> -
>
> Key: SPARK-36571
> URL: https://issues.apache.org/jira/browse/SPARK-36571
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428689#comment-17428689
 ] 

Apache Spark commented on SPARK-36464:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34284

> Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream 
> for Writing Over 2GB Data
> --
>
> Key: SPARK-36464
> URL: https://issues.apache.org/jira/browse/SPARK-36464
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Major
> Fix For: 3.2.0, 3.1.3, 3.0.4
>
>
> The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; 
> however, the underlying `_size` variable is initialized as `Int`.
> That causes an overflow and returns a negative size when over 2GB data is 
> written into `ChunkedByteBufferOutputStream`
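As a standalone illustration of the failure mode (plain Scala, not the actual Spark 
class): a running byte counter declared as Int wraps to a negative value once more 
than 2GB has been accumulated, while the same counter kept as Long stays correct.
{code:java}
object SizeOverflowDemo {
  def main(args: Array[String]): Unit = {
    var sizeAsInt: Int = 0    // mirrors a counter initialized as Int
    var sizeAsLong: Long = 0L // the fix: keep the running size as Long
    val chunk = 1024 * 1024 * 1024 // pretend each write adds 1 GB

    for (_ <- 1 to 3) { // 3 GB total, past the 2 GB Int limit
      sizeAsInt += chunk
      sizeAsLong += chunk
    }
    println(sizeAsInt)  // negative: the Int counter has overflowed
    println(sizeAsLong) // 3221225472: the correct total
  }
}
{code}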



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428688#comment-17428688
 ] 

Apache Spark commented on SPARK-36464:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34284

> Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream 
> for Writing Over 2GB Data
> --
>
> Key: SPARK-36464
> URL: https://issues.apache.org/jira/browse/SPARK-36464
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Major
> Fix For: 3.2.0, 3.1.3, 3.0.4
>
>
> The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; 
> however, the underlying `_size` variable is initialized as `Int`.
> That causes an overflow and returns a negative size when over 2GB data is 
> written into `ChunkedByteBufferOutputStream`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428686#comment-17428686
 ] 

Apache Spark commented on SPARK-36464:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34284

> Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream 
> for Writing Over 2GB Data
> --
>
> Key: SPARK-36464
> URL: https://issues.apache.org/jira/browse/SPARK-36464
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Major
> Fix For: 3.2.0, 3.1.3, 3.0.4
>
>
> The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; 
> however, the underlying `_size` variable is initialized as `Int`.
> That causes an overflow and returns a negative size when over 2GB data is 
> written into `ChunkedByteBufferOutputStream`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428684#comment-17428684
 ] 

Apache Spark commented on SPARK-36900:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34284

> "SPARK-36464: size returns correct positive number even with over 2GB data" 
> will oom with JDK17 
> 
>
> Key: SPARK-36900
> URL: https://issues.apache.org/jira/browse/SPARK-36900
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Execute
>  
> {code:java}
> build/mvn clean install  -pl core -am -Dtest=none 
> -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
> {code}
> with JDK 17,
> {code:java}
> ChunkedByteBufferOutputStreamSuite:
> - empty output
> - write a single byte
> - write a single near boundary
> - write a single at boundary
> - single chunk output
> - single chunk output at boundary size
> - multiple chunk output
> - multiple chunk output at boundary size
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at java.base/java.lang.Integer.valueOf(Integer.java:1081)
>   at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at java.base/java.io.OutputStream.write(OutputStream.java:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown
>  Source)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428685#comment-17428685
 ] 

Apache Spark commented on SPARK-36900:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34284

> "SPARK-36464: size returns correct positive number even with over 2GB data" 
> will oom with JDK17 
> 
>
> Key: SPARK-36900
> URL: https://issues.apache.org/jira/browse/SPARK-36900
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Execute
>  
> {code:java}
> build/mvn clean install  -pl core -am -Dtest=none 
> -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
> {code}
> with JDK 17,
> {code:java}
> ChunkedByteBufferOutputStreamSuite:
> - empty output
> - write a single byte
> - write a single near boundary
> - write a single at boundary
> - single chunk output
> - single chunk output at boundary size
> - multiple chunk output
> - multiple chunk output at boundary size
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at java.base/java.lang.Integer.valueOf(Integer.java:1081)
>   at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at java.base/java.io.OutputStream.write(OutputStream.java:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown
>  Source)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36525) DS V2 Index Support

2021-10-14 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428676#comment-17428676
 ] 

dch nguyen commented on SPARK-36525:


[~huaxingao], should we also implement these supportsIndex functions in JDBC for 
the other dialects like Oracle, Postgres, etc.?
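For a dialect such as Postgres, the existence check could be answered from the 
pg_indexes catalog view; the sketch below is a hypothetical plain-JDBC helper 
(names and placement are assumptions, not Spark's actual JdbcDialect API):
{code:java}
import java.sql.Connection

object PostgresIndexCheck {
  // Hypothetical helper: check whether an index with the given name exists on the
  // given table by querying Postgres' pg_indexes view.
  def indexExists(conn: Connection, tableName: String, indexName: String): Boolean = {
    val stmt = conn.prepareStatement(
      "SELECT 1 FROM pg_indexes WHERE tablename = ? AND indexname = ?")
    try {
      stmt.setString(1, tableName)
      stmt.setString(2, indexName)
      val rs = stmt.executeQuery()
      try rs.next() finally rs.close()
    } finally {
      stmt.close()
    }
  }
}
{code}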

> DS V2 Index Support
> ---
>
> Key: SPARK-36525
> URL: https://issues.apache.org/jira/browse/SPARK-36525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>
> Many data sources support indexes to improve query performance. In order to 
> take advantage of the index support in a data source, the following APIs will 
> be added for working with indexes:
> {code:java}
>   /**
>* Creates an index.
>*
>* @param indexName the name of the index to be created
>* @param indexType the IndexType of the index to be created
>* @param table the table on which index to be created
>* @param columns the columns on which index to be created
>* @param properties the properties of the index to be created
>* @throws IndexAlreadyExistsException If the index already exists 
> (optional)
>* @throws UnsupportedOperationException If create index is not a supported 
> operation
>*/
>   void createIndex(String indexName,
>   String indexType,
>   Identifier table,
>   FieldReference[] columns,
>   Map<String, String> properties)
>   throws IndexAlreadyExistsException, UnsupportedOperationException;
>   /**
>* Soft deletes the index with the given name.
>* Deleted index can be restored by calling restoreIndex.
>*
>* @param indexName the name of the index to be deleted
>* @return true if the index is deleted
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException If delete index is not a supported 
> operation
>*/
>   default boolean deleteIndex(String indexName)
>   throws NoSuchIndexException, UnsupportedOperationException
>   /**
>* Checks whether an index exists.
>*
>* @param indexName the name of the index
>* @return true if the index exists, false otherwise
>*/
>   boolean indexExists(String indexName);
>   /**
>* Lists all the indexes in a table.
>*
>* @param table the table to be checked on for indexes
>* @throws NoSuchTableException
>*/
>   Index[] listIndexes(Identifier table) throws NoSuchTableException;
>   /**
>* Hard deletes the index with the given name.
>* The Index can't be restored once dropped.
>*
>* @param indexName the name of the index to be dropped.
>* @return true if the index is dropped
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException If drop index is not a supported 
> operation
>*/
>   boolean dropIndex(String indexName) throws NoSuchIndexException, 
> UnsupportedOperationException;
>   /**
>* Restores the index with the given name.
>* Deleted index can be restored by calling restoreIndex, but dropped index 
> can't be restored.
>*
>* @param indexName the name of the index to be restored
>* @return true if the index is restored
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException
>*/
>   default boolean restoreIndex(String indexName)
>   throws NoSuchIndexException, UnsupportedOperationException
>   /**
>* Refreshes index using the latest data. This causes the index to be 
> rebuilt.
>*
>* @param indexName the name of the index to be rebuilt
>* @return true if the index is rebuilt
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException
>*/
>   default boolean refreshIndex(String indexName)
>   throws NoSuchIndexException, UnsupportedOperationException
>   /**
>* Alter Index using the new property. This causes the index to be rebuilt.
>*
>* @param indexName the name of the index to be altered
>* @return true if the index is altered
>* @throws NoSuchIndexException If the index does not exist (optional)
>* @throws UnsupportedOperationException
>*/
>   default boolean alterIndex(String indexName, Properties properties)
>   throws NoSuchIndexException, UnsupportedOperationException
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36946) Support time for ps.to_datetime

2021-10-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36946:


Assignee: dgd_contributor

> Support time for ps.to_datetime
> ---
>
> Key: SPARK-36946
> URL: https://issues.apache.org/jira/browse/SPARK-36946
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Assignee: dgd_contributor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36946) Support time for ps.to_datetime

2021-10-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36946.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34211
[https://github.com/apache/spark/pull/34211]

> Support time for ps.to_datetime
> ---
>
> Key: SPARK-36946
> URL: https://issues.apache.org/jira/browse/SPARK-36946
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Assignee: dgd_contributor
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37004) Job cancellation causes py4j errors on Jupyter due to pinned thread mode

2021-10-14 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428663#comment-17428663
 ] 

Hyukjin Kwon commented on SPARK-37004:
--

This turned out to be a Py4J issue. I made a fix 
(https://github.com/bartdag/py4j/pull/440).

> Job cancellation causes py4j errors on Jupyter due to pinned thread mode
> 
>
> Key: SPARK-37004
> URL: https://issues.apache.org/jira/browse/SPARK-37004
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xiangrui Meng
>Priority: Blocker
> Attachments: pinned.ipynb
>
>
> Spark 3.2.0 turned on Py4J pinned thread mode by default (SPARK-35303). 
> However, in a Jupyter notebook, after I cancel (interrupt) a long-running 
> Spark job, the next Spark command fails with Py4J errors. See the 
> attached notebook for a repro.
> The issue cannot be reproduced after turning off pinned thread mode; a 
> minimal way to do that is sketched below.
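> A minimal sketch of disabling pinned thread mode while investigating. This 
> assumes the environment variable is set before the first SparkSession (and 
> hence the JVM) is created, e.g. at the very top of the notebook:
> {code:python}
> import os
> 
> # Disable Py4J pinned thread mode; it is on by default since Spark 3.2.0.
> # This must run before the SparkContext/JVM is started.
> os.environ["PYSPARK_PIN_THREAD"] = "false"
> 
> from pyspark.sql import SparkSession
> 
> spark = SparkSession.builder.appName("repro").getOrCreate()
> {code}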



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37005) pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path

2021-10-14 Thread WeiNan Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WeiNan Zhao updated SPARK-37005:

Labels: python spark-core  (was: )

> pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path
> 
>
> Key: SPARK-37005
> URL: https://issues.apache.org/jira/browse/SPARK-37005
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
> Environment: python2.7
> spark2.4
>Reporter: WeiNan Zhao
>Priority: Major
>  Labels: python, spark-core
>
> Hi all,
> I submit a Spark job with spark-submit and set the option --files file1.
> Then, in the code, I use:
> {code:python}
> # code placeholder
> path = str(os.environ["SPARK_YARN_STAGING_DIR"])
> {code}
> but path is None. The same succeeds in Java code:
> {code:java}
> // code placeholder
> spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md")
> {code}
> which causes this problem.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37005) pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get job path

2021-10-14 Thread WeiNan Zhao (Jira)
WeiNan Zhao created SPARK-37005:
---

 Summary: pyspark os.getenv('SPARK_YARN_STAGING_DIR') can not get 
job path
 Key: SPARK-37005
 URL: https://issues.apache.org/jira/browse/SPARK-37005
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.4.4
 Environment: python2.7

spark2.4
Reporter: WeiNan Zhao


Hi all,

I submit a Spark job with spark-submit and set the option --files file1.

Then, in the code, I use:
{code:python}
# code placeholder
path = str(os.environ["SPARK_YARN_STAGING_DIR"])
{code}
but path is None. The same succeeds in Java code:
{code:java}
// code placeholder
spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md")
{code}
which causes this problem.
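
As a side note on the summary: a small, Spark-free illustration of how the two 
accessors differ when the variable is missing from the Python driver 
environment (illustrative only, not a fix):
{code:python}
import os

# os.getenv returns None when the variable is absent ...
print(os.getenv("SPARK_YARN_STAGING_DIR"))  # -> None if the driver env lacks it

# ... while os.environ[...] raises KeyError instead of returning None.
try:
    path = os.environ["SPARK_YARN_STAGING_DIR"]
except KeyError:
    print("SPARK_YARN_STAGING_DIR is not set in the Python driver environment")
{code}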

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36871) Migrate CreateViewStatement to v2 command

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428651#comment-17428651
 ] 

Apache Spark commented on SPARK-36871:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34283

> Migrate CreateViewStatement to v2 command
> -
>
> Key: SPARK-36871
> URL: https://issues.apache.org/jira/browse/SPARK-36871
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36871) Migrate CreateViewStatement to v2 command

2021-10-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428649#comment-17428649
 ] 

Apache Spark commented on SPARK-36871:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34283

> Migrate CreateViewStatement to v2 command
> -
>
> Key: SPARK-36871
> URL: https://issues.apache.org/jira/browse/SPARK-36871
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36993) Fix json_tuple throw NPE if fields exist no foldable null value

2021-10-14 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-36993:
-
Fix Version/s: 3.0.4

> Fix json_tuple throw NPE if fields exist no foldable null value
> ---
>
> Key: SPARK-36993
> URL: https://issues.apache.org/jira/browse/SPARK-36993
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.1.3, 3.0.4, 3.2.1, 3.3.0
>
>
> If json_tuple is given a field expression that evaluates to null but is not 
> foldable, Spark throws an NPE while evaluating field.toString.
> For example, the following query fails with:
> {code:java}
> SELECT json_tuple('{"a":"1"}', if(c1 < 1, null, 'a')) FROM ( SELECT rand() AS 
> c1 );
> {code}
> {code:java}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$parseRow$2(jsonExpressions.scala:435)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.parseRow(jsonExpressions.scala:435)
>   at 
> org.apache.spark.sql.catalyst.expressions.JsonTuple.$anonfun$eval$6(jsonExpressions.scala:413)
> {code}
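> For context, a standalone illustration (not Spark's actual code) of the same 
> failure mode: converting evaluated field values to strings throws an NPE as 
> soon as one of them is null at runtime, which is what a non-foldable null 
> field produces:
> {code:scala}
> // The second element stands in for a field expression that only evaluates
> // to null at runtime (i.e. it is not foldable).
> val evaluatedFields: Seq[Any] = Seq("a", null)
> 
> // Same pattern as the map in JsonTuple.parseRow: calling .toString on the
> // null element throws java.lang.NullPointerException.
> val fieldNames = evaluatedFields.map(_.toString)
> {code}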



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org