[jira] [Updated] (SPARK-35978) Support new keyword TIMESTAMP_LTZ

2021-07-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35978:
---
Affects Version/s: (was: 3.3.0)

> Support new keyword TIMESTAMP_LTZ
> -
>
> Key: SPARK-35978
> URL: https://issues.apache.org/jira/browse/SPARK-35978
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Support new keyword TIMESTAMP_LTZ, which can be used for:
> * timestamp with local time zone data type in DDL
> * timestamp with local time zone data type in the Cast clause
> * timestamp with local time zone data type literals
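A minimal sketch of the three usages listed above, runnable in spark-shell once the
keyword lands; the literal form is an assumption modeled on the existing TIMESTAMP
keyword, not taken from this ticket:

{code:java}
// Sketch (assumption): the new keyword in DDL, in a Cast clause, and as a typed literal.
spark.sql("CREATE TABLE t (ts TIMESTAMP_LTZ) USING parquet")               // DDL
spark.sql("SELECT CAST('2021-07-01 00:00:00' AS TIMESTAMP_LTZ)").show()    // Cast clause
spark.sql("SELECT TIMESTAMP_LTZ'2021-07-01 00:00:00'").show()              // typed literal
{code}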



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35978) Support new keyword TIMESTAMP_LTZ

2021-07-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35978:
---
Affects Version/s: 3.3.0

> Support new keyword TIMESTAMP_LTZ
> -
>
> Key: SPARK-35978
> URL: https://issues.apache.org/jira/browse/SPARK-35978
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Support new keyword TIMESTAMP_LTZ, which can be used for:
> * timestamp with local time zone data type in DDL
> * timestamp with local time zone data type in the Cast clause
> * timestamp with local time zone data type literals



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35955) Fix decimal overflow issues for Average

2021-07-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35955.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33177
[https://github.com/apache/spark/pull/33177]

> Fix decimal overflow issues for Average
> ---
>
> Key: SPARK-35955
> URL: https://issues.apache.org/jira/browse/SPARK-35955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karen Feng
>Assignee: Karen Feng
>Priority: Major
> Fix For: 3.2.0
>
>
> Fix decimal overflow issues for decimal average in ANSI mode. Linked to 
> SPARK-32018 and SPARK-28067, which address decimal sum.
> Repro:
>  
> {code:java}
> import org.apache.spark.sql.functions._
> spark.conf.set("spark.sql.ansi.enabled", true)
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
> "intNum").agg(mean("decNum"))
> df2.show(40,false)
> {code}
>  
> Should throw an exception (as sum overflows), but instead returns:
>  
> {code:java}
> +-----------+
> |avg(decNum)|
> +-----------+
> |null       |
> +-----------+{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35955) Fix decimal overflow issues for Average

2021-07-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-35955:
--

Assignee: Karen Feng

> Fix decimal overflow issues for Average
> ---
>
> Key: SPARK-35955
> URL: https://issues.apache.org/jira/browse/SPARK-35955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karen Feng
>Assignee: Karen Feng
>Priority: Major
>
> Fix decimal overflow issues for decimal average in ANSI mode. Linked to 
> SPARK-32018 and SPARK-28067, which address decimal sum.
> Repro:
>  
> {code:java}
> import org.apache.spark.sql.functions._
> spark.conf.set("spark.sql.ansi.enabled", true)
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
> "intNum").agg(mean("decNum"))
> df2.show(40,false)
> {code}
>  
> Should throw an exception (as sum overflows), but instead returns:
>  
> {code:java}
> +-----------+
> |avg(decNum)|
> +-----------+
> |null       |
> +-----------+{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming

2021-07-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-35897:
--

Assignee: Rahul Shivu Mahadev

> Support user defined initial state with flatMapGroupsWithState in Structured 
> Streaming
> --
>
> Key: SPARK-35897
> URL: https://issues.apache.org/jira/browse/SPARK-35897
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2
>Reporter: Rahul Shivu Mahadev
>Assignee: Rahul Shivu Mahadev
>Priority: Major
> Fix For: 3.2.0
>
>
> Structured Streaming supports arbitrary stateful processing using the 
> mapGroupsWithState and flatMapGroupsWithState operators. The state is created 
> by processing the data that comes in with every batch. This API improvement 
> will allow users to specify an initial state, which is applied when the first 
> batch is executed.
>  
> h2. Proposed new APIs (Scala)
>  
>  
>   def mapGroupsWithState[S: Encoder, U: Encoder](
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)])(
>   func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] 
>  
>   def flatMapGroupsWithState[S: Encoder, U: Encoder](
>   outputMode: OutputMode,
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)])(
>   func: (K, Iterator[V], GroupState[S]) => Iterator[U])
>  
> h2.    Proposed new APIs (Java)
>   
> def mapGroupsWithState[S, U](
>   func: MapGroupsWithStateFunction[K, V, S, U],
>   stateEncoder: Encoder[S],
>   outputEncoder: Encoder[U],
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)]): Dataset[U]
> def flatMapGroupsWithState[S, U](
>   func: FlatMapGroupsWithStateFunction[K, V, S, U],
>   outputMode: OutputMode,
>   stateEncoder: Encoder[S],
>   outputEncoder: Encoder[U],
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)]): Dataset[U]
>  
>    
> h2. Example Usage
>    
> val initialState: Dataset[(String, RunningCount)] = Seq(
>   ("a", new RunningCount(1)),
>  ("b", new RunningCount(1))
> ).toDS()
>  
> val inputData = MemoryStream[String]
> val result =
>   inputData.toDS()
> .groupByKey(x => x)
> .mapGroupsWithState(initialState, timeoutConf)(stateFunc)
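For readers trying the example above outside the ticket, here is one way the
undefined pieces (`RunningCount`, `timeoutConf`, `stateFunc`) could look; this is an
illustrative assumption, not code from the proposal:

{code:java}
// Illustrative definitions for the names the example above leaves out (assumption only).
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class RunningCount(var count: Long)

val timeoutConf = GroupStateTimeout.NoTimeout

// Counts how many values each key has seen, carrying the count across batches.
val stateFunc = (key: String, values: Iterator[String], state: GroupState[RunningCount]) => {
  val current = state.getOption.getOrElse(RunningCount(0))
  current.count += values.size
  state.update(current)
  (key, current.count)
}
{code}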



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming

2021-07-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35897.

Resolution: Fixed

Issue resolved by pull request 33093
[https://github.com/apache/spark/pull/33093]

> Support user defined initial state with flatMapGroupsWithState in Structured 
> Streaming
> --
>
> Key: SPARK-35897
> URL: https://issues.apache.org/jira/browse/SPARK-35897
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2
>Reporter: Rahul Shivu Mahadev
>Assignee: Rahul Shivu Mahadev
>Priority: Major
> Fix For: 3.2.0
>
>
> Structured Streaming supports arbitrary stateful processing using the 
> mapGroupsWithState and flatMapGroupsWithState operators. The state is created 
> by processing the data that comes in with every batch. This API improvement 
> will allow users to specify an initial state, which is applied when the first 
> batch is executed.
>  
> h2. Proposed new APIs (Scala)
>  
>  
>   def mapGroupsWithState[S: Encoder, U: Encoder](
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)])(
>   func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] 
>  
>   def flatMapGroupsWithState[S: Encoder, U: Encoder](
>   outputMode: OutputMode,
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)])(
>   func: (K, Iterator[V], GroupState[S]) => Iterator[U])
>  
> h2.    Proposed new APIs (Java)
>   
> def mapGroupsWithState[S, U](
>   func: MapGroupsWithStateFunction[K, V, S, U],
>   stateEncoder: Encoder[S],
>   outputEncoder: Encoder[U],
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)]): Dataset[U]
> def flatMapGroupsWithState[S, U](
>   func: FlatMapGroupsWithStateFunction[K, V, S, U],
>   outputMode: OutputMode,
>   stateEncoder: Encoder[S],
>   outputEncoder: Encoder[U],
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)]): Dataset[U]
>  
>    
> h2. Example Usage
>    
> val initialState: Dataset[(String, RunningCount)] = Seq(
>   ("a", new RunningCount(1)),
>  ("b", new RunningCount(1))
> ).toDS()
>  
> val inputData = MemoryStream[String]
> val result =
>   inputData.toDS()
> .groupByKey(x => x)
> .mapGroupsWithState(initialState, timeoutConf)(stateFunc)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35984) Add a config to force using ShuffledHashJoin for test purpose

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35984:


Assignee: (was: Apache Spark)

> Add a config to force using ShuffledHashJoin for test purpose
> -
>
> Key: SPARK-35984
> URL: https://issues.apache.org/jira/browse/SPARK-35984
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Linhong Liu
>Priority: Major
>
> In join.sql, we want to cover all 3 join types, but the problem is that 
> `spark.sql.join.preferSortMergeJoin = false` currently can't guarantee that 
> all the joins will use ShuffledHashJoin, so we need another config to force 
> using hash join in the tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35984) Add a config to force using ShuffledHashJoin for test purpose

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373193#comment-17373193
 ] 

Apache Spark commented on SPARK-35984:
--

User 'linhongliu-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/33182

> Add a config to force using ShuffledHashJoin for test purpose
> -
>
> Key: SPARK-35984
> URL: https://issues.apache.org/jira/browse/SPARK-35984
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Linhong Liu
>Priority: Major
>
> In join.sql, we want to cover all 3 join types, but the problem is that 
> `spark.sql.join.preferSortMergeJoin = false` currently can't guarantee that 
> all the joins will use ShuffledHashJoin, so we need another config to force 
> using hash join in the tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35984) Add a config to force using ShuffledHashJoin for test purpose

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35984:


Assignee: Apache Spark

> Add a config to force using ShuffledHashJoin for test purpose
> -
>
> Key: SPARK-35984
> URL: https://issues.apache.org/jira/browse/SPARK-35984
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Linhong Liu
>Assignee: Apache Spark
>Priority: Major
>
> In join.sql, we want to cover all 3 join types, but the problem is that 
> `spark.sql.join.preferSortMergeJoin = false` currently can't guarantee that 
> all the joins will use ShuffledHashJoin, so we need another config to force 
> using hash join in the tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35982) Allow from_json/to_json for map types where value types are year-month intervals

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35982:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Allow from_json/to_json for map types where value types are year-month 
> intervals
> 
>
> Key: SPARK-35982
> URL: https://issues.apache.org/jira/browse/SPARK-35982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> In the current master, from_json and to_json don't support map types 
> where value types are year-month interval types.
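A rough sketch of what this enables, assuming the ANSI year-month interval literal
syntax available in 3.2 (spark-shell); this is an illustration, not code from the ticket:

{code:java}
// Sketch (assumption): serialize a map whose value is a year-month interval.
import org.apache.spark.sql.functions.to_json
val df = spark.sql("SELECT map('key', INTERVAL '1-2' YEAR TO MONTH) AS m")
df.select(to_json(df("m"))).show(false)   // fails before this change, works after it
{code}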



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35982) Allow from_json/to_json for map types where value types are year-month intervals

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35982:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Allow from_json/to_json for map types where value types are year-month 
> intervals
> 
>
> Key: SPARK-35982
> URL: https://issues.apache.org/jira/browse/SPARK-35982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Major
>
> In the current master, from_json and to_json don't support map types 
> where value types are year-month interval types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35982) Allow from_json/to_json for map types where value types are year-month intervals

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373164#comment-17373164
 ] 

Apache Spark commented on SPARK-35982:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33181

> Allow from_json/to_json for map types where value types are year-month 
> intervals
> 
>
> Key: SPARK-35982
> URL: https://issues.apache.org/jira/browse/SPARK-35982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> In the current master, from_json and to_json don't support map types 
> where value types are year-month interval types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35984) Add a config to force using ShuffledHashJoin for test purpose

2021-07-01 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-35984:
---

 Summary: Add a config to force using ShuffledHashJoin for test 
purpose
 Key: SPARK-35984
 URL: https://issues.apache.org/jira/browse/SPARK-35984
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Linhong Liu


In join.sql, we want to cover all 3 join types, but the problem is that 
`spark.sql.join.preferSortMergeJoin = false` currently can't guarantee that 
all the joins will use ShuffledHashJoin, so we need another config to force 
using hash join in the tests.
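As a rough illustration (the actual config name is decided in the pull request, so
the forced switch below is only a placeholder):

{code:java}
// Existing knob: only a preference, so some joins may still pick another strategy.
spark.conf.set("spark.sql.join.preferSortMergeJoin", "false")

// Proposed test-only switch that would *force* ShuffledHashJoin; the name below is a
// placeholder/assumption, see the pull request for the real one.
// spark.conf.set("spark.sql.<test-only-force-shuffled-hash-join>", "true")
{code}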



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35983) Allow from_json/to_json for map types where value types are day-time intervals

2021-07-01 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-35983:
---
Description: In the current master, from_json and to_json don't support map 
types where value types are day-time interval types.  (was: In the 
current master, an exception is thrown if we specify day-time interval types as 
map value type.)

> Allow from_json/to_json for map types where value types are day-time intervals
> --
>
> Key: SPARK-35983
> URL: https://issues.apache.org/jira/browse/SPARK-35983
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> In the current master, from_json and to_json don't support map types 
> where value types are day-time interval types.
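A rough round-trip sketch under the same assumptions as the year-month case
(spark-shell, 3.2 interval literal and DDL type names are assumptions, not from the ticket):

{code:java}
// Sketch (assumption): round-trip a map whose value is a day-time interval.
import org.apache.spark.sql.functions.{from_json, to_json}
val df = spark.sql("SELECT map('key', INTERVAL '1 02:03:04' DAY TO SECOND) AS m")
val json = df.select(to_json(df("m")).as("j"))
json.select(from_json(json("j"), "map<string, interval day to second>", Map.empty[String, String])).show(false)
{code}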



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35982) Allow from_json/to_json for map types where value types are year-month intervals

2021-07-01 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-35982:
---
Description: In the current master, from_json and to_json don't support map 
types where value types are year-month interval types.  (was: In 
the current master, an exception is thrown if we specify year-month interval 
types as map value type.)

> Allow from_json/to_json for map types where value types are year-month 
> intervals
> 
>
> Key: SPARK-35982
> URL: https://issues.apache.org/jira/browse/SPARK-35982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> In the current master, from_json and to_json are doesn't support map types 
> where value types are year-month interval types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35982) Allow from_json/to_json for map types where value types are year-month intervals

2021-07-01 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-35982:
---
Summary: Allow from_json/to_json for map types where value types are 
year-month intervals  (was: Allow year-month intervals as map value types)

> Allow from_json/to_json for map types where value types are year-month 
> intervals
> 
>
> Key: SPARK-35982
> URL: https://issues.apache.org/jira/browse/SPARK-35982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> In the current master, an exception is thrown if we specify year-month 
> interval types as map value type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35983) Allow from_json/to_json for map types where value types are day-time intervals

2021-07-01 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-35983:
---
Summary: Allow from_json/to_json for map types where value types are 
day-time intervals  (was: Allow day-time intervals as map value types)

> Allow from_json/to_json for map types where value types are day-time intervals
> --
>
> Key: SPARK-35983
> URL: https://issues.apache.org/jira/browse/SPARK-35983
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> In the current master, an exception is thrown if we specify day-time interval 
> types as map value type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35983) Allow day-time intervals as map value types

2021-07-01 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-35983:
--

 Summary: Allow day-time intervals as map value types
 Key: SPARK-35983
 URL: https://issues.apache.org/jira/browse/SPARK-35983
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


In the current master, an exception is thrown if we specify day-time interval 
types as map value type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35982) Allow year-month intervals as map value types

2021-07-01 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-35982:
---
Summary: Allow year-month intervals as map value types  (was: Allow 
year-month intervals as map key types)

> Allow year-month intervals as map value types
> -
>
> Key: SPARK-35982
> URL: https://issues.apache.org/jira/browse/SPARK-35982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> In the current master, an exception is thrown if we specify year-month 
> interval types as map key type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35982) Allow year-month intervals as map value types

2021-07-01 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-35982:
---
Description: In the current master, an exception is thrown if we specify 
year-month interval types as map value type.  (was: In the current master, an 
exception is thrown if we specify year-month interval types as map key type.)

> Allow year-month intervals as map value types
> -
>
> Key: SPARK-35982
> URL: https://issues.apache.org/jira/browse/SPARK-35982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> In the current master, an exception is thrown if we specify year-month 
> interval types as map value type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35982) Allow year-month intervals as map key types

2021-07-01 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-35982:
--

 Summary: Allow year-month intervals as map key types
 Key: SPARK-35982
 URL: https://issues.apache.org/jira/browse/SPARK-35982
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


In the current master, an exception is thrown if we specify year-month interval 
types as map key type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32899) Support submit application with user-defined cluster manager

2021-07-01 Thread Xianyang Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyang Liu updated SPARK-32899:
-
Description: 
Users can already define a custom cluster manager with the 
`ExternalClusterManager` trait. However, we cannot submit applications to it 
with `SparkSubmit`. This patch adds support for submitting applications with a 
user-defined cluster manager.

 

Add design doc: 
https://docs.google.com/document/d/1-Sn4Zh-l0SCqH7DQ0esdukS68ptSolK4lStj7MZUqJo/edit?usp=sharing

  was:Users can already define a custom cluster manager with the 
`ExternalClusterManager` trait. However, we cannot submit applications to it 
with `SparkSubmit`. This patch adds support for submitting applications with a 
user-defined cluster manager.


> Support submit application with user-defined cluster manager
> 
>
> Key: SPARK-32899
> URL: https://issues.apache.org/jira/browse/SPARK-32899
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Xianyang Liu
>Priority: Major
>
> Users can already define a custom cluster manager with the 
> `ExternalClusterManager` trait. However, we cannot submit applications to it 
> with `SparkSubmit`. This patch adds support for submitting applications with 
> a user-defined cluster manager.
>  
> Add design doc: 
> https://docs.google.com/document/d/1-Sn4Zh-l0SCqH7DQ0esdukS68ptSolK4lStj7MZUqJo/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35339) Improve unit tests for data-type-based basic operations

2021-07-01 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-35339.
---
Fix Version/s: 3.2.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Issue resolved by pull request 33095
https://github.com/apache/spark/pull/33095

> Improve unit tests for data-type-based basic operations
> ---
>
> Key: SPARK-35339
> URL: https://issues.apache.org/jira/browse/SPARK-35339
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> Unit tests for arithmetic operations are scattered in the codebase: 
>  * pyspark/pandas/tests/test_ops_on_diff_frames.py
>  * pyspark/pandas/tests/test_dataframe.py
>  * pyspark/pandas/tests/test_series.py
>  * (Upcoming) pyspark/pandas/tests/data_type_ops/
> We wanted to consolidate them.
> The code would be cleaner and easier to maintain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35825) Increase the heap and stack size for Maven build

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373114#comment-17373114
 ] 

Apache Spark commented on SPARK-35825:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33180

> Increase the heap and stack size for Maven build
> 
>
> Key: SPARK-35825
> URL: https://issues.apache.org/jira/browse/SPARK-35825
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra, Tests
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> The Jenkins jobs are unstable due to stack overflow errors: 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-3.2-jdk-11/
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/2274/
> We should increase the memory configuration for the Maven build:
> Stack size: 64MB => 128MB
> Initial heap size: 1024MB => 2048MB
> Maximum heap size: 1024MB => 2048MB
> The SBT builds are OK, so let's keep the current configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35779) Support dynamic filtering for v2 tables

2021-07-01 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-35779:
---

Assignee: Anton Okolnychyi

> Support dynamic filtering for v2 tables
> ---
>
> Key: SPARK-35779
> URL: https://issues.apache.org/jira/browse/SPARK-35779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
>
> We need to support dynamic filtering for v2 tables.
> Design doc:
> https://docs.google.com/document/d/1RfFn2e9o_1uHJ8jFGsSakp-BZMizX1uRrJSybMe2a6M



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35779) Support dynamic filtering for v2 tables

2021-07-01 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-35779.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32921
[https://github.com/apache/spark/pull/32921]

> Support dynamic filtering for v2 tables
> ---
>
> Key: SPARK-35779
> URL: https://issues.apache.org/jira/browse/SPARK-35779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.2.0
>
>
> We need to support dynamic filtering for v2 tables.
> Design doc:
> https://docs.google.com/document/d/1RfFn2e9o_1uHJ8jFGsSakp-BZMizX1uRrJSybMe2a6M



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373079#comment-17373079
 ] 

Apache Spark commented on SPARK-35981:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33179

> Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check 
> precision
> ---
>
> Key: SPARK-35981
> URL: https://issues.apache.org/jira/browse/SPARK-35981
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> In some environments, the precision of the {{DataFrame.corr}} function can 
> differ.
> We should use {{check_exact=False}} to loosen the precision check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35981:


Assignee: (was: Apache Spark)

> Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check 
> precision
> ---
>
> Key: SPARK-35981
> URL: https://issues.apache.org/jira/browse/SPARK-35981
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> In some environments, the precision of the {{DataFrame.corr}} function can 
> differ.
> We should use {{check_exact=False}} to loosen the precision check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35981:


Assignee: Apache Spark

> Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check 
> precision
> ---
>
> Key: SPARK-35981
> URL: https://issues.apache.org/jira/browse/SPARK-35981
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> In some environments, the precision of the {{DataFrame.corr}} function can 
> differ.
> We should use {{check_exact=False}} to loosen the precision check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373078#comment-17373078
 ] 

Apache Spark commented on SPARK-35981:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33179

> Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check 
> precision
> ---
>
> Key: SPARK-35981
> URL: https://issues.apache.org/jira/browse/SPARK-35981
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> In some environments, the precision of the {{DataFrame.corr}} function can 
> differ.
> We should use {{check_exact=False}} to loosen the precision check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35976) Adjust `astype` method for ExtensionDtype in pandas API on Spark

2021-07-01 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35976:
-
Description: 
Currently, the `astype` method for ExtensionDtype in pandas API on Spark is not 
consistent with pandas. For example, 
[https://github.com/apache/spark/pull/33095#discussion_r661704734.]

[https://github.com/apache/spark/pull/33095#discussion_r662623005.]

 

We ought to fill in the gap.

  was:
Currently, `astype` method for ExtensionDtype in pandas API on Spark is not 
consistent with pandas. For example, 
[https://github.com/apache/spark/pull/33095#discussion_r661704734.]

 

We ought to fill in the gap.


> Adjust `astype` method for ExtensionDtype in pandas API on Spark
> 
>
> Key: SPARK-35976
> URL: https://issues.apache.org/jira/browse/SPARK-35976
> Project: Spark
>  Issue Type: Story
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Currently, the `astype` method for ExtensionDtype in pandas API on Spark is not 
> consistent with pandas. For example, 
> [https://github.com/apache/spark/pull/33095#discussion_r661704734.]
> [https://github.com/apache/spark/pull/33095#discussion_r662623005.]
>  
> We ought to fill in the gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision

2021-07-01 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35981:
-

 Summary: Use check_exact=False in StatsTest.test_cov_corr_meta to 
loosen the check precision
 Key: SPARK-35981
 URL: https://issues.apache.org/jira/browse/SPARK-35981
 Project: Spark
  Issue Type: Test
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin


In some environments, the precision of the {{DataFrame.corr}} function can 
differ.

We should use {{check_exact=False}} to loosen the precision check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35980:


Assignee: (was: Apache Spark)

> ThreadAudit test helper should log whether a thread is a Daemon thread
> --
>
> Key: SPARK-35980
> URL: https://issues.apache.org/jira/browse/SPARK-35980
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.2
>Reporter: Tim Armstrong
>Priority: Major
>
> It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned 
> whether the threads were daemon threads or not, since leaked non-daemon 
> threads are more likely to be unintentional than leaked daemon threads.
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373069#comment-17373069
 ] 

Apache Spark commented on SPARK-35980:
--

User 'timarmstrong' has created a pull request for this issue:
https://github.com/apache/spark/pull/33178

> ThreadAudit test helper should log whether a thread is a Daemon thread
> --
>
> Key: SPARK-35980
> URL: https://issues.apache.org/jira/browse/SPARK-35980
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.2
>Reporter: Tim Armstrong
>Priority: Major
>
> It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned 
> whether the threads were daemon threads or not, since leaked non-daemon 
> threads are more likely to be unintentional than leaked daemon threads.
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35980:


Assignee: Apache Spark

> ThreadAudit test helper should log whether a thread is a Daemon thread
> --
>
> Key: SPARK-35980
> URL: https://issues.apache.org/jira/browse/SPARK-35980
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.2
>Reporter: Tim Armstrong
>Assignee: Apache Spark
>Priority: Major
>
> It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned 
> whether the threads were daemon threads or not, since leaked non-daemon 
> threads are more likely to be unintentional than leaked daemon threads.
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373068#comment-17373068
 ] 

Apache Spark commented on SPARK-35980:
--

User 'timarmstrong' has created a pull request for this issue:
https://github.com/apache/spark/pull/33178

> ThreadAudit test helper should log whether a thread is a Daemon thread
> --
>
> Key: SPARK-35980
> URL: https://issues.apache.org/jira/browse/SPARK-35980
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.2
>Reporter: Tim Armstrong
>Priority: Major
>
> It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned 
> whether the threads were daemon threads or not, since leaked non-daemon 
> threads are more likely to be unintentional than leaked daemon threads.
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread

2021-07-01 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373065#comment-17373065
 ] 

Tim Armstrong commented on SPARK-35980:
---

I plan to contribute a fix for this.

> ThreadAudit test helper should log whether a thread is a Daemon thread
> --
>
> Key: SPARK-35980
> URL: https://issues.apache.org/jira/browse/SPARK-35980
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.2
>Reporter: Tim Armstrong
>Priority: Major
>
> It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned 
> whether the threads were daemon threads or not, since leaked non-daemon 
> threads are more likely to be unintentional than leaked daemon threads.
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread

2021-07-01 Thread Tim Armstrong (Jira)
Tim Armstrong created SPARK-35980:
-

 Summary: ThreadAudit test helper should log whether a thread is a 
Daemon thread
 Key: SPARK-35980
 URL: https://issues.apache.org/jira/browse/SPARK-35980
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.1.2
Reporter: Tim Armstrong


It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned 
whether the threads were daemon threads or not, since leaked non-daemon threads 
are more likely to be unintentional than leaked daemon threads.

https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113
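A tiny sketch of the kind of message this asks for; the helper name and wording are
illustrative assumptions, not the actual ThreadAudit code:

{code:java}
// Illustrative only: report each leaked thread together with its daemon flag.
def reportPossibleLeaks(suiteName: String, remaining: Set[Thread]): Unit =
  remaining.foreach { t =>
    println(s"POSSIBLE THREAD LEAK IN SUITE $suiteName: ${t.getName} (daemon=${t.isDaemon})")
  }
{code}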



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35955) Fix decimal overflow issues for Average

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35955:


Assignee: Apache Spark

> Fix decimal overflow issues for Average
> ---
>
> Key: SPARK-35955
> URL: https://issues.apache.org/jira/browse/SPARK-35955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karen Feng
>Assignee: Apache Spark
>Priority: Major
>
> Fix decimal overflow issues for decimal average in ANSI mode. Linked to 
> SPARK-32018 and SPARK-28067, which address decimal sum.
> Repro:
>  
> {code:java}
> import org.apache.spark.sql.functions._
> spark.conf.set("spark.sql.ansi.enabled", true)
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
> "intNum").agg(mean("decNum"))
> df2.show(40,false)
> {code}
>  
> Should throw an exception (as sum overflows), but instead returns:
>  
> {code:java}
> +-----------+
> |avg(decNum)|
> +-----------+
> |null       |
> +-----------+{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35955) Fix decimal overflow issues for Average

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373039#comment-17373039
 ] 

Apache Spark commented on SPARK-35955:
--

User 'karenfeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/33177

> Fix decimal overflow issues for Average
> ---
>
> Key: SPARK-35955
> URL: https://issues.apache.org/jira/browse/SPARK-35955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karen Feng
>Priority: Major
>
> Fix decimal overflow issues for decimal average in ANSI mode. Linked to 
> SPARK-32018 and SPARK-28067, which address decimal sum.
> Repro:
>  
> {code:java}
> import org.apache.spark.sql.functions._
> spark.conf.set("spark.sql.ansi.enabled", true)
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
> "intNum").agg(mean("decNum"))
> df2.show(40,false)
> {code}
>  
> Should throw an exception (as sum overflows), but instead returns:
>  
> {code:java}
> +-----------+
> |avg(decNum)|
> +-----------+
> |null       |
> +-----------+{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35955) Fix decimal overflow issues for Average

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35955:


Assignee: (was: Apache Spark)

> Fix decimal overflow issues for Average
> ---
>
> Key: SPARK-35955
> URL: https://issues.apache.org/jira/browse/SPARK-35955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karen Feng
>Priority: Major
>
> Fix decimal overflow issues for decimal average in ANSI mode. Linked to 
> SPARK-32018 and SPARK-28067, which address decimal sum.
> Repro:
>  
> {code:java}
> import org.apache.spark.sql.functions._
> spark.conf.set("spark.sql.ansi.enabled", true)
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
> "intNum").agg(mean("decNum"))
> df2.show(40,false)
> {code}
>  
> Should throw an exception (as sum overflows), but instead returns:
>  
> {code:java}
> +-----------+
> |avg(decNum)|
> +-----------+
> |null       |
> +-----------+{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35975) New configuration spark.sql.timestampType for the default timestamp type

2021-07-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-35975.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33176
[https://github.com/apache/spark/pull/33176]

> New configuration spark.sql.timestampType for the default timestamp type
> 
>
> Key: SPARK-35975
> URL: https://issues.apache.org/jira/browse/SPARK-35975
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Add a new configuration `spark.sql.timestampType`, which configures the 
> default timestamp type of Spark SQL, including SQL DDL and the Cast clause. 
> Setting the configuration to TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME 
> ZONE as the default type, while setting it to TIMESTAMP_LTZ will use 
> TIMESTAMP WITH LOCAL TIME ZONE.
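A minimal sketch of switching the default (the accepted values are assumed to match
the type names in the description; the printed schema comments are assumptions):

{code:java}
// Sketch (spark-shell): the plain TIMESTAMP keyword follows the configured default.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")
spark.sql("SELECT CAST('2021-07-01 00:00:00' AS TIMESTAMP) AS ts").printSchema()  // without time zone
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_LTZ")
spark.sql("SELECT CAST('2021-07-01 00:00:00' AS TIMESTAMP) AS ts").printSchema()  // with local time zone
{code}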



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35855) Unify reuse map data structures in non-AQE and AQE rules

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373025#comment-17373025
 ] 

Apache Spark commented on SPARK-35855:
--

User 'karenfeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/33177

> Unify reuse map data structures in non-AQE and AQE rules
> 
>
> Key: SPARK-35855
> URL: https://issues.apache.org/jira/browse/SPARK-35855
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Minor
> Fix For: 3.2.0
>
>
> We can unify reuse map data structures in non-AQE and AQE rules 
> (`ReuseExchangeAndSubquery`, `ReuseAdaptiveSubquery`) to a simple 
> `Map[, ]`.
> Please find discussion here: 
> [https://github.com/apache/spark/pull/28885#discussion_r655073897]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35855) Unify reuse map data structures in non-AQE and AQE rules

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373023#comment-17373023
 ] 

Apache Spark commented on SPARK-35855:
--

User 'karenfeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/33177

> Unify reuse map data structures in non-AQE and AQE rules
> 
>
> Key: SPARK-35855
> URL: https://issues.apache.org/jira/browse/SPARK-35855
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Minor
> Fix For: 3.2.0
>
>
> We can unify reuse map data structures in non-AQE and AQE rules 
> (`ReuseExchangeAndSubquery`, `ReuseAdaptiveSubquery`) to a simple 
> `Map[, ]`.
> Please find discussion here: 
> [https://github.com/apache/spark/pull/28885#discussion_r655073897]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35974) Spark submit REST cluster/standalone mode - launching an s3a jar with STS

2021-07-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35974.
---
Resolution: Cannot Reproduce

Could you try Apache Spark 3.1.2, please, [~toopt4]? Apache Spark 2.4 is EOL. 
It seems that the log shows `spark-2.3.4-bin-hadoop2.7` and the affected 
version is 2.4.6. Both are too old.

> Spark submit REST cluster/standalone mode - launching an s3a jar with STS
> -
>
> Key: SPARK-35974
> URL: https://issues.apache.org/jira/browse/SPARK-35974
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.6
>Reporter: t oo
>Priority: Major
>
> {code:java}
> /var/lib/spark-2.3.4-bin-hadoop2.7/bin/spark-submit --master 
> spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf 
> spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
> spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
> spark.hadoop.fs.s3a.secret.key='redact2' --conf 
> spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
> spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
> spark.hadoop.fs.s3a.session.token='redact3' --conf 
> spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf 
> spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf 
> spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
>  --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
> -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf 
> spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
> -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' 
> --total-executor-cores 4 --executor-cores 2 --executor-memory 2g 
> --driver-memory 1g --name lin1 --deploy-mode cluster --conf 
> spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku 
> s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml
> {code}
> Running the above command gives the stack trace below:
>  
> {code:java}
>  Exception from the cluster:\njava.nio.file.AccessDeniedException: 
> s3a://mybuc/metorikku_2.11.jar: getFileStatus on 
> s3a://mybuc/metorikku_2.11.jar: 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended 
> Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542)
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117)
> org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463)
> org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030)
> org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747)
> org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723)
> org.apache.spark.util.Utils$.fetchFile(Utils.scala:509)
> org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155)
> org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
> org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code}
> All the EC2 instances in the Spark cluster only have access to S3 via STS tokens. The 
> jar itself reads CSVs from S3 using the tokens, and everything works if 
> either (1) I change the command line to point to local jars on the EC2 instances, or (2) 
> I use port 7077/client mode instead of cluster mode. But it seems the jar 
> itself can't be launched from S3, as if the tokens are not being picked up 
> properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35974) Spark submit REST cluster/standalone mode - launching an s3a jar with STS

2021-07-01 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372995#comment-17372995
 ] 

Dongjoon Hyun commented on SPARK-35974:
---

Feel free to reopen this with updated information for Spark 3.

> Spark submit REST cluster/standalone mode - launching an s3a jar with STS
> -
>
> Key: SPARK-35974
> URL: https://issues.apache.org/jira/browse/SPARK-35974
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.6
>Reporter: t oo
>Priority: Major
>
> {code:java}
> /var/lib/spark-2.3.4-bin-hadoop2.7/bin/spark-submit --master 
> spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf 
> spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
> spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
> spark.hadoop.fs.s3a.secret.key='redact2' --conf 
> spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
> spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
> spark.hadoop.fs.s3a.session.token='redact3' --conf 
> spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf 
> spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf 
> spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
>  --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
> -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf 
> spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
> -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' 
> --total-executor-cores 4 --executor-cores 2 --executor-memory 2g 
> --driver-memory 1g --name lin1 --deploy-mode cluster --conf 
> spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku 
> s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml
> {code}
> running the above command gives the stack trace below:
>  
> {code:java}
>  Exception from the cluster:\njava.nio.file.AccessDeniedException: 
> s3a://mybuc/metorikku_2.11.jar: getFileStatus on 
> s3a://mybuc/metorikku_2.11.jar: 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended 
> Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542)
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117)
> org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463)
> org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030)
> org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747)
> org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723)
> org.apache.spark.util.Utils$.fetchFile(Utils.scala:509)
> org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155)
> org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
> org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code}
> All the EC2 instances in the Spark cluster only have access to S3 via STS tokens. The 
> jar itself reads CSVs from S3 using the tokens, and everything works if 
> either (1) I change the command line to point to local jars on the EC2 instances, or (2) 
> I use port 7077/client mode instead of cluster mode. But it seems the jar 
> itself can't be launched from S3, as if the tokens are not being picked up 
> properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-35972) NestColumnPruning cause execute loss output

2021-07-01 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372980#comment-17372980
 ] 

Dongjoon Hyun edited comment on SPARK-35972 at 7/1/21, 6:06 PM:


Hi, [~angerszhu]. Could you mark this as a Bug?
Also, could you provide more detail?


was (Author: dongjoon):
Hi, [~angerszhu]. Could you mark this as a Bug?

> NestColumnPruning cause execute loss output
> ---
>
> Key: SPARK-35972
> URL: https://issues.apache.org/jira/browse/SPARK-35972
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most 
> recent failure: Lost task 47.3 in stage 1.0 (TID 328) 
> (ip-idata-server.shopee.io executor 3): 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute, tree: _gen_alias_788#788
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.immutable.List.map(List.scala:298)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.immutable.List.map(List.scala:298)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> 

[jira] [Commented] (SPARK-35972) NestColumnPruning cause execute loss output

2021-07-01 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372980#comment-17372980
 ] 

Dongjoon Hyun commented on SPARK-35972:
---

Hi, [~angerszhu]. Could you mark this as a Bug?

> NestColumnPruning cause execute loss output
> ---
>
> Key: SPARK-35972
> URL: https://issues.apache.org/jira/browse/SPARK-35972
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most 
> recent failure: Lost task 47.3 in stage 1.0 (TID 328) 
> (ip-idata-server.shopee.io executor 3): 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute, tree: _gen_alias_788#788
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.immutable.List.map(List.scala:298)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.immutable.List.map(List.scala:298)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> 

[jira] [Created] (SPARK-35979) Return different timestamp literals based on the default timestamp type

2021-07-01 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35979:
--

 Summary: Return different timestamp literals based on the default 
timestamp type
 Key: SPARK-35979
 URL: https://issues.apache.org/jira/browse/SPARK-35979
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang


Timestamp literals should have the following behavior:
1. When spark.sql.timestampType is TIMESTAMP_NTZ: if there is no time zone 
part, return a timestamp without time zone literal; otherwise, return a timestamp 
with local time zone literal.

2. When spark.sql.timestampType is TIMESTAMP_LTZ: return a timestamp with local 
time zone literal.
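
For illustration, a minimal spark-shell sketch of the intended behavior, assuming the proposed 
`spark.sql.timestampType` configuration and the standard TIMESTAMP literal syntax (the exact 
resulting type names are an assumption until the change lands):

{code:scala}
// Hypothetical sketch: the literal's type follows spark.sql.timestampType.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")

// No time zone part in the literal => timestamp without time zone.
spark.sql("SELECT TIMESTAMP '2021-07-01 00:00:00'").printSchema()

// Explicit zone offset in the literal => timestamp with local time zone.
spark.sql("SELECT TIMESTAMP '2021-07-01 00:00:00+08:00'").printSchema()

spark.conf.set("spark.sql.timestampType", "TIMESTAMP_LTZ")
// Always a timestamp with local time zone, regardless of the literal's zone part.
spark.sql("SELECT TIMESTAMP '2021-07-01 00:00:00'").printSchema()
{code}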



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35978) Support new keyword TIMESTAMP_LTZ

2021-07-01 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35978:
--

 Summary: Support new keyword TIMESTAMP_LTZ
 Key: SPARK-35978
 URL: https://issues.apache.org/jira/browse/SPARK-35978
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang


Support new keyword TIMESTAMP_LTZ, which can be used for:
* timestamp with local time zone data type in DDL
* timestamp with local time zone data type in Cast clause.
* timestamp with local time zone data type literal
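
A rough sketch of the three proposed uses, written as spark-shell calls; the exact syntax is an 
assumption until the parser change is merged:

{code:scala}
// Hypothetical examples of the proposed TIMESTAMP_LTZ keyword.
spark.sql("CREATE TABLE events (id INT, ts TIMESTAMP_LTZ) USING parquet")  // DDL
spark.sql("SELECT CAST('2021-07-01 00:00:00' AS TIMESTAMP_LTZ)")           // Cast clause
spark.sql("SELECT TIMESTAMP_LTZ '2021-07-01 00:00:00'")                    // literal
{code}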



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35977) Support new keyword TIMESTAMP_NTZ

2021-07-01 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35977:
--

 Summary: Support new keyword TIMESTAMP_NTZ
 Key: SPARK-35977
 URL: https://issues.apache.org/jira/browse/SPARK-35977
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang


Support new keyword TIMESTAMP_NTZ, which can be used for:
* timestamp without time zone data type in DDL
* timestamp without time zone data type in Cast clause.
* timestamp without time zone data type literal
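
Analogously to TIMESTAMP_LTZ above, a sketch of the proposed usage (again, the syntax is an 
assumption until the change lands):

{code:scala}
// Hypothetical examples of the proposed TIMESTAMP_NTZ keyword.
spark.sql("CREATE TABLE logs (id INT, ts TIMESTAMP_NTZ) USING parquet")    // DDL
spark.sql("SELECT CAST('2021-07-01 00:00:00' AS TIMESTAMP_NTZ)")           // Cast clause
spark.sql("SELECT TIMESTAMP_NTZ '2021-07-01 00:00:00'")                    // literal
{code}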



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35756) unionByName should support nested struct also

2021-07-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35756.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32972
[https://github.com/apache/spark/pull/32972]

> unionByName should support nested struct also
> -
>
> Key: SPARK-35756
> URL: https://issues.apache.org/jira/browse/SPARK-35756
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Wassim Almaaoui
>Assignee: Saurabh Chawla
>Priority: Major
> Fix For: 3.2.0
>
>
> It would be cool if `unionByName` also supported nested structs. I don't know 
> whether this is already the expected behaviour, so I am not sure if it is a bug 
> or an improvement proposal. 
> {code:java}
> case class Struct1(c1: Int, c2: Int)
> case class Struct2(c2: Int, c1: Int)
> val ds1 = Seq((1, Struct1(1,2))).toDS
> val ds2 = Seq((1, Struct2(1,2))).toDS
> ds1.unionByName(ds2.as[(Int,Struct1)]) {code}
> gives 
> {code:java}
> org.apache.spark.sql.AnalysisException: Union can only be performed on tables 
> with the compatible column types. struct <> 
> struct at the second column of the second table; 'Union false, 
> false :- LocalRelation [_1#38, _2#39] +- LocalRelation _1#45, _2#46
> {code}
> The code documentation of the function `unionByName` says `Note that 
> allowMissingColumns supports nested column in struct types` but doesn't say 
> if the function itself supports the nested column ordering or not. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35756) unionByName should support nested struct also

2021-07-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35756:
---

Assignee: Saurabh Chawla

> unionByName should support nested struct also
> -
>
> Key: SPARK-35756
> URL: https://issues.apache.org/jira/browse/SPARK-35756
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Wassim Almaaoui
>Assignee: Saurabh Chawla
>Priority: Major
>
> It would be cool if `unionByName` also supported nested structs. I don't know 
> whether this is already the expected behaviour, so I am not sure if it is a bug 
> or an improvement proposal. 
> {code:java}
> case class Struct1(c1: Int, c2: Int)
> case class Struct2(c2: Int, c1: Int)
> val ds1 = Seq((1, Struct1(1,2))).toDS
> val ds2 = Seq((1, Struct2(1,2))).toDS
> ds1.unionByName(ds2.as[(Int,Struct1)]) {code}
> gives 
> {code:java}
> org.apache.spark.sql.AnalysisException: Union can only be performed on tables 
> with the compatible column types. struct <> 
> struct at the second column of the second table; 'Union false, 
> false :- LocalRelation [_1#38, _2#39] +- LocalRelation _1#45, _2#46
> {code}
> The code documentation of the function `unionByName` says `Note that 
> allowMissingColumns supports nested column in struct types` but doesn't say 
> if the function itself supports the nested column ordering or not. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35976) Adjust `astype` method for ExtensionDtype in pandas API on Spark

2021-07-01 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-35976:


 Summary: Adjust `astype` method for ExtensionDtype in pandas API 
on Spark
 Key: SPARK-35976
 URL: https://issues.apache.org/jira/browse/SPARK-35976
 Project: Spark
  Issue Type: Story
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Xinrong Meng


Currently, the `astype` method for ExtensionDtype in pandas API on Spark is not 
consistent with pandas. For example, see 
[https://github.com/apache/spark/pull/33095#discussion_r661704734].

 

We ought to fill in the gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35975) New configuration spark.sql.timestampType for the default timestamp type

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35975:


Assignee: Apache Spark  (was: Gengliang Wang)

> New configuration spark.sql.timestampType for the default timestamp type
> 
>
> Key: SPARK-35975
> URL: https://issues.apache.org/jira/browse/SPARK-35975
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Add a new configuration `spark.sql.timestampType`, which configures the 
> default timestamp type of Spark SQL, including SQL DDL and the Cast clause. 
> Setting the configuration to TIMESTAMP_NTZ uses TIMESTAMP WITHOUT TIME ZONE 
> as the default type, while setting it to TIMESTAMP_LTZ uses TIMESTAMP WITH 
> LOCAL TIME ZONE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35975) New configuration spark.sql.timestampType for the default timestamp type

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35975:


Assignee: Gengliang Wang  (was: Apache Spark)

> New configuration spark.sql.timestampType for the default timestamp type
> 
>
> Key: SPARK-35975
> URL: https://issues.apache.org/jira/browse/SPARK-35975
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Add a new configuration `spark.sql.timestampType`, which configures the 
> default timestamp type of Spark SQL, including SQL DDL and the Cast clause. 
> Setting the configuration to TIMESTAMP_NTZ uses TIMESTAMP WITHOUT TIME ZONE 
> as the default type, while setting it to TIMESTAMP_LTZ uses TIMESTAMP WITH 
> LOCAL TIME ZONE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35975) New configuration spark.sql.timestampType for the default timestamp type

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372914#comment-17372914
 ] 

Apache Spark commented on SPARK-35975:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33176

> New configuration spark.sql.timestampType for the default timestamp type
> 
>
> Key: SPARK-35975
> URL: https://issues.apache.org/jira/browse/SPARK-35975
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Add a new configuration `spark.sql.timestampType`, which configures the 
> default timestamp type of Spark SQL, including SQL DDL and the Cast clause. 
> Setting the configuration to TIMESTAMP_NTZ uses TIMESTAMP WITHOUT TIME ZONE 
> as the default type, while setting it to TIMESTAMP_LTZ uses TIMESTAMP WITH 
> LOCAL TIME ZONE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35975) New configuration spark.sql.timestampType for the default timestamp type

2021-07-01 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35975:
--

 Summary: New configuration spark.sql.timestampType for the default 
timestamp type
 Key: SPARK-35975
 URL: https://issues.apache.org/jira/browse/SPARK-35975
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Add a new configuration `spark.sql.timestampType`, which configures the default 
timestamp type of Spark SQL, including SQL DDL and the Cast clause. Setting the 
configuration to TIMESTAMP_NTZ uses TIMESTAMP WITHOUT TIME ZONE as the default 
type, while setting it to TIMESTAMP_LTZ uses TIMESTAMP WITH LOCAL TIME ZONE.
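
A minimal usage sketch, assuming the configuration key and values named above; the effect on DDL 
and CAST is as described, and everything else here is illustrative:

{code:scala}
// Hypothetical usage of the proposed spark.sql.timestampType configuration.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")

// With the setting above, plain TIMESTAMP in DDL and CAST would resolve to
// TIMESTAMP WITHOUT TIME ZONE instead of TIMESTAMP WITH LOCAL TIME ZONE.
spark.sql("CREATE TABLE t (id INT, ts TIMESTAMP) USING parquet")
spark.sql("SELECT CAST('2021-07-01 00:00:00' AS TIMESTAMP)").printSchema()
{code}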



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35955) Fix decimal overflow issues for Average

2021-07-01 Thread Karen Feng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372910#comment-17372910
 ] 

Karen Feng commented on SPARK-35955:


I have changes almost ready locally and will open a PR soon. [~dc-heros], what is 
the state of your work?

> Fix decimal overflow issues for Average
> ---
>
> Key: SPARK-35955
> URL: https://issues.apache.org/jira/browse/SPARK-35955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karen Feng
>Priority: Major
>
> Fix decimal overflow issues for decimal average in ANSI mode. Linked to 
> SPARK-32018 and SPARK-28067, which address decimal sum.
> Repro:
>  
> {code:java}
> import org.apache.spark.sql.functions._
> spark.conf.set("spark.sql.ansi.enabled", true)
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
> "intNum").agg(mean("decNum"))
> df2.show(40,false)
> {code}
>  
> Should throw an exception (as sum overflows), but instead returns:
>  
> {code:java}
> +---+
> |avg(decNum)|
> +---+
> |null   |
> +---+{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35968) Make sure partitions are not too small in AQE partition coalescing

2021-07-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35968:
--
Parent: SPARK-33828
Issue Type: Sub-task  (was: Improvement)

> Make sure partitions are not too small in AQE partition coalescing
> --
>
> Key: SPARK-35968
> URL: https://issues.apache.org/jira/browse/SPARK-35968
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35969) Make the pod prefix more readable and tallied with K8S DNS Label Names

2021-07-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35969.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33171
[https://github.com/apache/spark/pull/33171]

> Make the pod prefix more readable and tallied with K8S DNS Label Names
> --
>
> Key: SPARK-35969
> URL: https://issues.apache.org/jira/browse/SPARK-35969
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.2.0
>
>
> By default, the executor pod prefix is generated from the app name. Characters 
> matching [^a-z0-9\\-] are handled differently: '.' and all whitespace are 
> converted to '-', while other characters are converted to the empty string. In 
> particular, characters like '_' and '|' are commonly used as word separators in 
> many languages.
> According to the K8S DNS Label Names (see 
> [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names]), 
> we can convert all special characters to `-`.
>  
> {code:scala}
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-")
> res9: String = time-is-the-most-valuable-thing-it-s-about-time-
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("\\s+", "-").replaceAll("\\.", 
> "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-")
> res10: String = time-isthemostvaluablethingits-about-time-
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35969) Make the pod prefix more readable and tallied with K8S DNS Label Names

2021-07-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35969:
-

Assignee: Kent Yao

> Make the pod prefix more readable and tallied with K8S DNS Label Names
> --
>
> Key: SPARK-35969
> URL: https://issues.apache.org/jira/browse/SPARK-35969
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> By default, the executor pod prefix is generated from the app name. Characters 
> matching [^a-z0-9\\-] are handled differently: '.' and all whitespace are 
> converted to '-', while other characters are converted to the empty string. In 
> particular, characters like '_' and '|' are commonly used as word separators in 
> many languages.
> According to the K8S DNS Label Names (see 
> [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names]), 
> we can convert all special characters to `-`.
>  
> {code:scala}
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-")
> res9: String = time-is-the-most-valuable-thing-it-s-about-time-
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("\\s+", "-").replaceAll("\\.", 
> "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-")
> res10: String = time-isthemostvaluablethingits-about-time-
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35974) Spark submit REST cluster/standalone mode - launching an s3a jar with STS

2021-07-01 Thread t oo (Jira)
t oo created SPARK-35974:


 Summary: Spark submit REST cluster/standalone mode - launching an 
s3a jar with STS
 Key: SPARK-35974
 URL: https://issues.apache.org/jira/browse/SPARK-35974
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.6
Reporter: t oo


{code:java}
/var/lib/spark-2.3.4-bin-hadoop2.7/bin/spark-submit --master 
spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf 
spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
spark.hadoop.fs.s3a.secret.key='redact2' --conf 
spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
spark.hadoop.fs.s3a.session.token='redact3' --conf 
spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf 
spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf 
spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
 --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
-DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf 
spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
-DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' 
--total-executor-cores 4 --executor-cores 2 --executor-memory 2g 
--driver-memory 1g --name lin1 --deploy-mode cluster --conf 
spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku 
s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml
{code}
running the above command gives the stack trace below:

 
{code:java}
 Exception from the cluster:\njava.nio.file.AccessDeniedException: 
s3a://mybuc/metorikku_2.11.jar: getFileStatus on 
s3a://mybuc/metorikku_2.11.jar: 
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended 
Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542)
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117)
org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463)
org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030)
org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747)
org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723)
org.apache.spark.util.Utils$.fetchFile(Utils.scala:509)
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155)
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code}
All the EC2 instances in the Spark cluster only have access to S3 via STS tokens. The 
jar itself reads CSVs from S3 using the tokens, and everything works if either 
(1) I change the command line to point to local jars on the EC2 instances, or (2) I use port 
7077/client mode instead of cluster mode. But it seems the jar itself can't be 
launched from S3, as if the tokens are not being picked up properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35973:


Assignee: Apache Spark

> DataSourceV2: Support SHOW CATALOGS
> ---
>
> Key: SPARK-35973
> URL: https://issues.apache.org/jira/browse/SPARK-35973
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Assignee: Apache Spark
>Priority: Major
>
> Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list 
> the catalogs and corresponding default-namespace info will be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35973:


Assignee: (was: Apache Spark)

> DataSourceV2: Support SHOW CATALOGS
> ---
>
> Key: SPARK-35973
> URL: https://issues.apache.org/jira/browse/SPARK-35973
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Priority: Major
>
> Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list 
> the catalogs and corresponding default-namespace info will be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372800#comment-17372800
 ] 

Apache Spark commented on SPARK-35973:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/33175

> DataSourceV2: Support SHOW CATALOGS
> ---
>
> Key: SPARK-35973
> URL: https://issues.apache.org/jira/browse/SPARK-35973
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Priority: Major
>
> Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list 
> the catalogs and corresponding default-namespace info will be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS

2021-07-01 Thread PengLei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PengLei updated SPARK-35973:

Description: Datasource V2 can support multiple catalogs. Having "SHOW 
CATALOGS" to list the catalogs and corresponding default-namespace info will be 
useful.  (was: Datasource V2 can support multiple catalogs. Having "SHOW 
CATALOGS" to list the catalogs/default-namespace info will be useful.)

> DataSourceV2: Support SHOW CATALOGS
> ---
>
> Key: SPARK-35973
> URL: https://issues.apache.org/jira/browse/SPARK-35973
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Priority: Major
>
> Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list 
> the catalogs and corresponding default-namespace info will be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS

2021-07-01 Thread PengLei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372797#comment-17372797
 ] 

PengLei commented on SPARK-35973:
-

I am working on this; I will follow up after 3.2 is released.

> DataSourceV2: Support SHOW CATALOGS
> ---
>
> Key: SPARK-35973
> URL: https://issues.apache.org/jira/browse/SPARK-35973
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Priority: Major
>
> Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list 
> the catalogs/default-namespace info will be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS

2021-07-01 Thread PengLei (Jira)
PengLei created SPARK-35973:
---

 Summary: DataSourceV2: Support SHOW CATALOGS
 Key: SPARK-35973
 URL: https://issues.apache.org/jira/browse/SPARK-35973
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: PengLei


Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list the 
catalogs/default-namespace info will be useful.
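
A sketch of what the command might look like, assuming it follows the style of the existing SHOW 
NAMESPACES / SHOW TABLES statements; the output columns shown are an assumption:

{code:scala}
// Hypothetical usage of the proposed SHOW CATALOGS command.
spark.sql("SHOW CATALOGS").show()
// +--------------+------------------+
// | catalog      | defaultNamespace |   <- assumed output columns
// +--------------+------------------+
{code}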



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"

2021-07-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35971.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33173
[https://github.com/apache/spark/pull/33173]

> Rename the type name of TimestampNTZType as "timestamp_ntz"
> ---
>
> Key: SPARK-35971
> URL: https://issues.apache.org/jira/browse/SPARK-35971
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Rename the type name string of TimestampNTZType from "timestamp without time 
> zone" to "timestamp_ntz".
> This is to make the column header shorter and simpler.
> Snowflake and Flink use a similar approach:
> https://docs.snowflake.com/en/sql-reference/data-types-datetime.html
> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35686) Avoid using auto generated alias when creating view

2021-07-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35686.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32832
[https://github.com/apache/spark/pull/32832]

> Avoid using auto generated alias when creating view
> ---
>
> Key: SPARK-35686
> URL: https://issues.apache.org/jira/browse/SPARK-35686
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Linhong Liu
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> If the user creates a view in 2.4 and reads it in 3.2, there will be an 
> incompatible schema issue. The root cause is that we changed the alias 
> auto-generation rule after 2.4. To avoid this happening again, we should 
> require the user to explicitly specify the column names.
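
As an illustration, a hedged sketch of the safer pattern, with explicit view column names instead 
of relying on auto-generated aliases (the table and column names here are made up):

{code:scala}
// Hypothetical example: name the view columns explicitly so the stored schema
// does not depend on Spark's alias auto-generation rule.
spark.sql("CREATE VIEW order_totals (order_id, total) AS SELECT id, price * qty FROM orders")
// versus the fragile form, which stores whatever alias Spark auto-generates:
// CREATE VIEW order_totals AS SELECT id, price * qty FROM orders
{code}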



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35972) NestColumnPruning cause execute loss output

2021-07-01 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372647#comment-17372647
 ] 

angerszhu commented on SPARK-35972:
---

We hit a case where the query analyzes, optimizes, and generates the SparkPlan 
fine, but at runtime the executor throws the exception shown above in the 
description. It looks like the child's output is missing data.

> NestColumnPruning cause execute loss output
> ---
>
> Key: SPARK-35972
> URL: https://issues.apache.org/jira/browse/SPARK-35972
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most 
> recent failure: Lost task 47.3 in stage 1.0 (TID 328) 
> (ip-idata-server.shopee.io executor 3): 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute, tree: _gen_alias_788#788
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.immutable.List.map(List.scala:298)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.immutable.List.map(List.scala:298)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> 

[jira] [Updated] (SPARK-35972) NestColumnPruning cause execute loss output

2021-07-01 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-35972:
--
Description: 
{code:java}
Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most 
recent failure: Lost task 47.3 in stage 1.0 (TID 328) 
(ip-idata-server.shopee.io executor 3): 
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
attribute, tree: _gen_alias_788#788
at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:386)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at 

[jira] [Created] (SPARK-35972) NestColumnPruning cause execute loss output

2021-07-01 Thread angerszhu (Jira)
angerszhu created SPARK-35972:
-

 Summary: NestColumnPruning cause execute loss output
 Key: SPARK-35972
 URL: https://issues.apache.org/jira/browse/SPARK-35972
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2
Reporter: angerszhu



{code:java}
Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most 
recent failure: Lost task 47.3 in stage 1.0 (TID 328) 
(ip-10-130-163-200.idata-server.shopee.io executor 3): 
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
attribute, tree: _gen_alias_788#788
at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:386)
at 

[jira] [Resolved] (SPARK-35966) Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings

2021-07-01 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-35966.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33169
[https://github.com/apache/spark/pull/33169]

> Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings
> ---
>
> Key: SPARK-35966
> URL: https://issues.apache.org/jira/browse/SPARK-35966
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.2.0
>
>
> see HIVE-17952



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35966) Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings

2021-07-01 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-35966:


Assignee: Kent Yao

> Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings
> ---
>
> Key: SPARK-35966
> URL: https://issues.apache.org/jira/browse/SPARK-35966
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>
> see HIVE-17952



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35965) Add documentation for ORC nested column vectorized reader

2021-07-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35965:


Assignee: Cheng Su

> Add documentation for ORC nested column vectorized reader
> -
>
> Key: SPARK-35965
> URL: https://issues.apache.org/jira/browse/SPARK-35965
> Project: Spark
>  Issue Type: Documentation
>  Components: docs, SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Trivial
>
> In https://issues.apache.org/jira/browse/SPARK-34862, we added support for 
> the ORC nested column vectorized reader, and it is disabled by default for now. 
> So we would like to add user-facing documentation for it, so that users can 
> opt in to use it if they want.
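For illustration, a minimal opt-in sketch. The flag name below is an assumption 
taken from the SPARK-34862 work and is not stated in this ticket; `spark` is a 
running SparkSession and the file path is made up.

{code:scala}
// Assumed flag name (from SPARK-34862); the reader is disabled by default in 3.2.0.
spark.conf.set("spark.sql.orc.enableNestedColumnVectorizedReader", "true")

// Reading an ORC file with nested struct/array/map columns would then go
// through the vectorized reader.
spark.read.orc("/tmp/nested_data.orc").show()
{code}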



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35965) Add documentation for ORC nested column vectorized reader

2021-07-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35965.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33168
[https://github.com/apache/spark/pull/33168]

> Add documentation for ORC nested column vectorized reader
> -
>
> Key: SPARK-35965
> URL: https://issues.apache.org/jira/browse/SPARK-35965
> Project: Spark
>  Issue Type: Documentation
>  Components: docs, SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Trivial
> Fix For: 3.2.0
>
>
> In https://issues.apache.org/jira/browse/SPARK-34862, we added support for 
> the ORC nested column vectorized reader, and it is disabled by default for now. 
> So we would like to add user-facing documentation for it, so that users can 
> opt in to use it if they want.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35685) Prompt recreating the View when there is a schema incompatible change

2021-07-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35685.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32831
[https://github.com/apache/spark/pull/32831]

> Prompt recreating the View when there is a schema incompatible change
> -
>
> Key: SPARK-35685
> URL: https://issues.apache.org/jira/browse/SPARK-35685
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Linhong Liu
>Assignee: Linhong Liu
>Priority: Major
> Fix For: 3.2.0
>
>
> Prompt recreating the view when there is an incompatible schema change. 
> Something like:
> "There is an incompatible schema change and the column couldn't be resolved. 
> Please consider recreating the view to fix this: CREATE OR REPLACE VIEW v AS 
> xxx"
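
As a rough illustration of the scenario (table and view names are made up, and 
the message above is only a suggested wording): a view stops resolving after an 
incompatible change to the underlying table, and recreating the view fixes it.

{code:scala}
// Hypothetical setup in spark-shell.
spark.sql("CREATE TABLE base(a INT, b INT) USING parquet")
spark.sql("CREATE VIEW v AS SELECT a, b FROM base")

// An incompatible schema change underneath the view: column b disappears.
spark.sql("DROP TABLE base")
spark.sql("CREATE TABLE base(a INT, c INT) USING parquet")

// spark.sql("SELECT * FROM v")   // now fails to resolve column b; the improved
//                                // message would suggest recreating the view:
spark.sql("CREATE OR REPLACE VIEW v AS SELECT a, c FROM base")
{code}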



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35685) Prompt recreating the View when there is a schema incompatible change

2021-07-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35685:
---

Assignee: Linhong Liu

> Prompt recreating the View when there is a schema incompatible change
> -
>
> Key: SPARK-35685
> URL: https://issues.apache.org/jira/browse/SPARK-35685
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Linhong Liu
>Assignee: Linhong Liu
>Priority: Major
>
> Prompt recreating the view when there is an incompatible schema change. 
> Something like:
> "There is an incompatible schema change and the column couldn't be resolved. 
> Please consider recreating the view to fix this: CREATE OR REPLACE VIEW v AS 
> xxx"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35618) Resolve star expressions in subquery

2021-07-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35618:
---

Assignee: Allison Wang

> Resolve star expressions in subquery
> 
>
> Key: SPARK-35618
> URL: https://issues.apache.org/jira/browse/SPARK-35618
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> Currently, Spark does not resolve star expressions in subqueries correctly. 
> It can only resolve the star expressions using the inner query attributes. 
> For example:
> {{CREATE VIEW t(a) AS VALUES (1), (2);}}
> {{SELECT * FROM t WHERE a in (SELECT t.*)}}
> {{SELECT * FROM t, LATERAL (SELECT t.*)}}
> {{org.apache.spark.sql.AnalysisException: cannot resolve 't.*' given input 
> columns '';}}
> Instead, we should try to resolve star expressions in a subquery first using 
> the inner attributes and then using the outer query attributes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35618) Resolve star expressions in subquery

2021-07-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35618.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32787
[https://github.com/apache/spark/pull/32787]

> Resolve star expressions in subquery
> 
>
> Key: SPARK-35618
> URL: https://issues.apache.org/jira/browse/SPARK-35618
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, Spark does not resolve star expressions in subqueries correctly. 
> It can only resolve the star expressions using the inner query attributes. 
> For example:
> {{CREATE VIEW t(a) AS VALUES (1), (2);}}
> {{SELECT * FROM t WHERE a in (SELECT t.*)}}
> {{SELECT * FROM t, LATERAL (SELECT t.*)}}
> {{org.apache.spark.sql.AnalysisException: cannot resolve 't.*' given input 
> columns '';}}
> Instead, we should try to resolve star expressions in a subquery first using 
> the inner attributes and then using the outer query attributes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35955) Fix decimal overflow issues for Average

2021-07-01 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372566#comment-17372566
 ] 

dgd_contributor commented on SPARK-35955:
-

I will raise a pull request soon

 

> Fix decimal overflow issues for Average
> ---
>
> Key: SPARK-35955
> URL: https://issues.apache.org/jira/browse/SPARK-35955
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karen Feng
>Priority: Major
>
> Fix decimal overflow issues for decimal average in ANSI mode. Linked to 
> SPARK-32018 and SPARK-28067, which address decimal sum.
> Repro:
>  
> {code:java}
> import org.apache.spark.sql.functions._
> spark.conf.set("spark.sql.ansi.enabled", true)
> val df = Seq(
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 1),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2),
>  (BigDecimal("1000"), 2)).toDF("decNum", "intNum")
> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, 
> "intNum").agg(mean("decNum"))
> df2.show(40,false)
> {code}
>  
> Should throw an exception (as sum overflows), but instead returns:
>  
> {code:java}
> +---+
> |avg(decNum)|
> +---+
> |null   |
> +---+{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35721) Path level discover for python unittests

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372560#comment-17372560
 ] 

Apache Spark commented on SPARK-35721:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33174

> Path level discover for python unittests
> 
>
> Key: SPARK-35721
> URL: https://issues.apache.org/jira/browse/SPARK-35721
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Yikun Jiang
>Priority: Major
>
> Currently we need to specify the Python test cases manually when we add a new 
> test case. Sometimes we forget to add the test case to the module list, so the 
> test case is not executed.
> For example:
>  * pyspark-core pyspark.tests.test_pin_thread
> Thus we need some auto-discovery mechanism to find all test cases rather than 
> specifying every case manually.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35721) Path level discover for python unittests

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372559#comment-17372559
 ] 

Apache Spark commented on SPARK-35721:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33174

> Path level discover for python unittests
> 
>
> Key: SPARK-35721
> URL: https://issues.apache.org/jira/browse/SPARK-35721
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Yikun Jiang
>Priority: Major
>
> Currently we need to specify the Python test cases manually when we add a new 
> test case. Sometimes we forget to add the test case to the module list, so the 
> test case is not executed.
> For example:
>  * pyspark-core pyspark.tests.test_pin_thread
> Thus we need some auto-discovery mechanism to find all test cases rather than 
> specifying every case manually.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372555#comment-17372555
 ] 

Apache Spark commented on SPARK-35971:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33173

> Rename the type name of TimestampNTZType as "timestamp_ntz"
> ---
>
> Key: SPARK-35971
> URL: https://issues.apache.org/jira/browse/SPARK-35971
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Rename the type name string of TimestampNTZType from "timestamp without time 
> zone" to "timestamp_ntz".
> This is to make the column header shorter and simpler.
> Snowflake and Flink use a similar approach:
> https://docs.snowflake.com/en/sql-reference/data-types-datetime.html
> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35971:


Assignee: Apache Spark  (was: Gengliang Wang)

> Rename the type name of TimestampNTZType as "timestamp_ntz"
> ---
>
> Key: SPARK-35971
> URL: https://issues.apache.org/jira/browse/SPARK-35971
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Rename the type name string of TimestampNTZType from "timestamp without time 
> zone" to "timestamp_ntz".
> This is to make the column header shorter and simpler.
> Snowflake and Flink use a similar approach:
> https://docs.snowflake.com/en/sql-reference/data-types-datetime.html
> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35971:


Assignee: Gengliang Wang  (was: Apache Spark)

> Rename the type name of TimestampNTZType as "timestamp_ntz"
> ---
>
> Key: SPARK-35971
> URL: https://issues.apache.org/jira/browse/SPARK-35971
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Rename the type name string of TimestampNTZType from "timestamp without time 
> zone" to "timestamp_ntz".
> This is to make the column header shorter and simpler.
> Snowflake and Flink use a similar approach:
> https://docs.snowflake.com/en/sql-reference/data-types-datetime.html
> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372553#comment-17372553
 ] 

Apache Spark commented on SPARK-35971:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33173

> Rename the type name of TimestampNTZType as "timestamp_ntz"
> ---
>
> Key: SPARK-35971
> URL: https://issues.apache.org/jira/browse/SPARK-35971
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Rename the type name string of TimestampNTZType from "timestamp without time 
> zone" to "timestamp_ntz".
> This is to make the column header shorter and simpler.
> Snowflake and Flink use a similar approach:
> https://docs.snowflake.com/en/sql-reference/data-types-datetime.html
> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35968) Make sure partitions are not too small in AQE partition coalescing

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35968:


Assignee: Apache Spark

> Make sure partitions are not too small in AQE partition coalescing
> --
>
> Key: SPARK-35968
> URL: https://issues.apache.org/jira/browse/SPARK-35968
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35968) Make sure partitions are not too small in AQE partition coalescing

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35968:


Assignee: (was: Apache Spark)

> Make sure partitions are not too small in AQE partition coalescing
> --
>
> Key: SPARK-35968
> URL: https://issues.apache.org/jira/browse/SPARK-35968
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35968) Make sure partitions are not too small in AQE partition coalescing

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372552#comment-17372552
 ] 

Apache Spark commented on SPARK-35968:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33172

> Make sure partitions are not too small in AQE partition coalescing
> --
>
> Key: SPARK-35968
> URL: https://issues.apache.org/jira/browse/SPARK-35968
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"

2021-07-01 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35971:
--

 Summary: Rename the type name of TimestampNTZType as 
"timestamp_ntz"
 Key: SPARK-35971
 URL: https://issues.apache.org/jira/browse/SPARK-35971
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Rename the type name string of TimestampNTZType from "timestamp without time 
zone" to "timestamp_ntz".
This is to make the column header shorter and simpler.
Snowflake and Flink use a similar approach:
https://docs.snowflake.com/en/sql-reference/data-types-datetime.html
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/
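
For illustration, a hedged sketch of the user-visible effect. The function 
to_timestamp_ntz is only referenced via SPARK-35963, and the printed type name 
is the expected outcome of this ticket, not verified output.

{code:scala}
spark.sql("SELECT to_timestamp_ntz('2021-07-01 00:00:00') AS ts").printSchema()
// Expected after this change (shorter column/type name):
// root
//  |-- ts: timestamp_ntz (nullable = true)
{code}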



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35963) Rename TimestampWithoutTZType to TimestampNTZType

2021-07-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35963.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33167
[https://github.com/apache/spark/pull/33167]

> Rename TimestampWithoutTZType to TimestampNTZType
> -
>
> Key: SPARK-35963
> URL: https://issues.apache.org/jira/browse/SPARK-35963
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> The type name of `TimestampWithoutTZType` is verbose. Rename it to 
> `TimestampNTZType` so that
> 1. it is easier to read and type.
> 2. As we have the function to_timestamp_ntz, this makes the names consistent.
> 3. We will introduce a new SQL configuration `spark.sql.timestampType` for 
> the default timestamp type. The configuration values can be "TIMESTAMP_NTZ" 
> or "TIMESTAMP_LTZ" for simplicity.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35969) Make the pod prefix more readable and tallied with K8S DNS Label Names

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35969:


Assignee: Apache Spark

> Make the pod prefix more readable and tallied with K8S DNS Label Names
> --
>
> Key: SPARK-35969
> URL: https://issues.apache.org/jira/browse/SPARK-35969
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>
> By default, the executor pod prefix is generated from the app name. It handles 
> characters that match [^a-z0-9\\-] differently: '.' and all whitespace are 
> converted to '-', but other disallowed characters are dropped entirely. In 
> particular, characters like '_' and '|' are commonly used as word separators 
> in many languages.
> According to the K8S DNS Label Names (see 
> [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names]), 
> we can convert all special characters to `-`.
>  
> {code:scala}
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-")
> res9: String = time-is-the-most-valuable-thing-it-s-about-time-
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("\\s+", "-").replaceAll("\\.", 
> "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-")
> res10: String = time-isthemostvaluablethingits-about-time-
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35970) Allow predicate for pyspark.sql.functions.array_sort

2021-07-01 Thread Ramanan Subramanian (Jira)
Ramanan Subramanian created SPARK-35970:
---

 Summary: Allow predicate for pyspark.sql.functions.array_sort
 Key: SPARK-35970
 URL: https://issues.apache.org/jira/browse/SPARK-35970
 Project: Spark
  Issue Type: Wish
  Components: SQL
Affects Versions: 3.1.2
Reporter: Ramanan Subramanian


Currently, neither the Python API nor the Scala API for the SQL function 
`array_sort` accepts a predicate/comparator lambda expression as a 
second argument. Hence, we have to resort to `expr` or `selectExpr` 
and use the SQL DSL for the comparator function. It would be nice to allow this, 
just like the other higher-order functions.
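
For context, a minimal sketch of the workaround mentioned above, written for 
spark-shell on 3.1.2; the data and column names are made up. The comparator goes 
through the SQL lambda syntax via `expr`, because the typed `array_sort` helper 
takes no second argument.

{code:scala}
import org.apache.spark.sql.functions._

val df = Seq(Seq("bb", "a", "ccc")).toDF("xs")

// Sort by string length using the SQL comparator form of array_sort.
df.select(
  expr("array_sort(xs, (l, r) -> CASE WHEN length(l) < length(r) THEN -1 " +
    "WHEN length(l) > length(r) THEN 1 ELSE 0 END)").as("by_length")
).show(false)
// expected: [a, bb, ccc]
{code}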



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35969) Make the pod prefix more readable and tallied with K8S DNS Label Names

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372532#comment-17372532
 ] 

Apache Spark commented on SPARK-35969:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/33171

> Make the pod prefix more readable and tallied with K8S DNS Label Names
> --
>
> Key: SPARK-35969
> URL: https://issues.apache.org/jira/browse/SPARK-35969
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Priority: Major
>
> By default, the executor pod prefix is generated from the app name. It handles 
> characters that match [^a-z0-9\\-] differently: '.' and all whitespace are 
> converted to '-', but other disallowed characters are dropped entirely. In 
> particular, characters like '_' and '|' are commonly used as word separators 
> in many languages.
> According to the K8S DNS Label Names (see 
> [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names]), 
> we can convert all special characters to `-`.
>  
> {code:scala}
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-")
> res9: String = time-is-the-most-valuable-thing-it-s-about-time-
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("\\s+", "-").replaceAll("\\.", 
> "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-")
> res10: String = time-isthemostvaluablethingits-about-time-
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35969) Make the pod prefix more readable and tallied with K8S DNS Label Names

2021-07-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35969:


Assignee: (was: Apache Spark)

> Make the pod prefix more readable and tallied with K8S DNS Label Names
> --
>
> Key: SPARK-35969
> URL: https://issues.apache.org/jira/browse/SPARK-35969
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Priority: Major
>
> By default, the executor pod prefix is generated from the app name. It handles 
> characters that match [^a-z0-9\\-] differently: '.' and all whitespace are 
> converted to '-', but other disallowed characters are dropped entirely. In 
> particular, characters like '_' and '|' are commonly used as word separators 
> in many languages.
> According to the K8S DNS Label Names (see 
> [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names]), 
> we can convert all special characters to `-`.
>  
> {code:scala}
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-")
> res9: String = time-is-the-most-valuable-thing-it-s-about-time-
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("\\s+", "-").replaceAll("\\.", 
> "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-")
> res10: String = time-isthemostvaluablethingits-about-time-
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35969) Make the pod prefix more readable and tallied with K8S DNS Label Names

2021-07-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372531#comment-17372531
 ] 

Apache Spark commented on SPARK-35969:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/33171

> Make the pod prefix more readable and tallied with K8S DNS Label Names
> --
>
> Key: SPARK-35969
> URL: https://issues.apache.org/jira/browse/SPARK-35969
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Priority: Major
>
> By default, the executor pod prefix is generated from the app name. It handles 
> characters that match [^a-z0-9\\-] differently: '.' and all whitespace are 
> converted to '-', but other disallowed characters are dropped entirely. In 
> particular, characters like '_' and '|' are commonly used as word separators 
> in many languages.
> According to the K8S DNS Label Names (see 
> [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names]), 
> we can convert all special characters to `-`.
>  
> {code:scala}
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-")
> res9: String = time-is-the-most-valuable-thing-it-s-about-time-
> scala> "time.is%the¥most$valuable_——thing,it's about 
> time.".replaceAll("\\s+", "-").replaceAll("\\.", 
> "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-")
> res10: String = time-isthemostvaluablethingits-about-time-
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35969) Make the pod prefix more readable and tallied with K8S DNS Label Names

2021-07-01 Thread Kent Yao (Jira)
Kent Yao created SPARK-35969:


 Summary: Make the pod prefix more readable and tallied with K8S 
DNS Label Names
 Key: SPARK-35969
 URL: https://issues.apache.org/jira/browse/SPARK-35969
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.2.0
Reporter: Kent Yao


By default, the executor pod prefix is generated from the app name. It handles 
characters that match [^a-z0-9\\-] differently: '.' and all whitespace are 
converted to '-', but other disallowed characters are dropped entirely. In 
particular, characters like '_' and '|' are commonly used as word separators 
in many languages.

According to the K8S DNS Label Names (see 
[https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names]), 
we can convert all special characters to `-`.

 
{code:scala}
scala> "time.is%the¥most$valuable_——thing,it's about 
time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-")
res9: String = time-is-the-most-valuable-thing-it-s-about-time-

scala> "time.is%the¥most$valuable_——thing,it's about 
time.".replaceAll("\\s+", "-").replaceAll("\\.", 
"-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-")
res10: String = time-isthemostvaluablethingits-about-time-

{code}
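
A hedged sketch of the proposed normalization as a standalone helper; the method 
name and placement are made up for illustration, and only the replaceAll chain 
comes from the snippet above.

{code:scala}
// Illustrative helper only, not the actual Spark code path.
def toDnsLabelPrefix(appName: String): String = {
  appName.toLowerCase
    .replaceAll("[^a-z0-9\\-]", "-")   // every disallowed character becomes '-'
    .replaceAll("-+", "-")             // collapse consecutive '-'
}

// toDnsLabelPrefix("time.is%the most_valuable|thing")
//   => "time-is-the-most-valuable-thing"
{code}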



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


