[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe

2021-09-02 Thread Yuto Akutsu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuto Akutsu updated SPARK-36660:

Priority: Minor  (was: Major)

> Cotangent is not supported by Dataframe
> ---
>
> Key: SPARK-36660
> URL: https://issues.apache.org/jira/browse/SPARK-36660
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Yuto Akutsu
>Priority: Minor
>
> Cotangent is implemented in mathExpressions but cannot be called by dataframe 
> operations like other math expressions (e.g. {{df.select(sin($"col"))}}).
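>
For context, a minimal sketch of the asymmetry being reported, assuming an active SparkSession `spark` and a DataFrame `df` with a numeric column `col`:

{code:scala}
import org.apache.spark.sql.functions._
import spark.implicits._

df.select(sin($"col"))    // works: sin exists in org.apache.spark.sql.functions
df.selectExpr("cot(col)") // workaround only: the SQL COT function exists, but
                          // there is no cot() DataFrame function in 3.1.2
{code}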



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36661) Support TimestampNTZ in PyArrow

2021-09-02 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-36661:


 Summary: Support TimestampNTZ in PyArrow
 Key: SPARK-36661
 URL: https://issues.apache.org/jira/browse/SPARK-36661
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon









[jira] [Updated] (SPARK-36661) Support TimestampNTZ in Py4J

2021-09-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36661:
-
Summary: Support TimestampNTZ in Py4J  (was: Support TimestampNTZ in 
PyArrow)

> Support TimestampNTZ in Py4J
> 
>
> Key: SPARK-36661
> URL: https://issues.apache.org/jira/browse/SPARK-36661
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe

2021-09-02 Thread Yuto Akutsu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuto Akutsu updated SPARK-36660:

Description: Cotangent is implemented in mathExpressions but cannot be 
called by dataframe operations like other math expressions (e.g. 
{{df.select(sin($"col"))}}).  (was: Cotangent is implemented in mathExpressions 
but cannot be called by dataframe operations like other math expressions (e.g. 
df.select(sin($"col"))).)

> Cotangent is not supported by Dataframe
> ---
>
> Key: SPARK-36660
> URL: https://issues.apache.org/jira/browse/SPARK-36660
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Yuto Akutsu
>Priority: Major
>
> Cotangent is implemented in mathExpressions but cannot be called by dataframe 
> operations like other math expressions (e.g. {{df.select(sin($"col"))}}).






[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe

2021-09-02 Thread Yuto Akutsu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuto Akutsu updated SPARK-36660:

Description: Cotangent is implemented in mathExpressions but cannot be 
called by dataframe operations like other math expressions (e.g. 
\{code}df.select(sin($"col"))\{code}).  (was: Cotangent is implemented in 
mathExpressions but cannot be called by dataframe operations like other math 
expressions (e.g. df.select(sin($"col"))).)

> Cotangent is not supported by Dataframe
> ---
>
> Key: SPARK-36660
> URL: https://issues.apache.org/jira/browse/SPARK-36660
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Yuto Akutsu
>Priority: Major
>
> Cotangent is implemented in mathExpressions but cannot be called by dataframe 
> operations like other math expressions (e.g. 
> \{code}df.select(sin($"col"))\{code}).






[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe

2021-09-02 Thread Yuto Akutsu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuto Akutsu updated SPARK-36660:

Description: Cotangent is implemented in mathExpressions but cannot be 
called by dataframe operations like other math expressions (e.g. 
df.select(sin($"col"))).  (was: Cotangent is implemented in mathExpressions but 
cannot be called by dataframe operations like other math expressions (e.g. 
\{code}df.select(sin($"col"))\{code}).)

> Cotangent is not supported by Dataframe
> ---
>
> Key: SPARK-36660
> URL: https://issues.apache.org/jira/browse/SPARK-36660
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Yuto Akutsu
>Priority: Major
>
> Cotangent is implemented in mathExpressions but cannot be called by dataframe 
> operations like other math expressions (e.g. df.select(sin($"col"))).






[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe

2021-09-02 Thread Yuto Akutsu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuto Akutsu updated SPARK-36660:

Description: Cotangent is implemented in mathExpressions but cannot be 
called by dataframe operations like other math expressions (e.g. 
df.select(sin($"col"))).  (was: Cotangent is implemented in ^mathExpressions^ 
but cannot be called by dataframe operations like other math expressions (e.g. 
^df.select(sin($"col"))^).)

> Cotangent is not supported by Dataframe
> ---
>
> Key: SPARK-36660
> URL: https://issues.apache.org/jira/browse/SPARK-36660
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Yuto Akutsu
>Priority: Major
>
> Cotangent is implemented in mathExpressions but cannot be called by dataframe 
> operations like other math expressions (e.g. df.select(sin($"col"))).






[jira] [Assigned] (SPARK-36658) Expose executionId to QueryExecutionListener

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36658:


Assignee: Apache Spark

> Expose executionId to QueryExecutionListener
> 
>
> Key: SPARK-36658
> URL: https://issues.apache.org/jira/browse/SPARK-36658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: huangtengfei
>Assignee: Apache Spark
>Priority: Minor
>
> Now in 
> [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38]
> we have exposed APIs to get query execution information:
> def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
> def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit
>  
> But we cannot clearly tell which query a given callback corresponds to. In
> Spark SQL, the executionId is the direct identifier of a query execution, so I
> think it makes sense to expose executionId to the QueryExecutionListener, so
> that people can easily find the exact query in the UI or history server and
> track more information about the query execution. And there is no easy way to
> find the relevant executionId from a QueryExecution object.
>  
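
For context, a minimal listener sketch against the current API (the class name is illustrative); note there is no executionId field on QueryExecution to log here, which is the gap this ticket describes:

{code:scala}
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

class AuditListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName succeeded in ${durationNs / 1000000} ms") // but which query was it?

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"$funcName failed: ${exception.getMessage}")
}

// Registration, assuming an active SparkSession `spark`:
// spark.listenerManager.register(new AuditListener)
{code}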






[jira] [Assigned] (SPARK-36658) Expose executionId to QueryExecutionListener

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36658:


Assignee: (was: Apache Spark)

> Expose executionId to QueryExecutionListener
> 
>
> Key: SPARK-36658
> URL: https://issues.apache.org/jira/browse/SPARK-36658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: huangtengfei
>Priority: Minor
>
> Now in 
> [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38]
> we have exposed APIs to get query execution information:
> def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
> def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit
>  
> But we cannot clearly tell which query a given callback corresponds to. In
> Spark SQL, the executionId is the direct identifier of a query execution, so I
> think it makes sense to expose executionId to the QueryExecutionListener, so
> that people can easily find the exact query in the UI or history server and
> track more information about the query execution. And there is no easy way to
> find the relevant executionId from a QueryExecution object.
>  






[jira] [Commented] (SPARK-36658) Expose executionId to QueryExecutionListener

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409254#comment-17409254
 ] 

Apache Spark commented on SPARK-36658:
--

User 'ivoson' has created a pull request for this issue:
https://github.com/apache/spark/pull/33905

> Expose executionId to QueryExecutionListener
> 
>
> Key: SPARK-36658
> URL: https://issues.apache.org/jira/browse/SPARK-36658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: huangtengfei
>Priority: Minor
>
> Now in 
> [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38]
> we have exposed APIs to get query execution information:
> def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
> def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit
>  
> But we cannot clearly tell which query a given callback corresponds to. In
> Spark SQL, the executionId is the direct identifier of a query execution, so I
> think it makes sense to expose executionId to the QueryExecutionListener, so
> that people can easily find the exact query in the UI or history server and
> track more information about the query execution. And there is no easy way to
> find the relevant executionId from a QueryExecution object.
>  






[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe

2021-09-02 Thread Yuto Akutsu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuto Akutsu updated SPARK-36660:

Description: Cotangent is implemented in ^mathExpressions^ but cannot be 
called by dataframe operations like other math expressions (e.g. 
^df.select(sin($"col"))^).  (was: Cotangent is implemented in mathExpressions 
but cannot be called by dataframe operations like other math expressions (e.g. 
^df.select(sin($"col"))^).)

> Cotangent is not supported by Dataframe
> ---
>
> Key: SPARK-36660
> URL: https://issues.apache.org/jira/browse/SPARK-36660
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Yuto Akutsu
>Priority: Major
>
> Cotangent is implemented in ^mathExpressions^ but cannot be called by 
> dataframe operations like other math expressions (e.g. 
> ^df.select(sin($"col"))^).






[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe

2021-09-02 Thread Yuto Akutsu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuto Akutsu updated SPARK-36660:

Description: Cotangent is implemented in mathExpressions but cannot be 
called by dataframe operations like other math expressions (e.g. 
^df.select(sin($"col"))^).  (was: Cotangent is implemented in mathExpressions 
but cannot be called by dataframe operations like other math expressions (e.g. 
`df.select(sin($"col"))`).)

> Cotangent is not supported by Dataframe
> ---
>
> Key: SPARK-36660
> URL: https://issues.apache.org/jira/browse/SPARK-36660
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Yuto Akutsu
>Priority: Major
>
> Cotangent is implemented in mathExpressions but cannot be called by dataframe 
> operations like other math expressions (e.g. ^df.select(sin($"col"))^).






[jira] [Created] (SPARK-36660) Cotangent is not supported by Dataframe

2021-09-02 Thread Yuto Akutsu (Jira)
Yuto Akutsu created SPARK-36660:
---

 Summary: Cotangent is not supported by Dataframe
 Key: SPARK-36660
 URL: https://issues.apache.org/jira/browse/SPARK-36660
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2
Reporter: Yuto Akutsu


Cotangent is implemented in mathExpressions but cannot be called by dataframe 
operations like other math expressions (e.g. `df.select(sin($"col"))`).






[jira] [Commented] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409240#comment-17409240
 ] 

Apache Spark commented on SPARK-36659:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/33904

> Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config
> --
>
> Key: SPARK-36659
> URL: https://issues.apache.org/jira/browse/SPARK-36659
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Kent Yao
>Priority: Minor
>
> spark.sql.execution.topKSortFallbackThreshold is currently an internal config
> hidden from users, with Integer.MAX_VALUE - 15 as its default. In many
> real-world cases, if K is very large, there can be performance issues.
>  
>  
>  
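
As a hedged sketch of what a user-facing config would enable (an active SparkSession `spark` and an `events` table are assumed): Spark plans ORDER BY ... LIMIT k as a top-K TakeOrderedAndProject only while k is at or below the threshold, and falls back to a full sort above it.

{code:scala}
// Tune the fallback point per workload instead of relying on the hidden default.
spark.conf.set("spark.sql.execution.topKSortFallbackThreshold", 100000)
spark.sql("SELECT * FROM events ORDER BY ts LIMIT 50000").explain()
{code}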






[jira] [Assigned] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36659:


Assignee: Apache Spark

> Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config
> --
>
> Key: SPARK-36659
> URL: https://issues.apache.org/jira/browse/SPARK-36659
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Minor
>
> spark.sql.execution.topKSortFallbackThreshold is currently an internal config
> hidden from users, with Integer.MAX_VALUE - 15 as its default. In many
> real-world cases, if K is very large, there can be performance issues.
>  
>  
>  






[jira] [Assigned] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36659:


Assignee: (was: Apache Spark)

> Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config
> --
>
> Key: SPARK-36659
> URL: https://issues.apache.org/jira/browse/SPARK-36659
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Kent Yao
>Priority: Minor
>
> spark.sql.execution.topKSortFallbackThreshold is currently an internal config
> hidden from users, with Integer.MAX_VALUE - 15 as its default. In many
> real-world cases, if K is very large, there can be performance issues.
>  
>  
>  






[jira] [Commented] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409239#comment-17409239
 ] 

Apache Spark commented on SPARK-36659:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/33904

> Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config
> --
>
> Key: SPARK-36659
> URL: https://issues.apache.org/jira/browse/SPARK-36659
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Kent Yao
>Priority: Minor
>
> spark.sql.execution.topKSortFallbackThreshold is currently an internal config
> hidden from users, with Integer.MAX_VALUE - 15 as its default. In many
> real-world cases, if K is very large, there can be performance issues.
>  
>  
>  






[jira] [Resolved] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join

2021-09-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-36652.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33899
[https://github.com/apache/spark/pull/33899]

> AQE dynamic join selection should not apply to non-equi join
> 
>
> Key: SPARK-36652
> URL: https://issues.apache.org/jira/browse/SPARK-36652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Fix For: 3.3.0
>
>
> Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash
> join, and 2. promote shuffled hash join. Both are achieved by adding a join
> hint to the query plan, and both only work for equi joins. However, the rule
> currently matches on the `Join` operator - 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71]
> - so it would add hints to non-equi joins by mistake.
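
To illustrate the distinction (`a` and `b` are assumed existing DataFrames; this is not the rule's own code):

{code:scala}
val equi    = a.join(b, a("id") === b("id")) // equi join: the hints can apply
val nonEqui = a.join(b, a("id") < b("id"))   // non-equi join: the rule should skip it
{code}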






[jira] [Assigned] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join

2021-09-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-36652:
-

Assignee: Cheng Su

> AQE dynamic join selection should not apply to non-equi join
> 
>
> Key: SPARK-36652
> URL: https://issues.apache.org/jira/browse/SPARK-36652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
>
> Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash
> join, and 2. promote shuffled hash join. Both are achieved by adding a join
> hint to the query plan, and both only work for equi joins. However, the rule
> currently matches on the `Join` operator - 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71]
> - so it would add hints to non-equi joins by mistake.






[jira] [Updated] (SPARK-36633) DivideDTInterval should throw the same exception when dividing by zero.

2021-09-02 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-36633:
---
Summary: DivideDTInterval should throw the same exception when dividing by 
zero.  (was: DivideDTInterval should consider ansi mode.)

> DivideDTInterval should throw the same exception when dividing by zero.
> -
>
> Key: SPARK-36633
> URL: https://issues.apache.org/jira/browse/SPARK-36633
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> DivideDTInterval does not consider ANSI mode; we should support it.
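
A hedged illustration of the expected behavior, assuming an active SparkSession `spark`:

{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT INTERVAL '10' DAY / 0").show() // expected: a divide-by-zero error

spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT INTERVAL '10' DAY / 0").show() // expected: the same error
{code}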






[jira] [Created] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config

2021-09-02 Thread Kent Yao (Jira)
Kent Yao created SPARK-36659:


 Summary: Promote spark.sql.execution.topKSortFallbackThreshold to 
user-facing config
 Key: SPARK-36659
 URL: https://issues.apache.org/jira/browse/SPARK-36659
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: Kent Yao


spark.sql.execution.topKSortFallbackThreshold is currently an internal config
hidden from users, with Integer.MAX_VALUE - 15 as its default. In many
real-world cases, if K is very large, there can be performance issues.

 

 

 






[jira] [Resolved] (SPARK-36650) ApplicationMaster shutdown hook should catch timeout exception

2021-09-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36650.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33897
[https://github.com/apache/spark/pull/33897]

> ApplicationMaster shutdown hook should catch timeout exception
> --
>
> Key: SPARK-36650
> URL: https://issues.apache.org/jira/browse/SPARK-36650
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> ApplicationMaster shutdown hook should catch timeout exception
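
A generic Scala sketch of the intent (illustrative names, not the actual ApplicationMaster code): the hook catches the timeout so it cannot derail the rest of the shutdown sequence.

{code:scala}
import java.util.concurrent.TimeoutException

object GracefulShutdown {
  // Hypothetical cleanup step that may time out (e.g. unregistering from the RM).
  def cleanup(): Unit = { /* ... */ }

  def install(): Unit = sys.addShutdownHook {
    try cleanup()
    catch {
      case e: TimeoutException =>
        System.err.println(s"Ignoring timeout during shutdown: ${e.getMessage}")
    }
  }
}
{code}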






[jira] [Assigned] (SPARK-36650) ApplicationMaster shutdown hook should catch timeout exception

2021-09-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36650:


Assignee: angerszhu

> ApplicationMaster shutdown hook should catch timeout exception
> --
>
> Key: SPARK-36650
> URL: https://issues.apache.org/jira/browse/SPARK-36650
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> ApplicationMaster shutdown hook should catch timeout exception






[jira] [Comment Edited] (SPARK-36658) Expose executionId to QueryExecutionListener

2021-09-02 Thread huangtengfei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409210#comment-17409210
 ] 

huangtengfei edited comment on SPARK-36658 at 9/3/21, 2:36 AM:
---

cc [~cloud_fan] could you share thoughts about this?


was (Author: ivoson):
cc [~cloud_fan] could you share any thoughts about this?

> Expose executionId to QueryExecutionListener
> 
>
> Key: SPARK-36658
> URL: https://issues.apache.org/jira/browse/SPARK-36658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: huangtengfei
>Priority: Minor
>
> Now in 
> [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38]
> we have exposed APIs to get query execution information:
> def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
> def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit
>  
> But we cannot clearly tell which query a given callback corresponds to. In
> Spark SQL, the executionId is the direct identifier of a query execution, so I
> think it makes sense to expose executionId to the QueryExecutionListener, so
> that people can easily find the exact query in the UI or history server and
> track more information about the query execution. And there is no easy way to
> find the relevant executionId from a QueryExecution object.
>  






[jira] [Commented] (SPARK-36658) Expose executionId to QueryExecutionListener

2021-09-02 Thread huangtengfei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409228#comment-17409228
 ] 

huangtengfei commented on SPARK-36658:
--

Will create a PR for this.

> Expose executionId to QueryExecutionListener
> 
>
> Key: SPARK-36658
> URL: https://issues.apache.org/jira/browse/SPARK-36658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: huangtengfei
>Priority: Minor
>
> Now in 
> [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38]
> we have exposed APIs to get query execution information:
> def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
> def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit
>  
> But we cannot clearly tell which query a given callback corresponds to. In
> Spark SQL, the executionId is the direct identifier of a query execution, so I
> think it makes sense to expose executionId to the QueryExecutionListener, so
> that people can easily find the exact query in the UI or history server and
> track more information about the query execution. And there is no easy way to
> find the relevant executionId from a QueryExecution object.
>  






[jira] [Commented] (SPARK-36635) spark-sql does NOT support that select name expression as string type now

2021-09-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409225#comment-17409225
 ] 

Hyukjin Kwon commented on SPARK-36635:
--

Can you justify why it should work? Is it in the ANSI standard, or do other DBMSes support it?

> spark-sql does NOT support that select name expression as string type now
> 
>
> Key: SPARK-36635
> URL: https://issues.apache.org/jira/browse/SPARK-36635
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: weixiuli
>Priority: Major
>
> The following statement throws an exception.
> {code:java}
>  sql("SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') 
> people(age, name)")
> {code}
> {code:java}
> // Exception information
> Error in query:
> mismatched input ''a'' expecting {<EOF>, ';'}(line 1, pos 14)
> == SQL ==
> SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') 
> people(age, name)
> --^^^
> {code}
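
For comparison, aliases written as bare or backquoted identifiers do parse today (assuming an active SparkSession `spark`); only the string-literal form is rejected:

{code:scala}
spark.sql(
  "SELECT age AS a, name AS `n` FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)"
).show()
{code}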






[jira] [Commented] (SPARK-36635) spark-sql does NOT support that select name expression as string type now

2021-09-02 Thread weixiuli (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409222#comment-17409222
 ] 

weixiuli commented on SPARK-36635:
--

It didn't work before either.

> spark-sql does NOT support that select name expression as string type now
> 
>
> Key: SPARK-36635
> URL: https://issues.apache.org/jira/browse/SPARK-36635
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.2
>Reporter: weixiuli
>Priority: Major
>
> The following statement throws an exception.
> {code:java}
>  sql("SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') 
> people(age, name)")
> {code}
> {code:java}
> // Exception information
> Error in query:
> mismatched input ''a'' expecting {<EOF>, ';'}(line 1, pos 14)
> == SQL ==
> SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') 
> people(age, name)
> --^^^
> {code}






[jira] [Resolved] (SPARK-36351) Separate partition filters and data filters in PushDownUtils

2021-09-02 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-36351.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33650
[https://github.com/apache/spark/pull/33650]

> Separate partition filters and data filters in PushDownUtils
> 
>
> Key: SPARK-36351
> URL: https://issues.apache.org/jira/browse/SPARK-36351
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.3.0
>
>
> Currently, DSv2 partition filters and data filters are separated in 
> PruneFileSourcePartitions. It's better to separate these in PushDownUtils, 
> where we do filter/aggregate push down and column pruning, so we can still 
> push down aggregate for FileScan if the filters are only partition filters.
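
A hypothetical sketch of the separation (names are illustrative, not Spark's actual PushDownUtils API): split the pushed filters by whether they reference partition columns only.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{AttributeSet, Expression}

def splitFilters(
    filters: Seq[Expression],
    partitionCols: AttributeSet): (Seq[Expression], Seq[Expression]) =
  // partition filters touch partition columns only; everything else is a data filter
  filters.partition(f => f.references.nonEmpty && f.references.subsetOf(partitionCols))
{code}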






[jira] [Assigned] (SPARK-36351) Separate partition filters and data filters in PushDownUtils

2021-09-02 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-36351:
---

Assignee: Huaxin Gao

> Separate partition filters and data filters in PushDownUtils
> 
>
> Key: SPARK-36351
> URL: https://issues.apache.org/jira/browse/SPARK-36351
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
>
> Currently, DSv2 partition filters and data filters are separated in 
> PruneFileSourcePartitions. It's better to separate these in PushDownUtils, 
> where we do filter/aggregate push down and column pruning, so we can still 
> push down aggregate for FileScan if the filters are only partition filters.






[jira] [Updated] (SPARK-36658) Expose executionId to QueryExecutionListener

2021-09-02 Thread huangtengfei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangtengfei updated SPARK-36658:
-
Description: 
Now in 
[QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38]
 we have exposed APIs to get query execution information:

def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit

def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit

But we cannot clearly tell which query a given callback corresponds to. In Spark
SQL, the executionId is the direct identifier of a query execution, so I think
it makes sense to expose executionId to the QueryExecutionListener, so that
people can easily find the exact query in the UI or history server and track
more information about the query execution. And there is no easy way to find
the relevant executionId from a QueryExecution object.

 

  was:
Now in 
[QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38]
 we have exposed APIs to get query execution information:

def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit

def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit

But we cannot clearly tell which query a given callback corresponds to. In Spark
SQL, the executionId is the direct identifier of a query execution, so I think
it makes sense to expose executionId to the QueryExecutionListener, so that
people can easily find the exact query in the UI or history server and track
more information about the query execution.

 


> Expose executionId to QueryExecutionListener
> 
>
> Key: SPARK-36658
> URL: https://issues.apache.org/jira/browse/SPARK-36658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: huangtengfei
>Priority: Minor
>
> Now in 
> [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38]
> we have exposed APIs to get query execution information:
> def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
> def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit
>  
> But we cannot clearly tell which query a given callback corresponds to. In
> Spark SQL, the executionId is the direct identifier of a query execution, so I
> think it makes sense to expose executionId to the QueryExecutionListener, so
> that people can easily find the exact query in the UI or history server and
> track more information about the query execution. And there is no easy way to
> find the relevant executionId from a QueryExecution object.
>  






[jira] [Commented] (SPARK-36658) Expose executionId to QueryExecutionListener

2021-09-02 Thread huangtengfei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409210#comment-17409210
 ] 

huangtengfei commented on SPARK-36658:
--

cc [~cloud_fan] could you share any thoughts about this?

> Expose executionId to QueryExecutionListener
> 
>
> Key: SPARK-36658
> URL: https://issues.apache.org/jira/browse/SPARK-36658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: huangtengfei
>Priority: Minor
>
> Now in 
> [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38]
> we have exposed APIs to get query execution information:
> def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
> def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit
>  
> But we cannot clearly tell which query a given callback corresponds to. In
> Spark SQL, the executionId is the direct identifier of a query execution, so I
> think it makes sense to expose executionId to the QueryExecutionListener, so
> that people can easily find the exact query in the UI or history server and
> track more information about the query execution.
>  






[jira] [Created] (SPARK-36658) Expose executionId to QueryExecutionListener

2021-09-02 Thread huangtengfei (Jira)
huangtengfei created SPARK-36658:


 Summary: Expose executionId to QueryExecutionListener
 Key: SPARK-36658
 URL: https://issues.apache.org/jira/browse/SPARK-36658
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2
Reporter: huangtengfei


Now in 
[QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38]
 we have exposed APIs to get query execution information:

def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit

def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit

But we cannot clearly tell which query a given callback corresponds to. In Spark
SQL, the executionId is the direct identifier of a query execution, so I think
it makes sense to expose executionId to the QueryExecutionListener, so that
people can easily find the exact query in the UI or history server and track
more information about the query execution.

 






[jira] [Updated] (SPARK-36657) Update comment in `gen-sql-config-docs.py`

2021-09-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36657:
--
Affects Version/s: 3.2.0

> Update comment in `gen-sql-config-docs.py`
> --
>
> Key: SPARK-36657
> URL: https://issues.apache.org/jira/browse/SPARK-36657
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: William Hyun
>Assignee: William Hyun
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Assigned] (SPARK-36657) Update comment in `gen-sql-config-docs.py`

2021-09-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-36657:
-

Assignee: William Hyun

> Update comment in `gen-sql-config-docs.py`
> --
>
> Key: SPARK-36657
> URL: https://issues.apache.org/jira/browse/SPARK-36657
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: William Hyun
>Assignee: William Hyun
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Resolved] (SPARK-36657) Update comment in `gen-sql-config-docs.py`

2021-09-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-36657.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33902
[https://github.com/apache/spark/pull/33902]

> Update comment in `gen-sql-config-docs.py`
> --
>
> Key: SPARK-36657
> URL: https://issues.apache.org/jira/browse/SPARK-36657
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: William Hyun
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Commented] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409174#comment-17409174
 ] 

Apache Spark commented on SPARK-36656:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/33903

> CollapseProject should not collapse correlated scalar subqueries
> 
>
> Key: SPARK-36656
> URL: https://issues.apache.org/jira/browse/SPARK-36656
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> Currently, the optimizer rule `CollapseProject` inlines expressions generated 
> from correlated scalar subqueries, which can create unnecessary left outer 
> joins.
> {code:sql}
> select c1, s, s * 10 from (
> select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
> {code}
> {code:scala}
> // Before
> Project [c1, s, (s * 10)]
> +- Project [c1, scalar-subquery [c1] AS s]
>:  +- Aggregate [c1], [first(c2), c1] 
>:  +- LocalRelation [c1, c2]
>+- LocalRelation [c1, c2]
> // After (scalar subqueries are inlined)
> Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
> :  +- Aggregate [c1], [first(c2), c1] 
> :  +- LocalRelation [c1, c2]
> :  +- Aggregate [c1], [first(c2), c1] 
> :  +- LocalRelation [c1, c2]
> +- LocalRelation [c1, c2]
> {code}
> Then this query will have two LeftOuter joins created. We should only 
> collapse projects after correlated subqueries are rewritten as joins.






[jira] [Commented] (SPARK-36657) Update comment in `gen-sql-config-docs.py`

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409171#comment-17409171
 ] 

Apache Spark commented on SPARK-36657:
--

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33902

> Update comment in `gen-sql-config-docs.py`
> --
>
> Key: SPARK-36657
> URL: https://issues.apache.org/jira/browse/SPARK-36657
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409173#comment-17409173
 ] 

Apache Spark commented on SPARK-36656:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/33903

> CollapseProject should not collapse correlated scalar subqueries
> 
>
> Key: SPARK-36656
> URL: https://issues.apache.org/jira/browse/SPARK-36656
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> Currently, the optimizer rule `CollapseProject` inlines expressions generated 
> from correlated scalar subqueries, which can create unnecessary left outer 
> joins.
> {code:sql}
> select c1, s, s * 10 from (
> select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
> {code}
> {code:scala}
> // Before
> Project [c1, s, (s * 10)]
> +- Project [c1, scalar-subquery [c1] AS s]
>:  +- Aggregate [c1], [first(c2), c1] 
>:  +- LocalRelation [c1, c2]
>+- LocalRelation [c1, c2]
> // After (scalar subqueries are inlined)
> Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
> :  +- Aggregate [c1], [first(c2), c1] 
> :  +- LocalRelation [c1, c2]
> :  +- Aggregate [c1], [first(c2), c1] 
> :  +- LocalRelation [c1, c2]
> +- LocalRelation [c1, c2]
> {code}
> Then this query will have two LeftOuter joins created. We should only 
> collapse projects after correlated subqueries are rewritten as joins.






[jira] [Assigned] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36656:


Assignee: Apache Spark

> CollapseProject should not collapse correlated scalar subqueries
> 
>
> Key: SPARK-36656
> URL: https://issues.apache.org/jira/browse/SPARK-36656
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>
> Currently, the optimizer rule `CollapseProject` inlines expressions generated 
> from correlated scalar subqueries, which can create unnecessary left outer 
> joins.
> {code:sql}
> select c1, s, s * 10 from (
> select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
> {code}
> {code:scala}
> // Before
> Project [c1, s, (s * 10)]
> +- Project [c1, scalar-subquery [c1] AS s]
>:  +- Aggregate [c1], [first(c2), c1] 
>:  +- LocalRelation [c1, c2]
>+- LocalRelation [c1, c2]
> // After (scalar subqueries are inlined)
> Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
> :  +- Aggregate [c1], [first(c2), c1] 
> :  +- LocalRelation [c1, c2]
> :  +- Aggregate [c1], [first(c2), c1] 
> :  +- LocalRelation [c1, c2]
> +- LocalRelation [c1, c2]
> {code}
> Then this query will have two LeftOuter joins created. We should only 
> collapse projects after correlated subqueries are rewritten as joins.






[jira] [Assigned] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36656:


Assignee: (was: Apache Spark)

> CollapseProject should not collapse correlated scalar subqueries
> 
>
> Key: SPARK-36656
> URL: https://issues.apache.org/jira/browse/SPARK-36656
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> Currently, the optimizer rule `CollapseProject` inlines expressions generated 
> from correlated scalar subqueries, which can create unnecessary left outer 
> joins.
> {code:sql}
> select c1, s, s * 10 from (
> select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
> {code}
> {code:scala}
> // Before
> Project [c1, s, (s * 10)]
> +- Project [c1, scalar-subquery [c1] AS s]
>:  +- Aggregate [c1], [first(c2), c1] 
>:  +- LocalRelation [c1, c2]
>+- LocalRelation [c1, c2]
> // After (scalar subqueries are inlined)
> Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
> :  +- Aggregate [c1], [first(c2), c1] 
> :  +- LocalRelation [c1, c2]
> :  +- Aggregate [c1], [first(c2), c1] 
> :  +- LocalRelation [c1, c2]
> +- LocalRelation [c1, c2]
> {code}
> Then this query will have two LeftOuter joins created. We should only 
> collapse projects after correlated subqueries are rewritten as joins.






[jira] [Assigned] (SPARK-36657) Update comment in `gen-sql-config-docs.py`

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36657:


Assignee: (was: Apache Spark)

> Update comment in `gen-sql-config-docs.py`
> --
>
> Key: SPARK-36657
> URL: https://issues.apache.org/jira/browse/SPARK-36657
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-36657) Update comment in `gen-sql-config-docs.py`

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409170#comment-17409170
 ] 

Apache Spark commented on SPARK-36657:
--

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33902

> Update comment in `gen-sql-config-docs.py`
> --
>
> Key: SPARK-36657
> URL: https://issues.apache.org/jira/browse/SPARK-36657
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-36657) Update comment in `gen-sql-config-docs.py`

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36657:


Assignee: Apache Spark

> Update comment in `gen-sql-config-docs.py`
> --
>
> Key: SPARK-36657
> URL: https://issues.apache.org/jira/browse/SPARK-36657
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: William Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-36657) Update comment in `gen-sql-config-docs.py`

2021-09-02 Thread William Hyun (Jira)
William Hyun created SPARK-36657:


 Summary: Update comment in `gen-sql-config-docs.py`
 Key: SPARK-36657
 URL: https://issues.apache.org/jira/browse/SPARK-36657
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: William Hyun









[jira] [Updated] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries

2021-09-02 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-36656:
-
Description: 
Currently, the optimizer rule `CollapseProject` inlines expressions generated 
from correlated scalar subqueries, which can create unnecessary left outer 
joins.

{code:sql}
select c1, s, s * 10 from (
select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
{code}

{code:scala}
// Before
Project [c1, s, (s * 10)]
+- Project [c1, scalar-subquery [c1] AS s]
   :  +- Aggregate [c1], [first(c2), c1] 
   :  +- LocalRelation [c1, c2]
   +- LocalRelation [c1, c2]

// After (scalar subqueries are inlined)
Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
:  +- Aggregate [c1], [first(c2), c1] 
:  +- LocalRelation [c1, c2]
:  +- Aggregate [c1], [first(c2), c1] 
:  +- LocalRelation [c1, c2]
+- LocalRelation [c1, c2]
{code}

Then this query will have two LeftOuter joins created. We should only collapse 
projects after correlated subqueries are rewritten as joins.

  was:
Currently, the optimizer rule `CollapseProject` inlines expressions generated 
from correlated scalar subqueries, which can create unnecessary left outer 
joins.

{code:scala}
// Before
Project [c1, s, (s * 10)]
+- Project [c1, scalar-subquery [c1] AS s]
   :  +- Aggregate [c1], [first(c2), c1] 
   :  +- LocalRelation [c1, c2]
   +- LocalRelation [c1, c2]

// After (scalar subqueries are inlined)
Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
:  +- Aggregate [c1], [first(c2), c1] 
:  +- LocalRelation [c1, c2]
:  +- Aggregate [c1], [first(c2), c1] 
:  +- LocalRelation [c1, c2]
+- LocalRelation [c1, c2]
{code}

Then this query will have two LeftOuter joins created. We should only collapse 
projects after correlated subqueries are rewritten as joins.


> CollapseProject should not collapse correlated scalar subqueries
> 
>
> Key: SPARK-36656
> URL: https://issues.apache.org/jira/browse/SPARK-36656
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> Currently, the optimizer rule `CollapseProject` inlines expressions generated 
> from correlated scalar subqueries, which can create unnecessary left outer 
> joins.
> {code:sql}
> select c1, s, s * 10 from (
> select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
> {code}
> {code:scala}
> // Before
> Project [c1, s, (s * 10)]
> +- Project [c1, scalar-subquery [c1] AS s]
>:  +- Aggregate [c1], [first(c2), c1] 
>:  +- LocalRelation [c1, c2]
>+- LocalRelation [c1, c2]
> // After (scalar subqueries are inlined)
> Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
> :  +- Aggregate [c1], [first(c2), c1] 
> :  +- LocalRelation [c1, c2]
> :  +- Aggregate [c1], [first(c2), c1] 
> :  +- LocalRelation [c1, c2]
> +- LocalRelation [c1, c2]
> {code}
> Then this query will have two LeftOuter joins created. We should only 
> collapse projects after correlated subqueries are rewritten as joins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries

2021-09-02 Thread Allison Wang (Jira)
Allison Wang created SPARK-36656:


 Summary: CollapseProject should not collapse correlated scalar 
subqueries
 Key: SPARK-36656
 URL: https://issues.apache.org/jira/browse/SPARK-36656
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Allison Wang


Currently, the optimizer rule `CollapseProject` inlines expressions generated 
from correlated scalar subqueries, which can create unnecessary left outer 
joins.

{code:scala}
// Before
Project [c1, s, (s * 10)]
+- Project [c1, scalar-subquery [c1] AS s]
   :  +- Aggregate [c1], [first(c2), c1] 
   :  +- LocalRelation [c1, c2]
   +- LocalRelation [c1, c2]

// After (scalar subqueries are inlined)
Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
:  +- Aggregate [c1], [first(c2), c1] 
:  +- LocalRelation [c1, c2]
:  +- Aggregate [c1], [first(c2), c1] 
:  +- LocalRelation [c1, c2]
+- LocalRelation [c1, c2]
{code}

This query then ends up with two LeftOuter joins. We should only collapse 
projects after correlated subqueries have been rewritten as joins.
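
For illustration, a minimal sketch of the guard this implies (an assumption 
about its shape, not the merged fix): detect a Project that produces 
correlated subqueries, so CollapseProject can skip merging it into a consumer.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.SubqueryExpression
import org.apache.spark.sql.catalyst.plans.logical.Project

// Sketch only: inlining such a Project duplicates the correlated scalar
// subquery (and, after rewrite, its left outer join) once per reference.
def producesCorrelatedSubquery(p: Project): Boolean =
  p.projectList.exists(SubqueryExpression.hasCorrelatedSubquery)
{code}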



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36655) Add `versionadded` for API added in Spark 3.3.0

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36655:


Assignee: Apache Spark

> Add `versionadded` for API added in Spark 3.3.0
> ---
>
> Key: SPARK-36655
> URL: https://issues.apache.org/jira/browse/SPARK-36655
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36655) Add `versionadded` for API added in Spark 3.3.0

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36655:


Assignee: (was: Apache Spark)

> Add `versionadded` for API added in Spark 3.3.0
> ---
>
> Key: SPARK-36655
> URL: https://issues.apache.org/jira/browse/SPARK-36655
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36655) Add `versionadded` for API added in Spark 3.3.0

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409134#comment-17409134
 ] 

Apache Spark commented on SPARK-36655:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/33901

> Add `versionadded` for API added in Spark 3.3.0
> ---
>
> Key: SPARK-36655
> URL: https://issues.apache.org/jira/browse/SPARK-36655
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36654) Drop type ignores from numpy imports

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36654:


Assignee: (was: Apache Spark)

> Drop type ignores from numpy imports
> 
>
> Key: SPARK-36654
> URL: https://issues.apache.org/jira/browse/SPARK-36654
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> Currently we use {{type: ignore[import]}} on all numpy imports ‒ this was 
> necessary because numpy didn't provide annotations at the time when we added 
> stubs to PySpark.
> Since numpy 1.20 (https://github.com/numpy/numpy/releases/tag/v1.20.0) numpy 
> is PEP 561 compatible and these ignores are no longer necessary (current 
> numpy version is 1.21).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36654) Drop type ignores from numpy imports

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36654:


Assignee: Apache Spark

> Drop type ignores from numpy imports
> 
>
> Key: SPARK-36654
> URL: https://issues.apache.org/jira/browse/SPARK-36654
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Minor
>
> Currently we use {{type: ignore[import]}} on all numpy imports ‒ this was 
> necessary because numpy didn't provide annotations at the time when we added 
> stubs to PySpark.
> Since numpy 1.20 (https://github.com/numpy/numpy/releases/tag/v1.20.0) numpy 
> is PEP 561 compatible and these ignores are no longer necessary (current 
> numpy version is 1.21).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36654) Drop type ignores from numpy imports

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409133#comment-17409133
 ] 

Apache Spark commented on SPARK-36654:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/33900

> Drop type ignores from numpy imports
> 
>
> Key: SPARK-36654
> URL: https://issues.apache.org/jira/browse/SPARK-36654
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> Currently we use {{type: ignore[import]}} on all numpy imports ‒ this was 
> necessary because numpy didn't provide annotations at the time when we added 
> stubs to PySpark.
> Since numpy 1.20 (https://github.com/numpy/numpy/releases/tag/v1.20.0) numpy 
> is PEP 561 compatible and these ignores are no longer necessary (current 
> numpy version is 1.21).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36655) Add `versionadded` for API added in Spark 3.3.0

2021-09-02 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-36655:


 Summary: Add `versionadded` for API added in Spark 3.3.0
 Key: SPARK-36655
 URL: https://issues.apache.org/jira/browse/SPARK-36655
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Xinrong Meng






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36654) Drop type ignores from numpy imports

2021-09-02 Thread Maciej Szymkiewicz (Jira)
Maciej Szymkiewicz created SPARK-36654:
--

 Summary: Drop type ignores from numpy imports
 Key: SPARK-36654
 URL: https://issues.apache.org/jira/browse/SPARK-36654
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.1.2, 3.2.0, 3.3.0
Reporter: Maciej Szymkiewicz


Currently we use {{type: ignore[import]}} on all numpy imports ‒ this was 
necessary because numpy didn't provide annotations at the time when we added 
stubs to PySpark.

Since numpy 1.20 (https://github.com/numpy/numpy/releases/tag/v1.20.0) numpy is 
PEP 561 compatible and these ignores are no longer necessary (current numpy 
version is 1.21).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36653) Implement Series.__xor__

2021-09-02 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-36653:
-

 Summary: Implement Series.__xor__
 Key: SPARK-36653
 URL: https://issues.apache.org/jira/browse/SPARK-36653
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join

2021-09-02 Thread Cheng Su (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Su updated SPARK-36652:
-
Affects Version/s: (was: 3.2.0)

> AQE dynamic join selection should not apply to non-equi join
> 
>
> Key: SPARK-36652
> URL: https://issues.apache.org/jira/browse/SPARK-36652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Su
>Priority: Minor
>
> Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash 
> join, and 2. promote shuffled hash join. Both are achieved by adding a join 
> hint to the query plan, and both only work for equi joins. However, the rule 
> currently matches the `Join` operator - 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71]
>  - so it would add a hint to a non-equi join by mistake.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36443) Demote BroadcastJoin causes performance regression and increases OOM risks

2021-09-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408958#comment-17408958
 ] 

Dongjoon Hyun commented on SPARK-36443:
---

Thank you for the details.

> Demote BroadcastJoin causes performance regression and increases OOM risks
> --
>
> Key: SPARK-36443
> URL: https://issues.apache.org/jira/browse/SPARK-36443
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Kent Yao
>Priority: Major
> Attachments: image-2021-08-06-11-24-34-122.png, 
> image-2021-08-06-17-57-15-765.png, screenshot-1.png
>
>
>  
> h2. A test case
> Use bin/spark-sql with local mode and all other default settings with 3.1.2 
> to run the case below
> {code:sql}
> set spark.sql.shuffle.partitions=20;
> set spark.sql.adaptive.enabled=true;
> -- set spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin=0; -- 
> (default 0.2)enable this for not demote bhj
> set spark.sql.autoBroadcastJoinThreshold=200;
> SELECT
>   l.id % 12345 k,
>   sum(l.id) sum,
>   count(l.id) cnt,
>   avg(l.id) avg,
>   min(l.id) min,
>   max(l.id) max
> from (select id % 3 id from range(0, 1e8, 1, 100)) l
>   left join (SELECT max(id) as id, id % 2 gid FROM range(0, 1000, 2, 100) 
> group by gid) r ON l.id = r.id
> GROUP BY 1;
> {code}
>  
>  1. demote bhj w/ nonEmptyPartitionRatioForBroadcastJoin comment out (every 
> row below runs the query above):
> ||Job Id||Submitted||Duration||Stages: Succeeded/Total||Tasks (for all 
> stages): Succeeded/Total||
> |4|2021/08/06 17:31:37|71 ms|1/1 (4 skipped)|3/3 (205 skipped)|
> |3|2021/08/06 17:31:18|19 s|1/1 (3 skipped)|4/4 (201 skipped)|
> |2|2021/08/06 17:31:18|87 ms|1/1 (1 skipped)|1/1 (100 skipped)|
> |1|2021/08/06 17:31:16|2 s|1/1|100/100|
> |0|2021/08/06 17:31:15|2 s|1/1|100/100|
> 2. set nonEmptyPartitionRatioForBroadcastJoin to 0 to tell spark not to 
> demote bhj
> ||Job Id (Job Group)||Submitted||Duration||Stages: Succeeded/Total||Tasks 
> (for all stages): Succeeded/Total||
> |5|2021/08/06 18:25:15|29 ms|1/1 (2 skipped)|3/3 (200 skipped)|
> |4|SELECT l.id % 

[jira] [Updated] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join

2021-09-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36652:
--
Parent: SPARK-33828
Issue Type: Sub-task  (was: Bug)

> AQE dynamic join selection should not apply to non-equi join
> 
>
> Key: SPARK-36652
> URL: https://issues.apache.org/jira/browse/SPARK-36652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Cheng Su
>Priority: Minor
>
> Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash 
> join, and 2. promote shuffled hash join. Both are achieved by adding a join 
> hint to the query plan, and both only work for equi joins. However, the rule 
> currently matches the `Join` operator - 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71]
>  - so it would add a hint to a non-equi join by mistake.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36617) Inconsistencies in approxQuantile annotations

2021-09-02 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz reassigned SPARK-36617:
--

Assignee: Cary Lee

> Inconsistencies in approxQuantile annotations
> -
>
> Key: SPARK-36617
> URL: https://issues.apache.org/jira/browse/SPARK-36617
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Cary Lee
>Priority: Major
> Fix For: 3.2.0, 3.1.3
>
>
> I've been reviewing PR in the legacy repo 
> (https://github.com/zero323/pyspark-stubs/pull/552) and it looks like we have 
> two problems with annotations for {{approxQuantile}}.
> First of all {{DataFrame.approxQuantile}} should overload definition to match 
> input arguments ‒ if col is a sequence then result should be a list of lists:
> {code:python}
> @overload
> def approxQuantile(
> self,
> col: str,
> probabilities: Union[List[float], Tuple[float]],
> relativeError: float
> ) -> List[float]: ...
> @overload
> def approxQuantile(
> self,
> col: Union[List[str], Tuple[str]],
> probabilities: Union[List[float], Tuple[float]],
> relativeError: float
> ) -> List[List[float]]: ...
> {code}
> Additionally {{DataFrameStatFunctions.approxQuantile}} should match whatever 
> we have in {{DataFrame}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36652:


Assignee: (was: Apache Spark)

> AQE dynamic join selection should not apply to non-equi join
> 
>
> Key: SPARK-36652
> URL: https://issues.apache.org/jira/browse/SPARK-36652
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Cheng Su
>Priority: Minor
>
> Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash 
> join, and 2. promote shuffled hash join. Both are achieved by adding a join 
> hint to the query plan, and both only work for equi joins. However, the rule 
> currently matches the `Join` operator - 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71]
>  - so it would add a hint to a non-equi join by mistake.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36652:


Assignee: Apache Spark

> AQE dynamic join selection should not apply to non-equi join
> 
>
> Key: SPARK-36652
> URL: https://issues.apache.org/jira/browse/SPARK-36652
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Cheng Su
>Assignee: Apache Spark
>Priority: Minor
>
> Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash 
> join, and 2. promote shuffled hash join. Both are achieved by adding a join 
> hint to the query plan, and both only work for equi joins. However, the rule 
> currently matches the `Join` operator - 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71]
>  - so it would add a hint to a non-equi join by mistake.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408914#comment-17408914
 ] 

Apache Spark commented on SPARK-36652:
--

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/33899

> AQE dynamic join selection should not apply to non-equi join
> 
>
> Key: SPARK-36652
> URL: https://issues.apache.org/jira/browse/SPARK-36652
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Cheng Su
>Priority: Minor
>
> Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash 
> join, and 2. promote shuffled hash join. Both are achieved by adding a join 
> hint to the query plan, and both only work for equi joins. However, the rule 
> currently matches the `Join` operator - 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71]
>  - so it would add a hint to a non-equi join by mistake.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast

2021-09-02 Thread gaoyajun02 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoyajun02 updated SPARK-36630:
---
Parent: (was: SPARK-33828)
Issue Type: Question  (was: Sub-task)

> Add the option to use physical statistics to avoid large tables being 
> broadcast
> ---
>
> Key: SPARK-36630
> URL: https://issues.apache.org/jira/browse/SPARK-36630
> Project: Spark
>  Issue Type: Question
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: gaoyajun02
>Priority: Major
>
> Currently, when AQE's query stage is not materialized, it uses the stats of 
> the logical plan to estimate whether the plan can be converted to BHJ. In 
> some scenarios the estimated value is several orders of magnitude smaller 
> than the actual broadcast data, which can lead to large tables being 
> broadcast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast

2021-09-02 Thread gaoyajun02 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoyajun02 updated SPARK-36630:
---
Comment: was deleted

(was: close it)

> Add the option to use physical statistics to avoid large tables being 
> broadcast
> ---
>
> Key: SPARK-36630
> URL: https://issues.apache.org/jira/browse/SPARK-36630
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: gaoyajun02
>Priority: Major
>
> Currently, when AQE's query stage is not materialized, it uses the stats of 
> the logical plan to estimate whether the plan can be converted to BHJ. In 
> some scenarios the estimated value is several orders of magnitude smaller 
> than the actual broadcast data, which can lead to large tables being 
> broadcast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast

2021-09-02 Thread gaoyajun02 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoyajun02 closed SPARK-36630.
--

close it

> Add the option to use physical statistics to avoid large tables being 
> broadcast
> ---
>
> Key: SPARK-36630
> URL: https://issues.apache.org/jira/browse/SPARK-36630
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: gaoyajun02
>Priority: Major
>
> Currently, when AQE's query stage is not materialized, it uses the stats of 
> the logical plan to estimate whether the plan can be converted to BHJ. In 
> some scenarios the estimated value is several orders of magnitude smaller 
> than the actual broadcast data, which can lead to large tables being 
> broadcast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast

2021-09-02 Thread gaoyajun02 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoyajun02 resolved SPARK-36630.

Resolution: Fixed

> Add the option to use physical statistics to avoid large tables being 
> broadcast
> ---
>
> Key: SPARK-36630
> URL: https://issues.apache.org/jira/browse/SPARK-36630
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: gaoyajun02
>Priority: Major
>
> Currently, when AQE's query stage is not materialized, it uses the stats of 
> the logical plan to estimate whether the plan can be converted to BHJ. In 
> some scenarios the estimated value is several orders of magnitude smaller 
> than the actual broadcast data, which can lead to large tables being 
> broadcast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast

2021-09-02 Thread gaoyajun02 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408901#comment-17408901
 ] 

gaoyajun02 commented on SPARK-36630:


Found my issue is similar to https://issues.apache.org/jira/browse/SPARK-35264.

I can set spark.sql.autoBroadcastJoinThreshold to -1 to disable the estimate 
based on logical plan stats, and set 
spark.sql.adaptive.autoBroadcastJoinThreshold to control the threshold based 
on physical (runtime) stats.
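
A minimal sketch of that workaround through the runtime conf API (the 128MB 
value is an illustrative assumption, not taken from this issue):

{code:scala}
// Disable BHJ conversion estimated from logical-plan stats entirely,
// and let AQE decide from accurate runtime (physical) stats instead.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
// "128MB" is an example value, not one given in the comment above.
spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", "128MB")
{code}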

> Add the option to use physical statistics to avoid large tables being 
> broadcast
> ---
>
> Key: SPARK-36630
> URL: https://issues.apache.org/jira/browse/SPARK-36630
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: gaoyajun02
>Priority: Major
>
> Currently, when AQE's query stage is not materialized, it uses the stats of 
> the logical plan to estimate whether the plan can be converted to BHJ. In 
> some scenarios the estimated value is several orders of magnitude smaller 
> than the actual broadcast data, which can lead to large tables being 
> broadcast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join

2021-09-02 Thread Cheng Su (Jira)
Cheng Su created SPARK-36652:


 Summary: AQE dynamic join selection should not apply to non-equi 
join
 Key: SPARK-36652
 URL: https://issues.apache.org/jira/browse/SPARK-36652
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: Cheng Su


Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash 
join, and 2. promote shuffled hash join. Both are achieved by adding a join 
hint to the query plan, and both only work for equi joins. However, the rule 
currently matches the `Join` operator - 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71]
 - so it would add a hint to a non-equi join by mistake.
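
For illustration, a minimal sketch of one way to restrict the rule (an 
assumption about the fix, not the actual patch): check the join condition for 
an equality between the two sides before adding any hint.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.EqualTo
import org.apache.spark.sql.catalyst.plans.logical.Join

// Sketch only: true when the condition contains at least one equality whose
// operands come entirely from opposite sides of the join.
def isEquiJoin(j: Join): Boolean = j.condition.exists { cond =>
  cond.collect { case e: EqualTo => e }.exists { e =>
    val l = j.left.outputSet
    val r = j.right.outputSet
    (e.left.references.subsetOf(l) && e.right.references.subsetOf(r)) ||
      (e.left.references.subsetOf(r) && e.right.references.subsetOf(l))
  }
}
{code}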



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36637) Bad error message when using non-existing named window

2021-09-02 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36637:
---

Assignee: angerszhu

> Bad error message when using non-existing named window
> --
>
> Key: SPARK-36637
> URL: https://issues.apache.org/jira/browse/SPARK-36637
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> {code:java}
> CREATE TABLE employees 
> (name STRING, dept STRING, salary INT, age INT);
> SELECT AVG(age) OVER win AS salary, 
> AVG(salary) OVER win AS avgsalary,
> MIN(salary) OVER win AS minsalary,
> MAX(salary) OVER win AS maxsalary,
> COUNT(1) OVER win AS numEmps
> FROM employees;
> Error in query: unresolved operator 'Aggregate 
> [unresolvedwindowexpression(avg(age#43), WindowSpecReference(win)) AS 
> salary#34, unresolvedwindowexpression(avg(salary#42), 
> WindowSpecReference(win)) AS avgsalary#35, 
> unresolvedwindowexpression(min(salary#42), WindowSpecReference(win)) AS 
> minsalary#36, unresolvedwindowexpression(max(salary#42), 
> WindowSpecReference(win)) AS maxsalary#37, 
> unresolvedwindowexpression(count(1), WindowSpecReference(win)) AS numEmps#38];
> 'Aggregate [unresolvedwindowexpression(avg(age#43), WindowSpecReference(win)) 
> AS salary#34, unresolvedwindowexpression(avg(salary#42), 
> WindowSpecReference(win)) AS avgsalary#35, 
> unresolvedwindowexpression(min(salary#42), WindowSpecReference(win)) AS 
> minsalary#36, unresolvedwindowexpression(max(salary#42), 
> WindowSpecReference(win)) AS maxsalary#37, 
> unresolvedwindowexpression(count(1), WindowSpecReference(win)) AS numEmps#38]
> +- SubqueryAlias spark_catalog.default.employees
> +- HiveTableRelation [`default`.`employees`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [name#40, 
> dept#41, salary#42, age#43], Partition Cols: []]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36637) Bad error message when using non-existing named window

2021-09-02 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36637.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33892
[https://github.com/apache/spark/pull/33892]

> Bad error message when using non-existing named window
> --
>
> Key: SPARK-36637
> URL: https://issues.apache.org/jira/browse/SPARK-36637
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Priority: Major
> Fix For: 3.2.0
>
>
> {code:java}
> CREATE TABLE employees 
> (name STRING, dept STRING, salary INT, age INT);
> SELECT AVG(age) OVER win AS salary, 
> AVG(salary) OVER win AS avgsalary,
> MIN(salary) OVER win AS minsalary,
> MAX(salary) OVER win AS maxsalary,
> COUNT(1) OVER win AS numEmps
> FROM employees;
> Error in query: unresolved operator 'Aggregate 
> [unresolvedwindowexpression(avg(age#43), WindowSpecReference(win)) AS 
> salary#34, unresolvedwindowexpression(avg(salary#42), 
> WindowSpecReference(win)) AS avgsalary#35, 
> unresolvedwindowexpression(min(salary#42), WindowSpecReference(win)) AS 
> minsalary#36, unresolvedwindowexpression(max(salary#42), 
> WindowSpecReference(win)) AS maxsalary#37, 
> unresolvedwindowexpression(count(1), WindowSpecReference(win)) AS numEmps#38];
> 'Aggregate [unresolvedwindowexpression(avg(age#43), WindowSpecReference(win)) 
> AS salary#34, unresolvedwindowexpression(avg(salary#42), 
> WindowSpecReference(win)) AS avgsalary#35, 
> unresolvedwindowexpression(min(salary#42), WindowSpecReference(win)) AS 
> minsalary#36, unresolvedwindowexpression(max(salary#42), 
> WindowSpecReference(win)) AS maxsalary#37, 
> unresolvedwindowexpression(count(1), WindowSpecReference(win)) AS numEmps#38]
> +- SubqueryAlias spark_catalog.default.employees
> +- HiveTableRelation [`default`.`employees`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [name#40, 
> dept#41, salary#42, age#43], Partition Cols: []]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36617) Inconsistencies in approxQuantile annotations

2021-09-02 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz resolved SPARK-36617.

Fix Version/s: 3.1.3
   3.2.0
   Resolution: Resolved

Issue resolved by pull request 33880
https://github.com/apache/spark/pull/33880

> Inconsistencies in approxQuantile annotations
> -
>
> Key: SPARK-36617
> URL: https://issues.apache.org/jira/browse/SPARK-36617
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.2.0, 3.1.3
>
>
> I've been reviewing PR in the legacy repo 
> (https://github.com/zero323/pyspark-stubs/pull/552) and it looks like we have 
> two problems with annotations for {{approxQuantile}}.
> First of all {{DataFrame.approxQuantile}} should overload definition to match 
> input arguments ‒ if col is a sequence then result should be a list of lists:
> {code:python}
> @overload
> def approxQuantile(
> self,
> col: str,
> probabilities: Union[List[float], Tuple[float]],
> relativeError: float
> ) -> List[float]: ...
> @overload
> def approxQuantile(
> self,
> col: Union[List[str], Tuple[str]],
> probabilities: Union[List[float], Tuple[float]],
> relativeError: float
> ) -> List[List[float]]: ...
> {code}
> Additionally {{DataFrameStatFunctions.approxQuantile}} should match whatever 
> we have in {{DataFrame}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36622) spark.history.kerberos.principal doesn't take value _HOST

2021-09-02 Thread Thomas Graves (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408813#comment-17408813
 ] 

Thomas Graves commented on SPARK-36622:
---

Supporting _HOST for SHS likely makes sense since it's a server.

> spark.history.kerberos.principal doesn't take value _HOST
> -
>
> Key: SPARK-36622
> URL: https://issues.apache.org/jira/browse/SPARK-36622
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Security, Spark Core
>Affects Versions: 3.0.1, 3.1.2
>Reporter: pralabhkumar
>Priority: Minor
>
> spark.history.kerberos.principal doesn't understand the value _HOST. 
> It fails with: failure to login for principal: spark/_HOST@realm. 
> It would be helpful to take the _HOST value from the config file and replace 
> it with the current hostname (similar to what Hive does). This would also 
> make it possible to run the SHS on multiple machines without hardcoding the 
> hostname in spark.history.kerberos.principal. 
>  
> It requires a minor change to HistoryServer.scala, in the initSecurity 
> method. 
>  
> Please let me know if this request makes sense, and I'll create the PR. 
>  
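> A minimal sketch of the proposed substitution (the hostname-resolution 
> approach here is an assumption; Hadoop's SecurityUtil.getServerPrincipal 
> does something similar):
> {code:scala}
> import java.net.InetAddress
> 
> // Sketch only: expand the _HOST placeholder to the local canonical
> // hostname, e.g. spark/_HOST@REALM -> spark/shs1.example.com@REALM
> // (hostname is hypothetical).
> def resolvePrincipal(principal: String): String =
>   principal.replace("_HOST", InetAddress.getLocalHost.getCanonicalHostName)
> {code}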



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-35623) Volcano resource manager for Spark on Kubernetes

2021-09-02 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408750#comment-17408750
 ] 

Senthil Kumar edited comment on SPARK-35623 at 9/2/21, 11:56 AM:
-

[~dipanjanK] Include me too pls.

mail id: senthissen...@gmail.com


was (Author: senthh):
[~dipanjanK] Include me too pls

> Volcano resource manager for Spark on Kubernetes
> 
>
> Key: SPARK-35623
> URL: https://issues.apache.org/jira/browse/SPARK-35623
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Kubernetes
>Affects Versions: 3.1.1, 3.1.2
>Reporter: Dipanjan Kailthya
>Priority: Minor
>  Labels: kubernetes, resourcemanager
>
> Dear Spark Developers, 
>   
>  Hello from the Netherlands! Posting this here as I still haven't gotten 
> accepted to post in the spark dev mailing list.
>   
>  My team is planning to use spark with Kubernetes support on our shared 
> (multi-tenant) on premise Kubernetes cluster. However we would like to have 
> certain scheduling features like fair-share and preemption which as we 
> understand are not built into the current spark-kubernetes resource manager 
> yet. We have been working on and are close to a first successful prototype 
> integration with Volcano ([https://volcano.sh/en/docs/]). Briefly this means 
> a new resource manager component with lots in common with existing 
> spark-kubernetes resource manager, but instead of pods it launches Volcano 
> jobs which delegate the driver and executor pod creation and lifecycle 
> management to Volcano. We are interested in contributing this to open source, 
> either directly in spark or as a separate project.
>   
>  So, two questions: 
>   
>  1. Do the spark maintainers see this as a valuable contribution to the 
> mainline spark codebase? If so, can we have some guidance on how to publish 
> the changes? 
>   
>  2. Are any other developers / organizations interested to contribute to this 
> effort? If so, please get in touch.
>   
>  Best,
>  Dipanjan



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35623) Volcano resource manager for Spark on Kubernetes

2021-09-02 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408750#comment-17408750
 ] 

Senthil Kumar commented on SPARK-35623:
---

[~dipanjanK] Include me too pls

> Volcano resource manager for Spark on Kubernetes
> 
>
> Key: SPARK-35623
> URL: https://issues.apache.org/jira/browse/SPARK-35623
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Kubernetes
>Affects Versions: 3.1.1, 3.1.2
>Reporter: Dipanjan Kailthya
>Priority: Minor
>  Labels: kubernetes, resourcemanager
>
> Dear Spark Developers, 
>   
>  Hello from the Netherlands! Posting this here as I still haven't gotten 
> accepted to post in the spark dev mailing list.
>   
>  My team is planning to use spark with Kubernetes support on our shared 
> (multi-tenant) on premise Kubernetes cluster. However we would like to have 
> certain scheduling features like fair-share and preemption which as we 
> understand are not built into the current spark-kubernetes resource manager 
> yet. We have been working on and are close to a first successful prototype 
> integration with Volcano ([https://volcano.sh/en/docs/]). Briefly this means 
> a new resource manager component with lots in common with existing 
> spark-kubernetes resource manager, but instead of pods it launches Volcano 
> jobs which delegate the driver and executor pod creation and lifecycle 
> management to Volcano. We are interested in contributing this to open source, 
> either directly in spark or as a separate project.
>   
>  So, two questions: 
>   
>  1. Do the spark maintainers see this as a valuable contribution to the 
> mainline spark codebase? If so, can we have some guidance on how to publish 
> the changes? 
>   
>  2. Are any other developers / organizations interested to contribute to this 
> effort? If so, please get in touch.
>   
>  Best,
>  Dipanjan



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36632) DivideYMInterval should throw the same exception when divide by zero.

2021-09-02 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-36632:
---
Summary: DivideYMInterval should throw the same exception when divide by 
zero.  (was: DivideYMInterval should consider ansi mode.)

> DivideYMInterval should throw the same exception when divide by zero.
> -
>
> Key: SPARK-36632
> URL: https://issues.apache.org/jira/browse/SPARK-36632
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> DivideYMInterval does not consider ANSI mode; we should support it.
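> For illustration, the behavior being asked for (a hedged sketch; the ANSI 
> interval literal syntax assumes Spark 3.2+):
> {code:scala}
> // Dividing a year-month interval by zero should raise the same
> // divide-by-zero ArithmeticException that integral division raises.
> spark.sql("SELECT INTERVAL '2' YEAR / 0").show()
> // expected: java.lang.ArithmeticException: / by zero
> {code}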



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36627) Tasks with Java proxy objects fail to deserialize

2021-09-02 Thread Samuel Souza (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel Souza updated SPARK-36627:
-
Description: 
In JavaSerializer.JavaDeserializationStream we override resolveClass of 
ObjectInputStream to use the threads' contextClassLoader. However, we do not 
override resolveProxyClass, which is used when deserializing Java proxy 
objects, which makes spark use the wrong classloader when deserializing 
objects, which causes the job to fail with the following exception:

{code}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: 
at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at 
java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829)
at 
java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917)
...
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
{code}

  was:
In JavaSerializer.JavaDeserializationStream we override resolveClass of 
ObjectInputStream to use the threads' contextClassLoader. However, we do not 
override resolveProxyClass, which is used when deserializing Java proxy 
objects, which makes spark use the wrong classloader when deserializing 
objects, which causes the job to fail with the following exception:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: 
at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at 
java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829)
at 
java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917)
...
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)



> Tasks with Java proxy objects fail to deserialize
> -
>
> Key: SPARK-36627
> URL: https://issues.apache.org/jira/browse/SPARK-36627
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.3
>Reporter: Samuel Souza
>Priority: Minor
>
> In JavaSerializer.JavaDeserializationStream we override resolveClass of 
> ObjectInputStream to use the thread's contextClassLoader. However, we do not 
> override resolveProxyClass, which is used when deserializing Java proxy 
> objects. As a result, Spark uses the wrong classloader when deserializing 
> those objects, and the job fails with the following exception:
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: 
> 
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
>   at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:398)
>   at 
> java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829)
>   at 
> java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917)
>   ...
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
> {code}
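> A minimal sketch of the missing override (an assumption about the shape of 
> the fix, not Spark's actual patch):
> {code:scala}
> import java.io.{InputStream, ObjectInputStream, ObjectStreamClass}
> import java.lang.reflect.Proxy
> 
> // Sketch only: resolve plain classes and proxy interfaces with the thread
> // context classloader, mirroring what resolveClass already does.
> class ContextLoaderObjectInputStream(in: InputStream) extends ObjectInputStream(in) {
>   private def loader: ClassLoader = Thread.currentThread().getContextClassLoader
> 
>   override def resolveClass(desc: ObjectStreamClass): Class[_] =
>     Class.forName(desc.getName, false, loader)
> 
>   // Without this override, ObjectInputStream resolves proxy interfaces
>   // with its default loader, causing the ClassNotFoundException above.
>   override def resolveProxyClass(interfaces: Array[String]): Class[_] = {
>     val resolved = interfaces.map(name => Class.forName(name, false, loader))
>     Proxy.getProxyClass(loader, resolved: _*)
>   }
> }
> {code}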



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36627) Tasks with Java proxy objects fail to deserialize

2021-09-02 Thread Samuel Souza (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel Souza updated SPARK-36627:
-
Summary: Tasks with Java proxy objects fail to deserialize  (was: Tasks 
with Java proxy objects fail to desrialize)

> Tasks with Java proxy objects fail to deserialize
> -
>
> Key: SPARK-36627
> URL: https://issues.apache.org/jira/browse/SPARK-36627
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.3
>Reporter: Samuel Souza
>Priority: Minor
>
> In JavaSerializer.JavaDeserializationStream we override resolveClass of 
> ObjectInputStream to use the thread's contextClassLoader. However, we do not 
> override resolveProxyClass, which is used when deserializing Java proxy 
> objects. As a result, Spark uses the wrong classloader when deserializing 
> those objects, and the job fails with the following exception:
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: 
> 
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
>   at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:398)
>   at 
> java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829)
>   at 
> java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917)
>   ...
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36644) Push down boolean column filter

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36644:


Assignee: Apache Spark

> Push down boolean column filter
> ---
>
> Key: SPARK-36644
> URL: https://issues.apache.org/jira/browse/SPARK-36644
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: Kazuyuki Tanimura
>Assignee: Apache Spark
>Priority: Major
>
> The following query does not push down the filter 
> ```
> SELECT * FROM t WHERE boolean_field
> ```
> although the following query pushes down the filter as expected.
> ```
> SELECT * FROM t WHERE boolean_field = true
> ```
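> A hedged sketch of one way to make the first form translatable (an 
> assumption about the approach, not the merged patch): rewrite a bare boolean 
> attribute into an explicit comparison before filter translation.
> {code:scala}
> import org.apache.spark.sql.catalyst.expressions.{Attribute, EqualTo, Expression, Literal}
> import org.apache.spark.sql.types.BooleanType
> 
> // Sketch only: a bare boolean column used as a predicate becomes
> // `col = true`, which maps onto a data source EqualTo pushdown filter.
> def normalizeBareBoolean(pred: Expression): Expression = pred match {
>   case a: Attribute if a.dataType == BooleanType => EqualTo(a, Literal(true))
>   case other => other
> }
> {code}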



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36644) Push down boolean column filter

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408658#comment-17408658
 ] 

Apache Spark commented on SPARK-36644:
--

User 'kazuyukitanimura' has created a pull request for this issue:
https://github.com/apache/spark/pull/33898

> Push down boolean column filter
> ---
>
> Key: SPARK-36644
> URL: https://issues.apache.org/jira/browse/SPARK-36644
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: Kazuyuki Tanimura
>Priority: Major
>
> The following query does not push down the filter 
> ```
> SELECT * FROM t WHERE boolean_field
> ```
> although the following query pushes down the filter as expected.
> ```
> SELECT * FROM t WHERE boolean_field = true
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36644) Push down boolean column filter

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36644:


Assignee: (was: Apache Spark)

> Push down boolean column filter
> ---
>
> Key: SPARK-36644
> URL: https://issues.apache.org/jira/browse/SPARK-36644
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: Kazuyuki Tanimura
>Priority: Major
>
> The following query does not push down the filter 
> ```
> SELECT * FROM t WHERE boolean_field
> ```
> although the following query pushes down the filter as expected.
> ```
> SELECT * FROM t WHERE boolean_field = true
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast

2021-09-02 Thread gaoyajun02 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoyajun02 updated SPARK-36630:
---
Parent: SPARK-33828
Issue Type: Sub-task  (was: Improvement)

> Add the option to use physical statistics to avoid large tables being 
> broadcast
> ---
>
> Key: SPARK-36630
> URL: https://issues.apache.org/jira/browse/SPARK-36630
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: gaoyajun02
>Priority: Major
>
> Currently, when AQE's query stage is not materialized, it uses the stats of 
> the logical plan to estimate whether the plan can be converted to BHJ. In 
> some scenarios the estimated value is several orders of magnitude smaller 
> than the actual broadcast data, which can lead to large tables being 
> broadcast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)

2021-09-02 Thread Serge Shikov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Shikov updated SPARK-36651:
-
Description: 
This is still reproducible in Spark 2.4.5. Original issue SPARK-25080 contains 
steps to reproduce.

 

 ```
 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost task 1190.3 
in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, executor 487): 
java.lang.NullPointerException
 at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
 at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265)
 at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
 at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
 at org.apache.spark.scheduler.Task.run(Task.scala:109)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
  
 Driver stacktrace:
 at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740)
 at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
 at scala.Option.foreach(Option.scala:257)
 at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
 ... 67 more
 Caused by: java.lang.NullPointerException
 at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
 at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217)
 at 
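
The trace points at `toCatalystDecimal` dereferencing a null Hive decimal. A
minimal null-safe sketch of the idea, assuming the signature shown at
HiveShim.scala:110 (`HiveDecimalObjectInspector` plus the raw value); this is
an illustration, not the project's actual patch:

```
import org.apache.hadoop.hive.serde2.objectinspector.primitive.HiveDecimalObjectInspector
import org.apache.spark.sql.types.Decimal

// Guard both the raw value and the inspector's result, so a null Hive
// decimal surfaces as a SQL NULL instead of a NullPointerException.
def toCatalystDecimal(hdoi: HiveDecimalObjectInspector, data: Any): Decimal = {
  if (data == null) {
    null
  } else if (hdoi.preferWritable()) {
    val writable = hdoi.getPrimitiveWritableObject(data)
    if (writable == null) null
    else Decimal(writable.getHiveDecimal().bigDecimalValue(),
      hdoi.precision(), hdoi.scale())
  } else {
    val hd = hdoi.getPrimitiveJavaObject(data)
    if (hd == null) null
    else Decimal(hd.bigDecimalValue(), hdoi.precision(), hdoi.scale())
  }
}
```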

[jira] [Updated] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)

2021-09-02 Thread Serge Shikov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Shikov updated SPARK-36651:
-
Issue Type: Bug  (was: Improvement)

> NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
> --
>
> Key: SPARK-36651
> URL: https://issues.apache.org/jira/browse/SPARK-36651
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.5
> Environment: AWS EMR
>Reporter: Serge Shikov
>Priority: Minor
>
> This is still reproducible in Spark 2.4.5. Original issue contains steps to 
> reproduce.
>  
>  ```
>  Caused by: org.apache.spark.SparkException: Job aborted due to stage 
> failure: Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost 
> task 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, 
> executor 487): java.lang.NullPointerException
>  at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>  at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217)
>  at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294)
>  at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>   
>  Driver stacktrace:
>  at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>  at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
>  at scala.Option.foreach(Option.scala:257)
>  at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
>  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
>  ... 67 more
>  Caused by: java.lang.NullPointerException
>  at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
>  at 
> 

[jira] [Updated] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)

2021-09-02 Thread Serge Shikov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Shikov updated SPARK-36651:
-
Priority: Major  (was: Minor)

> NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
> --
>
> Key: SPARK-36651
> URL: https://issues.apache.org/jira/browse/SPARK-36651
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.5
> Environment: AWS EMR
>Reporter: Serge Shikov
>Priority: Major
>
> This is still reproducible in Spark 2.4.5. Original issue contains steps to 
> reproduce.
>  
>  ```
>  Caused by: org.apache.spark.SparkException: Job aborted due to stage 
> failure: Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost 
> task 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, 
> executor 487): java.lang.NullPointerException
>  at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>  at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217)
>  at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294)
>  at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>   
>  Driver stacktrace:
>  at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>  at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
>  at scala.Option.foreach(Option.scala:257)
>  at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
>  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
>  ... 67 more
>  Caused by: java.lang.NullPointerException
>  at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
>  at 
> 

[jira] [Updated] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)

2021-09-02 Thread Serge Shikov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Shikov updated SPARK-36651:
-
Description: 
This is still reproducible in Spark 2.4.5. Original issue contains steps to 
reproduce.

 

 ```
 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost task 1190.3 
in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, executor 487): 
java.lang.NullPointerException
 at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
 at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265)
 at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
 at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
 at org.apache.spark.scheduler.Task.run(Task.scala:109)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
  
 Driver stacktrace:
 at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740)
 at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
 at scala.Option.foreach(Option.scala:257)
 at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
 ... 67 more
 Caused by: java.lang.NullPointerException
 at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
 at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
 at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217)
 at 

[jira] [Updated] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)

2021-09-02 Thread Serge Shikov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Shikov updated SPARK-36651:
-
Affects Version/s: 2.4.5  (was: 2.3.1)

> NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
> --
>
> Key: SPARK-36651
> URL: https://issues.apache.org/jira/browse/SPARK-36651
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 2.4.5
> Environment: AWS EMR
>Reporter: Serge Shikov
>Priority: Minor
>
> This is still reproducible in Spark 2.4.5. Original issue contains steps to 
> reproduce.
>  
>  ```
>  Caused by: org.apache.spark.SparkException: Job aborted due to stage 
> failure: Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost 
> task 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, 
> executor 487): java.lang.NullPointerException
>  at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
>  at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>  at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217)
>  at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294)
>  at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>   
>  Driver stacktrace:
>  at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>  at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
>  at scala.Option.foreach(Option.scala:257)
>  at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
>  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
>  ... 67 more
>  Caused by: java.lang.NullPointerException
>  at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
>  at 
> 

[jira] [Created] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)

2021-09-02 Thread Serge Shikov (Jira)
Serge Shikov created SPARK-36651:


 Summary: NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
 Key: SPARK-36651
 URL: https://issues.apache.org/jira/browse/SPARK-36651
 Project: Spark
  Issue Type: Improvement
  Components: Input/Output
Affects Versions: 2.3.1
 Environment: AWS EMR
Reporter: Serge Shikov


NPE while reading a Hive table.

 

```
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost task 1190.3 
in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, executor 487): 
java.lang.NullPointerException
at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217)
at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294)
at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
 
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
at scala.Option.foreach(Option.scala:257)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
... 67 more
Caused by: java.lang.NullPointerException
at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at 

[jira] [Commented] (SPARK-36650) ApplicationMaster shutdown hook should catch timeout exception

2021-09-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408601#comment-17408601
 ] 

Apache Spark commented on SPARK-36650:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33897

> ApplicationMaster shutdown hook should catch timeout exception
> --
>
> Key: SPARK-36650
> URL: https://issues.apache.org/jira/browse/SPARK-36650
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> ApplicationMaster shutdown hook should catch timeout exception
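
The intent, sketched with the plain JVM hook API (Spark itself registers the
hook through its internal `ShutdownHookManager`; the hook body and thread name
here are illustrative):

```
import java.util.concurrent.TimeoutException

// Catch the timeout inside the hook so it cannot surface as an uncaught
// exception while the JVM is already shutting down.
Runtime.getRuntime.addShutdownHook(new Thread("am-shutdown-hook") {
  override def run(): Unit = {
    try {
      // e.g. wait for the SparkContext to stop and unregister from the RM
    } catch {
      case e: TimeoutException =>
        System.err.println(s"Shutdown hook timed out, continuing: $e")
    }
  }
})
```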



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36650) ApplicationMaster shutdown hook should catch timeout exception

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36650:


Assignee: (was: Apache Spark)

> ApplicationMaster shutdown hook should catch timeout exception
> --
>
> Key: SPARK-36650
> URL: https://issues.apache.org/jira/browse/SPARK-36650
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> ApplicationMaster shutdown hook should catch timeout exception



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36650) ApplicationMaster shutdown hook should catch timeout exception

2021-09-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36650:


Assignee: Apache Spark

> ApplicationMaster shutdown hook should catch timeout exception
> --
>
> Key: SPARK-36650
> URL: https://issues.apache.org/jira/browse/SPARK-36650
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> ApplicationMaster shutdown hook should catch timeout exception



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver

2021-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

胡振宇 updated SPARK-36584:

Description: 
 When the driver broadcasts an object, it sends the 
[SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
 event. 
[ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]
 receives and handles the event; in this method it calls 
[ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489]
 to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor executors, not the driver. 
This does not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
potential risk



  was:
 When driver broadcast object, it will send the 
[SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
 event. 
[ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]receives
 and handles the event, in this method, 
 it calls 
[ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489]
 to put driver in `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor Executor.  Although 
this will not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID will be filtered out, but I think this is a 
potential risk




> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2
>Reporter: 胡振宇
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  When the driver broadcasts an object, it sends the 
> [SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
>  event. 
> [ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]
>  receives and handles the event; in this method it calls 
> [ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489]
>  to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
> my understanding, `ExecutorMonitor` should only monitor executors, not the driver. 
> This does not cause any problems at the moment because 
> UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
> potential risk
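
A defensive guard that would keep driver-originated updates out of the monitor,
as a hedged sketch (the handler shape is assumed from the linked source, and
`onBlockUpdatedGuarded` is an illustrative name; `ensureExecutorIsTracked` is
the real method it would protect):

```
import org.apache.spark.scheduler.SparkListenerBlockUpdated

// "driver" is the well-known executorId of the driver's BlockManager
// (SparkContext.DRIVER_IDENTIFIER inside Spark itself).
val driverId = "driver"

def onBlockUpdatedGuarded(event: SparkListenerBlockUpdated): Unit = {
  val execId = event.blockUpdatedInfo.blockManagerId.executorId
  if (execId == driverId) {
    return // block update came from the driver, not an executor; skip it
  }
  // ... existing tracking logic, e.g. ensureExecutorIsTracked(execId, ...) ...
}
```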



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver

2021-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

胡振宇 updated SPARK-36584:

Description: 
 When the driver broadcasts an object, it sends the 
[SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
 event. 
[ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]
 receives and handles the event; in this method it calls 
[ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489]
 to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor executors. 
This does not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
potential risk



  was:
 When driver broadcast object, it will send the 
[SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
 event. 
[ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]receives
 and handles the event, in this method, 
 it calls [ensureExecutorIsTracked|#L489]
 to put driver in `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor Executor.  Although 
this will not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID will be filtered out, but I think this is a 
potential risk




> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2
>Reporter: 胡振宇
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  When the driver broadcasts an object, it sends the 
> [SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
>  event. 
> [ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]
>  receives and handles the event; in this method it calls 
> [ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489]
>  to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
> my understanding, `ExecutorMonitor` should only monitor executors. 
> This does not cause any problems at the moment because 
> UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
> potential risk



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver

2021-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

胡振宇 updated SPARK-36584:

Description: 
 When the driver broadcasts an object, it sends the 
[SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
 event. 
[ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]
 receives and handles the event; in this method 
 it calls [ensureExecutorIsTracked|#L489]
 to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor executors. 
This does not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
potential risk



  was:
 When driver broadcast object, it will send the 
[SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
 event. [ExecutorMonitor#onBlockUpdated |#L380]receives and handles the event, 
in this method, 
 it calls [ensureExecutorIsTracked|#L489]
 to put driver in `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor Executor.  Although 
this will not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID will be filtered out, but I think this is a 
potential risk




> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2
>Reporter: 胡振宇
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  When the driver broadcasts an object, it sends the 
> [SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
>  event. 
> [ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]
>  receives and handles the event; in this method 
>  it calls [ensureExecutorIsTracked|#L489]
>  to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
> my understanding, `ExecutorMonitor` should only monitor executors. 
> This does not cause any problems at the moment because 
> UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
> potential risk



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver

2021-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

胡振宇 updated SPARK-36584:

Description: 
 When the driver broadcasts an object, it sends the 
[SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
 event. [ExecutorMonitor#onBlockUpdated |#L380] receives and handles the event; 
in this method 
 it calls [ensureExecutorIsTracked|#L489]
 to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor executors. 
This does not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
potential risk



  was:
 When driver broadcast object, it will send the [SparkListenerBlockUpdated|  
#L228] event. [ExecutorMonitor#onBlockUpdated |#L380]receives and handles the 
event, in this method, 
 it calls [ensureExecutorIsTracked|#L489]
 to put driver in `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor Executor.  Although 
this will not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID will be filtered out, but I think this is a 
potential risk




> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2
>Reporter: 胡振宇
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  When the driver broadcasts an object, it sends the 
> [SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]
>  event. [ExecutorMonitor#onBlockUpdated |#L380] receives and handles the 
> event; in this method 
>  it calls [ensureExecutorIsTracked|#L489]
>  to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
> my understanding, `ExecutorMonitor` should only monitor executors. 
> This does not cause any problems at the moment because 
> UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
> potential risk



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver

2021-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

胡振宇 updated SPARK-36584:

Description: 
 When the driver broadcasts an object, it sends the [SparkListenerBlockUpdated|  
#L228] event. [ExecutorMonitor#onBlockUpdated |#L380] receives and handles the 
event; in this method 
 it calls [ensureExecutorIsTracked|#L489]
 to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor executors. 
This does not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
potential risk



  was:
 When driver broadcast object, it will send the 
[SparkListenerBlockUpdated|#L228] event. [ExecutorMonitor#onBlockUpdated 
|#L380]receives and handles the event, in this method, 
 it calls [ensureExecutorIsTracked|#L489]
 to put driver in `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor Executor.  Although 
this will not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID will be filtered out, but I think this is a 
potential risk

[link title|http://example.com]


> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2
>Reporter: 胡振宇
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  When the driver broadcasts an object, it sends the [SparkListenerBlockUpdated|  
> #L228] event. [ExecutorMonitor#onBlockUpdated |#L380] receives and handles the 
> event; in this method 
>  it calls [ensureExecutorIsTracked|#L489]
>  to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
> my understanding, `ExecutorMonitor` should only monitor executors. 
> This does not cause any problems at the moment because 
> UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
> potential risk



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver

2021-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

胡振宇 updated SPARK-36584:

Description: 
 When the driver broadcasts an object, it sends the 
[SparkListenerBlockUpdated|#L228] event. [ExecutorMonitor#onBlockUpdated 
|#L380] receives and handles the event; in this method 
 it calls [ensureExecutorIsTracked|#L489]
 to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor executors. 
This does not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
potential risk

  was:
 When driver broadcast object, it will send the 
[SparkListenerBlockUpdated|#L228] event. 
[ExecutorMonitor#onBlockUpdated|#L380]] receives and handles the event, in this 
method, 
 it calls [ensureExecutorIsTracked|#L489]
 to put driver in `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor Executor.  Although 
this will not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID will be filtered out, but I think this is a 
potential risk


> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2
>Reporter: 胡振宇
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  When the driver broadcasts an object, it sends the 
> [SparkListenerBlockUpdated|#L228] event. [ExecutorMonitor#onBlockUpdated 
> |#L380] receives and handles the event; in this method 
>  it calls [ensureExecutorIsTracked|#L489]
>  to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
> my understanding, `ExecutorMonitor` should only monitor executors. 
> This does not cause any problems at the moment because 
> UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
> potential risk



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver

2021-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

胡振宇 updated SPARK-36584:

Description: 
 When the driver broadcasts an object, it sends the 
[SparkListenerBlockUpdated|#L228] event. [ExecutorMonitor#onBlockUpdated 
|#L380] receives and handles the event; in this method 
 it calls [ensureExecutorIsTracked|#L489]
 to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor executors. 
This does not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
potential risk

[link title|http://example.com]

  was:
 When driver broadcast object, it will send the 
[SparkListenerBlockUpdated|#L228] event. [ExecutorMonitor#onBlockUpdated 
|#L380]receives and handles the event, in this method, 
 it calls [ensureExecutorIsTracked|#L489]
 to put driver in `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor Executor.  Although 
this will not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID will be filtered out, but I think this is a 
potential risk


> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2
>Reporter: 胡振宇
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  When the driver broadcasts an object, it sends the 
> [SparkListenerBlockUpdated|#L228] event. [ExecutorMonitor#onBlockUpdated 
> |#L380] receives and handles the event; in this method 
>  it calls [ensureExecutorIsTracked|#L489]
>  to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
> my understanding, `ExecutorMonitor` should only monitor executors. 
> This does not cause any problems at the moment because 
> UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
> potential risk
> [link title|http://example.com]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver

2021-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

胡振宇 updated SPARK-36584:

Description: 
 When the driver broadcasts an object, it sends the 
[SparkListenerBlockUpdated|#L228] event. 
[ExecutorMonitor#onBlockUpdated|#L380]] receives and handles the event; in this 
method 
 it calls [ensureExecutorIsTracked|#L489]
 to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor executors. 
This does not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
potential risk

  was:
 When driver broadcast object, it will send the 
[SparkListenerBlockUpdated|#L228]] event. 
[ExecutorMonitor#onBlockUpdated|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]]
 receives and handles the event, in this method, 
 it calls [ensureExecutorIsTracked|#L489]]
 to put driver in `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor Executor.  Although 
this will not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID will be filtered out, but I think this is a 
potential risk


> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2
>Reporter: 胡振宇
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  When the driver broadcasts an object, it sends the 
> [SparkListenerBlockUpdated|#L228] event. 
> [ExecutorMonitor#onBlockUpdated|#L380]] receives and handles the event; in 
> this method 
>  it calls [ensureExecutorIsTracked|#L489]
>  to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
> my understanding, `ExecutorMonitor` should only monitor executors. 
> This does not cause any problems at the moment because 
> UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
> potential risk



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver

2021-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

胡振宇 updated SPARK-36584:

Description: 
 When the driver broadcasts an object, it sends the 
[SparkListenerBlockUpdated|#L228]] event. 
[ExecutorMonitor#onBlockUpdated|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]]
 receives and handles the event; in this method 
 it calls [ensureExecutorIsTracked|#L489]]
 to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor executors. 
This does not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
potential risk

  was:
 When driver broadcast object, it will send the 
[SparkListenerBlockUpdated|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]]
 event. 
[[ExecutorMonitor#onBlockUpdated|#onBlockUpdated]|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]]
 receives and handles the event, in this method, 
 it calls 
[ensureExecutorIsTracked|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489]]
 to put driver in `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor Executor.  Although 
this will not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID will be filtered out, but I think this is a 
potential risk


> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2
>Reporter: 胡振宇
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  When the driver broadcasts an object, it sends the 
> [SparkListenerBlockUpdated|#L228]] event. 
> [ExecutorMonitor#onBlockUpdated|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]]
>  receives and handles the event; in this method 
>  it calls [ensureExecutorIsTracked|#L489]]
>  to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
> my understanding, `ExecutorMonitor` should only monitor executors. 
> This does not cause any problems at the moment because 
> UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
> potential risk



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver

2021-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

胡振宇 updated SPARK-36584:

Description: 
 When the driver broadcasts an object, it sends the 
[SparkListenerBlockUpdated|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]]
 event. 
[[ExecutorMonitor#onBlockUpdated|#onBlockUpdated]|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]]
 receives and handles the event; in this method 
 it calls 
[ensureExecutorIsTracked|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489]]
 to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
my understanding, `ExecutorMonitor` should only monitor executors. 
This does not cause any problems at the moment because 
UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
potential risk

  was:
 When driver broadcast object, 
it will send the 
[SparkListenerBlockUpdated](https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228)
 event.
[ExecutorMonitor#onBlockUpdated](https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380)
 
receives and handles the event, in this method, 
it calls 
[ensureExecutorIsTracked](https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489)
to put driver in `executors` variable, but in my understanding, 
`ExecutorMonitor` should only monitor Executor, not Driver. 
Moreover, adding a `driver` to the `executors` will affect the calculation of
[ExecutorAllocationManager#removeExecutors](https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L552),
 and the driver will occupy the count of `executors`

 Issue Type: Question  (was: Bug)
   Priority: Minor  (was: Major)

> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2
>Reporter: 胡振宇
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  When the driver broadcasts an object, it sends the 
> [SparkListenerBlockUpdated|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228]]
>  event. 
> [[ExecutorMonitor#onBlockUpdated|#onBlockUpdated]|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380]]
>  receives and handles the event; in this method 
>  it calls 
> [ensureExecutorIsTracked|[https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489]]
>  to put the driver in the `executors` variable with `UNKNOWN_RESOURCE_PROFILE_ID`. In 
> my understanding, `ExecutorMonitor` should only monitor executors. 
> This does not cause any problems at the moment because 
> UNKNOWN_RESOURCE_PROFILE_ID is filtered out, but I think it is a 
> potential risk



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


