[jira] [Resolved] (SPARK-40395) Provide query context in AnalysisException

2023-11-25 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-40395.

Resolution: Implemented

> Provide query context in AnalysisException
> --
>
> Key: SPARK-40395
> URL: https://issues.apache.org/jira/browse/SPARK-40395
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Provide query context in AnalysisException for better error messages
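>
> For illustration, a sketch of the kind of error that benefits from query 
> context (a sketch assuming a SparkSession {{spark}} in spark-shell; the 
> message shape is my assumption, not quoted from Spark):
> {code:scala}
> // An unresolved column raises AnalysisException; with query context
> // attached, the message can point at the offending SQL fragment.
> spark.sql("SELECT nonexistent_col FROM range(1)")
> // org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN] ...
> {code}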



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40395) Provide query context in AnalysisException

2023-11-25 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789768#comment-17789768
 ] 

Gengliang Wang commented on SPARK-40395:


Resolved in [https://github.com/apache/spark/pull/37841]. The PR referenced the 
wrong Jira.

> Provide query context in AnalysisException
> --
>
> Key: SPARK-40395
> URL: https://issues.apache.org/jira/browse/SPARK-40395
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Provide query context in AnalysisException for better error messages



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35161) Error-handling SQL functions

2023-11-25 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35161.

Resolution: Done

> Error-handling SQL functions
> 
>
> Key: SPARK-35161
> URL: https://issues.apache.org/jira/browse/SPARK-35161
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Create new error-handling versions of existing SQL functions/operators that 
> return NULL if an overflow or error occurs, so that:
> 1. Users can finish queries without interruption in ANSI mode.
> 2. Users get NULLs instead of unreasonable results if overflow occurs 
> when ANSI mode is off.
> For example, the behavior of the following SQL operations is unreasonable:
> {code:java}
> 2147483647 + 2 => -2147483647
> CAST(2147483648L AS INT) => -2147483648
> {code}
> With the new safe version SQL functions:
> {code:java}
> TRY_ADD(2147483647, 2) => null
> TRY_CAST(2147483648L AS INT) => null
> {code}
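> For illustration, a minimal way to try these from spark-shell (a sketch 
> assuming a SparkSession {{spark}} on Spark 3.2+, where TRY_ADD and TRY_CAST 
> are available):
> {code:scala}
> // With ANSI mode off, the plain operator wraps around silently:
> spark.sql("SELECT 2147483647 + 2").show()               // -2147483647
> // The TRY_ variants return NULL on overflow instead:
> spark.sql("SELECT TRY_ADD(2147483647, 2)").show()       // NULL
> spark.sql("SELECT TRY_CAST(2147483648 AS INT)").show()  // NULL
> {code}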



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35161) Error-handling SQL functions

2023-11-25 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-35161:
--

Assignee: Gengliang Wang

> Error-handling SQL functions
> 
>
> Key: SPARK-35161
> URL: https://issues.apache.org/jira/browse/SPARK-35161
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Create new error-handling versions of existing SQL functions/operators that 
> return NULL if an overflow or error occurs, so that:
> 1. Users can finish queries without interruption in ANSI mode.
> 2. Users get NULLs instead of unreasonable results if overflow occurs 
> when ANSI mode is off.
> For example, the behavior of the following SQL operations is unreasonable:
> {code:java}
> 2147483647 + 2 => -2147483647
> CAST(2147483648L AS INT) => -2147483648
> {code}
> With the new safe version SQL functions:
> {code:java}
> TRY_ADD(2147483647, 2) => null
> TRY_CAST(2147483648L AS INT) => null
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46105) df.emptyDataFrame shows 1 if we repartition

2023-11-25 Thread dharani_sugumar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dharani_sugumar updated SPARK-46105:

Attachment: Screenshot 2023-11-26 at 11.54.58 AM.png

> df.emptyDataFrame shows 1 if we repartition
> ---
>
> Key: SPARK-46105
> URL: https://issues.apache.org/jira/browse/SPARK-46105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.3
> Environment: EKS
> EMR
>Reporter: dharani_sugumar
>Priority: Major
> Attachments: Screenshot 2023-11-26 at 11.54.58 AM.png
>
>
> Version: 3.3.3
>
> scala> val df = spark.emptyDataFrame
> df: org.apache.spark.sql.DataFrame = []
> scala> df.rdd.getNumPartitions
> res0: Int = 0
> scala> df.repartition(1).rdd.getNumPartitions
> res1: Int = 1
> scala> df.repartition(1).rdd.isEmpty()
> res2: Boolean = true
>
> Version: 3.2.4
>
> scala> val df = spark.emptyDataFrame
> df: org.apache.spark.sql.DataFrame = []
> scala> df.rdd.getNumPartitions
> res0: Int = 0
> scala> df.repartition(1).rdd.getNumPartitions
> res1: Int = 0
> scala> df.repartition(1).rdd.isEmpty()
> res2: Boolean = true
>
> Version: 3.5.0
>
> scala> val df = spark.emptyDataFrame
> df: org.apache.spark.sql.DataFrame = []
> scala> df.rdd.getNumPartitions
> res0: Int = 0
> scala> df.repartition(1).rdd.getNumPartitions
> res1: Int = 1
> scala> df.repartition(1).rdd.isEmpty()
> res2: Boolean = true
>  
> When we repartition an empty dataframe into 1 partition, the resulting 
> partition count is 1 on 3.3.x and 3.5.x, whereas on 3.2.x it is 0. May I 
> know why this behaviour changed from 3.2.x to later versions?
>  
> The reason for raising this as a bug: I have a scenario where my final 
> dataframe returns 0 records in EKS (local Spark) on a single node (driver 
> and executor on the same node) but returns 1 in EMR, and both use the same 
> Spark version, 3.3.3. I'm not sure why this behaves differently in the two 
> environments. As an interim solution, I had to repartition the empty 
> dataframe when my final dataframe is empty, which returns 1 on 3.3.3. I 
> would like to know whether this is really a bug, or whether this behaviour 
> will persist in future versions and cannot be changed.
>  
> If we go for a Spark upgrade and this behaviour changes again, we will face 
> the issue again.
> Please confirm.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46105) df.emptyDataFrame shows 1 if we repartition(1) in Spark 3.3.x and above

2023-11-25 Thread dharani_sugumar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dharani_sugumar updated SPARK-46105:

Summary: df.emptyDataFrame shows 1 if we repartition(1) in Spark 3.3.x and 
above  (was: df.emptyDataFrame shows 1 if we repartition)

> df.emptyDataFrame shows 1 if we repartition(1) in Spark 3.3.x and above
> ---
>
> Key: SPARK-46105
> URL: https://issues.apache.org/jira/browse/SPARK-46105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.3
> Environment: EKS
> EMR
>Reporter: dharani_sugumar
>Priority: Major
> Attachments: Screenshot 2023-11-26 at 11.54.58 AM.png
>
>
> Version: 3.3.3
>
> scala> val df = spark.emptyDataFrame
> df: org.apache.spark.sql.DataFrame = []
> scala> df.rdd.getNumPartitions
> res0: Int = 0
> scala> df.repartition(1).rdd.getNumPartitions
> res1: Int = 1
> scala> df.repartition(1).rdd.isEmpty()
> res2: Boolean = true
>
> Version: 3.2.4
>
> scala> val df = spark.emptyDataFrame
> df: org.apache.spark.sql.DataFrame = []
> scala> df.rdd.getNumPartitions
> res0: Int = 0
> scala> df.repartition(1).rdd.getNumPartitions
> res1: Int = 0
> scala> df.repartition(1).rdd.isEmpty()
> res2: Boolean = true
>
> Version: 3.5.0
>
> scala> val df = spark.emptyDataFrame
> df: org.apache.spark.sql.DataFrame = []
> scala> df.rdd.getNumPartitions
> res0: Int = 0
> scala> df.repartition(1).rdd.getNumPartitions
> res1: Int = 1
> scala> df.repartition(1).rdd.isEmpty()
> res2: Boolean = true
>  
> When we repartition an empty dataframe into 1 partition, the resulting 
> partition count is 1 on 3.3.x and 3.5.x, whereas on 3.2.x it is 0. May I 
> know why this behaviour changed from 3.2.x to later versions?
>  
> The reason for raising this as a bug: I have a scenario where my final 
> dataframe returns 0 records in EKS (local Spark) on a single node (driver 
> and executor on the same node) but returns 1 in EMR, and both use the same 
> Spark version, 3.3.3. I'm not sure why this behaves differently in the two 
> environments. As an interim solution, I had to repartition the empty 
> dataframe when my final dataframe is empty, which returns 1 on 3.3.3. I 
> would like to know whether this is really a bug, or whether this behaviour 
> will persist in future versions and cannot be changed.
>  
> If we go for a Spark upgrade and this behaviour changes again, we will face 
> the issue again.
> Please confirm.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46105) df.emptyDataFrame shows 1 if we repartition

2023-11-25 Thread dharani_sugumar (Jira)
dharani_sugumar created SPARK-46105:
---

 Summary: df.emptyDataFrame shows 1 if we repartition
 Key: SPARK-46105
 URL: https://issues.apache.org/jira/browse/SPARK-46105
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.3
 Environment: EKS

EMR
Reporter: dharani_sugumar


Version: 3.3.3

scala> val df = spark.emptyDataFrame
df: org.apache.spark.sql.DataFrame = []

scala> df.rdd.getNumPartitions
res0: Int = 0

scala> df.repartition(1).rdd.getNumPartitions
res1: Int = 1

scala> df.repartition(1).rdd.isEmpty()
res2: Boolean = true

Version: 3.2.4

scala> val df = spark.emptyDataFrame
df: org.apache.spark.sql.DataFrame = []

scala> df.rdd.getNumPartitions
res0: Int = 0

scala> df.repartition(1).rdd.getNumPartitions
res1: Int = 0

scala> df.repartition(1).rdd.isEmpty()
res2: Boolean = true

Version: 3.5.0

scala> val df = spark.emptyDataFrame
df: org.apache.spark.sql.DataFrame = []

scala> df.rdd.getNumPartitions
res0: Int = 0

scala> df.repartition(1).rdd.getNumPartitions
res1: Int = 1

scala> df.repartition(1).rdd.isEmpty()
res2: Boolean = true

 

When we repartition an empty dataframe into 1 partition, the resulting 
partition count is 1 on 3.3.x and 3.5.x, whereas on 3.2.x it is 0. May I know 
why this behaviour changed from 3.2.x to later versions?

The reason for raising this as a bug: I have a scenario where my final 
dataframe returns 0 records in EKS (local Spark) on a single node (driver and 
executor on the same node) but returns 1 in EMR, and both use the same Spark 
version, 3.3.3. I'm not sure why this behaves differently in the two 
environments. As an interim solution, I had to repartition the empty dataframe 
when my final dataframe is empty, which returns 1 on 3.3.3. I would like to 
know whether this is really a bug, or whether this behaviour will persist in 
future versions and cannot be changed.

If we go for a Spark upgrade and this behaviour changes again, we will face 
the issue again.

Please confirm.
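
A workaround sketch (my construction, assuming a SparkSession `spark`; 
Dataset.isEmpty has been available since Spark 2.4): check emptiness through 
the data itself rather than through the partition count, which differs across 
versions.

{code:scala}
val df = spark.emptyDataFrame
val out = df.repartition(1)
// The partition count is version-dependent (0 on 3.2.x, 1 on 3.3.x and above):
println(out.rdd.getNumPartitions)
// isEmpty does not depend on the partition count, so it is the safer check:
println(out.isEmpty)  // true
{code}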

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46090) Support plan fragment level SQL configs in AQE

2023-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46090:
---
Labels: pull-request-available  (was: )

> Support plan fragment level SQL configs in AQE
> ---
>
> Key: SPARK-46090
> URL: https://issues.apache.org/jira/browse/SPARK-46090
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Priority: Major
>  Labels: pull-request-available
>
> AQE executes query plan stage by stage, so there is a chance to support plan 
> fragment level SQL configs.
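> For context, a sketch of how such configs are set today (global scope only; 
> the per-fragment form is what this ticket proposes, so no API for it is 
> shown here):
> {code:scala}
> // Assumes a SparkSession `spark`. An AQE tuning knob such as the advisory
> // partition size currently applies to the whole query:
> spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64MB")
> {code}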



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46090) Support plan fragment level SQL configs in AQE

2023-11-25 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-46090:
--
Summary: Support plan fragment level SQL configs in AQE  (was: Support 
plan fragment level SQL configs)

> Support plan fragment level SQL configs in AQE
> ---
>
> Key: SPARK-46090
> URL: https://issues.apache.org/jira/browse/SPARK-46090
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Priority: Major
>
> AQE executes query plan stage by stage, so there is a chance to support plan 
> fragment level SQL configs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46090) Support plan fragment level SQL configs

2023-11-25 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-46090:
--
Summary: Support plan fragment level SQL configs  (was: Support stage level 
SQL configs)

> Support plan fragment level SQL configs
> ---
>
> Key: SPARK-46090
> URL: https://issues.apache.org/jira/browse/SPARK-46090
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Priority: Major
>
> AQE executes query plan stage by stage, so there is a chance to support stage 
> level SQL configs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46090) Support plan fragment level SQL configs

2023-11-25 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-46090:
--
Description: AQE executes query plan stage by stage, so there is a chance 
to support plan fragment level SQL configs.  (was: AQE executes query plan 
stage by stage, so there is a chance to support stage level SQL configs.)

> Support plan fragment level SQL configs
> ---
>
> Key: SPARK-46090
> URL: https://issues.apache.org/jira/browse/SPARK-46090
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Priority: Major
>
> AQE executes query plan stage by stage, so there is a chance to support plan 
> fragment level SQL configs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46104) NPE when broadcast join includes null key

2023-11-25 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated SPARK-46104:

Attachment: 1.jpg

> NPE when broadcast join includes null key
> 
>
> Key: SPARK-46104
> URL: https://issues.apache.org/jira/browse/SPARK-46104
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: lrz
>Priority: Major
> Attachments: 1.jpg
>
>
> Missing initialization of the UnsafeProjection in UnsafeHashedRelation, 
> which leads to an NPE when the key contains a null value.
> Here is the generated code:
> !1.jpg!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46104) NPE when broadcast join includes null key

2023-11-25 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated SPARK-46104:

Description: 
Missing initialization of the UnsafeProjection in UnsafeHashedRelation, which 
leads to an NPE when the key contains a null value.

Here is the generated code:

!1.jpg!

 

  was:
Missing initialization of the UnsafeProjection in UnsafeHashedRelation, which 
leads to an NPE when the key contains a null value.

Here is the generated code:

!image-2023-11-26-10-28-58-066.png!

 


> NPE when broadcast join includes null key
> 
>
> Key: SPARK-46104
> URL: https://issues.apache.org/jira/browse/SPARK-46104
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: lrz
>Priority: Major
> Attachments: 1.jpg
>
>
> Missing initialization of the UnsafeProjection in UnsafeHashedRelation, 
> which leads to an NPE when the key contains a null value.
> Here is the generated code:
> !1.jpg!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46104) NPE when broadcast join includes null key

2023-11-25 Thread lrz (Jira)
lrz created SPARK-46104:
---

 Summary: NPE when broadcast join includes null key
 Key: SPARK-46104
 URL: https://issues.apache.org/jira/browse/SPARK-46104
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.1
Reporter: lrz


Missing initialization of the UnsafeProjection in UnsafeHashedRelation, which 
leads to an NPE when the key contains a null value.

Here is the generated code:

!image-2023-11-26-10-28-58-066.png!
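
A minimal repro sketch (my construction from the description above, not taken 
from the ticket): a broadcast join whose key column contains NULLs, joined 
with null-safe equality so that the null keys reach the hashed relation.

{code:scala}
// Run in spark-shell, which provides `spark`:
import org.apache.spark.sql.functions.broadcast
import spark.implicits._

val left  = Seq((Some(1), "l1"), (None, "l2")).toDF("k", "v")
val right = Seq((Some(1), "r1"), (None, "r2")).toDF("k", "w")
// <=> is null-safe equality, so rows with NULL keys participate in the join:
left.join(broadcast(right), left("k") <=> right("k")).show()
{code}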

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44850) Heartbeat (sparkconnect scala)

2023-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44850:
---
Labels: pull-request-available  (was: )

> Heartbeat (sparkconnect scala)
> --
>
> Key: SPARK-44850
> URL: https://issues.apache.org/jira/browse/SPARK-44850
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44209) Expose amount of shuffle data available on the node

2023-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44209:
---
Labels: pull-request-available  (was: )

> Expose amount of shuffle data available on the node
> ---
>
> Key: SPARK-44209
> URL: https://issues.apache.org/jira/browse/SPARK-44209
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 3.4.1
>Reporter: Deependra Patel
>Priority: Trivial
>  Labels: pull-request-available
>
> [ShuffleMetrics|https://github.com/apache/spark/blob/43f7a86a05ad8c7ec7060607e43d9ca4d0fe4166/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java#L318]
>  doesn't have metrics like 
> "totalShuffleDataBytes" and "numAppsWithShuffleData"; these are per-node 
> metrics published by the External Shuffle Service.
>  
> Adding these metrics would help in:
> 1. Deciding whether we can decommission a node when no shuffle data is present
> 2. Better live monitoring of customers' workloads to see whether skewed 
> shuffle data is present on the node



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33275) ANSI mode: runtime errors instead of returning null on invalid inputs

2023-11-25 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-33275.

Resolution: Done

> ANSI mode: runtime errors instead of returning null on invalid inputs
> -
>
> Key: SPARK-33275
> URL: https://issues.apache.org/jira/browse/SPARK-33275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>
> We should respect ANSI mode in more places. What we have done so far is 
> mostly overflow checking in various operators. This ticket tracks a 
> category of ANSI mode behaviors: operators should throw runtime errors 
> instead of returning null when the input is invalid.
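> For illustration, a sketch of the behavioral difference (a sketch assuming 
> a SparkSession {{spark}}; the invalid-cast example is mine, not from this 
> ticket):
> {code:scala}
> // With ANSI mode off, an invalid cast quietly yields NULL:
> spark.conf.set("spark.sql.ansi.enabled", "false")
> spark.sql("SELECT CAST('abc' AS INT)").show()  // NULL
> // With ANSI mode on, the same cast fails at runtime instead:
> spark.conf.set("spark.sql.ansi.enabled", "true")
> spark.sql("SELECT CAST('abc' AS INT)").show()  // throws a runtime error
> {code}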



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33275) ANSI mode: runtime errors instead of returning null on invalid inputs

2023-11-25 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-33275:
--

Assignee: Apache Spark

> ANSI mode: runtime errors instead of returning null on invalid inputs
> -
>
> Key: SPARK-33275
> URL: https://issues.apache.org/jira/browse/SPARK-33275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>
> We should respect ANSI mode in more places. What we have done so far is 
> mostly overflow checking in various operators. This ticket tracks a 
> category of ANSI mode behaviors: operators should throw runtime errors 
> instead of returning null when the input is invalid.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46100) Replace (string|array).size with (string|array).length in module core

2023-11-25 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-46100.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44011
[https://github.com/apache/spark/pull/44011]

> Replace (string|array).size with (string|array).length in module core
> -
>
> Key: SPARK-46100
> URL: https://issues.apache.org/jira/browse/SPARK-46100
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46101) Replace (string|array).size with (string|array).length in module SQL

2023-11-25 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-46101:
-
Priority: Minor  (was: Major)
 Summary: Replace (string|array).size with (string|array).length in module 
SQL  (was: Fix these issues in module sql)

> Replace (string|array).size with (string|array).length in module SQL
> 
>
> Key: SPARK-46101
> URL: https://issues.apache.org/jira/browse/SPARK-46101
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46100) Replace (string|array).size with (string|array).length in module core

2023-11-25 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-46100:
-
Summary: Replace (string|array).size with (string|array).length in module 
core  (was: Fix these issues in module core)

> Replace (string|array).size with (string|array).length in module core
> -
>
> Key: SPARK-46100
> URL: https://issues.apache.org/jira/browse/SPARK-46100
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46100) Replace (string|array).size with (string|array).length in module core

2023-11-25 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-46100:
-
Priority: Minor  (was: Major)

> Replace (string|array).size with (string|array).length in module core
> -
>
> Key: SPARK-46100
> URL: https://issues.apache.org/jira/browse/SPARK-46100
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46102) Prune keys or values from Generate if it is a map type

2023-11-25 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-46102:

Summary: Prune keys or values from Generate if it is a map type  (was: 
Prune keys or values from Generate if it is a map type.)

> Prune keys or values from Generate if it is a map type
> --
>
> Key: SPARK-46102
> URL: https://issues.apache.org/jira/browse/SPARK-46102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46102) Prune keys or values from Generate if it is a map type.

2023-11-25 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-46102:
---

 Summary: Prune keys or values from Generate if it is a map type.
 Key: SPARK-46102
 URL: https://issues.apache.org/jira/browse/SPARK-46102
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45715) QueryPlanningTracker::measurePhase minor refactor

2023-11-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45715:
--
Fix Version/s: (was: 3.4.2)

> QueryPlanningTracker::measurePhase minor refactor
> -
>
> Key: SPARK-45715
> URL: https://issues.apache.org/jira/browse/SPARK-45715
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: xy
>Priority: Minor
>  Labels: pull-request-available
>
> Minor typo and refactoring cleanup in QueryPlanningTracker::measurePhase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45715) QueryPlanningTracker::measurePhase minor refactor

2023-11-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45715:
--
Affects Version/s: 4.0.0
   (was: 3.4.1)

> QueryPlanningTracker::measurePhase minor refactor
> -
>
> Key: SPARK-45715
> URL: https://issues.apache.org/jira/browse/SPARK-45715
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: xy
>Priority: Minor
>  Labels: pull-request-available
>
> Minor typo and refactoring cleanup in QueryPlanningTracker::measurePhase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46072) Missing .jars when applying code to spark-connect

2023-11-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46072:
--
Fix Version/s: (was: 3.4.2)
   (was: 3.5.1)

> Missing .jars when applying code to spark-connect
> -
>
> Key: SPARK-46072
> URL: https://issues.apache.org/jira/browse/SPARK-46072
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1
> Environment: python 3.9
> scala 2.12
> spark 3.4.1
> hdfs 3.1.2
> hive 3.1.3
>Reporter: Dmitry Kravchuk
>Priority: Major
>
> I've built Spark with the following Maven command for our on-prem Hadoop cluster:
> {code:bash}
> ./build/mvn -Pyarn -Pkubernetes -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive 
> -Phive-thriftserver -DskipTests clean package
> {code}
>  
> Then I start the Connect server like this:
> {code:bash}
> ./sbin/start-connect-server.sh --packages 
> org.apache.spark:spark-connect_2.12:3.4.1
> {code}
>  
> When I try to run any code after the following command, I always get an error 
> from the connect-server side:
> {code:bash}
> ./bin/pyspark --remote "sc://localhost"
> {code}
> Error: 
> {code:bash}
>           
> /home/zeppelin/.ivy2/local/org.apache.spark/spark-connect_2.12/3.4.1/jars/spark-connect_2.12.jar
>          central: tried
>           
> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom
>           -- artifact 
> org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar:
>           
> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar
>          spark-packages: tried
>           
> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom
>           -- artifact 
> org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar:
>           
> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar
>                 ::
>                 ::          UNRESOLVED DEPENDENCIES         ::
>                 ::
>                 :: org.apache.spark#spark-connect_2.12;3.4.1: not found
>                 ::
> {code}
>  
> Where am I going wrong? I thought it was a firewall issue, but it isn't, 
> since I set the http_proxy and https_proxy variables with my own credentials.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46072) Missing .jars when applying code to spark-connect

2023-11-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46072:
--
Target Version/s:   (was: 3.5.0)

> Missing .jars when applying code to spark-connect
> -
>
> Key: SPARK-46072
> URL: https://issues.apache.org/jira/browse/SPARK-46072
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1
> Environment: python 3.9
> scala 2.12
> spark 3.4.1
> hdfs 3.1.2
> hive 3.1.3
>Reporter: Dmitry Kravchuk
>Priority: Major
>
> I've built Spark with the following Maven command for our on-prem Hadoop cluster:
> {code:bash}
> ./build/mvn -Pyarn -Pkubernetes -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive 
> -Phive-thriftserver -DskipTests clean package
> {code}
>  
> Then I start the Connect server like this:
> {code:bash}
> ./sbin/start-connect-server.sh --packages 
> org.apache.spark:spark-connect_2.12:3.4.1
> {code}
>  
> When I try to run any code after the following command, I always get an error 
> from the connect-server side:
> {code:bash}
> ./bin/pyspark --remote "sc://localhost"
> {code}
> Error: 
> {code:bash}
>           
> /home/zeppelin/.ivy2/local/org.apache.spark/spark-connect_2.12/3.4.1/jars/spark-connect_2.12.jar
>          central: tried
>           
> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom
>           -- artifact 
> org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar:
>           
> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar
>          spark-packages: tried
>           
> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom
>           -- artifact 
> org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar:
>           
> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar
>                 ::
>                 ::          UNRESOLVED DEPENDENCIES         ::
>                 ::
>                 :: org.apache.spark#spark-connect_2.12;3.4.1: not found
>                 ::
> {code}
>  
> Where am I going wrong? I thought it was a firewall issue, but it isn't, 
> since I set the http_proxy and https_proxy variables with my own credentials.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46101) Fix these issues in module sql

2023-11-25 Thread Jiaan Geng (Jira)
Jiaan Geng created SPARK-46101:
--

 Summary: Fix these issues in module sql
 Key: SPARK-46101
 URL: https://issues.apache.org/jira/browse/SPARK-46101
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46100) Fix these issues in module core

2023-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46100:
---
Labels: pull-request-available  (was: )

> Fix these issues in module core
> --
>
> Key: SPARK-46100
> URL: https://issues.apache.org/jira/browse/SPARK-46100
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46100) Fix these issues in module core

2023-11-25 Thread Jiaan Geng (Jira)
Jiaan Geng created SPARK-46100:
--

 Summary: Fix these issues in module core
 Key: SPARK-46100
 URL: https://issues.apache.org/jira/browse/SPARK-46100
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46098) Reduce stack depth by replacing (string|array).size with (string|array).length

2023-11-25 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng updated SPARK-46098:
---
Description: 
There are a lot of (string|array).size called.
In fact, the size calls the underlying length, this behavior increase the stack 
length.
We should call (string|array).length directly.

We also get the compile waring Replace .size with .length on arrays and strings

  was:
There are a lot of (string|array).size called.
In fact, the size calls the underlying length, this behavior increase the stack 
length.
We should call 

# Replace .size with .length on arrays and strings


> Reduce stack depth by replacing (string|array).size with (string|array).length
> 
>
> Key: SPARK-46098
> URL: https://issues.apache.org/jira/browse/SPARK-46098
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>
> There are a lot of calls to (string|array).size.
> In fact, size calls the underlying length, so these calls add an extra frame 
> to the stack.
> We should call (string|array).length directly.
> We also get the compile warning "Replace .size with .length on arrays and 
> strings".
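> For illustration, a small sketch of the difference (my sketch of standard 
> Scala behavior, not code from the ticket):
> {code:scala}
> val xs = Array(1, 2, 3)
> val a = xs.size    // goes through the implicit ArrayOps wrapper, extra frame
> val b = xs.length  // reads the array length directly
> {code}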



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46098) Reduce stack depth by replacing (string|array).size with (string|array).length

2023-11-25 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng updated SPARK-46098:
---
Description: 
There are a lot of (string|array).size called.
In fact, the size calls the underlying length, this behavior increase the stack 
length.
We should call 

# Replace .size with .length on arrays and strings

  was:
There are a lot of # Replace .size with .length on arrays and strings

# Replace .size with .length on arrays and strings


> Reduce stack depth by replacing (string|array).size with (string|array).length
> 
>
> Key: SPARK-46098
> URL: https://issues.apache.org/jira/browse/SPARK-46098
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>
> There are a lot of (string|array).size called.
> In fact, the size calls the underlying length, this behavior increase the 
> stack length.
> We should call 
> # Replace .size with .length on arrays and strings



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46098) Reduce stack depth by replacing (string|array).size with (string|array).length

2023-11-25 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng updated SPARK-46098:
---
Description: 
There are a lot of # Replace .size with .length on arrays and strings

# Replace .size with .length on arrays and strings

  was:
There are a lot of 

# Replace .size with .length on arrays and strings


> Reduce stack depth by replacing (string|array).size with (string|array).length
> 
>
> Key: SPARK-46098
> URL: https://issues.apache.org/jira/browse/SPARK-46098
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>
> There are a lot of # Replace .size with .length on arrays and strings
> # Replace .size with .length on arrays and strings



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org