[jira] [Resolved] (SPARK-40395) Provide query context in AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-40395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-40395. Resolution: Implemented > Provide query context in AnalysisException > -- > > Key: SPARK-40395 > URL: https://issues.apache.org/jira/browse/SPARK-40395 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Provide query context in AnalysisException for better error messages -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40395) Provide query context in AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-40395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789768#comment-17789768 ] Gengliang Wang commented on SPARK-40395: Resolved in [https://github.com/apache/spark/pull/37841]. The PR was linked to the wrong JIRA. > Provide query context in AnalysisException > -- > > Key: SPARK-40395 > URL: https://issues.apache.org/jira/browse/SPARK-40395 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Provide query context in AnalysisException for better error messages
[jira] [Resolved] (SPARK-35161) Error-handling SQL functions
[ https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35161. Resolution: Done > Error-handling SQL functions > > > Key: SPARK-35161 > URL: https://issues.apache.org/jira/browse/SPARK-35161 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Create error-handling versions of existing SQL functions/operators that return NULL when an overflow or error occurs, so that: > 1. Users can finish queries without interruption in ANSI mode. > 2. Users get NULLs instead of unreasonable results when overflow occurs and ANSI mode is off. > For example, the behavior of the following SQL operations is unreasonable: > {code:sql} > 2147483647 + 2 => -2147483647 > CAST(2147483648L AS INT) => -2147483648 > {code} > With the new safe-version SQL functions: > {code:sql} > TRY_ADD(2147483647, 2) => null > TRY_CAST(2147483648L AS INT) => null > {code}
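The NULL-on-overflow semantics of TRY_ADD and TRY_CAST described in SPARK-35161 can be sketched in plain Python. This is a toy illustration of the contract, not Spark's implementation:

```python
# Toy model of TRY_ADD / TRY_CAST(... AS INT): return None (SQL NULL)
# instead of a wrapped-around 32-bit result.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def try_add(a, b):
    """Add two 32-bit ints; None if the result would overflow."""
    result = a + b
    return result if INT_MIN <= result <= INT_MAX else None

def try_cast_int(value):
    """Return value if it fits in a 32-bit signed int, else None."""
    return value if INT_MIN <= value <= INT_MAX else None

print(try_add(2147483647, 2))    # None, instead of wrapping to -2147483647
print(try_cast_int(2147483648))  # None, instead of wrapping to -2147483648
```

With ANSI mode off, the plain operators wrap silently; with ANSI mode on, they raise errors. The TRY_* variants give a third behaviour: a well-defined NULL either way.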
[jira] [Assigned] (SPARK-35161) Error-handling SQL functions
[ https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35161: -- Assignee: Gengliang Wang > Error-handling SQL functions > > > Key: SPARK-35161 > URL: https://issues.apache.org/jira/browse/SPARK-35161 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Create error-handling versions of existing SQL functions/operators that return NULL when an overflow or error occurs, so that: > 1. Users can finish queries without interruption in ANSI mode. > 2. Users get NULLs instead of unreasonable results when overflow occurs and ANSI mode is off. > For example, the behavior of the following SQL operations is unreasonable: > {code:sql} > 2147483647 + 2 => -2147483647 > CAST(2147483648L AS INT) => -2147483648 > {code} > With the new safe-version SQL functions: > {code:sql} > TRY_ADD(2147483647, 2) => null > TRY_CAST(2147483648L AS INT) => null > {code}
[jira] [Updated] (SPARK-46105) df.emptyDataFrame shows 1 if we repartition
[ https://issues.apache.org/jira/browse/SPARK-46105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dharani_sugumar updated SPARK-46105: Attachment: Screenshot 2023-11-26 at 11.54.58 AM.png > df.emptyDataFrame shows 1 if we repartition > --- > > Key: SPARK-46105 > URL: https://issues.apache.org/jira/browse/SPARK-46105 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.3 > Environment: EKS > EMR >Reporter: dharani_sugumar >Priority: Major > Attachments: Screenshot 2023-11-26 at 11.54.58 AM.png > > > Version: 3.3.3 > scala> val df = spark.emptyDataFrame > df: org.apache.spark.sql.DataFrame = [] > scala> df.rdd.getNumPartitions > res0: Int = 0 > scala> df.repartition(1).rdd.getNumPartitions > res1: Int = 1 > scala> df.repartition(1).rdd.isEmpty() > res2: Boolean = true > > Version: 3.2.4 > scala> val df = spark.emptyDataFrame > df: org.apache.spark.sql.DataFrame = [] > scala> df.rdd.getNumPartitions > res0: Int = 0 > scala> df.repartition(1).rdd.getNumPartitions > res1: Int = 0 > scala> df.repartition(1).rdd.isEmpty() > res2: Boolean = true > > Version: 3.5.0 > scala> val df = spark.emptyDataFrame > df: org.apache.spark.sql.DataFrame = [] > scala> df.rdd.getNumPartitions > res0: Int = 0 > scala> df.repartition(1).rdd.getNumPartitions > res1: Int = 1 > scala> df.repartition(1).rdd.isEmpty() > res2: Boolean = true > > When we repartition an empty dataframe to 1 partition, the resulting partition count is 1 in versions 3.3.x and 3.5.x, whereas in 3.2.x it is 0. May I know why this behaviour changed from 3.2.x to higher versions? > > The reason for raising this as a bug: I have a scenario where my final dataframe returns 0 records in EKS (local Spark) with a single node (driver and executor on the same node) but returns 1 in EMR; both use the same Spark version, 3.3.3. I'm not sure why this behaves differently in the two environments. As an interim solution, I had to repartition an empty dataframe when my final dataframe is empty, which returns 1 for 3.3.3. I would like to know whether this is really a bug, or whether this behaviour persists in future versions and cannot be changed. > > Because if we go for a Spark upgrade and this behaviour changes, we will face the issue again. > Please confirm.
[jira] [Updated] (SPARK-46105) df.emptyDataFrame shows 1 if we repartition(1) in Spark 3.3.x and above
[ https://issues.apache.org/jira/browse/SPARK-46105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dharani_sugumar updated SPARK-46105: Summary: df.emptyDataFrame shows 1 if we repartition(1) in Spark 3.3.x and above (was: df.emptyDataFrame shows 1 if we repartition) > df.emptyDataFrame shows 1 if we repartition(1) in Spark 3.3.x and above > --- > > Key: SPARK-46105 > URL: https://issues.apache.org/jira/browse/SPARK-46105 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.3 > Environment: EKS > EMR >Reporter: dharani_sugumar >Priority: Major > Attachments: Screenshot 2023-11-26 at 11.54.58 AM.png > > > Version: 3.3.3 > scala> val df = spark.emptyDataFrame > df: org.apache.spark.sql.DataFrame = [] > scala> df.rdd.getNumPartitions > res0: Int = 0 > scala> df.repartition(1).rdd.getNumPartitions > res1: Int = 1 > scala> df.repartition(1).rdd.isEmpty() > res2: Boolean = true > > Version: 3.2.4 > scala> val df = spark.emptyDataFrame > df: org.apache.spark.sql.DataFrame = [] > scala> df.rdd.getNumPartitions > res0: Int = 0 > scala> df.repartition(1).rdd.getNumPartitions > res1: Int = 0 > scala> df.repartition(1).rdd.isEmpty() > res2: Boolean = true > > Version: 3.5.0 > scala> val df = spark.emptyDataFrame > df: org.apache.spark.sql.DataFrame = [] > scala> df.rdd.getNumPartitions > res0: Int = 0 > scala> df.repartition(1).rdd.getNumPartitions > res1: Int = 1 > scala> df.repartition(1).rdd.isEmpty() > res2: Boolean = true > > When we repartition an empty dataframe to 1 partition, the resulting partition count is 1 in versions 3.3.x and 3.5.x, whereas in 3.2.x it is 0. May I know why this behaviour changed from 3.2.x to higher versions? > > The reason for raising this as a bug: I have a scenario where my final dataframe returns 0 records in EKS (local Spark) with a single node (driver and executor on the same node) but returns 1 in EMR; both use the same Spark version, 3.3.3. I'm not sure why this behaves differently in the two environments. As an interim solution, I had to repartition an empty dataframe when my final dataframe is empty, which returns 1 for 3.3.3. I would like to know whether this is really a bug, or whether this behaviour persists in future versions and cannot be changed. > > Because if we go for a Spark upgrade and this behaviour changes, we will face the issue again. > Please confirm.
[jira] [Created] (SPARK-46105) df.emptyDataFrame shows 1 if we repartition
dharani_sugumar created SPARK-46105: --- Summary: df.emptyDataFrame shows 1 if we repartition Key: SPARK-46105 URL: https://issues.apache.org/jira/browse/SPARK-46105 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.3.3 Environment: EKS EMR Reporter: dharani_sugumar Version: 3.3.3 scala> val df = spark.emptyDataFrame df: org.apache.spark.sql.DataFrame = [] scala> df.rdd.getNumPartitions res0: Int = 0 scala> df.repartition(1).rdd.getNumPartitions res1: Int = 1 scala> df.repartition(1).rdd.isEmpty() res2: Boolean = true Version: 3.2.4 scala> val df = spark.emptyDataFrame df: org.apache.spark.sql.DataFrame = [] scala> df.rdd.getNumPartitions res0: Int = 0 scala> df.repartition(1).rdd.getNumPartitions res1: Int = 0 scala> df.repartition(1).rdd.isEmpty() res2: Boolean = true Version: 3.5.0 scala> val df = spark.emptyDataFrame df: org.apache.spark.sql.DataFrame = [] scala> df.rdd.getNumPartitions res0: Int = 0 scala> df.repartition(1).rdd.getNumPartitions res1: Int = 1 scala> df.repartition(1).rdd.isEmpty() res2: Boolean = true When we repartition an empty dataframe to 1 partition, the resulting partition count is 1 in versions 3.3.x and 3.5.x, whereas in 3.2.x it is 0. May I know why this behaviour changed from 3.2.x to higher versions? The reason for raising this as a bug: I have a scenario where my final dataframe returns 0 records in EKS (local Spark) with a single node (driver and executor on the same node) but returns 1 in EMR; both use the same Spark version, 3.3.3. I'm not sure why this behaves differently in the two environments. As an interim solution, I had to repartition an empty dataframe when my final dataframe is empty, which returns 1 for 3.3.3. I would like to know whether this is really a bug, or whether this behaviour persists in future versions and cannot be changed. Because if we go for a Spark upgrade and this behaviour changes, we will face the issue again. Please confirm.
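The behaviour difference reported in SPARK-46105 is harmless if emptiness is never inferred from the partition count. A toy model in plain Python (not Spark's repartitioning; the round-robin layout here is purely illustrative) shows why an isEmpty-style check is the robust one:

```python
def repartition(records, num_partitions):
    """Toy round-robin repartition: like Spark 3.3.x and above, the requested
    number of partitions is kept even when there are no records to place."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions

def is_empty(partitions):
    """Judge emptiness from the data, never from the partition count."""
    return all(len(p) == 0 for p in partitions)

parts = repartition([], 1)
print(len(parts))       # 1 partition, matching the 3.3.x/3.5.x behaviour
print(is_empty(parts))  # True: the dataframe is still empty
```

Code that branches on `rdd.getNumPartitions == 0` breaks across the 3.2.x/3.3.x boundary, while `rdd.isEmpty()` returns true in every version shown in the report.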
[jira] [Updated] (SPARK-46090) Support plan fragment level SQL configs in AQE
[ https://issues.apache.org/jira/browse/SPARK-46090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46090: --- Labels: pull-request-available (was: ) > Support plan fragment level SQL configs in AQE > --- > > Key: SPARK-46090 > URL: https://issues.apache.org/jira/browse/SPARK-46090 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Priority: Major > Labels: pull-request-available > > AQE executes query plan stage by stage, so there is a chance to support plan > fragment level SQL configs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46090) Support plan fragment level SQL configs in AQE
[ https://issues.apache.org/jira/browse/SPARK-46090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-46090: -- Summary: Support plan fragment level SQL configs in AQE (was: Support plan fragment level SQL configs) > Support plan fragment level SQL configs in AQE > --- > > Key: SPARK-46090 > URL: https://issues.apache.org/jira/browse/SPARK-46090 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Priority: Major > > AQE executes query plan stage by stage, so there is a chance to support plan > fragment level SQL configs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46090) Support plan fragment level SQL configs
[ https://issues.apache.org/jira/browse/SPARK-46090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-46090: -- Summary: Support plan fragment level SQL configs (was: Support stage level SQL configs) > Support plan fragment level SQL configs > --- > > Key: SPARK-46090 > URL: https://issues.apache.org/jira/browse/SPARK-46090 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Priority: Major > > AQE executes query plan stage by stage, so there is a chance to support stage > level SQL configs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46090) Support plan fragment level SQL configs
[ https://issues.apache.org/jira/browse/SPARK-46090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-46090: -- Description: AQE executes query plan stage by stage, so there is a chance to support plan fragment level SQL configs. (was: AQE executes query plan stage by stage, so there is a chance to support stage level SQL configs.) > Support plan fragment level SQL configs > --- > > Key: SPARK-46090 > URL: https://issues.apache.org/jira/browse/SPARK-46090 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Priority: Major > > AQE executes query plan stage by stage, so there is a chance to support plan > fragment level SQL configs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
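The idea behind SPARK-46090 — since AQE executes the plan stage by stage, each plan fragment could carry its own SQL config overrides — can be sketched as a simple two-level merge. This is a toy model, not the actual AQE design; the config keys are only examples:

```python
def effective_conf(session_conf, fragment_conf):
    """Toy model of plan-fragment-level SQL configs: start from the
    session-level configs and let the fragment's overrides win, for
    that fragment only."""
    merged = dict(session_conf)   # session-level defaults
    merged.update(fragment_conf)  # fragment-level overrides take precedence
    return merged

session = {"spark.sql.shuffle.partitions": "200",
           "spark.sql.ansi.enabled": "true"}
fragment = {"spark.sql.shuffle.partitions": "10"}  # tuned for one stage

print(effective_conf(session, fragment)["spark.sql.shuffle.partitions"])  # 10
```

The session-level dict is left untouched, so other fragments still see the original values.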
[jira] [Updated] (SPARK-46104) NPE when broadcast join include null key
[ https://issues.apache.org/jira/browse/SPARK-46104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz updated SPARK-46104: Attachment: 1.jpg > NPE when broadcast join include null key > > > Key: SPARK-46104 > URL: https://issues.apache.org/jira/browse/SPARK-46104 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: lrz >Priority: Major > Attachments: 1.jpg > > > Missing initialization of the UnsafeProjection in UnsafeHashedRelation, which leads to an NPE when the key contains a null value. > Here is the generated code: > !1.jpg! >
[jira] [Updated] (SPARK-46104) NPE when broadcast join include null key
[ https://issues.apache.org/jira/browse/SPARK-46104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz updated SPARK-46104: Description: Missing initialization of the UnsafeProjection in UnsafeHashedRelation, which leads to an NPE when the key contains a null value. Here is the generated code: !1.jpg! was: Missing initialization of the UnsafeProjection in UnsafeHashedRelation, which leads to an NPE when the key contains a null value. Here is the generated code: !image-2023-11-26-10-28-58-066.png! > NPE when broadcast join include null key > > > Key: SPARK-46104 > URL: https://issues.apache.org/jira/browse/SPARK-46104 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: lrz >Priority: Major > Attachments: 1.jpg > > > Missing initialization of the UnsafeProjection in UnsafeHashedRelation, which leads to an NPE when the key contains a null value. > Here is the generated code: > !1.jpg! >
[jira] [Created] (SPARK-46104) NPE when broadcast join include null key
lrz created SPARK-46104: --- Summary: NPE when broadcast join include null key Key: SPARK-46104 URL: https://issues.apache.org/jira/browse/SPARK-46104 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: lrz Missing initialization of the UnsafeProjection in UnsafeHashedRelation, which leads to an NPE when the key contains a null value. Here is the generated code: !image-2023-11-26-10-28-58-066.png!
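Why a null join key needs explicit handling when building a broadcast hash relation can be sketched in plain Python. This is a toy illustration of the general technique, not Spark's UnsafeHashedRelation code:

```python
def build_hashed_relation(rows, key_index):
    """Toy build of a broadcast hash relation: rows whose join key is
    None are skipped, since a null key can never match in an equi-join.
    Omitting a guard like this is the kind of gap that surfaces as an
    NPE in generated code when the build side contains null keys."""
    relation = {}
    for row in rows:
        key = row[key_index]
        if key is None:        # the null-key guard that must not be omitted
            continue
        relation.setdefault(key, []).append(row)
    return relation

rows = [(1, "a"), (None, "b"), (1, "c")]
print(build_hashed_relation(rows, 0))  # {1: [(1, 'a'), (1, 'c')]}
```

The row with the None key is dropped from the relation rather than dereferenced, so probing with any key is safe.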
[jira] [Updated] (SPARK-44850) Heartbeat (sparkconnect scala)
[ https://issues.apache.org/jira/browse/SPARK-44850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44850: --- Labels: pull-request-available (was: ) > Heartbeat (sparkconnect scala) > -- > > Key: SPARK-44850 > URL: https://issues.apache.org/jira/browse/SPARK-44850 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Alice Sayutina >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-44209) Expose amount of shuffle data available on the node
[ https://issues.apache.org/jira/browse/SPARK-44209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44209: --- Labels: pull-request-available (was: ) > Expose amount of shuffle data available on the node > --- > > Key: SPARK-44209 > URL: https://issues.apache.org/jira/browse/SPARK-44209 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Affects Versions: 3.4.1 >Reporter: Deependra Patel >Priority: Trivial > Labels: pull-request-available > > [ShuffleMetrics|https://github.com/apache/spark/blob/43f7a86a05ad8c7ec7060607e43d9ca4d0fe4166/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java#L318] > doesn't have metrics like > "totalShuffleDataBytes" and "numAppsWithShuffleData", these metrics are per > node published by External Shuffle Service. > > Adding these metrics would help in - > 1. Deciding if we can decommission the node if no shuffle data present > 2. Better live monitoring of customer's workload to see if there is skewed > shuffle data present on the node -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
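The two metrics proposed in SPARK-44209 could, in spirit, be computed by walking the node's shuffle directories. The sketch below is a hedged illustration only: the one-subdirectory-per-application layout and the function name are assumptions, not the External Shuffle Service's actual on-disk format or API:

```python
import os

def shuffle_metrics(shuffle_root):
    """Toy versions of the proposed per-node metrics: total bytes of
    shuffle data present, and how many applications still have shuffle
    data. Assumes a hypothetical layout of one subdirectory per app."""
    total_bytes = 0
    apps_with_data = 0
    for app in os.listdir(shuffle_root):
        app_dir = os.path.join(shuffle_root, app)
        if not os.path.isdir(app_dir):
            continue
        app_bytes = sum(
            os.path.getsize(os.path.join(root, name))
            for root, _, files in os.walk(app_dir)
            for name in files
        )
        total_bytes += app_bytes
        if app_bytes > 0:
            apps_with_data += 1
    return {"totalShuffleDataBytes": total_bytes,
            "numAppsWithShuffleData": apps_with_data}
```

A decommissioning controller could then treat `numAppsWithShuffleData == 0` as the signal that the node is safe to drain, which is exactly use case 1 in the description.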
[jira] [Resolved] (SPARK-33275) ANSI mode: runtime errors instead of returning null on invalid inputs
[ https://issues.apache.org/jira/browse/SPARK-33275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-33275. Resolution: Done > ANSI mode: runtime errors instead of returning null on invalid inputs > - > > Key: SPARK-33275 > URL: https://issues.apache.org/jira/browse/SPARK-33275 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > > We should respect the ANSI mode in more places. What we have done so far are > mostly the overflow check in various operators. This ticket is to track a > category of ANSI mode behaviors: operators should throw runtime errors > instead of returning null when the input is invalid. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33275) ANSI mode: runtime errors instead of returning null on invalid inputs
[ https://issues.apache.org/jira/browse/SPARK-33275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-33275: -- Assignee: Apache Spark > ANSI mode: runtime errors instead of returning null on invalid inputs > - > > Key: SPARK-33275 > URL: https://issues.apache.org/jira/browse/SPARK-33275 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > > We should respect the ANSI mode in more places. What we have done so far are > mostly the overflow check in various operators. This ticket is to track a > category of ANSI mode behaviors: operators should throw runtime errors > instead of returning null when the input is invalid. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46100) Replace (string|array).size with (string|array).length in module core
[ https://issues.apache.org/jira/browse/SPARK-46100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-46100. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44011 [https://github.com/apache/spark/pull/44011] > Replace (string|array).size with (string|array).length in module core > - > > Key: SPARK-46100 > URL: https://issues.apache.org/jira/browse/SPARK-46100 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46101) Replace (string|array).size with (string|array).length in module SQL
[ https://issues.apache.org/jira/browse/SPARK-46101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-46101: - Priority: Minor (was: Major) Summary: Replace (string|array).size with (string|array).length in module SQL (was: Fix these issue in module sql) > Replace (string|array).size with (string|array).length in module SQL > > > Key: SPARK-46101 > URL: https://issues.apache.org/jira/browse/SPARK-46101 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46100) Replace (string|array).size with (string|array).length in module core
[ https://issues.apache.org/jira/browse/SPARK-46100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-46100: - Summary: Replace (string|array).size with (string|array).length in module core (was: Fix these issue in module core) > Replace (string|array).size with (string|array).length in module core > - > > Key: SPARK-46100 > URL: https://issues.apache.org/jira/browse/SPARK-46100 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46100) Replace (string|array).size with (string|array).length in module core
[ https://issues.apache.org/jira/browse/SPARK-46100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-46100: - Priority: Minor (was: Major) > Replace (string|array).size with (string|array).length in module core > - > > Key: SPARK-46100 > URL: https://issues.apache.org/jira/browse/SPARK-46100 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46102) Prune keys or values from Generate if it is a map type
[ https://issues.apache.org/jira/browse/SPARK-46102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-46102: Summary: Prune keys or values from Generate if it is a map type (was: Prune keys or values from Generate if it is a map type.) > Prune keys or values from Generate if it is a map type > -- > > Key: SPARK-46102 > URL: https://issues.apache.org/jira/browse/SPARK-46102 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major >
[jira] [Created] (SPARK-46102) Prune keys or values from Generate if it is a map type.
Yuming Wang created SPARK-46102: --- Summary: Prune keys or values from Generate if it is a map type. Key: SPARK-46102 URL: https://issues.apache.org/jira/browse/SPARK-46102 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Yuming Wang
[jira] [Updated] (SPARK-45715) QueryPlanningTracker::measurePhase minor refactor
[ https://issues.apache.org/jira/browse/SPARK-45715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45715: -- Fix Version/s: (was: 3.4.2) > QueryPlanningTracker::measurePhase minor refactor > - > > Key: SPARK-45715 > URL: https://issues.apache.org/jira/browse/SPARK-45715 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: xy >Priority: Minor > Labels: pull-request-available > > Minor refactor of QueryPlanningTracker::measurePhase to fix a code typo.
[jira] [Updated] (SPARK-45715) QueryPlanningTracker::measurePhase minor refactor
[ https://issues.apache.org/jira/browse/SPARK-45715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45715: -- Affects Version/s: 4.0.0 (was: 3.4.1) > QueryPlanningTracker::measurePhase minor refactor > - > > Key: SPARK-45715 > URL: https://issues.apache.org/jira/browse/SPARK-45715 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: xy >Priority: Minor > Labels: pull-request-available > > Minor refactor of QueryPlanningTracker::measurePhase to fix a code typo.
[jira] [Updated] (SPARK-46072) Missing .jars when applying code to spark-connect
[ https://issues.apache.org/jira/browse/SPARK-46072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46072: -- Fix Version/s: (was: 3.4.2) (was: 3.5.1) > Missing .jars when applying code to spark-connect > - > > Key: SPARK-46072 > URL: https://issues.apache.org/jira/browse/SPARK-46072 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1 > Environment: python 3.9 > scala 2.12 > spark 3.4.1 > hdfs 3.1.2 > hive 3.1.3 >Reporter: Dmitry Kravchuk >Priority: Major > > I've built spark with following maven code for our onprem hadoop cluster: > {code:bash} > ./build/mvn -Pyarn -Pkubernetes -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive > -Phive-thriftserver -DskipTests clean package > {code} > > So I start connect server like that: > {code:bash} > ./sbin/start-connect-server.sh --packages > org.apache.spark:spark-connect_2.12:3.4.1 > {code} > > When I'm trying to run any code after following code I always have an error > from connect-server side: > {code:bash} > ./bin/pyspark --remote "sc://localhost" > {code} > Error: > {code:bash} > > /home/zeppelin/.ivy2/local/org.apache.spark/spark-connect_2.12/3.4.1/jars/spark-connect_2.12.jar > central: tried > > https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom > -- artifact > org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar: > > https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar > spark-packages: tried > > https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom > -- artifact > org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar: > > https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar > :: > :: UNRESOLVED DEPENDENCIES :: > :: > :: org.apache.spark#spark-connect_2.12;3.4.1: not found > :: > {code} > > Where am I wrong? 
I thought it was a firewall issue, but it is not, since I have already set the http_proxy and https_proxy variables with my own credentials. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46072) Missing .jars when applying code to spark-connect
[ https://issues.apache.org/jira/browse/SPARK-46072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46072: -- Target Version/s: (was: 3.5.0) > Missing .jars when applying code to spark-connect > - > > Key: SPARK-46072 > URL: https://issues.apache.org/jira/browse/SPARK-46072 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1 > Environment: python 3.9 > scala 2.12 > spark 3.4.1 > hdfs 3.1.2 > hive 3.1.3 >Reporter: Dmitry Kravchuk >Priority: Major > > I've built Spark with the following Maven command for our on-prem Hadoop cluster: > {code:bash} > ./build/mvn -Pyarn -Pkubernetes -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive > -Phive-thriftserver -DskipTests clean package > {code} > > Then I start the Connect server like this: > {code:bash} > ./sbin/start-connect-server.sh --packages > org.apache.spark:spark-connect_2.12:3.4.1 > {code} > > When I try to run any code after the following command, I always get an error > from the connect-server side: > {code:bash} > ./bin/pyspark --remote "sc://localhost" > {code} > Error: > {code:bash} > > /home/zeppelin/.ivy2/local/org.apache.spark/spark-connect_2.12/3.4.1/jars/spark-connect_2.12.jar > central: tried > > https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom > -- artifact > org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar: > > https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar > spark-packages: tried > > https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.pom > -- artifact > org.apache.spark#spark-connect_2.12;3.4.1!spark-connect_2.12.jar: > > https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.4.1/spark-connect_2.12-3.4.1.jar > :: > :: UNRESOLVED DEPENDENCIES :: > :: > :: org.apache.spark#spark-connect_2.12;3.4.1: not found > :: > {code} > > Where am I wrong? 
I thought it was a firewall issue, but it is not, since I have already set the http_proxy and https_proxy variables with my own credentials.
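Since the error shows Ivy failing to resolve `org.apache.spark:spark-connect_2.12:3.4.1` from remote repositories, one workaround sometimes worth trying when Spark was built from source is to bypass Ivy entirely and hand the already-built Connect server jar to the launcher with `--jars` instead of `--packages`. This is only a sketch: the jar path below is an assumption based on the 3.4.x source layout, so verify it against your actual build output.

```shell
# Hypothetical workaround -- the jar path is an assumption from the
# Spark 3.4.x source tree; adjust it to wherever your build put the jar.
CONNECT_JAR="connector/connect/server/target/spark-connect_2.12-3.4.1.jar"

if [ -f "$CONNECT_JAR" ]; then
  # Passing the local jar directly means Ivy never tries to resolve the
  # org.apache.spark:spark-connect_2.12:3.4.1 artifact from remote repos.
  ./sbin/start-connect-server.sh --jars "$CONNECT_JAR"
else
  echo "spark-connect jar not found at $CONNECT_JAR; check the build output" >&2
fi
```

If the jar is present, this avoids any network access at server start-up, which also sidesteps proxy/firewall problems.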
[jira] [Created] (SPARK-46101) Fix these issues in module sql
Jiaan Geng created SPARK-46101: -- Summary: Fix these issues in module sql Key: SPARK-46101 URL: https://issues.apache.org/jira/browse/SPARK-46101 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng
[jira] [Updated] (SPARK-46100) Fix these issues in module core
[ https://issues.apache.org/jira/browse/SPARK-46100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46100: --- Labels: pull-request-available (was: ) > Fix these issues in module core > -- > > Key: SPARK-46100 > URL: https://issues.apache.org/jira/browse/SPARK-46100 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-46100) Fix these issues in module core
Jiaan Geng created SPARK-46100: -- Summary: Fix these issues in module core Key: SPARK-46100 URL: https://issues.apache.org/jira/browse/SPARK-46100 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng
[jira] [Updated] (SPARK-46098) Reduce stack depth by replacing (string|array).size with (string|array).length
[ https://issues.apache.org/jira/browse/SPARK-46098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-46098: --- Description: There are a lot of (string|array).size calls. In fact, size calls the underlying length, so this behavior increases the stack depth. We should call (string|array).length directly. We also get the compiler warning: Replace .size with .length on arrays and strings was: There are a lot of (string|array).size calls. In fact, size calls the underlying length, so this behavior increases the stack depth. We should call # Replace .size with .length on arrays and strings > Reduce stack depth by replacing (string|array).size with (string|array).length > > > Key: SPARK-46098 > URL: https://issues.apache.org/jira/browse/SPARK-46098 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > There are a lot of (string|array).size calls. > In fact, size calls the underlying length, so this behavior increases the > stack depth. > We should call (string|array).length directly. > We also get the compiler warning: Replace .size with .length on arrays and > strings
[jira] [Updated] (SPARK-46098) Reduce stack depth by replacing (string|array).size with (string|array).length
[ https://issues.apache.org/jira/browse/SPARK-46098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-46098: --- Description: There are a lot of (string|array).size calls. In fact, size calls the underlying length, so this behavior increases the stack depth. We should call # Replace .size with .length on arrays and strings was: There are a lot of # Replace .size with .length on arrays and strings # Replace .size with .length on arrays and strings > Reduce stack depth by replacing (string|array).size with (string|array).length > > > Key: SPARK-46098 > URL: https://issues.apache.org/jira/browse/SPARK-46098 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > There are a lot of (string|array).size calls. > In fact, size calls the underlying length, so this behavior increases the > stack depth. > We should call > # Replace .size with .length on arrays and strings
[jira] [Updated] (SPARK-46098) Reduce stack depth by replacing (string|array).size with (string|array).length
[ https://issues.apache.org/jira/browse/SPARK-46098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-46098: --- Description: There are a lot of # Replace .size with .length on arrays and strings # Replace .size with .length on arrays and strings was: There are a lot of # Replace .size with .length on arrays and strings > Reduce stack depth by replacing (string|array).size with (string|array).length > > > Key: SPARK-46098 > URL: https://issues.apache.org/jira/browse/SPARK-46098 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > There are a lot of # Replace .size with .length on arrays and strings > # Replace .size with .length on arrays and strings
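The change SPARK-46098 proposes can be illustrated with a small sketch (not the actual Spark patch): on Scala arrays and strings, `.size` is supplied by the implicit `ArrayOps`/`StringOps` wrapper and simply forwards to `.length`, so both return the same value, but `.length` reads the value directly and avoids the extra call frame.

```scala
object SizeVsLength {
  def main(args: Array[String]): Unit = {
    val arr = Array(1, 2, 3)
    val s   = "spark"

    // .size goes through the implicit ArrayOps/StringOps wrapper and
    // forwards to .length; .length reads the underlying value directly.
    assert(arr.size == arr.length) // both 3
    assert(s.size == s.length)     // both 5

    // The mechanical rewrite the ticket describes:
    val before = arr.size   // extra call frame through the wrapper
    val after  = arr.length // direct access
    assert(before == after)
  }
}
```

The rewrite is purely mechanical, which is why the compiler can flag it with the "Replace .size with .length on arrays and strings" warning mentioned in the description.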