[jira] [Updated] (SPARK-46119) Override toString method for UnresolvedAlias
[ https://issues.apache.org/jira/browse/SPARK-46119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46119:
Labels: pull-request-available (was: )

> Override toString method for UnresolvedAlias
>
> Key: SPARK-46119
> URL: https://issues.apache.org/jira/browse/SPARK-46119
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Yuming Wang
> Priority: Major
> Labels: pull-request-available

-- This message was sent by Atlassian Jira (v8.20.10#820010)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46120) Remove helper function DataFrame.withPlan
[ https://issues.apache.org/jira/browse/SPARK-46120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46120:
Labels: pull-request-available (was: )

> Remove helper function DataFrame.withPlan
>
> Key: SPARK-46120
> URL: https://issues.apache.org/jira/browse/SPARK-46120
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-46120) Remove helper function DataFrame.withPlan
Ruifeng Zheng created SPARK-46120:
Summary: Remove helper function DataFrame.withPlan
Key: SPARK-46120
URL: https://issues.apache.org/jira/browse/SPARK-46120
Project: Spark
Issue Type: Bug
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
[jira] [Created] (SPARK-46119) Override toString method for UnresolvedAlias
Yuming Wang created SPARK-46119:
Summary: Override toString method for UnresolvedAlias
Key: SPARK-46119
URL: https://issues.apache.org/jira/browse/SPARK-46119
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang
[jira] [Updated] (SPARK-46117) Enhancing readability of PySpark API reference by hiding verbose typehints.
[ https://issues.apache.org/jira/browse/SPARK-46117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46117:
Labels: pull-request-available (was: )

> Enhancing readability of PySpark API reference by hiding verbose typehints.
>
> Key: SPARK-46117
> URL: https://issues.apache.org/jira/browse/SPARK-46117
> Project: Spark
> Issue Type: Bug
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
>
> Currently, the PySpark API documentation displays all type hints in the signatures, which can make the documentation appear cluttered and less readable. By setting `autodoc_typehints` to 'none', we can achieve a cleaner and more concise presentation of our API, similar to how the Pandas documentation handles type hints. This approach has been effective in Pandas, making the documentation more approachable and easier to understand, especially for newcomers.
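The `autodoc_typehints` switch the ticket describes is a standard Sphinx autodoc option. A minimal sketch of the change, assuming a generic `conf.py` (the surrounding values are illustrative, not Spark's actual documentation config):

```python
# docs/source/conf.py -- illustrative fragment only
extensions = [
    "sphinx.ext.autodoc",  # pulls API signatures and docstrings from the code
]

# Hide type hints throughout the rendered API reference, so a signature like
#   sample(self, fraction: Optional[float] = None, seed: Optional[int] = None)
# renders as the much shorter
#   sample(self, fraction=None, seed=None)
autodoc_typehints = "none"
```

Other accepted values are "signature" (the Sphinx default) and "description", which moves the hints below the signature instead of removing them.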
[jira] [Created] (SPARK-46118) Use `SparkSession.sessionState.conf` instead of `sqlContext.conf`
Yang Jie created SPARK-46118:
Summary: Use `SparkSession.sessionState.conf` instead of `sqlContext.conf`
Key: SPARK-46118
URL: https://issues.apache.org/jira/browse/SPARK-46118
Project: Spark
Issue Type: Improvement
Components: Connect, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie
[jira] [Updated] (SPARK-46118) Use `SparkSession.sessionState.conf` instead of `sqlContext.conf`
[ https://issues.apache.org/jira/browse/SPARK-46118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46118:
Labels: pull-request-available (was: )

> Use `SparkSession.sessionState.conf` instead of `sqlContext.conf`
>
> Key: SPARK-46118
> URL: https://issues.apache.org/jira/browse/SPARK-46118
> Project: Spark
> Issue Type: Improvement
> Components: Connect, SQL
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-46117) Enhancing readability of PySpark API reference by hiding verbose typehints.
Haejoon Lee created SPARK-46117:
Summary: Enhancing readability of PySpark API reference by hiding verbose typehints.
Key: SPARK-46117
URL: https://issues.apache.org/jira/browse/SPARK-46117
Project: Spark
Issue Type: Bug
Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee

Currently, the PySpark API documentation displays all type hints in the signatures, which can make the documentation appear cluttered and less readable. By setting `autodoc_typehints` to 'none', we can achieve a cleaner and more concise presentation of our API, similar to how the Pandas documentation handles type hints. This approach has been effective in Pandas, making the documentation more approachable and easier to understand, especially for newcomers.
[jira] [Assigned] (SPARK-46114) Define IndexError for PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46114:
Assignee: Hyukjin Kwon

> Define IndexError for PySpark error framework
>
> Key: SPARK-46114
> URL: https://issues.apache.org/jira/browse/SPARK-46114
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-46114) Define IndexError for PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46114.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44028
[https://github.com/apache/spark/pull/44028]

> Define IndexError for PySpark error framework
>
> Key: SPARK-46114
> URL: https://issues.apache.org/jira/browse/SPARK-46114
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Updated] (SPARK-46116) Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage.
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-46116:
Description:
It is aimed at improving user engagement and providing quick access to community support and discussions. This approach is inspired by the [Pandas documentation](https://pandas.pydata.org/docs/index.html), which effectively uses a similar section for community engagement. The "Q&A Support" link will lead users to a curated list of StackOverflow questions tagged with `pyspark`, while the mailing lists will offer platforms for deeper discussions and insights within the Spark community.

was: The addition of the "Q&A Support" link provides quick access to the community-driven Q&A platform, StackOverflow, where users can seek help and contribute to discussions about PySpark. It enhances the user experience by connecting the documentation with a dynamic and interactive community resource.

> Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage.
>
> Key: SPARK-46116
> URL: https://issues.apache.org/jira/browse/SPARK-46116
> Project: Spark
> Issue Type: Bug
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
>
> It is aimed at improving user engagement and providing quick access to community support and discussions. This approach is inspired by the [Pandas documentation](https://pandas.pydata.org/docs/index.html), which effectively uses a similar section for community engagement.
> The "Q&A Support" link will lead users to a curated list of StackOverflow questions tagged with `pyspark`, while the mailing lists will offer platforms for deeper discussions and insights within the Spark community.
[jira] [Updated] (SPARK-46116) Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage.
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-46116:
Summary: Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage. (was: Enriching PySpark doc with "Useful links" including Q&A Support and Mailing Lists)

> Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage.
>
> Key: SPARK-46116
> URL: https://issues.apache.org/jira/browse/SPARK-46116
> Project: Spark
> Issue Type: Bug
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
>
> The addition of the "Q&A Support" link provides quick access to the community-driven Q&A platform, StackOverflow, where users can seek help and contribute to discussions about PySpark. It enhances the user experience by connecting the documentation with a dynamic and interactive community resource.
[jira] [Updated] (SPARK-46116) Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage.
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46116:
Labels: pull-request-available (was: )

> Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage.
>
> Key: SPARK-46116
> URL: https://issues.apache.org/jira/browse/SPARK-46116
> Project: Spark
> Issue Type: Bug
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
>
> The addition of the "Q&A Support" link provides quick access to the community-driven Q&A platform, StackOverflow, where users can seek help and contribute to discussions about PySpark. It enhances the user experience by connecting the documentation with a dynamic and interactive community resource.
[jira] [Updated] (SPARK-46116) Enriching PySpark doc with "Useful links" including Q&A Support and Mailing Lists
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-46116:
Summary: Enriching PySpark doc with "Useful links" including Q&A Support and Mailing Lists (was: Enriching PySpark doc with "Useful links" Including Q&A Support and Mailing Lists)

> Enriching PySpark doc with "Useful links" including Q&A Support and Mailing Lists
>
> Key: SPARK-46116
> URL: https://issues.apache.org/jira/browse/SPARK-46116
> Project: Spark
> Issue Type: Bug
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
>
> The addition of the "Q&A Support" link provides quick access to the community-driven Q&A platform, StackOverflow, where users can seek help and contribute to discussions about PySpark. It enhances the user experience by connecting the documentation with a dynamic and interactive community resource.
[jira] [Updated] (SPARK-46116) Enriching "Useful links" on PySpark docs including "Q&A Support" and "Mailing Lists"
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-46116:
Summary: Enriching "Useful links" on PySpark docs including "Q&A Support" and "Mailing Lists" (was: Enriching PySpark Documentation with "Useful Links" Including Q&A Support and Mailing Lists)

> Enriching "Useful links" on PySpark docs including "Q&A Support" and "Mailing Lists"
>
> Key: SPARK-46116
> URL: https://issues.apache.org/jira/browse/SPARK-46116
> Project: Spark
> Issue Type: Bug
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
>
> The addition of the "Q&A Support" link provides quick access to the community-driven Q&A platform, StackOverflow, where users can seek help and contribute to discussions about PySpark. It enhances the user experience by connecting the documentation with a dynamic and interactive community resource.
[jira] [Updated] (SPARK-46116) Enriching PySpark doc with "Useful links" Including Q&A Support and Mailing Lists
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-46116:
Summary: Enriching PySpark doc with "Useful links" Including Q&A Support and Mailing Lists (was: Enriching "Useful links" on PySpark docs including "Q&A Support" and "Mailing Lists")

> Enriching PySpark doc with "Useful links" Including Q&A Support and Mailing Lists
>
> Key: SPARK-46116
> URL: https://issues.apache.org/jira/browse/SPARK-46116
> Project: Spark
> Issue Type: Bug
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
>
> The addition of the "Q&A Support" link provides quick access to the community-driven Q&A platform, StackOverflow, where users can seek help and contribute to discussions about PySpark. It enhances the user experience by connecting the documentation with a dynamic and interactive community resource.
[jira] [Updated] (SPARK-46116) Enriching PySpark Documentation with "Useful Links" Including Q&A Support and Mailing Lists
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-46116:
Summary: Enriching PySpark Documentation with "Useful Links" Including Q&A Support and Mailing Lists (was: Add "Q&A Support" Link to PySpark Documentation Homepage)

> Enriching PySpark Documentation with "Useful Links" Including Q&A Support and Mailing Lists
>
> Key: SPARK-46116
> URL: https://issues.apache.org/jira/browse/SPARK-46116
> Project: Spark
> Issue Type: Bug
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
>
> The addition of the "Q&A Support" link provides quick access to the community-driven Q&A platform, StackOverflow, where users can seek help and contribute to discussions about PySpark. It enhances the user experience by connecting the documentation with a dynamic and interactive community resource.
[jira] [Created] (SPARK-46116) Add "Q&A Support" Link to PySpark Documentation Homepage
Haejoon Lee created SPARK-46116:
Summary: Add "Q&A Support" Link to PySpark Documentation Homepage
Key: SPARK-46116
URL: https://issues.apache.org/jira/browse/SPARK-46116
Project: Spark
Issue Type: Bug
Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee

The addition of the "Q&A Support" link provides quick access to the community-driven Q&A platform, StackOverflow, where users can seek help and contribute to discussions about PySpark. It enhances the user experience by connecting the documentation with a dynamic and interactive community resource.
[jira] [Created] (SPARK-46115) Restrict charsets in encode()
Max Gekk created SPARK-46115:
Summary: Restrict charsets in encode()
Key: SPARK-46115
URL: https://issues.apache.org/jira/browse/SPARK-46115
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk

Currently, the list of charsets supported by encode() is not stable: it depends entirely on the JDK version in use, so user code can break simply because an operator changed the Java version on the Spark cluster. This ticket aims to restrict the list of supported charsets to:
{code}
'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'
{code}
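The proposed restriction can be sketched in a few lines. This is an illustrative Python model of the whitelist behaviour, not the actual Spark implementation; the function name `check_encode_charset` is made up:

```python
# The fixed, JDK-independent whitelist proposed in the ticket.
SUPPORTED_CHARSETS = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"}

def check_encode_charset(charset: str) -> str:
    """Accept a charset only if it is on the fixed whitelist, regardless of
    which encodings the underlying runtime happens to support."""
    normalized = charset.upper()
    if normalized not in SUPPORTED_CHARSETS:
        raise ValueError(f"invalid parameter `charset`: {charset!r}")
    return normalized

"Spark".encode(check_encode_charset("utf-8"))  # -> b'Spark'
# check_encode_charset("CP1252") raises, even though the runtime knows CP1252
```

Pinning the set this way trades breadth for stability: a query that works on one cluster keeps working after a Java upgrade, because the accepted charsets no longer track the JDK.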
[jira] [Commented] (SPARK-46105) df.emptyDataFrame shows 1 if we repartition(1) in Spark 3.3.x and above
[ https://issues.apache.org/jira/browse/SPARK-46105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789908#comment-17789908 ]

XiDuo You commented on SPARK-46105:
Please see SPARK-39915

> df.emptyDataFrame shows 1 if we repartition(1) in Spark 3.3.x and above
>
> Key: SPARK-46105
> URL: https://issues.apache.org/jira/browse/SPARK-46105
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.3
> Environment: EKS, EMR
> Reporter: dharani_sugumar
> Priority: Major
> Attachments: Screenshot 2023-11-26 at 11.54.58 AM.png
>
> Version: 3.3.3
> scala> val df = spark.emptyDataFrame
> df: org.apache.spark.sql.DataFrame = []
> scala> df.rdd.getNumPartitions
> res0: Int = 0
> scala> df.repartition(1).rdd.getNumPartitions
> res1: Int = 1
> scala> df.repartition(1).rdd.isEmpty()
> [Stage 1:> (0 + 1) /
> res2: Boolean = true
>
> Version: 3.2.4
> scala> val df = spark.emptyDataFrame
> df: org.apache.spark.sql.DataFrame = []
> scala> df.rdd.getNumPartitions
> res0: Int = 0
> scala> df.repartition(1).rdd.getNumPartitions
> res1: Int = 0
> scala> df.repartition(1).rdd.isEmpty()
> res2: Boolean = true
>
> Version: 3.5.0
> scala> val df = spark.emptyDataFrame
> df: org.apache.spark.sql.DataFrame = []
> scala> df.rdd.getNumPartitions
> res0: Int = 0
> scala> df.repartition(1).rdd.getNumPartitions
> res1: Int = 1
> scala> df.repartition(1).rdd.isEmpty()
> [Stage 1:> (0 + 1) /
> res2: Boolean = true
>
> When we do repartition(1) on an empty dataframe, the resulting number of partitions is 1 in versions 3.3.x and 3.5.x, whereas when I do the same in version 3.2.x, it is 0. May I know why this behaviour changed from 3.2.x to higher versions?
>
> The reason for raising this as a bug is that I have a scenario where my final dataframe returns 0 records on EKS (local Spark) with a single node (driver and executor on the same node) but returns 1 on EMR, both using the same Spark version 3.3.3. I'm not sure why this behaves differently in the two environments. As an interim solution, I had to repartition an empty dataframe if my final dataframe is empty, which returns 1 on 3.3.3. I would like to know whether this is really a bug, or whether this behaviour persists in future versions and cannot be changed.
>
> Because if we go for a Spark upgrade and this behaviour changes, we will face the issue again.
> Please confirm.
[jira] [Commented] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"
[ https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789905#comment-17789905 ]

Marc Le Bihan commented on SPARK-45311:
Thanks. I only had to change the getter
{{public Ressources getRessources()}}
to
{{public Map<RessourceJeuDeDonneesId, Ressource> getRessources()}}
to make it work, and change a test from
{{jeuDeDonnees.getRessources().forEach((ressource) -> LOGGER.info(...))}}
to
{{jeuDeDonnees.getRessources().forEach((id, ressource) -> LOGGER.info(...))}}
and now all the troubles I had in this issue are solved or have a workaround.

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"
>
> Key: SPARK-45311
> URL: https://issues.apache.org/jira/browse/SPARK-45311
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Environment: Debian 12, Java 17, underlying Spring-Boot 2.7.14
> Reporter: Marc Le Bihan
> Priority: Major
> Attachments: JavaTypeInference_116.png, sparkIssue_02.png
>
> If you find it convenient, you might clone the [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that does many operations around cities, local authorities and accounting with open data), where I've extracted from my work what's necessary to make a set of 35 tests that run correctly with Spark 3.3.x, and show the troubles encountered with 3.4.x and 3.5.x.
>
> It works well with Spark 3.2.x and 3.3.x. But as soon as I select *Spark 3.4.x*, where the encoder seems to have changed deeply, the encoder fails with two problems:
>
> *1)* It throws *java.util.NoSuchElementException: None.get* messages everywhere.
> Asking over the Internet, I found I wasn't alone facing this problem. Reading the thread, you'll see that I've attempted a debug, but my Scala skills are low.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> By the way, if possible, the encoder and decoder functions should forward a parameter as soon as the name of the field being handled is known, and keep it along the whole process, so that wherever the encoder has to throw an exception, it knows which field it is handling in that specific call and can emit a message like:
> _java.util.NoSuchElementException: None.get when encoding [the method or field it was targeting]_
>
> *2)* *Not found an encoder of the type RS to Spark SQL internal representation.* Consider to change the input type to one of supported at (...)
> Or: Not found an encoder of the type *OMI_ID* to Spark SQL internal representation (...)
> where *RS* and *OMI_ID* are generic types. This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, but another adds itself to the list:
> "{*}Only expression encoders are supported for now{*}" on what was accepted and working before.
[jira] [Updated] (SPARK-46114) Define IndexError for PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46114:
Labels: pull-request-available (was: )

> Define IndexError for PySpark error framework
>
> Key: SPARK-46114
> URL: https://issues.apache.org/jira/browse/SPARK-46114
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-46114) Define IndexError for PySpark error framework
Hyukjin Kwon created SPARK-46114:
Summary: Define IndexError for PySpark error framework
Key: SPARK-46114
URL: https://issues.apache.org/jira/browse/SPARK-46114
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
[jira] [Created] (SPARK-46112) Enforce usage of PySpark-specific Exceptions over built-in Python Exceptions
Haejoon Lee created SPARK-46112:
Summary: Enforce usage of PySpark-specific Exceptions over built-in Python Exceptions
Key: SPARK-46112
URL: https://issues.apache.org/jira/browse/SPARK-46112
Project: Spark
Issue Type: Sub-task
Components: Build, PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee

Currently, in the PySpark codebase, there is an inconsistency in the usage of exceptions. In some instances, PySpark-specific exceptions are utilized, while in others, generic Python built-in exceptions are used. This inconsistency can lead to confusion and difficulty in maintaining and debugging the code. See [https://github.com/apache/spark/pull/44024] for related work fixing such a case.

The goal of this ticket is to establish a standardized practice for error handling in PySpark by mandating the use of PySpark-specific exceptions where applicable. This will ensure that all exceptions thrown within PySpark adhere to a consistent format and standard, making them more informative and easier to handle.
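The pattern this ticket mandates can be sketched as follows. This is a hedged illustration of the error-class style, with made-up class and template names, not the real `pyspark.errors` implementation:

```python
# Illustrative error-class registry; a real framework would keep these
# templates in a central, reviewable location rather than inline strings.
ERROR_CLASSES = {
    "INDEX_NOT_POSITIVE": "Index must be positive, got '{index}'.",
}

class IllustrativePySparkValueError(ValueError):
    """A ValueError that also carries a machine-readable error class and
    the parameters used to render its message."""

    def __init__(self, error_class: str, message_parameters: dict):
        self.error_class = error_class
        self.message_parameters = message_parameters
        super().__init__(ERROR_CLASSES[error_class].format(**message_parameters))

# Instead of `raise ValueError("Index must be positive")`:
err = IllustrativePySparkValueError("INDEX_NOT_POSITIVE", {"index": -1})
```

Because the class still subclasses the built-in, existing `except ValueError:` handlers keep working, while tooling and tests can match on the stable `error_class` instead of parsing free-form message text.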
[jira] [Updated] (SPARK-46111) Add copyright to the PySpark official documentation.
[ https://issues.apache.org/jira/browse/SPARK-46111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46111:
Labels: pull-request-available (was: )

> Add copyright to the PySpark official documentation.
>
> Key: SPARK-46111
> URL: https://issues.apache.org/jira/browse/SPARK-46111
> Project: Spark
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
>
> Add copyright to the PySpark official documentation by using a Sphinx extension.
[jira] [Created] (SPARK-46111) Add copyright to the PySpark official documentation.
Haejoon Lee created SPARK-46111:
Summary: Add copyright to the PySpark official documentation.
Key: SPARK-46111
URL: https://issues.apache.org/jira/browse/SPARK-46111
Project: Spark
Issue Type: Bug
Components: Documentation
Affects Versions: 4.0.0
Reporter: Haejoon Lee

Add copyright to the PySpark official documentation by using a Sphinx extension.
[jira] [Assigned] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"
[ https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie reassigned SPARK-45699:
Assignee: Hannah Amundson

> Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"
>
> Key: SPARK-45699
> URL: https://issues.apache.org/jira/browse/SPARK-45699
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, SQL
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Assignee: Hannah Amundson
> Priority: Major
> Labels: pull-request-available
>
> {code:java}
> [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67: Widening conversion from Long to Double is deprecated because it loses precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold
> [error]   val threshold = max(speculationMultiplier * medianDuration, minTimeToSpeculation)
> [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60: Widening conversion from Long to Double is deprecated because it loses precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks
> [error]   foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, customizedThreshold = true)
> [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48: Widening conversion from Int to Float is deprecated because it loses precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getInt(i)
> [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49: Widening conversion from Long to Float is deprecated because it loses precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getLong(i)
> [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51: Widening conversion from Long to Double is deprecated because it loses precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble
> [error]   override def getDouble(i: Int): Double = getLong(i)
> {code}
>
> Examples of the compilation warning are shown above; there are probably over 100 similar cases that need to be fixed.
[jira] [Resolved] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"
[ https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45699. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43890 [https://github.com/apache/spark/pull/43890] > Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it > loses precision" > -- > > Key: SPARK-45699 > URL: https://issues.apache.org/jira/browse/SPARK-45699 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Hannah Amundson >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold > [error] val threshold = max(speculationMultiplier * medianDuration, > minTimeToSpeculation) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. 
[quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks > [error] foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, > customizedThreshold = true) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48: > Widening conversion from Int to Float is deprecated because it loses > precision. Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getInt(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49: > Widening conversion from Long to Float is deprecated because it loses > precision. Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getLong(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. 
[quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble > [error] override def getDouble(i: Int): Double = getLong(i) > [error] ^ {code} > > > The example of the compilation warning is as above, there are probably over > 100 similar cases that need to be fixed. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
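The deprecation these fixes address comes from a genuine hazard: a 64-bit `Long` has more significand bits than a `Double` (53 bits) or a `Float` (24 bits), so a bare widening conversion can silently change the value. A small Python sketch of the precision loss (illustrative only, not the Spark code above):

```python
import struct

# A Long has up to 63 bits of magnitude, but a double has a 53-bit
# significand, so sufficiently large integers collapse to the same double.
big = 2**53 + 1
assert float(big) == float(2**53)   # the +1 is silently lost

def to_f32(x: int) -> float:
    """Round-trip an int through a 32-bit float, mimicking Int/Long -> Float."""
    return struct.unpack('f', struct.pack('f', float(x)))[0]

# A 32-bit float has a 24-bit significand, so even a moderate Int collapses:
assert to_f32(2**24 + 1) == to_f32(2**24)
```

Writing `.toDouble` / `.toFloat` explicitly, as the warning suggests, does not avoid the loss; it only makes the programmer's acceptance of it visible at the call site.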
[jira] [Updated] (SPARK-45888) Apply error class framework to state data source & state metadata data source
[ https://issues.apache.org/jira/browse/SPARK-45888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45888: --- Labels: pull-request-available (was: ) > Apply error class framework to state data source & state metadata data source > - > > Key: SPARK-45888 > URL: https://issues.apache.org/jira/browse/SPARK-45888 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Priority: Blocker > Labels: pull-request-available > > Intended to be a blocker issue for the release of state data source reader. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46110) Use error classes in catalog, conf, connect, observation, pandas modules
[ https://issues.apache.org/jira/browse/SPARK-46110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46110: --- Labels: pull-request-available (was: ) > Use error classes in catalog, conf, connect, observation, pandas modules > > > Key: SPARK-46110 > URL: https://issues.apache.org/jira/browse/SPARK-46110 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46110) Use error classes in catalog, conf, connect, observation, pandas modules
Hyukjin Kwon created SPARK-46110: Summary: Use error classes in catalog, conf, connect, observation, pandas modules Key: SPARK-46110 URL: https://issues.apache.org/jira/browse/SPARK-46110 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-46109) Migrate to error classes in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon deleted SPARK-46109: - > Migrate to error classes in PySpark > --- > > Key: SPARK-46109 > URL: https://issues.apache.org/jira/browse/SPARK-46109 > Project: Spark > Issue Type: Umbrella >Reporter: Hyukjin Kwon >Priority: Major > > SPARK-41597 continues here to use error classes in PySpark. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46109) Migrate to error classes in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46109: - > Migrate to error classes in PySpark > --- > > Key: SPARK-46109 > URL: https://issues.apache.org/jira/browse/SPARK-46109 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > SPARK-41597 continues here to use error classes in PySpark. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46109) Migrate to error classes in PySpark
Hyukjin Kwon created SPARK-46109: Summary: Migrate to error classes in PySpark Key: SPARK-46109 URL: https://issues.apache.org/jira/browse/SPARK-46109 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon SPARK-41597 continues here to use error classes in PySpark. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46108) XML: keepInnerXmlAsRaw option
Ufuk Süngü created SPARK-46108: -- Summary: XML: keepInnerXmlAsRaw option Key: SPARK-46108 URL: https://issues.apache.org/jira/browse/SPARK-46108 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Ufuk Süngü The built-in XML data source exposes the values and schema of inner or nested elements. However, developers must perform additional operations manually to convert this unstructured data into a structured, tabular format. If nested elements are kept in an XML-compatible format (at each level), they can easily be converted to a structured, tabular format with the methods that have already been developed (the infer method of XmlInferSchema and the parseColumn method of StaxXmlParser). Therefore, there should be an option affecting the StaxXmlParser and InferSchema classes that keeps inner XML elements in their original, raw format. https://github.com/apache/spark/pull/44022 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
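Conceptually, the proposed option defers parsing of a nested element by keeping its original XML text instead of immediately inferring a struct for it. A plain-Python sketch of the idea using the standard library (this is not Spark's StaxXmlParser, just an illustration of the two behaviours):

```python
import xml.etree.ElementTree as ET

doc = "<book><title>Spark</title><meta><isbn>123</isbn><pages>600</pages></meta></book>"
root = ET.fromstring(doc)

# Default behaviour (conceptually): nested elements are parsed into structure,
# analogous to the data source inferring a struct for <meta>.
parsed = {child.tag: child.text for child in root.find("meta")}
assert parsed == {"isbn": "123", "pages": "600"}

# "Keep inner XML as raw" (conceptually): the nested element is preserved as
# its original XML string, so schema inference/parsing can happen later with
# the existing machinery.
raw_inner = ET.tostring(root.find("meta"), encoding="unicode")
assert raw_inner == "<meta><isbn>123</isbn><pages>600</pages></meta>"
```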
[jira] [Commented] (SPARK-32933) Use keyword-only syntax for keyword_only methods
[ https://issues.apache.org/jira/browse/SPARK-32933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789874#comment-17789874 ] Hyukjin Kwon commented on SPARK-32933: -- Here are the PR and JIRA: https://github.com/apache/spark/pull/44023 https://issues.apache.org/jira/browse/SPARK-46107 > Use keyword-only syntax for keyword_only methods > > > Key: SPARK-32933 > URL: https://issues.apache.org/jira/browse/SPARK-32933 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > Fix For: 3.1.0 > > > Since Python 3.0, there is syntax for indicating keyword-only arguments ([PEP > 3102|https://www.python.org/dev/peps/pep-3102/]). > It is not a full replacement for our current usage of {{keyword_only}}, but > it would allow us to make our expectations explicit: > {code:python} > @keyword_only > def __init__(self, degree=2, inputCol=None, outputCol=None): > {code} > {code:python} > @keyword_only > def __init__(self, *, degree=2, inputCol=None, outputCol=None): > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
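The PEP 3102 syntax quoted in the issue can be demonstrated in plain Python (parameter names below mirror the Jira example; no Spark decorator involved):

```python
def init_old(degree=2, inputCol=None, outputCol=None):
    # pre-PEP-3102 style: nothing prevents positional calls
    return degree, inputCol, outputCol

def init_new(*, degree=2, inputCol=None, outputCol=None):
    # keyword-only: the bare * rejects positional arguments at call time
    return degree, inputCol, outputCol

assert init_old(3, "in") == (3, "in", None)            # silently accepted
assert init_new(degree=3, inputCol="in") == (3, "in", None)

raised = False
try:
    init_new(3, "in")                                  # positional call fails
except TypeError:
    raised = True
assert raised
```

This is why the syntax makes expectations explicit: misuse becomes a `TypeError` at the call site instead of a silently mis-bound argument.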
[jira] [Updated] (SPARK-46107) Deprecate pyspark.keyword_only API
[ https://issues.apache.org/jira/browse/SPARK-46107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46107: --- Labels: pull-request-available (was: ) > Deprecate pyspark.keyword_only API > -- > > Key: SPARK-46107 > URL: https://issues.apache.org/jira/browse/SPARK-46107 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > See https://issues.apache.org/jira/browse/SPARK-32933. We don't need this > anymore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46107) Deprecate pyspark.keyword_only API
Hyukjin Kwon created SPARK-46107: Summary: Deprecate pyspark.keyword_only API Key: SPARK-46107 URL: https://issues.apache.org/jira/browse/SPARK-46107 Project: Spark Issue Type: Improvement Components: ML, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon See https://issues.apache.org/jira/browse/SPARK-32933. We don't need this anymore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46094) Add support for code profiling executors
[ https://issues.apache.org/jira/browse/SPARK-46094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46094: --- Labels: pull-request-available (was: ) > Add support for code profiling executors > > > Key: SPARK-46094 > URL: https://issues.apache.org/jira/browse/SPARK-46094 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Parth Chandra >Priority: Major > Labels: pull-request-available > > To profile a Spark application, a user or developer has to run a Spark job > locally on the development machine and use a tool like Java Flight Recorder, > YourKit, or async-profiler to record profiling information. Because profiling > can be expensive, the profiler is typically attached to the Spark JVM process > after the process has started and detached once sufficient profiling data has been > collected. > The developer's environment is frequently different from the production > environment and may not yield accurate information. > However, profiling is hard when a Spark application runs as a > distributed job on a cluster where the developer may have limited access to > the actual nodes where the executor processes are running. Also, in > environments like Kubernetes where the executor pods may be removed as soon > as the job completes, retrieving the profiling information from each executor > pod can become quite tricky. > This feature adds a low-overhead sampling profiler like async-profiler > as a built-in capability of the Spark job that can be turned on using only > user-configurable parameters (async-profiler is a low-overhead profiler that > can be invoked programmatically and is available as a single multi-platform > jar, for Linux and macOS). > In addition, for convenience, the feature would save profiling output files > to the distributed file system so that information from all executors can be > available in a single place. 
> The feature would add an executor plugin that does not add any overhead > unless enabled and can be configured to accept profiler arguments as a > configuration parameter. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails
[ https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46074. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43983 [https://github.com/apache/spark/pull/43983] > [CONNECT][SCALA] Insufficient details in error when a UDF fails > --- > > Key: SPARK-46074 > URL: https://issues.apache.org/jira/browse/SPARK-46074 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Niranjan Jayakar >Assignee: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, when a UDF fails the connect client does not receive the actual > error that caused the failure. > As an example, the error message looks like - > {code:java} > Exception in thread "main" org.apache.spark.SparkException: > grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to > stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost > task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): > org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user > defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). > SQLSTATE: 39000 {code} > In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails
[ https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46074: Assignee: Niranjan Jayakar > [CONNECT][SCALA] Insufficient details in error when a UDF fails > --- > > Key: SPARK-46074 > URL: https://issues.apache.org/jira/browse/SPARK-46074 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Niranjan Jayakar >Assignee: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > > Currently, when a UDF fails the connect client does not receive the actual > error that caused the failure. > As an example, the error message looks like - > {code:java} > Exception in thread "main" org.apache.spark.SparkException: > grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to > stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost > task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): > org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user > defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). > SQLSTATE: 39000 {code} > In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"
[ https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789867#comment-17789867 ] Giambattista Bloisi commented on SPARK-45311: - The issue arises while Encoders.bean is inferring the schema for the JeuDeDonnees class. This class has a field of type Ressources.class, which extends a LinkedHashMap. A simple work-around to let the tests pass is to modify JeuDeDonnees and declare ressources as a Map: {code:java} private Map ressources; //... public Map getRessources() { //...{code} and, when required, iterate the values explicitly: {code:java} jeuDeDonnees.getRessources().values().forEach {code} The exception is thrown because the code assumes (wrongly in this case) that if a class (such as Ressources.class) is a Map, then it has generic type information attached to it; here, instead, the information is available on the base/super class. There is a wider problem behind this. There are cases where mapping to a Spark schema would be ambiguous, for example: * Ressources could also have getters and setters; should it be mapped as a map or a struct? * A class could implement both the List and Map interfaces; should it be mapped as an array or a map? IMO the workaround is also a good idiomatic way to structure beans to be used with Spark, as it makes the mapping explicit and removes the possibility of ambiguity. 
> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search > for an encoder for a generic type, and since 3.5.x isn't "an expression > encoder" > - > > Key: SPARK-45311 > URL: https://issues.apache.org/jira/browse/SPARK-45311 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0, 3.4.1, 3.5.0 > Environment: Debian 12 > Java 17 > Underlying Spring-Boot 2.7.14 >Reporter: Marc Le Bihan >Priority: Major > Attachments: JavaTypeInference_116.png, sparkIssue_02.png > > > If you find it convenient, you might clone the > [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that > does many operations around cities, local authorities and accounting with > open data) where I've extracted from my work what's necessary to make a set > of 35 tests that run correctly with Spark 3.3.x, and show the troubles > encountered with 3.4.x and 3.5.x. > > It is working well with Spark 3.2.x and 3.3.x. But as soon as I select {*}Spark > 3.4.x{*}, where the encoder seems to have deeply changed, the encoder fails > with two problems: > > *1)* It throws *java.util.NoSuchElementException: None.get* messages > everywhere. > Asking over the Internet, I wasn't alone facing this problem. Reading it, > you'll see that I've attempted a debug but my Scala skills are low. 
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0] > {color:#172b4d}by the way, if possible, the encoder and decoder functions > should forward a parameter as soon as the name of the field being handled is > known, and then all along their process, so that when the encoder is at > any point where it has to throw an exception, it knows the field it is > handling in its specific call and can send a message like:{color} > {color:#00875a}_java.util.NoSuchElementException: None.get when encoding [the > method or field it was targeting]_{color} > > *2)* *Not found an encoder of the type RS to Spark SQL internal > representation.* Consider to change the input type to one of supported at > (...) > Or : Not found an encoder of the type *OMI_ID* to Spark SQL internal > representation (...) > > where *RS* and *OMI_ID* are generic types. > This is strange. > [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge] > > *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, > but another adds itself to the list: > "{*}Only expression encoders are supported for now{*}" on what was accepted > and working before. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
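The diagnosis in the comment above, that the generic type information lives on the superclass rather than on the field's declared type, has a direct analogue in Python's typing machinery. This sketch (plain Python, not Spark code; class names borrowed from the Jira report for illustration) shows why inspecting only the field's annotation finds no type arguments:

```python
from typing import Dict, get_args, get_type_hints

class Ressources(Dict[str, int]):
    """Subclass of a generic map: the type arguments live on the BASE class."""
    pass

class JeuDeDonnees:
    ressources: Ressources   # the field's own annotation carries no type args

hints = get_type_hints(JeuDeDonnees)

# Looking only at the field's declared type, there are no generic arguments,
# which mirrors the failed lookup that produced "None.get" in the report:
assert get_args(hints["ressources"]) == ()

# The arguments are only recoverable from the superclass declaration:
assert Dict[str, int] in Ressources.__orig_bases__
```

Declaring the field as a plain `Map`/`Dict` with explicit type arguments, as the workaround suggests, puts the information where a naive field-level lookup expects it.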
[jira] [Updated] (SPARK-42655) Incorrect ambiguous column reference error
[ https://issues.apache.org/jira/browse/SPARK-42655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42655: --- Labels: pull-request-available (was: ) > Incorrect ambiguous column reference error > -- > > Key: SPARK-42655 > URL: https://issues.apache.org/jira/browse/SPARK-42655 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shrikant Prasad >Assignee: Shrikant Prasad >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > val df1 = > sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", > "col5") > val op_cols_same_case = List("id","col2","col3","col4", "col5", "id") > val df2 = df1.select(op_cols_same_case.head, op_cols_same_case.tail: _*) > df2.select("id").show() > > This query runs fine. > > But when we change the casing of the op_cols to have mix of upper & lower > case ("id" & "ID") it throws an ambiguous col ref error: > > val df1 = > sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", > "col5") > val op_cols_mixed_case = List("id","col2","col3","col4", "col5", "ID") > val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*) > df3.select("id").show() > org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could > be: id, id. 
> at > org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:363) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:112) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpressionByPlanChildren$1(Analyzer.scala:1857) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpression$2(Analyzer.scala:1787) > at > org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:60) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.innerResolve$1(Analyzer.scala:1794) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpression(Analyzer.scala:1812) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpressionByPlanChildren(Analyzer.scala:1863) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$17.$anonfun$applyOrElse$94(Analyzer.scala:1577) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209) > > Since Spark is case insensitive, it should also work in the second case, when we > have both upper- and lower-case column names in the column list. > It also works fine in Spark 2.3. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
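A minimal Python sketch of the resolution logic at issue (an assumed model, not Spark's actual resolver): a case-insensitive lookup over a select list containing both "id" and "ID" need not be ambiguous, because both names refer to the same underlying attribute; genuine ambiguity only arises when the matches are distinct attributes.

```python
def resolve(name, columns, case_sensitive=False):
    """columns: list of (attribute_id, display_name) pairs. Returns the
    attribute id; raises only when matches are genuinely different attributes."""
    key = name if case_sensitive else name.lower()
    matches = [
        attr_id for attr_id, col in columns
        if (col if case_sensitive else col.lower()) == key
    ]
    distinct = set(matches)
    if not distinct:
        raise KeyError(name)
    if len(distinct) > 1:
        raise ValueError(f"Reference '{name}' is ambiguous")
    return distinct.pop()

# df1.select("id", ..., "ID"): both names alias the same attribute, #1
cols = [(1, "id"), (2, "col2"), (3, "col3"), (4, "col4"), (5, "col5"), (1, "ID")]
assert resolve("id", cols) == 1          # duplicated alias -> not ambiguous

# A true ambiguity: two different attributes named "id" (e.g. after a join)
joined = [(1, "id"), (7, "id")]
try:
    resolve("id", joined)
except ValueError as e:
    assert "ambiguous" in str(e)
```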
[jira] [Resolved] (SPARK-45974) Add scan.filterAttributes non-empty judgment for RowLevelOperationRuntimeGroupFiltering
[ https://issues.apache.org/jira/browse/SPARK-45974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-45974. - Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 43869 [https://github.com/apache/spark/pull/43869] > Add scan.filterAttributes non-empty judgment for > RowLevelOperationRuntimeGroupFiltering > --- > > Key: SPARK-45974 > URL: https://issues.apache.org/jira/browse/SPARK-45974 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Zhen Wang >Assignee: Zhen Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0 > > > When scan.filterAttributes is empty, an invalid dynamic Pruning condition > will be generated in RowLevelOperationRuntimeGroupFiltering -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45974) Add scan.filterAttributes non-empty judgment for RowLevelOperationRuntimeGroupFiltering
[ https://issues.apache.org/jira/browse/SPARK-45974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-45974: --- Assignee: Zhen Wang > Add scan.filterAttributes non-empty judgment for > RowLevelOperationRuntimeGroupFiltering > --- > > Key: SPARK-45974 > URL: https://issues.apache.org/jira/browse/SPARK-45974 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Zhen Wang >Assignee: Zhen Wang >Priority: Major > Labels: pull-request-available > > When scan.filterAttributes is empty, an invalid dynamic Pruning condition > will be generated in RowLevelOperationRuntimeGroupFiltering -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46106) If the hive table is an external table, the external table information will be displayed during ShowCreateTableCommand.
[ https://issues.apache.org/jira/browse/SPARK-46106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46106: --- Labels: pull-request-available (was: ) > If the hive table is an external table, the external table information will be displayed > during ShowCreateTableCommand. > -- > > Key: SPARK-46106 > URL: https://issues.apache.org/jira/browse/SPARK-46106 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: guihuawen >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > For example: > CREATE EXTERNAL TABLE test_extaral_1 (a String); > When using SHOW CREATE TABLE, if the table is an external table, the output does not > show that it is external. > spark-sql> show create table test_extaral_1; > createtab_stmt > CREATE TABLE `test`.`test_extaral_1` ( > `a` STRING) > USING orc > LOCATION '/test/test_extaral_1' > > After the change, the output shows whether the table is external: > spark-sql> show create table test_extaral_1; > createtab_stmt > CREATE EXTERNAL TABLE `test`.`test_extaral_1` ( > `a` STRING) > USING orc > LOCATION '/test/test_extaral_1' > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46106) If the hive table is an external table, the external table information will be displayed during ShowCreateTableCommand.
guihuawen created SPARK-46106: - Summary: If the hive table is an external table, the external table information will be displayed during ShowCreateTableCommand. Key: SPARK-46106 URL: https://issues.apache.org/jira/browse/SPARK-46106 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: guihuawen Fix For: 3.5.0 For example: CREATE EXTERNAL TABLE test_extaral_1 (a String); When using SHOW CREATE TABLE, if the table is an external table, the output does not show that it is external. spark-sql> show create table test_extaral_1; createtab_stmt CREATE TABLE `test`.`test_extaral_1` ( `a` STRING) USING orc LOCATION '/test/test_extaral_1' After the change, the output shows whether the table is external: spark-sql> show create table test_extaral_1; createtab_stmt CREATE EXTERNAL TABLE `test`.`test_extaral_1` ( `a` STRING) USING orc LOCATION '/test/test_extaral_1' -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39769) Rename trait Unevaluable
[ https://issues.apache.org/jira/browse/SPARK-39769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-39769: --- Labels: pull-request-available (was: ) > Rename trait Unevaluable > > > Key: SPARK-39769 > URL: https://issues.apache.org/jira/browse/SPARK-39769 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Ted Yu >Priority: Minor > Labels: pull-request-available > > I came upon `trait Unevaluable` which is defined in > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala > Unevaluable is not a word. > There are `valuable`, `invaluable` but I have never seen Unevaluable. > This issue renames the trait to Unevaluatable -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45826) Add a SQL config for extra stack traces in Origin
[ https://issues.apache.org/jira/browse/SPARK-45826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-45826. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43695 [https://github.com/apache/spark/pull/43695] > Add a SQL config for extra stack traces in Origin > - > > Key: SPARK-45826 > URL: https://issues.apache.org/jira/browse/SPARK-45826 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add a SQL config to control how many extra stack traces should be captured in > the withOrigin method. This should improve user experience in troubleshooting > issues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46101) Replace (string|array).size with (string|array).length in module SQL
[ https://issues.apache.org/jira/browse/SPARK-46101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46101: --- Labels: pull-request-available (was: ) > Replace (string|array).size with (string|array).length in module SQL > > > Key: SPARK-46101 > URL: https://issues.apache.org/jira/browse/SPARK-46101 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org