[jira] [Updated] (SPARK-46543) json_tuple throw PySparkValueError for empty fields
[ https://issues.apache.org/jira/browse/SPARK-46543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46543:
-----------------------------------
    Labels: pull-request-available  (was: )

> json_tuple throw PySparkValueError for empty fields
> ---------------------------------------------------
>
>                 Key: SPARK-46543
>                 URL: https://issues.apache.org/jira/browse/SPARK-46543
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46543) json_tuple throw PySparkValueError for empty fields
Ruifeng Zheng created SPARK-46543:
-------------------------------------

             Summary: json_tuple throw PySparkValueError for empty fields
                 Key: SPARK-46543
                 URL: https://issues.apache.org/jira/browse/SPARK-46543
             Project: Spark
          Issue Type: Improvement
          Components: Connect, PySpark
    Affects Versions: 4.0.0
            Reporter: Ruifeng Zheng
[jira] [Updated] (SPARK-46542) Remove the check for `c>=0` from `ExternalCatalogUtils#needsEscaping`
[ https://issues.apache.org/jira/browse/SPARK-46542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46542:
-----------------------------------
    Labels: pull-request-available  (was: )

> Remove the check for `c>=0` from `ExternalCatalogUtils#needsEscaping`
> ---------------------------------------------------------------------
>
>                 Key: SPARK-46542
>                 URL: https://issues.apache.org/jira/browse/SPARK-46542
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Priority: Minor
>              Labels: pull-request-available
>
> {code:java}
> def needsEscaping(c: Char): Boolean = {
>   c >= 0 && c < charToEscape.size() && charToEscape.get(c)
> } {code}
>
> The numerical range of Char in Scala is from 0 to 65,535, so `c>=0` is always true.
[jira] [Created] (SPARK-46542) Remove the check for `c>=0` from `ExternalCatalogUtils#needsEscaping`
Yang Jie created SPARK-46542:
--------------------------------

             Summary: Remove the check for `c>=0` from `ExternalCatalogUtils#needsEscaping`
                 Key: SPARK-46542
                 URL: https://issues.apache.org/jira/browse/SPARK-46542
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Yang Jie

{code:java}
def needsEscaping(c: Char): Boolean = {
  c >= 0 && c < charToEscape.size() && charToEscape.get(c)
} {code}

The numerical range of Char in Scala is from 0 to 65,535, so `c>=0` is always true.
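[Editor's note] The redundancy reported above is easy to see outside Scala as well. A minimal Python sketch of the check, with a purely hypothetical escape set (Spark's real `charToEscape` bitset is built inside `ExternalCatalogUtils` and is not reproduced here):

```python
# Sketch of needsEscaping without the always-true `c >= 0` guard.
# CHAR_TO_ESCAPE is illustrative only, not Spark's actual escape set.
CHAR_TO_ESCAPE = {ord(c) for c in '"#%\'*/:=?\\'}
MAX_CHAR = 0xFFFF  # Scala's Char covers 0..65535, so `c >= 0` always holds


def needs_escaping(c: str) -> bool:
    """Return True when the single character c must be escaped."""
    code = ord(c)
    if code > MAX_CHAR:
        raise ValueError("not a UTF-16 code unit")
    return code in CHAR_TO_ESCAPE
```

Because `ord(c)` can never be negative, the lower-bound comparison adds nothing, which is exactly the simplification the issue proposes.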
[jira] [Resolved] (SPARK-45914) Support `commit` and `abort` API for Python data source write
[ https://issues.apache.org/jira/browse/SPARK-45914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-45914.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44497
[https://github.com/apache/spark/pull/44497]

> Support `commit` and `abort` API for Python data source write
> --------------------------------------------------------------
>
>                 Key: SPARK-45914
>                 URL: https://issues.apache.org/jira/browse/SPARK-45914
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> Support `commit` and `abort` API for Python data source write.
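[Editor's note] The commit/abort contract this resolves can be sketched in plain Python. The class and method bodies below are illustrative assumptions modeled on the issue title, not the actual PySpark data source writer interface:

```python
# Hedged sketch of a two-phase writer: rows are staged per task, then either
# made visible by commit() or discarded by abort(). Names are hypothetical.
class SimpleWriter:
    def __init__(self):
        self.committed = []  # rows visible after a successful commit
        self.staged = []     # rows written but not yet committed

    def write(self, row):
        self.staged.append(row)

    def commit(self):
        # Make all staged rows durable/visible atomically (in this sketch,
        # simply move them into the committed list).
        self.committed.extend(self.staged)
        self.staged = []

    def abort(self):
        # On failure, drop staged rows so no partial output is observed.
        self.staged = []
```

The point of the two hooks is exactly this asymmetry: a failed write leaves `committed` untouched.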
[jira] [Assigned] (SPARK-45914) Support `commit` and `abort` API for Python data source write
[ https://issues.apache.org/jira/browse/SPARK-45914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-45914:
------------------------------------
    Assignee: Allison Wang

> Support `commit` and `abort` API for Python data source write
> --------------------------------------------------------------
>
>                 Key: SPARK-45914
>                 URL: https://issues.apache.org/jira/browse/SPARK-45914
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>              Labels: pull-request-available
>
> Support `commit` and `abort` API for Python data source write.
[jira] [Resolved] (SPARK-46397) sha2(df.a, 1024) throws a different exception in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-46397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-46397.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44529
[https://github.com/apache/spark/pull/44529]

> sha2(df.a, 1024) throws a different exception in Spark Connect
> --------------------------------------------------------------
>
>                 Key: SPARK-46397
>                 URL: https://issues.apache.org/jira/browse/SPARK-46397
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> {code}
> from pyspark.sql import functions as sf
> spark.range(1).select(sf.sha2(sf.col("id"), 1024)).collect()
> {code}
> Non-connect:
> {code}
> ...
> pyspark.errors.exceptions.captured.IllegalArgumentException: requirement failed: numBits 1024 is not in the permitted values (0, 224, 256, 384, 512)
> {code}
> Connect:
> {code}
> ...
> pyspark.errors.exceptions.connect.AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "sha2(id, 1024)" due to data type mismatch: Parameter 1 requires the "BINARY" type, however "id" has the type "BIGINT". SQLSTATE: 42K09;
> 'Project [unresolvedalias(sha2(id#1L, 1024))]
> +- Range (0, 1, step=1, splits=Some(1))
> {code}
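[Editor's note] The non-Connect error above comes from a simple argument check on `numBits`. A hedged pure-Python sketch of that validation using `hashlib`, assuming (per the SQL `sha2` function docs) that `numBits` 0 maps to SHA-256 — this is an illustration, not Spark's implementation:

```python
import hashlib

# Permitted bit widths from the error message quoted in the issue.
PERMITTED = (0, 224, 256, 384, 512)


def sha2_py(data: bytes, num_bits: int) -> str:
    """Toy analogue of sha2(expr, numBits) over raw bytes."""
    if num_bits not in PERMITTED:
        # Mirrors the message of the non-Connect IllegalArgumentException.
        raise ValueError(
            f"requirement failed: numBits {num_bits} is not in the "
            f"permitted values {PERMITTED}")
    algo = f"sha{num_bits or 256}"  # 0 selects SHA-256
    return hashlib.new(algo, data).hexdigest()
```

Calling `sha2_py(b"abc", 1024)` raises the argument error; the Connect behavior in the report instead fails earlier, at type resolution.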
[jira] [Assigned] (SPARK-46533) Refine docstring of `array_min/array_max/array_size/array_repeat`
[ https://issues.apache.org/jira/browse/SPARK-46533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie reassigned SPARK-46533:
--------------------------------
    Assignee: Yang Jie

> Refine docstring of `array_min/array_max/array_size/array_repeat`
> -----------------------------------------------------------------
>
>                 Key: SPARK-46533
>                 URL: https://issues.apache.org/jira/browse/SPARK-46533
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Assignee: Yang Jie
>            Priority: Major
>              Labels: pull-request-available
[jira] [Resolved] (SPARK-46533) Refine docstring of `array_min/array_max/array_size/array_repeat`
[ https://issues.apache.org/jira/browse/SPARK-46533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie resolved SPARK-46533.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44522
[https://github.com/apache/spark/pull/44522]

> Refine docstring of `array_min/array_max/array_size/array_repeat`
> -----------------------------------------------------------------
>
>                 Key: SPARK-46533
>                 URL: https://issues.apache.org/jira/browse/SPARK-46533
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Assignee: Yang Jie
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Updated] (SPARK-46449) Add ability to create databases via Catalog API
[ https://issues.apache.org/jira/browse/SPARK-46449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Chammas updated SPARK-46449:
-------------------------------------
    Description: 
As of Spark 3.5, the only way to create a database is via SQL. The Catalog API should offer an equivalent.

Perhaps something like:
{code:python}
spark.catalog.createDatabase(
    name: str,
    existsOk: bool = False,
    comment: str = None,
    location: str = None,
    properties: dict = None,
)
{code}

If {{schema}} is the preferred terminology, then we should use that instead of {{database}}.

  was:
As of Spark 3.5, the only way to create a database is via SQL. The Catalog API should offer an equivalent.

Perhaps something like:
{code:python}
spark.catalog.createDatabase(
    name: str,
    existsOk: bool = False,
    comment: str = None,
    location: str = None,
    properties: dict = None,
)
{code}

> Add ability to create databases via Catalog API
> -----------------------------------------------
>
>                 Key: SPARK-46449
>                 URL: https://issues.apache.org/jira/browse/SPARK-46449
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Nicholas Chammas
>            Priority: Minor
>
> As of Spark 3.5, the only way to create a database is via SQL. The Catalog API should offer an equivalent.
>
> Perhaps something like:
> {code:python}
> spark.catalog.createDatabase(
>     name: str,
>     existsOk: bool = False,
>     comment: str = None,
>     location: str = None,
>     properties: dict = None,
> )
> {code}
>
> If {{schema}} is the preferred terminology, then we should use that instead of {{database}}.
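[Editor's note] Under the hood, the proposed call could reduce to generating the `CREATE DATABASE` SQL that is the only option today. A sketch assuming the parameter names from the proposal — `create_database_sql` is a hypothetical helper, not an existing Catalog API:

```python
# Build a Spark SQL CREATE DATABASE statement from the proposed parameters.
# Quoting/escaping of values is deliberately naive in this sketch.
def create_database_sql(name, exists_ok=False, comment=None,
                        location=None, properties=None):
    parts = ["CREATE DATABASE"]
    if exists_ok:
        parts.append("IF NOT EXISTS")
    parts.append(name)
    if comment is not None:
        parts.append(f"COMMENT '{comment}'")
    if location is not None:
        parts.append(f"LOCATION '{location}'")
    if properties:
        kvs = ", ".join(f"'{k}' = '{v}'" for k, v in properties.items())
        parts.append(f"WITH DBPROPERTIES ({kvs})")
    return " ".join(parts)
```

For example, `create_database_sql("db1", exists_ok=True)` yields `CREATE DATABASE IF NOT EXISTS db1`, the statement a user would otherwise write by hand.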
[jira] [Updated] (SPARK-46540) Respect column names when Python data source read function outputs named Row objects
[ https://issues.apache.org/jira/browse/SPARK-46540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46540:
-----------------------------------
    Labels: pull-request-available  (was: )

> Respect column names when Python data source read function outputs named Row objects
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-46540
>                 URL: https://issues.apache.org/jira/browse/SPARK-46540
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Allison Wang
>            Priority: Major
>              Labels: pull-request-available
[jira] [Updated] (SPARK-46540) Respect column names when Python data source read function outputs named Row objects
[ https://issues.apache.org/jira/browse/SPARK-46540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allison Wang updated SPARK-46540:
---------------------------------
    Summary: Respect column names when Python data source read function outputs named Row objects  (was: Respect named arguments when Python data source read function outputs Row objects)

> Respect column names when Python data source read function outputs named Row objects
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-46540
>                 URL: https://issues.apache.org/jira/browse/SPARK-46540
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Allison Wang
>            Priority: Major
[jira] [Created] (SPARK-46540) Respects named arguments when Python data source read function outputs Row objects
Allison Wang created SPARK-46540:
------------------------------------

             Summary: Respects named arguments when Python data source read function outputs Row objects
                 Key: SPARK-46540
                 URL: https://issues.apache.org/jira/browse/SPARK-46540
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 4.0.0
            Reporter: Allison Wang
[jira] [Updated] (SPARK-46540) Respect named arguments when Python data source read function outputs Row objects
[ https://issues.apache.org/jira/browse/SPARK-46540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allison Wang updated SPARK-46540:
---------------------------------
    Summary: Respect named arguments when Python data source read function outputs Row objects  (was: Respects named arguments when Python data source read function outputs Row objects)

> Respect named arguments when Python data source read function outputs Row objects
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-46540
>                 URL: https://issues.apache.org/jira/browse/SPARK-46540
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Allison Wang
>            Priority: Major
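[Editor's note] The behavior SPARK-46540 asks for can be sketched with plain dicts standing in for named Row objects: when the read function yields rows carrying field names, values should be matched to the declared schema by name, not by position. `align_to_schema` is a hypothetical helper, not a PySpark function:

```python
# Reorder each named row's values to follow the declared schema order.
def align_to_schema(rows, schema):
    """rows: iterable of dicts (name -> value); schema: list of column names."""
    return [tuple(row[name] for name in schema) for row in rows]
```

With schema `["a", "b"]`, a row produced as `{"b": 2, "a": 1}` still lands as `(1, 2)`, which is the "respect column names" semantics the issue describes.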
[jira] [Resolved] (SPARK-46538) Fix the ambiguous column reference issue in ALSModel.transform
[ https://issues.apache.org/jira/browse/SPARK-46538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-46538.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44526
[https://github.com/apache/spark/pull/44526]

> Fix the ambiguous column reference issue in ALSModel.transform
> --------------------------------------------------------------
>
>                 Key: SPARK-46538
>                 URL: https://issues.apache.org/jira/browse/SPARK-46538
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Resolved] (SPARK-46484) Fix missing `plan_id` in`df.melt.groupby`
[ https://issues.apache.org/jira/browse/SPARK-46484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-46484.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44462
[https://github.com/apache/spark/pull/44462]

> Fix missing `plan_id` in `df.melt.groupby`
> ------------------------------------------
>
>                 Key: SPARK-46484
>                 URL: https://issues.apache.org/jira/browse/SPARK-46484
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect, SQL
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Updated] (SPARK-46397) sha2(df.a, 1024) throws a different exception in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-46397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46397:
-----------------------------------
    Labels: pull-request-available  (was: )

> sha2(df.a, 1024) throws a different exception in Spark Connect
> --------------------------------------------------------------
>
>                 Key: SPARK-46397
>                 URL: https://issues.apache.org/jira/browse/SPARK-46397
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>
> {code}
> from pyspark.sql import functions as sf
> spark.range(1).select(sf.sha2(sf.col("id"), 1024)).collect()
> {code}
> Non-connect:
> {code}
> ...
> pyspark.errors.exceptions.captured.IllegalArgumentException: requirement failed: numBits 1024 is not in the permitted values (0, 224, 256, 384, 512)
> {code}
> Connect:
> {code}
> ...
> pyspark.errors.exceptions.connect.AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "sha2(id, 1024)" due to data type mismatch: Parameter 1 requires the "BINARY" type, however "id" has the type "BIGINT". SQLSTATE: 42K09;
> 'Project [unresolvedalias(sha2(id#1L, 1024))]
> +- Range (0, 1, step=1, splits=Some(1))
> {code}
[jira] [Assigned] (SPARK-46382) XML: Capture values interspersed between elements
[ https://issues.apache.org/jira/browse/SPARK-46382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46382:
------------------------------------
    Assignee: Shujing Yang

> XML: Capture values interspersed between elements
> -------------------------------------------------
>
>                 Key: SPARK-46382
>                 URL: https://issues.apache.org/jira/browse/SPARK-46382
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Shujing Yang
>            Assignee: Shujing Yang
>            Priority: Major
>              Labels: pull-request-available
>
> In XML, elements typically consist of a name and a value, with the value enclosed between the opening and closing tags. But XML also allows including arbitrary values interspersed between these elements. To address this, we provide an option named `valueTags`, which is enabled by default, to capture these values. Consider the following example (the element tags were stripped by the mail rendering):
> ```
> 1
> value1
>
> value2
> 2
> value3
>
>
> ```
> In this example, ``, ``, and `` are named elements with their respective values enclosed within tags. There are arbitrary values value1, value2, value3 interspersed between the elements. Please note that there can be multiple occurrences of values in a single element (i.e. value2 and value3 occur in the same element).
>
> We should parse the values between tags into the valueTags field. If there are multiple occurrences of value tags, the value tag field will be converted to an array type.
[jira] [Resolved] (SPARK-46382) XML: Capture values interspersed between elements
[ https://issues.apache.org/jira/browse/SPARK-46382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46382.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44318
[https://github.com/apache/spark/pull/44318]

> XML: Capture values interspersed between elements
> -------------------------------------------------
>
>                 Key: SPARK-46382
>                 URL: https://issues.apache.org/jira/browse/SPARK-46382
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Shujing Yang
>            Assignee: Shujing Yang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> In XML, elements typically consist of a name and a value, with the value enclosed between the opening and closing tags. But XML also allows including arbitrary values interspersed between these elements. To address this, we provide an option named `valueTags`, which is enabled by default, to capture these values. Consider the following example (the element tags were stripped by the mail rendering):
> ```
> 1
> value1
>
> value2
> 2
> value3
>
>
> ```
> In this example, ``, ``, and `` are named elements with their respective values enclosed within tags. There are arbitrary values value1, value2, value3 interspersed between the elements. Please note that there can be multiple occurrences of values in a single element (i.e. value2 and value3 occur in the same element).
>
> We should parse the values between tags into the valueTags field. If there are multiple occurrences of value tags, the value tag field will be converted to an array type.
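[Editor's note] The `valueTags` semantics described above can be approximated with Python's standard `ElementTree`, which surfaces interspersed text as `.text` (before the first child) and `.tail` (after each child). This is a sketch of the behavior, not Spark's XML reader:

```python
import xml.etree.ElementTree as ET

# Collect non-whitespace text interspersed between an element's children,
# i.e. the values the proposed `valueTags` option would capture.
def interspersed_values(xml_str):
    root = ET.fromstring(xml_str)
    values = []
    if root.text and root.text.strip():
        values.append(root.text.strip())  # text before the first child
    for child in root:
        if child.tail and child.tail.strip():
            values.append(child.tail.strip())  # text after each child
    return values
```

For an element like `<b>value2<c>2</c>value3</b>` this yields `["value2", "value3"]` — multiple occurrences within one element, which is why the issue proposes an array-typed field.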
[jira] [Updated] (SPARK-44812) Push filters through intersect
[ https://issues.apache.org/jira/browse/SPARK-44812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-44812:
-----------------------------------
    Labels: pull-request-available  (was: )

> Push filters through intersect
> ------------------------------
>
>                 Key: SPARK-44812
>                 URL: https://issues.apache.org/jira/browse/SPARK-44812
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.1
>            Reporter: copperybean
>            Priority: Major
>              Labels: pull-request-available
>
> For the following SQL
> {code:sql}
> select a from (select a from tl intersect select x from tr) where a > 123
> {code}
> the physical plan is
> {code:bash}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[a#8L], functions=[])
>    +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=133]
>       +- HashAggregate(keys=[a#8L], functions=[])
>          +- BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
>             :- Filter (isnotnull(a#8L) AND (a#8L > 123))
>             :  +- FileScan json [a#8L] ...
>             +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), isnull(input[0, bigint, true])),false), [plan_id=129]
>                +- FileScan json [x#24L] ... {code}
> We can see that the filter {color:#ff8b00}_a > 123_{color} is not pushed to the right table.
>
> Furthermore, for the following SQL
> {code:sql}
> select a from (select a from tl intersect select x from tr) join trr on a = y{code}
> the physical plan is
> {code:bash}
> *(3) Project [a#8L]
> +- *(3) BroadcastHashJoin [a#8L], [y#114L], Inner, BuildRight, false
>    :- *(3) HashAggregate(keys=[a#8L], functions=[])
>    :  +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=506]
>    :     +- *(1) HashAggregate(keys=[a#8L], functions=[])
>    :        +- *(1) BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
>    :           :- *(1) Filter isnotnull(a#8L)
>    :           :  +- FileScan json [a#8L] ...
>    :           +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), isnull(input[0, bigint, true])),false), [plan_id=490]
>    :              +- FileScan json [x#24L] ...
>    +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [plan_id=512]
>       +- *(2) Filter isnotnull(y#114L)
>          +- FileScan json [y#114L] ...{code}
> There should be a filter _{color:#ff8b00}isnotnull( x ){color}_ for table tr, while it is not pushed down.
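[Editor's note] The legality of the pushdown the issue requests follows from INTERSECT's set semantics: filtering the intersection equals intersecting the filtered inputs. A small Python check with illustrative values:

```python
# Set-semantics sketch of the rewrite: pushing a predicate below an
# intersect. `left`/`right` stand in for tl.a and tr.x; `pred` is a > 123.
left = {100, 200, 300}
right = {200, 300, 400}
pred = lambda v: v > 123

filter_after = {v for v in (left & right) if pred(v)}                # filter on top
pushed_down = {v for v in left if pred(v)} & {v for v in right if pred(v)}  # filter pushed to both sides
assert filter_after == pushed_down
```

The same argument covers the second plan: any row surviving the join must satisfy `isnotnull`, so the null filter may be applied to `tr` before the intersect.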
[jira] [Updated] (SPARK-46539) SELECT * EXCEPT(all fields from a struct) results in an assertion failure
[ https://issues.apache.org/jira/browse/SPARK-46539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46539:
-----------------------------------
    Labels: pull-request-available  (was: )

> SELECT * EXCEPT(all fields from a struct) results in an assertion failure
> -------------------------------------------------------------------------
>
>                 Key: SPARK-46539
>                 URL: https://issues.apache.org/jira/browse/SPARK-46539
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Stefan Kandic
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> This example
> {code:sql}
> -- Removing all fields results in an empty struct
> SELECT * EXCEPT(c1.a) FROM VALUES(named_struct('a', 1)) AS t(c1);
> {code}
> throws an AssertionError during serialization:
> {code:java}
> AssertionError: assertion failed: each serializer expression should contain at least one `BoundReference`
> {code}
> instead of just returning an empty struct.
[jira] [Created] (SPARK-46539) SELECT * EXCEPT(all fields from a struct) results in an assertion failure
Stefan Kandic created SPARK-46539:
-------------------------------------

             Summary: SELECT * EXCEPT(all fields from a struct) results in an assertion failure
                 Key: SPARK-46539
                 URL: https://issues.apache.org/jira/browse/SPARK-46539
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Stefan Kandic
             Fix For: 4.0.0

This example
{code:sql}
-- Removing all fields results in an empty struct
SELECT * EXCEPT(c1.a) FROM VALUES(named_struct('a', 1)) AS t(c1);
{code}
throws an AssertionError during serialization:
{code:java}
AssertionError: assertion failed: each serializer expression should contain at least one `BoundReference`
{code}
instead of just returning an empty struct.
[jira] [Resolved] (SPARK-46537) Convert NPE and asserts from commands to internal errors
[ https://issues.apache.org/jira/browse/SPARK-46537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-46537.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44525
[https://github.com/apache/spark/pull/44525]

> Convert NPE and asserts from commands to internal errors
> --------------------------------------------------------
>
>                 Key: SPARK-46537
>                 URL: https://issues.apache.org/jira/browse/SPARK-46537
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> Handle NPE and asserts from eagerly executed commands, and convert them to internal errors.
[jira] [Commented] (SPARK-46192) failed to insert the table using the default value of union
[ https://issues.apache.org/jira/browse/SPARK-46192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801004#comment-17801004 ]

Barbara Raciniewska commented on SPARK-46192:
---------------------------------------------

I am working on it

> failed to insert the table using the default value of union
> -----------------------------------------------------------
>
>                 Key: SPARK-46192
>                 URL: https://issues.apache.org/jira/browse/SPARK-46192
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0, 3.4.1
>            Reporter: zengxl
>            Priority: Major
>
> Obtain the following tables and data
> {code:java}
> create table test_spark(k string default null,v int default null) stored as orc;
> create table test_spark_1(k string default null,v int default null) stored as orc;
> insert into table test_spark_1 values('k1',1),('k2',2),('k3',3);
> create table test_spark_2(k string default null,v int default null) stored as orc;
> insert into table test_spark_2 values('k3',3),('k4',4),('k5',5);
> {code}
> Execute the following SQL
> {code:java}
> insert into table test_spark (k)
> select k from test_spark_1
> union
> select k from test_spark_2
> {code}
> exception:
> {code:java}
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is CatalogAndIdentifier
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.userSpecifiedCols.size is 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.userSpecifiedCols.size is 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 ,resolved :1 , i.query 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1
> Error in query: `default`.`test_spark` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s). {code}
[jira] [Updated] (SPARK-46538) Fix the ambiguous column reference issue in ALSModel.transform
[ https://issues.apache.org/jira/browse/SPARK-46538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng updated SPARK-46538:
----------------------------------
    Summary: Fix the ambiguous column reference issue in ALSModel.transform  (was: Fix an ambiguous column reference issue in ALSModel.transform)

> Fix the ambiguous column reference issue in ALSModel.transform
> --------------------------------------------------------------
>
>                 Key: SPARK-46538
>                 URL: https://issues.apache.org/jira/browse/SPARK-46538
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
[jira] [Updated] (SPARK-46538) Fix an ambiguous column reference issue in ALSModel.transform
[ https://issues.apache.org/jira/browse/SPARK-46538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46538:
-----------------------------------
    Labels: pull-request-available  (was: )

> Fix an ambiguous column reference issue in ALSModel.transform
> -------------------------------------------------------------
>
>                 Key: SPARK-46538
>                 URL: https://issues.apache.org/jira/browse/SPARK-46538
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
[jira] [Created] (SPARK-46538) Fix an ambiguous column reference issue in ALSModel.transform
Ruifeng Zheng created SPARK-46538:
-------------------------------------

             Summary: Fix an ambiguous column reference issue in ALSModel.transform
                 Key: SPARK-46538
                 URL: https://issues.apache.org/jira/browse/SPARK-46538
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 4.0.0
            Reporter: Ruifeng Zheng
[jira] [Updated] (SPARK-46537) Convert NPE and asserts from commands to internal errors
[ https://issues.apache.org/jira/browse/SPARK-46537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46537:
-----------------------------------
    Labels: pull-request-available  (was: )

> Convert NPE and asserts from commands to internal errors
> --------------------------------------------------------
>
>                 Key: SPARK-46537
>                 URL: https://issues.apache.org/jira/browse/SPARK-46537
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>
> Handle NPE and asserts from eagerly executed commands, and convert them to internal errors.
[jira] [Commented] (SPARK-46537) Convert NPE and asserts from commands to internal errors
[ https://issues.apache.org/jira/browse/SPARK-46537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800970#comment-17800970 ]

Nikita Awasthi commented on SPARK-46537:
----------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/44525

> Convert NPE and asserts from commands to internal errors
> --------------------------------------------------------
>
>                 Key: SPARK-46537
>                 URL: https://issues.apache.org/jira/browse/SPARK-46537
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>
> Handle NPE and asserts from eagerly executed commands, and convert them to internal errors.
[jira] [Created] (SPARK-46537) Convert NPE and asserts from commands to internal errors
Max Gekk created SPARK-46537:
-----------------------------

    Summary: Convert NPE and asserts from commands to internal errors
    Key: SPARK-46537
    URL: https://issues.apache.org/jira/browse/SPARK-46537
    Project: Spark
    Issue Type: Improvement
    Components: SQL
    Affects Versions: 4.0.0
    Reporter: Max Gekk
    Assignee: Max Gekk

Handle NPE and asserts from eagerly executed commands, and convert them to internal errors.
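The conversion described in SPARK-46537 can be sketched as a small wrapper that runs an eagerly executed command and rewraps `NullPointerException` and `AssertionError` as an internal error. This is a standalone illustration under assumed names: `SparkInternalError` and `withInternalError` are hypothetical here, not Spark's actual API (Spark's real mechanism lives elsewhere, e.g. in `SparkException`).

```java
import java.util.function.Supplier;

public class InternalErrors {
    // Hypothetical stand-in for Spark's internal-error exception type.
    static final class SparkInternalError extends RuntimeException {
        SparkInternalError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Run a command, surfacing programming bugs (NPEs, failed asserts)
    // as internal errors instead of leaking the raw throwable to the user.
    static <T> T withInternalError(Supplier<T> command) {
        try {
            return command.get();
        } catch (NullPointerException | AssertionError e) {
            throw new SparkInternalError("INTERNAL_ERROR: " + e, e);
        }
    }
}
```

Successful commands pass their result through unchanged; only the two "this is a bug" throwables are rewrapped, so user-facing errors raised deliberately by the command keep their original type.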
[jira] [Created] (SPARK-46536) Support GROUP BY calendar_interval_type
Wenchen Fan created SPARK-46536:
-------------------------------

    Summary: Support GROUP BY calendar_interval_type
    Key: SPARK-46536
    URL: https://issues.apache.org/jira/browse/SPARK-46536
    Project: Spark
    Issue Type: Improvement
    Components: SQL
    Affects Versions: 4.0.0
    Reporter: Wenchen Fan

Currently, Spark GROUP BY only allows orderable data types; otherwise, plan analysis fails: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala#L197-L203

However, this is too strict, as GROUP BY only cares about equality, not ordering. The CalendarInterval type is not orderable (given 1 month and 30 days, we don't know which one is larger), but it has well-defined equality. In fact, we already support `SELECT DISTINCT calendar_interval_type` in some cases (when hash aggregate is picked by the planner).

The proposal here is to officially support the calendar interval type in GROUP BY: relax the check inside `CheckAnalysis`, make `CalendarInterval` implement `Comparable` using natural ordering (compare months first, then days, then seconds), and test with both hash aggregate and sort aggregate.
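The natural ordering proposed above can be sketched as a lexicographic comparison: months first, then days, then the sub-day component. The `Interval` class below is a standalone illustration, not Spark's actual implementation; its fields follow `org.apache.spark.unsafe.types.CalendarInterval` (which stores microseconds rather than seconds).

```java
public class IntervalOrdering {
    // Simplified stand-in for CalendarInterval with the proposed natural ordering.
    static final class Interval implements Comparable<Interval> {
        final int months;
        final int days;
        final long microseconds;

        Interval(int months, int days, long microseconds) {
            this.months = months;
            this.days = days;
            this.microseconds = microseconds;
        }

        @Override
        public int compareTo(Interval other) {
            // Lexicographic: months, then days, then microseconds.
            if (months != other.months) return Integer.compare(months, other.months);
            if (days != other.days) return Integer.compare(days, other.days);
            return Long.compare(microseconds, other.microseconds);
        }
    }

    public static void main(String[] args) {
        Interval oneMonth = new Interval(1, 0, 0);
        Interval thirtyDays = new Interval(0, 30, 0);
        // Under natural ordering, 1 month sorts after 30 days even though
        // their real-world durations are incomparable.
        System.out.println(oneMonth.compareTo(thirtyDays) > 0); // prints "true"
    }
}
```

This ordering is arbitrary with respect to real-world duration, but that is acceptable for GROUP BY: sort aggregate only needs *some* total order consistent with equality, so that equal intervals land next to each other.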
[jira] [Created] (SPARK-46535) NPE when describe extended a column without col stats
zouxxyy created SPARK-46535:
---------------------------

    Summary: NPE when describe extended a column without col stats
    Key: SPARK-46535
    URL: https://issues.apache.org/jira/browse/SPARK-46535
    Project: Spark
    Issue Type: Bug
    Components: SQL
    Affects Versions: 3.5.0
    Reporter: zouxxyy
[jira] [Updated] (SPARK-39859) Support v2 `DESCRIBE TABLE EXTENDED` for columns
[ https://issues.apache.org/jira/browse/SPARK-39859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-39859:
-----------------------------------
    Labels: pull-request-available  (was: )

> Support v2 `DESCRIBE TABLE EXTENDED` for columns
> ------------------------------------------------
>
>     Key: SPARK-39859
>     URL: https://issues.apache.org/jira/browse/SPARK-39859
>     Project: Spark
>     Issue Type: Sub-task
>     Components: SQL
>     Affects Versions: 3.4.0
>     Reporter: Max Gekk
>     Assignee: Huaxin Gao
>     Priority: Major
>     Labels: pull-request-available
>     Fix For: 3.4.0
[jira] [Resolved] (SPARK-46532) Pass message parameters in metadata of ErrorInfo
[ https://issues.apache.org/jira/browse/SPARK-46532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-46532.
------------------------------
    Fix Version/s: 4.0.0
    Resolution: Fixed

Issue resolved by pull request 44468
https://github.com/apache/spark/pull/44468

> Pass message parameters in metadata of ErrorInfo
> ------------------------------------------------
>
>     Key: SPARK-46532
>     URL: https://issues.apache.org/jira/browse/SPARK-46532
>     Project: Spark
>     Issue Type: Improvement
>     Components: Connect
>     Affects Versions: 4.0.0
>     Reporter: Max Gekk
>     Assignee: Max Gekk
>     Priority: Major
>     Labels: pull-request-available
>     Fix For: 4.0.0
>
> Put message parameters together with an error class in the `messageParameter` field in metadata of `ErrorInfo`. Currently, it is not possible to reconstruct an error from only an error class.
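The idea in SPARK-46532 can be sketched as building the metadata map of an `ErrorInfo`-style payload so it carries both the error class and its message parameters, letting the client reconstruct the full message. This is a minimal illustration under assumptions: the `buildMetadata` helper, the metadata keys, and the JSON encoding of the parameters are hypothetical, not Spark Connect's actual wire format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ErrorInfoMetadata {
    // Build an ErrorInfo-like metadata map: the error class plus the message
    // parameters serialized as a small JSON object. Assumes parameter keys
    // and values contain no characters that need JSON escaping.
    static Map<String, String> buildMetadata(String errorClass, Map<String, String> params) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("errorClass", errorClass);
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) json.append(",");
            json.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
            first = false;
        }
        json.append("}");
        metadata.put("messageParameter", json.toString());
        return metadata;
    }
}
```

With both entries present, a client can look up the error class's message template and substitute the parameters, instead of receiving only an opaque class name.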
[jira] [Resolved] (SPARK-46519) Clear unused error classes from error-classes.json file
[ https://issues.apache.org/jira/browse/SPARK-46519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-46519.
------------------------------
    Fix Version/s: 4.0.0
    Resolution: Fixed

Issue resolved by pull request 44503
https://github.com/apache/spark/pull/44503

> Clear unused error classes from error-classes.json file
> -------------------------------------------------------
>
>     Key: SPARK-46519
>     URL: https://issues.apache.org/jira/browse/SPARK-46519
>     Project: Spark
>     Issue Type: Improvement
>     Components: SQL
>     Affects Versions: 4.0.0
>     Reporter: BingKun Pan
>     Assignee: BingKun Pan
>     Priority: Minor
>     Labels: pull-request-available
>     Fix For: 4.0.0
[jira] [Assigned] (SPARK-46519) Clear unused error classes from error-classes.json file
[ https://issues.apache.org/jira/browse/SPARK-46519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-46519:
--------------------------------
    Assignee: BingKun Pan

> Clear unused error classes from error-classes.json file
> -------------------------------------------------------
>
>     Key: SPARK-46519
>     URL: https://issues.apache.org/jira/browse/SPARK-46519
>     Project: Spark
>     Issue Type: Improvement
>     Components: SQL
>     Affects Versions: 4.0.0
>     Reporter: BingKun Pan
>     Assignee: BingKun Pan
>     Priority: Minor
>     Labels: pull-request-available