[jira] [Updated] (SPARK-46543) json_tuple throw PySparkValueError for empty fields

2023-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46543:
---
Labels: pull-request-available  (was: )

> json_tuple throw PySparkValueError for empty fields
> ---
>
> Key: SPARK-46543
> URL: https://issues.apache.org/jira/browse/SPARK-46543
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46543) json_tuple throw PySparkValueError for empty fields

2023-12-28 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46543:
-

 Summary: json_tuple throw PySparkValueError for empty fields
 Key: SPARK-46543
 URL: https://issues.apache.org/jira/browse/SPARK-46543
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-46542) Remove the check for `c>=0` from `ExternalCatalogUtils#needsEscaping`

2023-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46542:
---
Labels: pull-request-available  (was: )

> Remove the check for `c>=0` from `ExternalCatalogUtils#needsEscaping`
> -
>
> Key: SPARK-46542
> URL: https://issues.apache.org/jira/browse/SPARK-46542
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>
>  
> {code:java}
> def needsEscaping(c: Char): Boolean = {
>   c >= 0 && c < charToEscape.size() && charToEscape.get(c)
> } {code}
>  
>  
> The numerical range of Char in Scala is from 0 to 65,535, so `c>=0` is always 
> true.






[jira] [Created] (SPARK-46542) Remove the check for `c>=0` from `ExternalCatalogUtils#needsEscaping`

2023-12-28 Thread Yang Jie (Jira)
Yang Jie created SPARK-46542:


 Summary: Remove the check for `c>=0` from 
`ExternalCatalogUtils#needsEscaping`
 Key: SPARK-46542
 URL: https://issues.apache.org/jira/browse/SPARK-46542
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


 
{code:java}
def needsEscaping(c: Char): Boolean = {
  c >= 0 && c < charToEscape.size() && charToEscape.get(c)
} {code}
 

 

The numerical range of Char in Scala is from 0 to 65,535, so `c>=0` is always 
true.
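A minimal Python analogue (not Spark's actual code) illustrates why the guard is dead code: a character's code point is non-negative by construction, so only the upper-bound and table-membership checks matter. The escape set below is hypothetical, chosen just for the example.

```python
# Hypothetical escape table for illustration; Spark's real table differs.
CHARS_TO_ESCAPE = {ord(c) for c in "\"#%'*/:=?\\"}
TABLE_SIZE = 128

def needs_escaping(c: str) -> bool:
    code = ord(c)  # like Scala's Char, a code point is never negative
    # the `code >= 0` guard from the original would always be True here
    return code < TABLE_SIZE and code in CHARS_TO_ESCAPE
```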






[jira] [Resolved] (SPARK-45914) Support `commit` and `abort` API for Python data source write

2023-12-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45914.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44497
[https://github.com/apache/spark/pull/44497]

> Support `commit` and `abort` API for Python data source write
> -
>
> Key: SPARK-45914
> URL: https://issues.apache.org/jira/browse/SPARK-45914
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Support `commit` and `abort` API for Python data source write.
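Conceptually this follows a two-phase commit: each write task returns a message, and the driver calls commit with all messages on success or abort on failure. A toy sketch of that shape (class and method names here are illustrative, not the exact PySpark interface):

```python
class ToyDataSourceWriter:
    """Illustrative writer with commit/abort hooks (not the real PySpark API)."""

    def __init__(self):
        self.status = "pending"

    def write(self, partition_rows):
        # per-task write; returns a "commit message" for the driver
        return {"rows_written": len(list(partition_rows))}

    def commit(self, messages):
        # driver-side: finalize once every task has succeeded
        self.status = "committed"
        return sum(m["rows_written"] for m in messages)

    def abort(self, messages):
        # driver-side: clean up partial output after any task failure
        self.status = "aborted"
```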






[jira] [Assigned] (SPARK-45914) Support `commit` and `abort` API for Python data source write

2023-12-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45914:


Assignee: Allison Wang

> Support `commit` and `abort` API for Python data source write
> -
>
> Key: SPARK-45914
> URL: https://issues.apache.org/jira/browse/SPARK-45914
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Support `commit` and `abort` API for Python data source write.






[jira] [Resolved] (SPARK-46397) sha2(df.a, 1024) throws a different exception in Spark Connect

2023-12-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46397.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44529
[https://github.com/apache/spark/pull/44529]

> sha2(df.a, 1024) throws a different exception in Spark Connect
> --
>
> Key: SPARK-46397
> URL: https://issues.apache.org/jira/browse/SPARK-46397
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> from pyspark.sql import functions as sf
> spark.range(1).select(sf.sha2(sf.col("id"), 1024)).collect()
> {code}
> Non-connect:
> {code}
> ...
> pyspark.errors.exceptions.captured.IllegalArgumentException: requirement 
> failed: numBits 1024 is not in the permitted values (0, 224, 256, 384, 512)
> {code}
> Connect:
> {code}
> ...
> pyspark.errors.exceptions.connect.AnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "sha2(id, 1024)" due 
> to data type mismatch: Parameter 1 requires the "BINARY" type, however "id" 
> has the type "BIGINT". SQLSTATE: 42K09;
> 'Project [unresolvedalias(sha2(id#1L, 1024))]
> +- Range (0, 1, step=1, splits=Some(1))
> {code}
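One way to make the two behaviors converge (a sketch, not necessarily the fix taken in the PR) is to validate `numBits` eagerly on the client against the documented permitted values, so classic PySpark and Spark Connect fail with the same message:

```python
PERMITTED_NUM_BITS = (0, 224, 256, 384, 512)

def check_sha2_num_bits(num_bits: int) -> None:
    # mirror the JVM-side wording so both execution modes raise identically
    if num_bits not in PERMITTED_NUM_BITS:
        raise ValueError(
            f"requirement failed: numBits {num_bits} is not in the "
            f"permitted values {PERMITTED_NUM_BITS}"
        )
```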






[jira] [Assigned] (SPARK-46533) Refine docstring of `array_min/array_max/array_size/array_repeat`

2023-12-28 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-46533:


Assignee: Yang Jie

> Refine docstring of `array_min/array_max/array_size/array_repeat`
> -
>
> Key: SPARK-46533
> URL: https://issues.apache.org/jira/browse/SPARK-46533
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46533) Refine docstring of `array_min/array_max/array_size/array_repeat`

2023-12-28 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-46533.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44522
[https://github.com/apache/spark/pull/44522]

> Refine docstring of `array_min/array_max/array_size/array_repeat`
> -
>
> Key: SPARK-46533
> URL: https://issues.apache.org/jira/browse/SPARK-46533
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46449) Add ability to create databases via Catalog API

2023-12-28 Thread Nicholas Chammas (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-46449:
-
Description: 
As of Spark 3.5, the only way to create a database is via SQL. The Catalog API 
should offer an equivalent.

Perhaps something like:
{code:python}
spark.catalog.createDatabase(
name: str,
existsOk: bool = False,
comment: str = None,
location: str = None,
properties: dict = None,
)
{code}

If {{schema}} is the preferred terminology, then we should use that instead of 
{{database}}.

  was:
As of Spark 3.5, the only way to create a database is via SQL. The Catalog API 
should offer an equivalent.

Perhaps something like:
{code:python}
spark.catalog.createDatabase(
name: str,
existsOk: bool = False,
comment: str = None,
location: str = None,
properties: dict = None,
)
{code}


> Add ability to create databases via Catalog API
> ---
>
> Key: SPARK-46449
> URL: https://issues.apache.org/jira/browse/SPARK-46449
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> As of Spark 3.5, the only way to create a database is via SQL. The Catalog 
> API should offer an equivalent.
> Perhaps something like:
> {code:python}
> spark.catalog.createDatabase(
> name: str,
> existsOk: bool = False,
> comment: str = None,
> location: str = None,
> properties: dict = None,
> )
> {code}
> If {{schema}} is the preferred terminology, then we should use that instead 
> of {{database}}.
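Since SQL is currently the only route, a helper for the proposed API could simply render the equivalent statement. The sketch below is hypothetical (the function name and clause spellings mirror the proposal above, not an actual Spark API):

```python
def create_database_sql(name, exists_ok=False, comment=None,
                        location=None, properties=None):
    """Render the CREATE DATABASE statement the proposed API might issue."""
    parts = ["CREATE DATABASE"]
    if exists_ok:
        parts.append("IF NOT EXISTS")
    parts.append(f"`{name}`")
    if comment is not None:
        parts.append(f"COMMENT '{comment}'")
    if location is not None:
        parts.append(f"LOCATION '{location}'")
    if properties:
        props = ", ".join(f"'{k}'='{v}'" for k, v in properties.items())
        parts.append(f"WITH DBPROPERTIES ({props})")
    return " ".join(parts)
```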






[jira] [Updated] (SPARK-46540) Respect column names when Python data source read function outputs named Row objects

2023-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46540:
---
Labels: pull-request-available  (was: )

> Respect column names when Python data source read function outputs named Row 
> objects
> 
>
> Key: SPARK-46540
> URL: https://issues.apache.org/jira/browse/SPARK-46540
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46540) Respect column names when Python data source read function outputs named Row objects

2023-12-28 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-46540:
-
Summary: Respect column names when Python data source read function outputs 
named Row objects  (was: Respect named arguments when Python data source read 
function outputs Row objects)

> Respect column names when Python data source read function outputs named Row 
> objects
> 
>
> Key: SPARK-46540
> URL: https://issues.apache.org/jira/browse/SPARK-46540
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>







[jira] [Created] (SPARK-46540) Respects named arguments when Python data source read function outputs Row objects

2023-12-28 Thread Allison Wang (Jira)
Allison Wang created SPARK-46540:


 Summary: Respects named arguments when Python data source read 
function outputs Row objects
 Key: SPARK-46540
 URL: https://issues.apache.org/jira/browse/SPARK-46540
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang









[jira] [Updated] (SPARK-46540) Respect named arguments when Python data source read function outputs Row objects

2023-12-28 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-46540:
-
Summary: Respect named arguments when Python data source read function 
outputs Row objects  (was: Respects named arguments when Python data source 
read function outputs Row objects)

> Respect named arguments when Python data source read function outputs Row 
> objects
> -
>
> Key: SPARK-46540
> URL: https://issues.apache.org/jira/browse/SPARK-46540
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>







[jira] [Resolved] (SPARK-46538) Fix the ambiguous column reference issue in ALSModel.transform

2023-12-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46538.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44526
[https://github.com/apache/spark/pull/44526]

> Fix the ambiguous column reference issue in ALSModel.transform
> --
>
> Key: SPARK-46538
> URL: https://issues.apache.org/jira/browse/SPARK-46538
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-46484) Fix missing `plan_id` in`df.melt.groupby`

2023-12-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46484.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44462
[https://github.com/apache/spark/pull/44462]

> Fix missing `plan_id` in`df.melt.groupby`
> -
>
> Key: SPARK-46484
> URL: https://issues.apache.org/jira/browse/SPARK-46484
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46397) sha2(df.a, 1024) throws a different exception in Spark Connect

2023-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46397:
---
Labels: pull-request-available  (was: )

> sha2(df.a, 1024) throws a different exception in Spark Connect
> --
>
> Key: SPARK-46397
> URL: https://issues.apache.org/jira/browse/SPARK-46397
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> from pyspark.sql import functions as sf
> spark.range(1).select(sf.sha2(sf.col("id"), 1024)).collect()
> {code}
> Non-connect:
> {code}
> ...
> pyspark.errors.exceptions.captured.IllegalArgumentException: requirement 
> failed: numBits 1024 is not in the permitted values (0, 224, 256, 384, 512)
> {code}
> Connect:
> {code}
> ...
> pyspark.errors.exceptions.connect.AnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "sha2(id, 1024)" due 
> to data type mismatch: Parameter 1 requires the "BINARY" type, however "id" 
> has the type "BIGINT". SQLSTATE: 42K09;
> 'Project [unresolvedalias(sha2(id#1L, 1024))]
> +- Range (0, 1, step=1, splits=Some(1))
> {code}






[jira] [Assigned] (SPARK-46382) XML: Capture values interspersed between elements

2023-12-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46382:


Assignee: Shujing Yang

> XML: Capture values interspersed between elements
> -
>
> Key: SPARK-46382
> URL: https://issues.apache.org/jira/browse/SPARK-46382
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
>
> In XML, elements typically consist of a name and a value, with the value 
> enclosed between the opening and closing tags. But XML also allows to include 
> arbitrary values interspersed between these elements. To address this, we 
> provide an option named `valueTags`, which is enabled by default, to capture 
> these values. Consider the following example:
> ```
> 
>     1
>   value1
>   
>     value2
>     2
>     value3
>   
> 
> ```
> In this example, ``,``, and `` are named elements with their 
> respective values enclosed within tags. There are arbitrary values value1 
> value2 value3 interspersed between the elements. Please note that there can 
> be multiple occurrences of values in a single element (i.e. there are value2, 
> value3 in the element )
>  
> We should parse the values between tags into the valueTags field. If there 
> are multiple occurrences of value tags, the value tag field will be converted 
> to an array type.
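The behavior can be sketched with the standard library; the document below uses hypothetical tag names (`row`, `a`, `b`, `c`), since the original example's markup was lost in transit:

```python
import xml.etree.ElementTree as ET

def interspersed_values(elem):
    """Collect text appearing between the child elements of `elem`."""
    if len(elem) == 0:
        return []  # no children, so no interspersed values
    values = []
    if elem.text and elem.text.strip():
        values.append(elem.text.strip())  # text before the first child
    for child in elem:
        if child.tail and child.tail.strip():
            values.append(child.tail.strip())  # text after each child
    return values

doc = ET.fromstring("<row><a>1</a>value1<b>value2<c>2</c>value3</b></row>")
```

Here `interspersed_values(doc)` yields a single value, while the nested element carries two, which is the multiple-occurrence case that would become an array-typed value-tags field.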






[jira] [Resolved] (SPARK-46382) XML: Capture values interspersed between elements

2023-12-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46382.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44318
[https://github.com/apache/spark/pull/44318]

> XML: Capture values interspersed between elements
> -
>
> Key: SPARK-46382
> URL: https://issues.apache.org/jira/browse/SPARK-46382
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> In XML, elements typically consist of a name and a value, with the value 
> enclosed between the opening and closing tags. But XML also allows including 
> arbitrary values interspersed between these elements. To address this, we 
> provide an option named `valueTags`, which is enabled by default, to capture 
> these values. Consider the following example:
> ```
> 
>     1
>   value1
>   
>     value2
>     2
>     value3
>   
> 
> ```
> In this example, ``,``, and `` are named elements with their 
> respective values enclosed within tags. There are arbitrary values value1 
> value2 value3 interspersed between the elements. Please note that there can 
> be multiple occurrences of values in a single element (i.e. there are value2, 
> value3 in the element )
>  
> We should parse the values between tags into the valueTags field. If there 
> are multiple occurrences of value tags, the value tag field will be converted 
> to an array type.






[jira] [Updated] (SPARK-44812) Push filters through intersect

2023-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44812:
---
Labels: pull-request-available  (was: )

> Push filters through intersect
> --
>
> Key: SPARK-44812
> URL: https://issues.apache.org/jira/browse/SPARK-44812
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: copperybean
>Priority: Major
>  Labels: pull-request-available
>
> For the following SQL
> {code:sql}
> select a from (select a from tl intersect select x from tr) where a > 123 
> {code}
> The physical plan is
> {code:bash}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[a#8L], functions=[])
>    +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=133]
>       +- HashAggregate(keys=[a#8L], functions=[])
>          +- BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], 
> [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
>             :- Filter (isnotnull(a#8L) AND (a#8L > 123))
>             :  +- FileScan json [a#8L] ...
>             +- BroadcastExchange 
> HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), 
> isnull(input[0, bigint, true])),false), [plan_id=129]
>                +- FileScan json [x#24L] ... {code}
> We can see that the filter {color:#ff8b00}_a > 123_{color} is not pushed down 
> to the right table.
>  
>  
> Furthermore, for the following SQL
> {code:sql}
> select a from (select a from tl intersect select x from tr) join trr on a = 
> y{code}
> The physical plan is
> {code:bash}
> *(3) Project [a#8L]
> +- *(3) BroadcastHashJoin [a#8L], [y#114L], Inner, BuildRight, false
>    :- *(3) HashAggregate(keys=[a#8L], functions=[])
>    :  +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, 
> [plan_id=506]
>    :     +- *(1) HashAggregate(keys=[a#8L], functions=[])
>    :        +- *(1) BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], 
> [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
>    :           :- *(1) Filter isnotnull(a#8L)
>    :           :  +- FileScan json [a#8L] ...
>    :           +- BroadcastExchange 
> HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), 
> isnull(input[0, bigint, true])),false), [plan_id=490]
>    :              +- FileScan json [x#24L] ...
>    +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, 
> false]),false), [plan_id=512]
>       +- *(2) Filter isnotnull(y#114L)
>          +- FileScan json [y#114L] ...{code}
> There should be a filter _{color:#ff8b00}isnotnull( x ){color}_ for table tr, 
> but it is not pushed down.
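The rewrite is sound because filtering distributes over set intersection: filter(A ∩ B) equals filter(A) ∩ filter(B). A quick check of that identity with plain Python sets (a model of the optimization, not Catalyst code):

```python
def filter_after_intersect(pred, left, right):
    # the unoptimized shape: intersect first, then filter
    return {x for x in set(left) & set(right) if pred(x)}

def filter_before_intersect(pred, left, right):
    # the pushed-down shape: filter each side first, then intersect
    return {x for x in left if pred(x)} & {x for x in right if pred(x)}
```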






[jira] [Updated] (SPARK-46539) SELECT * EXCEPT(all fields from a struct) results in an assertion failure

2023-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46539:
---
Labels: pull-request-available  (was: )

> SELECT * EXCEPT(all fields from a struct) results in an assertion failure
> -
>
> Key: SPARK-46539
> URL: https://issues.apache.org/jira/browse/SPARK-46539
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This example
> {code:sql}
> -- Removing all fields results in an empty struct
> > SELECT * EXCEPT(c1.a) FROM VALUES(named_struct('a', 1)) AS t(c1);
> {code}
> throws an AssertionError during serialization:
> {code:java}
> AssertionError: assertion failed: each serializer expression should contain 
> at least one `BoundReference`
> {code}
> instead of just returning an empty struct.






[jira] [Created] (SPARK-46539) SELECT * EXCEPT(all fields from a struct) results in an assertion failure

2023-12-28 Thread Stefan Kandic (Jira)
Stefan Kandic created SPARK-46539:
-

 Summary: SELECT * EXCEPT(all fields from a struct) results in an 
assertion failure
 Key: SPARK-46539
 URL: https://issues.apache.org/jira/browse/SPARK-46539
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Stefan Kandic
 Fix For: 4.0.0


This example
{code:sql}
-- Removing all fields results in an empty struct
> SELECT * EXCEPT(c1.a) FROM VALUES(named_struct('a', 1)) AS t(c1);
{code}
throws an AssertionError during serialization:
{code:java}
AssertionError: assertion failed: each serializer expression should contain at 
least one `BoundReference`
{code}

instead of just returning an empty struct.






[jira] [Resolved] (SPARK-46537) Convert NPE and asserts from commands to internal errors

2023-12-28 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-46537.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44525
[https://github.com/apache/spark/pull/44525]

> Convert NPE and asserts from commands to internal errors
> 
>
> Key: SPARK-46537
> URL: https://issues.apache.org/jira/browse/SPARK-46537
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Handle NPE and asserts from eagerly executed commands, and convert them to 
> internal errors.






[jira] [Commented] (SPARK-46192) failed to insert the table using the default value of union

2023-12-28 Thread Barbara Raciniewska (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801004#comment-17801004
 ] 

Barbara Raciniewska commented on SPARK-46192:
-

I am working on it

> failed to insert the table using the default value of union
> ---
>
> Key: SPARK-46192
> URL: https://issues.apache.org/jira/browse/SPARK-46192
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
>Reporter: zengxl
>Priority: Major
>
>  
> Create the following tables and data:
> {code:java}
> create table test_spark(k string default null,v int default null) stored as 
> orc;
> create table test_spark_1(k string default null,v int default null) stored as 
> orc;
> insert into table test_spark_1 values('k1',1),('k2',2),('k3',3);
> create table test_spark_2(k string default null,v int default null) stored as 
> orc; 
> insert into table test_spark_2 values('k3',3),('k4',4),('k5',5);
> {code}
> Execute the following SQL
> {code:java}
> insert into table test_spark (k) 
> select k from test_spark_1
> union
> select k from test_spark_2 
> {code}
> exception:
> {code:java}
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:25 INFO HiveSessionStateBuilder$$anon$1: here is 
> CatalogAndIdentifier
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
> i.userSpecifiedCols.size is 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: 
> i.userSpecifiedCols.size is 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: i.table.output 2 
> ,resolved :1 , i.query 1
> 23/12/01 10:44:26 INFO Analyzer$ResolveUserSpecifiedColumns: here is 
> ResolveUserSpecifiedColumns tableOutoyt: 2---nameToQueryExpr : 1
> Error in query: `default`.`test_spark` requires that the data to be inserted have the 
> same number of columns as the target table: target table has 2 column(s) but 
> the inserted data has 1 column(s), including 0 partition column(s) having 
> constant value(s). {code}
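The expected semantics can be modeled in a few lines: target columns missing from the INSERT column list should be filled from their declared defaults instead of triggering a column-count error. A sketch of that desired behavior (not Spark's resolution code):

```python
def fill_with_defaults(target_cols, defaults, specified_cols, row):
    """Map an INSERT row naming only `specified_cols` onto the full target schema."""
    by_name = dict(zip(specified_cols, row))
    # columns absent from the INSERT list fall back to their DEFAULT values
    return tuple(by_name.get(col, defaults.get(col)) for col in target_cols)
```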
>  






[jira] [Updated] (SPARK-46538) Fix the ambiguous column reference issue in ALSModel.transform

2023-12-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-46538:
--
Summary: Fix the ambiguous column reference issue in ALSModel.transform  
(was: Fix an ambiguous column reference issue in ALSModel.transform)

> Fix the ambiguous column reference issue in ALSModel.transform
> --
>
> Key: SPARK-46538
> URL: https://issues.apache.org/jira/browse/SPARK-46538
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46538) Fix an ambiguous column reference issue in ALSModel.transform

2023-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46538:
---
Labels: pull-request-available  (was: )

> Fix an ambiguous column reference issue in ALSModel.transform
> -
>
> Key: SPARK-46538
> URL: https://issues.apache.org/jira/browse/SPARK-46538
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46538) Fix an ambiguous column reference issue in ALSModel.transform

2023-12-28 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46538:
-

 Summary: Fix an ambiguous column reference issue in 
ALSModel.transform
 Key: SPARK-46538
 URL: https://issues.apache.org/jira/browse/SPARK-46538
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-46537) Convert NPE and asserts from commands to internal errors

2023-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46537:
---
Labels: pull-request-available  (was: )

> Convert NPE and asserts from commands to internal errors
> 
>
> Key: SPARK-46537
> URL: https://issues.apache.org/jira/browse/SPARK-46537
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> Handle NPE and asserts from eagerly executed commands, and convert them to 
> internal errors.






[jira] [Commented] (SPARK-46537) Convert NPE and asserts from commands to internal errors

2023-12-28 Thread Nikita Awasthi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800970#comment-17800970
 ] 

Nikita Awasthi commented on SPARK-46537:


User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/44525

> Convert NPE and asserts from commands to internal errors
> 
>
> Key: SPARK-46537
> URL: https://issues.apache.org/jira/browse/SPARK-46537
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Handle NPE and asserts from eagerly executed commands, and convert them to 
> internal errors.






[jira] [Created] (SPARK-46537) Convert NPE and asserts from commands to internal errors

2023-12-28 Thread Max Gekk (Jira)
Max Gekk created SPARK-46537:


 Summary: Convert NPE and asserts from commands to internal errors
 Key: SPARK-46537
 URL: https://issues.apache.org/jira/browse/SPARK-46537
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk


Handle NPE and asserts from eagerly executed commands, and convert them to 
internal errors.
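
The conversion described above can be sketched as follows. This is a minimal, hypothetical illustration: `SparkInternalError` and `withInternalErrorHandling` are made-up names, and Spark's real error framework (error classes such as `INTERNAL_ERROR`) differs in detail.

```java
import java.util.function.Supplier;

public final class InternalErrors {
    // Hypothetical internal-error wrapper; stands in for Spark's
    // internal-error exception type.
    public static final class SparkInternalError extends RuntimeException {
        public SparkInternalError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Run an eagerly executed command body, converting NPEs and assertion
    // failures into the internal-error type instead of leaking them.
    public static <T> T withInternalErrorHandling(Supplier<T> command) {
        try {
            return command.get();
        } catch (NullPointerException | AssertionError e) {
            throw new SparkInternalError("[INTERNAL_ERROR] command failed: " + e, e);
        }
    }

    public static void main(String[] args) {
        try {
            withInternalErrorHandling(() -> {
                throw new NullPointerException("missing plan node");
            });
        } catch (SparkInternalError e) {
            // The original failure is preserved as the cause.
            System.out.println(e.getCause() instanceof NullPointerException); // prints true
        }
    }
}
```

Other exceptions are deliberately left untouched, so user-facing errors still propagate with their own error classes.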






[jira] [Created] (SPARK-46536) Support GROUP BY calendar_interval_type

2023-12-28 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-46536:
---

 Summary: Support GROUP BY calendar_interval_type
 Key: SPARK-46536
 URL: https://issues.apache.org/jira/browse/SPARK-46536
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan


Currently, Spark GROUP BY only allows orderable data types; otherwise the plan 
analysis fails: 
[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala#L197-L203]

However, this is too strict: GROUP BY only cares about equality, not ordering. 
The CalendarInterval type is not orderable (given 1 month and 30 days, we 
cannot say which is larger), but it has well-defined equality. In fact, we 
already support `SELECT DISTINCT calendar_interval_type` in some cases (when 
the planner picks hash aggregate).

The proposal here is to officially support the calendar interval type in GROUP 
BY. We should relax the check inside `CheckAnalysis`, make `CalendarInterval` 
implement `Comparable` using natural ordering (compare months first, then 
days, then seconds), and test with both hash aggregate and sort aggregate.
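
The natural ordering proposed above can be sketched as a lexicographic comparison. The field names below mirror Spark's `CalendarInterval` (months, days, microseconds), but this standalone class is only an illustration, not Spark code; the issue text says "seconds" while `CalendarInterval` stores its sub-day component as microseconds.

```java
import java.util.Comparator;

public final class IntervalOrdering {
    // Minimal stand-in for org.apache.spark.unsafe.types.CalendarInterval.
    public static final class Interval {
        public final int months;
        public final int days;
        public final long microseconds;

        public Interval(int months, int days, long microseconds) {
            this.months = months;
            this.days = days;
            this.microseconds = microseconds;
        }
    }

    // Lexicographic "natural" ordering: months first, then days, then
    // the sub-day time component.
    public static final Comparator<Interval> NATURAL_ORDER =
        Comparator.<Interval>comparingInt(i -> i.months)
                  .thenComparingInt(i -> i.days)
                  .thenComparingLong(i -> i.microseconds);

    public static void main(String[] args) {
        Interval oneMonth = new Interval(1, 0, 0L);
        Interval thirtyDays = new Interval(0, 30, 0L);
        // Under this ordering, 30 days sorts before 1 month, even though
        // neither is semantically "larger" on a calendar.
        System.out.println(NATURAL_ORDER.compare(thirtyDays, oneMonth) < 0); // prints true
    }
}
```

The ordering is arbitrary with respect to calendar semantics, but it is total and consistent with equality, which is all sort aggregate needs.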






[jira] [Created] (SPARK-46535) NPE when describe extended a column without col stats

2023-12-28 Thread zouxxyy (Jira)
zouxxyy created SPARK-46535:
---

 Summary: NPE when describe extended a column without col stats
 Key: SPARK-46535
 URL: https://issues.apache.org/jira/browse/SPARK-46535
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: zouxxyy









[jira] [Updated] (SPARK-39859) Support v2 `DESCRIBE TABLE EXTENDED` for columns

2023-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-39859:
---
Labels: pull-request-available  (was: )

> Support v2 `DESCRIBE TABLE EXTENDED` for columns
> 
>
> Key: SPARK-39859
> URL: https://issues.apache.org/jira/browse/SPARK-39859
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Huaxin Gao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-46532) Pass message parameters in metadata of ErrorInfo

2023-12-28 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-46532.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44468
[https://github.com/apache/spark/pull/44468]

> Pass message parameters in metadata of ErrorInfo
> 
>
> Key: SPARK-46532
> URL: https://issues.apache.org/jira/browse/SPARK-46532
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Put message parameters together with the error class in the `messageParameter` 
> field in the metadata of `ErrorInfo`. Currently, it is not possible to 
> reconstruct an error from the error class alone.
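
The shape of the metadata change can be sketched as below. This is a hypothetical illustration: the key names and the JSON serialization are assumptions, not the actual Spark Connect `ErrorInfo` wire format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class ErrorInfoMetadata {
    // Build an ErrorInfo-style metadata map that carries both the error
    // class and its message parameters, so the client side can re-render
    // the parameterized error message.
    public static Map<String, String> build(String errorClass,
                                            Map<String, String> messageParameters) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("errorClass", errorClass);
        // Serialize the parameters as a small JSON object (illustrative;
        // a real implementation would use a proper JSON library).
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : messageParameters.entrySet()) {
            if (!first) json.append(",");
            json.append("\"").append(e.getKey()).append("\":\"")
                .append(e.getValue()).append("\"");
            first = false;
        }
        json.append("}");
        metadata.put("messageParameters", json.toString());
        return metadata;
    }
}
```

With only `errorClass` present, a client cannot fill the message template's placeholders; shipping `messageParameters` alongside it closes that gap.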






[jira] [Resolved] (SPARK-46519) Clear unused error classes from error-classes.json file

2023-12-28 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-46519.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44503
[https://github.com/apache/spark/pull/44503]

> Clear unused error classes from error-classes.json file
> ---
>
> Key: SPARK-46519
> URL: https://issues.apache.org/jira/browse/SPARK-46519
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46519) Clear unused error classes from error-classes.json file

2023-12-28 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-46519:


Assignee: BingKun Pan

> Clear unused error classes from error-classes.json file
> ---
>
> Key: SPARK-46519
> URL: https://issues.apache.org/jira/browse/SPARK-46519
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>



