[jira] [Resolved] (SPARK-43719) Handle missing row.excludedInStages field
[ https://issues.apache.org/jira/browse/SPARK-43719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43719. --- Fix Version/s: 3.3.3 3.5.0 3.4.1 Resolution: Fixed Issue resolved by pull request 41266 [https://github.com/apache/spark/pull/41266] > Handle missing row.excludedInStages field > - > > Key: SPARK-43719 > URL: https://issues.apache.org/jira/browse/SPARK-43719 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.3.3, 3.5.0, 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43719) Handle missing row.excludedInStages field
[ https://issues.apache.org/jira/browse/SPARK-43719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43719: - Assignee: Dongjoon Hyun > Handle missing row.excludedInStages field > - > > Key: SPARK-43719 > URL: https://issues.apache.org/jira/browse/SPARK-43719 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor
[jira] [Updated] (SPARK-43719) Handle missing row.excludedInStages field
[ https://issues.apache.org/jira/browse/SPARK-43719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-43719: -- Affects Version/s: 3.3.2 > Handle missing row.excludedInStages field > - > > Key: SPARK-43719 > URL: https://issues.apache.org/jira/browse/SPARK-43719 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Dongjoon Hyun >Priority: Minor
[jira] [Resolved] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
[ https://issues.apache.org/jira/browse/SPARK-43718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43718. --- Fix Version/s: 3.3.3 3.5.0 3.4.1 Resolution: Fixed Issue resolved by pull request 41267 [https://github.com/apache/spark/pull/41267] > References to a specific side's key in a USING join can have wrong nullability > -- > > Key: SPARK-43718 > URL: https://issues.apache.org/jira/browse/SPARK-43718 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > Fix For: 3.3.3, 3.5.0, 3.4.1 > > > Assume this data: > {noformat} > create or replace temp view t1 as values (1), (2), (3) as (c1); > create or replace temp view t2 as values (2), (3), (4) as (c1); > {noformat} > The following query produces incorrect results: > {noformat} > spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 > from t1 > full outer join t2 > using (c1); > 1 > -1 <== should be null > 2 > 2 > 3 > 3 > -1 <== should be null > 4 > Time taken: 0.663 seconds, Fetched 8 row(s) > spark-sql (default)> > {noformat} > Similar issues occur with right outer join and left outer join. > {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is > resolved, so the array's {{containsNull}} value is incorrect. > Queries that don't use arrays also can get wrong results. 
Assume this data: > {noformat} > create or replace temp view t1 as values (0), (1), (2) as (c1); > create or replace temp view t2 as values (1), (2), (3) as (c1); > create or replace temp view t3 as values (1, 2), (3, 4), (4, 5) as (a, b); > {noformat} > The following query produces incorrect results: > {noformat} > select t1.c1 as t1_c1, t2.c1 as t2_c1, b > from t1 > full outer join t2 > using (c1), > lateral ( > select b > from t3 > where a = coalesce(t2.c1, 1) > ) lt3; > 1 1 2 > NULL 3 4 > Time taken: 2.395 seconds, Fetched 2 row(s) > spark-sql (default)> > {noformat} > The result should be the following: > {noformat} > 0 NULL 2 > 1 1 2 > NULL 3 4 > {noformat}
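The nullability bug above can be illustrated with a small pure-Python sketch (an illustration of the semantics, not Spark's implementation; the function name `full_outer_join` is made up for this example). In a FULL OUTER ... USING join, rows that exist on only one side leave the other side's key null, so any expression over `t1.c1` or `t2.c1` — such as the array in the report — must be treated as nullable (`containsNull = true`):

```python
# Hypothetical sketch of FULL OUTER JOIN ... USING (c1) semantics over key
# lists; the unmatched side of a row becomes None, just as t1.c1 / t2.c1
# become NULL in the Spark query above.
def full_outer_join(left, right):
    """Full outer join of two key lists; a missing side is None."""
    keys = sorted(set(left) | set(right))
    return [(k if k in left else None, k if k in right else None) for k in keys]

t1 = [1, 2, 3]
t2 = [2, 3, 4]
rows = full_outer_join(t1, t2)
# rows == [(1, None), (2, 2), (3, 3), (None, 4)]

# An array built from (t1.c1, t2.c1) therefore contains nulls; marking its
# containsNull flag false is what produced the bogus -1 values in the report.
arrays = [[a, b] for a, b in rows]
```

The fix in SPARK-43718 corrects the nullability of the side-specific key references so that downstream expressions like `array(t1.c1, t2.c1)` resolve with the correct `containsNull`.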
[jira] [Assigned] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
[ https://issues.apache.org/jira/browse/SPARK-43718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43718: - Assignee: Bruce Robbins > References to a specific side's key in a USING join can have wrong nullability > -- > > Key: SPARK-43718 > URL: https://issues.apache.org/jira/browse/SPARK-43718 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > > Assume this data: > {noformat} > create or replace temp view t1 as values (1), (2), (3) as (c1); > create or replace temp view t2 as values (2), (3), (4) as (c1); > {noformat} > The following query produces incorrect results: > {noformat} > spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 > from t1 > full outer join t2 > using (c1); > 1 > -1 <== should be null > 2 > 2 > 3 > 3 > -1 <== should be null > 4 > Time taken: 0.663 seconds, Fetched 8 row(s) > spark-sql (default)> > {noformat} > Similar issues occur with right outer join and left outer join. > {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is > resolved, so the array's {{containsNull}} value is incorrect. > Queries that don't use arrays also can get wrong results. 
Assume this data: > {noformat} > create or replace temp view t1 as values (0), (1), (2) as (c1); > create or replace temp view t2 as values (1), (2), (3) as (c1); > create or replace temp view t3 as values (1, 2), (3, 4), (4, 5) as (a, b); > {noformat} > The following query produces incorrect results: > {noformat} > select t1.c1 as t1_c1, t2.c1 as t2_c1, b > from t1 > full outer join t2 > using (c1), > lateral ( > select b > from t3 > where a = coalesce(t2.c1, 1) > ) lt3; > 1 1 2 > NULL 3 4 > Time taken: 2.395 seconds, Fetched 2 row(s) > spark-sql (default)> > {noformat} > The result should be the following: > {noformat} > 0 NULL 2 > 1 1 2 > NULL 3 4 > {noformat}
[jira] [Resolved] (SPARK-43590) Make `CheckConnectJvmClientCompatibility` to compare client and protobuf
[ https://issues.apache.org/jira/browse/SPARK-43590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-43590. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41235 [https://github.com/apache/spark/pull/41235] > Make `CheckConnectJvmClientCompatibility` to compare client and protobuf > - > > Key: SPARK-43590 > URL: https://issues.apache.org/jira/browse/SPARK-43590 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0
[jira] [Assigned] (SPARK-43590) Make `CheckConnectJvmClientCompatibility` to compare client and protobuf
[ https://issues.apache.org/jira/browse/SPARK-43590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-43590: Assignee: Yang Jie > Make `CheckConnectJvmClientCompatibility` to compare client and protobuf > - > > Key: SPARK-43590 > URL: https://issues.apache.org/jira/browse/SPARK-43590 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major
[jira] [Commented] (SPARK-43739) Upgrade commons-io to 2.12.0
[ https://issues.apache.org/jira/browse/SPARK-43739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725200#comment-17725200 ] Snoot.io commented on SPARK-43739: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/41271 > Upgrade commons-io to 2.12.0 > > > Key: SPARK-43739 > URL: https://issues.apache.org/jira/browse/SPARK-43739 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor
[jira] [Commented] (SPARK-43603) Rebalance pyspark.pandas.DataFrame Unit Tests
[ https://issues.apache.org/jira/browse/SPARK-43603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725198#comment-17725198 ] Snoot.io commented on SPARK-43603: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/41258 > Rebalance pyspark.pandas.DataFrame Unit Tests > - > > Key: SPARK-43603 > URL: https://issues.apache.org/jira/browse/SPARK-43603 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Minor
[jira] [Commented] (SPARK-43384) Make `df.show` print a nice string for MapType
[ https://issues.apache.org/jira/browse/SPARK-43384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725195#comment-17725195 ] Snoot.io commented on SPARK-43384: -- User 'Yikf' has created a pull request for this issue: https://github.com/apache/spark/pull/41065 > Make `df.show` print a nice string for MapType > -- > > Key: SPARK-43384 > URL: https://issues.apache.org/jira/browse/SPARK-43384 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: yikaifei >Priority: Minor > > Make `df.show` print a nice string for MapType.
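The kind of rendering SPARK-43384 asks for can be sketched in a few lines of pure Python (an assumed illustration of the `{key -> value}` style Spark uses for map values elsewhere, not the actual `df.show` implementation; `format_map` is a hypothetical name):

```python
# Hypothetical formatter rendering a map as "{key -> value, ...}",
# the readable style the issue proposes for MapType columns in df.show.
def format_map(m):
    return "{" + ", ".join(f"{k} -> {v}" for k, v in m.items()) + "}"

print(format_map({1: "a", 2: "b"}))  # {1 -> a, 2 -> b}
```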
[jira] [Commented] (SPARK-43719) Handle missing row.excludedInStages field
[ https://issues.apache.org/jira/browse/SPARK-43719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725193#comment-17725193 ] Snoot.io commented on SPARK-43719: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/41266 > Handle missing row.excludedInStages field > - > > Key: SPARK-43719 > URL: https://issues.apache.org/jira/browse/SPARK-43719 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Minor
[jira] [Commented] (SPARK-43738) Upgrade dropwizard metrics 4.2.18
[ https://issues.apache.org/jira/browse/SPARK-43738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725191#comment-17725191 ] Snoot.io commented on SPARK-43738: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/41270 > Upgrade dropwizard metrics 4.2.18 > - > > Key: SPARK-43738 > URL: https://issues.apache.org/jira/browse/SPARK-43738 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor
[jira] [Created] (SPARK-43740) Hide unsupported session methods from auto-completion
Ruifeng Zheng created SPARK-43740: - Summary: Hide unsupported session methods from auto-completion Key: SPARK-43740 URL: https://issues.apache.org/jira/browse/SPARK-43740 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.5.0 Reporter: Ruifeng Zheng
[jira] [Commented] (SPARK-43586) There will be many invalid tasks when `Range.numSlices` > `Range.numElements`
[ https://issues.apache.org/jira/browse/SPARK-43586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725185#comment-17725185 ] Snoot.io commented on SPARK-43586: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/41230 > There will be many invalid tasks when `Range.numSlices` > `Range.numElements` > - > > Key: SPARK-43586 > URL: https://issues.apache.org/jira/browse/SPARK-43586 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > Attachments: image-2023-05-19-13-01-19-589.png > > > For example, start a spark shell with `--master "local[100]"`, then run > `spark.range(10).map(_ + 1).reduce(_ + _)`, there will be 100 tasks in the > job, although there are only 10 elements in the Range: > !image-2023-05-19-13-01-19-589.png|width=733,height=203!
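Why 100 tasks appear for only 10 elements can be seen with a small pure-Python sketch of contiguous range slicing (an approximation of how `spark.range` partitions elements, not Spark's actual code; `range_slices` is a made-up name):

```python
# Split [0, num_elements) into num_slices contiguous slices. When
# num_slices > num_elements, many slices are empty (start == end),
# yet each slice still becomes a task.
def range_slices(num_elements, num_slices):
    slices = []
    for i in range(num_slices):
        start = i * num_elements // num_slices
        end = (i + 1) * num_elements // num_slices
        slices.append((start, end))
    return slices

slices = range_slices(10, 100)
empty = sum(1 for s, e in slices if s == e)
# With 10 elements and 100 slices, 90 slices cover no elements:
# 90 of the 100 tasks do no useful work, matching the issue report.
```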
[jira] [Commented] (SPARK-42958) Refactor `CheckConnectJvmClientCompatibility` to compare client and avro
[ https://issues.apache.org/jira/browse/SPARK-42958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725184#comment-17725184 ] Snoot.io commented on SPARK-42958: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/41233 > Refactor `CheckConnectJvmClientCompatibility` to compare client and avro > > > Key: SPARK-42958 > URL: https://issues.apache.org/jira/browse/SPARK-42958 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0
[jira] [Commented] (SPARK-43590) Make `CheckConnectJvmClientCompatibility` to compare client and protobuf
[ https://issues.apache.org/jira/browse/SPARK-43590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725183#comment-17725183 ] Snoot.io commented on SPARK-43590: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/41235 > Make `CheckConnectJvmClientCompatibility` to compare client and protobuf > - > > Key: SPARK-43590 > URL: https://issues.apache.org/jira/browse/SPARK-43590 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major
[jira] [Commented] (SPARK-43612) Python: Artifact transfer from Scala/JVM client to Server
[ https://issues.apache.org/jira/browse/SPARK-43612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725182#comment-17725182 ] Snoot.io commented on SPARK-43612: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/41250 > Python: Artifact transfer from Scala/JVM client to Server > - > > Key: SPARK-43612 > URL: https://issues.apache.org/jira/browse/SPARK-43612 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.5.0 > > > Should implement https://issues.apache.org/jira/browse/SPARK-42653 in Python > Spark Connect client.
[jira] [Commented] (SPARK-43625) Document the difference between `Drop(column)` and `Drop(columnName)`
[ https://issues.apache.org/jira/browse/SPARK-43625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725181#comment-17725181 ] Snoot.io commented on SPARK-43625: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/41273 > Document the difference between `Drop(column)` and `Drop(columnName)` > - > > Key: SPARK-43625 > URL: https://issues.apache.org/jira/browse/SPARK-43625 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major
[jira] [Resolved] (SPARK-43612) Python: Artifact transfer from Scala/JVM client to Server
[ https://issues.apache.org/jira/browse/SPARK-43612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43612. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41250 [https://github.com/apache/spark/pull/41250] > Python: Artifact transfer from Scala/JVM client to Server > - > > Key: SPARK-43612 > URL: https://issues.apache.org/jira/browse/SPARK-43612 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.5.0 > > > Should implement https://issues.apache.org/jira/browse/SPARK-42653 in Python > Spark Connect client.
[jira] [Assigned] (SPARK-43612) Python: Artifact transfer from Scala/JVM client to Server
[ https://issues.apache.org/jira/browse/SPARK-43612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43612: Assignee: Hyukjin Kwon > Python: Artifact transfer from Scala/JVM client to Server > - > > Key: SPARK-43612 > URL: https://issues.apache.org/jira/browse/SPARK-43612 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Should implement https://issues.apache.org/jira/browse/SPARK-42653 in Python > Spark Connect client.
[jira] [Commented] (SPARK-43540) Add working directory into classpath on the driver in K8S cluster mode
[ https://issues.apache.org/jira/browse/SPARK-43540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725178#comment-17725178 ] Snoot.io commented on SPARK-43540: -- User 'turboFei' has created a pull request for this issue: https://github.com/apache/spark/pull/41201 > Add working directory into classpath on the driver in K8S cluster mode > -- > > Key: SPARK-43540 > URL: https://issues.apache.org/jira/browse/SPARK-43540 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Fei Wang >Priority: Major > > In YARN cluster mode, the passed files/jars are accessible to the > classloader, but this appears not to be the case in Kubernetes cluster mode. > After SPARK-33782, spark.files and spark.jars are placed under > the current working directory on the driver in K8S cluster mode, but they > do not seem to be accessible to the classloader. > > We need to add the current working directory to the classpath.
[jira] [Commented] (SPARK-43625) Document the difference between `Drop(column)` and `Drop(columnName)`
[ https://issues.apache.org/jira/browse/SPARK-43625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725179#comment-17725179 ] Snoot.io commented on SPARK-43625: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/41273 > Document the difference between `Drop(column)` and `Drop(columnName)` > - > > Key: SPARK-43625 > URL: https://issues.apache.org/jira/browse/SPARK-43625 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major
[jira] [Commented] (SPARK-43737) Upgrade zstd-jni to 1.5.5-3
[ https://issues.apache.org/jira/browse/SPARK-43737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725177#comment-17725177 ] Snoot.io commented on SPARK-43737: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/41269 > Upgrade zstd-jni to 1.5.5-3 > --- > > Key: SPARK-43737 > URL: https://issues.apache.org/jira/browse/SPARK-43737 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor
[jira] [Updated] (SPARK-43625) Document the difference between `Drop(column)` and `Drop(columnName)`
[ https://issues.apache.org/jira/browse/SPARK-43625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-43625: -- Component/s: (was: SQL) Documentation > Document the difference between `Drop(column)` and `Drop(columnName)` > - > > Key: SPARK-43625 > URL: https://issues.apache.org/jira/browse/SPARK-43625 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major
[jira] [Commented] (SPARK-43651) Assign a name to the error class _LEGACY_ERROR_TEMP_2403
[ https://issues.apache.org/jira/browse/SPARK-43651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725168#comment-17725168 ] Snoot.io commented on SPARK-43651: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/41252 > Assign a name to the error class _LEGACY_ERROR_TEMP_2403 > > > Key: SPARK-43651 > URL: https://issues.apache.org/jira/browse/SPARK-43651 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major
[jira] [Commented] (SPARK-43651) Assign a name to the error class _LEGACY_ERROR_TEMP_2403
[ https://issues.apache.org/jira/browse/SPARK-43651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725167#comment-17725167 ] Snoot.io commented on SPARK-43651: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/41252 > Assign a name to the error class _LEGACY_ERROR_TEMP_2403 > > > Key: SPARK-43651 > URL: https://issues.apache.org/jira/browse/SPARK-43651 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major
[jira] [Commented] (SPARK-43649) Assign a name to the error class _LEGACY_ERROR_TEMP_2401
[ https://issues.apache.org/jira/browse/SPARK-43649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725166#comment-17725166 ] Snoot.io commented on SPARK-43649: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/41252 > Assign a name to the error class _LEGACY_ERROR_TEMP_2401 > > > Key: SPARK-43649 > URL: https://issues.apache.org/jira/browse/SPARK-43649 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major
[jira] [Commented] (SPARK-43650) Assign a name to the error class _LEGACY_ERROR_TEMP_2402
[ https://issues.apache.org/jira/browse/SPARK-43650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725165#comment-17725165 ] Snoot.io commented on SPARK-43650: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/41252 > Assign a name to the error class _LEGACY_ERROR_TEMP_2402 > > > Key: SPARK-43650 > URL: https://issues.apache.org/jira/browse/SPARK-43650 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major
[jira] [Commented] (SPARK-41532) DF operations that involve multiple data frames should fail if sessions don't match
[ https://issues.apache.org/jira/browse/SPARK-41532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725164#comment-17725164 ] Snoot.io commented on SPARK-41532: -- User 'Hisoka-X' has created a pull request for this issue: https://github.com/apache/spark/pull/41259 > DF operations that involve multiple data frames should fail if sessions don't > match > --- > > Key: SPARK-41532 > URL: https://issues.apache.org/jira/browse/SPARK-41532 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Jia Fan >Priority: Major > Fix For: 3.5.0 > > > We do not support joining for example two data frames from different Spark > Connect Sessions. To avoid exceptions, the client should clearly fail when it > tries to construct such a composition.
[jira] [Commented] (SPARK-43649) Assign a name to the error class _LEGACY_ERROR_TEMP_2401
[ https://issues.apache.org/jira/browse/SPARK-43649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725163#comment-17725163 ] Snoot.io commented on SPARK-43649: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/41252 > Assign a name to the error class _LEGACY_ERROR_TEMP_2401 > > > Key: SPARK-43649 > URL: https://issues.apache.org/jira/browse/SPARK-43649 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major
[jira] [Commented] (SPARK-43604) Refactor `INVALID_SQL_SYNTAX` for avoiding to embed error's text in source code
[ https://issues.apache.org/jira/browse/SPARK-43604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725162#comment-17725162 ] Snoot.io commented on SPARK-43604: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/41254 > Refactor `INVALID_SQL_SYNTAX` for avoiding to embed error's text in source > code > --- > > Key: SPARK-43604 > URL: https://issues.apache.org/jira/browse/SPARK-43604 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > > As discussed in PR(https://github.com/apache/spark/pull/41214), embedding error > text in source code is unfriendly, for example it hinders the internationalization > of error messages.
[jira] [Updated] (SPARK-43739) Upgrade commons-io to 2.12.0
[ https://issues.apache.org/jira/browse/SPARK-43739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-43739: Summary: Upgrade commons-io to 2.12.0 (was: Upgrade commons-io & commons-crypto to newest version) > Upgrade commons-io to 2.12.0 > > > Key: SPARK-43739 > URL: https://issues.apache.org/jira/browse/SPARK-43739 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor
[jira] [Updated] (SPARK-43739) Upgrade commons-io & commons-crypto to newest version
[ https://issues.apache.org/jira/browse/SPARK-43739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-43739: Summary: Upgrade commons-io & commons-crypto to newest version (was: Upgrade 'commons-io' & 'commons-crypto' to newest version) > Upgrade commons-io & commons-crypto to newest version > - > > Key: SPARK-43739 > URL: https://issues.apache.org/jira/browse/SPARK-43739 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor
[jira] [Updated] (SPARK-43739) Upgrade 'commons-io' & 'commons-crypto' to newest version
[ https://issues.apache.org/jira/browse/SPARK-43739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-43739: Summary: Upgrade 'commons-io' & 'commons-crypto' to newest version (was: Upgrade `commons-io` & `commons-crypto` to newest version) > Upgrade 'commons-io' & 'commons-crypto' to newest version > - > > Key: SPARK-43739 > URL: https://issues.apache.org/jira/browse/SPARK-43739 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor
[jira] [Created] (SPARK-43739) Upgrade `commons-io` & `commons-crypto` to newest version
BingKun Pan created SPARK-43739: --- Summary: Upgrade `commons-io` & `commons-crypto` to newest version Key: SPARK-43739 URL: https://issues.apache.org/jira/browse/SPARK-43739 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43738) Upgrade dropwizard metrics 4.2.18
BingKun Pan created SPARK-43738: --- Summary: Upgrade dropwizard metrics 4.2.18 Key: SPARK-43738 URL: https://issues.apache.org/jira/browse/SPARK-43738 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42996) Assign JIRA tickets and add comments for all failing tests.
[ https://issues.apache.org/jira/browse/SPARK-42996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42996. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41255 [https://github.com/apache/spark/pull/41255] > Assign JIRA tickets and add comments for all failing tests. > --- > > Key: SPARK-42996 > URL: https://issues.apache.org/jira/browse/SPARK-42996 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > > Adding details to parity tests instead of just "Fails in Spark Connect, > should enable". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42996) Assign JIRA tickets and add comments for all failing tests.
[ https://issues.apache.org/jira/browse/SPARK-42996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42996: - Assignee: Haejoon Lee > Assign JIRA tickets and add comments for all failing tests. > --- > > Key: SPARK-42996 > URL: https://issues.apache.org/jira/browse/SPARK-42996 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Adding details to parity tests instead of just "Fails in Spark Connect, > should enable". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43737) Upgrade zstd-jni to 1.5.5-3
BingKun Pan created SPARK-43737: --- Summary: Upgrade zstd-jni to 1.5.5-3 Key: SPARK-43737 URL: https://issues.apache.org/jira/browse/SPARK-43737 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43514) Unexpected NullPointerException or IllegalArgumentException inside UDFs of ML features caused by certain SQL functions
[ https://issues.apache.org/jira/browse/SPARK-43514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725144#comment-17725144 ] Ritika Maheshwari commented on SPARK-43514: --- Unable to reproduce the error on 3.4.0 See the attachment. !Screen Shot 2023-05-22 at 5.39.55 PM.png! > Unexpected NullPointerException or IllegalArgumentException inside UDFs of ML > features caused by certain SQL functions > -- > > Key: SPARK-43514 > URL: https://issues.apache.org/jira/browse/SPARK-43514 > Project: Spark > Issue Type: Bug > Components: ML, SQL >Affects Versions: 3.3.2, 3.4.0 > Environment: Scala version: 2.12.17 > Test examples were executed inside Zeppelin 0.10.1 with Spark 3.4.0. > Spark 3.3.2 deployed on cluster was used to check the issue on real data. >Reporter: Svyatoslav Semenyuk >Priority: Major > Labels: ml, sql > Attachments: Screen Shot 2023-05-22 at 5.39.55 PM.png > > > We designed a function that joins two DFs on common column with some > similarity. All next code will be on Scala 2.12. > I've added {{show}} calls for demonstration purposes. > {code:scala} > import org.apache.spark.ml.Pipeline > import org.apache.spark.ml.feature.{HashingTF, MinHashLSH, NGram, > RegexTokenizer, MinHashLSHModel} > import org.apache.spark.sql.{DataFrame, Column} > /** > * Joins two data frames on a string column using LSH algorithm > * for similarity computation. > * > * If input data frames have columns with identical names, > * the resulting dataframe will have columns from them both > * with prefixes `datasetA` and `datasetB` respectively. > * > * For example, if both dataframes have a column with name `myColumn`, > * then the result will have columns `datasetAMyColumn` and > `datasetBMyColumn`. 
> */ > def similarityJoin( > df: DataFrame, > anotherDf: DataFrame, > joinExpr: String, > threshold: Double = 0.8, > ): DataFrame = { > df.show(false) > anotherDf.show(false) > val pipeline = new Pipeline().setStages(Array( > new RegexTokenizer() > .setPattern("") > .setMinTokenLength(1) > .setInputCol(joinExpr) > .setOutputCol("tokens"), > new NGram().setN(3).setInputCol("tokens").setOutputCol("ngrams"), > new HashingTF().setInputCol("ngrams").setOutputCol("vectors"), > new MinHashLSH().setInputCol("vectors").setOutputCol("lsh"), > ) > ) > val model = pipeline.fit(df) > val storedHashed = model.transform(df) > val landedHashed = model.transform(anotherDf) > val commonColumns = df.columns.toSet & anotherDf.columns.toSet > /** > * Converts column name from a data frame to the column of resulting > dataset. > */ > def convertColumn(datasetName: String)(columnName: String): Column = { > val newName = > if (commonColumns.contains(columnName)) > s"$datasetName${columnName.capitalize}" > else columnName > col(s"$datasetName.$columnName") as newName > } > val columnsToSelect = df.columns.map(convertColumn("datasetA")) ++ > anotherDf.columns.map(convertColumn("datasetB")) > val result = model > .stages > .last > .asInstanceOf[MinHashLSHModel] > .approxSimilarityJoin(storedHashed, landedHashed, threshold, > "confidence") > .select(columnsToSelect.toSeq: _*) > result.show(false) > result > } > {code} > Now consider such simple example: > {code:scala} > val inputDF1 = Seq("", null).toDF("name").filter(length($"name") > 2) as "df1" > val inputDF2 = Seq("", null).toDF("name").filter(length($"name") > 2) as "df2" > similarityJoin(inputDF1, inputDF2, "name", 0.6) > {code} > This example runs with no errors and outputs 3 empty DFs. 
Let's add > {{distinct}} method to one data frame: > {code:scala} > val inputDF1 = Seq("", null).toDF("name").distinct().filter(length($"name") > > 2) as "df1" > val inputDF2 = Seq("", null).toDF("name").filter(length($"name") > 2) as "df2" > similarityJoin(inputDF1, inputDF2, "name", 0.6) > {code} > This example outputs two empty DFs and then fails at {{result.show(false)}}. > Error: > {code:none} > org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user > defined function (LSHModel$$Lambda$3769/0x000101804840: > (struct<type:tinyint,size:int,indices:array<int>,values:array<double>>) => > array<struct<type:tinyint,size:int,indices:array<int>,values:array<double>>>). > ... many elided > Caused by: java.lang.IllegalArgumentException: requirement failed: Must have > at least 1 non zero entry. > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.ml.feature.MinHashLSHModel.hashFunction(MinHashLSH.scala:61) > at org.apache.spark.ml.feature.LSHModel.$anonfun$transform$1(LSH.scala:99) > ... many more > {code}
[jira] [Updated] (SPARK-43514) Unexpected NullPointerException or IllegalArgumentException inside UDFs of ML features caused by certain SQL functions
[ https://issues.apache.org/jira/browse/SPARK-43514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ritika Maheshwari updated SPARK-43514: -- Attachment: Screen Shot 2023-05-22 at 5.39.55 PM.png > Unexpected NullPointerException or IllegalArgumentException inside UDFs of ML > features caused by certain SQL functions > -- > > Key: SPARK-43514 > URL: https://issues.apache.org/jira/browse/SPARK-43514 > Project: Spark > Issue Type: Bug > Components: ML, SQL >Affects Versions: 3.3.2, 3.4.0 > Environment: Scala version: 2.12.17 > Test examples were executed inside Zeppelin 0.10.1 with Spark 3.4.0. > Spark 3.3.2 deployed on cluster was used to check the issue on real data. >Reporter: Svyatoslav Semenyuk >Priority: Major > Labels: ml, sql > Attachments: Screen Shot 2023-05-22 at 5.39.55 PM.png > > > We designed a function that joins two DFs on common column with some > similarity. All next code will be on Scala 2.12. > I've added {{show}} calls for demonstration purposes. > {code:scala} > import org.apache.spark.ml.Pipeline > import org.apache.spark.ml.feature.{HashingTF, MinHashLSH, NGram, > RegexTokenizer, MinHashLSHModel} > import org.apache.spark.sql.{DataFrame, Column} > /** > * Joins two data frames on a string column using LSH algorithm > * for similarity computation. > * > * If input data frames have columns with identical names, > * the resulting dataframe will have columns from them both > * with prefixes `datasetA` and `datasetB` respectively. > * > * For example, if both dataframes have a column with name `myColumn`, > * then the result will have columns `datasetAMyColumn` and > `datasetBMyColumn`. 
> */ > def similarityJoin( > df: DataFrame, > anotherDf: DataFrame, > joinExpr: String, > threshold: Double = 0.8, > ): DataFrame = { > df.show(false) > anotherDf.show(false) > val pipeline = new Pipeline().setStages(Array( > new RegexTokenizer() > .setPattern("") > .setMinTokenLength(1) > .setInputCol(joinExpr) > .setOutputCol("tokens"), > new NGram().setN(3).setInputCol("tokens").setOutputCol("ngrams"), > new HashingTF().setInputCol("ngrams").setOutputCol("vectors"), > new MinHashLSH().setInputCol("vectors").setOutputCol("lsh"), > ) > ) > val model = pipeline.fit(df) > val storedHashed = model.transform(df) > val landedHashed = model.transform(anotherDf) > val commonColumns = df.columns.toSet & anotherDf.columns.toSet > /** > * Converts column name from a data frame to the column of resulting > dataset. > */ > def convertColumn(datasetName: String)(columnName: String): Column = { > val newName = > if (commonColumns.contains(columnName)) > s"$datasetName${columnName.capitalize}" > else columnName > col(s"$datasetName.$columnName") as newName > } > val columnsToSelect = df.columns.map(convertColumn("datasetA")) ++ > anotherDf.columns.map(convertColumn("datasetB")) > val result = model > .stages > .last > .asInstanceOf[MinHashLSHModel] > .approxSimilarityJoin(storedHashed, landedHashed, threshold, > "confidence") > .select(columnsToSelect.toSeq: _*) > result.show(false) > result > } > {code} > Now consider such simple example: > {code:scala} > val inputDF1 = Seq("", null).toDF("name").filter(length($"name") > 2) as "df1" > val inputDF2 = Seq("", null).toDF("name").filter(length($"name") > 2) as "df2" > similarityJoin(inputDF1, inputDF2, "name", 0.6) > {code} > This example runs with no errors and outputs 3 empty DFs. 
Let's add > {{distinct}} method to one data frame: > {code:scala} > val inputDF1 = Seq("", null).toDF("name").distinct().filter(length($"name") > > 2) as "df1" > val inputDF2 = Seq("", null).toDF("name").filter(length($"name") > 2) as "df2" > similarityJoin(inputDF1, inputDF2, "name", 0.6) > {code} > This example outputs two empty DFs and then fails at {{result.show(false)}}. > Error: > {code:none} > org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user > defined function (LSHModel$$Lambda$3769/0x000101804840: > (struct<type:tinyint,size:int,indices:array<int>,values:array<double>>) => > array<struct<type:tinyint,size:int,indices:array<int>,values:array<double>>>). > ... many elided > Caused by: java.lang.IllegalArgumentException: requirement failed: Must have > at least 1 non zero entry. > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.ml.feature.MinHashLSHModel.hashFunction(MinHashLSH.scala:61) > at org.apache.spark.ml.feature.LSHModel.$anonfun$transform$1(LSH.scala:99) > ... many more > {code}
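The failure quoted above can be reproduced in miniature without Spark. MinHashLSHModel.hashFunction rejects vectors with no non-zero entries, and an empty string tokenizes to no 3-grams, so HashingTF emits an all-zero vector. A plain-Python sketch of that requirement check — the coefficients `a` and `b` below are illustrative stand-ins, not Spark's actual randomly drawn values (Spark's implementation does use the prime 2038074743):

```python
# Illustrative sketch of why MinHashLSH rejects all-zero vectors.
# min_hash() mimics MinHashLSHModel.hashFunction's precondition; the
# hash coefficients are made up for the example.

HASH_PRIME = 2038074743  # the prime MinHashLSH uses internally

def min_hash(active_indices, a=7, b=13):
    """MinHash over the set of non-zero indices of a sparse vector."""
    if not active_indices:
        # Mirrors the `require` that raises
        # "requirement failed: Must have at least 1 non zero entry."
        raise ValueError("Must have at least 1 non zero entry.")
    return min(((1 + i) * a + b) % HASH_PRIME for i in active_indices)

# An empty string -> no tokens -> no 3-grams -> all-zero TF vector:
tokens = list("")                                         # RegexTokenizer on ""
ngrams = [tokens[i:i + 3] for i in range(len(tokens) - 2)]  # no 3-grams
active = list(range(len(ngrams)))                         # no non-zero indices
```

Calling `min_hash(active)` here raises exactly the error in the stack trace. This suggests the `distinct()` variant fails because the optimizer may reorder the `length($"name") > 2` filter relative to the UDF evaluation, letting the LSH hash function see the empty-string row — a hypothesis consistent with the report, not a confirmed root cause.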
[jira] [Commented] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
[ https://issues.apache.org/jira/browse/SPARK-43718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725143#comment-17725143 ] Bruce Robbins commented on SPARK-43718: --- PR here: https://github.com/apache/spark/pull/41267 > References to a specific side's key in a USING join can have wrong nullability > -- > > Key: SPARK-43718 > URL: https://issues.apache.org/jira/browse/SPARK-43718 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > Labels: correctness > > Assume this data: > {noformat} > create or replace temp view t1 as values (1), (2), (3) as (c1); > create or replace temp view t2 as values (2), (3), (4) as (c1); > {noformat} > The following query produces incorrect results: > {noformat} > spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 > from t1 > full outer join t2 > using (c1); > 1 > -1 <== should be null > 2 > 2 > 3 > 3 > -1 <== should be null > 4 > Time taken: 0.663 seconds, Fetched 8 row(s) > spark-sql (default)> > {noformat} > Similar issues occur with right outer join and left outer join. > {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is > resolved, so the array's {{containsNull}} value is incorrect. > Queries that don't use arrays also can get wrong results. 
Assume this data: > {noformat} > create or replace temp view t1 as values (0), (1), (2) as (c1); > create or replace temp view t2 as values (1), (2), (3) as (c1); > create or replace temp view t3 as values (1, 2), (3, 4), (4, 5) as (a, b); > {noformat} > The following query produces incorrect results: > {noformat} > select t1.c1 as t1_c1, t2.c1 as t2_c1, b > from t1 > full outer join t2 > using (c1), > lateral ( > select b > from t3 > where a = coalesce(t2.c1, 1) > ) lt3; > 1 1 2 > NULL 3 4 > Time taken: 2.395 seconds, Fetched 2 row(s) > spark-sql (default)> > {noformat} > The result should be the following: > {noformat} > 0 NULL2 > 1 1 2 > NULL 3 4 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
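The expected null behavior in the first query can be made concrete with a small emulation. `full_outer_join_using` below is a hypothetical plain-Python stand-in for the `FULL OUTER JOIN ... USING (c1)` above; per the report, a reference to a specific side's key must be null (None) on rows where that side had no match, which is what exploding `array(t1.c1, t2.c1)` should yield:

```python
def full_outer_join_using(left, right):
    """Emulate FULL OUTER JOIN USING on single-column rows.
    Returns (left_c1, right_c1) pairs; None marks the unmatched side."""
    out = []
    for v in left:
        out.append((v, v if v in right else None))
    for v in right:
        if v not in left:
            out.append((None, v))
    return out

rows = full_outer_join_using([1, 2, 3], [2, 3, 4])
# Exploding array(t1.c1, t2.c1) should therefore interleave the pair
# values, with None (not a bogus value) for the unmatched rows 1 and 4.
exploded = [x for pair in rows for x in pair]
```

This matches the expected output in the report: nulls for the unmatched keys 1 and 4, which the buggy nullability drops.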
[jira] [Updated] (SPARK-43546) Complete parity tests of Pandas UDF
[ https://issues.apache.org/jira/browse/SPARK-43546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-43546: - Summary: Complete parity tests of Pandas UDF (was: Complete Pandas UDF parity tests) > Complete parity tests of Pandas UDF > --- > > Key: SPARK-43546 > URL: https://issues.apache.org/jira/browse/SPARK-43546 > Project: Spark > Issue Type: Test > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Priority: Major > > Tests as shown below should be added to Connect. > test_pandas_udf_grouped_agg.py > test_pandas_udf_scalar.py > test_pandas_udf_window.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43735) Enable SeriesDateTimeTests.test_weekday for pandas 2.0.0.
Haejoon Lee created SPARK-43735: --- Summary: Enable SeriesDateTimeTests.test_weekday for pandas 2.0.0. Key: SPARK-43735 URL: https://issues.apache.org/jira/browse/SPARK-43735 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43733) Enable SeriesDateTimeTests.test_second for pandas 2.0.0.
Haejoon Lee created SPARK-43733: --- Summary: Enable SeriesDateTimeTests.test_second for pandas 2.0.0. Key: SPARK-43733 URL: https://issues.apache.org/jira/browse/SPARK-43733 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43736) Enable SeriesDateTimeTests.test_year for pandas 2.0.0.
Haejoon Lee created SPARK-43736: --- Summary: Enable SeriesDateTimeTests.test_year for pandas 2.0.0. Key: SPARK-43736 URL: https://issues.apache.org/jira/browse/SPARK-43736 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43734) Expression "(v)" within a window function doesn't raise an AnalysisException
Xinrong Meng created SPARK-43734: Summary: Expression "(v)" within a window function doesn't raise an AnalysisException Key: SPARK-43734 URL: https://issues.apache.org/jira/browse/SPARK-43734 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 3.5.0 Reporter: Xinrong Meng Expression "(v)" within a window function doesn't raise an AnalysisException See PandasUDFWindowParityTests.test_invalid_args for reproduction. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43731) Enable SeriesDateTimeTests.test_month for pandas 2.0.0.
Haejoon Lee created SPARK-43731: --- Summary: Enable SeriesDateTimeTests.test_month for pandas 2.0.0. Key: SPARK-43731 URL: https://issues.apache.org/jira/browse/SPARK-43731 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43732) Enable SeriesDateTimeTests.test_quarter for pandas 2.0.0.
Haejoon Lee created SPARK-43732: --- Summary: Enable SeriesDateTimeTests.test_quarter for pandas 2.0.0. Key: SPARK-43732 URL: https://issues.apache.org/jira/browse/SPARK-43732 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43729) Enable SeriesDateTimeTests.test_microsecond for pandas 2.0.0.
Haejoon Lee created SPARK-43729: --- Summary: Enable SeriesDateTimeTests.test_microsecond for pandas 2.0.0. Key: SPARK-43729 URL: https://issues.apache.org/jira/browse/SPARK-43729 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43728) Enable SeriesDateTimeTests.test_hour for pandas 2.0.0.
Haejoon Lee created SPARK-43728: --- Summary: Enable SeriesDateTimeTests.test_hour for pandas 2.0.0. Key: SPARK-43728 URL: https://issues.apache.org/jira/browse/SPARK-43728 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43730) Enable SeriesDateTimeTests.test_minute for pandas 2.0.0.
Haejoon Lee created SPARK-43730: --- Summary: Enable SeriesDateTimeTests.test_minute for pandas 2.0.0. Key: SPARK-43730 URL: https://issues.apache.org/jira/browse/SPARK-43730 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43727) Parity returnType check in Spark Connect
Xinrong Meng created SPARK-43727: Summary: Parity returnType check in Spark Connect Key: SPARK-43727 URL: https://issues.apache.org/jira/browse/SPARK-43727 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.5.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43726) Enable SeriesDateTimeTests.test_daysinmonth for pandas 2.0.0.
Haejoon Lee created SPARK-43726: --- Summary: Enable SeriesDateTimeTests.test_daysinmonth for pandas 2.0.0. Key: SPARK-43726 URL: https://issues.apache.org/jira/browse/SPARK-43726 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43725) Enable SeriesDateTimeTests.test_days_in_month for pandas 2.0.0.
Haejoon Lee created SPARK-43725: --- Summary: Enable SeriesDateTimeTests.test_days_in_month for pandas 2.0.0. Key: SPARK-43725 URL: https://issues.apache.org/jira/browse/SPARK-43725 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43724) Enable SeriesDateTimeTests.test_dayofyear for pandas 2.0.0.
Haejoon Lee created SPARK-43724: --- Summary: Enable SeriesDateTimeTests.test_dayofyear for pandas 2.0.0. Key: SPARK-43724 URL: https://issues.apache.org/jira/browse/SPARK-43724 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43723) Enable SeriesDateTimeTests.test_dayofweek for pandas 2.0.0.
Haejoon Lee created SPARK-43723: --- Summary: Enable SeriesDateTimeTests.test_dayofweek for pandas 2.0.0. Key: SPARK-43723 URL: https://issues.apache.org/jira/browse/SPARK-43723 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43722) Enable SeriesDateTimeTests.test_day for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43722: Summary: Enable SeriesDateTimeTests.test_day for pandas 2.0.0. (was: Enable SeriesTests.test_day for pandas 2.0.0.) > Enable SeriesDateTimeTests.test_day for pandas 2.0.0. > - > > Key: SPARK-43722 > URL: https://issues.apache.org/jira/browse/SPARK-43722 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43721) Enable DataFramePlotMatplotlibTests.test_kde_plot for pandas 2.0.0.
Haejoon Lee created SPARK-43721: --- Summary: Enable DataFramePlotMatplotlibTests.test_kde_plot for pandas 2.0.0. Key: SPARK-43721 URL: https://issues.apache.org/jira/browse/SPARK-43721 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43722) Enable SeriesTests.test_day for pandas 2.0.0.
Haejoon Lee created SPARK-43722: --- Summary: Enable SeriesTests.test_day for pandas 2.0.0. Key: SPARK-43722 URL: https://issues.apache.org/jira/browse/SPARK-43722 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43720) Enable DataFramePlotMatplotlibTests.test_hist_plot for pandas 2.0.0.
Haejoon Lee created SPARK-43720: --- Summary: Enable DataFramePlotMatplotlibTests.test_hist_plot for pandas 2.0.0. Key: SPARK-43720 URL: https://issues.apache.org/jira/browse/SPARK-43720 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43715) Add spark DataFrame binary file reader / writer
[ https://issues.apache.org/jira/browse/SPARK-43715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu reassigned SPARK-43715: -- Assignee: Weichen Xu > Add spark DataFrame binary file reader / writer > --- > > Key: SPARK-43715 > URL: https://issues.apache.org/jira/browse/SPARK-43715 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > > In new distributed spark ML module (designed to support spark connect and > support local inference) > We need to save ML model to hadoop file system using custom binary file > format, the reason is: > * We often submit a spark application to spark cluster for running the > training model job, we need to save trained model to hadoop file system > before the spark application completes. > * But we want to support local model inference, that means if we save the > model by current spark DataFrame writer (e.g. parquet format), when loading > model we have to rely on the spark service. But we hope we can load model > without spark service. So we want the model being saved as the original > binary format that our ML code can handle. > so we need to add a DataFrame reader / writer format, that can load / save > binary files, the API is like: > > {*}Writer API{*}: > Supposing we have a dataframe with schema: > [file_path: String, content: binary], > we can save the dataframe to a hadoop path, each row we will save it as a > file under the hadoop path, the saved file path is \{hadoop > path}/\{file_path}, "file_path" can be a multiple part path. > > {*}Reader API{*}: > `spark.read.format("binaryFileV2").load(...)` > > It will return a spark dataframe , each row contains the file path and the > file content binary string. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43538) Spark Homebrew Formulae currently depends on non-officially-supported Java 20
[ https://issues.apache.org/jira/browse/SPARK-43538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-43538: Issue Type: Bug (was: Request) > Spark Homebrew Formulae currently depends on non-officially-supported Java 20 > - > > Key: SPARK-43538 > URL: https://issues.apache.org/jira/browse/SPARK-43538 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 3.2.4, 3.3.2, 3.4.0 > Environment: Homebrew (e.g., macOS) >Reporter: Ghislain Fourny >Assignee: Yuming Wang >Priority: Minor > Fix For: 3.4.0 > > > I am not sure if homebrew-related issues can also be reported here? The > Homebrew formula for apache-spark runs on (latest) openjdk 20. > [https://formulae.brew.sh/formula/apache-spark] > However, Apache Spark is documented to work with Java 8/11/17: > [https://spark.apache.org/docs/latest/] > Is this an oversight, or is Java 20 officially supported, too? > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43538) Spark Homebrew Formulae currently depends on non-officially-supported Java 20
[ https://issues.apache.org/jira/browse/SPARK-43538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-43538. - Fix Version/s: 3.4.0 Assignee: Yuming Wang Resolution: Fixed Issue resolved by pull request [https://github.com/Homebrew/homebrew-core/pull/131189]. Please reinstall it if you have installed the Spark Homebrew formula. > Spark Homebrew Formulae currently depends on non-officially-supported Java 20 > - > > Key: SPARK-43538 > URL: https://issues.apache.org/jira/browse/SPARK-43538 > Project: Spark > Issue Type: Request > Components: Java API >Affects Versions: 3.2.4, 3.3.2, 3.4.0 > Environment: Homebrew (e.g., macOS) >Reporter: Ghislain Fourny >Assignee: Yuming Wang >Priority: Minor > Fix For: 3.4.0 > > > I am not sure if homebrew-related issues can also be reported here? The > Homebrew formula for apache-spark runs on (latest) openjdk 20. > [https://formulae.brew.sh/formula/apache-spark] > However, Apache Spark is documented to work with Java 8/11/17: > [https://spark.apache.org/docs/latest/] > Is this an oversight, or is Java 20 officially supported, too? > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43719) Handle missing row.excludedInStages field
Dongjoon Hyun created SPARK-43719: - Summary: Handle missing row.excludedInStages field Key: SPARK-43719 URL: https://issues.apache.org/jira/browse/SPARK-43719 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40500) Use `pd.items` instead of `pd.iteritems`
[ https://issues.apache.org/jira/browse/SPARK-40500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725129#comment-17725129 ] Saiwing Yeung commented on SPARK-40500: --- This would also be useful for me. If it's useful I can make a PR that patches 3.2 (same scope as https://github.com/apache/spark/pull/37947). > Use `pd.items` instead of `pd.iteritems` > > > Key: SPARK-40500 > URL: https://issues.apache.org/jira/browse/SPARK-40500 > Project: Spark > Issue Type: Improvement > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
[ https://issues.apache.org/jira/browse/SPARK-43718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-43718: -- Description: Assume this data: {noformat} create or replace temp view t1 as values (1), (2), (3) as (c1); create or replace temp view t2 as values (2), (3), (4) as (c1); {noformat} The following query produces incorrect results: {noformat} spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 from t1 full outer join t2 using (c1); 1 -1 <== should be null 2 2 3 3 -1 <== should be null 4 Time taken: 0.663 seconds, Fetched 8 row(s) spark-sql (default)> {noformat} Similar issues occur with right outer join and left outer join. {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is resolved, so the array's {{containsNull}} value is incorrect. Queries that don't use arrays also can get wrong results. Assume this data: {noformat} create or replace temp view t1 as values (0), (1), (2) as (c1); create or replace temp view t2 as values (1), (2), (3) as (c1); create or replace temp view t3 as values (1, 2), (3, 4), (4, 5) as (a, b); {noformat} The following query produces incorrect results: {noformat} select t1.c1 as t1_c1, t2.c1 as t2_c1, b from t1 full outer join t2 using (c1), lateral ( select b from t3 where a = coalesce(t2.c1, 1) ) lt3; 1 1 2 NULL3 4 Time taken: 2.395 seconds, Fetched 2 row(s) spark-sql (default)> {noformat} The result should be the following: {noformat} 0 NULL2 1 1 2 NULL3 4 {noformat} was: Assume this data: {noformat} create or replace temp view t1 as values (1), (2), (3) as (c1); create or replace temp view t2 as values (2), (3), (4) as (c1); {noformat} The following query produces incorrect results: {noformat} spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 from t1 full outer join t2 using (c1); 1 -1 <== should be null 2 2 3 3 -1 <== should be null 4 Time taken: 0.663 seconds, Fetched 8 row(s) spark-sql (default)> {noformat} Similar issues occur with 
right outer join and left outer join. {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is resolved, so the array's {{containsNull}} value is incorrect. > References to a specific side's key in a USING join can have wrong nullability > -- > > Key: SPARK-43718 > URL: https://issues.apache.org/jira/browse/SPARK-43718 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > Labels: correctness > > Assume this data: > {noformat} > create or replace temp view t1 as values (1), (2), (3) as (c1); > create or replace temp view t2 as values (2), (3), (4) as (c1); > {noformat} > The following query produces incorrect results: > {noformat} > spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 > from t1 > full outer join t2 > using (c1); > 1 > -1 <== should be null > 2 > 2 > 3 > 3 > -1 <== should be null > 4 > Time taken: 0.663 seconds, Fetched 8 row(s) > spark-sql (default)> > {noformat} > Similar issues occur with right outer join and left outer join. > {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is > resolved, so the array's {{containsNull}} value is incorrect. > Queries that don't use arrays also can get wrong results. 
Assume this data: > {noformat} > create or replace temp view t1 as values (0), (1), (2) as (c1); > create or replace temp view t2 as values (1), (2), (3) as (c1); > create or replace temp view t3 as values (1, 2), (3, 4), (4, 5) as (a, b); > {noformat} > The following query produces incorrect results: > {noformat} > select t1.c1 as t1_c1, t2.c1 as t2_c1, b > from t1 > full outer join t2 > using (c1), > lateral ( > select b > from t3 > where a = coalesce(t2.c1, 1) > ) lt3; > 1 1 2 > NULL 3 4 > Time taken: 2.395 seconds, Fetched 2 row(s) > spark-sql (default)> > {noformat} > The result should be the following: > {noformat} > 0 NULL 2 > 1 1 2 > NULL 3 4 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
[ https://issues.apache.org/jira/browse/SPARK-43718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-43718: -- Affects Version/s: 3.3.2 > References to a specific side's key in a USING join can have wrong nullability > -- > > Key: SPARK-43718 > URL: https://issues.apache.org/jira/browse/SPARK-43718 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > Labels: correctness > > Assume this data: > {noformat} > create or replace temp view t1 as values (1), (2), (3) as (c1); > create or replace temp view t2 as values (2), (3), (4) as (c1); > {noformat} > The following query produces incorrect results: > {noformat} > spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 > from t1 > full outer join t2 > using (c1); > 1 > -1 <== should be null > 2 > 2 > 3 > 3 > -1 <== should be null > 4 > Time taken: 0.663 seconds, Fetched 8 row(s) > spark-sql (default)> > {noformat} > Similar issues occur with right outer join and left outer join. > {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is > resolved, so the array's {{containsNull}} value is incorrect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
[ https://issues.apache.org/jira/browse/SPARK-43718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-43718: -- Affects Version/s: 3.4.0 > References to a specific side's key in a USING join can have wrong nullability > -- > > Key: SPARK-43718 > URL: https://issues.apache.org/jira/browse/SPARK-43718 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > Labels: correctness > > Assume this data: > {noformat} > create or replace temp view t1 as values (1), (2), (3) as (c1); > create or replace temp view t2 as values (2), (3), (4) as (c1); > {noformat} > The following query produces incorrect results: > {noformat} > spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 > from t1 > full outer join t2 > using (c1); > 1 > -1 <== should be null > 2 > 2 > 3 > 3 > -1 <== should be null > 4 > Time taken: 0.663 seconds, Fetched 8 row(s) > spark-sql (default)> > {noformat} > Similar issues occur with right outer join and left outer join. > {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is > resolved, so the array's {{containsNull}} value is incorrect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
[ https://issues.apache.org/jira/browse/SPARK-43718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725122#comment-17725122 ] Bruce Robbins commented on SPARK-43718: --- I think I have a handle on this. I will submit a PR in the coming days. > References to a specific side's key in a USING join can have wrong nullability > -- > > Key: SPARK-43718 > URL: https://issues.apache.org/jira/browse/SPARK-43718 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Bruce Robbins >Priority: Major > Labels: correctness > > Assume this data: > {noformat} > create or replace temp view t1 as values (1), (2), (3) as (c1); > create or replace temp view t2 as values (2), (3), (4) as (c1); > {noformat} > The following query produces incorrect results: > {noformat} > spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 > from t1 > full outer join t2 > using (c1); > 1 > -1 <== should be null > 2 > 2 > 3 > 3 > -1 <== should be null > 4 > Time taken: 0.663 seconds, Fetched 8 row(s) > spark-sql (default)> > {noformat} > Similar issues occur with right outer join and left outer join. > {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is > resolved, so the array's {{containsNull}} value is incorrect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43716) Revert scala-maven-plugin to 4.8.0
[ https://issues.apache.org/jira/browse/SPARK-43716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-43716: -- Summary: Revert scala-maven-plugin to 4.8.0 (was: Revert scala-maven-plugin upgrade) > Revert scala-maven-plugin to 4.8.0 > -- > > Key: SPARK-43716 > URL: https://issues.apache.org/jira/browse/SPARK-43716 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42809) Upgrade scala-maven-plugin from 4.8.0 to 4.8.1
[ https://issues.apache.org/jira/browse/SPARK-42809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725121#comment-17725121 ] Dongjoon Hyun commented on SPARK-42809: --- This is logically reverted via SPARK-43716 > Upgrade scala-maven-plugin from 4.8.0 to 4.8.1 > -- > > Key: SPARK-42809 > URL: https://issues.apache.org/jira/browse/SPARK-42809 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43716) Revert scala-maven-plugin upgrade
[ https://issues.apache.org/jira/browse/SPARK-43716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43716. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41261 [https://github.com/apache/spark/pull/41261] > Revert scala-maven-plugin upgrade > - > > Key: SPARK-43716 > URL: https://issues.apache.org/jira/browse/SPARK-43716 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43716) Revert scala-maven-plugin upgrade
[ https://issues.apache.org/jira/browse/SPARK-43716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43716: - Assignee: Bjørn Jørgensen > Revert scala-maven-plugin upgrade > - > > Key: SPARK-43716 > URL: https://issues.apache.org/jira/browse/SPARK-43716 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
[ https://issues.apache.org/jira/browse/SPARK-43718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-43718: -- Labels: correctness (was: ) > References to a specific side's key in a USING join can have wrong nullability > -- > > Key: SPARK-43718 > URL: https://issues.apache.org/jira/browse/SPARK-43718 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Bruce Robbins >Priority: Major > Labels: correctness > > Assume this data: > {noformat} > create or replace temp view t1 as values (1), (2), (3) as (c1); > create or replace temp view t2 as values (2), (3), (4) as (c1); > {noformat} > The following query produces incorrect results: > {noformat} > spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 > from t1 > full outer join t2 > using (c1); > 1 > -1 <== should be null > 2 > 2 > 3 > 3 > -1 <== should be null > 4 > Time taken: 0.663 seconds, Fetched 8 row(s) > spark-sql (default)> > {noformat} > Similar issues occur with right outer join and left outer join. > {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is > resolved, so the array's {{containsNull}} value is incorrect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
[ https://issues.apache.org/jira/browse/SPARK-43718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-43718: -- Description: Assume this data: {noformat} create or replace temp view t1 as values (1), (2), (3) as (c1); create or replace temp view t2 as values (2), (3), (4) as (c1); {noformat} The following query produces incorrect results: {noformat} spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 from t1 full outer join t2 using (c1); 1 -1 <== should be null 2 2 3 3 -1 <== should be null 4 Time taken: 0.663 seconds, Fetched 8 row(s) spark-sql (default)> {noformat} Similar issues occur with right outer join and left outer join. {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is resolved, so the array's {{containsNull}} value is incorrect. was: Assume this data: {noformat} create or replace temp view t1 as values (1), (2), (3) as (c1); create or replace temp view t2 as values (2), (3), (4) as (c1); {noformat} The following query produces the wrong result: {noformat} spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 from t1 full outer join t2 using (c1); 1 -1 <== should be null 2 2 3 3 -1 <== should be null 4 Time taken: 0.663 seconds, Fetched 8 row(s) spark-sql (default)> {noformat} Similar issues occur with right outer join and left outer join. {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is resolved, so the array's {{containsNull}} value is incorrect. 
> References to a specific side's key in a USING join can have wrong nullability > -- > > Key: SPARK-43718 > URL: https://issues.apache.org/jira/browse/SPARK-43718 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Bruce Robbins >Priority: Major > > Assume this data: > {noformat} > create or replace temp view t1 as values (1), (2), (3) as (c1); > create or replace temp view t2 as values (2), (3), (4) as (c1); > {noformat} > The following query produces incorrect results: > {noformat} > spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 > from t1 > full outer join t2 > using (c1); > 1 > -1 <== should be null > 2 > 2 > 3 > 3 > -1 <== should be null > 4 > Time taken: 0.663 seconds, Fetched 8 row(s) > spark-sql (default)> > {noformat} > Similar issues occur with right outer join and left outer join. > {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is > resolved, so the array's {{containsNull}} value is incorrect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43591) Assign a name to the error class _LEGACY_ERROR_TEMP_0013
[ https://issues.apache.org/jira/browse/SPARK-43591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-43591. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41236 [https://github.com/apache/spark/pull/41236] > Assign a name to the error class _LEGACY_ERROR_TEMP_0013 > > > Key: SPARK-43591 > URL: https://issues.apache.org/jira/browse/SPARK-43591 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
[ https://issues.apache.org/jira/browse/SPARK-43718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-43718: -- Description: Assume this data: {noformat} create or replace temp view t1 as values (1), (2), (3) as (c1); create or replace temp view t2 as values (2), (3), (4) as (c1); {noformat} The following query produces the wrong result: {noformat} spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 from t1 full outer join t2 using (c1); 1 -1 <== should be null 2 2 3 3 -1 <== should be null 4 Time taken: 0.663 seconds, Fetched 8 row(s) spark-sql (default)> {noformat} Similar issues occur with right outer join and left outer join. {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is resolved, so the array's {{containsNull}} value is incorrect. was: Assume this data: {noformat} create or replace temp view t1 as values (1), (2), (3) as (c1); create or replace temp view t2 as values (2), (3), (4) as (c1); {noformat} The following query produces the wrong result: {noformat} spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 from t1 full outer join t2 using (c1); 1 -1 <== should be null 2 2 3 3 -1 <== should be null 4 Time taken: 0.663 seconds, Fetched 8 row(s) spark-sql (default)> {noformat} Similar issues occur with right outer join and left outer join. 
> References to a specific side's key in a USING join can have wrong nullability > -- > > Key: SPARK-43718 > URL: https://issues.apache.org/jira/browse/SPARK-43718 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Bruce Robbins >Priority: Major > > Assume this data: > {noformat} > create or replace temp view t1 as values (1), (2), (3) as (c1); > create or replace temp view t2 as values (2), (3), (4) as (c1); > {noformat} > The following query produces the wrong result: > {noformat} > spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 > from t1 > full outer join t2 > using (c1); > 1 > -1 <== should be null > 2 > 2 > 3 > 3 > -1 <== should be null > 4 > Time taken: 0.663 seconds, Fetched 8 row(s) > spark-sql (default)> > {noformat} > Similar issues occur with right outer join and left outer join. > {{t1.c1}} and {{t2.c1}} have the wrong nullability at the time the array is > resolved, so the array's {{containsNull}} value is incorrect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43591) Assign a name to the error class _LEGACY_ERROR_TEMP_0013
[ https://issues.apache.org/jira/browse/SPARK-43591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-43591: Assignee: BingKun Pan > Assign a name to the error class _LEGACY_ERROR_TEMP_0013 > > > Key: SPARK-43591 > URL: https://issues.apache.org/jira/browse/SPARK-43591 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43583) When encryption is enabled on the External Shuffle Service, then processing of push meta requests throws NPE
[ https://issues.apache.org/jira/browse/SPARK-43583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-43583: -- Affects Version/s: 3.4.0 3.3.2 3.2.4 (was: 3.2.0) > When encryption is enabled on the External Shuffle Service, then processing > of push meta requests throws NPE > > > Key: SPARK-43583 > URL: https://issues.apache.org/jira/browse/SPARK-43583 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.2.4, 3.3.2, 3.4.0 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Fix For: 3.5.0 > > > After enabling support for over-the-wire encryption for spark shuffle > services, the meta requests for push-merged blocks fail with this error: > {code:java} > java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.spark.network.server.AbstractAuthRpcHandler.getMergedBlockMetaReqHandler(AbstractAuthRpcHandler.java:110) > at > org.apache.spark.network.crypto.AuthRpcHandler.getMergedBlockMetaReqHandler(AuthRpcHandler.java:144) > at > org.apache.spark.network.server.TransportRequestHandler.processMergedBlockMetaRequest(TransportRequestHandler.java:275) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:117) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > org.sparkproject.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) 
> at > org.sparkproject.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43718) References to a specific side's key in a USING join can have wrong nullability
Bruce Robbins created SPARK-43718: - Summary: References to a specific side's key in a USING join can have wrong nullability Key: SPARK-43718 URL: https://issues.apache.org/jira/browse/SPARK-43718 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: Bruce Robbins Assume this data: {noformat} create or replace temp view t1 as values (1), (2), (3) as (c1); create or replace temp view t2 as values (2), (3), (4) as (c1); {noformat} The following query produces the wrong result: {noformat} spark-sql (default)> select explode(array(t1.c1, t2.c1)) as x1 from t1 full outer join t2 using (c1); 1 -1 <== should be null 2 2 3 3 -1 <== should be null 4 Time taken: 0.663 seconds, Fetched 8 row(s) spark-sql (default)> {noformat} Similar issues occur with right outer join and left outer join. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43583) When encryption is enabled on the External Shuffle Service, then processing of push meta requests throws NPE
[ https://issues.apache.org/jira/browse/SPARK-43583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43583: - Assignee: Chandni Singh > When encryption is enabled on the External Shuffle Service, then processing > of push meta requests throws NPE > > > Key: SPARK-43583 > URL: https://issues.apache.org/jira/browse/SPARK-43583 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > > After enabling support for over-the-wire encryption for spark shuffle > services, the meta requests for push-merged blocks fail with this error: > {code:java} > java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.spark.network.server.AbstractAuthRpcHandler.getMergedBlockMetaReqHandler(AbstractAuthRpcHandler.java:110) > at > org.apache.spark.network.crypto.AuthRpcHandler.getMergedBlockMetaReqHandler(AuthRpcHandler.java:144) > at > org.apache.spark.network.server.TransportRequestHandler.processMergedBlockMetaRequest(TransportRequestHandler.java:275) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:117) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > org.sparkproject.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > 
org.sparkproject.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43583) When encryption is enabled on the External Shuffle Service, then processing of push meta requests throws NPE
[ https://issues.apache.org/jira/browse/SPARK-43583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43583. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41225 [https://github.com/apache/spark/pull/41225] > When encryption is enabled on the External Shuffle Service, then processing > of push meta requests throws NPE > > > Key: SPARK-43583 > URL: https://issues.apache.org/jira/browse/SPARK-43583 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Fix For: 3.5.0 > > > After enabling support for over-the-wire encryption for spark shuffle > services, the meta requests for push-merged blocks fail with this error: > {code:java} > java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.spark.network.server.AbstractAuthRpcHandler.getMergedBlockMetaReqHandler(AbstractAuthRpcHandler.java:110) > at > org.apache.spark.network.crypto.AuthRpcHandler.getMergedBlockMetaReqHandler(AuthRpcHandler.java:144) > at > org.apache.spark.network.server.TransportRequestHandler.processMergedBlockMetaRequest(TransportRequestHandler.java:275) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:117) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > org.sparkproject.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > 
org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > org.sparkproject.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
[ https://issues.apache.org/jira/browse/SPARK-43487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-43487. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41155 [https://github.com/apache/spark/pull/41155] > Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError` > - > > Key: SPARK-43487 > URL: https://issues.apache.org/jira/browse/SPARK-43487 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Minor > Fix For: 3.5.0 > > > The batch of errors migrated to error classes as part of SPARK-40540 contains > an error that got mixed up with the wrong error message: > [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983] > uses the same error message as the following > commandUnsupportedInV2TableError: > > {code:java} > WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * > FROM t2; > AnalysisException: t is not supported for v2 tables > {code} > The error should be: > {code:java} > AnalysisException: Name t is ambiguous in nested CTE. > Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name > defined in inner CTE takes precedence. If set it to LEGACY, outer CTE > definitions will take precedence. See more details in SPARK-28228.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
[ https://issues.apache.org/jira/browse/SPARK-43487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-43487: Assignee: Johan Lasperas > Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError` > - > > Key: SPARK-43487 > URL: https://issues.apache.org/jira/browse/SPARK-43487 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Minor > > The batch of errors migrated to error classes as part of SPARK-40540 contains > an error that got mixed up with the wrong error message: > [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983] > uses the same error message as the following > commandUnsupportedInV2TableError: > > {code:java} > WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * > FROM t2; > AnalysisException: t is not supported for v2 tables > {code} > The error should be: > {code:java} > AnalysisException: Name t is ambiguous in nested CTE. > Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name > defined in inner CTE takes precedence. If set it to LEGACY, outer CTE > definitions will take precedence. See more details in SPARK-28228.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43290) Support IV and AAD optional parameters for aes_encrypt
[ https://issues.apache.org/jira/browse/SPARK-43290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-43290. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40970 [https://github.com/apache/spark/pull/40970] > Support IV and AAD optional parameters for aes_encrypt > -- > > Key: SPARK-43290 > URL: https://issues.apache.org/jira/browse/SPARK-43290 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Steve Weis >Assignee: Steve Weis >Priority: Minor > Fix For: 3.5.0 > > > There are some use cases where callers to aes_encrypt may want to provide > initialization vectors (IVs) or additional authenticated data (AAD). The most > common cases will be: > 1. Ensuring that ciphertext matches values that have been encrypted by > external tools. In those cases, the caller will need to provide an identical > IV value. > 2. For AES-CBC mode, there are some cases where callers want to generate > deterministic encrypted output. > 3. For AES-GCM mode, providing AAD fields allows callers to bind additional > data to an encrypted ciphertext so that it can only be decrypted by a caller > providing the same value. This is often used to enforce some context. > The proposed new API is the following: > * aes_encrypt(expr, key [, mode [, padding [, iv [, aad) > * aes_decrypt(expr, key [, mode [, padding [, aad]]]) > These fields are only supported for specific modes: > * ECB: Does not support either IV or AAD and will return an error if either > are provided. > * CBC: Only supports an IV and will return an error if an AAD is provided > * GCM: Supports either IV, AAD, or both. > If a caller is only providing an AAD to GCM mode, they would need to pass a > null value in the IV field. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43290) Support IV and AAD optional parameters for aes_encrypt
[ https://issues.apache.org/jira/browse/SPARK-43290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-43290: Assignee: Steve Weis > Support IV and AAD optional parameters for aes_encrypt > -- > > Key: SPARK-43290 > URL: https://issues.apache.org/jira/browse/SPARK-43290 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Steve Weis >Assignee: Steve Weis >Priority: Minor > > There are some use cases where callers to aes_encrypt may want to provide > initialization vectors (IVs) or additional authenticated data (AAD). The most > common cases will be: > 1. Ensuring that ciphertext matches values that have been encrypted by > external tools. In those cases, the caller will need to provide an identical > IV value. > 2. For AES-CBC mode, there are some cases where callers want to generate > deterministic encrypted output. > 3. For AES-GCM mode, providing AAD fields allows callers to bind additional > data to an encrypted ciphertext so that it can only be decrypted by a caller > providing the same value. This is often used to enforce some context. > The proposed new API is the following: > * aes_encrypt(expr, key [, mode [, padding [, iv [, aad) > * aes_decrypt(expr, key [, mode [, padding [, aad]]]) > These fields are only supported for specific modes: > * ECB: Does not support either IV or AAD and will return an error if either > are provided. > * CBC: Only supports an IV and will return an error if an AAD is provided > * GCM: Supports either IV, AAD, or both. > If a caller is only providing an AAD to GCM mode, they would need to pass a > null value in the IV field. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
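The per-mode rules listed in SPARK-43290 above (ECB: no IV/AAD; CBC: IV only; GCM: IV, AAD, or both) can be sketched as a small validation routine. This is a pure-Python illustration of the stated rules, not Spark's implementation, and `validate_aes_args` is a hypothetical name:

```python
# Sketch of the mode/parameter compatibility rules from the proposal:
# ECB rejects both IV and AAD, CBC accepts only an IV, GCM accepts either.
def validate_aes_args(mode, iv=None, aad=None):
    mode = mode.upper()
    if mode == "ECB":
        if iv is not None or aad is not None:
            raise ValueError("ECB mode supports neither IV nor AAD")
    elif mode == "CBC":
        if aad is not None:
            raise ValueError("CBC mode supports an IV but not AAD")
    elif mode == "GCM":
        pass  # IV, AAD, or both are allowed
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return True
```

Note the API shape above also implies that a GCM caller supplying only AAD passes a null IV, since `iv` precedes `aad` in the positional argument list.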
[jira] [Resolved] (SPARK-43597) Assign a name to the error class _LEGACY_ERROR_TEMP_0017
[ https://issues.apache.org/jira/browse/SPARK-43597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-43597. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41241 [https://github.com/apache/spark/pull/41241] > Assign a name to the error class _LEGACY_ERROR_TEMP_0017 > > > Key: SPARK-43597 > URL: https://issues.apache.org/jira/browse/SPARK-43597 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43597) Assign a name to the error class _LEGACY_ERROR_TEMP_0017
[ https://issues.apache.org/jira/browse/SPARK-43597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-43597: Assignee: BingKun Pan > Assign a name to the error class _LEGACY_ERROR_TEMP_0017 > > > Key: SPARK-43597 > URL: https://issues.apache.org/jira/browse/SPARK-43597 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43717) Scala Client Dataset#reduce failed to handle null partitions for scala primitive types
Zhen Li created SPARK-43717: --- Summary: Scala Client Dataset#reduce failed to handle null partitions for scala primitive types Key: SPARK-43717 URL: https://issues.apache.org/jira/browse/SPARK-43717 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Zhen Li Scala client failed with NPE when running: assert(spark.range(0, 5, 1, 10).as[Long].reduce(_ + _) == 10) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
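The failing example above (`spark.range(0, 5, 1, 10)` spreads five values across ten partitions, so several partitions are empty) illustrates a common reduce pitfall: per-partition partial results can be null for empty partitions and must be skipped before the final combine. A minimal pure-Python sketch of that pattern, with hypothetical names, not the Spark Connect client code:

```python
# Sketch: reduce each partition, skipping empty partitions whose partial
# result would otherwise be null/None and break the final combine step.
from functools import reduce

def reduce_over_partitions(partitions, f):
    partials = [reduce(f, p) for p in partitions if p]  # drop empty partitions
    if not partials:
        raise ValueError("reduce on an empty dataset")
    return reduce(f, partials)
```

With five values 0..4 spread over partitions (some empty), the result is still 10, matching the assertion in the report.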
[jira] [Commented] (SPARK-43333) Name union type members after types
[ https://issues.apache.org/jira/browse/SPARK-43333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725055#comment-17725055 ] Siying Dong commented on SPARK-43333: - https://github.com/apache/spark/pull/41263/ > Name union type members after types > --- > > Key: SPARK-43333 > URL: https://issues.apache.org/jira/browse/SPARK-43333 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.3.2 >Reporter: Jose Gonzalez >Priority: Major > > Spark converts Avro union types into record types, where each member of the > union type corresponds to a field in the record type. The current behaviour > is to name the record fields "member0", "member1", etc, for each member of the > union type. We propose having the option to instead use the member type > name. > The purpose of this is twofold: > # To allow adding or removing types to the union without affecting the > record names of other member types. If the new or removed type is not ordered > last, then existing queries referencing "member2" may need to be rewritten to > reference "member1" or "member3". > # Referencing the type name in the query is more readable than referencing > "member0". > For example, our system produces an avro schema from a Java type structure > where subtyping maps to union types whose members are ordered > lexicographically. Adding a subtype can therefore easily result in all > references to "member2" needing to be updated to "member3". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
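The naming instability described above can be shown with a tiny sketch of the two naming schemes (hypothetical helper, not the Spark Avro converter): index-based `memberN` names shift when a union member is inserted, while type-derived names stay stable.

```python
# Sketch of the proposal: name union-member fields either by position
# ("member0", "member1", ...) or by the member's type name.
def union_field_names(member_type_names, use_type_names=False):
    if use_type_names:
        return list(member_type_names)
    return [f"member{i}" for i in range(len(member_type_names))]
```

Inserting `Bird` before `Cat` and `Dog` moves `Dog` from `member1` to `member2` under the index scheme, so any query referencing `member1` silently changes meaning; the type-name scheme is unaffected.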
[jira] [Created] (SPARK-43716) Revert scala-maven-plugin upgrade
Bjørn Jørgensen created SPARK-43716: --- Summary: Revert scala-maven-plugin upgrade Key: SPARK-43716 URL: https://issues.apache.org/jira/browse/SPARK-43716 Project: Spark Issue Type: Dependency upgrade Components: Build Affects Versions: 3.5.0 Reporter: Bjørn Jørgensen -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42981) Add direct arrow serialization
[ https://issues.apache.org/jira/browse/SPARK-42981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724993#comment-17724993 ] Herman van Hövell commented on SPARK-42981: --- Nope, I need to pick this up again. Will do so in the next few days. > Add direct arrow serialization > -- > > Key: SPARK-42981 > URL: https://issues.apache.org/jira/browse/SPARK-42981 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43647) Maven test failed in ClientE2ETestSuite/CatalogSuite/StreamingQuerySuite without -Phive
[ https://issues.apache.org/jira/browse/SPARK-43647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-43647: - Description: {code:java} build/mvn clean install -DskipTests build/mvn test -pl connector/connect/client/jvm{code} 13 test failed with similar reasons: {code:java} - read and write *** FAILED *** io.grpc.StatusRuntimeException: INTERNAL: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.hive.execution.HiveFileFormat could not be instantiated at io.grpc.Status.asRuntimeException(Status.java:535) at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) at scala.collection.Iterator.toStream(Iterator.scala:1417) at scala.collection.Iterator.toStream$(Iterator.scala:1416) at scala.collection.AbstractIterator.toStream(Iterator.scala:1431) at scala.collection.TraversableOnce.toSeq(TraversableOnce.scala:354) at scala.collection.TraversableOnce.toSeq$(TraversableOnce.scala:354) at scala.collection.AbstractIterator.toSeq(Iterator.scala:1431) at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:489) ... 
{code} was: {code:java} build/mvn clean install -DskipTests build/mvn test -pl connector/connect/server {code} 13 test failed with similar reasons: {code:java} - read and write *** FAILED *** io.grpc.StatusRuntimeException: INTERNAL: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.hive.execution.HiveFileFormat could not be instantiated at io.grpc.Status.asRuntimeException(Status.java:535) at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) at scala.collection.Iterator.toStream(Iterator.scala:1417) at scala.collection.Iterator.toStream$(Iterator.scala:1416) at scala.collection.AbstractIterator.toStream(Iterator.scala:1431) at scala.collection.TraversableOnce.toSeq(TraversableOnce.scala:354) at scala.collection.TraversableOnce.toSeq$(TraversableOnce.scala:354) at scala.collection.AbstractIterator.toSeq(Iterator.scala:1431) at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:489) ... 
{code} > Maven test failed in ClientE2ETestSuite/CatalogSuite/StreamingQuerySuite > without -Phive > --- > > Key: SPARK-43647 > URL: https://issues.apache.org/jira/browse/SPARK-43647 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > > {code:java} > build/mvn clean install -DskipTests > build/mvn test -pl connector/connect/client/jvm{code} > > 13 test failed with similar reasons: > > {code:java} > - read and write *** FAILED *** > io.grpc.StatusRuntimeException: INTERNAL: > org.apache.spark.sql.sources.DataSourceRegister: Provider > org.apache.spark.sql.hive.execution.HiveFileFormat could not be instantiated > at io.grpc.Status.asRuntimeException(Status.java:535) > at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) > at scala.collection.Iterator.toStream(Iterator.scala:1417) > at scala.collection.Iterator.toStream$(Iterator.scala:1416) > at scala.collection.AbstractIterator.toStream(Iterator.scala:1431) > at scala.collection.TraversableOnce.toSeq(TraversableOnce.scala:354) > at scala.collection.TraversableOnce.toSeq$(TraversableOnce.scala:354) > at scala.collection.AbstractIterator.toSeq(Iterator.scala:1431) > at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:489) > ... {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724942#comment-17724942 ] melin commented on SPARK-43338: --- kyuubi verified it: [https://kyuubi.readthedocs.io/en/v1.7.1-rc0/connector/spark/hive.html] kyuubi is implemented based on HiveSessionCatalog. If there are Hudi tables in the Hive database, another Hudi catalog needs to be registered. The same hms has two catalog names, which does not meet my requirements. > Support modify the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages multiple hms metadata > and classifies them by catalogName. A different catalog name is required > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724939#comment-17724939 ] Jia Fan commented on SPARK-43338: - `Assign each hms a unique catalogname only so that the meta tableId is unique: catalog.database.table.` I think the Datasource V2 can do that. But I didn't verify it. > Support modify the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages multiple hms metadata > and classifies them by catalogName. A different catalog name is required > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37351) Supports write data flow control
[ https://issues.apache.org/jira/browse/SPARK-37351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724934#comment-17724934 ] Jia Fan commented on SPARK-37351: - Do you want data flow control in micro-batch or batch mode? This is a big feature; maybe you should create an SPIP and send it to the dev mailing list. And I'm not sure the community will accept this change. cc [~cloud_fan] > Supports write data flow control > > > Key: SPARK-37351 > URL: https://issues.apache.org/jira/browse/SPARK-37351 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: melin >Priority: Major > > The hive table data is written to a relational database, generally an online > production database. If the writing speed has no traffic control, it can > easily affect the stability of the online system. It is recommended to add > traffic control parameters > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
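The traffic-control parameter requested in SPARK-37351 above would typically be backed by a rate limiter such as a token bucket. A minimal sketch, with a clock passed in explicitly for determinism; the class and its parameters are hypothetical, not an existing Spark API:

```python
# Token-bucket sketch: writes acquire tokens at a bounded refill rate,
# capping the sustained write throughput against the target database.
class TokenBucket:
    def __init__(self, rate, capacity, now):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = now

    def try_acquire(self, n, now):
        # Refill based on elapsed time, then spend n tokens if available.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A writer would call `try_acquire(rows_in_batch, time.monotonic())` before each flush and back off when it returns False.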
[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724933#comment-17724933 ] melin commented on SPARK-43338: --- I don't need to access multiple hms in the same SparkSession; I only need to access one of them. Assign each hms a unique catalogname only so that the meta tableId is unique: catalog.database.table. > Support modify the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages multiple hms metadata > and classifies them by catalogName. A different catalog name is required > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
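The scheme discussed in this thread, one unique catalog name per Hive Metastore so that every table identifier `catalog.database.table` is globally unique, can be sketched as a trivial registry. All names here are hypothetical, for illustration only:

```python
# Sketch: one catalog name per metastore; table identifiers are qualified
# with the catalog so the same db.table in two metastores never collides.
hms_catalogs = {
    "hms_east": "thrift://east-metastore:9083",  # hypothetical URIs
    "hms_west": "thrift://west-metastore:9083",
}

def qualified_table_id(catalog, database, table):
    if catalog not in hms_catalogs:
        raise KeyError(f"unknown catalog: {catalog}")
    return f"{catalog}.{database}.{table}"
```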
[jira] [Updated] (SPARK-43715) Add spark DataFrame binary file reader / writer
[ https://issues.apache.org/jira/browse/SPARK-43715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-43715: --- Description: In new distributed spark ML module (designed to support spark connect and support local inference) We need to save ML model to hadoop file system using custom binary file format, the reason is: * We often submit a spark application to spark cluster for running the training model job, we need to save trained model to hadoop file system before the spark application completes. * But we want to support local model inference, that means if we save the model by current spark DataFrame writer (e.g. parquet format), when loading model we have to rely on the spark service. But we hope we can load model without spark service. So we want the model being saved as the original binary format that our ML code can handle. so we need to add a DataFrame reader / writer format, that can load / save binary files, the API is like: {*}Writer API{*}: Supposing we have a dataframe with schema: [file_path: String, content: binary], we can save the dataframe to a hadoop path, each row we will save it as a file under the hadoop path, the saved file path is \{hadoop path}/\{file_path}, "file_path" can be a multiple part path. {*}Reader API{*}: `spark.read.format("binaryFileV2").load(...)` It will return a spark dataframe , each row contains the file path and the file content binary string. was: In new distributed spark ML module (designed to support spark connect and support local inference) We need to save ML model to hadoop file system using custom binary file format, the reason is: * The training model job is a spark job, we need to save trained model to hadoop file sytem after the job completes. * But we want to support local model inference, that means if we save the model by current spark DataFrame writer (e.g. parquet format), when loading model we have to rely on the spark service. 
But we hope we can load model without spark service. So we want the model being saved as the original binary format that our ML code can handle. so we need to add a DataFrame reader / writer format, that can load / save binary files, the API is like: {*}Writer API{*}: Supposing we have a dataframe with schema: [file_path: String, content: binary], we can save the dataframe to a hadoop path, each row we will save it as a file under the hadoop path, the saved file path is \{hadoop path}/\{file_path}, "file_path" can be a multiple part path. {*}Reader API{*}: `spark.read.format("binaryFileV2").load(...)` It will return a spark dataframe , each row contains the file path and the file content binary string. > Add spark DataFrame binary file reader / writer > --- > > Key: SPARK-43715 > URL: https://issues.apache.org/jira/browse/SPARK-43715 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Priority: Major > > In new distributed spark ML module (designed to support spark connect and > support local inference) > We need to save ML model to hadoop file system using custom binary file > format, the reason is: > * We often submit a spark application to spark cluster for running the > training model job, we need to save trained model to hadoop file system > before the spark application completes. > * But we want to support local model inference, that means if we save the > model by current spark DataFrame writer (e.g. parquet format), when loading > model we have to rely on the spark service. But we hope we can load model > without spark service. So we want the model being saved as the original > binary format that our ML code can handle. 
> so we need to add a DataFrame reader / writer format, that can load / save > binary files, the API is like: > > {*}Writer API{*}: > Supposing we have a dataframe with schema: > [file_path: String, content: binary], > we can save the dataframe to a hadoop path, each row we will save it as a > file under the hadoop path, the saved file path is \{hadoop > path}/\{file_path}, "file_path" can be a multiple part path. > > {*}Reader API{*}: > `spark.read.format("binaryFileV2").load(...)` > > It will return a spark dataframe , each row contains the file path and the > file content binary string. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43715) Add spark DataFrame binary file reader / writer
[ https://issues.apache.org/jira/browse/SPARK-43715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-43715: --- Description: In new distributed spark ML module (designed to support spark connect and support local inference) We need to save ML model to hadoop file system using custom binary file format, the reason is: * The training model job is a spark job, we need to save trained model to hadoop file system after the job completes. * But we want to support local model inference, that means if we save the model by current spark DataFrame writer (e.g. parquet format), when loading model we have to rely on the spark service. But we hope we can load model without spark service. So we want the model being saved as the original binary format that our ML code can handle. so we need to add a DataFrame reader / writer format, that can load / save binary files, the API is like: {*}Writer API{*}: Supposing we have a dataframe with schema: [file_path: String, content: binary], we can save the dataframe to a hadoop path, each row we will save it as a file under the hadoop path, the saved file path is \{hadoop path}/\{file_path}, "file_path" can be a multiple part path. {*}Reader API{*}: `spark.read.format("binaryFileV2").load(...)` It will return a spark dataframe, each row contains the file path and the file content binary string. was: In new distributed spark ML module (designed to support spark connect and support local inference) We need to save ML model to hadoop file system using custom binary file format, the reason is: * The training model job is a spark job, we need to save trained model to hadoop file system after the job completes. * But we want to support local model inference, that means if we save the model by current spark DataFrame writer (e.g. parquet format), when loading model we have to rely on the spark service. But we hope we can load model without spark service. 
So we want the model being saved as the original binary format that our ML code can handle. so we need to add a DataFrame reader / writer format, that can load / save binary files, the API is like: {*}Writer API{*}: Supposing we have a dataframe with schema: [file_path: String, content: binary], we can save the dataframe to a hadoop path, each row we will save it as a file under the hadoop path, the saved file path is \{hadoop path}/\{file_path}, "file_path" can be a multiple part path. Reader API: `spark.read.format("binaryFileV2").load(...)` It will return a spark dataframe, each row contains the file path and the file content binary string. > Add spark DataFrame binary file reader / writer > --- > > Key: SPARK-43715 > URL: https://issues.apache.org/jira/browse/SPARK-43715 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Priority: Major > > In new distributed spark ML module (designed to support spark connect and > support local inference) > We need to save ML model to hadoop file system using custom binary file > format, the reason is: > * The training model job is a spark job, we need to save trained model to > hadoop file system after the job completes. > * But we want to support local model inference, that means if we save the > model by current spark DataFrame writer (e.g. parquet format), when loading > model we have to rely on the spark service. But we hope we can load model > without spark service. So we want the model being saved as the original > binary format that our ML code can handle. > so we need to add a DataFrame reader / writer format, that can load / save > binary files, the API is like: > > {*}Writer API{*}: > Supposing we have a dataframe with schema: > [file_path: String, content: binary], > we can save the dataframe to a hadoop path, each row we will save it as a > file under the hadoop path, the saved file path is \{hadoop > path}/\{file_path}, "file_path" can be a multiple part path. 
> > {*}Reader API{*}: > `spark.read.format("binaryFileV2").load(...)` > > It will return a spark dataframe , each row contains the file path and the > file content binary string. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
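The writer semantics proposed in SPARK-43715 above (each `[file_path, content]` row becomes one file under a base path, where `file_path` may be multi-part) can be sketched in plain Python. This is a local-filesystem illustration of the described behaviour, not Spark code; `write_binary_rows` is a hypothetical name, and the `binaryFileV2` format name is taken from the proposal:

```python
# Sketch: write each (file_path, content) row as a binary file under
# base_path, creating intermediate directories for multi-part paths.
import os

def write_binary_rows(base_path, rows):
    written = []
    for file_path, content in rows:
        full = os.path.join(base_path, file_path)  # file_path may be "a/b/c.bin"
        parent = os.path.dirname(full)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(full, "wb") as f:
            f.write(content)
        written.append(full)
    return written
```

A matching reader would walk `base_path` and yield one (relative path, bytes) row per file, which is what makes the saved model loadable without a running Spark service.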