[jira] [Updated] (SPARK-46772) Benchmarking Avro with ZSTD

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46772:
---
Labels: pull-request-available  (was: )

> Benchmarking Avro with ZSTD 
> 
>
> Key: SPARK-46772
> URL: https://issues.apache.org/jira/browse/SPARK-46772
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>
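For context, a minimal PySpark snippet of the kind such a benchmark would
exercise, assuming the spark-avro {{compression}} write option accepts
{{zstandard}} (as in recent releases); the output path is illustrative:

{code}
df = spark.range(1_000_000).selectExpr("id", "uuid() AS payload")

# Write Avro compressed with ZSTD; "zstandard" is the spark-avro codec name.
df.write.format("avro").option("compression", "zstandard").save("/tmp/avro_zstd")

# Read back so decompression cost is measured too.
spark.read.format("avro").load("/tmp/avro_zstd").count()
{code}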







[jira] [Resolved] (SPARK-46808) Refine error classes in Python with automatic sorting function

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46808.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44848
[https://github.com/apache/spark/pull/44848]

> Refine error classes in Python with automatic sorting function
> --
>
> Key: SPARK-46808
> URL: https://issues.apache.org/jira/browse/SPARK-46808
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> There are too many inconsistencies within error_classes, and there's no way to
> automatically generate/sort the error classes. We should make development easier.






[jira] [Assigned] (SPARK-46808) Refine error classes in Python with automatic sorting function

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46808:


Assignee: Hyukjin Kwon

> Refine error classes in Python with automatic sorting function
> --
>
> Key: SPARK-46808
> URL: https://issues.apache.org/jira/browse/SPARK-46808
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> There are too many inconsistencies within error_classes, and there's no way to
> automatically generate/sort the error classes. We should make development easier.






[jira] [Assigned] (SPARK-46807) Include automation notice in SQL error class documents

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46807:


Assignee: Nicholas Chammas

> Include automation notice in SQL error class documents
> --
>
> Key: SPARK-46807
> URL: https://issues.apache.org/jira/browse/SPARK-46807
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46807) Include automation notice in SQL error class documents

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46807.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44847
[https://github.com/apache/spark/pull/44847]

> Include automation notice in SQL error class documents
> --
>
> Key: SPARK-46807
> URL: https://issues.apache.org/jira/browse/SPARK-46807
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45869) Revisit and Improve Spark Standalone Cluster

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45869:
--
Priority: Critical  (was: Major)

> Revisit and Improve Spark Standalone Cluster
> 
>
> Key: SPARK-45869
> URL: https://issues.apache.org/jira/browse/SPARK-45869
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
>  Labels: releasenotes
> Fix For: 4.0.0
>
>
> This is an experimental internal configuration for advanced users.






[jira] [Updated] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46800:
--
Priority: Major  (was: Critical)

> Support `spark.deploy.spreadOutDrivers`
> ---
>
> Key: SPARK-46800
> URL: https://issues.apache.org/jira/browse/SPARK-46800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46800:
--
Priority: Critical  (was: Major)

> Support `spark.deploy.spreadOutDrivers`
> ---
>
> Key: SPARK-46800
> URL: https://issues.apache.org/jira/browse/SPARK-46800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-46718) Upgrade Arrow to 15.0.0

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46718.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44797
[https://github.com/apache/spark/pull/44797]

> Upgrade Arrow to 15.0.0
> ---
>
> Key: SPARK-46718
> URL: https://issues.apache.org/jira/browse/SPARK-46718
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2024-01-15-14-02-57-814.png
>
>
> https://github.com/apache/arrow/releases/tag/apache-arrow-15.0.0






[jira] [Assigned] (SPARK-46718) Upgrade Arrow to 15.0.0

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46718:
-

Assignee: Yang Jie

> Upgrade Arrow to 15.0.0
> ---
>
> Key: SPARK-46718
> URL: https://issues.apache.org/jira/browse/SPARK-46718
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-01-15-14-02-57-814.png
>
>
> https://github.com/apache/arrow/releases/tag/apache-arrow-15.0.0






[jira] [Resolved] (SPARK-46805) Upgrade `scalafmt` to 3.7.17

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46805.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44845
[https://github.com/apache/spark/pull/44845]

> Upgrade `scalafmt` to 3.7.17
> 
>
> Key: SPARK-46805
> URL: https://issues.apache.org/jira/browse/SPARK-46805
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46808) Refine error classes in Python with automatic sorting function

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46808:
-
Parent: SPARK-45673
Issue Type: Sub-task  (was: Test)

> Refine error classes in Python with automatic sorting function
> --
>
> Key: SPARK-46808
> URL: https://issues.apache.org/jira/browse/SPARK-46808
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> There are too many inconsistencies within error_classes, and there's no way to
> automatically generate/sort the error classes. We should make development easier.






[jira] [Updated] (SPARK-46809) Check error message parameter properly

2024-01-22 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-46809:

Description: If an error message parameter from the template is missing in
actual usage, or its name is different, it should raise an exception, but
currently it does not. We should handle this properly.  (was: If an error
message parameter from the template is missing in actual usage, it should raise
an exception, but currently it does not. We should handle this properly.)

> Check error message parameter properly
> --
>
> Key: SPARK-46809
> URL: https://issues.apache.org/jira/browse/SPARK-46809
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> If an error message parameter from the template is missing in actual usage, or
> its name is different, it should raise an exception, but currently it does not.
> We should handle this properly.






[jira] [Updated] (SPARK-46809) Check error message parameter properly

2024-01-22 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-46809:

Summary: Check error message parameter properly  (was: Check missing error 
message parameter properly)

> Check error message parameter properly
> --
>
> Key: SPARK-46809
> URL: https://issues.apache.org/jira/browse/SPARK-46809
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> If an error message parameter from the template is missing in actual usage, it
> should raise an exception, but currently it does not. We should handle this
> properly.






[jira] [Created] (SPARK-46809) Check missing error message parameter properly

2024-01-22 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46809:
---

 Summary: Check missing error message parameter properly
 Key: SPARK-46809
 URL: https://issues.apache.org/jira/browse/SPARK-46809
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


If an error message parameter from the template is missing in actual usage, it
should raise an exception, but currently it does not. We should handle this properly.
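A sketch of the kind of check this implies, using only the standard library:
{{string.Formatter}} can extract the placeholder names from a message template
so they can be compared against the parameters actually supplied (the function
and names below are illustrative, not PySpark's API):

{code}
from string import Formatter

def check_params(template: str, params: dict) -> None:
    # Placeholder names referenced by the template, e.g. "{arg_name}".
    expected = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = expected - params.keys()
    unexpected = params.keys() - expected
    if missing or unexpected:
        raise ValueError(f"message parameter mismatch: "
                         f"missing={sorted(missing)}, unexpected={sorted(unexpected)}")

# Raises: missing=['arg_type'], unexpected=['arg_typ']
check_params("Argument `{arg_name}` should be a {arg_type}.",
             {"arg_name": "tableName", "arg_typ": "str"})
{code}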






[jira] [Resolved] (SPARK-46806) Improve error message for spark.table when argument type is wrong

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46806.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44846
[https://github.com/apache/spark/pull/44846]

> Improve error message for spark.table when argument type is wrong
> -
>
> Key: SPARK-46806
> URL: https://issues.apache.org/jira/browse/SPARK-46806
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> >>> spark.table(None)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../spark/python/pyspark/sql/session.py", line 1710, in table
> return DataFrame(self._jsparkSession.table(tableName), self)
>  
>   File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", 
> line 1322, in __call__
>   File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 215, 
> in deco
> return f(*a, **kw)
>^^^
>   File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 
> 326, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o27.table.
> : java.lang.NullPointerException: Cannot invoke "String.length()" because "s" 
> is null
>   at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222)
>   at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54)
>   at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681)
>   at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
>   at py4j.Gateway.invoke(Gateway.java:282)
> {code}






[jira] [Assigned] (SPARK-46806) Improve error message for spark.table when argument type is wrong

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46806:


Assignee: Hyukjin Kwon

> Improve error message for spark.table when argument type is wrong
> -
>
> Key: SPARK-46806
> URL: https://issues.apache.org/jira/browse/SPARK-46806
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> >>> spark.table(None)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../spark/python/pyspark/sql/session.py", line 1710, in table
> return DataFrame(self._jsparkSession.table(tableName), self)
>  
>   File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", 
> line 1322, in __call__
>   File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 215, 
> in deco
> return f(*a, **kw)
>^^^
>   File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 
> 326, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o27.table.
> : java.lang.NullPointerException: Cannot invoke "String.length()" because "s" 
> is null
>   at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222)
>   at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54)
>   at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681)
>   at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
>   at py4j.Gateway.invoke(Gateway.java:282)
> {code}






[jira] [Updated] (SPARK-46808) Refine error classes in Python with automatic sorting function

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46808:
---
Labels: pull-request-available  (was: )

> Refine error classes in Python with automatic sorting function
> --
>
> Key: SPARK-46808
> URL: https://issues.apache.org/jira/browse/SPARK-46808
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> There are too many inconsistencies within error_classes, and there's no way to
> automatically generate/sort the error classes. We should make development easier.






[jira] [Updated] (SPARK-46807) Include automation notice in SQL error class documents

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46807:
---
Labels: pull-request-available  (was: )

> Include automation notice in SQL error class documents
> --
>
> Key: SPARK-46807
> URL: https://issues.apache.org/jira/browse/SPARK-46807
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46808) Refine error classes in Python with automatic sorting function

2024-01-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46808:


 Summary: Refine error classes in Python with automatic sorting 
function
 Key: SPARK-46808
 URL: https://issues.apache.org/jira/browse/SPARK-46808
 Project: Spark
  Issue Type: Test
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


There are too many inconsistencies within error_classes, and there's no way to
automatically generate/sort the error classes. We should make development easier.
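A minimal sketch of the automation being asked for, assuming the error classes
are kept in a JSON mapping keyed by error-class name (the file name below is
hypothetical):

{code}
import json

PATH = "error-classes.json"  # hypothetical location of the definitions

with open(PATH) as f:
    classes = json.load(f)

# Rewrite with keys sorted so the ordering is generated, not hand-maintained.
with open(PATH, "w") as f:
    json.dump(classes, f, indent=2, sort_keys=True)
    f.write("\n")
{code}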






[jira] [Created] (SPARK-46807) Include automation notice in SQL error class documents

2024-01-22 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-46807:


 Summary: Include automation notice in SQL error class documents
 Key: SPARK-46807
 URL: https://issues.apache.org/jira/browse/SPARK-46807
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Nicholas Chammas









[jira] [Resolved] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46800.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44840
[https://github.com/apache/spark/pull/44840]

> Support `spark.deploy.spreadOutDrivers`
> ---
>
> Key: SPARK-46800
> URL: https://issues.apache.org/jira/browse/SPARK-46800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46806) Improve error message for spark.table when argument type is wrong

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46806:
---
Labels: pull-request-available  (was: )

> Improve error message for spark.table when argument type is wrong
> -
>
> Key: SPARK-46806
> URL: https://issues.apache.org/jira/browse/SPARK-46806
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> >>> spark.table(None)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../spark/python/pyspark/sql/session.py", line 1710, in table
> return DataFrame(self._jsparkSession.table(tableName), self)
>  
>   File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", 
> line 1322, in __call__
>   File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 215, 
> in deco
> return f(*a, **kw)
>^^^
>   File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 
> 326, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o27.table.
> : java.lang.NullPointerException: Cannot invoke "String.length()" because "s" 
> is null
>   at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222)
>   at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54)
>   at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681)
>   at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
>   at py4j.Gateway.invoke(Gateway.java:282)
> {code}






[jira] [Updated] (SPARK-46806) Improve error message for spark.table when argument type is wrong

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46806:
-
Description: 
{code}
>>> spark.table(None)
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../spark/python/pyspark/sql/session.py", line 1710, in table
return DataFrame(self._jsparkSession.table(tableName), self)
 
  File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 
1322, in __call__
  File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 215, in 
deco
return f(*a, **kw)
   ^^^
  File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 
326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.table.
: java.lang.NullPointerException: Cannot invoke "String.length()" because "s" 
is null
at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222)
at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212)
at 
org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58)
at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55)
at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54)
at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
{code}

  was:
{code}
>>> spark.table(None)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/session.py", 
line 1710, in table
return DataFrame(self._jsparkSession.table(tableName), self)
 
  File 
"/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py",
 line 1322, in __call__
  File 
"/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/errors/exceptions/captured.py",
 line 215, in deco
return f(*a, **kw)
   ^^^
  File 
"/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py",
 line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.table.
: java.lang.NullPointerException: Cannot invoke "String.length()" because "s" 
is null
at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222)
at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212)
at 
org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58)
at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55)
at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54)
at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
{code}


> Improve error message for spark.table when argument type is wrong
> -
>
> Key: SPARK-46806
> URL: https://issues.apache.org/jira/browse/SPARK-46806
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> >>> spark.table(None)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../spark/python/pyspark/sql/session.py", line 1710, in table
> return DataFrame(self._jsparkSession.table(tableName), self)
>  

[jira] [Created] (SPARK-46806) Improve error message for spark.table when argument type is wrong

2024-01-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46806:


 Summary: Improve error message for spark.table when argument type 
is wrong
 Key: SPARK-46806
 URL: https://issues.apache.org/jira/browse/SPARK-46806
 Project: Spark
  Issue Type: Test
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
>>> spark.table(None)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/session.py", 
line 1710, in table
return DataFrame(self._jsparkSession.table(tableName), self)
 
  File 
"/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py",
 line 1322, in __call__
  File 
"/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/errors/exceptions/captured.py",
 line 215, in deco
return f(*a, **kw)
   ^^^
  File 
"/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py",
 line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.table.
: java.lang.NullPointerException: Cannot invoke "String.length()" because "s" 
is null
at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:222)
at org.antlr.v4.runtime.CharStreams.fromString(CharStreams.java:212)
at 
org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:58)
at 
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:55)
at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(AbstractSqlParser.scala:54)
at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:681)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:619)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
{code}
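The improvement implied here is to validate the argument on the Python side
before it crosses into the JVM, along these lines (a sketch; the error class
PySpark actually raises may differ):

{code}
def table(self, tableName: str) -> "DataFrame":
    # Fail fast with a clear Python-side error instead of a Java NPE.
    if not isinstance(tableName, str):
        raise TypeError(
            f"Argument `tableName` should be a str, got {type(tableName).__name__}."
        )
    return DataFrame(self._jsparkSession.table(tableName), self)
{code}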






[jira] [Resolved] (SPARK-46803) Remove scala-2.13 profile

2024-01-22 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan resolved SPARK-46803.
-
Resolution: Not A Problem

> Remove scala-2.13 profile
> -
>
> Key: SPARK-46803
> URL: https://issues.apache.org/jira/browse/SPARK-46803
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-46804) Recover the generated documents

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46804:
-

Assignee: Dongjoon Hyun

> Recover the generated documents
> ---
>
> Key: SPARK-46804
> URL: https://issues.apache.org/jira/browse/SPARK-46804
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46804) Recover the generated documents

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46804.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44843
[https://github.com/apache/spark/pull/44843]

> Recover the generated documents
> ---
>
> Key: SPARK-46804
> URL: https://issues.apache.org/jira/browse/SPARK-46804
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-46802) Cleanup codecov script

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46802.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44842
[https://github.com/apache/spark/pull/44842]

> Cleanup codecov script
> --
>
> Key: SPARK-46802
> URL: https://issues.apache.org/jira/browse/SPARK-46802
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We used to use {{coverage_daemon.py}} to track the coverage of the Python
> worker side (SPARK-7721). However, it seems it no longer works. We should
> remove it first and consider other approaches.






[jira] [Assigned] (SPARK-46802) Cleanup codecov script

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46802:


Assignee: Hyukjin Kwon

> Cleanup codecov script
> --
>
> Key: SPARK-46802
> URL: https://issues.apache.org/jira/browse/SPARK-46802
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> We used to use {{coverage_daemon.py}} to track the coverage of the Python
> worker side (SPARK-7721). However, it seems it no longer works. We should
> remove it first and consider other approaches.






[jira] [Updated] (SPARK-46803) Remove scala-2.13 profile

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46803:
---
Labels: pull-request-available  (was: )

> Remove scala-2.13 profile
> -
>
> Key: SPARK-46803
> URL: https://issues.apache.org/jira/browse/SPARK-46803
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46804) Recover the generated documents

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46804:
---
Labels: pull-request-available  (was: )

> Recover the generated documents
> ---
>
> Key: SPARK-46804
> URL: https://issues.apache.org/jira/browse/SPARK-46804
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46804) Recover the generated documents

2024-01-22 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46804:
-

 Summary: Recover the generated documents
 Key: SPARK-46804
 URL: https://issues.apache.org/jira/browse/SPARK-46804
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Created] (SPARK-46803) Remove scala-2.13 profile

2024-01-22 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-46803:
---

 Summary: Remove scala-2.13 profile
 Key: SPARK-46803
 URL: https://issues.apache.org/jira/browse/SPARK-46803
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Updated] (SPARK-46802) Cleanup codecov script

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46802:
---
Labels: pull-request-available  (was: )

> Cleanup codecov script
> --
>
> Key: SPARK-46802
> URL: https://issues.apache.org/jira/browse/SPARK-46802
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> We used to use {{coverage_daemon.py}} to track the coverage of the Python
> worker side (SPARK-7721). However, it seems it no longer works. We should
> remove it first and consider other approaches.






[jira] [Resolved] (SPARK-46801) Do not treat exit 5 as a test failure in Python testing script

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46801.
---
Fix Version/s: 3.4.3
   3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44841
[https://github.com/apache/spark/pull/44841]

> Do not treat exit 5 as a test failure in Python testing script
> --
>
> Key: SPARK-46801
> URL: https://issues.apache.org/jira/browse/SPARK-46801
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.3, 3.5.1, 4.0.0
>
>
> {code}
> 
> Running PySpark tests
> 
> Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
> Will test against the following Python executables: ['python3.12']
> Will test the following Python modules: ['pyspark-core', 'pyspark-streaming', 
> 'pyspark-errors']
> python3.12 python_implementation is CPython
> python3.12 version is: Python 3.12.1
> Starting test(python3.12): pyspark.streaming.tests.test_context (temp output: 
> /__w/spark/spark/python/target/8674ed86-36bd-47d1-863b-abb0405557f6/python3.12__pyspark.streaming.tests.test_context__umu69c3v.log)
> Finished test(python3.12): pyspark.streaming.tests.test_context (12s)
> Starting test(python3.12): pyspark.streaming.tests.test_dstream (temp output: 
> /__w/spark/spark/python/target/847eb56b-3c5f-49ab-8a83-3326bb96bc5d/python3.12__pyspark.streaming.tests.test_dstream__rorhk0lc.log)
> Finished test(python3.12): pyspark.streaming.tests.test_dstream (102s)
> Starting test(python3.12): pyspark.streaming.tests.test_kinesis (temp output: 
> /__w/spark/spark/python/target/78f23c83-c24d-4fa1-abbd-edb90f48dff1/python3.12__pyspark.streaming.tests.test_kinesis__q5l1pv0h.log)
> test_kinesis_stream 
> (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream) 
> ... skipped "Skipping all Kinesis Python tests as environmental variable 
> 'ENABLE_KINESIS_TESTS' was not set."
> test_kinesis_stream_api 
> (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream_api)
>  ... skipped "Skipping all Kinesis Python tests as environmental variable 
> 'ENABLE_KINESIS_TESTS' was not set."
> --
> Ran 0 tests in 0.000s
> NO TESTS RAN (skipped=2)
> Had test failures in pyspark.streaming.tests.test_kinesis with python3.12; 
> see logs.
> Error:  running /__w/spark/spark/python/run-tests 
> --modules=pyspark-core,pyspark-streaming,pyspark-errors --parallelism=1 
> --python-executables=python3.12 ; received return code 255
> Error: Process completed with exit code 19.
> {code}
> The scheduled job fails because of exit code 5, pytest's "no tests collected"
> exit status; see https://github.com/pytest-dev/pytest/issues/2393. This isn't
> a test failure.






[jira] [Assigned] (SPARK-46801) Do not treat exit 5 as a test failure in Python testing script

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46801:
-

Assignee: Hyukjin Kwon

> Do not treat exit 5 as a test failure in Python testing script
> --
>
> Key: SPARK-46801
> URL: https://issues.apache.org/jira/browse/SPARK-46801
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> 
> Running PySpark tests
> 
> Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
> Will test against the following Python executables: ['python3.12']
> Will test the following Python modules: ['pyspark-core', 'pyspark-streaming', 
> 'pyspark-errors']
> python3.12 python_implementation is CPython
> python3.12 version is: Python 3.12.1
> Starting test(python3.12): pyspark.streaming.tests.test_context (temp output: 
> /__w/spark/spark/python/target/8674ed86-36bd-47d1-863b-abb0405557f6/python3.12__pyspark.streaming.tests.test_context__umu69c3v.log)
> Finished test(python3.12): pyspark.streaming.tests.test_context (12s)
> Starting test(python3.12): pyspark.streaming.tests.test_dstream (temp output: 
> /__w/spark/spark/python/target/847eb56b-3c5f-49ab-8a83-3326bb96bc5d/python3.12__pyspark.streaming.tests.test_dstream__rorhk0lc.log)
> Finished test(python3.12): pyspark.streaming.tests.test_dstream (102s)
> Starting test(python3.12): pyspark.streaming.tests.test_kinesis (temp output: 
> /__w/spark/spark/python/target/78f23c83-c24d-4fa1-abbd-edb90f48dff1/python3.12__pyspark.streaming.tests.test_kinesis__q5l1pv0h.log)
> test_kinesis_stream 
> (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream) 
> ... skipped "Skipping all Kinesis Python tests as environmental variable 
> 'ENABLE_KINESIS_TESTS' was not set."
> test_kinesis_stream_api 
> (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream_api)
>  ... skipped "Skipping all Kinesis Python tests as environmental variable 
> 'ENABLE_KINESIS_TESTS' was not set."
> --
> Ran 0 tests in 0.000s
> NO TESTS RAN (skipped=2)
> Had test failures in pyspark.streaming.tests.test_kinesis with python3.12; 
> see logs.
> Error:  running /__w/spark/spark/python/run-tests 
> --modules=pyspark-core,pyspark-streaming,pyspark-errors --parallelism=1 
> --python-executables=python3.12 ; received return code 255
> Error: Process completed with exit code 19.
> {code}
> The scheduled job fails because of exit code 5, pytest's "no tests collected"
> exit status; see https://github.com/pytest-dev/pytest/issues/2393. This isn't
> a test failure.






[jira] [Created] (SPARK-46802) Cleanup codecov script

2024-01-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46802:


 Summary: Cleanup codecov script
 Key: SPARK-46802
 URL: https://issues.apache.org/jira/browse/SPARK-46802
 Project: Spark
  Issue Type: Test
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


We used to use {{coverage_daemon.py}} to track the coverage of the Python
worker side (SPARK-7721). However, it seems it no longer works. We should
remove it first and consider other approaches.






[jira] [Assigned] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46800:
-

Assignee: Dongjoon Hyun

> Support `spark.deploy.spreadOutDrivers`
> ---
>
> Key: SPARK-46800
> URL: https://issues.apache.org/jira/browse/SPARK-46800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46801) Do not treat exit 5 as a test failure in Python testing script

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46801:
---
Labels: pull-request-available  (was: )

> Do not treat exit 5 as a test failure in Python testing script
> --
>
> Key: SPARK-46801
> URL: https://issues.apache.org/jira/browse/SPARK-46801
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> 
> Running PySpark tests
> 
> Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
> Will test against the following Python executables: ['python3.12']
> Will test the following Python modules: ['pyspark-core', 'pyspark-streaming', 
> 'pyspark-errors']
> python3.12 python_implementation is CPython
> python3.12 version is: Python 3.12.1
> Starting test(python3.12): pyspark.streaming.tests.test_context (temp output: 
> /__w/spark/spark/python/target/8674ed86-36bd-47d1-863b-abb0405557f6/python3.12__pyspark.streaming.tests.test_context__umu69c3v.log)
> Finished test(python3.12): pyspark.streaming.tests.test_context (12s)
> Starting test(python3.12): pyspark.streaming.tests.test_dstream (temp output: 
> /__w/spark/spark/python/target/847eb56b-3c5f-49ab-8a83-3326bb96bc5d/python3.12__pyspark.streaming.tests.test_dstream__rorhk0lc.log)
> Finished test(python3.12): pyspark.streaming.tests.test_dstream (102s)
> Starting test(python3.12): pyspark.streaming.tests.test_kinesis (temp output: 
> /__w/spark/spark/python/target/78f23c83-c24d-4fa1-abbd-edb90f48dff1/python3.12__pyspark.streaming.tests.test_kinesis__q5l1pv0h.log)
> test_kinesis_stream 
> (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream) 
> ... skipped "Skipping all Kinesis Python tests as environmental variable 
> 'ENABLE_KINESIS_TESTS' was not set."
> test_kinesis_stream_api 
> (pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream_api)
>  ... skipped "Skipping all Kinesis Python tests as environmental variable 
> 'ENABLE_KINESIS_TESTS' was not set."
> --
> Ran 0 tests in 0.000s
> NO TESTS RAN (skipped=2)
> Had test failures in pyspark.streaming.tests.test_kinesis with python3.12; 
> see logs.
> Error:  running /__w/spark/spark/python/run-tests 
> --modules=pyspark-core,pyspark-streaming,pyspark-errors --parallelism=1 
> --python-executables=python3.12 ; received return code 255
> Error: Process completed with exit code 19.
> {code}
> The scheduled job fails because of exit code 5, pytest's "no tests collected"
> exit status; see https://github.com/pytest-dev/pytest/issues/2393. This isn't
> a test failure.






[jira] [Updated] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46800:
---
Labels: pull-request-available  (was: )

> Support `spark.deploy.spreadOutDrivers`
> ---
>
> Key: SPARK-46800
> URL: https://issues.apache.org/jira/browse/SPARK-46800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46801) Do not treat exit 5 as a test failure in Python testing script

2024-01-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46801:


 Summary: Do not treat exit 5 as a test failure in Python testing 
script
 Key: SPARK-46801
 URL: https://issues.apache.org/jira/browse/SPARK-46801
 Project: Spark
  Issue Type: Test
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}

Running PySpark tests

Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.12']
Will test the following Python modules: ['pyspark-core', 'pyspark-streaming', 
'pyspark-errors']
python3.12 python_implementation is CPython
python3.12 version is: Python 3.12.1
Starting test(python3.12): pyspark.streaming.tests.test_context (temp output: 
/__w/spark/spark/python/target/8674ed86-36bd-47d1-863b-abb0405557f6/python3.12__pyspark.streaming.tests.test_context__umu69c3v.log)
Finished test(python3.12): pyspark.streaming.tests.test_context (12s)
Starting test(python3.12): pyspark.streaming.tests.test_dstream (temp output: 
/__w/spark/spark/python/target/847eb56b-3c5f-49ab-8a83-3326bb96bc5d/python3.12__pyspark.streaming.tests.test_dstream__rorhk0lc.log)
Finished test(python3.12): pyspark.streaming.tests.test_dstream (102s)
Starting test(python3.12): pyspark.streaming.tests.test_kinesis (temp output: 
/__w/spark/spark/python/target/78f23c83-c24d-4fa1-abbd-edb90f48dff1/python3.12__pyspark.streaming.tests.test_kinesis__q5l1pv0h.log)
test_kinesis_stream 
(pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream) 
... skipped "Skipping all Kinesis Python tests as environmental variable 
'ENABLE_KINESIS_TESTS' was not set."
test_kinesis_stream_api 
(pyspark.streaming.tests.test_kinesis.KinesisStreamTests.test_kinesis_stream_api)
 ... skipped "Skipping all Kinesis Python tests as environmental variable 
'ENABLE_KINESIS_TESTS' was not set."

--
Ran 0 tests in 0.000s

NO TESTS RAN (skipped=2)

Had test failures in pyspark.streaming.tests.test_kinesis with python3.12; see 
logs.
Error:  running /__w/spark/spark/python/run-tests 
--modules=pyspark-core,pyspark-streaming,pyspark-errors --parallelism=1 
--python-executables=python3.12 ; received return code 255
Error: Process completed with exit code 19.
{code}

The scheduled job fails because of exit code 5, pytest's "no tests collected"
exit status; see https://github.com/pytest-dev/pytest/issues/2393. This isn't a
test failure.
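The corresponding change in the test-runner logic is to treat pytest's exit
code 5 ("no tests collected") as success, roughly like this (a sketch, not the
actual run-tests code):

{code}
import subprocess

retcode = subprocess.call(["python3.12", "-m", "pytest",
                           "python/pyspark/streaming/tests/test_kinesis.py"])

# pytest exits with 5 when every test was skipped/deselected and none ran;
# for conditionally enabled suites (e.g. Kinesis) that is not a failure.
if retcode not in (0, 5):
    raise SystemExit(f"test run failed with exit code {retcode}")
{code}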






[jira] [Created] (SPARK-46800) Support `spark.deploy.spreadOutDrivers`

2024-01-22 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46800:
-

 Summary: Support `spark.deploy.spreadOutDrivers`
 Key: SPARK-46800
 URL: https://issues.apache.org/jira/browse/SPARK-46800
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun
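Presumably configured like the existing {{spark.deploy.spreadOut}}, i.e. read
by the standalone Master process (a guess at usage; the default value is not
stated here):

{code}
# conf/spark-defaults.conf on the standalone Master
spark.deploy.spreadOutDrivers  true
{code}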









[jira] [Updated] (SPARK-46799) Improve `MasterSuite` to use nanoTime-based appIDs and workerIDs

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46799:
---
Labels: pull-request-available  (was: )

> Improve `MasterSuite` to use nanoTime-based appIDs and workerIDs
> 
>
> Key: SPARK-46799
> URL: https://issues.apache.org/jira/browse/SPARK-46799
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46797) Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46797.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44838
[https://github.com/apache/spark/pull/44838]

> Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps
> ---
>
> Key: SPARK-46797
> URL: https://issues.apache.org/jira/browse/SPARK-46797
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46781) Test custom data source and input partition (pyspark.sql.datasource)

2024-01-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46781.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44808
[https://github.com/apache/spark/pull/44808]

> Test custom data source and input partition (pyspark.sql.datasource)
> 
>
> Key: SPARK-46781
> URL: https://issues.apache.org/jira/browse/SPARK-46781
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Test custom data source and input partition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46798) Kafka custom partition location assignment in Spark Structured Streaming (rack awareness)

2024-01-22 Thread Randall Schwager (Jira)
Randall Schwager created SPARK-46798:


 Summary: Kafka custom partition location assignment in Spark 
Structured Streaming (rack awareness)
 Key: SPARK-46798
 URL: https://issues.apache.org/jira/browse/SPARK-46798
 Project: Spark
  Issue Type: New Feature
  Components: Structured Streaming
Affects Versions: 3.5.0, 3.4.0, 3.3.0, 3.2.0, 3.1.0
Reporter: Randall Schwager


SPARK-15406 added Kafka consumer support to Spark Structured Streaming, but it 
did not add custom partition location assignment as a feature. The Structured 
Streaming Kafka consumer as it exists today evenly allocates Kafka topic 
partitions to executors without regard to Kafka broker rack information or 
executor location. This behavior can drive large cross-AZ networking costs in 
large deployments.

In the [Design 
Doc|https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit#heading=h.k36c6oyz89xw]
 for SPARK-15406, the ability to assign Kafka partitions to particular 
executors (a feature which would enable rack awareness) was discussed, but 
never implemented.

For DStreams users, there does seem to be a way to assign Kafka partitions to 
Spark executors in a custom fashion: 
[LocationStrategies.PreferFixed|https://github.com/apache/spark/blob/master/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/LocationStrategy.scala#L69].

I'd like to propose, and implement if approved, support for custom partition 
location assignment. Please find the design doc describing the change 
[here|https://docs.google.com/document/d/1RoEk_mt8AUh9sTQZ1NfzIuuYKf1zx6BP1K3IlJ2b8iM/edit#heading=h.pbt6pdb2jt5c]
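
For intuition only, a toy rack-aware assignment in plain Python (not a Spark or 
Kafka API; all names are illustrative): prefer executors whose rack/AZ matches 
the partition leader's rack, falling back to round-robin across all executors.

{code}
from itertools import cycle

# Toy sketch: assign each Kafka partition to an executor in the same rack/AZ
# as the partition's leader broker when one exists.
def assign(partition_racks, executor_racks):
    by_rack = {}
    for executor, rack in sorted(executor_racks.items()):
        by_rack.setdefault(rack, []).append(executor)
    rack_cycles = {rack: cycle(execs) for rack, execs in by_rack.items()}
    any_executor = cycle(sorted(executor_racks))
    return {
        partition: next(rack_cycles.get(rack, any_executor))
        for partition, rack in sorted(partition_racks.items())
    }

print(assign({"t-0": "az1", "t-1": "az2"}, {"exec1": "az1", "exec2": "az2"}))
# {'t-0': 'exec1', 't-1': 'exec2'} -- no cross-AZ traffic for these reads
{code}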
 






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46797) Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46797:
--
Parent: SPARK-45869
Issue Type: Sub-task  (was: Improvement)

> Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps
> ---
>
> Key: SPARK-46797
> URL: https://issues.apache.org/jira/browse/SPARK-46797
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46797) Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46797:
-

Assignee: Dongjoon Hyun

> Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps
> ---
>
> Key: SPARK-46797
> URL: https://issues.apache.org/jira/browse/SPARK-46797
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46797) Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46797:
---
Labels: pull-request-available  (was: )

> Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps
> ---
>
> Key: SPARK-46797
> URL: https://issues.apache.org/jira/browse/SPARK-46797
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46797) Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps

2024-01-22 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46797:
-

 Summary: Rename spark.deploy.spreadOut to 
spark.deploy.spreadOutApps
 Key: SPARK-46797
 URL: https://issues.apache.org/jira/browse/SPARK-46797
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46796) RocksDB versionID Mismatch in SST files

2024-01-22 Thread Bhuwan Sahni (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809657#comment-17809657
 ] 

Bhuwan Sahni commented on SPARK-46796:
--

PR created - [https://github.com/apache/spark/pull/44837]

> RocksDB versionID Mismatch in SST files
> ---
>
> Key: SPARK-46796
> URL: https://issues.apache.org/jira/browse/SPARK-46796
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.2, 3.4.1, 3.5.0, 4.0.0, 3.5.1, 3.5.2
>Reporter: Bhuwan Sahni
>Priority: Major
>  Labels: pull-request-available
>
> We need to ensure that the correct SST files are used on the executor during 
> RocksDB load, as per the mapping in metadata.zip. With the current 
> implementation, it's possible that the executor uses an SST file (with a 
> different UUID) from an older version that is not the exact file mapped in 
> metadata.zip. This can cause version ID mismatch errors while loading 
> RocksDB, leading to streaming query failures.
> A few scenarios in which such a situation can occur:
> **Scenario 1 - Distributed file system does not support overwrite 
> functionality**
>  # A task T1 on executor A commits a RocksDB snapshot for version X.
>  # Another task T2 on executor A loads version X-1 and tries to commit X. 
> During the commit, SST files are copied but the metadata file is not 
> overwritten.
>  # Task T3 is scheduled on A; it reuses the previously loaded version X 
> (loaded in (2) above) and commits X+1.
>  # Task T4 is scheduled on A again for state store version X. The executor 
> deletes the SST files corresponding to commit X+1, downloads the metadata for 
> version X (which was committed in task T1), and loads RocksDB. This fails 
> because the metadata in (1) is not compatible with the SST files in (2).
>  
> **Scenario 2 - Multiple older state versions have different DFS files for a 
> particular SST file.**
> In the current logic, we look at all the versions older than X to find 
> whether a local SST file can be reused. The reuse logic only ensures that the 
> local SST file was present in some previous version. However, it's possible 
> that two different older versions uploaded different DFS files 
> (`0001-uuid1.sst` and `0001-uuid2.sst`) for the same local SST file. These 
> files have the same local name (with the UUID truncated) and size, but are 
> not compatible due to different RocksDB version IDs. We need to ensure that 
> the correct SST file (as per the UUID) is picked, as recorded in metadata.zip.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46796) RocksDB versionID Mismatch in SST files

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46796:
---
Labels: pull-request-available  (was: )

> RocksDB versionID Mismatch in SST files
> ---
>
> Key: SPARK-46796
> URL: https://issues.apache.org/jira/browse/SPARK-46796
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.2, 3.4.1, 3.5.0, 4.0.0, 3.5.1, 3.5.2
>Reporter: Bhuwan Sahni
>Priority: Major
>  Labels: pull-request-available
>
> We need to ensure that the correct SST files are used on the executor during 
> RocksDB load, as per the mapping in metadata.zip. With the current 
> implementation, it's possible that the executor uses an SST file (with a 
> different UUID) from an older version that is not the exact file mapped in 
> metadata.zip. This can cause version ID mismatch errors while loading 
> RocksDB, leading to streaming query failures.
> A few scenarios in which such a situation can occur:
> **Scenario 1 - Distributed file system does not support overwrite 
> functionality**
>  # A task T1 on executor A commits a RocksDB snapshot for version X.
>  # Another task T2 on executor A loads version X-1 and tries to commit X. 
> During the commit, SST files are copied but the metadata file is not 
> overwritten.
>  # Task T3 is scheduled on A; it reuses the previously loaded version X 
> (loaded in (2) above) and commits X+1.
>  # Task T4 is scheduled on A again for state store version X. The executor 
> deletes the SST files corresponding to commit X+1, downloads the metadata for 
> version X (which was committed in task T1), and loads RocksDB. This fails 
> because the metadata in (1) is not compatible with the SST files in (2).
>  
> **Scenario 2 - Multiple older state versions have different DFS files for a 
> particular SST file.**
> In the current logic, we look at all the versions older than X to find 
> whether a local SST file can be reused. The reuse logic only ensures that the 
> local SST file was present in some previous version. However, it's possible 
> that two different older versions uploaded different DFS files 
> (`0001-uuid1.sst` and `0001-uuid2.sst`) for the same local SST file. These 
> files have the same local name (with the UUID truncated) and size, but are 
> not compatible due to different RocksDB version IDs. We need to ensure that 
> the correct SST file (as per the UUID) is picked, as recorded in metadata.zip.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46687) Implement memory-profiler

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46687:
---
Labels: pull-request-available  (was: )

> Implement memory-profiler
> -
>
> Key: SPARK-46687
> URL: https://issues.apache.org/jira/browse/SPARK-46687
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46796) RocksDB versionID Mismatch in SST files

2024-01-22 Thread Bhuwan Sahni (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809634#comment-17809634
 ] 

Bhuwan Sahni commented on SPARK-46796:
--

Working on a PR for the fix.

> RocksDB versionID Mismatch in SST files
> ---
>
> Key: SPARK-46796
> URL: https://issues.apache.org/jira/browse/SPARK-46796
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.2, 3.4.1, 3.5.0, 4.0.0, 3.5.1, 3.5.2
>Reporter: Bhuwan Sahni
>Priority: Major
>
> We need to ensure that the correct SST files are used on the executor during 
> RocksDB load, as per the mapping in metadata.zip. With the current 
> implementation, it's possible that the executor uses an SST file (with a 
> different UUID) from an older version that is not the exact file mapped in 
> metadata.zip. This can cause version ID mismatch errors while loading 
> RocksDB, leading to streaming query failures.
> A few scenarios in which such a situation can occur:
> **Scenario 1 - Distributed file system does not support overwrite 
> functionality**
>  # A task T1 on executor A commits a RocksDB snapshot for version X.
>  # Another task T2 on executor A loads version X-1 and tries to commit X. 
> During the commit, SST files are copied but the metadata file is not 
> overwritten.
>  # Task T3 is scheduled on A; it reuses the previously loaded version X 
> (loaded in (2) above) and commits X+1.
>  # Task T4 is scheduled on A again for state store version X. The executor 
> deletes the SST files corresponding to commit X+1, downloads the metadata for 
> version X (which was committed in task T1), and loads RocksDB. This fails 
> because the metadata in (1) is not compatible with the SST files in (2).
>  
> **Scenario 2 - Multiple older state versions have different DFS files for a 
> particular SST file.**
> In the current logic, we look at all the versions older than X to find 
> whether a local SST file can be reused. The reuse logic only ensures that the 
> local SST file was present in some previous version. However, it's possible 
> that two different older versions uploaded different DFS files 
> (`0001-uuid1.sst` and `0001-uuid2.sst`) for the same local SST file. These 
> files have the same local name (with the UUID truncated) and size, but are 
> not compatible due to different RocksDB version IDs. We need to ensure that 
> the correct SST file (as per the UUID) is picked, as recorded in metadata.zip.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46796) RocksDB versionID Mismatch in SST files

2024-01-22 Thread Bhuwan Sahni (Jira)
Bhuwan Sahni created SPARK-46796:


 Summary: RocksDB versionID Mismatch in SST files
 Key: SPARK-46796
 URL: https://issues.apache.org/jira/browse/SPARK-46796
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 3.5.0, 3.4.1, 3.4.2, 4.0.0, 3.5.1, 3.5.2
Reporter: Bhuwan Sahni


We need to ensure that the correct SST files are used on the executor during 
RocksDB load, as per the mapping in metadata.zip. With the current 
implementation, it's possible that the executor uses an SST file (with a 
different UUID) from an older version that is not the exact file mapped in 
metadata.zip. This can cause version ID mismatch errors while loading RocksDB, 
leading to streaming query failures.

A few scenarios in which such a situation can occur:

**Scenario 1 - Distributed file system does not support overwrite 
functionality**
 # A task T1 on executor A commits a RocksDB snapshot for version X.
 # Another task T2 on executor A loads version X-1 and tries to commit X. 
During the commit, SST files are copied but the metadata file is not 
overwritten.
 # Task T3 is scheduled on A; it reuses the previously loaded version X (loaded 
in (2) above) and commits X+1.
 # Task T4 is scheduled on A again for state store version X. The executor 
deletes the SST files corresponding to commit X+1, downloads the metadata for 
version X (which was committed in task T1), and loads RocksDB. This fails 
because the metadata in (1) is not compatible with the SST files in (2).

**Scenario 2 - Multiple older state versions have different DFS files for a 
particular SST file.**

In the current logic, we look at all the versions older than X to find whether 
a local SST file can be reused. The reuse logic only ensures that the local SST 
file was present in some previous version. However, it's possible that two 
different older versions uploaded different DFS files (`0001-uuid1.sst` and 
`0001-uuid2.sst`) for the same local SST file. These files have the same local 
name (with the UUID truncated) and size, but are not compatible due to 
different RocksDB version IDs. We need to ensure that the correct SST file (as 
per the UUID) is picked, as recorded in metadata.zip.
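
A minimal sketch of the stricter reuse check implied above (hypothetical 
function and variable names, not the actual state store code): a local SST file 
may only be reused when it came from the exact DFS file, UUID included, that 
metadata.zip maps it to.

{code}
# Hypothetical sketch: reuse a local SST file only if it corresponds to the
# exact DFS file (UUID included) that metadata.zip records for the version
# being loaded.
def can_reuse(local_sst, local_to_dfs, metadata_mapping):
    expected = metadata_mapping.get(local_sst)  # e.g. "0001-uuid1.sst"
    actual = local_to_dfs.get(local_sst)        # DFS file the local copy came from
    return expected is not None and actual == expected

# Same local name and size, but different DFS UUIDs -> must not be reused:
assert not can_reuse(
    "0001.sst",
    {"0001.sst": "0001-uuid2.sst"},  # local copy originated from another version
    {"0001.sst": "0001-uuid1.sst"},  # metadata.zip for version X expects uuid1
)
{code}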



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46779) Grouping by subquery with a cached relation can fail

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46779:
-

Assignee: Bruce Robbins

> Grouping by subquery with a cached relation can fail
> 
>
> Key: SPARK-46779
> URL: https://issues.apache.org/jira/browse/SPARK-46779
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 3.5.0, 4.0.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
>
> Example:
> {noformat}
> create or replace temp view data(c1, c2) as values
> (1, 2),
> (1, 3),
> (3, 7),
> (4, 5);
> cache table data;
> select c1, (select count(*) from data d1 where d1.c1 = d2.c1), count(c2) from 
> data d2 group by all;
> {noformat}
> It fails with the following error:
> {noformat}
> [INTERNAL_ERROR] Couldn't find count(1)#163L in 
> [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000
> org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find count(1)#163L 
> in [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000
> {noformat}
> If you don't cache the view, the query succeeds.
> Note, in 3.4.2 and 3.5.0 the issue happens only with cached tables, not 
> cached views. I think that's because cached views were not getting properly 
> deduplicated in those versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46779) Grouping by subquery with a cached relation can fail

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46779.
---
Fix Version/s: 3.4.3
   3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44806
[https://github.com/apache/spark/pull/44806]

> Grouping by subquery with a cached relation can fail
> 
>
> Key: SPARK-46779
> URL: https://issues.apache.org/jira/browse/SPARK-46779
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 3.5.0, 4.0.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.3, 3.5.1, 4.0.0
>
>
> Example:
> {noformat}
> create or replace temp view data(c1, c2) as values
> (1, 2),
> (1, 3),
> (3, 7),
> (4, 5);
> cache table data;
> select c1, (select count(*) from data d1 where d1.c1 = d2.c1), count(c2) from 
> data d2 group by all;
> {noformat}
> It fails with the following error:
> {noformat}
> [INTERNAL_ERROR] Couldn't find count(1)#163L in 
> [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000
> org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find count(1)#163L 
> in [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000
> {noformat}
> If you don't cache the view, the query succeeds.
> Note, in 3.4.2 and 3.5.0 the issue happens only with cached tables, not 
> cached views. I think that's because cached views were not getting properly 
> deduplicated in those versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46795) Replace UnsupportedOperationException by SparkUnsupportedOperationException in sql/core

2024-01-22 Thread Max Gekk (Jira)
Max Gekk created SPARK-46795:


 Summary: Replace UnsupportedOperationException by 
SparkUnsupportedOperationException in sql/core
 Key: SPARK-46795
 URL: https://issues.apache.org/jira/browse/SPARK-46795
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk
 Fix For: 4.0.0


Replace all UnsupportedOperationException by SparkUnsupportedOperationException 
in Catalyst code base, and introduce new legacy error classes with the 
_LEGACY_ERROR_TEMP_ prefix.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46795) Replace UnsupportedOperationException by SparkUnsupportedOperationException in sql/core

2024-01-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-46795:
-
Description: Replace all UnsupportedOperationException by 
SparkUnsupportedOperationException in sql/core code base, and introduce new 
legacy error classes with the _LEGACY_ERROR_TEMP_ prefix.  (was: Replace all 
UnsupportedOperationException by SparkUnsupportedOperationException in Catalyst 
code base, and introduce new legacy error classes with the _LEGACY_ERROR_TEMP_ 
prefix.)

> Replace UnsupportedOperationException by SparkUnsupportedOperationException 
> in sql/core
> ---
>
> Key: SPARK-46795
> URL: https://issues.apache.org/jira/browse/SPARK-46795
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Replace all UnsupportedOperationException by 
> SparkUnsupportedOperationException in sql/core code base, and introduce new 
> legacy error classes with the _LEGACY_ERROR_TEMP_ prefix.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar

2024-01-22 Thread nirav patel (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nirav patel updated SPARK-46762:

Summary: Spark Connect 3.5 Classloading issue with external jar  (was: 
Spark Connect 3.5 Classloading issue)

> Spark Connect 3.5 Classloading issue with external jar
> --
>
> Key: SPARK-46762
> URL: https://issues.apache.org/jira/browse/SPARK-46762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: nirav patel
>Priority: Major
>
> We are seeing the following `java.lang.ClassCastException` error in Spark 
> executors when using spark-connect 3.5 with an external Spark SQL catalog 
> jar, iceberg-spark-runtime-3.5_2.12-1.4.3.jar.
> We also set "spark.executor.userClassPathFirst=true"; otherwise the child 
> class gets loaded by MutableURLClassLoader and the parent class by 
> ChildFirstURLClassLoader, which causes a ClassCastException as well.
>  
> {code:java}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): 
> java.lang.ClassCastException: class 
> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to 
> class org.apache.iceberg.Table 
> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed 
> module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; 
> org.apache.iceberg.Table is in unnamed module of loader 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
>     at 
> org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>     at 
> org.apache.iceberg.spark.source.RowDataReader.(RowDataReader.java:50)
>     at 
> org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>     at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
>     at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
>     at org.apache.spark.scheduler.Task.run(Task.scala:141)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>     at org.apach...{code}
>  
> `org.apache.iceberg.spark.source.SerializableTableWithSize` is a subtype of 
> `org.apache.iceberg.Table`, and both are in a single jar, 
> `iceberg-spark-runtime-3.5_2.12-1.4.3.jar`.
> We verified that only one copy of `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` 
> is loaded when the spark-connect server is started.
> Looking further into the error, it seems the classloader itself is 
> instantiated multiple times somewhere; I can see two instances: 
> org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943.
>  
> *Affected version:*
> Spark 3.5 with spark-connect_2.12:3.5.0
>  
> *Not affected versions and variations:*
> Spark 3.4 with spark-connect_2.12:3.4.0 works fine with the external jar.
> It also works with the Spark 3.5 spark-submit script directly (i.e., without 
> using spark-connect 3.5).
>  
> The issue has been opened with Iceberg as well: 
> [https://github.com/apache/iceberg/issues/8978]
> and discussed on dev@org.apache.iceberg: 
> [https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1]
>  
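
A minimal repro sketch of the reported setup (hypothetical host, catalog, and 
table names; assumes the Iceberg runtime jar and catalog are configured on the 
spark-connect server):

{code}
# Hypothetical repro: any action that runs tasks on executors triggers the
# ClassCastException above when the classloader is duplicated.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://spark35-m:15002").getOrCreate()
spark.table("my_iceberg_catalog.db.events").limit(10).show()
{code}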



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46763) ReplaceDeduplicateWithAggregate fails when non-grouping keys have duplicate attributes

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46763:
---
Labels: pull-request-available  (was: )

> ReplaceDeduplicateWithAggregate fails when non-grouping keys have duplicate 
> attributes
> --
>
> Key: SPARK-46763
> URL: https://issues.apache.org/jira/browse/SPARK-46763
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.5.0
>Reporter: Nikhil Sheoran
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45404) Support AWS_ENDPOINT_URL env variable

2024-01-22 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809536#comment-17809536
 ] 

Steve Loughran commented on SPARK-45404:


Just saw this while working on SPARK-35878. 

If you are copying endpoints then you may also want to think about picking up 
the region from AWS_REGION too.

The full list of env vars which I have collected by looking in the AWS SDKs is 
up at 
https://github.com/steveloughran/cloudstore/blob/main/src/main/java/org/apache/hadoop/fs/store/diag/S3ADiagnosticsInfo.java#L379
I do not know what they all mean or do, only that if I get a support call I 
want to know if anyone has been setting them.
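
For illustration only, a sketch of forwarding both variables into the S3A 
connector on the submission side (the env-to-conf mapping itself is an 
assumption here; fs.s3a.endpoint and fs.s3a.endpoint.region are the Hadoop S3A 
keys):

{code}
import os
from pyspark.sql import SparkSession

builder = SparkSession.builder
# Sketch only: forward the AWS SDK env vars, if set, into the S3A config.
if "AWS_ENDPOINT_URL" in os.environ:
    builder = builder.config("spark.hadoop.fs.s3a.endpoint",
                             os.environ["AWS_ENDPOINT_URL"])
if "AWS_REGION" in os.environ:
    builder = builder.config("spark.hadoop.fs.s3a.endpoint.region",
                             os.environ["AWS_REGION"])
spark = builder.getOrCreate()
{code}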



> Support AWS_ENDPOINT_URL env variable
> -
>
> Key: SPARK-45404
> URL: https://issues.apache.org/jira/browse/SPARK-45404
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46794) Incorrect results due to inferred predicate from checkpoint with subquery

2024-01-22 Thread Tom van Bussel (Jira)
Tom van Bussel created SPARK-46794:
--

 Summary: Incorrect results due to inferred predicate from 
checkpoint with subquery 
 Key: SPARK-46794
 URL: https://issues.apache.org/jira/browse/SPARK-46794
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: Tom van Bussel


Spark can produce incorrect results when using a checkpointed DataFrame with a 
filter containing a scalar subquery. This subquery is included in the 
constraints of the resulting LogicalRDD, and may then be propagated as a filter 
when joining with the checkpointed DataFrame. This causes the subquery to be 
evaluated twice: once during checkpointing and once while evaluating the query. 
These two subquery evaluations may return different results, e.g. when the 
subquery contains a limit with an underspecified sort order.
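
A minimal sketch of the query shape that can go wrong (placeholder view names 
and checkpoint path; whether a given run actually diverges depends on the 
physical plan):

{code}
# Hypothetical repro sketch: the filter's scalar subquery uses LIMIT with no
# total ordering, so two evaluations are free to pick different rows.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/ckpt")  # placeholder path

spark.range(100).toDF("c").createOrReplaceTempView("t")

checkpointed = spark.sql(
    "SELECT * FROM t WHERE c = (SELECT c FROM t LIMIT 1)"
).checkpoint()
checkpointed.createOrReplaceTempView("ckpt")

# The LogicalRDD's inferred constraint can push the same subquery filter to
# the other join side, where it is evaluated a second time.
spark.sql("SELECT * FROM ckpt JOIN t USING (c)").show()
{code}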



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46792) Refactor ChannelBuilder system

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46792:
---
Labels: pull-request-available  (was: )

> Refactor ChannelBuilder system
> --
>
> Key: SPARK-46792
> URL: https://issues.apache.org/jira/browse/SPARK-46792
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Alice Sayutina
>Priority: Minor
>  Labels: pull-request-available
>
> Refactor ChannelBuilder to separate the specific channel builder 
> implementation from the abstract class for other channel builders



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46793) Revert S3A endpoint fixup logic of SPARK-35878

2024-01-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated SPARK-46793:
---
Summary: Revert S3A endpoint fixup logic of SPARK-35878  (was: Revert 
region fixup logic of SPARK-35878)

> Revert S3A endpoint fixup logic of SPARK-35878
> --
>
> Key: SPARK-46793
> URL: https://issues.apache.org/jira/browse/SPARK-46793
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.5.0, 3.4.3
>Reporter: Steve Loughran
>Priority: Major
>
> The v2 SDK does its region resolution "differently", and the changes of 
> SPARK-35878 actually create problems.
> That PR went in to fix a regression in Hadoop 3.3.1 which has been fixed 
> since 3.3.2; removing it is not going to cause problems for anyone not using 
> the 3.3.1 release, which is three years old and has been replaced by multiple 
> follow-on 3.3.x releases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46793) Revert region fixup logic of SPARK-35878

2024-01-22 Thread Steve Loughran (Jira)
Steve Loughran created SPARK-46793:
--

 Summary: Revert region fixup logic of SPARK-35878
 Key: SPARK-46793
 URL: https://issues.apache.org/jira/browse/SPARK-46793
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.5.0, 3.4.3
Reporter: Steve Loughran


The v2 SDK does its region resolution "differently", and the changes of 
SPARK-35878 actually create problems.

That PR went in to fix a regression in Hadoop 3.3.1 which has been fixed since 
3.3.2; removing it is not going to cause problems for anyone not using the 
3.3.1 release, which is three years old and has been replaced by multiple 
follow-on 3.3.x releases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46792) Refactor ChannelBuilder system

2024-01-22 Thread Alice Sayutina (Jira)
Alice Sayutina created SPARK-46792:
--

 Summary: Refactor ChannelBuilder system
 Key: SPARK-46792
 URL: https://issues.apache.org/jira/browse/SPARK-46792
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Alice Sayutina


Refactor ChannelBuilder to separate the specific channel builder implementation 
from the abstract class for other channel builders



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46791) Support Java `Set` in JavaTypeInference

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46791.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44828
[https://github.com/apache/spark/pull/44828]

> Support Java `Set` in JavaTypeInference
> ---
>
> Key: SPARK-46791
> URL: https://issues.apache.org/jira/browse/SPARK-46791
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Scala Set (scala.collection.Set) is supported in ScalaReflection, so users 
> can encode a Scala Set in a Dataset, but Java Set is not supported in the 
> bean encoder (i.e., JavaTypeInference). This inconsistency means Java users 
> cannot use Set the way Scala users can.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46777) Refactor `StreamingDataSourceRelation` into `StreamingDataSourceRelation` and `StreamingDataSourceScanRelation` for parity with batch scan

2024-01-22 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-46777.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44818
[https://github.com/apache/spark/pull/44818]

> Refactor `StreamingDataSourceRelation` into `StreamingDataSourceRelation` and 
> `StreamingDataSourceScanRelation` for parity with batch scan
> --
>
> Key: SPARK-46777
> URL: https://issues.apache.org/jira/browse/SPARK-46777
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jackie Zhang
>Assignee: Jackie Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> To prepare for the upcoming structured streaming operator pushdown, we'd like 
> to refactor some Catalyst object relationships first.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46777) Refactor `StreamingDataSourceRelation` into `StreamingDataSourceRelation` and `StreamingDataSourceScanRelation` for parity with batch scan

2024-01-22 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-46777:


Assignee: Jackie Zhang

> Refactor `StreamingDataSourceRelation` into `StreamingDataSourceRelation` and 
> `StreamingDataSourceScanRelation` for parity with batch scan
> --
>
> Key: SPARK-46777
> URL: https://issues.apache.org/jira/browse/SPARK-46777
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jackie Zhang
>Assignee: Jackie Zhang
>Priority: Major
>  Labels: pull-request-available
>
> To prepare for the upcoming structured streaming operator pushdown, we'd like 
> to refactor some Catalyst object relationships first.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46789) Add `VolumeSuite` to K8s IT

2024-01-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46789:
--
Fix Version/s: 3.5.1

> Add `VolumeSuite` to K8s IT
> ---
>
> Key: SPARK-46789
> URL: https://issues.apache.org/jira/browse/SPARK-46789
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46738) `Cast` displayed different results between Regular Spark and Spark Connect

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46738:
---
Labels: pull-request-available  (was: )

> `Cast` displayed different results between Regular Spark and Spark Connect
> --
>
> Key: SPARK-46738
> URL: https://issues.apache.org/jira/browse/SPARK-46738
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>
> The following doctest will throw an error in the tests of the pyspark-connect 
> module
> {code:java}
> Example 5: Decrypt data with key.
> >>> import pyspark.sql.functions as sf
> >>> df = spark.createDataFrame([(
> ... "83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94",
> ... "",)],
> ... ["input", "key"]
> ... )
> >>> df.select(sf.try_aes_decrypt(
> ... sf.unhex(df.input), df.key
> ... ).cast("STRING")).show(truncate=False) # doctest: +SKIP
> +--+
> |CAST(try_aes_decrypt(unhex(input), key, GCM, DEFAULT, ) AS STRING)|
> +--+
> |Spark |
> +--+ {code}
> {code:java}
> df.select(sf.try_aes_decrypt(
>     sf.unhex(df.input), df.key
> ).cast("STRING")).show(truncate=False)
> Expected:
> +--+
> |CAST(try_aes_decrypt(unhex(input), key, GCM, DEFAULT, ) AS STRING)|
> +--+
> |Spark |
> +--+
> Got:
> +--+
> |try_aes_decrypt(unhex(input), key, GCM, DEFAULT, )|
> +--+
> |Spark |
> +--+{code}
> !screenshot-1.png|width=671,height=222!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46738) `Cast` displayed different results between Regular Spark and Spark Connect

2024-01-22 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-46738:

Summary: `Cast` displayed different results between Regular Spark and Spark 
Connect  (was: `Cast` of pyspark displayed different results between Regular 
Spark and Spark Connect)

> `Cast` displayed different results between Regular Spark and Spark Connect
> --
>
> Key: SPARK-46738
> URL: https://issues.apache.org/jira/browse/SPARK-46738
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> The following doctest will throw an error in the tests of the pyspark-connect 
> module
> {code:java}
> Example 5: Decrypt data with key.
> >>> import pyspark.sql.functions as sf
> >>> df = spark.createDataFrame([(
> ... "83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94",
> ... "",)],
> ... ["input", "key"]
> ... )
> >>> df.select(sf.try_aes_decrypt(
> ... sf.unhex(df.input), df.key
> ... ).cast("STRING")).show(truncate=False) # doctest: +SKIP
> +--+
> |CAST(try_aes_decrypt(unhex(input), key, GCM, DEFAULT, ) AS STRING)|
> +--+
> |Spark |
> +--+ {code}
> {code:java}
> df.select(sf.try_aes_decrypt(
>     sf.unhex(df.input), df.key
> ).cast("STRING")).show(truncate=False)
> Expected:
> +--+
> |CAST(try_aes_decrypt(unhex(input), key, GCM, DEFAULT, ) AS STRING)|
> +--+
> |Spark |
> +--+
> Got:
> +--+
> |try_aes_decrypt(unhex(input), key, GCM, DEFAULT, )|
> +--+
> |Spark |
> +--+{code}
> !screenshot-1.png|width=671,height=222!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46718) Upgrade Arrow to 15.0.0

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46718:
--

Assignee: (was: Apache Spark)

> Upgrade Arrow to 15.0.0
> ---
>
> Key: SPARK-46718
> URL: https://issues.apache.org/jira/browse/SPARK-46718
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-01-15-14-02-57-814.png
>
>
> https://github.com/apache/arrow/releases/tag/apache-arrow-15.0.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46718) Upgrade Arrow to 15.0.0

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46718:
--

Assignee: Apache Spark

> Upgrade Arrow to 15.0.0
> ---
>
> Key: SPARK-46718
> URL: https://issues.apache.org/jira/browse/SPARK-46718
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-01-15-14-02-57-814.png
>
>
> https://github.com/apache/arrow/releases/tag/apache-arrow-15.0.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46789) Add `VolumeSuite` to K8s IT

2024-01-22 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-46789:


Assignee: Dongjoon Hyun

> Add `VolumeSuite` to K8s IT
> ---
>
> Key: SPARK-46789
> URL: https://issues.apache.org/jira/browse/SPARK-46789
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46789) Add `VolumeSuite` to K8s IT

2024-01-22 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46789.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44827
[https://github.com/apache/spark/pull/44827]

> Add `VolumeSuite` to K8s IT
> ---
>
> Key: SPARK-46789
> URL: https://issues.apache.org/jira/browse/SPARK-46789
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46673) Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt`

2024-01-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-46673:


Assignee: BingKun Pan

> Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt`
> --
>
> Key: SPARK-46673
> URL: https://issues.apache.org/jira/browse/SPARK-46673
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46673) Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt`

2024-01-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-46673.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44750
[https://github.com/apache/spark/pull/44750]

> Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt`
> --
>
> Key: SPARK-46673
> URL: https://issues.apache.org/jira/browse/SPARK-46673
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46791) Support Java `Set` in JavaTypeInference

2024-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46791:
---
Labels: pull-request-available  (was: )

> Support Java `Set` in JavaTypeInference
> ---
>
> Key: SPARK-46791
> URL: https://issues.apache.org/jira/browse/SPARK-46791
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>
> Scala Set (scala.collection.Set) is supported in ScalaReflection, so users 
> can encode a Scala Set in a Dataset, but Java Set is not supported in the 
> bean encoder (i.e., JavaTypeInference). This inconsistency means Java users 
> cannot use Set the way Scala users can.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46791) Support Java `Set` in JavaTypeInference

2024-01-22 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-46791:
---

 Summary: Support Java `Set` in JavaTypeInference
 Key: SPARK-46791
 URL: https://issues.apache.org/jira/browse/SPARK-46791
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh


Scala Set (scala.collection.Set) is supported in ScalaReflection, so users can 
encode a Scala Set in a Dataset, but Java Set is not supported in the bean 
encoder (i.e., JavaTypeInference). This inconsistency means Java users cannot 
use Set the way Scala users can.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org